### Data clean up
Web of Science downloads include a lot of unnecessary data. Many of the columns are blank because they hold fields that we did not request, so we need to clean up the dataframes before analysis.
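As a quick check before cleaning, the blank columns can be listed and dropped directly. This is a minimal sketch on a made-up dataframe, not the notebook's own method (which selects the wanted columns by name). The `raw` frame, its values, and the `BA` column are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for a Web of Science export; all values here are made up
raw = pd.DataFrame({
    "AU": ["Author A", "Author B"],
    "TI": ["Title one", "Title two"],
    "BA": [None, None],      # a field we never requested, so it is entirely empty
    "PM": [12345.0, None],   # partially filled, so it is kept
})

# Names of the columns in which every value is missing
empty_cols = [c for c in raw.columns if raw[c].isna().all()]

# Drop the columns that contain no data at all
trimmed = raw.dropna(axis=1, how="all")

print(empty_cols)
print(trimmed.columns.tolist())
```

Dropping only all-empty columns keeps any field that has at least one value, so it is a looser filter than an explicit column list.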

In [19]:
import pandas as pd
import pathlib as plb

In [37]:
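def clean_up(file):
    """Load one Web of Science export, keep the requested columns, and rename them."""
    # Web of Science field tags for the columns we requested
    cols = ['AU', 'TI', 'SO', 'PM','PD', 'PY', 'DI', 'DL']
    pubs = pd.read_csv(file, index_col=0, usecols=cols).reset_index()
    pubs = pubs.rename(columns={'AU': 'Author', 'TI':'Title', 'SO': 'Journal', 'DI':'DOI', 'DL':'DOI Link', 'PM':'PMID',
                                   'PD':'Pub date', 'PY':'Pub year'})
    # Missing values become 0 so that PMID and Pub year can be cast to int
    pubs.fillna(0, inplace=True)

    pubs['PMID'] = pubs['PMID'].astype(int)
    pubs['Pub year'] = pubs['Pub year'].astype(int)
    return pubs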
    
    

Next we read in the data collected by other team members and combine it into two dataframes. One dataframe contains publications of studies with C. elegans; the other contains studies of other nematodes.

In [38]:
# assign directory
directory = plb.Path('/Users/Emily/Desktop/comp_meta_analysis/pubs/')
 
# iterate over the CSV files in that directory and sort them into
# C. elegans and other-nematode publications based on the file name
other_nems = pd.DataFrame()
cel = pd.DataFrame()

for f in directory.glob('*.csv'):
    name = f.name.lower()
    if "other_nematode" in name:
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        other_nems = pd.concat([other_nems, clean_up(f)])
    elif ("cel" in name) or ("elegans" in name):
        cel = pd.concat([cel, clean_up(f)])
print(len(cel))
print(len(other_nems))

2
18


Data were also collected from PubMed. We want to cross-reference the PubMed and Web of Science datasets to remove any duplicates. To do that, we look for duplicate PubMed IDs (PMIDs).
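One way to see which records appear in both sources is sketched below on made-up data. It uses an outer merge with `indicator=True`, which is a swapped-in technique for illustration. The cells in this notebook use a right merge on `PMID` instead. The frames `wos` and `pubmed` and their values are hypothetical.

```python
import pandas as pd

# Made-up records standing in for the two sources
wos = pd.DataFrame({"PMID": [111, 222, 0], "Title": ["A", "B", "C"]})
pubmed = pd.DataFrame({"PMID": [222, 333], "Compound": ["X", "Y"]})

# A PMID of 0 is the placeholder left by fillna(0), so it is not a real match key
has_pmid = wos["PMID"] != 0
shared = wos.loc[has_pmid & wos["PMID"].isin(pubmed["PMID"]), "PMID"].tolist()

# The _merge column records whether each row came from one source or both
combined = wos.merge(pubmed, on="PMID", how="outer", indicator=True)

print(shared)
print(combined["_merge"].value_counts())
```

Rows flagged `both` are the duplicates. Rows flagged `left_only` or `right_only` appear in just one source.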

In [None]:


ef = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/cln_nems_ef.csv', index_col=0).reset_index()
other_nems = pd.concat([other_nems, ef])
other_nems.fillna(0, inplace=True)
other_nems['PMID']=other_nems['PMID'].astype(int)
print(len(other_nems))

In [41]:
ef_cel = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/cln_cel_ef.csv', index_col=0).reset_index()
cel = pd.concat([cel, ef_cel])

cel.fillna(0, inplace=True)
cel['PMID']=cel['PMID'].astype(int)
print(len(cel))

53


In [42]:
pubmed_nems = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/Pmed_nematode_comps.csv', 
                         index_col=0)
all_nems = other_nems.merge(pubmed_nems, on='PMID', how='right')
all_nems = all_nems.loc[all_nems['Relevant'] != 'N']
print('Pmed nems: ' + str(len(pubmed_nems)) + ' Crowdsourced: '+ str(len(all_nems)))

Pmed nems: 91 Crowdsourced: 77


In [44]:
pubmed_cel = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/Pmed_Cel_comps.csv', 
                         index_col=0).reset_index()
all_cel = cel.merge(pubmed_cel, on='PMID', how='right')
all_cel = all_cel.loc[all_cel['Relevant'] != 'N']
print('Pmed cel: ' + str(len(pubmed_cel)) + ' Crowdsourced: '+ str(len(all_cel)))

Pmed cel: 56 Crowdsourced: 49


In [45]:
all_cel.to_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/cln_cel.csv')
all_nems.to_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/other_nems.csv')

Not all of the publications returned by the initial search were considered "relevant". Those marked as not relevant need to be removed from the list.
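Before filtering, it can help to see which values the `Relevant` column actually holds, because a `!= 'N'` test also keeps rows that were never annotated. The sketch below uses a made-up `annotated` frame, and treating missing annotations as a separate case is an assumption, not something the source data confirms.

```python
import pandas as pd

# Made-up annotations; None stands for a row nobody has reviewed yet
annotated = pd.DataFrame({
    "PMID": [1, 2, 3, 4],
    "Relevant": ["Y", "N", None, "Y"],
})

# Tally every annotation value, including missing ones
counts = annotated["Relevant"].value_counts(dropna=False)

# Excluding 'N' keeps the unannotated row; requiring 'Y' does not
kept_not_n = annotated.loc[annotated["Relevant"] != "N"]
kept_yes = annotated.loc[annotated["Relevant"] == "Y"]

print(counts)
print(len(kept_not_n), len(kept_yes))
```

Which filter is appropriate depends on whether unreviewed publications should stay in the list.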

In [62]:
anotated_cel = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/all_cel.csv',
                          index_col=0)
print('Before filter: ' + str(len(anotated_cel)))
anotated_cel = anotated_cel.loc[anotated_cel['Relevant'] != 'N']
print('After filter: ' + str(len(anotated_cel)))

Before filter: 56
After filter: 33


In [63]:
anotated_nems = pd.read_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/all_nems.csv',
                            index_col=0)
print('Before filter: ' + str(len(anotated_nems)))

anotated_nems = anotated_nems.loc[anotated_nems['Relevant'] != 'N']
print('After filter: ' + str(len(anotated_nems)))

Before filter: 77
After filter: 28


In [64]:
anotated_cel.to_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/anotated_cel.csv')
anotated_nems.to_csv('/Users/Emily/Desktop/comp_meta_analysis/pubs/cleaned/anotated_nems.csv')

In [65]:
anotated_cel.head()

Unnamed: 0,Author,Title_x,Journal_x,Pub date,Pub year,DOI,DOI Link,PMID,Relevant,Chemotaxis,Other nems,"If Y to L, which ones",Other animals,Notes,index,Compound_y,CAS,Title_y,Journal_y
0,"Alexpandi, R; Prasanth, MI; Ravi, AV; Balamuru...",Protective effect of neglected plant Diplocycl...,JOURNAL OF PHOTOCHEMISTRY AND PHOTOBIOLOGY B-B...,DEC,2019.0,10.1016/j.jphotobiol.2019.111637,http://dx.doi.org/10.1016/j.jphotobiol.2019.11...,31706086,Y,N,N,0,0,Aging,0,Phytol,150-86-7,Protective effect of neglected plant Diplocycl...,"Journal of photochemistry and photobiology. B,..."
1,"Sathya, S; Shanmuganathan, B; Balasubramaniam,...",Phytol loaded PLGA nanoparticles regulate the ...,FOOD AND CHEMICAL TOXICOLOGY,FEB,2020.0,10.1016/j.fct.2019.110962,http://dx.doi.org/10.1016/j.fct.2019.110962,31734340,Y,Y*,N,0,0,Assesses how phytol affects chemotaxis toward ...,1,Phytol,150-86-7,Phytol loaded PLGA nanoparticles regulate the ...,Food and chemical toxicology : an internationa...
2,"Wetchakul, P; Goon, JA; Adekoya, AE; Olatunji,...","Traditional tonifying polyherbal infusion, Jat...",BMC COMPLEMENTARY AND ALTERNATIVE MEDICINE,13-Aug,2019.0,10.1186/s12906-019-2626-1,http://dx.doi.org/10.1186/s12906-019-2626-1,31409340,Y,N,N,0,0,0,0,Ellagic acid,476-66-4,"Traditional tonifying polyherbal infusion, Jat...",BMC complementary and alternative medicine
3,"Navarro-Hortal, MD; Romero-Marquez, JM; Esteba...",Strawberry (Fragaria x ananassa cv. Romina) me...,FOOD CHEMISTRY,15-Mar,2022.0,10.1016/j.foodchem.2021.131272,http://dx.doi.org/10.1016/j.foodchem.2021.131272,34628121,Y,N,N,0,0,0,1,Ellagic acid,476-66-4,Strawberry (Fragaria × ananassa cv. Romina) me...,Food chemistry
4,"Ndjonka, D; Abladam, ED; Djafsia, B; Ajonina-E...",Anthelmintic activity of phenolic acids from t...,JOURNAL OF HELMINTHOLOGY,DEC,2014.0,10.1017/S0022149X1300045X,http://dx.doi.org/10.1017/S0022149X1300045X,23768773,Y,N,Y,0,0,0,2,Ellagic acid,476-66-4,Anthelmintic activity of phenolic acids from t...,Journal of helminthology
