# CA Prop 65 Chemical categorization
California Proposition 65 requires labeling for chemicals that the state of California recognizes as causing cancer or reproductive/developmental toxicity. This lets us categorize the chemicals on the Prop 65 list as instances of 'carcinogen', 'reproductive toxicant', 'male reproductive toxicant', 'female reproductive toxicant', and 'developmental toxicant' in Wikidata, adding some very basic (and somewhat indirect) chemical-disease relationships.

Note that California's OEHHA allows users to download a pre-exported .csv file of chemicals listed under Prop 65 (does not include de-listed chemicals). Alternatively, users can export the complete list of chemicals from OEHHA which will include chemicals that are under consideration, currently listed, or formerly listed.

This notebook partially explores both exports to find the best way to load the data into Wikidata. The final bot will likely NOT include both methods.

In [2]:
from wikidataintegrator import wdi_core, wdi_login, wdi_helpers
from wikidataintegrator.ref_handlers import update_retrieved_if_new_multiple_refs
import pandas as pd
from pandas import read_csv
import requests
from tqdm.notebook import trange, tqdm
import ipywidgets 
import widgetsnbextension
import xml.etree.ElementTree as et 
import time


In [3]:
## Note that the property start date is used for list date.
## When placed in the references, Deltabot moved it out as a qualifier

from datetime import datetime
import copy
def create_reference(prop65_url):
    refStatedIn = wdi_core.WDItemID(value="Q28455381", prop_nr="P248", is_reference=True)
    timeStringNow = datetime.now().strftime("+%Y-%m-%dT00:00:00Z")
    refRetrieved = wdi_core.WDTime(timeStringNow, prop_nr="P813", is_reference=True)
    refURL = wdi_core.WDUrl(value=prop65_url, prop_nr="P854", is_reference=True)
    return [refStatedIn, refRetrieved, refURL]

In [4]:
print("Logging in...")
import wdi_user_config ## Credentials stored in a wdi_user_config file
login_dict = wdi_user_config.get_credentials()
login = wdi_login.WDLogin(login_dict['WDUSER'], login_dict['WDPASS'])


Logging in...
https://www.wikidata.org/w/api.php
Successfully logged in as Gtsulab


# Get the most recent csv file and parse it

## CA OEHHA clean up method
The manually triggered export of the chemical list from the OEHHA site has less header junk, fewer stray title notes, and fewer other things that disrupt the structure, which makes the name conversion easier. Additionally, the data on cancer, reproductive toxicity, etc. is more structured and doesn't have random blank spaces.

In [5]:
datasrc = 'data/OEHHA-2019-11-1.csv'

chem_list = read_csv(datasrc, encoding = 'unicode_escape', header=0) 
print(chem_list.columns)
chem_list.dropna(axis='columns', how='all',inplace=True)
chem_list.fillna("None", inplace=True)
#print(chem_list.columns.values)

## Pull out only columns of interest for our task
cols_of_interest = chem_list[['Title','CAS Number','Cancer','Cancer - Listing Mechanism',
                          'Reproductive Toxicity','Chemical listed under Proposition 65 as causing',
                          'Developmental Toxicity - Date of Listing','Developmental Toxicity - Listing Mechanism',
                          'Female Reproductive Toxicity - Date of Listing',
                          'Female Reproductive Toxicity - Listing Mechanism',
                          'Male Reproductive Toxicity - Date of Listing',
                          'Male Reproductive Toxicity - Listing Mechanism']]

## Remove entries which are not relevant
prop_65_irrelevant = cols_of_interest.loc[(cols_of_interest['Cancer'] == "None") & 
                                          (cols_of_interest['Reproductive Toxicity'] == "None") & 
                                          (cols_of_interest['Chemical listed under Proposition 65 as causing'] == "None")]
non_prop_chems = prop_65_irrelevant['Title'].tolist()
prop65_chems = cols_of_interest.loc[~cols_of_interest['Title'].isin(non_prop_chems)].copy()
#print(prop65_chems.head(n=2))

Index(['Title', 'CAS Number', 'Use(s)', 'Synonym(s)', 'Latest Criteria',
       'Inhalation Unit Risk (Î¼g/cubic meter)-1',
       'Inhalation Slope Factor (mg/kg-day)-1',
       'Oral Slope Factor (mg/kg-day)-1', 'Last Cancer Potency Revision',
       'Acute REL (Î¼g/m3)', 'Species', 'Acute REL Toxicologic Endpoint',
       'Acute REL Target Organs', 'Severity', 'Last Acute REL Revision',
       '8-Hour Inhalation REL (Î¼g/m3)', 'Last 8-Hour REL Revision',
       'Chronic Inhalation REL (Î¼g/m3)', 'Chronic Toxicologic Endpoint',
       'Chronic Target Organs', 'Human Data', 'Health Risk Category',
       'Cancer Risk at PHG', 'MCL value (mg/L)', 'Cancer Risk at MCL',
       'Notification Level (Î¼g/L)', 'Public Health Goal (mg/L)',
       'Last PHG Revision', 'No Significant Risk Level (NSRL) - Inhalation',
       'No Significant Risk Level (NSRL) - Oral',
       'Maximum Allowable Dose Level (MADL) for chemicals causing reproductive toxicity - Inhalation',
       'Maximum Allowable D

### Chemical names to URL conversion for mapping and/or mix n match
The property in Wikidata uses the URL stub as its ID, so we need to convert the chemical names to URL stubs that work with the Prop 65 website. The URLs will then be mapped to the Wikidata entries that received the property via Mix'n'match. Normally the URLs could be tested, but the CA Prop 65 website has captcha protection and blocks scrapers.

Example conversion:
"OEHHA listing" --> "OEHHA url" | "Prop 65 listing" --> "Prop 65 url"
* "Amino-alpha-carboline" --> "amino-alpha-carboline" | "A-alpha-C (2-Amino-9H-pyrido[2,3-b]indole)" --> "alpha-c-2-amino-9h-pyrido23-bindole"
* "Allyl Chloride" --> "allyl-chloride" | not listed
* "alpha-Methylstyrene" --> "alpha-methylstyrene" | "?-Methyl styrene (alpha-Methylstyrene)" --> "methyl-styrene-alpha-methylstyrene"
* "MeIQx (2-Amino-3,8-dimethylimidazo[4,5-f]quinoxaline)" --> "meiqx-2-amino-38-dimethylimidazo45-fquinoxaline" | "MeIQx (2-Amino-3,8-dimethylimidazo[4,5-f]quinoxaline)" --> "meiqx-2-amino-38-dimethylimidazo45-fquinoxaline"
* "2-Mercaptobenzothiazole" --> "2-mercaptobenzothiazole" | "2?Mercaptobenzothiazole" --> "2-mercaptobenzothiazole"
* "Trp-P-1 (Tryptophan-P-1)" --> "trp-p-1-tryptophan-p-1" | "Trp-P-1 (Tryptophan-P-1)" --> "trp-p-1-tryptophan-p-1"
* "Alcoholic beverages, when associated with alcohol abuse" --> "alcoholic-beverages" | "Alcoholic beverages, when associated with alcohol abuse" --> "alcoholic-beverages"
* "Aspirin" --> "aspirin" | "Aspirin (NOTE:  It is especially  important not to use aspirin during the last three months of pregnancy,  unless specifically directed to do so by a physician because it may cause  problems in the unborn child or  complications during delivery.)" --> "aspirin"
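
The conversion rule illustrated by most of the examples above can be sketched as a small standalone function. This is only a sketch of the rule as described in this notebook (lower-case, drop brackets, parentheses and commas, turn spaces into dashes); the function name `title_to_url_stub` is made up for illustration, and the OEHHA site may apply further rules that are not captured here.

```python
import re

def title_to_url_stub(title):
    """Approximate an OEHHA url stub from a chemical title (hypothetical helper)."""
    stub = title.lower()
    # Remove square brackets, parentheses, and commas in one pass
    stub = re.sub(r"[\[\](),]", "", stub)
    # Spaces become dashes
    return stub.strip().replace(" ", "-")

for title in ["Allyl Chloride",
              "Trp-P-1 (Tryptophan-P-1)",
              "MeIQx (2-Amino-3,8-dimethylimidazo[4,5-f]quinoxaline)"]:
    print(title, "-->", title_to_url_stub(title))
```

Titles with trailing qualifiers, such as "Alcoholic beverages, when associated with alcohol abuse", do not reduce to their listed stub under this rule and would need to be handled separately.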

In [6]:
## To convert the title to a url stub: lower-case it, strip out parentheses, brackets, and commas, and replace spaces with dashes
## (one character class removes all the punctuation; regex is set explicitly because its default differs across pandas versions)
prop65_chems['url_stub'] = (prop65_chems['Title'].str.lower()
                            .str.replace(r"[\[\](),]", "", regex=True)
                            .str.replace(" ", "-", regex=False))
#print(prop65_chems.head())

## Check the look of the url stub
#print(prop65_chems.loc[prop65_chems['Title']=="Allyl Chloride"])
#print(prop65_chems.loc[prop65_chems['Title']=="Trp-P-1 (Tryptophan-P-1)"])
#print(prop65_chems.loc[prop65_chems['Title']=="MeIQx (2-Amino-3,8-dimethylimidazo[4,5-f]quinoxaline)"])
#print(prop65_chems.head(n=2))

#print(prop65_chems.head(n=2))
mixnmatch_cat = prop65_chems[['url_stub','Title','CAS Number']].copy()
mixnmatch_cat.rename(columns={'url_stub':'Entry ID','Title':'Entry name'}, inplace=True)
mixnmatch_cat['Entry description'] = mixnmatch_cat['Entry name'].astype(str).str.cat(mixnmatch_cat['CAS Number'].astype(str),sep=", CAS Number: ")
#mixnmatch_cat.drop('CAS Number',axis=1,inplace=True)
print(mixnmatch_cat.head(n=2))

#mixnmatch_cat.to_csv('data/mixnmatch_cat.tsv',sep='\t', header=True)

              Entry ID           Entry name   CAS Number  \
0  abiraterone-acetate  Abiraterone acetate  154229-18-2   
2         acetaldehyde         Acetaldehyde      75-07-0   

                              Entry description  
0  Abiraterone acetate, CAS Number: 154229-18-2  
2             Acetaldehyde, CAS Number: 75-07-0  


### Mapping Items to Wikidata based on CAS RN values
For entries that can be mapped via their CAS Registry Number, go ahead and add the Prop 65 ID.

In [7]:
sparqlQuery = "SELECT * WHERE {?item wdt:P231 ?CAS}"
result = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery)
cas_in_wd_list = []

i=0
while i < len(result["results"]["bindings"]):
    cas_id = result["results"]["bindings"][i]["CAS"]["value"]
    wdid = result["results"]["bindings"][i]["item"]["value"].replace("http://www.wikidata.org/entity/", "")
    cas_in_wd_list.append({'WDID':wdid,'CAS Number':cas_id})
    i=i+1

cas_in_wd = pd.DataFrame(cas_in_wd_list)
cas_in_wd.drop_duplicates(subset='CAS Number',keep=False,inplace=True)
cas_in_wd.drop_duplicates(subset='WDID',keep=False,inplace=True)
print(cas_in_wd.head(n=2))

   CAS Number   WDID
0  12385-13-6   Q556
1  53850-35-4  Q1232
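
The two `drop_duplicates(..., keep=False)` calls above discard every CAS number shared by several items, and then every item that still carries several CAS numbers, so only one-to-one pairs reach the merge. A plain-Python sketch of the same filter, using made-up placeholder QIDs and CAS numbers (the helper name `unambiguous_pairs` is hypothetical):

```python
from collections import Counter

def unambiguous_pairs(pairs):
    """Keep only (WDID, CAS) pairs that are one-to-one, in two sequential passes."""
    # Pass 1: drop every pair whose CAS number appears more than once
    cas_counts = Counter(cas for _, cas in pairs)
    step1 = [(wdid, cas) for wdid, cas in pairs if cas_counts[cas] == 1]
    # Pass 2: among the survivors, drop every pair whose WDID appears more than once
    wdid_counts = Counter(wdid for wdid, _ in step1)
    return [(wdid, cas) for wdid, cas in step1 if wdid_counts[wdid] == 1]

# Placeholder data, not real Wikidata content
pairs = [("Q1", "111-11-1"), ("Q2", "111-11-1"),   # one CAS number on two items
         ("Q3", "222-22-2"),                        # clean one-to-one pair
         ("Q4", "333-33-3"), ("Q4", "444-44-4")]   # one item with two CAS numbers
print(unambiguous_pairs(pairs))
```

Dropping ambiguous pairs entirely, rather than keeping the first match, trades coverage for safety: the bot never writes an identifier to an item it cannot pin down uniquely.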


In [8]:
prop_65_matches = mixnmatch_cat.merge(cas_in_wd,on='CAS Number',how='inner')
print(prop_65_matches.head(n=2))
print(len(prop_65_matches))
#prop_65_matches.to_csv('data/mixnmatch_cat_with_cas.tsv',sep='\t', header=True)

              Entry ID           Entry name   CAS Number  \
0  abiraterone-acetate  Abiraterone acetate  154229-18-2   
1         acetaldehyde         Acetaldehyde      75-07-0   

                              Entry description       WDID  
0  Abiraterone acetate, CAS Number: 154229-18-2  Q27888393  
1             Acetaldehyde, CAS Number: 75-07-0     Q61457  
798


In [9]:
## Pull things matched via mix n match
sparqlQuery = "SELECT ?item ?CA65 WHERE {?item wdt:P7524 ?CA65}"
result = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery)
CA65_in_wd_list = []

i=0
while i < len(result["results"]["bindings"]):
    CA65_id = result["results"]["bindings"][i]["CA65"]["value"]
    wdid = result["results"]["bindings"][i]["item"]["value"].replace("http://www.wikidata.org/entity/", "")
    CA65_in_wd_list.append({'WDID':wdid,'Entry ID':CA65_id})
    i=i+1

CA65_in_wd = pd.DataFrame(CA65_in_wd_list)
print(len(CA65_in_wd))

107


In [10]:
## Remove items matched via mix n match from update
#print(CA65_in_wd.head(n=2))
prop_65_less_mixnmatch = prop_65_matches.loc[~prop_65_matches['Entry ID'].isin(CA65_in_wd['Entry ID'].tolist())]
print(prop_65_less_mixnmatch.head(n=2))

     Entry ID Entry name  CAS Number                  Entry description  \
58   auramine   Auramine    492-80-8     Auramine, CAS Number: 492-80-8   
59  auranofin  Auranofin  34031-32-8  Auranofin, CAS Number: 34031-32-8   

         WDID  
58  Q26840770  
59    Q421230  


In [11]:
prop65_to_add = prop_65_less_mixnmatch[0:10]
url_base = 'https://oehha.ca.gov/chemicals/'
list_prop = "P7524" 

for i in tqdm(range(len(prop65_to_add))):
    prop_65_qid = prop65_to_add.iloc[i]['WDID']
    prop_65_id = prop65_to_add.iloc[i]['Entry ID']
    prop_65_url = url_base+prop_65_id
    reference = create_reference(prop_65_url)
    prop65_statement = [wdi_core.WDString(value=prop_65_id, prop_nr=list_prop, 
                               references=[copy.deepcopy(reference)])]
    item = wdi_core.WDItemEngine(wd_item_id=prop_65_qid, data=prop65_statement, append_value=list_prop,
                               global_ref_mode='CUSTOM', ref_handler=update_retrieved_if_new_multiple_refs)
    item.write(login, edit_summary="added CA prop 65 id")
    print(prop_65_id, prop_65_qid, prop_65_url)
    

HBox(children=(IntProgress(value=0, max=10), HTML(value='')))

auramine Q26840770 https://oehha.ca.gov/chemicals/auramine
auranofin Q421230 https://oehha.ca.gov/chemicals/auranofin
avermectin-b1-abamectin Q305345 https://oehha.ca.gov/chemicals/avermectin-b1-abamectin
azacitidine Q416451 https://oehha.ca.gov/chemicals/azacitidine
azaserine Q4832281 https://oehha.ca.gov/chemicals/azaserine
azathioprine Q18939 https://oehha.ca.gov/chemicals/azathioprine
azobenzene Q8884513 https://oehha.ca.gov/chemicals/azobenzene
benthiavalicarb-isopropyl Q27155585 https://oehha.ca.gov/chemicals/benthiavalicarb-isopropyl
benzidine Q410066 https://oehha.ca.gov/chemicals/benzidine
benzofuran Q410089 https://oehha.ca.gov/chemicals/benzofuran



## Clean up the data for chemical typing (chem/disease relations)
Adding the Prop 65 chemical-to-cancer and chemical-to-reproductive-harm relationships depends on mappings based on the presence of the CA Prop 65 property and identifier.

In [12]:
## Run sparql query to pull all entities with Prop 65 ID (Read Only Run)
sparqlQuery = "SELECT ?item ?CA65 WHERE {?item wdt:P7524 ?CA65}"
result = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery)
CA65_in_wd_list = []

i=0
while i < len(result["results"]["bindings"]):
    CA65_id = result["results"]["bindings"][i]["CA65"]["value"]
    wdid = result["results"]["bindings"][i]["item"]["value"].replace("http://www.wikidata.org/entity/", "")
    CA65_in_wd_list.append({'WDID':wdid,'url_stub':CA65_id})
    i=i+1

## Inspect the results for mapping or coverage issues
CA65_in_wd = pd.DataFrame(CA65_in_wd_list)
print("resulting mapping table has: ",len(CA65_in_wd)," rows.")

resulting mapping table has:  117  rows.


In [27]:
## Perform left merge for currently listed and partially delisted items
prop_65_mapped = prop65_chems.merge(CA65_in_wd, on='url_stub', how='left')
#print(prop_65_mapped.head(n=2))

In [28]:
prop_65_mapped['devtox current'] = prop_65_mapped['Chemical listed under Proposition 65 as causing'].str.contains("Development")
prop_65_mapped['menrep current'] = prop_65_mapped['Chemical listed under Proposition 65 as causing'].str.contains("Male")
prop_65_mapped['femrep current'] = prop_65_mapped['Chemical listed under Proposition 65 as causing'].str.contains("Female")
prop_65_mapped['cancer current'] = prop_65_mapped['Cancer'].str.contains("Current")
prop_65_mapped['reptox current'] = prop_65_mapped['Reproductive Toxicity'].str.contains("Current")

In [30]:
prop_65_mapped['cancer delisted'] = prop_65_mapped['Cancer'].str.contains("Formerly")
prop_65_mapped['reptox delisted'] = prop_65_mapped['Reproductive Toxicity'].str.contains("Formerly")
prop_65_mapped.loc[(((prop_65_mapped['Developmental Toxicity - Date of Listing']!="None")|
        (prop_65_mapped['Developmental Toxicity - Listing Mechanism']!="None"))&
        (prop_65_mapped['devtox current']==False)), 'devtox delisted'] = True
prop_65_mapped.loc[(((prop_65_mapped['Female Reproductive Toxicity - Date of Listing']!="None")|
        (prop_65_mapped['Female Reproductive Toxicity - Listing Mechanism']!="None"))&
        (prop_65_mapped['femrep current']==False)), 'femrep delisted'] = True
prop_65_mapped.loc[(((prop_65_mapped['Male Reproductive Toxicity - Date of Listing']!="None")|
        (prop_65_mapped['Male Reproductive Toxicity - Listing Mechanism']!="None"))&
        (prop_65_mapped['menrep current']==False)), 'menrep delisted'] = True

prop_65_mapped.fillna(False, inplace=True)
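
The flag logic of the two cells above can be restated for a single row, which makes the rule easier to check by hand. This is a sketch only: the column names come from this notebook, but the helper name `listing_flags` and the cell values in the example row are made up.

```python
# (flag key, keyword searched in the "listed as causing" column, column-name prefix)
SPECIFIC_TYPES = [
    ("devtox", "Development", "Developmental Toxicity"),
    ("femrep", "Female", "Female Reproductive Toxicity"),
    ("menrep", "Male", "Male Reproductive Toxicity"),
]

def listing_flags(row):
    """Return the '<type> current' / '<type> delisted' flags for one row (a dict)."""
    causing = row["Chemical listed under Proposition 65 as causing"]
    flags = {
        "cancer current": "Current" in row["Cancer"],
        "cancer delisted": "Formerly" in row["Cancer"],
        "reptox current": "Current" in row["Reproductive Toxicity"],
        "reptox delisted": "Formerly" in row["Reproductive Toxicity"],
    }
    for key, keyword, prefix in SPECIFIC_TYPES:
        is_current = keyword in causing
        # A listing date or mechanism without a current listing implies delisting
        has_history = (row[prefix + " - Date of Listing"] != "None"
                       or row[prefix + " - Listing Mechanism"] != "None")
        flags[key + " current"] = is_current
        flags[key + " delisted"] = has_history and not is_current
    return flags

# Made-up example row; "None" mirrors the fillna("None") used when loading the file
example_row = {
    "Cancer": "Formerly listed",
    "Reproductive Toxicity": "Currently listed",
    "Chemical listed under Proposition 65 as causing": "Development, Male",
    "Developmental Toxicity - Date of Listing": "None",
    "Developmental Toxicity - Listing Mechanism": "None",
    "Female Reproductive Toxicity - Date of Listing": "1/1/2000",
    "Female Reproductive Toxicity - Listing Mechanism": "None",
    "Male Reproductive Toxicity - Date of Listing": "None",
    "Male Reproductive Toxicity - Listing Mechanism": "None",
}
flags = listing_flags(example_row)
```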

### Retrieve existing such statements in Wikidata including whether the statement is deprecated or not
In SPARQL queries, the 't' in 'wdt' stands for 'truthy': these triples cover only best-rank statements and never deprecated ones. Since we're looking for statements which are deprecated, we'll use 'ps' (property statement) and 'pq' (property qualifier) in the SPARQL queries involving deprecated statements.
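
As a hedged illustration of the statement-level approach, the sketch below builds a query that walks from the item to its statement node with the `p:` prefix and then checks the statement value (`ps:`) and the deprecation-reason qualifier (`pq:P2241`). It also shows how a QID can be recovered from a statement URI when only the statement node is returned. The helper names are made up, and the query string has not been run against the endpoint here.

```python
STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/"

def deprecated_statement_query(object_qid, reason_qid):
    """Build a query for items whose P31 statement carries a given P2241 qualifier."""
    return (
        "SELECT ?item ?statement WHERE { "
        "?item p:P31 ?statement . "
        "?statement ps:P31 wd:" + object_qid + " . "
        "?statement pq:P2241 wd:" + reason_qid + " . }"
    )

def qid_from_statement_uri(uri):
    """Statement URIs have the form <prefix><QID>-<GUID>; keep only the QID."""
    return uri.replace(STATEMENT_PREFIX, "").split("-")[0]

# QIDs as used in this notebook: Q187661 (carcinogen), Q56478729 (deprecation reason)
query = deprecated_statement_query("Q187661", "Q56478729")
qid = qid_from_statement_uri(
    "http://www.wikidata.org/entity/statement/Q420473-13FC25E5-BBB9-4004-94B1-0974A51D2CAF")
```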

In [21]:
## Here are the object QIDs, assuming that a chemical is the subject
object_qid = {'femrep':'Q55427776',
              'menrep': 'Q55427774',
              'devtox': 'Q72941151',
              'cancer': 'Q187661',
              'reptox': 'Q55427767'}

In [19]:
## Sample query for deprecated item
"""
#instances of carcinogen delisted
SELECT ?item ?itemLabel {
  ?item ps:P31 wd:Q187661 .
  ?item pq:P2241 wd:Q56478729. 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
   }
}
"""

sparqlQuery = "SELECT ?item {?item ps:P31 wd:Q187661. ?item pq:P2241 wd:Q56478729. }"
result = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery)
print(result)

{'head': {'vars': ['item']}, 'results': {'bindings': [{'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/statement/Q420473-13FC25E5-BBB9-4004-94B1-0974A51D2CAF'}}]}}


In [34]:
deprecated_results = []
current_results = []

for object_type in object_qid.keys():
    deprecated_query = "SELECT ?item {?item ps:P31 wd:"+object_qid[object_type]+". ?item pq:P2241 wd:Q56478729. }"
    depresult = wdi_core.WDItemEngine.execute_sparql_query(deprecated_query)
    i=0
    while i < len(depresult["results"]["bindings"]):
        wdi_uri = depresult["results"]["bindings"][i]["item"]["value"].replace("http://www.wikidata.org/entity/statement/", "")
        tmp = wdi_uri.split('-')
        WDID = tmp[0]
        deprecated_results.append({'WDID':WDID,'deprecated_type':object_type+' delisted'})
        i=i+1
    
    sparqlQuery = "SELECT ?item WHERE {?item wdt:P31 wd:"+object_qid[object_type]+".}"
    result = wdi_core.WDItemEngine.execute_sparql_query(sparqlQuery)
    j=0
    while j < len(result["results"]["bindings"]):
        wdid = result["results"]["bindings"][j]["item"]["value"].replace("http://www.wikidata.org/entity/", "")
        current_results.append({'WDID':wdid,'ObjectType':object_type+' current'})
        j=j+1    


In [35]:
deprecated_df = pd.DataFrame(deprecated_results)
current_df = pd.DataFrame(current_results)
print(len(deprecated_df),len(current_df))
print(deprecated_df)
print(current_df)

1 9
      WDID  deprecated_type
0  Q420473  cancer delisted
       ObjectType       WDID
0  femrep current  Q27888393
1  menrep current  Q27888393
2  devtox current  Q27888393
3  cancer current     Q52822
4  cancer current     Q61457
5  cancer current    Q229922
6  cancer current    Q410066
7  cancer current  Q55427380
8  cancer current  Q58622448


### Compare the data with retrieved results to determine what to ignore and what to write

* Anything listed as delisted in CA and deprecated in WD should be ignored (no change needed)
* Anything listed as delisted in CA but not deprecated in WD should have the rank updated (changed to deprecated due to new delisting)
* Anything listed as current in CA and current in WD should be ignored (no change needed)
* Anything listed as current in CA and not in WD should be written to WD (new CA entities)
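
The rules above can be written as a small lookup table. The sketch below is one possible encoding, not the final bot logic: the function name and the action labels are made up, and the two cases not covered by the list (delisted in CA but absent from WD, and current in CA but deprecated in WD) follow the draft loop later in this notebook.

```python
# Keys: (status in the CA export, status in Wikidata); None means "no statement in WD"
ACTIONS = {
    ("delisted", "deprecated"): "no change",
    ("delisted", "current"): "set rank to deprecated",
    ("delisted", None): "add new deprecated statement",
    ("current", "deprecated"): "review rank (possible re-listing)",
    ("current", "current"): "no change",
    ("current", None): "add new statement",
}

def decide_action(ca_status, wd_status):
    """Return the edit needed for one chemical/type pair (hypothetical helper)."""
    return ACTIONS[(ca_status, wd_status)]

print(decide_action("delisted", "current"))
print(decide_action("current", None))
```

Keeping the decisions in one table makes it easy to see that every combination is handled exactly once before any write to Wikidata is attempted.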

In [None]:
### List of deprecated entities in Wikidata
## .loc with a row mask alone returns a DataFrame (which has no .tolist()), so select the 'WDID' column too
cancer_deprecated = deprecated_df.loc[deprecated_df['deprecated_type']=='cancer delisted', 'WDID'].tolist()
reptox_deprecated = deprecated_df.loc[deprecated_df['deprecated_type']=='reptox delisted', 'WDID'].tolist()
femrep_deprecated = deprecated_df.loc[deprecated_df['deprecated_type']=='femrep delisted', 'WDID'].tolist()
menrep_deprecated = deprecated_df.loc[deprecated_df['deprecated_type']=='menrep delisted', 'WDID'].tolist()
devtox_deprecated = deprecated_df.loc[deprecated_df['deprecated_type']=='devtox delisted', 'WDID'].tolist()

### List of current entities in Wikidata
## (the _wd suffix keeps these from being overwritten by the CA-side *_current lists in the next cell)
cancer_current_wd = current_df.loc[current_df['ObjectType']=='cancer current', 'WDID'].tolist()
reptox_current_wd = current_df.loc[current_df['ObjectType']=='reptox current', 'WDID'].tolist()
femrep_current_wd = current_df.loc[current_df['ObjectType']=='femrep current', 'WDID'].tolist()
menrep_current_wd = current_df.loc[current_df['ObjectType']=='menrep current', 'WDID'].tolist()
devtox_current_wd = current_df.loc[current_df['ObjectType']=='devtox current', 'WDID'].tolist()

In [None]:
## List of delisted entities to write
cancer_delisted = prop_65_mapped['WDID'].loc[prop_65_mapped['cancer delisted']==True].tolist()
reptox_delisted = prop_65_mapped['WDID'].loc[prop_65_mapped['reptox delisted']==True].tolist()
femrep_delisted = prop_65_mapped['WDID'].loc[prop_65_mapped['femrep delisted']==True].tolist()
menrep_delisted = prop_65_mapped['WDID'].loc[prop_65_mapped['menrep delisted']==True].tolist()
devtox_delisted = prop_65_mapped['WDID'].loc[prop_65_mapped['devtox delisted']==True].tolist()

## List of current entities to write
cancer_current = prop_65_mapped['WDID'].loc[prop_65_mapped['cancer current']==True].tolist()
reptox_current = prop_65_mapped['WDID'].loc[prop_65_mapped['reptox current']==True].tolist()
femrep_current = prop_65_mapped['WDID'].loc[prop_65_mapped['femrep current']==True].tolist()
menrep_current = prop_65_mapped['WDID'].loc[prop_65_mapped['menrep current']==True].tolist()
devtox_current = prop_65_mapped['WDID'].loc[prop_65_mapped['devtox current']==True].tolist()

In [None]:
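## all prop_65_CA WDIDs (rows without a Wikidata match hold False after the fillna above, so skip them)
all_entities = [wdid for wdid in prop_65_mapped['WDID'].unique().tolist() if wdid]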

In [None]:
edit_log = []
repcheck = {'femrep','menrep','devtox'}  ## a set (tuples have no .intersection); names must match the object_qid keys
for eachitem in all_entities:
    new_statements = []
    new_dep_states = []
    rank_delist = []
    rank_relist = []
    for object_type in object_qid.keys():
        ## If the item is delisted
        if eachitem in prop_65_mapped['WDID'].loc[prop_65_mapped[object_type+' delisted']==True].tolist():
            ## if true, check if it's in the deprecated list
            if eachitem in deprecated_df['WDID'].loc[deprecated_df['deprecated_type']==object_type+' delisted'].tolist():
                ## if true, no write is needed, note it in the log
                edit_log.append({'WDID':eachitem,object_type+' delisted':True,object_type+' current':False,'edits':'No Change'})
            ## Else if the item is in the current list of wd entities
            elif eachitem in current_df['WDID'].loc[current_df['ObjectType']==object_type+' current'].tolist():
                ## if true, rank change is needed, new statement not needed
                rank_delist.append(object_type)
            ## Else the item is delisted and not in WD as such
            else:
                ## if true, A new statement is needed which includes the deprecated rank
                new_dep_states.append(object_type)
                
        ## If the item is current
        if eachitem in prop_65_mapped['WDID'].loc[prop_65_mapped[object_type+' current']==True].tolist():
            ## If the item is current, check if it's in the deprecated list
            if eachitem in deprecated_df['WDID'].loc[deprecated_df['deprecated_type']==object_type+' delisted'].tolist():
                ## if true, it appears to have been re-listed, rank change may be needed
                rank_relist.append(object_type)
            ## Else if the item is in the current list of wd entities
            elif eachitem in current_df['WDID'].loc[current_df['ObjectType']==object_type+' current'].tolist():
                ## if true, no change, no write and no new rank change needed
                edit_log.append({'WDID':eachitem,object_type+' delisted':False,object_type+' current':True,'edits':'No Change'})
            ## Else the item is currently listed, but not in WD as such
            else:
                ## if true, A new statement is needed 
                new_statements.append(object_type)
        
    ## Put in some logic for handling reptox vs femrep, menrep and devtox:
    ## drop the generic 'reptox' whenever a more specific type is in the same list
    for type_list in (new_statements, new_dep_states, rank_delist, rank_relist):
        if 'reptox' in type_list and set(repcheck).intersection(type_list):
            type_list.remove('reptox')
    
    ## Generate new statements to write
    
    
    ## Update Ranks in previous statements
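
The reptox handling in the loop above can also be checked in isolation. The sketch below assumes the intent is to drop the generic 'reptox' type whenever a more specific reproductive type ('femrep', 'menrep', or 'devtox') is queued for the same edit; the helper name is made up.

```python
SPECIFIC_REPTOX = {"femrep", "menrep", "devtox"}

def drop_redundant_reptox(types):
    """Return a copy of `types` without 'reptox' if a more specific type is present."""
    if "reptox" in types and SPECIFIC_REPTOX.intersection(types):
        return [t for t in types if t != "reptox"]
    return list(types)

print(drop_redundant_reptox(["cancer", "reptox", "devtox"]))
print(drop_redundant_reptox(["cancer", "reptox"]))
```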
    
        
                                        

In [31]:
print(prop_65_mapped.columns)

Index(['Title', 'CAS Number', 'Cancer', 'Cancer - Listing Mechanism',
       'Reproductive Toxicity',
       'Chemical listed under Proposition 65 as causing',
       'Developmental Toxicity - Date of Listing',
       'Developmental Toxicity - Listing Mechanism',
       'Female Reproductive Toxicity - Date of Listing',
       'Female Reproductive Toxicity - Listing Mechanism',
       'Male Reproductive Toxicity - Date of Listing',
       'Male Reproductive Toxicity - Listing Mechanism', 'url_stub', 'WDID',
       'devtox current', 'menrep current', 'femrep current', 'cancer current',
       'reptox current', 'cancer delisted', 'reptox delisted',
       'devtox delisted', 'femrep delisted', 'menrep delisted'],
      dtype='object')
