Let's import the libraries and load the data we produced in the acquiring data tutorial.

In [1]:
import os
import pandas as pd

temporary_output_folder = 'temp_outputs'
tidy_outputs_folder = 'outputs'

COMPOUND_ID_COL = 'Standard_SMILES'
TAXON_GROUPING = 'accepted_species'

Catalpa_bignonioides_deduplicated_data = pd.read_csv(os.path.join(tidy_outputs_folder, 'Catalpa_bignonioides_deduplicated_data.csv'), index_col=0)

Using ChEMBL, we can add an `active_chembl_compound` column that flags compounds appearing as active or inactive in ChEMBL antiplasmodial bioassay data.

For this, we use `InChIKey_simp` as the compound ID column, since the InChIKey is a reliable compound identifier and stereoisomerism doesn't greatly impact antiplasmodial activity. You can also use SMILES if you prefer, but you would need to create a `Standard_SMILES` column to match compounds in the given ChEMBL data, for example with `df['Standard_SMILES'] = df['Smiles'].apply(standardise_SMILES)`.

In [4]:
from phytochempy.compound_properties import update_compound_info_from_chembl_apm_assays, add_chembl_apm_data_to_compound_df

# update_compound_info_from_chembl_apm_assays() -- this will update the data provided in the package, if required
with_chembl_data = add_chembl_apm_data_to_compound_df(Catalpa_bignonioides_deduplicated_data, output_csv=os.path.join(temporary_output_folder, 'Catalpa_bignonioides_apm_chembl.csv'), compound_id_col='InChIKey_simp')
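
To illustrate why a simplified InChIKey groups stereoisomers together, here is a minimal sketch. It assumes that `InChIKey_simp` corresponds to the first 14-character (connectivity) block of a standard InChIKey; the helper `simplify_inchikey` is hypothetical and is not part of `phytochempy`.

```python
def simplify_inchikey(inchikey: str) -> str:
    """Return the first block of an InChIKey, which encodes connectivity only.

    Hypothetical helper for illustration; assumes the 'simplified' key is
    everything before the first hyphen.
    """
    return inchikey.strip().split("-")[0]


# L- and D-alanine share the same connectivity and differ only in stereochemistry.
l_alanine = "QNAYBMKLOCPYGJ-REOHSZJBSA-N"
d_alanine = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"

# The full keys differ, but the simplified keys match.
same_skeleton = simplify_inchikey(l_alanine) == simplify_inchikey(d_alanine)
print(simplify_inchikey(l_alanine), same_skeleton)
```

Matching on the simplified key therefore treats both enantiomers as one compound, which is the intended behaviour when stereochemistry is not expected to matter much for the activity of interest.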

Bioavailability rules (Lipinski and Veber) can be calculated from the SMILES strings in the data.

In [6]:
from phytochempy.compound_properties import add_bioavailability_rules_to_df

with_bioavailability = add_bioavailability_rules_to_df(Catalpa_bignonioides_deduplicated_data, 'Standard_SMILES')
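
For readers unfamiliar with these rules, the sketch below shows the commonly cited thresholds applied to precomputed molecular descriptors. It is not the `phytochempy` implementation, which may compute descriptors differently or tolerate a different number of violations; the `Descriptors` class, the function names and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen bond donors
    h_acceptors: int       # hydrogen bond acceptors
    rotatable_bonds: int   # number of rotatable bonds
    tpsa: float            # topological polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_veber(d: Descriptors) -> bool:
    """Veber's rule: at most 10 rotatable bonds and polar surface area at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


# Made-up descriptor values, chosen only to exercise both outcomes.
small_molecule = Descriptors(300.0, 2.5, 2, 5, 4, 80.0)
large_molecule = Descriptors(750.0, 6.2, 7, 12, 14, 210.0)

for name, d in [("small", small_molecule), ("large", large_molecule)]:
    print(name, lipinski_violations(d), passes_veber(d))
```

In practice the descriptors would be derived from each SMILES string by a cheminformatics toolkit; the sketch only covers the threshold logic.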

MAIP scores can be added to your data, but this requires manually uploading a file to https://www.ebi.ac.uk/chembl/maip/ and saving the results. The `get_manual_MAIP_to_upload` method outputs a file that you can upload. Download the results and point to them with the `maip_output_file` argument of the `add_manual_info_files` method.

In [None]:
from phytochempy.data_compilation_utilities import get_manual_MAIP_to_upload, add_manual_info_files

get_manual_MAIP_to_upload(Catalpa_bignonioides_deduplicated_data, temporary_output_folder)

# The specified maip_output_file should be the file downloaded from MAIP.
data_with_MAIP_scores = add_manual_info_files(with_bioavailability,
                                 maip_output_file=os.path.join(tidy_outputs_folder, 'example_maip_file.csv'))