# Using the MODAC Client

In this tutorial, we use the [MODAC API](https://modac.cancer.gov/swagger-ui/4.14.0/index.html#/) to download an existing model from https://modac.cancer.gov/ and run it. This is useful when you are running AMPL on a remote machine and cannot easily upload existing models.

## Configuration

In PrecisionFDA, you can declare `MODAC_USER` and `MODAC_PASS` in your `.env` file; see [PFDA.md](https://github.com/mass-matrix/AMPL/blob/master/PFDA.md#L44) for more information about that file. Alternatively, you can set the environment variables directly in Python:

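```python
import logging
import os

# Log at INFO level so the client's progress messages are shown
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Replace the placeholders with your own MoDaC credentials
os.environ['MODAC_USER'] = 'your_username'
os.environ['MODAC_PASS'] = 'your_password'
```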
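Writing a password into a notebook makes it easy to leak. As a minimal sketch, assuming the client only needs the two environment variables above, the hypothetical helper `ensure_modac_credentials` (not part of `atomsci.modac`) keeps any values that are already set, for example from a `.env` file, and prompts for the rest with Python's standard `getpass` module:

```python
import os
from getpass import getpass


def ensure_modac_credentials(prompt_if_missing: bool = True) -> None:
    """Ensure MODAC_USER and MODAC_PASS are set before creating the client.

    Hypothetical helper, not part of atomsci.modac. Non-empty values that
    are already in the environment (e.g. loaded from a .env file) are kept.
    """
    for var in ("MODAC_USER", "MODAC_PASS"):
        if os.environ.get(var):
            continue  # already configured; keep the existing value
        if not prompt_if_missing:
            raise RuntimeError(f"{var} is not set")
        # getpass does not echo the input, so the value never appears in the notebook
        os.environ[var] = getpass(f"{var}: ")
```

Call `ensure_modac_credentials()` before creating the client. In non-interactive jobs, pass `prompt_if_missing=False` so a missing variable fails immediately instead of waiting for input.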

## Downloading Existing Models

As an example, we use the [QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16](https://modac.cancer.gov/assetDetails?returnToSearch=true&&dme_data_id=NCI-DME-MS01-106689321) model, but you can choose any other model on the site. Be sure to copy the model's `ASSET PATH`; for this example it is `/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16`.

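```python
from atomsci.modac import MoDaCClient

# Credentials come from the MODAC_USER / MODAC_PASS variables set above
client = MoDaCClient()

# Download every file in the collection identified by the model's ASSET PATH
client.download_all_files_in_collection("/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16")
```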

```
Calling function 'download_all_files_in_collection' with args: ('/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16',) kwargs: {}
Calling function 'get_collection' with args: ('/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16',) kwargs: {}
Get collection. Making requests to https://modac.cancer.gov/api/collection//NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16
Function 'get_collection' returned: {'collectionId': 106689321, 'collectionName': '/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16', 'absolutePath': '/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16', 'collectionParentName': '/NCI_DOE_Archive/ATOM/QMugs_AMPL_16', 'collectionOwnerName': 'ncidoesvcp2', 'collectionOwnerZone': 'ncifprodZone', 'collectionMapId': '0', 'collectionInheritance': '1', 'createdAt': 1722000570000, 'specColType': 'NORMAL', 'subCollections': [{'id': 10675716
```

When the download completes, the files are saved in a folder in the current working directory named after the model's `ASSET NAME` (here, `QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16`).
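In this example the folder name matches the last component of the asset path, so you can derive the local directory from the path you already copied instead of typing the name twice. This is a sketch under that assumption; if a collection downloads to a differently named folder, set the directory by hand.

```python
import os
import posixpath

# ASSET PATH copied from the model's MoDaC page
asset_path = "/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16"

# Assumption: the download folder is named after the last component of the asset path
asset_name = posixpath.basename(asset_path.rstrip("/"))
project_dir = os.path.join(os.getcwd(), asset_name)
print(asset_name)
```

The same `asset_path` variable can then be passed to `client.download_all_files_in_collection(asset_path)`, so the path appears only once in the notebook.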

Here we list the files in the `QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16/` directory:

```python
import os

# The download lands in the current working directory, in a folder named after the model
project_dir = os.path.join(os.getcwd(), 'QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16')
os.listdir(project_dir)
```

```
['qmugs_11_curated_external_test_set_37K.csv',
 'qmugs_curated_160_dropped_duplicates.csv',
 'qmugs_curated_160_dropped_duplicates_model_809236ab-c3c5-481d-9408-55b629282e49.tar.gz']
```
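Rather than hard-coding the long generated file names, you could locate files by extension. The sketch below assumes, as in this collection, that the trained model is a `.tar.gz` archive and the datasets are `.csv` files at the top level of the folder; `find_model_files` is a hypothetical helper name.

```python
import glob
import os


def find_model_files(project_dir):
    """Split a downloaded collection into model archives and CSV data files.

    Assumes models are .tar.gz archives and datasets are .csv files
    directly inside project_dir (no recursion into subfolders).
    """
    archives = sorted(glob.glob(os.path.join(project_dir, "*.tar.gz")))
    csvs = sorted(glob.glob(os.path.join(project_dir, "*.csv")))
    return archives, csvs
```

For example, `model_archives, data_files = find_model_files(project_dir)`. A collection can hold more than one model or dataset, so check the lists before picking an entry.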

## Import the dataset

```python
import pandas as pd

# External test set shipped alongside the model
test_datafile = os.path.join(project_dir, 'qmugs_11_curated_external_test_set_37K.csv')
test_data = pd.read_csv(test_datafile)
test_data.head()
```

```
Unnamed: 0,identifier,conf_id,smiles,charge,unpaired_electrons,mw,atoms,heavy_atoms,heteroatoms,rotatable_bonds,...,DFT_LUMO_ENERGY,DFT_HOMO_LUMO_GAP,SASA,PINT,base_rdkit_smiles,inchi_key,VALUE_NUM_mean,VALUE_NUM_std,Perc_Var,Remove_BadDuplicate
0,CHEMBL100109_neutral,conf_00,[H]OB(O[H])[C@@]([H])(N([H])C(=O)[C@@]1([H])N(...,0,0,432.254401,64,31,10,15,...,0.052682,0.359226,17.830416604829175|1.500035048078731|12.859508...,31.810941695385928|43.45386310879664|36.710613...,CC(=O)N[C@H](Cc1ccccc1)C(=O)N1CCC[C@H]1C(=O)N[...,UCQIHCRMWNRFNP-QYZOEREBSA-N,0.359226,,0.0,0
1,CHEMBL100165_charged,conf_00,[H]O[C@]([H])(C([H])([H])Oc1c(-c2c([H])c([H])c...,-1,0,417.208276,60,30,6,16,...,0.098425,0.237322,14.630341835594542|0.03000070096157006|14.5303...,32.777081219883705|42.296161885126466|33.17706...,CC(C)c1cc(-c2ccc(F)cc2)c(OC[C@@H](O)C[C@@H](O)...,GWJDCGLHTDEKGT-UXHICEINSA-M,0.237322,,0.0,0
2,CHEMBL100187_neutral,conf_00,[H]OB(O[H])[C@@]([H])(N([H])C(=O)C([H])([H])N(...,0,0,335.201637,50,24,8,14,...,0.044325,0.352346,11.070258654821028|1.0499888650789475|8.060188...,36.4255470210142|44.39135498757223|35.27951806...,CN(CC(=O)N[C@@H](CCCCN)B(O)O)C(=O)Cc1ccccc1,VPLBUOGKJQGYQO-AWEZNQCLSA-N,0.352346,,0.0,0
3,CHEMBL100188_charged,conf_00,[H]O[C@]([H])(/C([H])=C(\[H])C(=C(c1c([H])c([H...,-1,0,475.235079,66,35,8,16,...,0.115742,0.29473,17.160400950020662|4.370102106736038|12.440290...,30.957285769603345|37.18366077814796|35.758176...,Cc1ccc(C(=C(/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-]...,UUVPDWUBKWQFOU-AZHXHMFBSA-M,0.29473,,0.0,0
4,CHEMBL100201_charged,conf_00,[H]c1nc([H])c([C@@]2([H])SC([H])([H])c3c(C(=O)...,-1,0,622.091215,65,44,11,6,...,0.102634,0.295396,40.698443564491576|1.6500385528866082|2.240052...,25.015412042248208|42.360101750740064|45.49212...,O=C(c1ccn2c1CS[C@@H]2c1cccnc1)c1cn(C(=O)c2cccc...,WSTGIXIJIWZIHP-MGBGTMOVSA-M,0.295396,,0.0,0
```
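The prediction step reads SMILES from `base_rdkit_smiles` and measured values from `VALUE_NUM_mean`. If you swap in your own dataset, a quick check like this minimal sketch (plain Python; `missing_columns` is a hypothetical helper) can catch a renamed or absent column before the model runs:

```python
def missing_columns(columns, required=("base_rdkit_smiles", "VALUE_NUM_mean")):
    """Return the required column names that are absent from `columns`.

    `columns` can be any iterable of names, e.g. test_data.columns.
    The defaults are the SMILES and response columns used for prediction.
    """
    present = set(columns)
    return [name for name in required if name not in present]
```

For example, `missing = missing_columns(test_data.columns)` followed by `assert not missing, f"Missing columns: {missing}"`.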


## Run the model for predictions

```python
import logging
import os
from atomsci.ddm.pipeline import predict_from_model as pfm

# Raise the root logger's level to hide most log messages during prediction
logging.getLogger().setLevel(logging.CRITICAL)

model_file = os.path.join(project_dir, 'qmugs_curated_160_dropped_duplicates_model_809236ab-c3c5-481d-9408-55b629282e49.tar.gz')
input_df = test_data  # Make sure this matches your test dataset
response_col = "VALUE_NUM_mean"   # measured values, reported next to the predictions
compound_id = 'inchi_key'         # not passed below, so the output's compound_id column holds generated IDs
smiles_col = "base_rdkit_smiles"  # SMILES column fed to the model

results_df = pfm.predict_from_model_file(model_path=model_file,
                                         input_df=input_df,
                                         smiles_col=smiles_col,
                                         response_col=response_col)
results_df.head()
```

```
Standardizing SMILES strings for 27055 compounds.

INFO:atomsci.ddm.utils.model_version_utils:/home/herman/massmatrix/AMPL/atomsci/modac/examples/tutorials/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16/qmugs_curated_160_dropped_duplicates_model_809236ab-c3c5-481d-9408-55b629282e49.tar.gz, 1.6.0
INFO:atomsci.ddm.utils.model_version_utils:Version compatible check: /home/herman/massmatrix/AMPL/atomsci/modac/examples/tutorials/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16/qmugs_curated_160_dropped_duplicates_model_809236ab-c3c5-481d-9408-55b629282e49.tar.gz version = "1.6", AMPL version = "1.6"
2024-08-05 16:19:29.302703: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-08-05 16:19:29.307957: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA 
```

```
Unnamed: 0,identifier,conf_id,smiles,charge,unpaired_electrons,mw,atoms,heavy_atoms,heteroatoms,rotatable_bonds,...,inchi_key,VALUE_NUM_mean,VALUE_NUM_std,Perc_Var,Remove_BadDuplicate,compound_id,orig_smiles,VALUE_NUM_mean_actual,VALUE_NUM_mean_pred,VALUE_NUM_mean_std
0,CHEMBL100109_neutral,conf_00,[H]OB(O[H])[C@@]([H])(N([H])C(=O)[C@@]1([H])N(...,0,0,432.254401,64,31,10,15,...,UCQIHCRMWNRFNP-QYZOEREBSA-N,0.359226,,0.0,0,compound_000000,CC(=O)N[C@H](Cc1ccccc1)C(=O)N1CCC[C@H]1C(=O)N[...,0.359226,0.369471,7.194597
1,CHEMBL100165_charged,conf_00,[H]O[C@]([H])(C([H])([H])Oc1c(-c2c([H])c([H])c...,-1,0,417.208276,60,30,6,16,...,GWJDCGLHTDEKGT-UXHICEINSA-M,0.237322,,0.0,0,compound_000001,CC(C)c1cc(-c2ccc(F)cc2)c(OC[C@@H](O)C[C@@H](O)...,0.237322,0.327982,5.932235
2,CHEMBL100187_neutral,conf_00,[H]OB(O[H])[C@@]([H])(N([H])C(=O)C([H])([H])N(...,0,0,335.201637,50,24,8,14,...,VPLBUOGKJQGYQO-AWEZNQCLSA-N,0.352346,,0.0,0,compound_000002,CN(CC(=O)N[C@@H](CCCCN)B(O)O)C(=O)Cc1ccccc1,0.352346,0.368177,7.198018
3,CHEMBL100188_charged,conf_00,[H]O[C@]([H])(/C([H])=C(\[H])C(=C(c1c([H])c([H...,-1,0,475.235079,66,35,8,16,...,UUVPDWUBKWQFOU-AZHXHMFBSA-M,0.29473,,0.0,0,compound_000003,Cc1ccc(C(=C(/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-]...,0.29473,0.261178,13.279189
4,CHEMBL100201_charged,conf_00,[H]c1nc([H])c([C@@]2([H])SC([H])([H])c3c(C(=O)...,-1,0,622.091215,65,44,11,6,...,WSTGIXIJIWZIHP-MGBGTMOVSA-M,0.295396,,0.0,0,compound_000004,O=C(c1ccn2c1CS[C@@H]2c1cccnc1)c1cn(C(=O)c2cccc...,0.295396,0.268533,8.945392
```
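The output pairs each measured value (`VALUE_NUM_mean_actual`) with the model's prediction (`VALUE_NUM_mean_pred`). To summarize performance on the external test set, a minimal plain-Python sketch can compute MAE, RMSE and R²; `regression_metrics` is a hypothetical helper, and scikit-learn's `mean_absolute_error` and `r2_score` would serve the same purpose if it is installed.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE and R^2 for paired actual/predicted values.

    Pairs where either value is missing (None or NaN) are skipped.
    """
    pairs = [
        (float(a), float(p))
        for a, p in zip(actual, predicted)
        if a is not None and p is not None and not math.isnan(a) and not math.isnan(p)
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two valid actual/predicted pairs")
    n = len(pairs)
    errors = [p - a for a, p in pairs]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(a for a, _ in pairs) / n
    ss_tot = sum((a - mean_actual) ** 2 for a, _ in pairs)
    # R^2 is undefined when all actual values are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

For example, `regression_metrics(results_df["VALUE_NUM_mean_actual"], results_df["VALUE_NUM_mean_pred"])`.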


## Notes
- The MODAC API can be unreliable at times.
- You may encounter a version compatibility error if the model was trained with a different version of AMPL than the one you are running. In that case, you need to run a matching version of AMPL. For example:
```
ValueError: Version compatible check: <model>.tar.gz version: "1.4" not matching AMPL compatible version group: "1.6"
```
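Because the MODAC API can be unreliable, it may help to wrap the download in a retry loop. The minimal sketch below uses a hypothetical helper, `with_retries`, which is not part of `atomsci.modac`; the attempt count and delays are arbitrary defaults. It also does not assume whether a retried download skips files that were already fetched.

```python
import time


def with_retries(func, attempts=3, initial_delay=5.0, backoff=2.0):
    """Call func() and retry on failure, waiting longer after each failed attempt.

    Catches every Exception because the specific errors raised by the
    MoDaC client are not documented here; narrow this if you know them.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= backoff
```

For example, `with_retries(lambda: client.download_all_files_in_collection("/NCI_DOE_Archive/ATOM/QMugs_AMPL_16/QMugs_HOMO_LUMO_GAP_Prediction_Model_AMPL16"))`.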