# Model development

The goal is to train a conditional model that predicts gene expression from drug, dose, cell line, and mutation profile. We’ll explore generative approaches such as variational inference and GANs to build a model that, given expression in a control state, outputs gene expression under perturbed conditions.

The core idea is:

1. Train a model to generate gene expression under perturbation, conditioned on metadata and baseline (control) expression.
2. Fine-tune the model with transfer learning to adapt it to unseen contexts (e.g., normal lung cell types, brain cell types from atlases).
3. Use the adapted model to predict gene expression changes in these new settings.
4. Check that the predictions show the pathways expected to be up-regulated or down-regulated for a given drug (GO analysis and comparison to the training set).

We are looking for teammates interested in generative modeling, transfer learning, and gene expression prediction to shape this project together.
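
To make the first step of the core idea concrete, here is a minimal sketch of a conditional predictor: a ridge regression that maps control expression plus a condition vector to the change in expression caused by the perturbation. It is a linear stand-in for illustration, not one of the generative models we plan to explore. The function names, array shapes, and synthetic data are all assumptions.

```python
import numpy as np

def fit_ridge_delta(x_ctrl, cond, x_pert, alpha=1.0):
    """Fit W so that [x_ctrl, cond, 1] @ W approximates x_pert - x_ctrl."""
    feats = np.hstack([x_ctrl, cond, np.ones((x_ctrl.shape[0], 1))])
    delta = x_pert - x_ctrl
    # Closed-form ridge solution; note the intercept column is penalized too.
    gram = feats.T @ feats + alpha * np.eye(feats.shape[1])
    return np.linalg.solve(gram, feats.T @ delta)

def predict_perturbed(weights, x_ctrl, cond):
    """Predict perturbed expression as control expression plus a learned shift."""
    feats = np.hstack([x_ctrl, cond, np.ones((x_ctrl.shape[0], 1))])
    return x_ctrl + feats @ weights

# Synthetic data with hypothetical sizes, only to exercise the functions.
rng = np.random.default_rng(0)
n_cells, n_genes, n_cond = 200, 5, 3
x_ctrl = rng.normal(size=(n_cells, n_genes))
cond = rng.normal(size=(n_cells, n_cond))
true_effect = rng.normal(size=(n_cond, n_genes))
x_pert = x_ctrl + cond @ true_effect + 0.01 * rng.normal(size=(n_cells, n_genes))

weights = fit_ridge_delta(x_ctrl, cond, x_pert, alpha=1e-3)
x_pred = predict_perturbed(weights, x_ctrl, cond)
mse = float(np.mean((x_pred - x_pert) ** 2))
```

Predicting the shift from control rather than the raw perturbed profile keeps the baseline expression as an explicit input, which is the same conditioning a generative model would need.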



### Import libraries

In [2]:

import scanpy as sc
import pandas as pd
import numpy as np


In [5]:
import sys
print(sys.executable)

/usr/bin/python3


### Load the data

In [4]:

# read the data
adata_tahoe = sc.read_h5ad('/home/ubuntu/anatoly-tahoe-100/data/datatahoe-100m.h5ad')




In [6]:

adata_tahoe.obs


Unnamed: 0,drug,sample,BARCODE_SUB_LIB_ID,cell_line_id,moa-fine,canonical_smiles,pubchem_cid,plate,mean_gene_count_x,mean_tscp_count_x,mean_mread_count_x,mean_pcnt_mito_x,drugname_drugconc_x,mean_gene_count_y,mean_tscp_count_y,mean_mread_count_y,mean_pcnt_mito_y,drugname_drugconc_y
0,8-Hydroxyquinoline,smp_1783,01_001_052-lib_1105,CVCL_0480,unclear,C1=CC2=C(C(=C1)O)N=CC=C2,1923.0,plate4,1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]",1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]"
1,8-Hydroxyquinoline,smp_1783,01_001_105-lib_1105,CVCL_0546,unclear,C1=CC2=C(C(=C1)O)N=CC=C2,1923.0,plate4,1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]",1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]"
2,8-Hydroxyquinoline,smp_1783,01_001_165-lib_1105,CVCL_1717,unclear,C1=CC2=C(C(=C1)O)N=CC=C2,1923.0,plate4,1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]",1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]"
3,8-Hydroxyquinoline,smp_1783,01_003_094-lib_1105,CVCL_1717,unclear,C1=CC2=C(C(=C1)O)N=CC=C2,1923.0,plate4,1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]",1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]"
4,8-Hydroxyquinoline,smp_1783,01_003_164-lib_1105,CVCL_1056,unclear,C1=CC2=C(C(=C1)O)N=CC=C2,1923.0,plate4,1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]",1478.268171,2341.339094,2738.463797,0.023783,"[('8-Hydroxyquinoline', 0.05, 'uM')]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999995,(R)-Verapamil (hydrochloride),smp_1799,17_160_097-lib_1124,CVCL_1495,unclear,CC(C)C(CCCN(C)CCC1=CC(=C(C=C1)OC)OC)(C#N)C2=CC...,170014.0,plate4,1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]",1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]"
999996,(R)-Verapamil (hydrochloride),smp_1799,17_160_108-lib_1124,CVCL_1125,unclear,CC(C)C(CCCN(C)CCC1=CC(=C(C=C1)OC)OC)(C#N)C2=CC...,170014.0,plate4,1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]",1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]"
999997,(R)-Verapamil (hydrochloride),smp_1799,17_160_112-lib_1124,CVCL_0320,unclear,CC(C)C(CCCN(C)CCC1=CC(=C(C=C1)OC)OC)(C#N)C2=CC...,170014.0,plate4,1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]",1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]"
999998,(R)-Verapamil (hydrochloride),smp_1799,17_160_165-lib_1124,CVCL_0546,unclear,CC(C)C(CCCN(C)CCC1=CC(=C(C=C1)OC)OC)(C#N)C2=CC...,170014.0,plate4,1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]",1513.610111,2523.233013,2950.299370,0.069336,"[('(R)-Verapamil (hydrochloride)', 0.05, 'uM')]"
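
The `drugname_drugconc_x` column packs drug name, dose, and unit into one entry. Assuming the entries are stored as strings in the form displayed above (they may already be Python lists, in which case no parsing is needed), a small helper can split them. `parse_drug_conc` is a hypothetical name, not part of any library.

```python
import ast

def parse_drug_conc(entry):
    """Parse a string like "[('Drug', 0.05, 'uM')]" into (name, dose, unit) tuples."""
    parsed = ast.literal_eval(entry)  # safe evaluation of Python literals only
    return [(str(name), float(dose), str(unit)) for name, dose, unit in parsed]

example = "[('8-Hydroxyquinoline', 0.05, 'uM')]"
treatments = parse_drug_conc(example)
name, dose, unit = treatments[0]
```

In the rows shown, the `_x` and `_y` columns carry identical values, so it may be enough to keep one set after confirming that this holds over the full table.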


### Encode the data
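
One possible encoding, sketched on a toy table: one-hot vectors for `drug` and `cell_line_id`, plus log10 of the dose. The `dose_uM` column is hypothetical (for example, parsed out of `drugname_drugconc_x`), and the mutation profile is left out because it does not appear in the `obs` columns shown above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata_tahoe.obs; values copied from the rows displayed above.
obs = pd.DataFrame({
    "drug": ["8-Hydroxyquinoline", "8-Hydroxyquinoline", "(R)-Verapamil (hydrochloride)"],
    "cell_line_id": ["CVCL_0480", "CVCL_0546", "CVCL_1495"],
    "dose_uM": [0.05, 0.05, 0.05],  # hypothetical column
})

# One-hot encode the categorical metadata.
drug_onehot = pd.get_dummies(obs["drug"], prefix="drug", dtype=float)
cell_onehot = pd.get_dummies(obs["cell_line_id"], prefix="cell", dtype=float)

# Doses usually span orders of magnitude, so encode them on a log scale.
log_dose = np.log10(obs["dose_uM"]).rename("log10_dose_uM")

cond = pd.concat([drug_onehot, cell_onehot, log_dose], axis=1)
cond_matrix = cond.to_numpy(dtype=np.float32)
```

One-hot codes cannot represent a drug or cell line absent from training, so a learned or structure-based embedding (e.g., from `canonical_smiles`) may be a better fit for the unseen-context goal.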

### Train the models
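
Since the aim is to generalize to unseen contexts, it probably makes sense to hold out whole cell lines rather than random cells when evaluating. A minimal sketch of such a split follows; `split_by_group` is a hypothetical helper, and the hold-out fraction is an arbitrary choice.

```python
import numpy as np

def split_by_group(groups, holdout_frac=0.2, seed=0):
    """Return boolean train/test masks in which entire groups are held out."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    # Hold out at least one group, even for very small inputs.
    n_holdout = max(1, int(round(holdout_frac * len(unique))))
    held_out = rng.choice(unique, size=n_holdout, replace=False)
    test_mask = np.isin(groups, held_out)
    return ~test_mask, test_mask, held_out

# Cell line IDs taken from the first rows of the obs table above.
cell_lines = np.array(["CVCL_0480", "CVCL_0546", "CVCL_1717", "CVCL_1717", "CVCL_1056"])
train_mask, test_mask, held_out = split_by_group(cell_lines, holdout_frac=0.25)
```

The same helper could be applied to the `drug` column to test generalization to unseen compounds.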

### Save results
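
A simple way to persist predictions together with their gene and cell labels is a compressed NumPy archive. The sketch below uses placeholder arrays and a temporary directory; the file name and array keys are assumptions, and writing an `.h5ad` file instead would be equally reasonable.

```python
import tempfile
from pathlib import Path

import numpy as np

# Placeholder predictions and labels (hypothetical values).
pred = np.arange(6, dtype=np.float32).reshape(2, 3)
gene_names = np.array(["GENE_A", "GENE_B", "GENE_C"])
cell_ids = np.array(["cell_0", "cell_1"])

out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "predictions.npz"
np.savez_compressed(out_path, pred=pred, gene_names=gene_names, cell_ids=cell_ids)

# Reload to confirm that the archive round-trips.
with np.load(out_path) as loaded:
    pred_loaded = loaded["pred"]
    genes_loaded = loaded["gene_names"]
```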