Use CMap on the LINCS L1000 data set to rank drugs and other perturbations by their similarity to the transcriptomic signature of Alzheimer's disease (AD).

In [9]:
import pandas as pd

We use the [query tool](https://clue.io/query/query) of the CMap portal [clue.io](https://clue.io/query).  (An alternative tool is [L1000CDS2](https://maayanlab.cloud/l1000cds2/#/index).)  The input data are the up- and down-regulated genes in AD, compiled by Sudhir Varma from multiple GEO datasets with an R script that calculates an FDR-adjusted p-value for each gene's differential expression test.  The number of genes in each set corresponds to the upper size limit of gene sets for CMap.
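The selection step described above can be sketched in Python. This is not Sudhir Varma's R script, only a minimal illustration of the idea: adjust p-values with the Benjamini-Hochberg procedure, then keep the most significant genes in each direction up to the size limit. The column names `gene`, `log_fc` and `adj_p`, the FDR cutoff of 0.05, and the function names are all assumptions made for this example.

```python
import pandas as pd

MAX_GENES = 150  # gene-set size limit, as seen in the gene lists of this notebook


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR adjustment of a sequence of p-values."""
    p = pd.Series(pvalues, dtype=float)
    n = len(p)
    order = p.sort_values().index
    ranks = pd.Series(range(1, n + 1), index=order)
    scaled = p.loc[order] * n / ranks
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = scaled[::-1].cummin()[::-1].clip(upper=1.0)
    return adjusted.reindex(p.index)


def select_gene_sets(de, fdr=0.05, max_genes=MAX_GENES):
    """Return (up, down) gene lists from a differential expression table.

    `de` is assumed to have the hypothetical columns
    'gene', 'log_fc' and 'adj_p'.
    """
    sig = de[de["adj_p"] <= fdr].sort_values("adj_p")
    up = sig.loc[sig["log_fc"] > 0, "gene"].head(max_genes).tolist()
    down = sig.loc[sig["log_fc"] < 0, "gene"].head(max_genes).tolist()
    return up, down
```

Ranking by adjusted p-value is only one possible choice; ranking by effect size among the significant genes would be an equally plausible variant.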

In [7]:
%%bash
cd ~/CTNS/resources/CMap/sudhir-varma/
echo "$(wc -l up.genes.txt) up regulated genes, including:"
head up.genes.txt
echo "$(wc -l down.genes.txt) down regulated genes, including:"
head down.genes.txt

     150 up.genes.txt up regulated genes, including:
NACC2
SNRNP48
FOXF1
CRB2
INHBB
POU3F2
KLHL24
TRPM7
TFEB
NA
     149 down.genes.txt down regulated genes, including:
GALNT17
SCG5
MRPL14
MRPL32
NDRG4
REEP1
SLC1A1
MRPL44
SLC1A6
ASAH2B
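
The `head` output above shows an `NA` entry among the up-regulated genes. Assuming this is a missing gene symbol carried over from the upstream script rather than a real gene, a small cleaning step before submitting the lists to the query tool could look like the sketch below. The function name `clean_gene_list` is hypothetical, and dropping `NA` and duplicates is a suggestion, not something the original workflow is known to do.

```python
def clean_gene_list(symbols):
    """Drop blank entries, 'NA' placeholders and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for symbol in symbols:
        symbol = symbol.strip()
        # Assumption: 'NA' marks a missing symbol, not a gene.
        if not symbol or symbol.upper() == "NA" or symbol in seen:
            continue
        seen.add(symbol)
        cleaned.append(symbol)
    return cleaned


def read_gene_list(path):
    """Read one gene symbol per line from a text file and clean the list."""
    with open(path) as handle:
        return clean_gene_list(handle)
```

For example, `read_gene_list("up.genes.txt")` would return the up-regulated symbols without the `NA` line, so the cleaned list may be shorter than the original file.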


## Converting BRD ID to DrugBank ID

The conversion might be difficult.  See these threads on older forums: [Thinklab](https://think-lab.github.io/d/51/) and [Biostars](https://www.biostars.org/p/249949/).

The following sheet, downloaded from the [LINCS Data Portal | Small Molecules](http://lincsportal.ccs.miami.edu/SmallMolecules/catalog), might be useful for the conversion.

In [11]:
pd.read_csv('../../resources/CMap/SmallMolecule_1633383104307.csv')

Unnamed: 0,SM_Name,SM_LINCS_ID,SM_Alternative_Name,SM_PubChem_CID,SM_SMILES_Parent,SM_SMILES_Batch,SM_InChi_Parent,SM_Molecular_Mass,MOLECULAR_FORMULA,SM_ChEBI_ID
0,Pyridoxine,LSM-5324,"Lyphomed,Hexa-Betalin,Pyridoxine HCl,PYRID-20,...",1054.0,Cc1ncc(CO)c(CO)c1O,,InChI=1S/C8H11NO3/c1-5-8(12)7(4-11)6(3-10)2-9-...,169.18,C8H11NO3,"16709, 30961"
1,Vigabatrin,LSM-4959,"CPP-109,Gamma-Vinyl GABA,MDL-71754,RMI-71754,S...",5665.0,NC(CCC(=O)O)C=C,,"InChI=1S/C6H11NO2/c1-2-5(7)3-4-6(8)9/h2,5H,1,3...",129.16,C6H11NO2,63638
2,AC1N8GBV,LSM-16274,"2-Furaldehyde (4-amino-5-methyl-4H-1\,2\,4-tri...",4343351.0,Cc1nnc(NN=Cc2occc2)n1N,,InChI=1S/C8H10N6O/c1-6-11-13-8(14(6)9)12-10-5-...,206.20,C8H10N6O,104911
3,Formononetin,LSM-19000,"Formononetin,NSC-93360",5280378.0,COc1ccc(cc1)C2=COc3cc(O)ccc3C2=O,,InChI=1S/C16H12O4/c1-19-12-5-2-10(3-6-12)14-9-...,268.26,C16H12O4,18088
4,9-Cyclopentyl-6-mercaptopurine,LSM-24332,9-Cyclopentyl-6-mercaptopurine 19487,3034494.0,Sc1ncnc2c1ncn2C3CCCC3,,InChI=1S/C10H12N4S/c15-10-8-9(11-5-12-10)14(6-...,220.29,C10H12N4S,112922
...,...,...,...,...,...,...,...,...,...,...
44323,"3-(2-Thienyl)imidazo[1,5-a]pyridine",LSM-32418,,1266459.0,c1ccn2c(ncc2c1)c3cccs3,,InChI=1S/C11H8N2S/c1-2-6-13-9(4-1)8-12-11(13)1...,200.26,C11H8N2S,120975
44324,3-Aminobenzamide,LSM-4525,"3-Aminobenzamide,3-Amino-benzamide,3-Aminobenz...",1645.0,NC(=O)c1cccc(N)c1,,InChI=1S/C7H8N2O/c8-6-3-1-2-5(4-6)7(9)10/h1-4H...,136.15,C7H8N2O,64042
44325,Niacin,LSM-4676,"Niacin,Niacin TD,Niacor,Niaspan,Niaspan ER,Nia...",938.0,OC(=O)c1cccnc1,,"InChI=1S/C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H...",123.11,C6H5NO2,15940
44326,Amantadine,LSM-44958,Adamantamine Fumarate,,N[C@@]12C[C@@H]3C[C@@H](C[C@H](C3)C1)C2,,InChI=1S/C10H17N/c11-10-4-7-1-8(5-10)3-9(2-7)6...,151.25,C10H17N,
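
The sheet above has no BRD or DrugBank column, so any conversion has to go through a shared identifier. One possible route, sketched below under several assumptions, is to match CMap hits to the sheet by compound name, take `SM_PubChem_CID` from the sheet, and then join on a separate PubChem-to-DrugBank cross-reference table. The `pert_name` column of the CMap results, the `drugbank_xref` table with its `drugbank_id` and `pubchem_cid` columns, and the function name are all hypothetical; only the `SM_*` column names come from the sheet shown above.

```python
import pandas as pd


def map_to_drugbank(cmap_hits, lincs_sm, drugbank_xref):
    """Left-join CMap hits to DrugBank IDs via name and PubChem CID.

    cmap_hits     : DataFrame with a hypothetical 'pert_name' column
    lincs_sm      : the LINCS small molecule sheet (SM_* columns)
    drugbank_xref : hypothetical table with 'drugbank_id', 'pubchem_cid'
    """
    sm = lincs_sm[["SM_Name", "SM_LINCS_ID", "SM_PubChem_CID"]].copy()
    sm["name_key"] = sm["SM_Name"].str.strip().str.lower()
    # CIDs are read as floats because some are missing; use nullable ints.
    sm["SM_PubChem_CID"] = sm["SM_PubChem_CID"].astype("Int64")

    hits = cmap_hits.copy()
    hits["name_key"] = hits["pert_name"].str.strip().str.lower()
    merged = hits.merge(sm, on="name_key", how="left")

    # Drop rows without a CID so that missing keys never match each other.
    xref = drugbank_xref.dropna(subset=["pubchem_cid"]).copy()
    xref["pubchem_cid"] = xref["pubchem_cid"].astype("Int64")
    merged = merged.merge(
        xref, left_on="SM_PubChem_CID", right_on="pubchem_cid", how="left"
    )
    return merged.drop(columns=["name_key"])
```

Name matching is fragile: compounds listed only under an alternative name, or names shared by several rows of the sheet, would be missed or duplicated, so unmatched and multiply matched hits should be inspected by hand.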
