Cheminformatics R wrapper code and workflows. There is no 'new' science here, just a set of functions for comparing and clustering drugs by their substructure and CMAP transcriptional response.
This is a 5% project: something I am interested in and will build on when I can, but it isn't a priority.
Please cite this GitHub repo if you use this code in your work.
pubchem_protein_only.sqlite
available_genes.txt
drugs.txt
CD_signatures_LM_42809x978.gctx
Drugs_metadata.csv
CD_signature_metadata.csv
The code for the L1000 gene transcriptional response to drugs is in drug_clustering.R. However, this file is tightly coupled to my own machine. Until I get around to building it into an API, you will need to hack the file apart to use it.
You will need an internet connection.
Load the R files into the R environment using:
source("CHEMAPI.R")
For the most recent list of available functions (the list in this README is old and outdated), execute:
listChemMethods()
CHEMIO.R
importSDFFromCID(CIDvector) : returns SDFset
loadCHEMDF(fileName) : returns data.frame
loadCIDFile(fileName) : returns data.frame (drugName, CID)
loadSDFFile(fileName) : returns SDFset
loadClusterMatrix(fileName) : returns matrix
saveCHEMDF(df,fileName)
saveClusterMatrix(simMatrix, fileName)
saveSDFFile(SDFset, fileName)
CHEMAnalysis.R
calculateFMCS(sdfObject1, sdfObject2, au=2, bu=1) : returns an fmcs object
calculateNumAtomsToMCSReference(referenceSDF, ignoreIndex, SDFset, drugNameVector, au=2, bu=1) : returns data.frame for plotting
clusterWithFMCSAromatic(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
clusterWithFMCSStatic(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
clusterWithFMCSRing(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
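The `overlapCoefficient` argument on the clustering functions above suggests that each entry of the similarity matrix is either an overlap coefficient or a Tanimoto coefficient derived from the maximum common substructure (MCS). The following is a minimal base-R sketch of those two measures, assuming the standard definitions (overlap = mcs / min(a, b), Tanimoto = mcs / (a + b - mcs)). The helper `mcsSimilarity` and the atom counts are hypothetical and not part of this repo.

```r
# Hypothetical helper, not part of this repo: similarity from MCS sizes.
# a, b : atom counts of the two molecules
# mcs  : atom count of their maximum common substructure
mcsSimilarity <- function(a, b, mcs, overlapCoefficient = TRUE) {
  stopifnot(mcs <= min(a, b))
  if (overlapCoefficient) {
    mcs / min(a, b)        # overlap coefficient
  } else {
    mcs / (a + b - mcs)    # Tanimoto coefficient
  }
}

# Toy numbers, made up for illustration:
# a 20-atom and a 30-atom molecule sharing a 15-atom MCS.
overlapSim  <- mcsSimilarity(20, 30, 15)                              # 15 / 20
tanimotoSim <- mcsSimilarity(20, 30, 15, overlapCoefficient = FALSE)  # 15 / 35
```

The overlap coefficient scores a small molecule that is fully contained in a larger one as 1, whereas Tanimoto penalises the size difference, so the choice changes how fragments cluster with their parent scaffolds.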
CHEMDisplay.R
displayClusterDend(simMatrix) : returns dendrogram object for plot()
displayClusterHeatMap(simMatrix, CIDvector, title=NULL) : returns heatmap object for plot()
displayFMCSSize(df, fileNameJPEG) : returns ggplot object
getNCBIFractionActivity(drugTargetsMat, title=NULL) : returns ggplot object
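As a rough illustration of what goes into a dendrogram like the one `displayClusterDend(simMatrix)` returns, here is a base-R sketch that turns a similarity matrix into a dendrogram object. It assumes that 1 - similarity is used as the distance and that average linkage is used; the repo's own code may make different choices. The drug names and similarity values are made up.

```r
# Toy symmetric similarity matrix (values made up for illustration).
drugs <- c("drugA", "drugB", "drugC", "drugD")
simMatrix <- matrix(
  c(1.0, 0.9, 0.2, 0.1,
    0.9, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.8,
    0.1, 0.2, 0.8, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(drugs, drugs)
)

# Assumption: treat 1 - similarity as a distance and use average linkage.
distMatrix <- as.dist(1 - simMatrix)
hc <- hclust(distMatrix, method = "average")
dend <- as.dendrogram(hc)

# Cut the tree into two groups to inspect the cluster membership.
groups <- cutree(hc, k = 2)

# plot(dend) draws the tree on a graphics device.
```

With these toy values, drugA and drugB fall into one group and drugC and drugD into the other.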
CHEMBioassay.R
getBioassayDatabase(DBLocation) : returns a db object; don't forget to close it
getEnsemblProteinDetails(uniProtIDs, attributeCharVector, filtersCharVector, drugTargets) : returns list of protein details
getProteinDrugTargets(db, CID) : returns a data.frame (within a list) of the drug's protein targets, the total number of assay screens, and the total fraction of activity
getUniProtIDs(NCBIMatrix, db) : returns a list of uniprot IDs
clusterCompoundsByActivityProfile(db, compoundCIDs) : returns hcluster object
-ChemmineR
-bioassayR
-biomaRt
-cmapR
-PubChem
-Ensembl
-UniProt
-CLUE
-Drugbank.ca