acnash/ChemInformatics

ChemInformatics

Cheminformatics R wrapper code and workflows. There is no new science here, just a set of functions for comparing and clustering drugs by their substructure and by their CMAP transcriptional response.

This is a 5% project: something I am interested in and will build on when I can, but it is not a priority.

Please cite this GitHub repository if you use this code in your work.

Files you will need

pubchem_protein_only.sqlite
available_genes.txt
drugs.txt
CD_signatures_LM_42809x978.gctx
Drugs_metadata.csv
CD_signature_metadata.csv
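
Before running anything, it can help to confirm that these files are present. The following is a minimal base R sketch, assuming all six files sit together in one directory. The `checkRequiredFiles` helper and its `dataDir` argument are hypothetical and not part of this repository.

```r
# Hypothetical helper (not part of this repository): report which of the
# required input files are missing from a directory.
requiredFiles <- c(
  "pubchem_protein_only.sqlite",
  "available_genes.txt",
  "drugs.txt",
  "CD_signatures_LM_42809x978.gctx",
  "Drugs_metadata.csv",
  "CD_signature_metadata.csv"
)

checkRequiredFiles <- function(dataDir = ".", files = requiredFiles) {
  paths <- file.path(dataDir, files)
  missingFiles <- files[!file.exists(paths)]
  if (length(missingFiles) > 0) {
    message("Missing input files: ", paste(missingFiles, collapse = ", "))
  }
  # Return the missing file names invisibly so callers can act on them.
  invisible(missingFiles)
}
```

Calling `checkRequiredFiles("path/to/data")` returns a character vector of the files that could not be found, which is empty when everything is in place.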

IMPORTANT

The code for the L1000 gene transcriptional response to drugs is in drug_clustering.R. However, that file is tied closely to my own machine. Until I get around to building it into an API, you will need to hack the file apart to use it.

Loading and listing the wrapper functions

You will need an internet connection.
Load the R files into the R environment with:
source("CHEMAPI.R")

For a list of the available functions (the list is old and out of date), execute:
listChemMethods()

CHEMIO.R
importSDFFromCID(CIDvector) : returns SDFset
loadCHEMDF(fileName) : returns data.frame
loadCIDFile(fileName) : returns data.frame (drugName, CID)
loadSDFFile(fileName) : returns SDFset
loadClusterMatrix(fileName) : returns matrix
saveCHEMDF(df,fileName)
saveClusterMatrix(simMatrix, fileName)
saveSDFFile(SDFset, fileName)

CHEMAnalysis.R
calculateFMCS(sdfObject1, sdfObject2, au=2, bu=1) : returns an fmcs object
calculateNumAtomsToMCSReference(referenceSDF, ignoreIndex, SDFset, drugNameVector, au=2, bu=1) : returns data.frame for plotting
clusterWithFMCSAromatic(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
clusterWithFMCSStatic(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
clusterWithFMCSRing(SDFset, au=2, bu=1, overlapCoefficient=TRUE) : returns a matrix of similarities
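
The `overlapCoefficient` flag above presumably switches between the two similarity measures commonly reported for a maximum common substructure (MCS). That mapping is an assumption, as is the reading that `au` and `bu` are upper bounds on atom and bond mismatches. For two molecules with a and b atoms that share an MCS of c atoms, the Tanimoto coefficient is c / (a + b - c) and the overlap coefficient is c / min(a, b). A small base R sketch with made-up atom counts (the function names are illustrative only):

```r
# Illustrative helpers, not functions from this repository.
# a, b: atom counts of the two molecules; mcs: atom count of their MCS.
tanimotoSim <- function(a, b, mcs) mcs / (a + b - mcs)
overlapSim  <- function(a, b, mcs) mcs / pmin(a, b)

# Made-up example: a 20-atom and a 30-atom molecule sharing 15 atoms.
a <- 20
b <- 30
mcs <- 15

tanimotoSim(a, b, mcs)  # 15 / 35
overlapSim(a, b, mcs)   # 15 / 20
```

The overlap coefficient is never smaller than the Tanimoto coefficient, and it reaches 1 whenever the smaller molecule is fully contained in the larger one, which makes it the more forgiving choice when compound sizes differ a lot.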

CHEMDisplay.R
displayClusterDend(simMatrix) : returns dendrogram object for plot()
displayClusterHeatMap(simMatrix, CIDvector, title=NULL) : returns heatmap object for plot()
displayFMCSSize(df, fileNameJPEG) : returns ggplot object
getNCBIFractionActivity(drugTargetsMat, title=NULL) : returns ggplot object

CHEMBioassay.R
getBioassayDatabase(DBLocation) : returns a db object; remember to close it when you are done
getEnsemblProteinDetails(uniProtIDs, attributeCharVector, filtersCharVector, drugTargets) : returns list of protein details
getProteinDrugTargets(db, CID) : returns a data.frame (within a list) of the drug's protein targets, the total number of assay screens and the total fraction of activity
getUniProtIDs(NCBIMatrix, db) : returns a list of uniprot IDs
clusterCompoundsByActivityProfile(db, compoundCIDs) : returns hcluster object
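
To make the "number of assay screens" and "fraction of activity" reported by `getProteinDrugTargets` concrete, here is a base R sketch on made-up data. It assumes the fraction means the share of screens against a target in which the compound was active. The column names and target labels are hypothetical, and the repository's own calculation may differ.

```r
# Made-up screening results for one compound: one row per assay screen.
screens <- data.frame(
  target = c("targetA", "targetA", "targetA", "targetB", "targetB"),
  active = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Count screens and active outcomes per target.
totalScreens  <- tapply(screens$active, screens$target, length)
activeScreens <- tapply(screens$active, screens$target, sum)

# One row per target with its screen count and fraction of active screens.
targetSummary <- data.frame(
  target         = names(totalScreens),
  totalScreens   = as.vector(totalScreens),
  fractionActive = as.vector(activeScreens / totalScreens),
  stringsAsFactors = FALSE
)
```

Reporting the screen count next to the fraction matters: a fraction of 1 from a single screen is much weaker evidence than the same fraction from dozens of screens.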

Tools

-ChemmineR

-bioassayR

-biomaRt

-cmapR
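
A quick way to see which of these packages are already installed is sketched below. It uses the capitalisation of the package names as distributed (`bioassayR`, `biomaRt`), and the variable names are illustrative only.

```r
# Packages named in the Tools list; all four are Bioconductor packages.
toolPackages <- c("ChemmineR", "bioassayR", "biomaRt", "cmapR")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(toolPackages, requireNamespace, logical(1), quietly = TRUE)

missingPackages <- toolPackages[!installed]
if (length(missingPackages) > 0) {
  message(
    "Not installed: ", paste(missingPackages, collapse = ", "),
    ". These can be installed with BiocManager::install()."
  )
}
```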

DB Access

-PubChem

-Ensembl

-UniProt

-CLUE

-Drugbank.ca
