Skip to content
No description, website, or topics provided.
Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
README.md
drug_domain_mat.txt
drug_domain_mat_case_study.txt
drug_fp_mat.txt
drug_fp_mat_case_study.txt
drug_gene_expr_mat.txt
drug_gene_expr_mat_case_study.txt
drug_gobp_mat.txt
drug_gobp_mat_case_study.txt
drug_pt_mat.txt
mlrsl_all_feature.py

README.md

multi-LRSL

This folder contains the code and data for the paper 'Learning important features from multi-view data to predict drug side effects'.

'mlrsl_all_feature.py' is the python code for the proposed algorithm. The code was tested under the environment of python 3.7.3 + numpy 1.16.3 + pandas 0.24.2 + scipy 1.2.1 + scikit-learn 0.20.0.

501 drugs with chemical structures, target domains, target gene ontology terms (biological process terms only), gene expression profiles and known side effects were used as golden standard dataset. All drugs are indicated by their DrugBank IDs.

'drug_fp_mat.txt' contains the PubChem fingerprints of the drugs. The rows of the data table are the drugs and the columns are pubchem fingerprints. The description of the substructure fingerprints could be found at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt

'drug_domain_mat.txt' contains the protein domain information of the drug targets related to human. The human target proteins of drugs were extracted from DrugBank(https://www.drugbank.ca/). The rows are drugs and the columns are the protein domains. The raw data were obtained from InterPro database(http://www.ebi.ac.uk/interpro/download.html). The description of the protein domains could also be found there.

'drug_gobp_mat.txt' contains the gene ontology terms(biological process terms only) of the drug targets related to human. The rows are drugs and the columns are the GO BP terms. The raw data were obtained from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping_selected.tab.gz. The description of the GO BP terms could be found at http://geneontology.org/

'drug_gene_expr_mat.txt' contains the gene expression profiles related to the drugs. The rows are drugs and the columns are genes. The raw data were from LINCS L1000 projects(https://clue.io) and two related datasets were downloaded from GEO(GEO accession:GSE70138, GSE92742).The detailed description of these datasets could be found at https://clue.io/GEO-guide.

'drug_pt_mat' contains the side effects of the drugs. The rows are drugs and the columns are side effect cuis. The raw data were obtained from http://sideeffects.embl.de/download/

The other four text files 'drug_fp_mat_case_study.txt', 'drug_domain_mat_case_study.txt', 'drug_gobp_mat_case_study.txt', 'drug_gene_expr_mat_case_study.txt' contain feature matrices of 320 drugs without side effect records in SIDER database. These drugs were used in the case study of the paper.

You can’t perform that action at this time.