IMCDriver

Identifying Driver Genes for Individual Patients through Inductive Matrix Completion

Hardware configuration

RAM: 16 GB

The code is memory-intensive, so make sure enough RAM is available before running it.

Software requirements

Python 3.6
pandas 0.22.0
numpy 1.14.6
scikit-learn 0.23.2
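
Before running the scripts, it can help to compare the installed packages against the versions listed above. The following is only a minimal sketch: the helper names `find_mismatches` and `installed_versions` are hypothetical and not part of IMCDriver, and the check is a plain string comparison against the pinned versions.

```python
import importlib

# Pinned versions copied from the list above.
REQUIRED = {
    "pandas": "0.22.0",
    "numpy": "1.14.6",
    "scikit-learn": "0.23.2",
}


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (installed_version_or_None, required_version)} for each mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)  # None when the package is not installed
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions():
    """Collect __version__ from whichever of the listed packages can be imported."""
    # scikit-learn is imported under the module name "sklearn".
    modules = {"pandas": "pandas", "numpy": "numpy", "scikit-learn": "sklearn"}
    versions = {}
    for package, module_name in modules.items():
        try:
            versions[package] = importlib.import_module(module_name).__version__
        except ImportError:
            pass  # left out here, reported as None by find_mismatches
    return versions


if __name__ == "__main__":
    for name, (found, wanted) in find_mismatches(installed_versions()).items():
        print(f"{name}: found {found}, README lists {wanted}")
```

An empty result means the environment matches the list. A mismatch does not necessarily mean the scripts will fail, only that the environment differs from the one described here.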

Usage Example

Download the files and directories from the "IMCDriver" repo, including "data", "data_preprocess.py", and "IMCDriver.py".
Extract Example.7z into IMCDriver/data/ to obtain IMCDriver/data/Example.
Set the variable cancer_folder='Example' in "data_preprocess.py" and "IMCDriver.py".
Because the datasets are already preprocessed, you can run the command "python IMCDriver.py" directly in the terminal to apply IMC and predict driver genes for individual patients, or run this file in the PyCharm IDE.

We provide the prepared files for five cancer datasets: BRCA, HNSC, LUAD, LUSC, and PRAD. To run IMCDriver on other cancer datasets from TCGA, first run the command "python data_preprocess.py" in the terminal (or run "data_preprocess.py" in the PyCharm IDE) to preprocess the data, which may take several minutes. Then run the command "python IMCDriver.py" in the terminal to start the personalized driver gene identification. Testing all the samples may take about an hour; the running time depends mainly on your computer and on the number of samples in your dataset. The processing time of each sample is printed to the console so that you can estimate the total running time.
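
The per-sample times printed to the console can be turned into a rough estimate of the remaining running time. The sketch below assumes you have copied those times (in seconds) into a list by hand; the function name `estimate_remaining_seconds` and the numbers in the example are hypothetical, and the exact format of the console output is not assumed.

```python
def estimate_remaining_seconds(per_sample_seconds, total_samples):
    """Estimate the remaining run time from the per-sample times seen so far.

    per_sample_seconds: processing times of the samples already finished.
    total_samples: number of samples in the dataset.
    """
    done = len(per_sample_seconds)
    if done == 0:
        raise ValueError("need at least one finished sample")
    if done > total_samples:
        raise ValueError("more finished samples than total samples")
    mean_time = sum(per_sample_seconds) / done
    # Assume the remaining samples take about as long as the finished ones.
    return mean_time * (total_samples - done)


# Illustrative numbers only: three samples finished out of 900.
remaining = estimate_remaining_seconds([3.0, 5.0, 4.0], total_samples=900)
print(f"about {remaining / 60:.0f} minutes left")
```

The estimate simply multiplies the mean time per finished sample by the number of samples left, so it becomes more reliable as more samples finish.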

Data organization

The data directory contains the following directories and files:

Additional_file5_reliable_interactions.txt: the gene correlation network file.

NCG_known_711.txt: the list of 711 known driver genes.

Example.7z: contains the prepared files of the Example.

BRCA.7z: contains the prepared files of the BRCA dataset.

HNSC.7z: contains the prepared files of the HNSC dataset.

LUAD.7z: contains the prepared files of the LUAD dataset.

LUSC.7z: contains the prepared files of the LUSC dataset.

PRAD.7z: contains the prepared files of the PRAD dataset.

The directory of each cancer dataset is organized identically, as follows:

mut_similarity: stores the file of Gaussian interaction profile kernel similarity between mutated genes.

orig_data: stores the RNA-seq.txt and SomaticMutation.txt files downloaded from TCGA through Xena.

results: stores the file predicted by IMCDriver, which contains the scores of the mutated genes of each patient in the cancer dataset.

sample_similarity: stores the file of Gaussian interaction profile kernel similarity between samples.
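
For readers unfamiliar with the Gaussian interaction profile (GIP) kernel named above, here is a minimal sketch of its commonly used definition: K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to gamma' divided by the mean squared norm of the profiles. This README does not state the exact formula or bandwidth used by data_preprocess.py, so the normalization, the default `gamma_prime=1.0`, and the function name `gip_kernel` are assumptions. The sketch uses only the standard library so that it runs without the pinned dependencies.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    profiles: list of equal-length rows, e.g. one row per gene (or per sample),
              with 1 where a mutation is observed and 0 otherwise (assumed layout).
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("profiles must not be empty")
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are zero; bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * sq_dist)
    return kernel


# Toy matrix: rows 0 and 1 share the same profile, row 2 differs everywhere.
toy = [[1, 0, 1],
       [1, 0, 1],
       [0, 1, 0]]
similarity = gip_kernel(toy)
```

Identical profiles get similarity 1, and the similarity decays toward 0 as the profiles differ in more positions. Applying the same function to the transposed matrix would give the similarity along the other axis.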

We strongly suggest not changing the names of the files and subfolders generated by "data_preprocess.py", unless the same names are also changed in "IMCDriver.py". The original RNA-seq.txt and SomaticMutation.txt of each cancer dataset were downloaded from TCGA through Xena.
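
Because the scripts depend on these folder names, a quick check of a dataset directory before a long run can catch a renamed or missing subfolder. The sketch below only tests for the four subfolders listed above; the helper name `missing_subfolders` is hypothetical and not part of IMCDriver, and file names inside the subfolders are not checked.

```python
import os

# Subfolder names taken from the data organization described above.
EXPECTED_SUBFOLDERS = ("mut_similarity", "orig_data", "results", "sample_similarity")


def missing_subfolders(data_dir, cancer_folder):
    """Return the expected subfolders that are absent from data_dir/cancer_folder."""
    base = os.path.join(data_dir, cancer_folder)
    return [
        name
        for name in EXPECTED_SUBFOLDERS
        if not os.path.isdir(os.path.join(base, name))
    ]


if __name__ == "__main__":
    # Assumes the script is run from the IMCDriver directory after extracting Example.7z.
    missing = missing_subfolders("data", "Example")
    if missing:
        print("Missing subfolders:", ", ".join(missing))
    else:
        print("Dataset folder layout looks complete.")
```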


NWPU-903PR/IMCDriver
