A novel cancer-centric multi-association prediction model based on a multi-source heterogeneous network that integrates intrinsic node attribute information and link information.
Please see our manuscript for more details.
- python>=3.7
- pytorch>=1.11
- pyg>=2.0
- numpy>=1.20
- networkx>=2.6
- pandas>=1.3
- scikit-learn>=1.0
- Java environment
- data_preprocessing.py: Obtaining node features and edge indices of the multi-source heterogeneous network.
- main.py: Extracting edge features from the multi-source heterogeneous network.
- circRNA_set.csv: A complete mapping from circRNA index IDs to circRNA names.
- drug_set.csv: A complete mapping from drug index IDs to drug meshID, smile and pubchemID.
- cancer_set.csv: A complete mapping from cancer index IDs to cancer meshID, cancer names.
- circ2cancer_assoMatrix.csv: circRNA-cancer association matrix.
- circ2drug_assoMatrix.csv: circRNA-drug association matrix.
- cancer2drug_assoMatrix.csv: cancer-drug association matrix.
- circRNA_functional_similarity.csv: circRNA functional similarity scores.
- circRNA_GIP_similarity.csv: circRNA Gaussian Interaction Profile kernel similarity scores.
- circRNA_simfusion.csv: circRNA fusion similarity scores.
- drug_structure_similarity.csv: drug chemical structure similarity scores.
- drug_GIP_similarity.csv: drug Gaussian Interaction Profile kernel similarity scores.
- drug_simfusion.csv: drug fusion similarity scores.
- cancer_semantic_similarity.csv: cancer semantic similarity scores.
- cancer_GIP_similarity.csv: cancer Gaussian Interaction Profile kernel similarity scores.
- cancer_simfusion.csv: cancer fusion similarity scores.
- circRNA_cancer_LabEdgEmbs4LTR.txt: Input data of learning to rank for circRNA-cancer prediction.
- drug_cancer_LabEdgEmbs4LTR.txt: Input data of learning to rank for drug-cancer prediction.
Note: 1) The header IDs “0-476” in the mentioned dataset can be mapped to specific information about circRNAs, drugs, and cancers by the three mapping files circRNA_set.csv, drug_set.csv, and cancer_set.csv, respectively. 2) See manuscript Section 2.5 for data format details of circRNA_cancer_LabEdgEmbs4LTR.txt and drug_cancer_LabEdgEmbs4LTR.txt.
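The GIP similarity files listed above hold Gaussian Interaction Profile kernel scores. As a rough illustration of how such scores are commonly derived from an association matrix, the sketch below uses the widely used formulation K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to gamma_prime divided by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the toy matrix, and the default `gamma_prime = 1.0` are assumptions made for this example; the parameterization used in the manuscript may differ.

```python
import math
from typing import List, Sequence


def gip_kernel(assoc: Sequence[Sequence[float]],
               gamma_prime: float = 1.0) -> List[List[float]]:
    """Gaussian Interaction Profile kernel between the rows of `assoc`.

    Each row is treated as the interaction profile of one entity
    (hypothetical helper, not taken from the repository).
    """
    n = len(assoc)
    # Squared Euclidean norm of every interaction profile.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n if n else 0.0
    if mean_sq == 0.0:
        # No known associations: fall back to the identity matrix (assumption).
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Bandwidth normalised by the average profile norm.
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * sq_dist)
    return kernel


# Toy 3 x 3 association matrix (made-up values, rows = entities).
toy = [[1, 0, 1],
       [1, 0, 0],
       [0, 1, 0]]
sim = gip_kernel(toy)
```

The result is a symmetric matrix with ones on the diagonal; rows that share more associations receive scores closer to 1.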
- Here, we provide a demo for the first application scenario: "associated cancer ranking for novel queries." Depending on the specific prediction task, data for ${train.txt} and ${test.txt} can be obtained from either data/circRNA-cancer or data/drug-cancer.
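RankLib reads training and test files in the LETOR (SVMlight-style) layout, where each line has the form `<label> qid:<id> <feature>:<value> ... # optional comment`. The sketch below parses one such line, which can help when inspecting the train and test files before running the demo. The helper name `parse_ltr_line`, the `LtrRow` type, and the sample line are hypothetical; the manuscript (Section 2.5) remains the authority on what the labels, query IDs, and features mean in this dataset.

```python
from typing import Dict, NamedTuple, Optional


class LtrRow(NamedTuple):
    label: float                # relevance label of the candidate
    qid: str                    # query identifier
    features: Dict[int, float]  # feature index -> feature value
    comment: Optional[str]      # trailing free-text comment, if any


def parse_ltr_line(line: str) -> LtrRow:
    """Parse one LETOR-style line (hypothetical helper for inspection)."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a LETOR-style line: {line!r}")

    label = float(tokens[0])
    qid = tokens[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, _, value = token.partition(":")
        features[int(index)] = float(value)
    return LtrRow(label, qid, features, comment.strip() or None)


# Made-up example line, not copied from the dataset.
row = parse_ltr_line("1 qid:12 1:0.25 2:-0.5 3:0.0 # hypothetical pair")
```

Grouping the parsed rows by `qid` gives the candidate list that is ranked for each query.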
train:
>> java -jar bin/RankLib.jar -train ${train.txt} -ranker 6 -metric2t NDCG@10 -tree 1000 -shrinkage 0.1 -tc 256 -mls 1 -save model.txt
test:
>> java -jar bin/RankLib.jar -load model.txt -rank ${test.txt} -score score.txt
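In RankLib, `-ranker 6` selects LambdaMART, and `-score` stores the model's score for every candidate in the test file. The sketch below turns such a score file into a top-k list per query. It assumes each line holds three whitespace-separated columns (query ID, position of the candidate within that query in the test file, and score); please verify this against your own score.txt. The helper name `top_k_per_query` and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_k_per_query(score_lines: Iterable[str],
                    k: int = 10) -> Dict[str, List[Tuple[int, float]]]:
    """Return the k highest-scoring candidates for every query ID.

    Assumes lines of the form "<qid> <candidate index> <score>".
    """
    by_query: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for line in score_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        qid, doc_index, score = parts
        by_query[qid].append((int(doc_index), float(score)))

    # Sort each query's candidates by descending score and keep the top k.
    return {
        qid: sorted(cands, key=lambda pair: pair[1], reverse=True)[:k]
        for qid, cands in by_query.items()
    }


# Made-up score lines for two queries.
example = ["7\t0\t-0.31", "7\t1\t1.25", "7\t2\t0.40", "9\t0\t0.05"]
ranking = top_k_per_query(example, k=2)
```

To apply it to the demo output, pass an open file handle, for example `with open("score.txt") as fh: ranking = top_k_per_query(fh)`. The candidate indices can then be matched back to the lines of the test file for the same query.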