QTG_Finder (version 2.0)
QTG-Finder is a machine-learning pipeline that prioritizes causal genes within QTLs identified by linkage mapping. We trained QTG-Finder models for Arabidopsis, rice, sorghum, and Setaria viridis. The Arabidopsis and rice models are based on known causal genes, and the sorghum and Setaria viridis models are based on orthologs of known causal genes. The models use features such as polymorphisms, functional annotation, co-function networks, and paralog copy number to rank the candidate genes within each QTL, so that likely causal genes are prioritized.
Author: Fan Lin, February 2020
Environment: Python 3.7.3
The source code and input files are in the 'QTG2_prediction' folder. Running 'QTG_Finder_predict.py' requires a QTL gene list provided by the user.
- Users can prepare the QTL gene list as a single-column table (.csv) that lists the genes of every QTL, one gene per row. See "SV_height_QTL_example.csv" or "AT_Seedsize_QTL_example.csv" for an example.
|Gene1 in QTL1|
|Gene2 in QTL1|
|Gene3 in QTL1|
|Gene1 in QTL2|
|Gene2 in QTL2|
|Gene3 in QTL2|
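If the candidate genes are already available in Python, a minimal sketch like the one below writes them in the layout shown above: genes from all QTLs stacked in one column, one gene ID per row. The helper name `write_qtl_gene_list` is hypothetical, and writing no header row is an assumption; compare the result against "SV_height_QTL_example.csv" before using it.

```python
import csv
from typing import Iterable, Mapping


def write_qtl_gene_list(qtls: Mapping[str, Iterable[str]], out_path: str) -> int:
    """Write the genes of all QTLs into one single-column CSV (hypothetical helper).

    Genes are written QTL by QTL, one gene ID per row. No header row is
    written (an assumption; check the bundled example files).
    Returns the number of rows written.
    """
    n_rows = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for genes in qtls.values():
            for gene in genes:
                writer.writerow([gene.strip()])  # drop stray whitespace around IDs
                n_rows += 1
    return n_rows
```

For example, `write_qtl_gene_list({"QTL1": ["GeneA", "GeneB"], "QTL2": ["GeneC"]}, "my_QTL_gene_list.csv")` would produce a three-row file. The QTL names and gene IDs here are placeholders.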
The pre-calculated models can be downloaded from the following links:
Unzip the pre-calculated models into the working directory, ./QTG2_prediction. For example, for the Arabidopsis model:
jar xvf AT_model.dat.zip
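`jar` ships with a Java JDK. Where it is not installed, Python's standard-library `zipfile` module can unpack the archives instead. The sketch below is a minimal example under two assumptions: the downloaded archives follow the `*_model.dat.zip` naming used above, and they are saved in ./QTG2_prediction. The helper name `extract_models` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_models(model_dir: str = "./QTG2_prediction") -> List[str]:
    """Unzip every *_model.dat.zip archive in model_dir into model_dir itself.

    Returns the names of the extracted members, similar to what
    `jar xvf` lists. The archive naming pattern is an assumption.
    """
    extracted = []
    for archive in sorted(Path(model_dir).glob("*_model.dat.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(model_dir)  # same effect as running jar xvf inside model_dir
            extracted.extend(zf.namelist())
    return extracted
```

The type hints use `typing.List` rather than `list[str]` so the code also runs on Python 3.7.3, the environment listed above.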
- Usage: "QTG_Finder_predict.py -gl QTL_gene_list -sp species_abbreviation"
QTL_gene_list: the list of QTL genes to be ranked. See "SV_height_QTL_example.csv" for an example.
species_abbreviation: "AT" for Arabidopsis; "OS" for rice; "SB" for sorghum; "SV" for Setaria viridis
For example:
python QTG_Finder_predict.py -gl SV_height_QTL_example.csv -sp 'SV'
To display all options:
python QTG_Finder_predict.py -h
- The output file is "QTL_gene_rank.csv".
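To run the predictor from a Python workflow, a small wrapper can validate the species abbreviation against the four listed above, call the script, and preview the output. This is a sketch and not part of QTG-Finder. It assumes that the script is run from ./QTG2_prediction and that "QTL_gene_rank.csv" is written to that directory. The helper names are hypothetical. The output's column layout is not described here, so rows are returned unparsed.

```python
import csv
import itertools
import subprocess
import sys
from typing import List

# Species abbreviations accepted by -sp, as listed in this README.
SPECIES = {"AT": "Arabidopsis", "OS": "rice", "SB": "sorghum", "SV": "Setaria viridis"}


def build_command(gene_list: str, species: str) -> List[str]:
    """Return the argument list for QTG_Finder_predict.py (hypothetical helper)."""
    if species not in SPECIES:
        raise ValueError(f"unknown species {species!r}; expected one of {sorted(SPECIES)}")
    # sys.executable reuses the interpreter that runs this wrapper
    return [sys.executable, "QTG_Finder_predict.py", "-gl", gene_list, "-sp", species]


def read_head(path: str, n: int = 5) -> List[List[str]]:
    """Return the first n rows of a CSV file as raw lists of strings."""
    with open(path, newline="") as handle:
        return list(itertools.islice(csv.reader(handle), n))


def run_and_peek(gene_list: str, species: str, n: int = 5,
                 output: str = "QTL_gene_rank.csv") -> List[List[str]]:
    """Run the predictor in the current directory and return the first n output rows."""
    subprocess.run(build_command(gene_list, species), check=True)  # raises if the script fails
    return read_head(output, n)
```

For example, `run_and_peek("SV_height_QTL_example.csv", "SV")` would run the Setaria viridis example and return its first five rows. Because the arguments are passed as a list, no shell quoting is involved. The quoted `'SV'` in the shell example therefore becomes a plain `"SV"`.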
For analyses and replications
The source code and input files for cross-validation, feature importance analysis, literature validation, and category analysis are in the 'QTG2_analysis' folder. The usage of each script (.py) is described at the beginning of that script.