Final_Method

calculate_corr.m

Summary

Calculate Pearson correlations between CCLE-EXP genes and PR genes.

Input

1. Achilles_v2.11_training_phase3.gct.noheader

The original dataset (downloaded here) with the header, i.e. the first and second lines, deleted.

2. CCLE_expression_training_phase3.gct.noheader

The original dataset (downloaded here) with the header, i.e. the first and second lines, deleted.

Output

1. pearson_matrix_all.txt

A matrix of Pearson scores between PR genes and CCLE-EXP genes.
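The correlation step can be illustrated with a minimal Python sketch. This is not the repository's MATLAB code: the function names, the toy values, and the matrix orientation (rows as PR genes, columns as CCLE-EXP genes) are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # correlation is undefined for a constant profile; 0.0 is a convention chosen here
    return sum(a * b for a, b in zip(dx, dy)) / denom

def pearson_matrix(pr_rows, exp_rows):
    """Assumed orientation: one row per PR gene, one column per expression gene."""
    return [[pearson(pr, ex) for ex in exp_rows] for pr in pr_rows]

# Made-up toy data: 2 PR genes and 3 expression genes measured in 4 cell lines.
pr_rows = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
exp_rows = [[2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0], [3.0, 1.0, 4.0, 1.0]]
matrix = pearson_matrix(pr_rows, exp_rows)
```

Both input profiles must be ordered by the same cell lines for the correlation to be meaningful.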

generate_GE_top_100.pl

Summary

Obtain the global ranking information for each feature, which is used to calculate global scores.

Input

1. prioritized_gene_list_phase3.txt

A list of genes whose essentiality needs to be predicted. Download

2. pearson_matrix_all.txt

A matrix of Pearson scores between PR genes and CCLE-EXP genes. Generated by calculate_corr.m

Output

1. feature_list.txt

A two-column table containing each feature's name and the number of times that feature's local score was in the top 10. This table is used to calculate global scores.
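The counting behind such a table can be sketched as follows. This is an illustrative Python sketch, not the Perl script itself: the function name, the toy data, and ranking by the absolute value of the Pearson score are all assumptions.

```python
from collections import Counter

def count_top_k(local_scores, k=10):
    """Count how often each feature ranks in the top k across PR genes.

    local_scores: dict mapping PR gene -> dict mapping feature -> local score.
    Ranking by absolute value is an assumption, not confirmed by the source.
    """
    counts = Counter()
    for scores in local_scores.values():
        ranked = sorted(scores, key=lambda f: abs(scores[f]), reverse=True)
        counts.update(ranked[:k])
    return counts

# Made-up toy data with k=2 so the effect is visible on three features.
local_scores = {
    "PR_A": {"f1": 0.9, "f2": -0.8, "f3": 0.1},
    "PR_B": {"f1": 0.2, "f2": 0.7, "f3": -0.6},
}
counts = count_top_k(local_scores, k=2)

# Two-column, tab-separated rows: feature name and top-k count.
rows = [f"{name}\t{n}" for name, n in counts.most_common()]
```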

get_top_prior_2300.pl

Summary

Use local scores and global rankings to calculate the final correlation score, then output the names of the top 9 expression features and 1 copy number feature for each PR gene. One command-line parameter is required; we used 0.7 in this project. Example: perl get_top_prior_2300.pl 0.7

Input

1. feature_list.txt

A two-column table containing feature names and how many times this feature's local score was top 10. Generated by generate_GE_top_100.pl.

2. CCLE_copynumber_training_phase3.gct

Unprocessed copy number data. Download

3. pearson_matrix_all.txt

A matrix of Pearson scores between PR genes and CCLE-EXP genes. Generated by calculate_corr.m

Output

1. GE_train_top_10

A table of the names of the 10 predictive features for each PR gene.
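The README does not give the formula that combines local scores with global rankings, so the sketch below uses a purely hypothetical linear blend to show the shape of the selection step. The blend itself, the role of the command-line value as a weight named alpha, the function names, and the way copy number features are scored are all assumptions, not the script's actual behavior.

```python
def final_scores(local, global_counts, alpha):
    """Hypothetical blend: alpha * |local score| + (1 - alpha) * normalized global count."""
    max_count = max(global_counts.values(), default=0) or 1
    return {
        feature: alpha * abs(score) + (1 - alpha) * global_counts.get(feature, 0) / max_count
        for feature, score in local.items()
    }

def pick_features(exp_scores, cn_scores, n_exp=9, n_cn=1):
    """Keep the n_exp best expression features plus the n_cn best copy number features."""
    top_exp = sorted(exp_scores, key=exp_scores.get, reverse=True)[:n_exp]
    top_cn = sorted(cn_scores, key=cn_scores.get, reverse=True)[:n_cn]
    return top_exp + top_cn

# Made-up toy data for a single PR gene.
local = {"e1": 0.5, "e2": 0.4, "e3": 0.1}
global_counts = {"e1": 1, "e2": 10, "e3": 0}
scores = final_scores(local, global_counts, alpha=0.7)

cn_scores = {"cn1": 0.2, "cn2": 0.6}  # hypothetical copy number scores
selected = pick_features(scores, cn_scores, n_exp=2, n_cn=1)
```

In this toy case the frequently top-ranked feature e2 overtakes e1 despite a lower local score, which is the kind of effect a global term is meant to have.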

extract_value_svm.pl

Summary

Generate formatted SVM input files for the training dataset.

Input

1. CCLE_expression_training_phase3.gct

Unprocessed gene expression data. Download

2. CCLE_copynumber_training_phase3.gct

Unprocessed copy number data. Download

3. GE_train_top_10

Generated by get_top_prior_2300.pl

4. prioritized_gene_list_phase3.txt

A list of genes whose essentiality needs to be predicted. Download

5. Achilles_v2.11_training_phase3.gct.scaled.pos

Achilles scores scaled by min and max. See main text for more information.
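Scaling by min and max can be sketched as below. The README does not state the target range or whether scaling is applied per gene, so mapping each gene's scores to [0, 1] is an assumption, as is the function name.

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0.0 and the maximum becomes 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile has no spread to rescale; 0.0 is a convention chosen here.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up Achilles-like scores for one gene across three cell lines.
scaled = min_max_scale([-2.0, 0.0, 2.0])
```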

Output

1. {GENE}.train.input

The SVM input file for the training dataset.
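The README does not specify the layout of these input files. As an illustration only, the sketch below assumes the common sparse text layout "label index:value ..." used by tools such as SVM-light and LIBSVM, with one line per cell line; the function name and the values are hypothetical.

```python
def format_svm_line(label, features):
    """Format one sample as 'label 1:v1 2:v2 ...' with 1-based feature indices."""
    parts = [f"{label:g}"]
    parts += [f"{index}:{value:g}" for index, value in enumerate(features, start=1)]
    return " ".join(parts)

# Made-up example: a scaled essentiality score and three feature values.
line = format_svm_line(0.42, [1.5, 0.0, -3.25])
```

Under this assumption, a test file would use the same layout, with a placeholder label where the true score is unknown.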

extract_value_svm_test.pl

Summary

Similar to extract_value_svm.pl, but generates SVM input files for the testing data.

Input

Same as extract_value_svm.pl.

Output

1. {GENE}.test.input

The SVM input file for the testing dataset.

test_svm_c.pl

Summary

Use an SVM to perform linear regression and predict on the testing dataset. One command-line parameter is required; we used 0.005 in this project. Example: perl test_svm_c.pl 0.005

Input

1. {GENE}.train.input

SVM input files for training data. Generated by extract_value_svm.pl.

2. {GENE}.test.input

SVM input files for testing data. Generated by extract_value_svm_test.pl.

Output

1. {GENE}.model.input

The model for a specific gene.

2. {GENE}.out.input

The predicted essentiality scores of a specific gene in the testing cell lines.

Cross_Validation

CV_alpha

This folder contains scripts for 5-fold cross-validation testing alternative alpha values.

CV_different_regression_methods

This folder contains scripts for 5-fold cross-validation testing different regression algorithms.

CV_3-30_features

This folder contains scripts for 5-fold cross-validation testing alternative numbers (3, 4, 5, ..., 30) of features used for prediction.

CV_no_copynumber

This folder contains scripts for 5-fold cross-validation testing the performance of using only the top 10 expression features as the 10 predictive features.

CV_rank_only_cn

This folder contains scripts for 5-fold cross-validation testing the performance of using only the top 10 copy number features as the 10 predictive features.

CV_rank_exp_cn

This folder contains scripts for 5-fold cross-validation testing the performance of pooling the copy number and expression profiles and using the top 10 of the mixed features as the 10 predictive features.
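The 5-fold splitting shared by these folders can be sketched in Python as follows. How the repository's scripts actually assign cell lines to folds is not described, so the shuffling, the seed, and the function name are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Made-up example: 23 cell lines split into 5 folds.
splits = list(k_fold_indices(23, k=5))
```

Each cell line appears in exactly one test fold, so every sample is predicted once by a model that did not see it during training.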


GuanLab/DREAM-Gene-Essentiality-Challenge
