feat: add hart2015 essentiality suite #563

Merged: 14 commits merged into develop on Jun 19, 2023


haowang-bioinfo
Member

@haowang-bioinfo haowang-bioinfo commented May 11, 2023

Main improvements in this PR:

This PR provides the infrastructure for an essentiality test suite, as proposed in #390, using the Hart2015 datasets, with the following components:

  • Hart2015_RNAseq.txt: RNA-seq expression data for the Hart2015 cell lines (GEO: GSE75189)
  • Hart2015_TableS2.xlsx: Calculated Bayes factors for CRISPR-targeted genes
  • getTaskEssentialGenes: Identify genes essential for different tasks in different tINIT models
  • estimateEssentialGenes: Generate tINIT models and estimate essential genes
  • evaluateHart2015Essentiality: Evaluate and compare Hart2015 experimental fitness genes with predicted results
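As a rough illustration of what the comparison step involves, the sketch below reduces predicted and experimental essential genes to sets over a shared gene universe and counts TP/TN/FP/FN. It is a hypothetical Python sketch, not the actual implementation of `evaluateHart2015Essentiality`: `confusion_counts` and the gene IDs are made up, and the step that turns the Bayes factors in Hart2015_TableS2.xlsx into essential/non-essential calls is not shown.

```python
def confusion_counts(predicted, experimental, universe):
    """Count TP/TN/FP/FN for essential-gene calls (hypothetical helper).

    predicted    -- genes the model predicts to be essential
    experimental -- genes called essential in the CRISPR screen
    universe     -- genes evaluated, i.e. present in both model and screen
    """
    universe = set(universe)
    pred = set(predicted) & universe   # ignore genes outside the universe
    exp = set(experimental) & universe
    tp = len(pred & exp)
    fp = len(pred - exp)
    fn = len(exp - pred)
    tn = len(universe) - tp - fp - fn
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Toy example with made-up gene IDs; G7 is not in the universe and is ignored
counts = confusion_counts(
    predicted=["G1", "G2", "G7"],
    experimental=["G1", "G3"],
    universe=["G1", "G2", "G3", "G4", "G5", "G6"],
)
print(counts)  # {'TP': 1, 'TN': 3, 'FP': 1, 'FN': 1}
```

Restricting both sets to the shared universe keeps genes that exist only in the model or only in the screen from being counted as FP or FN.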

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Selected develop as a target branch
  • Any removed reactions and metabolites have been moved to the corresponding deprecated identifier lists

@mihai-sysbio mihai-sysbio changed the title from "feat: add hart2015essentiality suite" to "feat: add hart2015 essentiality suite" May 11, 2023
@mihai-sysbio
Member

This is awesome! A quick thought: can the xlsx file be avoided?

@haowang-bioinfo
Member Author

> A quick thought: can the xlsx file be avoided?

No, not at the moment; I would like to retain the original Hart2015 data file intact.

@mihai-sysbio
Member

> retain the original Hart2015 data file intact

I understand - it makes a lot of sense. And since this file will not be modified, it's not going to continuously enlarge the repository.

@haowang-bioinfo
Member Author

haowang-bioinfo commented May 11, 2023

> I understand - it makes a lot of sense. And since this file will not be modified, it's not going to continuously enlarge the repository.

Thanks. Yes, there's no need to change the content; any modification, such as converting it into a plain-text version (which does make sense), would confuse people, though.

@mihai-sysbio
Member

mihai-sysbio commented May 12, 2023

> infrastructure of essentiality test suite

Is the aim to run the essentiality suite manually in conjunction with every release?

@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 13, 2023

> Is the aim to run the essentiality suite manually in conjunction with every release?

Yes, this test, among others, can be applied to evaluate each release.

@haowang-bioinfo haowang-bioinfo marked this pull request as ready for review June 13, 2023 17:54
@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 13, 2023

Updated essentiality evaluation using combined (all) Hart2015 datasets:

| version | TP | TN | FP | FN | accuracy | sensitivity | specificity | F1 | MCC |
|---|---|---|---|---|---|---|---|---|---|
| v1.12 | 40 | 2333 | 175 | 77 | 0.904000 | 0.341880 | 0.930223 | 0.240964 | 0.204769 |
| v1.13 | 40 | 2334 | 174 | 77 | 0.904381 | 0.341880 | 0.930622 | 0.241692 | 0.205505 |
| v1.14 | 40 | 2334 | 175 | 77 | 0.904037 | 0.341880 | 0.930251 | 0.240964 | 0.204788 |
| v1.15 | 40 | 2342 | 168 | 77 | 0.906738 | 0.341880 | 0.933068 | 0.246154 | 0.210054 |
| develop | 41 | 2317 | 194 | 76 | 0.897260 | 0.350427 | 0.922740 | 0.232955 | 0.197442 |

The results show that the curations accumulated since v1.15 have a positive effect on TP and FN (TP 40 → 41, FN 77 → 76), but adversely affect TN and FP (TN 2342 → 2317, FP 168 → 194). Which PRs caused this?
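The metric columns follow the standard definitions for a binary classifier, and they can be checked against the table. Below is a minimal Python sketch that recomputes the v1.15 and develop rows from their TP/TN/FP/FN counts. The function name `essentiality_metrics` is hypothetical and is not part of the Human-GEM code.

```python
import math


def essentiality_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "F1": f1, "MCC": mcc}


# TP, TN, FP, FN for two rows of the table above
for name, row in {"v1.15": (40, 2342, 168, 77),
                  "develop": (41, 2317, 194, 76)}.items():
    m = essentiality_metrics(*row)
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Only 117 genes (TP + FN) are experimentally essential among roughly 2,600 evaluated genes, so accuracy is dominated by the large TN count. F1 ignores TN, and MCC balances all four counts, so these two columns are the more informative ones for comparing versions.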

@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 14, 2023

now it might be better to spin out a new release, to present our recent work for community inspection.

it would also be good at some point to look into which PRs led to increased FP, probably by decreasing TN
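One way to find which changes produced the extra false positives is to compare the predicted essential genes of two versions gene by gene. The genes that moved from TN to FP can then be traced to the reactions, gene rules and PRs that touched them. The sketch below is hypothetical: `classification_changes` and the gene IDs are made up, and it assumes both versions are evaluated over the same gene universe. In the table above, TN + FP differs by one gene between v1.15 and develop, so in practice the two universes would need to be intersected first.

```python
def classification_changes(pred_old, pred_new, experimental, universe):
    """Group genes by how their call changed between two model versions.

    All inputs are iterables of gene IDs; `universe` should contain the genes
    evaluated in both versions so that the two confusion matrices are comparable.
    """
    universe = set(universe)
    essential = set(experimental) & universe
    non_essential = universe - essential
    old = set(pred_old) & universe
    new = set(pred_new) & universe
    gained = new - old   # newly predicted essential
    lost = old - new     # no longer predicted essential
    return {
        "TN->FP": sorted(gained & non_essential),
        "FN->TP": sorted(gained & essential),
        "FP->TN": sorted(lost & non_essential),
        "TP->FN": sorted(lost & essential),
    }


changes = classification_changes(
    pred_old=["G1", "G3"],
    pred_new=["G1", "G2", "G4"],
    experimental=["G1", "G2"],
    universe=["G1", "G2", "G3", "G4", "G5", "G6"],
)
print(changes)
# {'TN->FP': ['G4'], 'FN->TP': ['G2'], 'FP->TN': ['G3'], 'TP->FN': []}
```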

@mihai-sysbio
Member

> to present our recent work for community inspection.

To me, this is already done in the develop branch.

> it would also be good at some point to look into which PRs led to increased FP, probably by decreasing TN

Why "at some point" and not now when the changes are more fresh?

@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 15, 2023

> to present our recent work for community inspection.

> To me, this is already done in the develop branch.

The inspection is best achieved by using the model, and the community accesses the model mainly through releases rather than through the develop branch, which so far involves only a small group of people. Personally, I have limited knowledge to verify every single change. The best practice is probably to release early and release often, in a radically transparent way.

> it would also be good at some point to look into which PRs led to increased FP, probably by decreasing TN

> Why "at some point" and not now when the changes are more fresh?

sure go ahead please

@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 15, 2023

Code example for running essentiality analysis with Hart2015 datasets:

```matlab
% load model and essential tasks
load('Human-GEM.mat');  % loads the model struct ihuman; or: ihuman = importYaml('Human-GEM.yml');
taskStruct = parseTaskList('metabolicTasks_Essential.txt');  % parseTaskList is from the RAVEN Toolbox

% generate tINIT models and estimate essential genes
eGenes = estimateEssentialGenes(ihuman, 'Hart2015_RNAseq.txt', taskStruct);

% compare model predictions with experimental data
results = evaluateHart2015Essentiality(eGenes);
```

It may take 2-4 hours or more to complete this analysis on a laptop or desktop, depending on the computer's configuration.

Collaborator

@JonathanRob JonathanRob left a comment


This is really nice, good work @haowang-bioinfo. I agree it's a bit unfortunate with the large-ish .xlsx file, but as @mihai-sysbio says, it won't need to be changed.

@mihai-sysbio
Member

> Code example for running essentiality analysis with Hart2015 datasets:
> It may take 2-4 hours or more to complete this analysis on a laptop or desktop, depending on the computer's configuration.

Shouldn't we add this then to an action that would run for every pull request from develop to main?

@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 18, 2023

> Shouldn't we add this then to an action that would run for every pull request from develop to main?

Not sure if this is suitable for a GH Action, because the computational load is very heavy. We will probably know better after solving #635.

@feiranl
Collaborator

feiranl commented Jun 19, 2023

The previous steps work fine, but the last one encounters an error:

```
>> results = evaluateHart2015Essentiality(eGenes);
Unrecognized function or variable 'adjust_pvalues'.

Error in evaluateHart2015Essentiality (line 129)
PenrAdj = adjust_pvalues(Penr,'Benjamini');
```

feiranl and others added 3 commits June 19, 2023 10:54
@haowang-bioinfo
Member Author

haowang-bioinfo commented Jun 19, 2023

> the last one encounters an error

Ah, yes: adjust_pvalues has now been added to this repo for convenient use.
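The `'Benjamini'` option presumably refers to the Benjamini-Hochberg false discovery rate procedure, and the PenrAdj values reported for this PR are consistent with it. For reference, here is a minimal Python sketch of BH adjustment. It is not the code of `adjust_pvalues`, and `benjamini_hochberg` is a hypothetical name. The sketch is applied to the rounded Penr values reported for DLD1, GBM, HCT116, HELA, RPE1 and all.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rounded Penr values in the order DLD1, GBM, HCT116, HELA, RPE1, all
penr = [1.04e-30, 1.78e-21, 3.02e-40, 2.74e-27, 6.69e-13, 2.19e-15]
print([f"{p:.2e}" for p in benjamini_hochberg(penr)])
```

The inputs are rounded to three significant digits, so a couple of results differ in the last digit from the reported PenrAdj values (for example, 3.12e-30 instead of 3.11E-30 for DLD1).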

@feiranl
Collaborator

feiranl commented Jun 19, 2023

Ran it successfully and got this result:

| cellLine | TP | TN | FP | FN | accuracy | sensitivity | specificity | F1 | MCC | Penr | logPenr | PenrAdj | logPenrAdj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DLD1 | 95 | 2109 | 143 | 223 | 0.85758755 | 0.29874214 | 0.93650089 | 0.34172662 | 0.26721557 | 1.04E-30 | 29.9838919 | 3.11E-30 | 29.5067706 |
| GBM | 85 | 2081 | 153 | 250 | 0.84312962 | 0.25373134 | 0.93151298 | 0.29668412 | 0.21515393 | 1.78E-21 | 20.7503043 | 2.67E-21 | 20.5742131 |
| HCT116 | 113 | 2137 | 130 | 246 | 0.85681645 | 0.31476323 | 0.94265549 | 0.37541528 | 0.3051743 | 3.02E-40 | 39.519913 | 1.81E-39 | 38.7417618 |
| HELA | 86 | 2182 | 157 | 202 | 0.86334222 | 0.29861111 | 0.9328773 | 0.32391714 | 0.24962285 | 2.74E-27 | 26.5616613 | 5.49E-27 | 26.2606313 |
| RPE1 | 63 | 2115 | 175 | 216 | 0.8478007 | 0.22580645 | 0.92358079 | 0.24371373 | 0.16031484 | 6.69E-13 | 12.1747792 | 6.69E-13 | 12.1747792 |
| all | 41 | 2308 | 202 | 76 | 0.89417587 | 0.35042735 | 0.91952191 | 0.22777778 | 0.19220101 | 2.19E-15 | 14.6589606 | 2.63E-15 | 14.5797793 |
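The thread does not state how Penr is computed. One plausible reading, which is only an assumption here, is a one-sided hypergeometric test of whether the overlap TP between predicted and experimental essential genes exceeds what random predictions would give. logPenr matches -log10(Penr) in every row (for example, -log10(1.04E-30) ≈ 29.98). The sketch below uses a hypothetical function name, and it may not reproduce the reported Penr exactly if `evaluateHart2015Essentiality` uses a different test.

```python
import math


def enrichment_pvalue(tp, tn, fp, fn):
    """One-sided hypergeometric p-value P(X >= tp) (assumed test, hypothetical helper).

    N evaluated genes, K of them essential in the screen (TP + FN), and n
    predicted essential by the model (TP + FP); X is the overlap between a
    random set of n genes and the K essential ones.
    """
    N = tp + tn + fp + fn
    K = tp + fn
    n = tp + fp
    # Exact integer arithmetic; math.comb returns 0 when n - i > N - K
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(tp, min(K, n) + 1))
    return tail / math.comb(N, n)


# DLD1 row: TP=95, TN=2109, FP=143, FN=223
p = enrichment_pvalue(95, 2109, 143, 223)
print(f"Penr ~ {p:.2e}, logPenr = {-math.log10(p):.2f}")
```

Summing exact binomial coefficients with Python's arbitrary-precision integers avoids the underflow that a naive floating-point product would hit at p-values around 1e-30.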

Collaborator

@feiranl feiranl left a comment


LGTM!

@haowang-bioinfo haowang-bioinfo merged commit fdd7b5c into develop Jun 19, 2023
@haowang-bioinfo haowang-bioinfo mentioned this pull request Jun 19, 2023
@mihai-sysbio mihai-sysbio deleted the feat/addHart2015essentialitySuite branch July 6, 2023 14:14
@mihai-sysbio mihai-sysbio mentioned this pull request Jul 7, 2023
4 participants