Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run rMATS for 3 samples/conditions at once #96

Closed
mariakondili opened this issue Mar 5, 2021 · 5 comments
Closed

Run rMATS for 3 samples/conditions at once #96

mariakondili opened this issue Mar 5, 2021 · 5 comments

Comments

@mariakondili
Copy link

mariakondili commented Mar 5, 2021

Hello! This is not an issue but more of a question on functionalities or rMATS.
I d like to know if it's possible to run for comparing the splicing events of WT vs Treatment1 vs Treatment2, all at once,
and obtain a table with Incl_Level_WT, Incl_Level_Treatment1, Incl_Level_Treatment_2, and the 2 respective Incl_Differences.
Something like having a --b3 option, for .bam files.

I saw that when I run separately WT-vs-Treat1, WT-vs-Treat2 , the Incl_Level_WT, of these two tables is not same for some events.
Moreover, I have to do data mining to find the genes and events that are common on 1st and 2nd comparison, but we are more interested in having the values of splicing for each exon along all 3 conditions.
Is there a way to run rMATS with existing options in a way that I obtain the table I want , by using only --b1, --b2 ?
e.g --b1 = WT_1,WT_2, WT_3 , --b2 Treat1a, Treat1b, Treat1c ==>and then,
the results of these over --b2 Treat2a, Treat2b, Treat2c ?
Thanks in advance for any suggestion

@EricKutschera
Copy link
Contributor

You can first process all of the input bams with one or more --task prep steps. Then combine all of the prep output into a single --task post step. Finally do two separate runs of --task stat, one for WT vs Treatment1 and another for WT vs Treatment 2. You'll get a set of output files for each run of --task stat, but they will be based on the same set of event definitions from the --task post step so you can match up the rows easily

Here is the section of the README that is related to this: https://github.com/Xinglab/rmats-turbo/tree/v4.1.1#running-the-statistical-model-separately

@mariakondili
Copy link
Author

Thanks for the help ! I got it ! For the record, briefly this is what I ran:
#PREP_step_1 : python rmats.py --b1 ${group1} --b2 ${group2} --tmp ${outDir_prep}/tmp_1/ --task prep
#PREP_step_2: python rmats.py --b1 ${group1} --b2 ${group3} --tmp ${outDir_prep}/tmp_2/ --task prep
#POST_step : python rmats.py --b1 ${group123} --tmp ${outDir_prep}/combo_tmp/ --task post
#STATS_step_1 :
python prepare_stat_inputs.py --new-output-dir ${outDir_post}/out_gr1_gr2 --old-output-dir ${outDir_post} --group-1-indices 0,1,2 --group-2-indices 3,4,5
python rmats.py --od ${outDir_post}/out_gr1_gr2 --tmp ${outDir_post}/out_gr1_gr2/tmp --task stat

#STATS_step_2 :
python prepare_stat_inputs.py --new-output-dir ${outDir_post}/out_gr1_gr3 --old-output-dir ${outDir_post} --group-1-indices 0,1,2 --group-2-indices 6,7,8
python rmats.py --od ${outDir_post}/out_gr1_gr3 --tmp ${outDir_post}/out_gr1_gr3/tmp --task stat

The ID column for the post is same in Stat-step so, we can really nicely follow up the events and their values.

@macsalvin
Copy link

Hi @EricKutschera,
I would like to compare three groups (as you mention here ), in my case KT vs KV and KT vs QT.
As a final result I would like to have the total number of SE events (I mention just SE as an example) for KT and then the differential SE events for both KT vs KV and KT vs QT. As you can see in Fig.1 and Fig.2 the rows for KT are different for the two comparisons and so the sum for getting the total events (Why there are different rows?). How can I get the total SE events for KT? Should I get it before running the statistical part?

Fig.1 (row 7 is missing)
Screen Shot 2021-05-24 at 4 35 05 PM

Fig.2 (rows 0,4,5,6 are missing)
Screen Shot 2021-05-24 at 4 34 56 PM

Here all my steps:

Running prep and post separately

Running prep
./run_rmats --b1 ALLSAMPLE.txt --gtf Mus_musculus.GRCm38.92.chr.gtf --readLength 150 --nthread 28 --od output --tmp tmp_output_prep --task prep

Running post
./run_rmats --b1 ALLSAMPLE.txt --gtf Mus_musculus.GRCm38.92.chr.gtf --readLength 150 --nthread 28 --od outputPOST --tmp tmp_output_prep --statoff --task post

Extract information for a specific comparison

KTKV
python rMATS_P/prepare_stat_inputs.py --new-output-dir outputKTKV --old-output-dir outputPOST --group-1-indices 0,1,2,3,4 --group-2-indices 5,6,7
KTQT
python rMATS_P/prepare_stat_inputs.py --new-output-dir outputKTQT --old-output-dir outputPOST --group-1-indices 0,1,2,3,4 --group-2-indices 8,9,10,11,12,13

Running the statistical part

KTKV
./run_rmats --od KTKV --tmp tmpKTKV --task stat

QTQV
./run_rmats --od QTQV --tmp tmpQTQV --task stat

Thank you so much for your help

@EricKutschera
Copy link
Contributor

When the statistical model is run (--task stat), the count files are first filtered to remove rows where there is not enough information (zero counts). Here is the code: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rmats.py#L322

It filters out a row unless:

  • group 1 has at least one count (either inclusion or skipping)
  • group 2 has at least one count (either inclusion or skipping)
  • at least one of the two groups has at least one inclusion count
  • at least one of the two groups has at least one skipping count

The count files were not filtered in the post step since you ran that step with --statoff. You can look at the count files in outputPOST to see the rows that were filtered out in --task stat

@macsalvin
Copy link

Thank you very much @EricKutschera

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants