The readlength of case and control files are different #83

Open
yangning116 opened this issue Jan 25, 2021 · 1 comment

yangning116 commented Jan 25, 2021

What if the read lengths of the case and control files are different? How should --readLength be set in that case?
For example, the read length of the case is 150, while the read length of the control is 100.

@EricKutschera
Contributor

I'm not sure what you mean by "how to confirm the readlength?"

In the prep step, --readLength is used to filter out reads from the BAM files that do not have the expected read length. In the post step, --readLength is used to calculate the effective length of the isoforms, which is in turn used to calculate the IncLevel columns in the final output.
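To illustrate why the effective lengths matter, here is a minimal sketch of a length-normalized inclusion level. It assumes the usual rMATS-style normalization, where each count is divided by its isoform's effective length (IncFormLen or SkipFormLen) before the ratio is taken. The function name and the numbers are hypothetical and are not taken from the rmats source.

```python
def inclusion_level(inc_count, skip_count, inc_form_len, skip_form_len):
    """Length-normalized inclusion level.

    Returns None when no reads support either isoform.
    """
    # Normalize each count by the effective length of its isoform.
    inc_norm = inc_count / inc_form_len
    skip_norm = skip_count / skip_form_len
    total = inc_norm + skip_norm
    if total == 0:
        return None
    return inc_norm / total


# Hypothetical counts and effective lengths, chosen only for illustration.
level = inclusion_level(inc_count=60, skip_count=20,
                        inc_form_len=200, skip_form_len=100)
print(level)
```

Because IncFormLen and SkipFormLen are derived from --readLength, changing --readLength changes the normalization even when the read counts stay the same.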

There is a parameter, --variable-read-length, which is described in the help text:

Allow reads with lengths that differ from --readLength to be processed. --readLength will still be used to determine IncFormLen and SkipFormLen

--variable-read-length disables the read length filter in the prep step. If you use that parameter, you can run rmats on inputs with different read lengths and all reads will be counted.

Another option is to run the prep step separately for each input, using the appropriate --readLength for each one. That way the read length filter is applied correctly to each input in the prep step: https://github.com/Xinglab/rmats-turbo#running-prep-and-post-separately
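As a rough sketch of the separate prep runs, the snippet below only builds and prints the two command lines; it does not run rmats. The file names (case_bams.txt, control_bams.txt, annotation.gtf) and the directories are placeholders, and the helper prep_command is hypothetical. Other options, such as -t, are left at their defaults here.

```python
def prep_command(bam_list_file, read_length, tmp_dir, gtf, out_dir):
    """Argument list for one rmats.py prep run over a single group of BAMs."""
    return [
        "python", "rmats.py",
        "--b1", bam_list_file,             # text file listing the BAMs of this group
        "--gtf", gtf,
        "--readLength", str(read_length),  # read length of this group only
        "--od", out_dir,
        "--tmp", tmp_dir,                  # separate tmp directory per prep run
        "--task", "prep",
    ]


# Placeholder inputs: case reads are 150 long, control reads are 100 long.
case_prep = prep_command("case_bams.txt", 150, "tmp_case", "annotation.gtf", "out")
control_prep = prep_command("control_bams.txt", 100, "tmp_control", "annotation.gtf", "out")

for cmd in (case_prep, control_prep):
    print(" ".join(cmd))
```

The prep outputs from the two tmp directories then need to be collected before the post step; the README section linked above describes that part.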

When using --variable-read-length, or when combining different read lengths in the post step, you still need to pick a value for --readLength to be used to calculate the inclusion levels. You could use the average length. Here are some links to discussions about how that choice affects the results:
https://groups.google.com/g/rmats-user-group/c/eKgaDfiyrAY/m/Kiry0d8gBQAJ
https://groups.google.com/g/rmats-user-group/c/ZCxjlQfP9ak/m/PaO_skpQAgAJ
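
For the 150 and 100 example from the question, a small sketch of the average-length approach could look like the following. It only prints a command line for a single run with --variable-read-length; the file names and directories are placeholders, and average_read_length is a hypothetical helper, not part of rmats.

```python
def average_read_length(lengths):
    """Mean of the given read lengths, rounded to an integer for --readLength."""
    return round(sum(lengths) / len(lengths))


read_length = average_read_length([150, 100])

# Single run over both groups; the read length filter in prep is skipped.
command = [
    "python", "rmats.py",
    "--b1", "case_bams.txt",
    "--b2", "control_bams.txt",
    "--gtf", "annotation.gtf",
    "--readLength", str(read_length),
    "--variable-read-length",
    "--od", "out",
    "--tmp", "tmp",
]
print(" ".join(command))
```

This is a plain mean of the two lengths. Whether a different choice, for example one weighted by the number of reads per group, suits your data better is a judgment call that the linked discussions cover.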
