pychopper? #43

Closed
fburdet opened this issue Oct 14, 2022 · 9 comments
Labels
question Further information is requested


@fburdet

fburdet commented Oct 14, 2022

Hello,

I successfully ran IsoQuant using the ONT reads directly from the sequencing facility.

For many other similar programs, it is recommended to run pychopper first. It seems to remove the primers and correct the direction of the reads.

Is it needed for IsoQuant? I seem to get about 40% of reads uniquely mapped, which sounds a bit low (compared to short reads, where I have more experience). Is it?

Thanks in advance!

@andrewprzh
Collaborator

Dear @fburdet

Personally, I have never used pychopper on my datasets. IsoQuant does not strongly depend on read directions or adapters; it only checks for polyA tails. I also presume that alignment is not significantly affected by adapters. However, if you happen to run IsoQuant on pychopper-processed data, it would be interesting to compare the results.
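As a rough illustration of why read orientation matters less when a tool keys on polyA tails: a read sequenced in the sense orientation ends with a run of A's, while a reverse-complemented read starts with a run of T's. The Python sketch below is a naive, hypothetical check (the name `polya_orientation`, the minimum run length and the search window are illustrative assumptions); it is not IsoQuant's actual polyA detection.

```python
import re


def polya_orientation(seq, min_run=15, window=60):
    """Naive orientation guess from a polyA/polyT run (illustration only).

    Returns "+" if a run of at least `min_run` A's occurs within the last
    `window` bases, "-" if such a run of T's occurs within the first
    `window` bases (a reverse-complemented read), otherwise None.
    The defaults are hypothetical; this is NOT IsoQuant's algorithm.
    """
    seq = seq.upper()
    if re.search("A{%d,}" % min_run, seq[-window:]):
        return "+"
    if re.search("T{%d,}" % min_run, seq[:window]):
        return "-"
    return None


# Sense read: polyA tail followed by a short leftover adapter fragment.
print(polya_orientation("ACGTTGCA" * 10 + "A" * 20 + "GGCCTT"))  # +
# Reverse-complemented read: polyT run near the 5' end.
print(polya_orientation("AAGGCC" + "T" * 20 + "TGCAACGT" * 10))  # -
# No tail in either orientation.
print(polya_orientation("ACGT" * 30))  # None
```

Because the check looks a few dozen bases in from each end, a short adapter remnant after the tail does not hide it, which is one way a polyA-based step can tolerate untrimmed reads.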

40% does seem a bit low. I cannot recall exact numbers, but ONT reads do tend to have a lot of secondary alignments, some of which appear to be correct. Do the IsoQuant results seem reasonable, though? You can also send me the log if you like.

Best
Andrey

@fburdet
Author

fburdet commented Oct 17, 2022 via email

@fburdet
Author

fburdet commented Oct 17, 2022 via email

@andrewprzh
Collaborator

andrewprzh commented Oct 17, 2022

Dear @fburdet

> Unfortunately, in the counts generated by isoquant, there seems to be only the ambiguous and no_feature numbers. Is it maybe possible to add more of the stats that are in the log?

Thank you for the suggestion, will do.

grep unique ***.fq.read_assignments.tsv | wc -l

This counts unique read-to-isoform assignments. A uniquely assigned read is not necessarily uniquely mapped, e.g. if its secondary alignment maps to an intergenic region. Vice versa, ambiguous assignments may come from uniquely mapped reads, e.g. when the read covers only part of the gene and it is not clear which isoform it comes from (quite typical for truncated ONT reads). Thus, 40% of uniquely assigned reads seems to be OK.
To count uniquely mapped reads, it's best to use the original BAM file and samtools.
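A minimal Python sketch of such a count, under an assumed definition: a read is treated as uniquely mapped if it has a mapped primary alignment and no secondary (FLAG 0x100) alignments, while supplementary (0x800) records are treated as pieces of the same split alignment and ignored. The function name `mapping_summary` is hypothetical; it accepts any iterable of SAM text lines, e.g. `sys.stdin` when piping `samtools view aln.bam` into a script (file name hypothetical). Many pipelines define uniqueness through a MAPQ cutoff instead, so the numbers may differ. For the number of mapped reads alone, `samtools view -c -F 0x904 aln.bam` counts primary mapped records.

```python
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapping_summary(sam_lines):
    """Per-read mapping summary from SAM text lines (hypothetical helper).

    'unique' = reads with a mapped primary alignment and no secondary
    alignment; 'multi' = reads with a primary plus >= 1 secondary.
    Supplementary records are skipped (split parts of one alignment).
    """
    primary, secondary, reads = set(), set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        reads.add(qname)
        if flag & UNMAPPED:
            continue
        if flag & SECONDARY:
            secondary.add(qname)
        elif not flag & SUPPLEMENTARY:
            primary.add(qname)
    return {
        "total": len(reads),
        "mapped": len(primary),
        "unique": len(primary - secondary),
        "multi": len(primary & secondary),
    }


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",    # primary only
    "r2\t16\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*",    # primary, reverse strand
    "r2\t272\tchr2\t900\t0\t50M\t*\t0\t0\t*\t*",   # secondary (256 + 16)
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped
]
print(mapping_summary(sam))
# {'total': 3, 'mapped': 2, 'unique': 1, 'multi': 1}
```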

I still don't see the log (maybe it was not attached through email), but anyway, I don't think it contains a lot of useful information, except maybe the read assignment statistics at the end.
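The `grep unique ... | wc -l` command above counts matching lines, not reads: it matches the substring anywhere on a line, and a read listed on several lines (possibly one per candidate isoform) is counted more than once. A hypothetical Python sketch that counts distinct reads per assignment type follows. It assumes a tab-separated file with `#`-prefixed header lines, the read ID in the first column and the assignment type in the sixth column; those positions are an assumption, so check the header of your own `read_assignments.tsv` and adjust `read_col`/`type_col`.

```python
def assignment_summary(lines, read_col=0, type_col=5):
    """Count distinct reads per assignment type (hypothetical helper).

    Assumes tab-separated rows, '#'-prefixed header/comment lines, the read
    ID in column `read_col` and the assignment type in column `type_col`.
    A read listed on several lines with the same type is counted once.
    """
    reads_by_type = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        reads_by_type.setdefault(fields[type_col], set()).add(fields[read_col])
    return {kind: len(ids) for kind, ids in reads_by_type.items()}


example = [
    "#read_id\tchr\tstrand\tisoform_id\tgene_id\tassignment_type",
    "r1\tchr1\t+\tT1\tG1\tunique",
    "r2\tchr1\t+\tT1\tG1\tambiguous",
    "r2\tchr1\t+\tT2\tG1\tambiguous",   # same read, second candidate isoform
    "r3\tchr2\t-\tT5\tG3\tinconsistent",
]
print(assignment_summary(example))
# {'unique': 1, 'ambiguous': 1, 'inconsistent': 1}

# On a real file (path hypothetical):
# with open("sample.read_assignments.tsv") as fh:
#     print(assignment_summary(fh))
```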

Don't hesitate to ask other questions if needed.

Best
Andrey

@fburdet
Author

fburdet commented Oct 18, 2022 via email

@andrewprzh
Collaborator

andrewprzh commented Oct 20, 2022

@fburdet

The log looks normal, proportions of unique/ambiguous/inconsistent reads seem reasonable for ONT datasets.
The number of inconsistent reads is probably slightly higher than usual. What organism is this, and what kind of annotation do you use?

By default, unique and ambiguous reads are used for quantification. You may also set a different quantification strategy (e.g. unique only) if needed.
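To illustrate how the choice of strategy changes the counts, here is a hypothetical Python sketch: with unique-only counting, each uniquely assigned read adds 1 to its isoform; when ambiguous reads are included, each ambiguous read is split equally among the isoforms it was assigned to. The equal split is an assumption made for illustration and may not match IsoQuant's exact weighting, and the input is a list of `(read_id, isoform_id, assignment_type)` tuples rather than IsoQuant's real file format.

```python
from collections import defaultdict


def quantify(assignments, include_ambiguous=True):
    """Per-isoform counts from (read_id, isoform_id, assignment_type) tuples.

    Hypothetical sketch: unique reads add 1.0 to their isoform; if
    include_ambiguous is True, an ambiguous read assigned to k isoforms
    adds 1/k to each (equal split is an assumption). Other types, such
    as inconsistent, are skipped.
    """
    counts = defaultdict(float)
    ambiguous = defaultdict(set)
    for read_id, isoform, kind in assignments:
        if kind == "unique":
            counts[isoform] += 1.0
        elif kind == "ambiguous" and include_ambiguous:
            ambiguous[read_id].add(isoform)
    for isoforms in ambiguous.values():
        share = 1.0 / len(isoforms)
        for isoform in isoforms:
            counts[isoform] += share
    return dict(counts)


rows = [
    ("r1", "T1", "unique"),
    ("r2", "T1", "ambiguous"),
    ("r2", "T2", "ambiguous"),
    ("r3", "T3", "inconsistent"),
]
print(quantify(rows))                           # {'T1': 1.5, 'T2': 0.5}
print(quantify(rows, include_ambiguous=False))  # {'T1': 1.0}
```

Unique-only counts are more conservative but drop reads that cannot be resolved to a single isoform, which, as noted above, is common for truncated ONT reads.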

Best
Andrey

@andrewprzh andrewprzh added the question Further information is requested label Feb 10, 2023
@andrewprzh
Collaborator

I'll close this issue for now, please re-open or open a new one if any other questions arise.

Best
Andrey

@zpliu1126

zpliu1126 commented Jan 4, 2024

Hi~ Andrey,

> This counts unique read-to-isoform assignments. A uniquely assigned read is not necessarily uniquely mapped, e.g. if its secondary alignment maps to an intergenic region. Vice versa, ambiguous assignments may come from uniquely mapped reads, e.g. when the read covers only part of the gene and it is not clear which isoform it comes from (quite typical for truncated ONT reads). Thus, 40% of uniquely assigned reads seems to be OK.
> To count uniquely mapped reads, it's best to use the original BAM file and samtools.

Is it possible to output statistics of the minimap alignment results in the log?

Best
zpliu

@andrewprzh
Collaborator

Hi @zpliu1126

I will add it to my TODO list; this could be informative for the user. In the meantime, you can use, for example, samtools flagstat.

P.S. You can create new issues even for minor questions, since comments in closed issues might be missed.

Best
Andrey


3 participants