Output file sometimes huge, sometimes small, regardless of number of samples??? #107

Open
Jolleboll opened this issue Jun 16, 2023 · 0 comments


Hello there!

I am running a home-made pipeline where I start with .idat files that I run through MOCHA: https://github.com/freeseek/mocha

This gives me .bcf files that I parse with my own Python script to create .pfb files and sample input files for PennCNV.

My question: the .log file is sometimes big and sometimes small, and the .tsv file ranges from very small to HUGE, seemingly regardless of how many samples I used. See the table below. These are all Illumina human exome arrays run at different times over the last 15 or so years.

.tsv size    .log size    number of samples
3.2M         15M          955
19M          93M          800
5.9G         19M          1050
32M          1.4M         185
5.1M         15M          680
14G          70M          1325
223M         397M         7637
946M         14M          500
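
To make the mismatch concrete, here is a minimal sketch that divides each .tsv size in the table above by its number of samples. It assumes the sizes are in binary units (M = 2**20 bytes, G = 2**30 bytes), which is a guess about how they were reported; the function names are only illustrative.

```python
# Assumption: "M" and "G" in the table are binary units (as printed by `ls -lh`).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def to_bytes(size: str) -> float:
    """Convert a size string such as '3.2M' or '14G' to a number of bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


# (.tsv size, number of samples), copied from the table above.
RUNS = [
    ("3.2M", 955),
    ("19M", 800),
    ("5.9G", 1050),
    ("32M", 185),
    ("5.1M", 680),
    ("14G", 1325),
    ("223M", 7637),
    ("946M", 500),
]


def tsv_bytes_per_sample(runs):
    """Return (size, samples, bytes of .tsv output per sample) for each run."""
    return [(size, n, to_bytes(size) / n) for size, n in runs]


if __name__ == "__main__":
    for size, n, per_sample in tsv_bytes_per_sample(RUNS):
        print(f"{size:>6}  {n:>5} samples  {per_sample:>14,.0f} bytes/sample")
```

Under that assumption, the output per sample differs by roughly three orders of magnitude between the smallest and the largest run, so the number of samples alone cannot explain the file sizes.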

As far as I understand, a big .tsv file implies many CNV calls, and you wrote in another issue that this implies low-quality data. My smallest .tsv file is 3.2M. Should I consider it "botched" or "devoid of false positives"?

When I manually compare the biggest and smallest .tsv files, I notice that the bigger file has an enormous number of cn=0 calls, and its average "numsnp" is much lower. Is this what you mean by "low quality data"? I know for a fact that some of the arrays were much sparser than others, but I am a recent hire and know few details of how these idats came to be.
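
The manual comparison above could be automated with something like the sketch below. It is not part of PennCNV. It assumes each call is one whitespace-separated line containing `numsnp=<int>` and `cn=<int>` tokens, and that the sample file name is the fifth field, as in PennCNV's usual raw CNV output. The function name `summarize_calls` and the `sample_field` parameter are hypothetical, and my .tsv may need a different field index.

```python
import re
from collections import Counter

# Tokens as they appear in the calls, e.g. "numsnp=12" and "state1,cn=0".
NUMSNP_RE = re.compile(r"\bnumsnp=(\d+)")
CN_RE = re.compile(r"\bcn=(\d+)")


def summarize_calls(lines, sample_field=4):
    """Summarize CNV call lines.

    Assumption: the sample name sits at index `sample_field` (0-based)
    of each whitespace-separated line. Lines lacking a numsnp= or cn=
    token (headers, notices) are skipped.
    """
    cn_counts = Counter()
    calls_per_sample = Counter()
    numsnp_total = 0
    n_calls = 0
    for line in lines:
        numsnp = NUMSNP_RE.search(line)
        cn = CN_RE.search(line)
        if not (numsnp and cn):
            continue
        fields = line.split()
        sample = fields[sample_field] if len(fields) > sample_field else "unknown"
        n_calls += 1
        numsnp_total += int(numsnp.group(1))
        cn_counts[int(cn.group(1))] += 1
        calls_per_sample[sample] += 1
    return {
        "n_calls": n_calls,
        "cn_counts": dict(cn_counts),
        "frac_cn0": cn_counts[0] / n_calls if n_calls else 0.0,
        "mean_numsnp": numsnp_total / n_calls if n_calls else 0.0,
        "calls_per_sample": dict(calls_per_sample),
    }
```

Because the function only iterates over lines, a multi-gigabyte file can be streamed without loading it into memory, for example `with open(path) as fh: summary = summarize_calls(fh)`, where `path` points at one of the .tsv files. Comparing `frac_cn0`, `mean_numsnp`, and the distribution of `calls_per_sample` across batches would show whether a few samples or a whole array batch account for the huge files.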

Thank you so much in advance, I have no one else to ask, everyone trusts me to get this right :))
