QTLseqR on GBS data #52

Closed
MatteoMartina opened this issue Jun 30, 2022 · 1 comment

MatteoMartina commented Jun 30, 2022

Hi Ben,
in a previous experiment we ran your pipeline on WGS data from the bulks, with really nice results.
That isn't economically feasible in species with very large genomes (7-9 Gb).

What I would like to do now is use QTLseqr with GBS data, using a Stacks catalog as the reference genome.
Do you think that is doable with your pipeline?

I can try to analyze the data again, but the last time I tried I ended up with several errors.

Thanks!
Matteo

Owner

bmansfeld commented Jul 4, 2022

Hey Matteo,
Thanks for using QTLseqr in your work! Happy to hear you were able to get some results.
Yes, in theory you can use a reduced-representation sequencing approach with QTLseqr, but you might need to be a little creative.

So first of all, my question is: do you have GBS data for each bulk as a bulk (i.e. one pooled DNA sample per bulk, which was then sequenced with GBS), or do you have GBS data for every single individual in the bulk (i.e. each of, say, n=15 individuals was GBSed separately for each of the bulks)?

If you have one sample per bulk, you should be able to just call SNPs as if you were working with regular WGS. I've personally never done this, and I'm not sure about the biases and read-depth issues that come with GBS and this approach, but it should run like the regular pipeline.
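
To make the pooled case concrete, here is a minimal Python sketch (not QTLseqr code) of the per-SNP quantities a QTL-seq style comparison works with: the SNP-index of each bulk (alternate-allele reads divided by total reads) and their difference. The `PooledSNP` record, the `min_sample_depth` threshold and its default of 10 are assumptions made up for illustration; a depth filter is included only because read depth is the concern flagged above for GBS.

```python
from dataclasses import dataclass


@dataclass
class PooledSNP:
    """Allele depths at one SNP for two pooled bulks (hypothetical record)."""
    chrom: str
    pos: int
    ad_ref_high: int
    ad_alt_high: int
    ad_ref_low: int
    ad_alt_low: int


def snp_index(ad_ref, ad_alt):
    """Fraction of reads carrying the alternate allele, or None if no reads."""
    depth = ad_ref + ad_alt
    return ad_alt / depth if depth > 0 else None


def delta_snp_index(snps, min_sample_depth=10):
    """Return (chrom, pos, delta) for SNPs with enough reads in both bulks.

    delta = SNP-index(high bulk) - SNP-index(low bulk).
    min_sample_depth is an illustrative threshold, not a recommended value.
    """
    results = []
    for s in snps:
        depth_high = s.ad_ref_high + s.ad_alt_high
        depth_low = s.ad_ref_low + s.ad_alt_low
        if min(depth_high, depth_low) < min_sample_depth:
            continue  # skip sites too shallow to give a stable estimate
        high = snp_index(s.ad_ref_high, s.ad_alt_high)
        low = snp_index(s.ad_ref_low, s.ad_alt_low)
        results.append((s.chrom, s.pos, high - low))
    return results
```

With sparse GBS coverage, a filter like this may discard many sites, so the threshold would need tuning against the actual depth distribution.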

If you have the second case, where you are essentially doing an in-silico pooling of individuals that were all GBSed separately, then this can be done, but there is not really a precedent for how to do it statistically. I'm thinking about the best way to develop this, perhaps in the future.

That said, I recently did this by manually averaging the SNP-indices of all the individuals in each bulk and then setting up the data frames to work with the QTLseqr scripts to compare the bulks. This worked for my case (a single dominant gene), and we were able to map the gene. To wrangle the data I did have to use the QTLseqr source code a bit differently, rather than directly calling the commands developed for the package. Maybe this can help you:

preprint here: https://www.biorxiv.org/content/10.1101/2022.04.13.487913v1.abstract
scripts here: https://github.com/bmansfeld/CMD2_project/blob/main/CMD2_mapping_and_phenotype_scripts.R
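
As a rough illustration of the manual averaging described above, here is a minimal Python sketch. It is not the code from the linked scripts and not part of QTLseqr. It assumes each individual contributes a (reference, alternate) read-count pair per SNP, that individuals with no reads at a site are skipped, and that output keys such as `SNPindex.HIGH` and `deltaSNP` are chosen only to resemble the columns QTLseqr works with. The `min_individuals` parameter is hypothetical.

```python
from statistics import mean


def individual_snp_index(ad_ref, ad_alt):
    """Alternate-allele read fraction for one individual, None if no reads."""
    depth = ad_ref + ad_alt
    return ad_alt / depth if depth > 0 else None


def bulk_mean_snp_index(allele_depths, min_individuals=1):
    """Average the per-individual SNP-indices at one site.

    allele_depths: list of (ad_ref, ad_alt), one pair per individual.
    Individuals without reads are ignored; returns None when fewer than
    min_individuals (a hypothetical threshold) have data.
    """
    indices = [
        idx
        for idx in (individual_snp_index(r, a) for r, a in allele_depths)
        if idx is not None
    ]
    if len(indices) < min_individuals:
        return None
    return mean(indices)


def compare_bulks(high_bulk, low_bulk, min_individuals=1):
    """Build one row per site present in both bulks.

    high_bulk / low_bulk: dict mapping (chrom, pos) to a list of
    (ad_ref, ad_alt) pairs, one per individual in that bulk.
    """
    rows = []
    for site in sorted(high_bulk.keys() & low_bulk.keys()):
        high = bulk_mean_snp_index(high_bulk[site], min_individuals)
        low = bulk_mean_snp_index(low_bulk[site], min_individuals)
        if high is None or low is None:
            continue  # not enough genotyped individuals in one of the bulks
        rows.append({
            "CHROM": site[0],
            "POS": site[1],
            "SNPindex.HIGH": high,
            "SNPindex.LOW": low,
            "deltaSNP": high - low,
        })
    return rows
```

Averaging per-individual indices weights every individual equally, whereas a physical pool weights individuals by how many reads each happens to contribute, so the two are not guaranteed to behave the same way statistically.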

This might help you get where you want to go until I have time to develop this in more depth.

Hope this helps! Let me know if you have questions, and hopefully I can help.
Ben
