Choice of reference genome #32

Open
ThibauldMichel opened this issue Feb 14, 2022 · 2 comments

ThibauldMichel commented Feb 14, 2022

I might have missed the information in the documentation, but I don't think there is a way to input a custom-made reference genome for the alignment step.

If I have understood correctly, the pipeline uses the ENSEMBL-SEQUENCE wrapper to output a .fasta file based on the species, build, and release parameters.
https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/reference/ensembl-sequence.html

Is there any option to input a local .fasta file as the reference genome?

@ArsenaultResearch

I was able to get around this by placing my genome FASTA in the resources folder and renaming it to genome.fasta. This mirrors the output of the ENSEMBL-SEQUENCE wrapper, so Snakemake treats that step as already completed. I am still having issues with downstream rules, so I'm unsure how reliable this solution is.
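
For anyone who wants to script this workaround rather than copy the file by hand, here is a minimal Python sketch. The function name `stage_reference` and its defaults are made up for illustration; the only things taken from this thread are the `resources` folder and the `genome.fasta` file name. It only copies the file and does nothing about the downstream issues.

```python
import shutil
from pathlib import Path


def stage_reference(local_fasta, resources_dir="resources", target_name="genome.fasta"):
    """Copy a local FASTA to the location the workflow expects.

    `stage_reference` is a hypothetical helper, not part of the workflow.
    The default folder and file name follow the workaround described above.
    """
    src = Path(local_fasta)
    if not src.is_file():
        raise FileNotFoundError(f"reference FASTA not found: {src}")

    dest = Path(resources_dir) / target_name
    # Create the resources folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Usage would be something like `stage_reference("/path/to/my_assembly.fa")`, run from the workflow's working directory before starting Snakemake.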

@dlaehnemann

Yes, the workflow downloads the reference from Ensembl based on the config file, because sadly things like the exact naming of chromosomes matter a lot for many downstream tools, especially Picard, GATK and the annotation tools.

So what @ArsenaultResearch describes is one way to get around the automatic download, but you will run into a bunch of downstream parsing issues, which you will then probably have to debug by introducing intermediate steps that change things like the chromosome naming to what these tools expect. Classic bioinformatics, sorry... 🤷
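
As a rough illustration of such an intermediate step, the following Python sketch rewrites the sequence names in FASTA header lines according to a lookup table. The function `rename_fasta_headers` and the example mapping are assumptions for illustration only. Which names the downstream tools actually expect depends on your reference and annotation (Ensembl-style references use names like `1` and `MT`, whereas UCSC-style ones use `chr1` and `chrM`).

```python
def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines with the sequence name in each header replaced.

    `lines` is any iterable of text lines (e.g. an open file handle).
    `name_map` is a hypothetical dict mapping old names to new names;
    names that are not in the dict are left unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            # The sequence name is everything up to the first space;
            # any description after it is kept as-is.
            name, _, description = header.partition(" ")
            new_name = name_map.get(name, name)
            suffix = f" {description}" if description else ""
            yield f">{new_name}{suffix}\n"
        else:
            # Sequence lines pass through untouched.
            yield line
```

For example, with an illustrative mapping `{"chr1": "1", "chrM": "MT"}` you could stream one file into another:

```python
with open("my_assembly.fa") as src, open("resources/genome.fasta", "w") as dst:
    dst.writelines(rename_fasta_headers(src, {"chr1": "1", "chrM": "MT"}))
```

Any annotation or known-variants files would need the same renaming so that all inputs agree.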

Also, please make sure that you properly document what you are doing, ideally by writing a download rule for the reference you want to use, with the link that you download the reference from configured in the config.yaml. Otherwise, you reduce the reproducibility of your workflow: it will be hard for others (or your future self ;) to find out where you got your reference from, and they would have to download it manually and deposit it in the correct folder under the correct name. Automation is our friend... 🤖
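
A minimal sketch of what the body of such a download step could look like in plain Python, for example called from a `run:` block in a custom rule. The config key `ref: url` is hypothetical and not an existing option of this workflow, and `download_reference` is a made-up helper name. The sketch also assumes the URL points to an uncompressed FASTA; a gzipped reference would need an extra decompression step.

```python
import shutil
import urllib.request
from pathlib import Path


def download_reference(config, dest="resources/genome.fasta"):
    """Download the reference named in the config to `dest`.

    Assumes a hypothetical config entry like:
        ref:
          url: "https://example.org/my_assembly.fa"
    so that the source of the reference is recorded in config.yaml.
    """
    url = config["ref"]["url"]  # hypothetical key, not part of the workflow
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    # Stream the response to disk instead of loading it into memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Inside a Snakefile, the already-loaded `config` dict could be passed straight in, with the rule declaring `resources/genome.fasta` as its output so that the download replaces the manual copy.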
