Choice of reference genome #32

ThibauldMichel · 2022-02-14T17:15:39Z

I might have missed the information in the documentation, but I don't think there is a way to input a custom-made reference genome for the alignment step.

If I understand correctly, the pipeline uses the ENSEMBL-SEQUENCE wrapper to produce a .fasta file based on the species, build, and release parameters.
https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/reference/ensembl-sequence.html

Is there an option to use a local .fasta file as the reference genome?

ArsenaultResearch · 2022-03-18T19:48:42Z

I was able to get around this by placing my genome FASTA in the resources folder and renaming it to genome.fasta. This mirrors the output of the ENSEMBL-SEQUENCE wrapper, so Snakemake treats that step as already completed. I am still having issues with downstream rules, so I'm unsure how reliable this solution is.

dlaehnemann · 2022-03-21T09:21:39Z

Yes, the workflow downloads the reference from Ensembl based on the config file because, sadly, details like the exact naming of chromosomes matter a lot for many downstream tools, especially Picard, GATK, and the annotation tools.

So what @ArsenaultResearch describes is a possibility for getting around this automatic download, but you will run into a bunch of downstream parsing issues that you will then probably have to debug by introducing intermediate steps that fix things like the chromosome naming to match what these tools expect. Classic bioinformatics, sorry... 🤷
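
As an illustration of such an intermediate step, here is a minimal sketch that rewrites the sequence names in FASTA headers. It is not part of the workflow: the function name and the mapping are hypothetical, and the mapping only covers a few UCSC-style names (`chr1`, `chrM`) and their Ensembl-style equivalents (`1`, `MT`) as an example.

```python
from typing import Dict, Iterable, Iterator


def rename_fasta_headers(lines: Iterable[str], name_map: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with sequence names replaced according to name_map.

    Only the first space-delimited token of each header is treated as the
    sequence name; any description after it is kept unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            name, sep, description = header.partition(" ")
            new_name = name_map.get(name, name)  # names not in the map pass through
            yield f">{new_name}{sep}{description}\n"
        else:
            # Sequence lines are emitted untouched.
            yield line


# Hypothetical, incomplete mapping from UCSC-style to Ensembl-style names.
UCSC_TO_ENSEMBL = {"chr1": "1", "chr2": "2", "chrX": "X", "chrM": "MT"}

example = [">chr1 test sequence\n", "ACGT\n", ">chrM\n", "GGCC\n", ">scaffold_7\n", "TTAA\n"]
renamed = list(rename_fasta_headers(example, UCSC_TO_ENSEMBL))
```

Any other file that refers to the same sequences (annotation, known variants) would need the same naming, and indexes built from the original FASTA would have to be regenerated.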

Also, please make sure that you properly document what you are doing, ideally by writing a download rule for the reference you want to use, with the link you download the reference from set in the config.yaml. Otherwise, the reproducibility of your workflow suffers: it will be hard for others (or your future self ;) to find out where you got your reference from, and they would have to download it manually and deposit it in the correct folder under the correct name. Automation is our friend... 🤖
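
A rough sketch of what the download part of such a rule could do, written as a plain Python helper that might be called from a rule's `run:` block. The config key `ref` / `url` and the function name are assumptions made for this example, not options the workflow currently provides; the default target path follows the `resources/genome.fasta` location mentioned above.

```python
import shutil
import urllib.request
from pathlib import Path


def download_reference(config: dict, target: str = "resources/genome.fasta") -> Path:
    """Fetch the reference named in config["ref"]["url"] and store it at target."""
    url = config["ref"]["url"]  # hypothetical config key, not part of the workflow
    out = Path(target)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large genomes are not held in memory.
    with urllib.request.urlopen(url) as response, open(out, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return out
```

Keeping the URL in the config means the source of the reference is recorded alongside the rest of the run parameters, and nobody has to place the file by hand.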
