Decouple graphs directory with HLA-LA installation #96

Open
skchronicles opened this issue May 1, 2023 · 1 comment

Comments

@skchronicles

Hello there,

I hope all is going well on your side, and that you are having a wonderful day!

I just built a docker image for HLA-LA. For anyone interested, the Dockerfile is located here:
https://github.com/OpenOmics/genome-seek/blob/main/docker/genome-seek/hla/Dockerfile

And it can be pulled from here, using the command below:

docker pull skchronicles/genome-seek_hla:v0.1.0

After testing the image, I ran into a problem caused by where I installed the graph reference files:

[Fri Apr 28 16:40:23 2023]
Job 0: Running HLA*LA on '/data/dev/genome-seek/test_subsampled_docker/BAM/HG004.sorted.bam' input file
Reason: Missing output files: /data/dev/genome-seek/test_subsampled_docker/HLA/HG004/sample/hla/R1_bestguess_G.txt


HLA-LA.pl \
    --BAM /data/dev/genome-seek/test_subsampled_docker/BAM/HG004.sorted.bam \
    --graph /data/OpenOmics/references/genome-seek/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT \
    --sampleID sample \
    --maxThreads 8 \
    --workingDir /data/dev/genome-seek/test_subsampled_docker/HLA/HG004
    
Activating singularity image /data/dev/genome-seek/test_subsampled_docker/.snakemake/singularity/ba473d7fec36d9d46ceceb85188e9863.simg
HLA-LA.pl

Identified paths:
	samtools_bin: /usr/local/bin/samtools
	bwa_bin: /usr/bin/bwa
	java_bin: /usr/bin/java
	picard_sam2fastq_bin: /opt2/picard/2.27.5/picard.jar
	General working directory: /data/dev/genome-seek/test_subsampled_docker/HLA/HG004
	Sample-specific working directory: /data/dev/genome-seek/test_subsampled_docker/HLA/HG004/sample

Graph directory /opt2/hla-la/1.0.3/HLA-LA/src/../graphs//data/OpenOmics/references/genome-seek/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT not found - valid graph names are subdirectories of the graphs directory in the HLA-LA root at /opt2/hla-la/1.0.3/HLA-LA/src/HLA-LA.pl line 247.

I cannot bundle/install the graph reference files within the Docker image because of their size (~29 GB). It looks like the graphs directory must exist at a specific location relative to the HLA-LA installation: as the error above shows, the value passed to --graph is appended to the graphs directory in the HLA-LA root, so an absolute path to the graphs cannot be used. I am wondering whether it would be possible to decouple the graphs directory (i.e. your reference files) from the HLA-LA installation. This extra flexibility would be great for Docker/Singularity users, as it would allow them to use the tool without any complicated bind mounts. It would also give sysadmins more flexibility in how and where they install the tool.
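
The path in the error message hints at how the graph directory is built. The sketch below mimics that behavior with a hypothetical resolve_graph_dir function. It assumes HLA-LA.pl simply concatenates its own directory, "/../graphs/", and the --graph value; this is inferred from the logged path, not confirmed from the HLA-LA source.

```shell
# Hypothetical helper: rebuild the graph directory path the way the error
# message suggests HLA-LA.pl does it (an assumption, not the actual source):
#   <directory containing HLA-LA.pl> + "/../graphs/" + <value of --graph>
resolve_graph_dir() {
    # $1: directory holding HLA-LA.pl, e.g. /opt2/hla-la/1.0.3/HLA-LA/src
    # $2: value passed to --graph
    printf '%s/../graphs/%s\n' "$1" "$2"
}

# A graph *name* resolves inside the installation's graphs directory...
resolve_graph_dir /opt2/hla-la/1.0.3/HLA-LA/src PRG_MHC_GRCh38_withIMGT
# ...but an absolute path is appended as well, yielding ".../graphs//data/...".
resolve_graph_dir /opt2/hla-la/1.0.3/HLA-LA/src \
    /data/OpenOmics/references/genome-seek/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT
```

The second call reproduces the "src/../graphs//data/..." path from the log, which is why passing the host location to --graph fails.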

Right now, I am running your tool within a Snakemake pipeline. Snakemake abstracts away the commands needed to run a given tool inside a Docker container, as long as you provide a bind list up front. This keeps the commands clean and interoperable with other software management tools, so any given step/command in the pipeline can run using environment modules, conda, or Docker/Singularity. I could call singularity directly within my pipeline and bind the host graphs path to the container's graphs path (relative to where HLA-LA is installed); however, the command would then only be compatible with Docker/Singularity, and it would no longer work with environment modules or conda.
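
A minimal sketch of the container-only bind workaround described above, assuming the host graphs live at the path used in the log and that HLA-LA is installed at /opt2/hla-la/1.0.3/HLA-LA inside the image (both taken from the error message). The helper graphs_bind_spec is hypothetical; it only builds the host:container string that Singularity's -B/--bind option expects, which could be passed through Snakemake 7.x's --singularity-args.

```shell
# Assumed locations: graphs downloaded on the host, and the HLA-LA root
# inside the image (as reported in the error message above).
HOST_GRAPHS=/data/OpenOmics/references/genome-seek/HLA-LA/graphs
HLA_LA_ROOT=/opt2/hla-la/1.0.3/HLA-LA

# Hypothetical helper: build a "host:container" bind spec that mounts the
# host graphs over <HLA-LA root>/graphs, where HLA-LA.pl currently looks.
# Trailing slashes on either argument are stripped.
graphs_bind_spec() {
    printf '%s:%s/graphs\n' "${1%/}" "${2%/}"
}

BIND=$(graphs_bind_spec "$HOST_GRAPHS" "$HLA_LA_ROOT")
echo "$BIND"

# Container-only usage (not executed here); --graph then takes just the name:
#   singularity exec -B "$BIND" image.simg HLA-LA.pl --graph PRG_MHC_GRCh38_withIMGT ...
#   snakemake --use-singularity --singularity-args "-B $BIND" ...
```

As noted above, this only helps when the step runs in a container; the same --graph value would not resolve under environment modules or conda.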

With that being said, if it would not be too much trouble, could you please update the --graph option so that it works with reference files installed in another location? There is no immediate rush; I am just hoping this can be added in the next release or whenever you have some free time.

Please let me know what you think.

Best Regards,
@skchronicles

skchronicles added a commit to OpenOmics/genome-seek that referenced this issue May 1, 2023
@AlexanderDilthey
Member

Hi @skchronicles,

Thank you for your note!

HLA-LA.pl has an (undocumented) --customGraphDir command-line parameter; it seems this may be what you need?
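
A minimal sketch of how the failing call might be adapted, assuming --customGraphDir takes the parent directory that holds the graph folders while --graph keeps taking the folder name. The parameter is undocumented, so verify these semantics against HLA-LA.pl before relying on them. The variable names below are illustrative only.

```shell
# Absolute graph path that was originally passed to --graph.
GRAPH_PATH=/data/OpenOmics/references/genome-seek/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT
GRAPH_PATH=${GRAPH_PATH%/}            # drop a trailing slash, if present

# Split it with POSIX parameter expansion (no external commands needed).
CUSTOM_GRAPH_DIR=${GRAPH_PATH%/*}     # parent directory: .../HLA-LA/graphs
GRAPH_NAME=${GRAPH_PATH##*/}          # folder name: PRG_MHC_GRCh38_withIMGT

# Not executed here; assumed invocation under the semantics stated above:
#   HLA-LA.pl \
#       --BAM /data/dev/genome-seek/test_subsampled_docker/BAM/HG004.sorted.bam \
#       --graph "$GRAPH_NAME" \
#       --customGraphDir "$CUSTOM_GRAPH_DIR" \
#       --sampleID sample \
#       --maxThreads 8 \
#       --workingDir /data/dev/genome-seek/test_subsampled_docker/HLA/HG004
```

If this holds, the same command works with environment modules, conda, or a container, since the graphs location no longer depends on where HLA-LA is installed.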

Best wishes

Alex
