Why is my NA12878 test result not the same as NA12878_example_output_G.txt? #32

Closed
PavitaKae opened this issue Nov 22, 2019 · 4 comments

@PavitaKae

PavitaKae commented Nov 22, 2019

This is my command.
~/HLA-LA/src/HLA-LA.pl --BAM NA12878.mini.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads 40

This is my test result.
R1_bestguess_G.txt

@AlexanderDilthey
Member

Hi @PavitaKae, very difficult to tell - looking at the output file you provided, coverage on the class I genes (HLA-A, -B, -C) is very low. This would indicate that either the test file is corrupted, or that something with the read extraction process has gone wrong. Did you modify the reference extraction files in any way? Could you send an md5 of NA12878.mini.cram? And could you capture all of STDOUT and STDERR and post it here?
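The two requests above (an md5 of the input file, and a full capture of STDOUT and STDERR) could be carried out roughly as follows. This is a minimal sketch, assuming GNU coreutils `md5sum` is installed; `file_md5` and `run_logged` are hypothetical helper names introduced only for illustration, and the log prefix is a placeholder.

```shell
# file_md5 FILE: print only the MD5 hex digest of FILE (assumes md5sum is available).
file_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# run_logged PREFIX CMD...: run CMD, writing STDOUT to PREFIX.out
# and STDERR to PREFIX.err, and return CMD's exit status.
run_logged() {
    prefix=$1
    shift
    "$@" > "$prefix.out" 2> "$prefix.err"
}

# Example usage with the command from the first post:
#   file_md5 NA12878.mini.cram
#   run_logged NA12878 ~/HLA-LA/src/HLA-LA.pl --BAM NA12878.mini.cram \
#       --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads 40
```

Keeping the two streams in separate files makes it easier to tell tool messages (such as those from Picard) apart from the regular progress log.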

@PavitaKae
Author

This is the MD5 sum of my NA12878 file: 45d1769ffed71418571c9a2414465a12
I didn't modify your reference graph; I just downloaded it and built the graph by following the manual.
I have attached the .out and .err files.

41436.out.txt
41436.err.txt

@AlexanderDilthey
Member

There is some issue with read extraction. Your output log says:

processBAM::extractSeeds(): getReadIDs 833136 reads, collected 402762 read IDs.

whereas it should say:

processBAM::extractSeeds(): getReadIDs 13649900 reads, collected 1373415 read IDs.
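To pull these two numbers out of a captured log for comparison, something along these lines may help. It is a sketch that assumes the log line has exactly the format quoted in this comment; `seed_counts` is a hypothetical helper name, not part of HLA-LA.

```shell
# seed_counts LOGFILE: for each extractSeeds line in LOGFILE, print
# "<reads> <collected read IDs>" (assumes the line format quoted in this thread).
seed_counts() {
    sed -n 's/.*extractSeeds(): getReadIDs \([0-9][0-9]*\) reads, collected \([0-9][0-9]*\) read IDs.*/\1 \2/p' "$1"
}

# Example usage on the attached log:
#   seed_counts 41436.out.txt
```

For the NA12878 test file, the values reported in this comment as correct are 13649900 reads and 1373415 collected read IDs; clearly smaller numbers point to a problem in read extraction.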

In your error log, there is a message from Picard: To execute picard run: java -jar $EBROOTPICARD/picard.jar (also, there are some warning messages about the locale that come from Perl, but I don't think these matter too much).

If you go into the working directory for the sample (e.g. HLA-LA/working/NA12878_mini), R_1.fastq and R_2.fastq should both be about 500Mb in size (I would expect them to be smaller on your system), and extraction.bam should be a little bit larger than 310Mb (I would expect this to be the case on your system).
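A quick way to check those sizes is sketched below. It assumes "Mb" here means megabytes of 1,000,000 bytes (with MiB the numbers differ slightly); `size_mb` is a hypothetical helper name.

```shell
# size_mb FILE: print the size of FILE in whole megabytes (10^6 bytes), rounded down.
size_mb() {
    bytes=$(wc -c < "$1")
    echo $((bytes / 1000000))
}

# Example usage, run from the directory that contains HLA-LA/:
#   for f in R_1.fastq R_2.fastq extraction.bam; do
#       echo "$f: $(size_mb "HLA-LA/working/NA12878_mini/$f") MB"
#   done
```

FASTQ files far below the roughly 500 Mb mentioned here, next to an extraction.bam of normal size, would suggest that the BAM-to-FASTQ conversion step is the one that failed.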

I think that there is some issue with Picard. If you execute the extraction command manually, do you get an error message? The command is:

/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/noon/bin/picard SamToFastq VALIDATION_STRINGENCY=LENIENT I=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/extraction.bam F=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_1.fastq F2=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_2.fastq FU=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_U.fastq 2>&1

@PavitaKae
Author

Hi AlexanderDilthey,
I ran it again and it looks good now, because I installed a new copy of Picard.
Thank you for your response. :D
