Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Association output only on SNPs #16

Open
ciaran-campbell opened this issue Nov 16, 2021 · 4 comments
Open

Association output only on SNPs #16

ciaran-campbell opened this issue Nov 16, 2021 · 4 comments

Comments

@ciaran-campbell
Copy link

Hi there,
When I run HATK on one dataset, the association output has only worked for SNPs with rsids.
The following error is displayed and no heatmaps are produced?
Do you know what could be causing this?

Thanks for the help

image

@WansonChoi
Copy link
Owner

WansonChoi commented Dec 11, 2021

@ciaran-campbell

Hi, Thank you for your interest in HATK.

It's highly likely that the bMarkerGenerator step failed.

You should check
(1) your input dictionaries are well prepared with the IMGT2Seq,
(2) your HLA type data goes through the NomenCleaner so that its field resolution is well matched to that of the input dictionaries.

If the problem isn't solved, then could you consider sending your input HLA type and SNP data? (Anonymize samples' label if necessary) I'll try to work on them.

@schen2643
Copy link

schen2643 commented Jan 10, 2022

Hello!

Thanks a lot for your reply. From our log file, it seems IMGT2Seq and NomenCleaner have run successfully (?) as no error was reported.

[IMGT2Seq.py]: Multiprocessing.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA B.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA A.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA C.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DRB1.
Warning: Variants 'HLA_A*01:02' and 'HLA_A*01:01' have the same position.
Warning: Variants 'HLA_A*02:01' and 'HLA_A*01:02' have the same position.
Warning: Variants 'HLA_A*02:05' and 'HLA_A*02:01' have the same position.
2740 more same-position warnings: see log file.

[HLA_Study.py]: IMGT2Seq result : 
< IMGT2Sequence(Newly generated.) >
- HLA Amino Acids : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_DICTIONARY_AA.hg18.imgt3320
- HLA SNPs : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_DICTIONARY_SNPS.hg18.imgt3320
- HLA Allele Table : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_ALLELE_TABLE.imgt3320.hat
- Maptables for heatmap : 
   A   : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_A.hg18.imgt3320.txt
   B   : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_B.hg18.imgt3320.txt
   C   : /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_C.hg18.imgt3320.txt
   DPA1: /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_DPA1.hg18.imgt3320.txt
   DPB1: /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_DPB1.hg18.imgt3320.txt
   DQA1: /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_DQA1.hg18.imgt3320.txt
   DQB1: /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_DQB1.hg18.imgt3320.txt
   DRB1: /stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/HLA_MAPTABLE_DRB1.hg18.imgt3320.txt


[HLA_Study.py]: Given HPED file('/stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/eur_epi_cc_unrel_qc_chr1-23-updated-chr6+1000G.MHC.HLA_IMPUTATION_OUT.hped') is to be processed by NomenCleaner.

[NomenCleaner.py]: Generating CHPED with Maximum 2 fields HLA alleles.

[bMarkerGenerator.py]: Making Reference Panel for "/stanley/genetics/analysis/epi25/sc/ilae_gwas/hla/eur_hla/hg18/eur_epi_cc_unrel_qc_chr1-23-updated-chr6+1000G_hatk_hg18_isGGE_rm-ibd"

[1] Generating Amino acid(AA)sequences from HLA types.
[2] Encoding Amino acids positions.
[3] Encoding HLA alleles.
[4] Generating DNA(SNPS) sequences from HLA types.
[5] Encoding SNP positions.
[6] Extracting founders.
[7] Merging SNP, HLA, and amino acid datasets.
[8] Performing quality control.
[9] Making reference panel for HLA-AA,SNPS,HLA and Normal variants(SNPs) is Done!

though we did not run IMGT2Seq and NomenCleaner separately - we ran the pipeline following the example in the documentation (see below), which says 'This command will implement (1) IMGT2Seq, (2) NomenCleaner, (3) bMarkerGenerator, (4) HLA_Analyzer(Association Test - logistic regression), (5) Manhattan Plot and (6) Heatmap Plot, which are the minimal components for HLA fine-mapping analysis.'

python3 HATK.py \
    --variants ${pop_resdir}/${pfx}.COPY.LiftDown_hg18 \
    --hped ${pop_resdir}/${pfx_o2}.MHC.HLA_IMPUTATION_OUT.hped \
    --2field \
    --out ${pop_resdir}/hg18/${pfx_o3} \
    --imgt 3320 \
    --hg 18 \
    --imgt-dir example/IMGTHLA3320 \
    --pheno ${phenodir}/eur_epi_cc${sub}_gwas_pheno_covar_${epi_type}.hatk.tsv \
    --pheno-name ${epi_type} \
    --covar ${phenodir}/eur_epi_cc${sub}_gwas_pheno_covar_${epi_type}.hatk.tsv \
    --covar-name SEX,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
    --multiprocess 2

So is there any difference by running the above script vs running IMGT2Seq and NomenCleaner separately? The complete log file is also attached. Would appreciate your suggestions! And we are happy to send anonymized data for further examination. Thanks!

test_epi.log

@WansonChoi
Copy link
Owner

@schen2643

Hi, schen2643. Thank you for your interest in HATK.

No. there is no difference.

The 'bMarkerGenerator' is practically the main process for HLA fine-mapping while it requires some preprocessings for data from the IPD-IMGT/HLA database and raw HLA type data. The IMGT2Seq and NomenCleaner are for this. In other words, Once those required preprocessings are done, users don't have to repeat them and can save time. The IMGT2Seq can be skipped with the '--dict-AA' and '--dict-SNPS' arguments and the NomenCleaner can be skipped with the '--chped' argument, where those arguments essentailly take output from those preprocessing steps.

If Logistic regression result and Manhattan plot were generated well, then there would be no significant problem. In your log file, plotting heatmap for 8 HLA genes failed anyway. If you need help related to this, Please consider sending the anonymized genotype and HLA type data.

@PoojaMiddha
Copy link

@schen2643 & @ciaran-campbell Were you able to identify the issue. I am running into the same issue. And in the association file I see results for SNPs but NA for HLA markers and AAs. Any help is much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants