
All SNPs are being dropped due to non-positive-semi-definiteness of omega #23

Closed
kapoormanav opened this issue Jun 7, 2021 · 30 comments
@kapoormanav

Hi Grant and Jon,

All SNPs from my analysis are being dropped due to "non-positive-semi-definiteness of omega". Do you have any suggestions for fixing this problem?

Thanks,

Manav

#########################################
Running LD Score regression.
Regression coefficients (LD):
[[-1.45883266e-03 -1.27498494e-03]
[-1.27498494e-03 -9.65617573e-05]]
Regression coefficients (Intercept):
[[1.08435618 1.04543503]
[1.04543503 1.00893974]]
Regression coefficients (SE^2):
[[-0.04235656 -2.6532606 ]
[-2.6532606 -3.08265681]]

Creating omega and sigma matrices.
Average Omega (including dropped slices) =
[[-0.01087047 -0.00785719]
[-0.00785719 -0.00084023]]
Average Sigma (including dropped slices) =
[[1.08297402 1.04543503]
[1.04543503 1.00471217]]

Adjusted 0 SNPs to make omega positive semi-definite.

Dropped 4730 SNPs due to non-positive-semi-definiteness of omega.
Dropped 4730 SNPs due to non-positive-definiteness of sigma.
Dropped 4730 total SNPs due to non-positive-(semi)-definiteness of omega / sigma.

Running main MAMA method.

Preparing results for output.

    Population 0: ('SAS', 'T2D')

/mnt/efs/users/manav.kapoor/mama-mainline/mama_pipeline.py:552: RuntimeWarning: Mean of empty slice.
mean_chi_2 = np.square(new_df[Z_COL].to_numpy()).mean()
ERROR: Received Numpy error: invalid value (8)
Mean Chi^2 for ('SAS', 'T2D') = nan
Population 1: ('EUR', 'T2D')
ERROR: Received Numpy error: invalid value (8)
Mean Chi^2 for ('EUR', 'T2D') = nan

@kapoormanav kapoormanav added the bug Something isn't working label Jun 7, 2021
@JonJala
Owner

JonJala commented Jun 7, 2021

If there are negative diagonal elements in the LD score regression coefficients, that's going to torpedo your results. That might happen with a very small sample size and/or mean chi squared < 1. Mind including your full log with the logging turned up so that we can have a little more information?
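A quick way to see why those coefficients are fatal: the diagonals of these matrices play roughly heritability-like roles, so negative diagonal entries guarantee trouble downstream. A minimal NumPy sketch of such a sanity check (the function name and messages are illustrative, not MAMA internals), applied to the LD coefficients from the log above:

```python
import numpy as np

def check_ld_coefs(coefs):
    """Flag problems in a symmetric PxP LD regression coefficient matrix."""
    problems = []
    if np.any(np.diag(coefs) < 0):
        problems.append("negative diagonal")
    # eigvalsh is the appropriate eigensolver for symmetric matrices;
    # it returns eigenvalues in ascending order
    if np.linalg.eigvalsh(coefs)[0] < 0:
        problems.append("not positive semi-definite")
    return problems

# The LD coefficient matrix reported in the log above
ld = np.array([[-1.45883266e-03, -1.27498494e-03],
               [-1.27498494e-03, -9.65617573e-05]])
print(check_ld_coefs(ld))  # both checks fire for this matrix
```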

@kapoormanav
Author

2021-06-07 14:07:59,337

Reading in and running QC on LD Scores
2021-06-07 14:07:59,337 Reading in LD Scores file: ./GGC_Freeze_Three_merged_with_ukb_500_random_eur_exc_mhc_ldsc.1.l2.ldscore.gz
2021-06-07 14:07:59,568

Reading in summary statistics.
2021-06-07 14:07:59,568

2021-06-07 14:07:59,568 Reading in ('SAS', 'T2D') sumstats file: ./type_2_diabetes_UKB_450_SAS.txt
2021-06-07 14:08:13,402
Running QC on ('SAS', 'T2D') summary statistics
2021-06-07 14:08:13,403
Initial number of SNPs / rows = 1402142

2021-06-07 14:08:18,965 Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'FREQ', 'SNP', 'A2', 'BETA', 'CHR', 'BP', 'A1', 'SE', 'P'})
2021-06-07 14:08:18,965 Filtered out 775365 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
2021-06-07 14:08:18,965 Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
2021-06-07 14:08:18,965 Filtered out 7193 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
2021-06-07 14:08:18,965 Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
2021-06-07 14:08:18,965 Filtered out 148443 SNPs with "PALINDROMIC SNPS" (Filters out SNPs where major / minor alleles are a base pair)
2021-06-07 14:08:18,965 Filtered out 77665 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'G', 'A', 'T', 'C'})
2021-06-07 14:08:18,965 Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)
2021-06-07 14:08:18,967

Filtered out 845055 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
2021-06-07 14:08:18,967 Additionally dropped 0 duplicate SNPs
2021-06-07 14:08:18,992

2021-06-07 14:08:18,992 Reading in ('EUR', 'T2D') sumstats file: ./EUR_T2d.txt
2021-06-07 14:08:22,889
Running QC on ('EUR', 'T2D') summary statistics
2021-06-07 14:08:22,890
Initial number of SNPs / rows = 383358

2021-06-07 14:08:24,666 Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'FREQ', 'SNP', 'A2', 'BETA', 'CHR', 'BP', 'A1', 'SE', 'P'})
2021-06-07 14:08:24,666 Filtered out 0 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
2021-06-07 14:08:24,666 Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
2021-06-07 14:08:24,666 Filtered out 6 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
2021-06-07 14:08:24,666 Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
2021-06-07 14:08:24,666 Filtered out 27323 SNPs with "PALINDROMIC SNPS" (Filters out SNPs where major / minor alleles are a base pair)
2021-06-07 14:08:24,666 Filtered out 5697 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'G', 'A', 'T', 'C'})
2021-06-07 14:08:24,666 Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)
2021-06-07 14:08:24,667

Filtered out 33022 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
2021-06-07 14:08:24,667 Additionally dropped 0 duplicate SNPs
2021-06-07 14:08:24,780

Number of SNPS in initial intersection of all sources: 9350
2021-06-07 14:08:25,947
Standardizing reference alleles in summary statistics.
2021-06-07 14:08:25,986 Standardized to population: ('SAS', 'T2D')
2021-06-07 14:08:25,986 Dropped 0 SNPs during reference allele standardization.
2021-06-07 14:08:25,993

Running LD Score regression.
2021-06-07 14:08:25,995 Regression coefficients (LD):
[[-0.00218478 -0.00255036]
[-0.00255036 -0.00026218]]
2021-06-07 14:08:25,996 Regression coefficients (Intercept):
[[1.05870677 1.03990404]
[1.03990404 1.01028588]]
2021-06-07 14:08:25,996 Regression coefficients (SE^2):
[[ 0.55252289 -1.37263668]
[-1.37263668 -3.60841569]]
2021-06-07 14:08:25,996

Creating omega and sigma matrices.
2021-06-07 14:08:26,274 Average Omega (including dropped slices) =
[[-0.01974345 -0.01886878]
[-0.01886878 -0.00278857]]
2021-06-07 14:08:26,275 Average Sigma (including dropped slices) =
[[1.07551735 1.03990404]
[1.03990404 1.00548116]]
2021-06-07 14:08:26,275
Adjusted 0 SNPs to make omega positive semi-definite.
2021-06-07 14:08:26,275
Dropped 9350 SNPs due to non-positive-semi-definiteness of omega.
2021-06-07 14:08:26,275 Dropped 7118 SNPs due to non-positive-definiteness of sigma.
2021-06-07 14:08:26,275 Dropped 9350 total SNPs due to non-positive-(semi)-definiteness of omega / sigma.
2021-06-07 14:08:26,275

Running main MAMA method.
2021-06-07 14:08:26,372
Preparing results for output.

2021-06-07 14:08:26,372 Population 0: ('SAS', 'T2D')
2021-06-07 14:08:26,410 Received Numpy error: invalid value (8)
2021-06-07 14:08:26,410 Mean Chi^2 for ('SAS', 'T2D') = nan
2021-06-07 14:08:26,410 Population 1: ('EUR', 'T2D')
2021-06-07 14:08:26,443 Received Numpy error: invalid value (8)
2021-06-07 14:08:26,443 Mean Chi^2 for ('EUR', 'T2D') = nan
2021-06-07 14:08:26,450
Final SNP count = 0
2021-06-07 14:08:26,488 Writing results to disk.
2021-06-07 14:08:26,488 ./T2D_MAMA_SAS_T2D.res
2021-06-07 14:08:26,517 ./T2D_MAMA_EUR_T2D.res
2021-06-07 14:08:26,542
Execution complete.

@JonJala
Owner

JonJala commented Jun 7, 2021

It looks like you're losing a ton of SNPs when the data sets are being intersected / merged (it gets pared down to less than 10k). Are you expecting such a small overlap? Any chance anything happened to the rsID columns of your data, or that they're somehow formatted very differently from each other, or something like that?

@ggoldman1
Collaborator

On top of what Jon said, it looks like over 50% of your SAS SNPs are being dropped due to allele frequency; you can use the --freq-bounds MIN MAX flag to change that, if you want. Only about 9% (33,022 / 383,358) of your EUR SNPs were dropped during QC. You can loosen the QC that MAMA applies, but I think it's a bit of a concern that you're only starting with under 400K SNPs in EUR. Is this low number due to some additional QC you're applying in preprocessing?

@kapoormanav
Author

Hi Grant,

The data come from exome sequencing of the SAS and UKB cohorts, so there is not much overlap in common SNPs. I already filtered the EUR data to common SNPs (AF > 0.05%). I can use the imputed data to see whether the overlap increases. Unfortunately there is no array data for our SAS group, and I wanted to avoid imputing the SAS data.

Thanks,

Manav

@paturley
Collaborator

paturley commented Jun 7, 2021 via email

@kapoormanav
Author

Thanks Patrick. I will try that and let you know. I will make sure to describe the process for reviewers.

@JonJala
Owner

JonJala commented Jun 15, 2021

Any luck with trying that out?

@kapoormanav
Author

Hi,
We are reimputing our data to 1000 Genomes, as it has somewhat better whole-genome representation for SAS subjects. Meanwhile, I found one issue that was due to my own oversight: I hadn't converted the odds ratios to betas. After converting to betas, the analysis worked for chromosomes with a larger proportion of SNPs (chr1 and chr2), but it was still wonky: the test statistics were highly inflated, with P values down around 1e-300.
We will have the latest imputed data by next week and I will rerun the pipeline then.
Thanks!
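For reference, the odds-ratio-to-beta conversion mentioned above is just a log transform. A minimal pandas sketch (column names and values are illustrative; it assumes the SE is already reported on the log-odds scale, as most GWAS tools do):

```python
import numpy as np
import pandas as pd

# Toy sumstats with an odds-ratio column (illustrative names/values)
df = pd.DataFrame({"SNP": ["rs1", "rs2"],
                   "OR":  [1.10, 0.95],
                   "SE":  [0.02, 0.03]})

# BETA is the log odds ratio. If SE were instead reported on the OR scale,
# it would also need converting (delta method: SE_logOR ~= SE_OR / OR).
df["BETA"] = np.log(df["OR"])
print(df[["SNP", "BETA", "SE"]])
```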

@JonJala
Owner

JonJala commented Jun 16, 2021

Ok, sounds good!

@samkleeman1

Hi,

I am having similar issues, with almost all my SNPs being removed by the "non-positive-semi-definiteness of omega" filter.

I am using UKB data, imputed by UKB using HRC (the standard data provided) for EUR, AFR and CSA population subgroups. I selected SNPs with INFO score > 0.8 derived from 1000 randomly selected subjects in each population. This gives trans-ancestral LD scores spanning approximately 6.3 million SNPs. Then when I try to merge summary statistics in each population (also from UKB, so covering the same SNP set and imputation strategy), I am getting almost all SNPs removed by 'non-positive-semi-definiteness of omega'.

I am slightly confused, as I would have thought this is a routine application of this approach.

I have copied the output below.

Kind regards,

Sam Kleeman
PhD Student, CSHL, NY, USA

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MAMA: Multi-Ancestry Meta-Analysis
<> Version: 1.0.0
<> (C) 2020 Social Science Genetic Association Consortium (SSGAC)
<> MIT License
<>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Software-related correspondence: grantgoldman0@gmail.com or jjala.ssgac@gmail.com
<> All other correspondence: paturley@broadinstitute.org
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

See full log at: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/cystatinc_mama_merged.log


Program executed via:
/mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/mama/mama.py  \ 
	--sumstats /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/AFR/summ_SEM.tsv,AFR,cyc /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/CSA/summ_SEM.tsv,CSA,cyc /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/PRS/EUR/summ_SEM_cystatin.tsv,EUR,cyc  \ 
	--ld-scores /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr*.l2.ldscore.gz  \ 
	--out ../cystatinc_mama_merged  \ 
	--add-a1-col-match A1  \ 
	--add-a2-col-match A2  \ 
	--add-snp-col-match SNP  \ 
	--add-beta-col-match est  \ 
	--add-freq-col-match MAF  \ 
	--add-se-col-match SE  \ 
	--add-p-col-match Pval_Estimate  \ 
	--allow-palindromic-snps



Reading in and running QC on LD Scores
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr7.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr22.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr19.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr11.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr12.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr5.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr21.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr2.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr3.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr17.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr8.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr16.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr9.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr1.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr6.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr14.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr10.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr18.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr13.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr20.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr15.l2.ldscore.gz
Reading in LD Scores file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr4.l2.ldscore.gz


Reading in summary statistics.



Reading in ('AFR', 'cyc') sumstats file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/AFR/summ_SEM.tsv

Running QC on ('AFR', 'cyc') summary statistics

Initial number of SNPs / rows = 7184911

Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'A1', 'P', 'CHR', 'BP', 'SE', 'SNP', 'A2', 'FREQ', 'BETA'})
Filtered out 396147 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
Filtered out 0 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
Filtered out 0 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'A', 'G', 'C', 'T'})
Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)


Filtered out 396147 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
Additionally dropped 0 duplicate SNPs



Reading in ('CSA', 'cyc') sumstats file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/CSA/summ_SEM.tsv

Running QC on ('CSA', 'cyc') summary statistics

Initial number of SNPs / rows = 8048719

Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'A1', 'P', 'CHR', 'BP', 'SE', 'SNP', 'A2', 'FREQ', 'BETA'})
Filtered out 464820 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
Filtered out 0 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
Filtered out 0 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'A', 'G', 'C', 'T'})
Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)


Filtered out 464820 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
Additionally dropped 0 duplicate SNPs



Reading in ('EUR', 'cyc') sumstats file: /mnt/grid/ukbiobank/data/Application58510/skleeman/gwas_cystatinc/PRS/EUR/summ_SEM_cystatin.tsv

Running QC on ('EUR', 'cyc') summary statistics

Initial number of SNPs / rows = 8408363

Filtered out 0 SNPs with "NO NAN" (Filters out SNPs with any NaN values in required columns {'A1', 'P', 'CHR', 'BP', 'SE', 'SNP', 'A2', 'FREQ', 'BETA'})
Filtered out 106550 SNPs with "FREQ BOUNDS" (Filters out SNPs with FREQ values outside of [0.01, 0.99])
Filtered out 0 SNPs with "SE BOUNDS" (Filters out SNPs with non-positive SE values)
Filtered out 0 SNPs with "CHR VALUES" (Filters out SNPs with listed chromosomes not in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'])
Filtered out 0 SNPs with "DUPLICATE ALLELE SNPS" (Filters out SNPs with major allele = minor allele)
Filtered out 0 SNPs with "INVALID ALLELES" (Filters out SNPs with alleles not in {'A', 'G', 'C', 'T'})
Filtered out 0 SNPs with "NEGATIVE GWAS P" (Filters out SNPs with negative GWAS P values)


Filtered out 106550 SNPs in total (as the union of drops, this may be less than the total of all the per-filter drops)
Additionally dropped 0 duplicate SNPs


Number of SNPS in initial intersection of all sources: 6371548

Standardizing reference alleles in summary statistics.
Standardized to population: ('AFR', 'cyc')
Dropped 0 SNPs during reference allele standardization.


Running LD Score regression.
Regression coefficients (LD):
[[-3.78167042e-006 -4.16683593e-052 -3.08677069e-006]
 [-4.16683593e-052  3.08958396e-108  3.47087472e-051]
 [-3.08677069e-006  3.47087472e-051  2.84505267e-007]]
Regression coefficients (Intercept):
[[ 4.63763260e-03  0.00000000e+00  2.08242133e-04]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 2.08242133e-04  0.00000000e+00 -7.33317158e-06]]
Regression coefficients (SE^2):
[[ 2.13826844e-08 -1.58295174e-28  3.55113200e-04]
 [-1.58295174e-28  2.30266760e-56  2.45580980e-28]
 [ 3.55113200e-04  2.45580980e-28  1.82818718e+00]]


Creating omega and sigma matrices.
Average Omega (including dropped slices) =
[[-5.53372417e-004 -4.47908790e-050 -3.47376702e-004]
 [-4.47908790e-050  5.68809103e-106  6.12977954e-049]
 [-3.47376702e-004  6.12977954e-049  6.12108703e-005]]
Average Sigma (including dropped slices) =
[[5.38197663e-03 0.00000000e+00 2.08242133e-04]
 [0.00000000e+00 3.56580622e-09 0.00000000e+00]
 [2.08242133e-04 0.00000000e+00 1.18942935e-04]]

Adjusted 13859 SNPs to make omega positive semi-definite.

Dropped 6356207 SNPs due to non-positive-semi-definiteness of omega.
Dropped 0 SNPs due to non-positive-definiteness of sigma.
Dropped 6356207 total SNPs due to non-positive-(semi)-definiteness of omega / sigma.


Running main MAMA method.

Preparing results for output.

	Population 0: ('AFR', 'cyc')
		Mean Chi^2 for ('AFR', 'cyc') = 1.4034997256374622e+19
	Population 1: ('CSA', 'cyc')
		Mean Chi^2 for ('CSA', 'cyc') = nan
	Population 2: ('EUR', 'cyc')
		Mean Chi^2 for ('EUR', 'cyc') = 2.494013823014077e+20

Final SNP count = 15341
Writing results to disk.
	../cystatinc_mama_merged_AFR_cyc.res
	../cystatinc_mama_merged_CSA_cyc.res
	../cystatinc_mama_merged_EUR_cyc.res

Execution complete.


@paturley
Collaborator

It looks like you have negative diagonal coefficients on your LD score regressions. Maybe try the --reg-int-zero option, which sets the intercept to zero, and see if that helps.

@ggoldman1 @JonJala Maybe we should make it so the software throws an error in cases like this since the results will never make sense if the diagonal entries of the LD score regression coefficient matrices are negative.
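For anyone debugging this, the per-SNP drop counts in the log can be reproduced with a vectorized PSD check over the stack of per-SNP omega matrices. A rough sketch (not MAMA's actual implementation):

```python
import numpy as np

# Stack of hypothetical per-SNP 2x2 omega matrices: the first is PSD,
# the second has eigenvalues -1 and 3 and should be dropped
omegas = np.array([[[1.0, 0.5],
                    [0.5, 1.0]],
                   [[1.0, 2.0],
                    [2.0, 1.0]]])

# eigvalsh broadcasts over the leading axis and sorts eigenvalues
# ascending, so column 0 is the minimum eigenvalue per SNP
min_eig = np.linalg.eigvalsh(omegas)[:, 0]
keep = min_eig >= 0
print(f"Dropped {int(np.count_nonzero(~keep))} SNPs due to non-PSD omega.")
```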

@samkleeman1

I tried using --reg-int-zero; it runs across all SNPs, but I get crazy p values in the region of 8.169262804980647e-291088526. Why do I see negative diagonal coefficients in the LD score regressions?

This is the script I use to compute LD scores. IID ancestry file and SNP ancestry file were generated as per the Jupyter Notebook. SNPs are from UK Biobank with INFO > 0.8, randomly selected 1000 subjects per population.

source mama/mama_env/bin/activate 

python3 ./mama/mama_ldscores.py --ances-path "./iid_ances_file" \
                            --snp-ances "./snp_ances_file" \
                            --ld-wind-cm 1 \
                            --stream-stdout \
                            --bfile-merged-path "./ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8" \
                            --out "ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld"

@ggoldman1
Collaborator

Hey Sam,

Did you see the most recent response at #15? I see you're using a centimorgan-based window but it seems like from that thread your data doesn't have cM reported. If this is the case then your LD scores will likely be wrong which could cause issues.

Grant

@samkleeman1

Sorry, the script above was an old version; I used the following (with the --ld-wind-kb parameter):

source ../mama/mama_env/bin/activate 

python3 ../mama/mama_ldscores.py --ances-path "../iid_ances_file" \
                            --snp-ances "../snp_ances_file" \
                            --ld-wind-kb 1000 \
                            --stream-stdout \
                            --bfile-merged-path "../ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_${chr}" \
                            --out "../out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr${chr}"

If I add in centimorgan data, will that fix the issue?

@ggoldman1
Collaborator

No, not unless your .bim files have a non-zero third column. Otherwise leaving it as you have is fine.

Out of curiosity, what is the mean chi^2 in each of your input GWAS? Per #26, if you rerun with --debug passed then it'll tell you in the log file.

@samkleeman1

I can see the following:

Do you have any suggestions about how we can resolve this? We are really keen to use this approach in our work.

	Population 0: ('AFR', 'cyc')
		Mean Chi^2 for ('AFR', 'cyc') = 1.0611160357792465e+20
	Population 1: ('CSA', 'cyc')
		Mean Chi^2 for ('CSA', 'cyc') = nan
	Population 2: ('EUR', 'cyc')
		Mean Chi^2 for ('EUR', 'cyc') = 1.292007853733898e+21

@JonJala
Owner

JonJala commented Aug 3, 2021

That looks like it might be the mean chi squared of the outputs? If you pull the latest from the repo and re-run (making sure to keep the --verbose flag set), the log should have the mean chi squared of the input populations after it goes through QCing and harmonizing all the input summary statistics.

The log file should have something like "Harmonized AFR cyc mean chi squared: [VALUE]" in it
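For context, the mean chi squared in question is just the mean of the squared Z-scores (Z = BETA / SE) over the harmonized SNPs. A minimal sketch with illustrative column names and values:

```python
import numpy as np
import pandas as pd

# Toy harmonized sumstats (illustrative values)
df = pd.DataFrame({"BETA": [0.05, -0.02, 0.10],
                   "SE":   [0.02, 0.02, 0.04]})

z = df["BETA"] / df["SE"]
mean_chi2 = float(np.square(z).mean())
# Values well below 1 usually indicate underpowered inputs or an
# upstream scaling problem
print(f"Mean Chi^2 = {mean_chi2:.3f}")  # 4.500 for this toy data
```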

@samkleeman1

Sure, have done

Harmonized AFR cyc mean chi squared: 0.785988856018892
Harmonized CSA cyc mean chi squared: 0.8874974162834628
Harmonized EUR cyc mean chi squared: 2.9613204740241974

@JonJala
Owner

JonJala commented Aug 3, 2021 via email

@samkleeman1

I ran again using a different set of phenotypes. They are standard UK Biobank phenotypes (eGFR creatinine), with the GWAS implemented in BOLT-LMM and the summary statistics imported directly into MAMA. I am really struggling to understand why it isn't working and am grateful for your help with this.

python3 /mnt/grid/ukbiobank/data/Application58510/skleeman/was/mama/mama/mama.py --sumstats "/mnt/grid/ukbiobank/data/Application58510/skleeman/gwas/AFR/creatinine_summary2.tsv,AFR,cyc" "/mnt/grid/ukbiobank/data/Application58510/skleeman/gwas/CSA/creatinine_summary2.tsv,CSA,cyc" "/mnt/grid/ukbiobank/data/Application58510/skleeman/EUR/creatinine_summary2.tsv,EUR,cyc" \
                   --ld-scores "/mnt/grid/ukbiobank/data/Application58510/skleeman/mama/out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr*.l2.ldscore.gz" \
                   --out "./mama_merged" \
                   --replace-a1-col-match "a1" --verbose \
                   --replace-bp-col-match "BP" --replace-chr-col-match "CHR" --replace-freq-col-match "VAF" \
                   --replace-info-col-match "info" \
                   --replace-a2-col-match "a2" --replace-snp-col-match "snpid" --replace-beta-col-match "beta" \
                   --replace-se-col-match "se" --replace-p-col-match "pval" --allow-palindromic-snps

Harmonized AFR cyc mean chi squared: 1.0315720691747015
Harmonized CSA cyc mean chi squared: 1.0396088044819032
Harmonized EUR cyc mean chi squared: 2.681982126791807

Dropped 6596642 total SNPs due to non-positive-(semi)-definiteness of omega / sigma.

@JonJala
Owner

JonJala commented Aug 4, 2021 via email

@samkleeman1

I ran across each chromosome separately. No error messages to my knowledge.

source ../mama/mama_env/bin/activate 

python3 ../mama/mama_ldscores.py --ances-path "../iid_ances_file" \
                            --snp-ances "../snp_ances_file" \
                            --ld-wind-kb 1000 \
                            --stream-stdout \
                            --bfile-merged-path "../ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_8" \
                            --out "../out/ukb_EUR_AFR_CSA_EAS_maf0.01_info0.8_ld_chr8"

@paturley
Collaborator

paturley commented Aug 5, 2021 via email

@samkleeman1

What do you mean by standardized genotypes? Like recoding to 0/1/2?

@paturley
Collaborator

paturley commented Aug 5, 2021 via email

@ggoldman1
Collaborator

When calling the ldsc script, you should specify --std-geno-ldsc. When calling MAMA, pass --use-standardized-units along with the LD scores you generated in the previous step (we should probably harmonize these flag names; I can open a PR for that today).

@samkleeman1

Running now, will keep you posted

@samkleeman1

It worked! Thanks so much guys!

@ggoldman1
Collaborator

ggoldman1 commented Aug 7, 2021 via email
