Obtaining PRS from a single individual using PGS Catalog files #302

Open
aheritas opened this issue Sep 12, 2022 · 1 comment
Comments

@aheritas

Hi! I have been reading the documentation and some of your answers in forums. I understand that it is possible to calculate a PRS for a single individual using PGS Catalog files. I tried to do so with PRSice-2 but was unsuccessful. My detailed steps are below, and I would be grateful for any guidance on how to troubleshoot this.

I want to calculate the PRS for breast cancer (PGS000004). I know that the scoring file for this PRS does not include rsIDs, only genomic positions (I am using the harmonized file for GRCh37). According to some of your answers in a forum, when using PGS Catalog files we should add an extra column of p-values set to all 1 or all 0. I modified the file to include a new column, named p_value, that contains 1 in every row, resulting in PGS000004_withpval.txt.
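For readers who want to reproduce this step, here is a minimal sketch of one way to append such a column with awk. It runs on a made-up toy file, not the real PGS000004 file: the names `toy_scoring.txt` and `toy_scoring_withpval.txt` and the variant rows are hypothetical. It assumes a tab-separated file whose first non-comment line is the header, and dropping the `#` metadata lines is also an assumption of this sketch.

```shell
# Toy stand-in for a PGS Catalog scoring file; the rows are made up for illustration.
printf '#pgs_id=TOY\n' > toy_scoring.txt
printf 'chr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n' >> toy_scoring.txt
printf '1\t1000\tT\tA\t0.05\n' >> toy_scoring.txt
printf '2\t2000\tG\tC\t-0.02\n' >> toy_scoring.txt

# Drop '#' metadata lines, append "p_value" to the header row and 1 to every variant row.
awk 'BEGIN { FS = OFS = "\t" }
     /^#/  { next }
     !seen { print $0, "p_value"; seen = 1; next }
           { print $0, 1 }' toy_scoring.txt > toy_scoring_withpval.txt

cat toy_scoring_withpval.txt
```

With a real scoring file, the same awk command would be pointed at the downloaded file instead of the toy input.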

My input file is a VCF obtained from imputation software, containing approx. 80M variants. The first thing I do is normalize this VCF using bcftools, so that there is a single row per genomic position.

bcftools norm -m +any -O z -o /home/user/data/NORMALIZED_VCF.vcf.gz /home/user/data/ORIGINALFILEVCF_imputed.vcf.gz

Then, I transform this file into the necessary input files for PRSice (.bed, .bim, .fam) using PLINK v1.9.

plink --vcf /home/user/data/NORMALIZED_VCF.vcf.gz --snps-only --make-bed --out NORM_PLINK_VCF

Finally, I run PRSice, with the following parameters:

Rscript /home/user/data/PRSice.R --prsice /home/user/data/PRSice_linux --base /home/user/data/PGS000004_withpval.txt --a1 effect_allele --a2 other_allele --stat effect_weight --pvalue p_value --beta --bp chr_position --chr chr_name --chr-id c:l-ab --target NORM_PLINK_VCF --no-clump --out Output_NORM_PLINK_VCF_PRSice
The script runs, but I get the following error:

81192144 variant(s) not found in previous data 
237 variant(s) included 

There are a total of 1 phenotype to process 

Processing the 1 th phenotype 

Phenotype is a continuous phenotype 

Only one phenotype value detected and they are all -9. Not 
enough valid phenotype 

So I understand there is a problem with the phenotype file. The phenotype for this sample is unknown, which is why I want to calculate the PRS, but perhaps I am adding some extra parameters that are not necessary. Would you mind guiding me through this calculation? Thank you very much!
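The "all -9" message refers to the phenotype column of the target .fam file. A VCF carries no phenotype information, so PLINK writes its missing-phenotype code, -9, in column 6 when converting. A minimal sketch of how one might confirm this is shown below. It uses a made-up single-sample file, and the names `toy.fam` and `toy_pheno_counts.txt` are hypothetical.

```shell
# Toy .fam file for one sample; column 6 is the phenotype, and -9 means missing.
printf 'SAMPLE1 SAMPLE1 0 0 0 -9\n' > toy.fam

# Count how many samples carry each phenotype value (column 6 of the .fam file).
awk '{ count[$6]++ } END { for (v in count) print v, count[v] }' toy.fam > toy_pheno_counts.txt

cat toy_pheno_counts.txt
```

For the toy file this prints `-9 1`: one sample, with a missing phenotype. On the real data the same awk command would be run against NORM_PLINK_VCF.fam.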

@choishingwan
Owner

choishingwan commented Sep 30, 2022 via email
