Obtaining PRS from a single individual using PGS Catalog files #302
You also want to add --no-regress, since you don't need to run the
regression to optimize the parameters.
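As a rough sketch of that suggestion, the command quoted below could be rerun with --no-regress appended. Every path, file name, and column name here is copied from the question and should be treated as a placeholder; the function name prsice_no_regress_cmd is hypothetical and only prints the command so it can be reviewed before running.

```shell
# Sketch: the command from the question with --no-regress added.
# All paths and file names are placeholders copied from the question.
prsice_no_regress_cmd() {
  echo Rscript /home/user/data/PRSice.R \
    --prsice /home/user/data/PRSice_linux \
    --base /home/user/data/PGS000004_withpval.txt \
    --a1 effect_allele --a2 other_allele \
    --stat effect_weight --pvalue p_value --beta \
    --bp chr_position --chr chr_name --chr-id c:l-ab \
    --target NORM_PLINK_VCF \
    --no-clump --no-regress \
    --out Output_NORM_PLINK_VCF_PRSice
}

# Print the command for review:
#   prsice_no_regress_cmd
# Execute it once it looks right:
#   eval "$(prsice_no_regress_cmd)"
```

With --no-regress, PRSice skips the phenotype regression and only writes out the scores, so a target with missing phenotypes should no longer stop the run.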
…On Mon, Sep 12, 2022, 11:05 AM aheritas ***@***.***> wrote:
Hi! I have been reading the documentation and some of your answers in
forums. I understand that it is possible to calculate the PRS for a
single individual using the PGS Catalog files. I have tried to do so using
PRSice-2 but I have been unsuccessful. I am sharing my detailed steps here
and I would be grateful if you could guide me on how to troubleshoot this.
I want to calculate the PRS for breast cancer (PGS000004
<https://www.pgscatalog.org/score/PGS000004/>). I know that the scoring
file for this particular PRS does not include RSIDs, only genomic
positions (I am using this harmonized file
<https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000004/ScoringFiles/Harmonized/PGS000004_hmPOS_GRCh37.txt.gz>
for GRCh37). According to some of your answers in a forum
<https://www.biostars.org/p/9463113/>, when using PGS Catalog files we
should add an additional column containing all 1 or 0 as p-values. I
modified this file to include a new column (named p_value) that contains
all 1, resulting in PGS000004_withpval.txt
<https://github.com/choishingwan/PRSice/files/9549230/PGS000004_withpval.txt>.
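The column-adding step described above could look roughly like the following. This is a minimal sketch, not the exact procedure used in the question: it assumes the scoring file is tab-delimited, that its metadata lines start with "#", and that the first remaining line is the column header. The function name add_pvalue_column is hypothetical.

```shell
# Sketch: append a constant p_value column to a tab-delimited scoring file.
# Reads the decompressed file on stdin and writes the result to stdout.
add_pvalue_column() {
  awk 'BEGIN { FS = OFS = "\t"; header_done = 0 }
       /^#/ { next }                  # drop metadata lines (assumed to start with "#")
       !header_done { print $0, "p_value"; header_done = 1; next }  # column header row
       { print $0, 1 }                # every variant gets a p-value of 1
  '
}

# Example usage (file names taken from the question):
#   gunzip -c PGS000004_hmPOS_GRCh37.txt.gz | add_pvalue_column > PGS000004_withpval.txt
```

Metadata lines are dropped here so that the first line of the output is the header row that the column-name options refer to.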
My input file is a VCF obtained from imputation software, containing
approx. 80M variants. The first thing I do is normalize this VCF using
bcftools, so that there is a single row per genomic position.
bcftools norm -m +any -O z -o NORMALIZED_VCF /home/user/data/ORIGINALFILEVCF_imputed.vcf.gz
Then, I transform this file into the necessary input files for PRSice
(.bed, .bim, .fam) using PLINK v1.9.
plink --vcf /home/user/data/NORMALIZED_VCF.vcf.gz --snps-only --make-bed --out NORM_PLINK_VCF
Finally, I run PRSice, with the following parameters:
Rscript /home/user/data/PRSice.R --prsice /home/user/data/PRSice_linux --base /home/user/data/PGS000004_withpval.txt --a1 effect_allele --a2 other_allele --stat effect_weight --pvalue p_value --beta --bp chr_position --chr chr_name --chr-id c:l-ab --target NORM_PLINK_VCF --no-clump --out Output_NORM_PLINK_VCF_PRSice
The script runs, but I get the following error:
81192144 variant(s) not found in previous data
237 variant(s) included
There are a total of 1 phenotype to process
Processing the 1 th phenotype
Phenotype is a continuous phenotype
Only one phenotype value detected and they are all -9. Not enough valid phenotype
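The -9 in this message is PLINK's code for a missing phenotype, which is what the sixth column of the .fam file holds after a conversion from VCF, since a VCF carries no phenotype. A quick way to confirm this is to tally that column. The sketch below assumes a standard whitespace-delimited .fam file; the function name count_fam_phenotypes is hypothetical.

```shell
# Sketch: tally the phenotype values (6th column) of a PLINK .fam file.
# Reads the .fam file on stdin and prints "value count" pairs.
count_fam_phenotypes() {
  awk '{ n[$6]++ } END { for (v in n) print v, n[v] }' | sort
}

# Example usage (file name taken from the question):
#   count_fam_phenotypes < NORM_PLINK_VCF.fam
```

If the only line printed is for -9, every sample has a missing phenotype and there is nothing to regress against.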
So I understand there is a problem with the phenotype file. The phenotype
of this individual is unknown, which is why I want to calculate the PRS,
but perhaps I am adding some extra parameters that are not necessary.
Would you mind guiding me through this calculation? Thank you very much!