confirm numbers for one of the Illumina BaseSpace Hiseq V4 reference datasets #5
Comments
Normally, PG should come with confident call regions in a BED file; these should be passed to hap.py using the -f switch. This will most likely not increase the recall, though. I'm not sure why it would be low (I don't have access to this project and don't know what workflow was run). An easy way to check whether the hap.py numbers are OK would be to run the VCF through VCAT on BaseSpace: https://basespace.illumina.com/apps/1800799/Variant-Calling-Assessment-Tool?preferredversion The most recent version of VCAT uses hap.py 0.2.x; the numbers might differ slightly, but shouldn't be too different.
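For anyone following along, here is a minimal sketch of how the confident regions could be passed with -f. The file names are placeholders, not the actual files from this project, and the helper name `build_happy_command` is hypothetical. The -r (reference FASTA) and -o (output prefix) options are included on the assumption that a typical hap.py run needs them.

```python
import shlex


def build_happy_command(truth_vcf, query_vcf, reference_fasta, out_prefix,
                        confident_bed=None):
    """Assemble a hap.py command line as a list of arguments.

    All paths are placeholders supplied by the caller. The confident
    regions BED file is optional and is passed with -f when given.
    """
    cmd = ["hap.py", truth_vcf, query_vcf,
           "-r", reference_fasta,
           "-o", out_prefix]
    if confident_bed is not None:
        # Restrict precision accounting to the truth set's confident regions.
        cmd += ["-f", confident_bed]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_happy_command(
        truth_vcf="PG_NA12877.vcf.gz",
        query_vcf="query_NA12877.vcf.gz",
        reference_fasta="reference.fa",
        out_prefix="happy_out",
        confident_bed="ConfidentRegions.bed.gz",
    )
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.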
I see, thanks for the info. There seems to be a discrepancy between versions, presumably due to a combination of differences in the hap.py code and the version of the PG VCFs. I took a VCF from a Public Dataset in BaseSpace that has been run against VCAT, and got the numbers shown below. https://basespace.illumina.com/analyses/23453440?preview=False&projectId=20407389 The VCAT report on BaseSpace shows higher numbers than the ones below, and was run against PG v7.0. I believe I am running it against PG v8.0.1. Compared with VCAT on PG v7.0, the SNV recall below is fractionally lower, but the indel recall drops from 0.8173 to 0.7805.
As long as these are versioning differences, I will carry on as is, as a happy hap.py user...
The numbers will be different between PG v7.0 and v8.0.1. Also, VCAT uses -f ConfidentRegions.bed.gz; this affects precision and Frac_NA.
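To illustrate why -f changes precision and Frac_NA but leaves recall alone, here is a rough sketch of the metrics. It assumes the usual definitions: recall = TP / (TP + FN) on the truth side, precision = TP / (TP + FP) on the query side, and Frac_NA = UNK / total query calls, where UNK are query calls outside the confident regions. The counts are made up and the function name is hypothetical; hap.py's exact accounting may differ in the details.

```python
def benchmark_metrics(truth_tp, truth_fn, query_tp, query_fp, query_unk):
    """Compute recall, precision and Frac_NA from raw counts.

    Assumed definitions (see lead-in): UNK calls are excluded from
    precision but counted in the Frac_NA denominator.
    """
    query_total = query_tp + query_fp + query_unk
    recall = truth_tp / (truth_tp + truth_fn) if (truth_tp + truth_fn) else 0.0
    precision = query_tp / (query_tp + query_fp) if (query_tp + query_fp) else 0.0
    frac_na = query_unk / query_total if query_total else 0.0
    return {"recall": recall, "precision": precision, "frac_na": frac_na}


if __name__ == "__main__":
    # Made-up counts. Without confident regions, every unmatched query
    # call is treated as a false positive.
    no_bed = benchmark_metrics(truth_tp=90, truth_fn=10,
                               query_tp=90, query_fp=10, query_unk=0)
    # With confident regions, 8 of those 10 calls fall outside the
    # regions and are reclassified as UNK instead of FP.
    with_bed = benchmark_metrics(truth_tp=90, truth_fn=10,
                                 query_tp=90, query_fp=2, query_unk=8)
    print("without -f:", no_bed)
    print("with -f:   ", with_bed)
```

In this toy case recall is the same in both runs, while precision rises and Frac_NA becomes non-zero once the out-of-region calls are set aside.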
The numbers seem close enough to me considering the differences in PG version and command line. Closing.
I downloaded one of the VCF files for NA12877 from a Public Dataset in BaseSpace (see screenshot).
https://basespace.illumina.com/analyses/18540672?preview=False&projectId=16095081
I ran it through hap.py and got the values below:
Are these numbers correct? I was expecting SNP recall to be on the order of 0.98 and SNP precision to be around 0.998.