Assign quality metrics to each genotype as they are loaded #5

Open
carolyncaron opened this issue Mar 9, 2017 · 0 comments

Since the loader already saves meta-data for each genotype call in the VCF format, it would be extremely useful to let the user set thresholds for what constitutes a good or bad quality call, and then store that information directly in the database, also as meta-data. For our purposes, it is more efficient to calculate and store quality metrics once at load time than to perform these calculations on the front-end every time a user requests genotype calls through a web interface.

For example,

  • The user specifies that they want a quality metric named "NDQR" to be saved as meta-data.
  • The user also provides thresholds, let's say a lower and an upper threshold, to categorize each call as "bad", "acceptable" or "excellent" quality. These thresholds are based on meta-data that is already present in the genotype file. In this case, the user may specify the lower threshold as a read depth (DP) of 5 and an allele depth (AD) of 4, whereas the upper threshold requires a read depth of 50 and an allele depth of 45.
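To make the example concrete, here is a minimal sketch of how a call could be categorized. It is not the loader's actual code: the names `Threshold`, `classify_call`, `LOWER` and `UPPER` are hypothetical, AD is assumed to be a single integer for the called allele, and a call is assumed to need both DP and AD at or above a threshold to reach that category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Minimum read depth (DP) and allele depth (AD) for a category (hypothetical type)."""
    dp: int
    ad: int


# Values taken from the example above.
LOWER = Threshold(dp=5, ad=4)
UPPER = Threshold(dp=50, ad=45)


def classify_call(dp: int, ad: int, lower: Threshold = LOWER, upper: Threshold = UPPER) -> str:
    """Return "bad", "acceptable" or "excellent" for one genotype call.

    Assumption: both DP and AD must meet a threshold for the call to
    qualify for that category.
    """
    if dp >= upper.dp and ad >= upper.ad:
        return "excellent"
    if dp >= lower.dp and ad >= lower.ad:
        return "acceptable"
    return "bad"


# The category would then be stored as meta-data under the user's metric name.
metadata = {"NDQR": classify_call(dp=12, ad=10)}
```

With these thresholds, a call with DP 12 and AD 10 falls between the two thresholds and is stored as "acceptable". Whether a call that passes on DP but fails on AD should be downgraded is a design choice we would still need to settle.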

We would also like to extend this to all file formats and not just VCF, by allowing an optional column in the legacy format, and by adding the ability to parse key-value pairs within the genotype call column of the matrix format. This gives the user the most flexibility to specify quality however they want based on their data (for example, if they happen to know read depth or have a percentage-based quality score), regardless of file format.
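As a rough sketch of the matrix-format parsing, assuming a cell syntax we have not decided on yet: the genotype call comes first, followed by `;`-separated `KEY=VALUE` pairs (for example `AA;DP=12;AD=10`). Both the syntax and the function name `parse_matrix_cell` are hypothetical.

```python
from typing import Dict, Tuple


def parse_matrix_cell(cell: str, sep: str = ";") -> Tuple[str, Dict[str, str]]:
    """Split one matrix cell into (genotype call, meta-data dict).

    Assumed cell syntax (not a decided format): "CALL;KEY=VALUE;KEY=VALUE".
    A cell holding only a call, e.g. "AT", yields an empty dict.
    """
    parts = [part.strip() for part in cell.split(sep)]
    call = parts[0]
    meta: Dict[str, str] = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing separator
        key, equals, value = part.partition("=")
        if not equals:
            raise ValueError(f"expected KEY=VALUE but got {part!r}")
        meta[key.strip()] = value.strip()
    return call, meta


call, meta = parse_matrix_cell("AA;DP=12;AD=10")
```

Values are kept as strings here so that percentage-based scores and read depths can go through the same path; converting them to numbers would happen when the thresholds are applied.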

For reference, this is the thought process that @laceysanderson and I worked through on the whiteboard:
[image: img_20170309_151604]
