With our standardized samples for molecular analysis, we make quantitative, objective benchmarking possible, but at Zymo Research, our motto is "The beauty of science is to make things simple." With that in mind, we present you with the MIQ score, a simplified metric for grading the accuracy of an analysis relative to a known standard sample input.
Publication on this method is pending. Watch this space for news and a link once it is available on bioRXiv.
This package is designed to work with Python 3.6.7 and Matplotlib 3.0.2. It will most likely be compatible with other recent versions of both.
This package is designed to be either copied in as a Python module or (my preferred method) cloned in as a git submodule
The general workflow for this system is to load a set of standard reference data that will describe various properties of your standard sample, generate read counts or some other measure of frequency or relative abundance of signals from your analysis of the standard stamples, and create a Python object from these that contains a MIQ Score for the sample as well as some plots that may give information as to the source of any bias or inaccuracy in the test.
This method is your most likely starting point for the use of this package. This method will instantiate your StandardReference class object. The path should point to JSON formatted key:values that describe the standard as well as how data may be delivered and aliases for template names. For an example of this using the Zymo Research Mock Microbial Community Standard, please look in the reference handler directory of this package.
Attribute | Type | Description |
---|---|---|
nameLookup | dict | Key values linking potential templates (reference contigs) to their source name. Especially useful in microbiome, since a read may map to different templates but still be effectively the same source. |
printNames | dict | Key values linking source names to how they should be printed for figures |
itemIDs | list | List of possible read sources for this standard |
sortings | dict of lists | Possible ways to sort the read sources, generally useful to detect bias at one end of the spectrum or the other |
expectedValues | dict of dicts | Dictionary of analysisMethod:readSource:expectedValue. Generally relative abundances in percent. |
analysisMethods | list | List of analysis methods from the above value. |
miqScoreNGSReadCountPublic.MiqScoreCalculator(standardReference: miqScoreNGSReadCountPublic.referenceHandler.StandardReference, analysisMethod:str, percentToleranceInStandard:[int, float]=0, floor:[int, float]=0)
This method will instantiate a MIQ Score calculator object, which is the "business end" of this entire package. Required arguments include a standard reference of StandardReference type (see previous item for more info on that type) and an analysis method (chosen from possible analysis methods for the standard). If you are unsure of potential analysis methods for the standard, standardReference.analysisMethods will list them if the object is loaded, or they will be visible in the JSON storing the reference data. Optional arguments include percentToleranceInStandard, which represents the manufacturing tolerances of your standard expressed as a percentage (enter 15 for 15% and not 0.15), and floor, which is a minimum value for the miqScore (defaults to zero, but can be set to None if unwanted). The percent tolerance will be used to adjust any deviations from expected that are observed in your measurement, as this package assumes that any deviation from expected that can be explained by manufacturing tolerances in the standard should be treated as such. Generally, having the miq score floor out to zero improves the ease of understanding and rapid analysis for scores where there is an upper limit of 100 and no set lower limit.
This method will calculate a MIQ score for a set of input data and return a MiqScoreData object. The MiqScoreData object is the main output from this package and has the following attributes and methods:
Reads will generally be divided into expected/reference reads and unexpected/nonreference reads here. Expected/reference reads will be only reads that aligned to an expected read source. Unexpected/nonreference reads will be those that aligned to something other than an expected reference sequence, were unalignable, or were removed before alignment. These can help diagnose problems in library quality or other issues. These are not used to calculate the MIQ score (although they can be used to explain an unexpectedly low score due to issues with sample/library preparation and sequencing).
Attribute/method | Type | Description |
---|---|---|
miqScore | float | The MIQ score, which may be floored if a floor value was set when calculating |
rawMiqScore | float | The MIQ score without any floor. Will be equivalent to miqScore if no floor was set |
referenceReadCounts | dict | Key values of reads from expected sources:count. Keys will be their unique identifier from the StandardReference object |
nonReferenceReadCounts | dict | Dictionary with all nonreference/unexpected read counts and counts of any reads removed prior to alignment |
samplePercentages | dict | Relative abundances of each reference read |
samplePercentagesOfExpected | dict | Abundances of each reference read type relative to expected (ie. 100 means that the count seen was exactly what was expected) |
percentToleranceInStandard | float | Directly stored from the MiqScoreCalculator object that created it for tracibility |
analysisMethod | str | Directly stored from the MiqScoreCalculator object that created it for tracibility |
standardReference | StandardReference | Directly stored from the MiqScoreCalculator object that created it for tracibility |
readFateTable | dict | This table will be similar to nonReferenceReadCounts except that it will contain an additional group called "Reference" that has a count of all reads that ultimately aligned to an expected reference sequence. |
sampleID | str | Directly stored from the MiqScoreCalculator object that created it for tracibility |
storePlots | bool | If set to false, plots will be regenerated each time the plot creation method is called, otherwise they will be stored (see below for how) |
plots | dict | Dictionary used to store plots. Plots are all stored as base64-encoded PNG files (although other formats can be specified when calling the plot creation method). Keys will be plot names or groups of plots. Values will either be the base64 string of the plot file or, if is a group of plots (such as radar plots), a dictionary of plotName:plotData in base64. |
makeReadFateChart(format:str="png", forceRedraw:bool=False, readFatePrintNames:dict=None) | str (base64) | Generates a pie chart describing the fate of all reads that entered the analysis from the read fate table. Expected reads will be grouped as "Reference." format should denote a valid file saving format for matplotlib and defaults to PNG (although SVG is another very good option for vector format). forceRedraw will cause the plot to be regenerated even if it has already been saved. readFatePrintNames can contain a dictionary where any read fate identifier in the read fate table with a key present will be changed to the corresponding value when generating a key for the figure. |
makeRadarPlots(format:str="png", forceRedraw:bool=False) | dict of str:base64 | Iterates over possible sorting methods for standard and generates radar plots to help with bias detection. Output will be a dictionary of sortingMethod:base64 of plot. This takes optional format and forceRedraw objects with the same behavior as makeReadFateChart. |
makeCompositionBarPlot(goodExample:dict=None, badExample:dict=None) | str (base64) | Makes a composition bar plot that is typical of microbiome samples. Can take in a good example of samplePercentages (in a dictionary) and a badExample likewise. |
jsonOutput | str | Returns a string of JSON-formatted output to store a detailed report on the sample |
miqScoreNGSReadCountPublic.MiqScoreCalculator.generateReport(replacementTable:dict, template:str, sampleMiq:MiqScoreData, goodExampleMiq:MiqScoreData, badExampleMiq:MiqScoreData, readFatePrintNames:dict=None)
Reads will generally be divided into expected/reference reads and unexpected/nonreference reads here. Expected/reference reads will be only reads that aligned to an expected read source. Unexpected/nonreference reads will be those that aligned to something other than an expected reference sequence, were unalignable, or were removed before alignment. These can help diagnose problems in library quality or other issues. These are not used to calculate the MIQ score (although they can be used to explain an unexpectedly low score due to issues with sample/library preparation and sequencing).
Attribute/method | Type | Description |
---|---|---|
replacementTable | dict | Dictionary of values to replace in the HTML template (value identifiers should be surrounded with two percent symbols for %%VALUE%% |
template | str (HTML template) | HTML template with values to be replaced surrounded by double percents (see above) |
We welcome and encourage contributions to this project from the scientific community and will happily accept and acknowledge input (and possibly provide some free kits as a thank you). We aim to provide a positive and inclusive environment for contributors that is free of any harassment or excessively harsh criticism. Our Golden Rule: Treat others as you would like to be treated.
We use a modification of Semantic Versioning to identify our releases.
Release identifiers will be major.minor.patch
Major release: Newly required parameter or other change that is not entirely backwards compatible Minor release: New optional parameter Patch release: No changes to parameters
- Michael M. Weinstein - Project Lead, Programming and Design - michael-weinstein
- Aishani Prem - Testing, Design - AishaniPrem
- Mingda Jin - Testing, Code Review - jinmingda
- Shuiquan Tang - Design - shuiquantang
- Jeffrey Bhasin - Design, Code Review - jeffbhasin
- Justin J. Lin - Design - jjlinscientist
- Ryan Kemp - Design - Zymo Research
- Brian Janssen - Design - Zymo Research
- Christopher E. Mason - Guidance, Testing - Cornell University
See also the list of contributors who participated in this project.
This project is not currently licensed it will likely be licensed under the GNU GPLv3 License - see the LICENSE file for details. This license restricts the usage of this application for non-open sourced systems. Please contact the authors for questions related to relicensing of this software in non-open sourced systems.
We would like to thank the following, without whom this would not have happened:
- The Python Foundation
- The staff at Zymo Research
- Our customers