Use of parentheses for predicted consequences #18
Replies: 10 comments 18 replies
-
The most important issue for me is that this proposal would create inconsistency and would therefore need very strong arguments for why the parentheses should be removed from protein descriptions. Parentheses are used throughout the HGVS nomenclature to indicate uncertainty and/or that values are inferred.
Removing the rule to use parentheses at the protein level when the variant has only been detected at the DNA level would create an illogical inconsistency in the nomenclature. It would also take away part of the descriptive power of the nomenclature, which can currently indicate at the DNA, RNA, and protein levels that a change is predicted/inferred and not observed directly. Strong arguments will be needed to justify this removal. The issues raised so far, to my knowledge, are:
This is factually incorrect, so I'm not sure how this was measured. Also, this argument would mean that all features of the HGVS nomenclature that seemingly aren't used (much?) should be removed. However, this feature is used to a great extent. LOVD, Mutalyzer, and VariantValidator all use parentheses to indicate that protein changes are predictions. On the other hand, ClinVar doesn't seem to value receiving protein descriptions much at all: their submission template doesn't have a field for protein descriptions, only one for alternative descriptions. In fact, they state that providing the protein-level description is not necessary because they always provide predictions, so we have to assume that all protein descriptions in ClinVar are predictions. I was told ClinVar doesn't use parentheses in its generated protein descriptions because submitters weren't providing them. This doesn't surprise me, since there isn't a protein change field (it sounds like a chicken-and-egg problem). In my opinion, it would be more logical to notify a few large annotation services that their generated protein descriptions aren't valid HGVS; Ensembl VEP, for example, has responded positively in the past when I highlighted bugs in their generated HGVS descriptions.
I'm not sure who is complaining here or what issues they encounter. Users of Mutalyzer or VariantValidator all get protein descriptions with parentheses, so I assume these are users of other annotation/prediction tools whose output ends up without parentheses. Perhaps they don't want to add them manually. In that case, the most logical course of action is to file a bug with their annotation software. A skilled programmer familiar with the annotation software would need only a few minutes to make sure that all predicted output uses parentheses from now on, so this can't be a real issue. If, however, they're running into problems finding data associated with this protein description in databases, several issues could be causing this. See the next point.
Let's actually look at that. It only takes a few minutes for a skilled programmer familiar with the software to make sure that matching is performed regardless of whether parentheses are used. However, the biggest problem is that the protein change is the least consistent of all descriptions across databases. E.g., ClinVar uses the
In addition to all the points above, suggesting a change to the HGVS nomenclature that removes the use of parentheses in protein descriptions creates a win/lose situation. There are people, databases, and systems relying on this specific use of parentheses, and this option would be taken away from them. The simplest solution, costing a mere fraction of the time already spent on this discussion, is to get databases that care about matching to implement a parentheses-agnostic method of matching the input against the database contents. It'll often be only one line of code, and it also solves the same problem at the DNA and RNA levels, where parentheses are also used to indicate predictions. This would create a win/win situation. The guidelines will remain consistent across the DNA, RNA, and protein levels. No time is wasted on changing the guidelines, updating systems (LOVD, Mutalyzer, VariantValidator), and checking and correcting hundreds of thousands of data entries in LOVD. People using and relying on the parentheses can still distinguish predictions from confirmed variants. And matching no longer relies on parentheses.
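The parentheses-agnostic matching suggested here can be sketched in Python. This is a minimal illustration under stated assumptions, not code from LOVD, Mutalyzer, VariantValidator, or any other database: descriptions are plain strings, and only the outer `p.(...)` wrapper that marks a predicted protein consequence is removed before comparison. The function names (`normalize_protein_description`, `is_predicted`, `find_matches`) are hypothetical. The sketch is limited to the protein level because in DNA and RNA descriptions parentheses can also appear inside a description to mark uncertain positions, so stripping them there would need more care.

```python
import re

# Hypothetical pattern for descriptions such as "p.(Arg727Ser)" or
# "ENSP00000233741.4:p.(Thr367AsnfsTer13)": an optional reference prefix ending
# in ":", then "p.", then the whole change wrapped in parentheses (the HGVS
# marker for a predicted consequence).
_PREDICTED_PROTEIN = re.compile(r"^(?P<prefix>(?:[^:]+:)?p\.)\((?P<change>.+)\)$")


def normalize_protein_description(description: str) -> str:
    """Return a matching key with the outer predicted-consequence parentheses removed.

    Hypothetical helper: it does not validate HGVS syntax, it only trims
    whitespace and unwraps p.(...) so that p.(Arg727Ser) and p.Arg727Ser
    produce the same key.
    """
    text = description.strip()
    match = _PREDICTED_PROTEIN.match(text)
    if match:
        return match.group("prefix") + match.group("change")
    return text


def is_predicted(description: str) -> bool:
    """True if the description uses the p.(...) predicted-consequence form."""
    return _PREDICTED_PROTEIN.match(description.strip()) is not None


def find_matches(query: str, entries: list[str]) -> list[str]:
    """Return stored entries equal to the query once the wrapper is ignored."""
    key = normalize_protein_description(query)
    return [entry for entry in entries if normalize_protein_description(entry) == key]


stored = ["p.(Arg727Ser)", "p.Arg727Cys", "ENSP00000233741.4:p.Thr367AsnfsTer13"]
print(find_matches("p.Arg727Ser", stored))  # ['p.(Arg727Ser)']
# ['ENSP00000233741.4:p.Thr367AsnfsTer13']
print(find_matches("ENSP00000233741.4:p.(Thr367AsnfsTer13)", stored))
```

With this key, a query for `p.Arg727Ser` finds an entry stored as `p.(Arg727Ser)` and vice versa, while `is_predicted()` keeps the prediction status available for display rather than discarding it.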
-
From what I recall from previous conversations, the main argument against the use of parentheses is that they convey no information about the variants themselves, but rather about how the variants were identified. I can see why people would like to keep these concepts separate, but I can also see why others find it important to keep the annotation close by.
Related to the above, the use of parentheses introduces a minor inconvenience in normalisation procedures because the annotation needs to be preserved, e.g., normalisation of
Concerns about the adoption of parentheses are valid in principle, as the amount of information they convey depends on how consistently they are used. Perhaps database owners and/or curators can estimate to what extent this concern is relevant in practice.
Personally, I have no strong opinion about the use of parentheses. I do have some concerns about the clarity (and perhaps internal consistency) of the relevant guidelines in their current form. I may come back to this later, depending on the outcome of this discussion.
-
The inherent issue I see with the use of parentheses is that the specification infers meaning from the lack of markup. That is, the lack of parentheses means that some non-typical analysis (sequencing at the RNA or protein level) was done. However, that can't be distinguished from simple omission, and there's no way to independently determine the method of analysis (other than reading a paper describing the variant, if there is one). So we functionally have to consider everything to be predicted, and therefore assume that parentheses should be added to all r. and p. descriptions. To me, that means they confer no additional information.
Furthermore, while the original request was to consider the use of parentheses in protein descriptions, I believe the same issue applies to their use in DNA and RNA descriptions. The only difference is that most sequencing is done on DNA, so parentheses aren't called for in g. and c. descriptions, and r. descriptions are rarely used in the literature and in submissions. I also find that the definition of parentheses isn't very clear.
So there are two separable meanings (uncertain vs. predicted), and the meaning of "theoretically deduced" isn't spelled out explicitly, so users may not realize how it should apply to g., c., and r. expressions. Taking a look at how parentheses are treated in Mutalyzer and VariantValidator:
So both fit the spec for p. descriptions, under the assumption that parentheses should always be present (in which case I'd argue that they're not really adding anything), and the mapping between c., g., and r. doesn't always add parentheses when changing molecule types. The potential value of knowing whether an r. expression is based on RNA sequencing or inferred from DNA sequencing would arise in cases where RNA editing changes the sequence. Similarly, the potential value for p. expressions would come from RNA editing or rare cases of alternative tRNA usage (e.g., translation of stop codons as selenocysteine, or as some other amino acid in the case of stop-codon readthrough). Otherwise, inferring from DNA is reliable. If the HGVS spec had originally been stated the other way, such that adding some markup meant the molecule had been sequenced directly, then it would be useful. But as it stands, any logic seemingly must assume that everything comes from DNA sequencing by default and that a lack of parentheses is an error of omission, which renders the markup uninformative.
-
When this is discussed, I would like us to consider the parallel between the decision here and inferred sequence junctions, as discussed in #15.
-
This issue still seems to be unresolved, so I would like to request a vote on the topic by at least the current HGVS committee, if not engagement of the external community. I'd be happy to help with this engagement. However, I would like to clarify the request: it is not to remove parentheses, but instead to make their use consistent. The current recommendation requires varied use of parentheses depending on whether there is scientific evidence that a protein is produced with the predicted change.
Challenges with the varied use of parentheses. Please note that I have included Terrence Murphy's argument as well.
-
My two cents.
To me, this sounds like an argument for a separate column to explicitly indicate prediction status, instead of having two independent types of information in one field.
Both this and the converse are true (different effects may share the same description). I would be in favour of solving these issues (i.e., a variant description should be lossless and depend only on the reference and modified sequences). The possibility of finding DNA or RNA variants that share the same (predicted) effect would be a great asset.
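The separate-column idea can be sketched as a record type in which the protein change and its prediction status are independent fields. This is a hypothetical illustration, not an existing schema: the class name `ProteinConsequence` and its fields are assumptions, and the example values reuse the VEP example quoted elsewhere in this thread. Lookups use a key that ignores prediction status, while rendering reproduces the current HGVS convention of wrapping predicted consequences in parentheses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinConsequence:
    """Hypothetical record keeping the change and its prediction status apart."""

    reference: str   # protein reference sequence, e.g. "ENSP00000233741.4"
    change: str      # change without "p." or parentheses, e.g. "Thr367AsnfsTer13"
    predicted: bool  # True when no RNA or protein sequence was analysed

    def matching_key(self) -> str:
        # Identity for lookups: prediction status is deliberately ignored.
        return f"{self.reference}:p.{self.change}"

    def to_hgvs(self) -> str:
        # Display form following the current HGVS rule:
        # predicted consequences are wrapped in parentheses.
        body = f"({self.change})" if self.predicted else self.change
        return f"{self.reference}:p.{body}"


predicted = ProteinConsequence("ENSP00000233741.4", "Thr367AsnfsTer13", predicted=True)
observed = ProteinConsequence("ENSP00000233741.4", "Thr367AsnfsTer13", predicted=False)
print(predicted.to_hgvs())  # ENSP00000233741.4:p.(Thr367AsnfsTer13)
print(observed.to_hgvs())   # ENSP00000233741.4:p.Thr367AsnfsTer13
print(predicted.matching_key() == observed.matching_key())  # True
```

Keeping `predicted` as its own field means the status can still be shown (or hidden) at display time without affecting how records are matched.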
-
Variants
A description is lossless when it contains all the information needed to reconstruct the modified (or observed) sequence. We could describe the effect of the first variant above as a
Variants
-
Annotation tools (e.g., VEP) just spit out HGVS nomenclature with or without parentheses depending on a command-line option, not based on knowing which is experimentally correct. So these annotations never convey actual knowledge. For example, the VEP documentation (https://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html) describes an option that simply forces all p. descriptions to have parentheses: "Force --hgvs to return the HGVSp notation in predicted format. For example, ENSP00000233741.4:p.Thr367AsnfsTer13 will be returned as ENSP00000233741.4:p.(Thr367AsnfsTer13)."
This is a valid question, but the answer is that it was based on extensive input from the community. My first ClinGen grant funded many focus groups and efforts to work with ClinVar to provide guidance on how ClinVar represented information. For gnomAD, we also routinely interact with the community and perform user surveys to gather input on how people want information provided. There is overwhelming objection to the use of parentheses in the HGVS rules.
I'm not sure I follow. You'd have to deploy custom search algorithms for every search tool used to find information on variants, from common ones (PubMed, Google Scholar, etc.) to thousands of more obscure ones.
The truth is that both free-text and structured information are needed. For example, for all of ClinGen's expert panels, we provide both structured and free-text evidence. If you follow this link: https://erepo.clinicalgenome.org/evrepo/ui/classification/e8b1ce82-bc97-4baf-b442-294c2a6849cd you will see that the structured evidence codes PM3_Very Strong, PS4, PP1, and PP3 are listed as applicable and all others as not applicable. These codes are useful for AI/ML-based uses of the aggregate data in ClinVar, but they don't begin to convey the nuances of the actual complex data that has been assembled on this variant, or the rationale for how the committee classified the variant with respect to pathogenicity.
You'll also see that many labs that submit to ClinVar take a similar approach, applying both structured evidence codes and free text. But in focus groups led by ClinVar, the community overwhelmingly highlighted the free-text evidence descriptions submitted by the labs as the most useful information that ClinVar provides, far more useful than the structured evidence codes.
No one ever relies on the HGVS nomenclature parentheses to convey information, given how rarely they are used and that, when they are used, they are largely applied automatically rather than on the basis of the actual experimental data.
In the new ACMG guidelines we are currently working on, we will have more codes that differentiate predicted impacts (from splicing prediction algorithms like SpliceAI, Pangolin, etc., as well as from missense in silico predictors), and separate codes for conveying many different types of experimentally derived data. All of these codes will be structured and will carry point values that convey the strength of evidence in addition to the type of evidence. This is where this type of information should be collected, not in the nomenclature.
-
Heidi suggests: I propose that parentheses should always be used and that the c./p. format be consistent with the format used in ClinVar: NM_004004.6(GJB2):c.101T>C (p.Met34Thr).
-
Regarding the first issue, simply use "GJB2 NM_004004.6:c.101T>C": you list the gene (for human recognition) followed by the variant description according to the HGVS nomenclature standard. Regarding point 3, we will not agree. ACMG covers variant classification; HGVS nomenclature covers variant description, and HGVS nomenclature states that predicted descriptions should be given in parentheses.
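The two display conventions discussed here can be compared with a small sketch that assembles each string from the same components. The function names are hypothetical and nothing is validated; the point is that in the gene-then-HGVS form the gene symbol sits outside the HGVS description, so `NM_004004.6:c.101T>C` can be copied out unchanged.

```python
def clinvar_style(transcript: str, gene: str, c_change: str, p_change: str) -> str:
    """Hypothetical helper reproducing the ClinVar-style title quoted in this thread.

    The gene symbol is embedded in the reference part, and the protein
    consequence is appended in parentheses after a space.
    """
    return f"{transcript}({gene}):c.{c_change} (p.{p_change})"


def gene_then_hgvs(gene: str, transcript: str, c_change: str) -> str:
    """Hypothetical helper: gene symbol for human recognition, a space,
    then an unmodified HGVS description."""
    return f"{gene} {transcript}:c.{c_change}"


def hgvs_part(display: str) -> str:
    """Recover the HGVS description from the gene-then-HGVS form."""
    _gene, description = display.split(" ", 1)
    return description


title = clinvar_style("NM_004004.6", "GJB2", "101T>C", "Met34Thr")
label = gene_then_hgvs("GJB2", "NM_004004.6", "101T>C")
print(title)             # NM_004004.6(GJB2):c.101T>C (p.Met34Thr)
print(label)             # GJB2 NM_004004.6:c.101T>C
print(hgvs_part(label))  # NM_004004.6:c.101T>C
```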
-
The current HGVS nomenclature recommendations state: predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses, e.g. p.(Arg727Ser).
People have proposed getting rid of this rule and using p.Arg727Ser irrespective of whether there is experimental evidence. In my opinion, this is not a good idea: the use of "()" is the standard HGVS format to indicate uncertainty. When people do not want to distinguish between predicted and not predicted, they can use Arg727Ser, i.e. omit the "p.", thereby not following the HGVS recommendations.