+ The MolecularDefinition resource is designed to represent genetic molecules (e.g., sequences). It can represent a genetic molecule in several ways, allowing implementers to adopt the most effective representation for their use case.
+
+ The MolecularDefinition resource is designed to represent a single sequence or a composite of genetic sequences (e.g., a haplotype). Each genetic molecule might have multiple representations, but implementers SHALL ensure all representations are for the same molecule. This means that if a single MolecularDefinition instance contains a literal, two formatted files, and a relative, all four of those representations must represent the same genetic molecule (e.g., sequence). This can be a challenge across systems, as semantic equivalency of sequences cannot be guaranteed unless there is an agreed-upon standard between sending and receiving systems.
+
+ The MolecularDefinition resource should only be used to capture a molecular sequence or a composite of molecular sequences. It will not be used for other entities such as variants, variant annotations, etc. Those concepts will be captured in Observation profiles found in the Genomics Reporting Implementation Guide. The observed sequence that led to the identification of those concepts can be delivered with this resource and will be referenced by those observations.
+
+ MolecularDefinition will not be used to capture data such as precise reads of DNA sequences or sequence alignments. Such data may be accessible through the GA4GH (Global Alliance for Genomics and Health) API and may be referenced by the formatted element.
+
+ The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence.
+ This resource supports three patterns for representing a sequence of interest:
+
+ literal: This string element can be used to hold the sequence as a string of characters.
+
+ formatted: This Attachment is used to refer to the sequence as embedded file content or via a URL reference.
+
+ This method can be used to refer to sequence data in an external source. If the sequence refers to a GA4GH repository, formatted.url should point to a GA4GH-compliant endpoint that conforms to GA4GH data models.
+
+ relative: This complex element is used to encode a sequence as a starting sequence plus edits. When the starting sequence and the edits are provided, the observed sequence can be derived from them, as illustrated in the picture below:
+
+ relative.ordinalPosition: Indicates the order in which the sequence should be considered when putting multiple relative instances together.
+
+ relative.sequenceRange: Indicates the nucleotide range in the composed sequence when multiple relative instances are used together.
+
+ These attributes help clarify which sequence is being represented, with less computation or inference on the recipient side. Implementers SHOULD use sequenceRange first to determine order, as it is the most reliable. If sequenceRange is not present, then ordinalPosition SHOULD be used. Finally, if both sequenceRange and ordinalPosition are absent, then the order of the relative data elements SHOULD be used to calculate a composition. It is the responsibility of the data sender to ensure the message can be consistently understood. Additionally, gaps in sequenceRange are considered intentional (i.e., the composed sequence contains a run of N's, the placeholder nucleotide, for the gap range).
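The ordering rules above can be sketched in Python as follows. This is a hypothetical illustration, not part of the specification: RelativePiece and compose are invented names, each piece's sequence is assumed to be already derived, sequenceRange is assumed to be 0-based and end-exclusive, and each rule is applied set-wide (it is used only when every relative instance carries the element). Gaps between ranges are filled with N, as the text requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelativePiece:
    """Hypothetical, simplified stand-in for one relative instance."""
    sequence: str                          # already-derived sequence for this piece
    range_start: Optional[int] = None      # sequenceRange start (assumed 0-based, inclusive)
    range_end: Optional[int] = None        # sequenceRange end (assumed 0-based, exclusive)
    ordinal_position: Optional[int] = None # ordinalPosition


def compose(pieces: list[RelativePiece]) -> str:
    """Order pieces by sequenceRange, else ordinalPosition, else document order."""
    # Rule 1: every piece has a sequenceRange -> sort by range, fill gaps with N.
    if pieces and all(p.range_start is not None and p.range_end is not None for p in pieces):
        ordered = sorted(pieces, key=lambda p: p.range_start)
        parts: list[str] = []
        cursor = ordered[0].range_start  # assumption: composition starts at the first range
        for p in ordered:
            if p.range_start < cursor:
                raise ValueError("overlapping sequenceRange values")
            if len(p.sequence) != p.range_end - p.range_start:
                raise ValueError("sequence length does not match its sequenceRange")
            parts.append("N" * (p.range_start - cursor))  # intentional gap -> N placeholders
            parts.append(p.sequence)
            cursor = p.range_end
        return "".join(parts)
    # Rule 2: no usable sequenceRange -> sort by ordinalPosition.
    if pieces and all(p.ordinal_position is not None for p in pieces):
        return "".join(p.sequence for p in sorted(pieces, key=lambda p: p.ordinal_position))
    # Rule 3: fall back to the order in which the relative elements appear.
    return "".join(p.sequence for p in pieces)
```

For example, compose([RelativePiece("GG", 5, 7), RelativePiece("AC", 0, 2)]) yields "ACNNNGG": the ranges put AC first, and the three-position gap becomes NNN.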
+
+ relative.startingSequence: There are four optional ways to represent a starting sequence in the MolecularSequence resource:
+
relative.startingSequence.sequenceCodeableConcept: Starting sequence ID in a public database.
relative.startingSequence.sequenceString: Starting sequence string.
relative.startingSequence.sequenceReference: Reference to a starting sequence stored in another sequence entity.
relative.startingSequence.genomeAssembly, relative.startingSequence.chromosome: The combination of genome assembly and chromosome.
+ The relative.startingSequence.windowStart and relative.startingSequence.windowEnd elements define a range within the starting sequence; the subsequence in that range is used as the starting sequence.
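A minimal sketch of how a recipient might derive the observed sequence: cut the window out of the starting sequence, then apply the edits. Edit and derive are hypothetical names; the edit's start/end positions, their 0-based end-exclusive convention, and the check against the replaced bases (cf. relative.edit.replacedSequence and relative.edit.replacementSequence) are assumptions for illustration. Overlapping edits are not handled.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical simplified edit on the windowed starting sequence.

    Replaces positions [start, end) (assumed 0-based, end-exclusive).
    """
    start: int
    end: int
    replaced: str     # expected original bases (cf. relative.edit.replacedSequence)
    replacement: str  # new bases (cf. relative.edit.replacementSequence)


def derive(starting: str, window_start: int, window_end: int, edits: list[Edit]) -> str:
    # Cut the window out of the starting sequence (assumed 0-based, end-exclusive).
    seq = starting[window_start:window_end]
    # Apply non-overlapping edits right-to-left so earlier coordinates stay valid.
    for e in sorted(edits, key=lambda x: x.start, reverse=True):
        found = seq[e.start:e.end]
        if found != e.replaced:
            raise ValueError(f"expected {e.replaced!r} at {e.start}, found {found!r}")
        seq = seq[:e.start] + e.replacement + seq[e.end:]
    return seq
```

Checking replaced against the actual bases catches a mismatch between the sender's and the recipient's starting sequence, which is one practical guard for the equivalency problem noted earlier.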
+
+ When sequence information is stored, the nucleotides are numbered in order. Some representations use a 0-based system (e.g., the GA4GH API, BAM files), while others use a 1-based system (e.g., the VCF file format). The coordinateSystem element records which system is used.
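As a rough sketch of what converting between the two numbering systems involves, the functions below assume the common conventions that a 0-based interval is half-open [start, end) and a 1-based interval is fully closed [start, end]. The text does not state these interval conventions, and the function names are hypothetical; the authoritative codes are in the LOINC answer list bound to coordinateSystem.

```python
def one_based_to_zero_based(start: int, end: int) -> tuple[int, int]:
    """1-based closed [start, end] -> 0-based half-open [start, end) (assumed conventions)."""
    if start < 1 or end < start:
        raise ValueError("invalid 1-based interval")
    return start - 1, end


def zero_based_to_one_based(start: int, end: int) -> tuple[int, int]:
    """0-based half-open [start, end) -> 1-based closed [start, end] (assumed conventions)."""
    if start < 0 or end <= start:
        raise ValueError("invalid or empty 0-based interval")
    return start + 1, end
```

Under these assumptions, 1-based positions 2 to 4 of "ACGTAC" become the 0-based half-open interval (1, 4), and both select "CGT".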
+
+ relative.coordinateSystem binds to a LOINC answer list; please review those answers here, as well as the detailed description found here.
+
+ There are many considerations concerning the directionality of DNA or RNA. Here we use relative.startingSequence.orientation and relative.startingSequence.strand. Orientation represents the sense of the sequence, which has different meanings depending on the type. Strand represents the order in which the sequence is written: the Watson strand refers to the 5'-to-3' top strand (5' -> 3'), whereas the Crick strand refers to the 5'-to-3' bottom strand (3' <- 5').
+
+ strand can take only two values, watson and crick. Because the directionality of a sequence string may be expressed in different ways in different omics scenarios, the table below shows how to map other expressions to the corresponding value:
+
Watson | +Crick | +
---|---|
5′-to-3′ direction | +3′-to-5′ direction | +
+1 | +-1 | +
Sense | +Antisense | +
Positive | +Negative | +
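The mapping in the table above can be sketched as a small lookup. The normalization steps (case-folding, treating the prime character ′ as an apostrophe, dropping a trailing " direction") and the function name to_strand are assumptions about how incoming expressions might be spelled; unknown expressions are rejected rather than guessed.

```python
# Synonyms taken from the Watson/Crick table; keys are normalized spellings.
_WATSON = {"5'-to-3'", "+1", "sense", "positive"}
_CRICK = {"3'-to-5'", "-1", "antisense", "negative"}


def to_strand(expression: str) -> str:
    """Map a directionality expression to the strand value 'watson' or 'crick'."""
    # Normalize: lower-case, and turn the prime character (U+2032) into an apostrophe.
    key = expression.strip().lower().replace("\u2032", "'")
    if key.endswith(" direction"):
        key = key[: -len(" direction")].strip()
    if key in _WATSON:
        return "watson"
    if key in _CRICK:
        return "crick"
    raise ValueError(f"no strand mapping for {expression!r}")
```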
+ The following attributes represent the sequence as a string of characters:
relative.startingSequence.sequenceString
relative.edit.replacementSequence
relative.edit.replacedSequence
literal
+ The characters used in these string representations of a sequence should be constrained to the IUPAC codes found here: https://www.bioinformatics.org/sms2/iupac.html.
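A minimal validation sketch for the string attributes listed above. The alphabet below is our reading of the IUPAC nucleotide codes (A, C, G, T, U, the ambiguity codes R, Y, S, W, K, M, B, D, H, V, N, and "." or "-" for a gap); verify it against the linked page and drop the gap symbols if your exchange partners do not accept them. The names are hypothetical.

```python
# Assumed IUPAC nucleotide alphabet, including "." and "-" as gap symbols.
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN.-")


def invalid_iupac_chars(sequence: str) -> set[str]:
    """Return the (upper-cased) characters that fall outside the assumed IUPAC alphabet."""
    return {c for c in sequence.upper() if c not in IUPAC_NUCLEOTIDES}
```

An empty result means every character of, say, a literal or relative.edit.replacementSequence value is an allowed code.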
+ http://hl7.org/fhir/moleculardefinition-focus
- This resource supports three patterns for representing a sequence of interest:
- The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence.
+ The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence.
@@ -42,9 +42,9 @@
These attributes help to clarify what sequence is being represented with less computation/inference on the recipient side. Implementers SHOULD use sequenceRange
first to determine order as the most reliable. If sequenceRange
is not present then ordinalPosition
SHOULD be used. Finally, if both sequenceRange
and ordinalPosition
are absent, then the order of the relative
data elements SHOULD be used to calculate a composition. It is the responsibility of the data sender to ensure the message can be consistently understood. Additionally, gaps in sequenceRange
are considered intentional (i.e. the composed sequence contains a sequence of N's, the placeholder nucleotide, for the gap range).
- In a FGFR2:MET Fusion use case, where the fusion was uncovered through RNA sequencing, a partial representation can be found here. -
+
relative.startingSequence
: There are four optional ways to represent a starting sequence in MolecularSequence resource:
@@ -65,13 +65,13 @@
relative.coordinateSystem
binds to a LOINC answer list, please review those answers here as well as the detailed description found here.
- Here are two examples: -
- +
There are many considerations concerning the directionality of DNA or RNA. Here we are using relative.startingSequence.orientation
and relative.startingSequence.strand
. Orientation represents the sense of the sequence, which has different meanings depending on the type
. Strand represents the sequence writing order. Watson strand refers to 5' to 3' top strand (5' -> 3'), whereas Crick strand refers to 5' to 3' bottom strand (3' <- 5').
@@ -101,18 +101,18 @@
- There are attributes where the sequence is represented as a string of characters. +
+ There are attributes where the sequence is represented as a string of characters.
relative.startingSequence.sequenceString
relative.edit.replacementSequence
relative.edit.replacedSequence
literal
- The characters used in these string representations of a sequence should be constrained to the IUPAC codes found here https://www.bioinformatics.org/sms2/iupac.html. -
+ The characters used in these string representations of a sequence should be constrained to the IUPAC codes found here https://www.bioinformatics.org/sms2/iupac.html.