# hgvs Documentation: Examples

This notebook is being drafted to run and review the code from the "Creating a SequenceVariant from scratch" section of the hgvs documentation (https://hgvs.readthedocs.io/en/stable/examples/creating-a-variant.html#overview).

Errors are addressed in "Troubleshooting: (issue)" sections. These sections do not reproduce any documentation; instead, they cite the relevant `help(function)` output, which can be found at the very end of the document.

## Step 1: Make an Interval to define the position of the edit

In [2]:
import hgvs.location
import hgvs.posedit

In [3]:
start = hgvs.location.BaseOffsetPosition(base=200,offset=-6,datum=hgvs.location.CDS_START)
start, str(start)

AttributeError: 'module' object has no attribute 'CDS_START'

In [4]:
end = hgvs.location.BaseOffsetPosition(base=22,datum=hgvs.location.CDS_END)
end, str(end)

AttributeError: 'module' object has no attribute 'CDS_END'

## Troubleshooting:  "AttributeError: 'module' object has no attribute 'CDS_START/CDS_END'"

### Resolution: 
The documentation appears to contain a typo: `.Datum` is missing between `hgvs.location` and `.CDS_START`/`.CDS_END`. Using either `hgvs.location.Datum.CDS_START` or `hgvs.enums.Datum.CDS_START` (and likewise for `CDS_END`) fixes the error. I found this with `help(hgvs.location.BaseOffsetPosition)`.

### Post-resolution Issues:
As a user, I am confused about why several different objects can be passed as the `datum` argument (for example, both `hgvs.location.Datum` and `hgvs.enums.Datum` expose `CDS_START`, `CDS_END` and `SEQ_START`). I would like an explanation, or a clearly defined workflow, for using `datum`.
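To make the role of `datum` more concrete, here is a dependency-free sketch of how a `base`, an `offset` and a `datum` combine into the position strings seen in this notebook (`200-6`, `*22`). The `Datum` enum and `format_position` function are hypothetical stand-ins modeled on the table in `help(hgvs.location.BaseOffsetPosition)`; they are a simplified illustration, not the hgvs implementation.

```python
from enum import Enum


class Datum(Enum):
    """Hypothetical stand-in for hgvs.enums.Datum: the reference point that `base` counts from."""
    SEQ_START = 1  # counted from the start of the sequence
    CDS_START = 2  # counted from the A of the ATG start codon (c.1)
    CDS_END = 3    # counted after the stop codon; rendered with a leading '*'


def format_position(base, offset=0, datum=Datum.SEQ_START):
    """Render a base/offset/datum triple in HGVS-style notation (simplified sketch)."""
    base_str = ("*" if datum is Datum.CDS_END else "") + str(base)
    # A non-zero offset marks an intronic position; '%+d' always writes the sign.
    offset_str = "" if offset == 0 else "%+d" % offset
    return base_str + offset_str


print(format_position(200, -6, Datum.CDS_START))  # 200-6
print(format_position(22, 0, Datum.CDS_END))      # *22
print(format_position(55, 0, Datum.SEQ_START))    # 55
```

In this model the datum only changes how `base` is interpreted: the same base 22 reads as `22` under `CDS_START` but `*22` under `CDS_END`, so the datum has to match the coordinate system the position is written in.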

In [7]:
import hgvs.enums
dir(hgvs.enums.Datum), dir(hgvs.location.Datum)

(['CDS_END',
  'CDS_START',
  'SEQ_START',
  '__class__',
  '__doc__',
  '__members__',
  '__module__'],
 ['CDS_END',
  'CDS_START',
  'SEQ_START',
  '__class__',
  '__doc__',
  '__members__',
  '__module__'])

In [110]:
start = hgvs.location.BaseOffsetPosition(base=200,offset=-6,datum=hgvs.location.Datum.CDS_START)
start, str(start)

(BaseOffsetPosition(base=200, offset=-6, datum=Datum.CDS_START, uncertain=False),
 '200-6')

In [111]:
end = hgvs.location.BaseOffsetPosition(base=22,datum=hgvs.enums.Datum.CDS_END)
end, str(end)

(BaseOffsetPosition(base=22, offset=0, datum=Datum.CDS_END, uncertain=False),
 '*22')

In [112]:
iv = hgvs.location.Interval(start=start,end=end)
iv, str(iv)

(Interval(start=200-6, end=*22, uncertain=False), '200-6_*22')

## Step 2: Make an edit

In [11]:
import hgvs.edit

In [12]:
edit = hgvs.edit.NARefAlt(ref='A',alt='T')
edit, str(edit)

(NARefAlt(ref='A', alt='T', uncertain=False), 'A>T')

In [13]:
posedit = hgvs.posedit.PosEdit(pos=iv,edit=edit)
posedit, str(posedit)

(PosEdit(pos=200-6_*22, edit=A>T, uncertain=False), '200-6_*22A>T')
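The objects built so far nest inside one another: two positions form an interval, an interval plus an edit form a PosEdit, and a PosEdit plus an accession and a type form a variant. The dataclasses below are a hypothetical, dependency-free model of that nesting and of how each layer stringifies. The class names mirror hgvs, but this is a simplified sketch (positions are plain strings here), not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical model of hgvs.location.Interval; start and end are pre-formatted strings."""
    start: str
    end: str

    def __str__(self):
        # A single-position interval collapses to one position; otherwise 'start_end'.
        return self.start if self.start == self.end else f"{self.start}_{self.end}"


@dataclass
class NARefAlt:
    """Hypothetical model of a substitution edit (hgvs.edit.NARefAlt)."""
    ref: str
    alt: str

    def __str__(self):
        return f"{self.ref}>{self.alt}"


@dataclass
class PosEdit:
    """Hypothetical model of hgvs.posedit.PosEdit: the position string followed by the edit string."""
    pos: Interval
    edit: NARefAlt

    def __str__(self):
        return f"{self.pos}{self.edit}"


@dataclass
class SequenceVariant:
    """Hypothetical model of hgvs.sequencevariant.SequenceVariant."""
    ac: str
    type: str
    posedit: PosEdit = None

    def __str__(self):
        # An unset posedit renders as '?', like the 'NG_008376.4:g.?' output in this notebook.
        pe = self.posedit if self.posedit is not None else "?"
        return f"{self.ac}:{self.type}.{pe}"


iv = Interval(start="200-6", end="*22")
posedit = PosEdit(pos=iv, edit=NARefAlt(ref="A", alt="T"))
print(posedit)                                                      # 200-6_*22A>T
print(SequenceVariant(ac="NM_01234.5", type="c", posedit=posedit))  # NM_01234.5:c.200-6_*22A>T
```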

## Step 3: Make a variant

In [14]:
import hgvs.variant

ImportError: No module named variant

In [114]:
var = hgvs.variant.SequenceVariant(ac='NM_01234.5', type='c', posedit=posedit)
var, str(var)

AttributeError: 'module' object has no attribute 'variant'

## Troubleshooting: "ImportError: No module named variant" & "AttributeError: 'module' object has no attribute 'variant'"

### Resolution: 
This appears to be another error in the documentation: `SequenceVariant` lives in the `hgvs.sequencevariant` module, not `hgvs.variant`. I ran the example shown in the "Description" section of `help(hgvs)`, then used `help(hgvs.sequencevariant)` and `help(hgvs.sequencevariant.SequenceVariant)` to work out how to create a variant that uses the `posedit` object built above.

### Post-resolution Issues:
As a user, I am confused that there are no clear instructions on how to build variants, and I would like clear documentation of a reproducible workflow.

In [68]:
import hgvs.dataproviders.uta
import hgvs.parser
import hgvs.sequencevariant
import hgvs.validator
import hgvs.variantmapper
import hgvs.assemblymapper

In [123]:
seq = hgvs.sequencevariant.SequenceVariant(ac='NG_008376.4', type='g', posedit=None)
seq, str(seq)

(SequenceVariant(ac=NG_008376.4, type=g, posedit=None), 'NG_008376.4:g.?')

In [126]:
seq.posedit = posedit
seq, str(seq)

(SequenceVariant(ac=NG_008376.4, type=g, posedit=200-6_*22A>T),
 'NG_008376.4:g.200-6_*22A>T')
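The cell above reuses the CDS-relative `posedit` from Step 1 on a genomic (`g.`) accession. In HGVS, genomic positions are plain integers: intronic offsets such as `-6` and the `*` prefix for positions after the stop codon belong to transcript coordinates such as `c.`, so `NG_008376.4:g.200-6_*22A>T` stringifies without complaint but does not describe a meaningful genomic variant. The helper below is a hypothetical, string-based sketch of that rule (it is not an hgvs function) and checks only `g.` and `m.` variants.

```python
import re

# Hypothetical helper (not part of hgvs): genomic (g.) and mitochondrial (m.) positions are
# plain 1-based integers, so '*', a leading '-', or a '+'/'-' offset signals the wrong coordinate system.
_PLAIN_POSITION = re.compile(r"^\d+$")


def positions_look_genomic(var_type, positions):
    """Return True if every position string is plausible for the given variant type."""
    if var_type not in ("g", "m"):
        return True  # this sketch only checks g. and m. variants
    return all(_PLAIN_POSITION.match(p) for p in positions)


print(positions_look_genomic("g", ["200-6", "*22"]))  # False: CDS-relative positions on a g. variant
print(positions_look_genomic("g", ["41721"]))         # True
print(positions_look_genomic("c", ["200-6", "*22"]))  # True: valid c. notation
```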

In [107]:
val1 = hgvs.sequencevariant.validate_type_ac_pair(ac=seq.ac, type=seq.type)
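val2 = hgvs.validator.Validator(seq)  # bug: Validator expects a data provider (hdp); variants are checked with .validate(var), per help(hgvs.validator.Validator) below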
val1, val2

((<ValidationLevel.VALID: 1>,
  'Accession (NG_008376.4) is compatible with variant type g'),
 <hgvs.validator.Validator at 0x7f7b73250c50>)

In [85]:
hdp = hgvs.dataproviders.uta.connect()
am = hgvs.assemblymapper.AssemblyMapper(hdp,assembly_name="GRCh37", alt_aln_method="splign",replace_reference=True)

In [98]:
transcripts = am.relevant_transcripts(seq)
sorted(transcripts)

[]

In [128]:
hp = hgvs.parser.Parser()
seq2 = hp.parse_hgvs_variant("NG_011806.1:g.41721G>A")
seq2, am.relevant_transcripts(seq2), hgvs.sequencevariant.validate_type_ac_pair(ac=seq2.ac, type=seq2.type)

(SequenceVariant(ac=NG_011806.1, type=g, posedit=41721G>A),
 ['NM_000130.4'],
 (<ValidationLevel.VALID: 1>,
  'Accession (NG_011806.1) is compatible with variant type g'))

In [5]:
help(hgvs.location.BaseOffsetPosition)

Help on class BaseOffsetPosition in module hgvs.location:

class BaseOffsetPosition(__builtin__.object)
 |  Class for dealing with CDS coordinates in transcript variants.
 |  
 |  This class models CDS positions using a `base` coordinate, which is
 |  measured relative to a specified `datum` (CDS_START or CDS_END), and
 |  an `offset`, which is 0 for exonic positions and non-zero for intronic
 |  positions.  **Positions and offsets are 1-based**, with no 0, per the HGVS
 |  recommendations.  (If you're using this with UTA, be aware that UTA
 |  uses interbase coordinates.)
 |  
 |  +----------+------------+-------+---------+------------------------------------------+
 |  | hgvs     | datum      | base  | offset  | meaning                                  |
 |  | r.55     | SEQ_START  |   55  |      0  | RNA position 55                          |
 |  +----------+------------+-------+---------+------------------------------------------+
 |  | c.55     | CDS_START  |   55  |      0  | CDS

In [117]:
help(hgvs.sequencevariant), help(hgvs.sequencevariant.SequenceVariant)

Help on module hgvs.sequencevariant in hgvs:

NAME
    hgvs.sequencevariant - represents simple sequence-based variants

FILE
    /home/aaron/biocommons/hgvs/sequencevariant.py

CLASSES
    __builtin__.object
        SequenceVariant
    
    class SequenceVariant(__builtin__.object)
     |  represents a basic HGVS variant.  The only requirement is that each
     |  component can be stringified; for example, passing pos as either a string
     |  or an hgvs.location.CDSInterval (for example) are both intended uses
     |  
     |  Methods defined here:
     |  
     |  __eq__(self, other)
     |  
     |  __ge__(self, other)
     |      Automatically created by attrs.
     |  
     |  __getstate__ = slots_getstate(self)
     |      Automatically created by attrs.
     |  
     |  __gt__(self, other)
     |      Automatically created by attrs.
     |  
     |  __init__(self, ac, type, posedit)
     |  
     |  __le__(self, other)
     |      Automatically created by attrs.
     |  
     

(None, None)

In [16]:
help(hgvs)

Help on package hgvs:

NAME
    hgvs

FILE
    /home/aaron/biocommons/hgvs/__init__.py

DESCRIPTION
    hgvs is a package to parse, format, and manipulate biological sequence
    variants.  See https://github.com/biocommons/hgvs/ for details.
    
    Example use:
    
    >>> import hgvs.dataproviders.uta
    >>> import hgvs.parser
    >>> import hgvs.variantmapper
    
    # start with these variants as strings
    >>> hgvs_g, hgvs_c = "NC_000007.13:g.36561662C>T", "NM_001637.3:c.1582G>A"
    
    # parse the genomic variant into a Python structure
    >>> hp = hgvs.parser.Parser()
    >>> var_g = hp.parse_hgvs_variant(hgvs_g)
    >>> var_g
    SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T)
    
    # SequenceVariants are composed of structured objects, e.g.,
    >>> var_g.posedit.pos.start
    SimplePosition(base=36561662, uncertain=False)
    
    # format by stringification 
    >>> str(var_g)
    'NC_000007.13:g.36561662C>T'
    
    # initialize the mapper for GRC

In [64]:
help(hgvs.sequencevariant.validate_type_ac_pair),help(hgvs.validator.Validator)

Help on function validate_type_ac_pair in module hgvs.utils.validation:

validate_type_ac_pair(type, ac)
    validate that accession is correct for variant type AND that
    accession is fully specified.

Help on class Validator in module hgvs.validator:

class Validator(__builtin__.object)
 |  invoke intrinsic and extrinsic validation
 |  
 |  Methods defined here:
 |  
 |  __init__(self, hdp, strict=True)
 |  
 |  validate(self, var, strict=None)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



(None, None)

In [92]:
help(am)

Help on AssemblyMapper in module hgvs.assemblymapper object:

class AssemblyMapper(hgvs.variantmapper.VariantMapper)
 |  Provides simplified variant mapping for a single assembly and
 |  transcript-reference alignment method.
 |  
 |  AssemblyMapper is instantiated with an assembly name and
 |  alt_aln_method. These enable the following conveniences over
 |  VariantMapper:
 |  
 |  * The assembly and alignment method are used to
 |    automatically select an appropriate chromosomal reference
 |    sequence when mapping from a transcript to a genome (i.e.,
 |    c_to_g(...) and n_to_g(...)).
 |  
 |  * A new method, relevant_trancripts(g_variant), returns a list of
 |    transcript accessions available for the specified variant. These
 |    accessions are candidates mapping from genomic to trancript
 |    coordinates (i.e., g_to_c(...) and g_to_n(...)).
 |  
 |  Note: AssemblyMapper supports only chromosomal references (e.g.,
 |  NC_000006.11). It does not support contigs or other genom