# hgvs Documentation: Examples

This notebook runs and reviews the code presented in the "Creating a SequenceVariant from scratch" section of the hgvs documentation (https://hgvs.readthedocs.io/en/stable/examples/creating-a-variant.html#overview).

Errors are addressed in "Troubleshooting: (Error)" sections, each of which starts a workflow to resolve the error.  Once a resolution has been implemented, a markdown cell states "End of troubleshooting for ..." and the steps resume.

## Step 1: Import necessary modules

In [1]:
import hgvs.location # hgvs.location.BaseOffsetPosition
import hgvs.posedit  # hgvs.posedit.PosEdit
import hgvs.edit     # hgvs.edit.NARefAlt
import hgvs.variant  # hgvs.sequencevariant.SequenceVariant
import copy

ImportError: No module named variant

## Troubleshooting:  "ImportError: No module named variant"

### Resolution: 
Follow the example from `help(hgvs)` and import `hgvs.variantmapper`.  Where the docs refer to `hgvs.variant`, use `hgvs.sequencevariant` instead.  Ex. `hgvs.variant.SequenceVariant` is now `hgvs.sequencevariant.SequenceVariant`.
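
If the same notebook has to run against installations that use either module name, the rename can be absorbed with a fallback import. The helper below is only a sketch: `import_first_available` is a hypothetical name, not part of hgvs, and the two module names in the usage comment are the ones mentioned in this section.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names))


# Intended use (assumes hgvs is installed; not run here):
#   sv_module = import_first_available(["hgvs.sequencevariant", "hgvs.variant"])
#   SequenceVariant = sv_module.SequenceVariant
```

The helper tries the newer name first, so an up-to-date installation never touches the old one.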

### Comments:


In [2]:
# check dir(hgvs) and help(hgvs); neither lists a 'variant' module
dir(hgvs), help(hgvs)

Help on package hgvs:

NAME
    hgvs

FILE
    /home/aaron/biocommons/hgvs/__init__.py

DESCRIPTION
    hgvs is a package to parse, format, and manipulate biological sequence
    variants.  See https://github.com/biocommons/hgvs/ for details.
    
    Example use:
    
    >>> import hgvs.dataproviders.uta
    >>> import hgvs.parser
    >>> import hgvs.variantmapper
    
    # start with these variants as strings
    >>> hgvs_g, hgvs_c = "NC_000007.13:g.36561662C>T", "NM_001637.3:c.1582G>A"
    
    # parse the genomic variant into a Python structure
    >>> hp = hgvs.parser.Parser()
    >>> var_g = hp.parse_hgvs_variant(hgvs_g)
    >>> var_g
    SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T)
    
    # SequenceVariants are composed of structured objects, e.g.,
    >>> var_g.posedit.pos.start
    SimplePosition(base=36561662, uncertain=False)
    
    # format by stringification 
    >>> str(var_g)
    'NC_000007.13:g.36561662C>T'
    
    # initialize the mapper for GRC

(['__builtins__',
  '__doc__',
  '__file__',
  '__name__',
  '__package__',
  '__path__',
  '__version__',
  '_is_released_version',
  'absolute_import',
  'config',
  'division',
  'edit',
  'enums',
  'exceptions',
  'global_config',
  'location',
  'logger',
  'logging',
  'pkg_resources',
  'posedit',
  'print_function',
  're',
  'unicode_literals',
  'utils',
 None)

In [3]:
# follow example in Description
import hgvs.dataproviders.uta
import hgvs.parser
import hgvs.variantmapper

In [4]:
# choosing my own variant, https://www.ncbi.nlm.nih.gov/snp/rs6025
rs6025 = 'NC_000001.10:g.169519049T>C'

In [5]:
# parse the variant
hp = hgvs.parser.Parser()
rs6025P = hp.parse_hgvs_variant(rs6025)
rs6025P

SequenceVariant(ac=NC_000001.10, type=g, posedit=169519049T>C)

In [6]:
# SequenceVariant can be pulled apart
rs6025P.posedit.pos, rs6025P.posedit.edit, rs6025P.ac, rs6025P.type

(Interval(start=169519049, end=169519049, uncertain=False),
 NARefAlt(ref='T', alt='C', uncertain=False),
 'NC_000001.10',
 'g')

In [7]:
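# create the data provider: a connection to the UTA (Universal Transcript Archive) database, which supplies the transcript data used by the mapper below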
hdp = hgvs.dataproviders.uta.connect()

In [8]:
# create assemblymapper variable
am = hgvs.assemblymapper

AttributeError: 'module' object has no attribute 'assemblymapper'

## Troubleshooting:  "AttributeError: 'module' object has no attribute 'assemblymapper'"

### Resolution: 
Import `hgvs.assemblymapper`.
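
The error arises because `import hgvs` does not load every submodule: `assemblymapper` is missing from the earlier `dir(hgvs)` output and only becomes an attribute of the package once it is imported explicitly. The sketch below shows the same explicit import with a standard-library package so that it runs without hgvs; `load_submodule` is a hypothetical helper name.

```python
import importlib


def load_submodule(package, submodule):
    """Import package.submodule explicitly and return the submodule object."""
    return importlib.import_module("%s.%s" % (package, submodule))


# Standard-library stand-in for hgvs / hgvs.assemblymapper
decoder = load_submodule("json", "decoder")
print(decoder.__name__)
```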

### Comments:


In [9]:
# import module
import hgvs.assemblymapper

End of troubleshooting for **"AttributeError: 'module' object has no attribute 'assemblymapper'"**

In [10]:
# create assemblymapper variable, determine transcripts affected
am = hgvs.assemblymapper.AssemblyMapper(hdp, alt_aln_method='splign', assembly_name='GRCh37', replace_reference=True)
transcripts = am.relevant_transcripts(rs6025P)
sorted(transcripts)

['NM_000130.4']

In [11]:
# map variant to coding sequence
rs6025c = am.g_to_c(rs6025P,transcripts[0])
rs6025c

SequenceVariant(ac=NM_000130.4, type=c, posedit=1601=)

In [12]:
# pull apart the SequenceVariant
rs6025c.ac, rs6025c.posedit.edit, rs6025c.posedit.pos.start, rs6025c.type

('NM_000130.4',
 NARefAlt(ref=u'G', alt=u'G', uncertain=False),
 BaseOffsetPosition(base=1601, offset=0, datum=Datum.CDS_START, uncertain=False),
 u'c')

End of troubleshooting for **"ImportError: No module named variant"** 

## Step 2: Make an Interval to define a position of the edit

In [13]:
start = hgvs.location.BaseOffsetPosition(base=200,offset=-6,datum=hgvs.location.CDS_START)
start, str(start)

AttributeError: 'module' object has no attribute 'CDS_START'

## Troubleshooting:  "AttributeError: 'module' object has no attribute 'CDS_START'"

### Resolution: 
Use `hgvs.location.Datum.` prefix.


### Comments:


In [14]:
# Check dir() on hgvs.location
dir(hgvs.location)

['AAPosition',
 'BaseOffsetInterval',
 'BaseOffsetPosition',
 'Datum',
 'HGVSInvalidIntervalError',
 'HGVSUnsupportedOperationError',
 'Interval',
 'SimplePosition',
 'ValidationLevel',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 'aa1_to_aa3',
 'absolute_import',
 'attr',
 'division',
 'hgvs',
 'print_function',
 'total_ordering',
 'unicode_literals']

In [23]:
# read doc on 'Datum' and check class list
help(hgvs.location.Datum), dir(hgvs.location.Datum)

Help on class Datum in module hgvs.enums:

Datum = <enum 'Datum'>


(None,
 ['CDS_END',
  'CDS_START',
  'SEQ_START',
  '__class__',
  '__doc__',
  '__members__',
  '__module__'])

End of troubleshooting for **"AttributeError: 'module' object has no attribute 'CDS_START'"**

## Step 2 cont.

In [17]:
start = hgvs.location.BaseOffsetPosition(base=200,offset=-6,datum=hgvs.location.Datum.CDS_START)
start, str(start)

(BaseOffsetPosition(base=200, offset=-6, datum=Datum.CDS_START, uncertain=False),
 '200-6')

In [18]:
end = hgvs.location.BaseOffsetPosition(base=22,datum=hgvs.location.Datum.CDS_END)
end, str(end)

(BaseOffsetPosition(base=22, offset=0, datum=Datum.CDS_END, uncertain=False),
 '*22')

In [19]:
iv = hgvs.location.Interval(start=start,end=end)
iv, str(iv)

(Interval(start=200-6, end=*22, uncertain=False), '200-6_*22')
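
The string forms above follow a simple pattern: a `CDS_END` datum is written with a leading `*`, a non-zero offset is appended with its sign, and the two ends of an interval are joined with `_`. The sketch below reproduces only those three rules. It is written for Python 3, and the `Datum` enum and both functions are stand-ins for illustration, not the hgvs implementation.

```python
from enum import Enum


class Datum(Enum):
    """Stand-in for hgvs.location.Datum; the values are arbitrary in this sketch."""
    SEQ_START = 1
    CDS_START = 2
    CDS_END = 3


def format_position(base, offset=0, datum=Datum.CDS_START):
    """Format one position: '*' marks CDS_END, the offset keeps its sign."""
    prefix = "*" if datum is Datum.CDS_END else ""
    suffix = "%+d" % offset if offset else ""
    return "%s%d%s" % (prefix, base, suffix)


def format_interval(start, end):
    """Join two formatted positions with '_', or print one if they are identical."""
    return start if start == end else "%s_%s" % (start, end)


start = format_position(200, offset=-6, datum=Datum.CDS_START)
end = format_position(22, datum=Datum.CDS_END)
print(start, end, format_interval(start, end))
```

Because the members live on the enum class, they are reached as `Datum.CDS_START` rather than as module-level names, which is the same reason `hgvs.location.CDS_START` failed earlier.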

## Step 3:  Make an edit object

In [20]:
edit = hgvs.edit.NARefAlt(ref='A',alt='T')
edit, str(edit)

(NARefAlt(ref='A', alt='T', uncertain=False), 'A>T')

In [21]:
posedit = hgvs.posedit.PosEdit(pos=iv,edit=edit)
posedit, str(posedit)

(PosEdit(pos=200-6_*22, edit=A>T, uncertain=False), '200-6_*22A>T')

In [44]:
var = hgvs.variant.SequenceVariant(ac=transcripts[0], type='g', posedit=posedit)
var, str(var)

AttributeError: 'module' object has no attribute 'variant'

In [28]:
# see the "ImportError: No module named variant" troubleshooting section: hgvs.variant is now hgvs.sequencevariant
dir(hgvs), dir(hgvs.sequencevariant)

(['__builtins__',
  '__doc__',
  '__file__',
  '__name__',
  '__package__',
  '__path__',
  '__version__',
  '_is_released_version',
  'absolute_import',
  'alignmentmapper',
  'assemblymapper',
  'config',
  'dataproviders',
  'decorators',
  'division',
  'edit',
  'enums',
  'exceptions',
  'global_config',
  'hgvsposition',
  'location',
  'logger',
  'logging',
  'normalizer',
  'parser',
  'pkg_resources',
  'posedit',
  'print_function',
  're',
  'sequencevariant',
  'unicode_literals',
  'utils',
  'validator',
  'variantmapper',
 ['SequenceVariant',
  'ValidationLevel',
  '__builtins__',
  '__doc__',
  '__file__',
  '__name__',
  '__package__',
  'absolute_import',
  'attr',
  'division',
  'hgvs',
  'print_function',
  're',
  'unicode_literals',
  'validate_type_ac_pair'])

In [53]:
# hgvs.sequencevariant is a module, and SequenceVariant is a class defined in it
var = hgvs.sequencevariant.SequenceVariant(ac=transcripts[0], type='g', posedit=posedit)
var, str(var)

(SequenceVariant(ac=NM_000130.4, type=g, posedit=200-6_*22A>T),
 'NM_000130.4:g.200-6_*22A>T')

## Step 4: Validate the variant
See `hgvs.validator.Validator` for validation options.

In [33]:
dir(hgvs.validator.Validator), help(hgvs.validator.Validator)

Help on class Validator in module hgvs.validator:

class Validator(__builtin__.object)
 |  invoke intrinsic and extrinsic validation
 |  
 |  Methods defined here:
 |  
 |  __init__(self, hdp, strict=True)
 |  
 |  validate(self, var, strict=None)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



(['__class__',
  '__delattr__',
  '__dict__',
  '__doc__',
  '__format__',
  '__getattribute__',
  '__hash__',
  '__init__',
  '__module__',
  '__new__',
  '__reduce__',
  '__reduce_ex__',
  '__repr__',
  '__setattr__',
  '__sizeof__',
  '__str__',
  '__subclasshook__',
  '__weakref__',
  'validate'],
 None)

In [36]:
hgvs.validator.Validator.validate(var)

TypeError: unbound method validate() must be called with Validator instance as first argument (got SequenceVariant instance instead)

In [38]:
hgvs.validator.Validator.validate(var.validate)

TypeError: unbound method validate() must be called with Validator instance as first argument (got instancemethod instance instead)

## Troubleshooting:  "TypeError: unbound method validate() must be called with Validator instance as first argument"

### Resolution: 
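Use `hgvs.sequencevariant.validate_type_ac_pair(ac= , type= )` to check that the accession and the variant type are compatible.  The `TypeError` itself arises because `validate` was called on the `Validator` class rather than on a `Validator` instance.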
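
For reference, the `help()` output above shows that `Validator` is constructed with a data provider (`__init__(self, hdp, strict=True)`), so an instance-based call of the form `hgvs.validator.Validator(hdp).validate(var)` would be the alternative; it was not run in this notebook. The toy class below mirrors those two signatures to show the class-versus-instance difference. It is a hypothetical stand-in, not the hgvs implementation, and under Python 3 the error message differs from the Python 2 message shown above.

```python
class ToyValidator(object):
    """Hypothetical stand-in that mirrors the signatures shown by help()."""

    def __init__(self, hdp, strict=True):
        self.hdp = hdp
        self.strict = strict

    def validate(self, var, strict=None):
        # Toy rule for illustration only: any non-empty variant counts as valid
        return bool(var)


variant_text = "NM_000130.4:c.200-6_*22A>T"

# Calling through the class passes the variant where `self` is expected
try:
    ToyValidator.validate(variant_text)
    class_call_failed = False
except TypeError:
    class_call_failed = True

# Calling through an instance supplies `self` automatically
validator = ToyValidator(hdp=None)
instance_result = validator.validate(variant_text)
print(class_call_failed, instance_result)
```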


### Comments:


In [54]:
# hgvs.sequencevariant has validate_type_ac_pair
val = hgvs.sequencevariant.validate_type_ac_pair(ac=var.ac, type=var.type)
val

(<ValidationLevel.ERROR: 3>,
 'Accession (NM_000130.4) is not compatible with variant type g')

End of troubleshooting for **"TypeError: unbound method validate() must be called with Validator instance as first argument"**

In [55]:
var.type = 'c'

In [56]:
val = hgvs.sequencevariant.validate_type_ac_pair(ac=var.ac, type=var.type)
val

(<ValidationLevel.VALID: 1>,
 'Accession (NM_000130.4) is compatible with variant type c')
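
The two results above reflect a pairing between the accession prefix and the variant type: an `NM_` transcript accession goes with `c`, not `g`. The sketch below illustrates that idea with a deliberately simplified lookup table. The table lists only one common type per RefSeq prefix and is an assumption made for illustration; `check_type_ac_pair` is a hypothetical function, and the rules inside hgvs are more detailed.

```python
# Simplified, assumed mapping from RefSeq accession prefix to variant type
PREFIX_TO_TYPES = {
    "NC_": ("g",),  # chromosome / genomic
    "NM_": ("c",),  # coding transcript
    "NR_": ("n",),  # non-coding transcript
    "NP_": ("p",),  # protein
}


def check_type_ac_pair(ac, variant_type):
    """Return (is_compatible, message) for an accession and a variant type."""
    for prefix, allowed in PREFIX_TO_TYPES.items():
        if ac.startswith(prefix):
            ok = variant_type in allowed
            verb = "is" if ok else "is not"
            message = "Accession (%s) %s compatible with variant type %s" % (
                ac, verb, variant_type)
            return ok, message
    return False, "Accession (%s) has an unrecognised prefix" % ac


print(check_type_ac_pair("NM_000130.4", "g"))
print(check_type_ac_pair("NM_000130.4", "c"))
```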

## Step 5: Update variant using copy.deepcopy

In [78]:
import copy

In [86]:
var2 = copy.deepcopy(var)
var2

SequenceVariant(ac=NM_000130.4, type=c, posedit=200-6_*22A>T)

In [87]:
var2.posedit.pos.start.base = 456

In [88]:
str(var2)

'NM_000130.4:c.456-6_*22A>T'
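
The edit changes only `var2` because `copy.deepcopy` duplicates the nested position objects as well as the outer variant; a shallow `copy.copy` would share them with `var`. The sketch below shows the difference with two toy classes (`Position` and `Variant` are hypothetical stand-ins, not hgvs classes).

```python
import copy


class Position(object):
    def __init__(self, base):
        self.base = base


class Variant(object):
    def __init__(self, start):
        self.start = start


original = Variant(Position(200))

shallow = copy.copy(original)      # new Variant, same Position object
deep = copy.deepcopy(original)     # new Variant and new Position

deep.start.base = 456              # leaves the original untouched
after_deep_edit = original.start.base

shallow.start.base = 789           # also changes the original
after_shallow_edit = original.start.base

print(after_deep_edit, after_shallow_edit)
```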

In [89]:
var2.posedit.edit.alt = 'CT'

In [90]:
str(var2)

'NM_000130.4:c.456-6_*22delinsCT'
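
The notebook has now shown three string forms for a ref/alt edit: `=` when ref and alt are identical (the `1601=` result earlier), `A>T` for a single-base substitution, and `delinsCT` once the alt is longer than one base. The sketch below covers only those three observed cases; `format_ref_alt` is a hypothetical function, and hgvs handles further edit types that are not reproduced here.

```python
def format_ref_alt(ref, alt):
    """Format a ref/alt pair for the three cases seen in this notebook."""
    if ref == alt:
        return "="                       # no change
    if len(ref) == 1 and len(alt) == 1:
        return "%s>%s" % (ref, alt)      # single-base substitution
    return "delins%s" % alt              # anything else seen here


for ref, alt in [("G", "G"), ("A", "T"), ("A", "CT")]:
    print(ref, alt, format_ref_alt(ref, alt))
```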

In [91]:
var2.posedit.pos.end.uncertain = True

In [92]:
str(var2)

HGVSUnsupportedOperationError: Cannot compare coordinates of uncertain positions

In [93]:
var2 = copy.deepcopy(var)
var2.posedit.pos.end.uncertain = True

In [94]:
str(var2)

HGVSUnsupportedOperationError: Cannot compare coordinates of uncertain positions

## Troubleshooting:  "HGVSUnsupportedOperationError: Cannot compare coordinates of uncertain positions"

### Resolution: 
None at this time.  
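
The traceback is not shown above, so the following is an assumption rather than a diagnosis: the message suggests that formatting an interval compares its start and end positions, and that this comparison is refused as soon as either position is marked uncertain. The toy classes below reproduce that behaviour under this assumption; `ToyPosition`, `UnsupportedOperationError`, and `format_interval` are hypothetical stand-ins, not hgvs code.

```python
class UnsupportedOperationError(Exception):
    """Stand-in for HGVSUnsupportedOperationError."""


class ToyPosition(object):
    def __init__(self, base, uncertain=False):
        self.base = base
        self.uncertain = uncertain

    def __eq__(self, other):
        # Refuse to compare when either side is uncertain
        if self.uncertain or other.uncertain:
            raise UnsupportedOperationError(
                "Cannot compare coordinates of uncertain positions")
        return self.base == other.base

    def __str__(self):
        text = str(self.base)
        return "(%s)" % text if self.uncertain else text


def format_interval(start, end):
    # The equality test is the step that fails once either end is uncertain
    if start == end:
        return str(start)
    return "%s_%s" % (start, end)


print(format_interval(ToyPosition(200), ToyPosition(222)))

try:
    format_interval(ToyPosition(200), ToyPosition(222, uncertain=True))
    raised = False
except UnsupportedOperationError:
    raised = True
print(raised)
```

If the assumption holds, the full traceback from `str(var2)` should point at a position comparison inside the interval formatting code, which would be the place to look next.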


### Comments:
