### Genomes

ModelSEEDpy provides its own genome object type, `modelseedpy.core.msgenome.MSGenome`, for manipulating genomes.

In [1]:
import modelseedpy
from modelseedpy.core.msgenome import MSGenome

#### Reading a `.faa` file

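To load a genome, we can read a `.faa` file that contains its protein sequences. Here `split=' '` sets the separator applied to each FASTA header, so the text before the first space becomes the feature ID and the remainder becomes its description.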

In [2]:
genome = MSGenome.from_fasta('GCF_000005845.2_ASM584v2_protein.faa', split=' ')

In [3]:
genome

<modelseedpy.core.msgenome.MSGenome at 0x7f48a414d100>
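
To make the role of the header separator concrete, here is a minimal, standalone sketch of a protein FASTA reader. It is not ModelSEEDpy's implementation: the function name `parse_faa` is hypothetical, the second record ID is invented, and the sequence lines are placeholders rather than real data. Only the first header is taken from the genome file used above.

```python
import io


def parse_faa(handle, split=" "):
    """Yield (feature_id, description, sequence) for each FASTA record.

    Hypothetical helper for illustration only; it mimics splitting the
    header on `split` into an ID and a description.
    """
    feature_id, description, chunks = None, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if feature_id is not None:
                yield feature_id, description, "".join(chunks)
            parts = line[1:].split(split, 1)
            feature_id = parts[0]
            description = parts[1] if len(parts) > 1 else None
            chunks = []
        else:
            # Sequence lines may wrap, so collect them and join later.
            chunks.append(line)
    if feature_id is not None:
        yield feature_id, description, "".join(chunks)


# Placeholder sequences; the second header is an invented example.
example = io.StringIO(
    ">NP_414542.1 thr operon leader peptide "
    "[Escherichia coli str. K-12 substr. MG1655]\n"
    "MKRIS\n"
    "TTITT\n"
    ">hypothetical_id_2\n"
    "MAAA\n"
)

records = list(parse_faa(example))
for feature_id, description, sequence in records:
    print(feature_id, "|", description, "|", len(sequence))
```

Splitting only on the first occurrence of the separator keeps the spaces inside the description intact, which matches the ID and description shown for `NP_414542.1` later in this section.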

#### Manipulating genes

Each gene is stored as a `modelseedpy.core.msgenome.MSFeature` in `genome.features`, a `cobra.core.dictlist.DictList`, similar to the `.reactions` and `.metabolites` attributes of a cobrapy `cobra.core.Model`.

In [4]:
len(genome.features)

4285

In [5]:
gene = genome.features.get_by_id('NP_414542.1')
gene

<modelseedpy.core.msgenome.MSFeature at 0x7f48f02c80d0>

In [6]:
type(gene)

modelseedpy.core.msgenome.MSFeature

##### Gene annotation
Annotations are stored as **ontology terms**. When a genome is loaded from a `.faa` file, no ontology terms are present, but we can add them later.

In [7]:
gene.ontology_terms

{}

In [8]:
gene.description

'thr operon leader peptide [Escherichia coli str. K-12 substr. MG1655]'

In [9]:
gene.add_ontology_term('annotation', gene.description)
gene.ontology_terms

{'annotation': ['thr operon leader peptide [Escherichia coli str. K-12 substr. MG1655]']}
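
The output above suggests that ontology terms are kept as a dictionary mapping an ontology name to a list of values. The sketch below reproduces that shape with a stand-in class; `FeatureSketch` is hypothetical, and skipping duplicate values is an assumption, not a documented guarantee of `MSFeature.add_ontology_term`.

```python
class FeatureSketch:
    """Stand-in for a genome feature; not the ModelSEEDpy MSFeature class."""

    def __init__(self, feature_id, description=None):
        self.id = feature_id
        self.description = description
        # Maps an ontology name (e.g. 'annotation') to a list of terms.
        self.ontology_terms = {}

    def add_ontology_term(self, ontology, value):
        # Create the list on first use, then append the value.
        terms = self.ontology_terms.setdefault(ontology, [])
        if value not in terms:  # assumption: duplicates are ignored
            terms.append(value)


gene_sketch = FeatureSketch(
    "NP_414542.1",
    "thr operon leader peptide [Escherichia coli str. K-12 substr. MG1655]",
)
gene_sketch.add_ontology_term("annotation", gene_sketch.description)
gene_sketch.add_ontology_term("annotation", gene_sketch.description)
print(gene_sketch.ontology_terms)
```

Keeping a list per ontology lets one gene carry several terms from the same source, and separate keys keep annotations from different sources apart.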

#### RAST
Genomes can be annotated with RAST through the `RastClient`.

In [10]:
from modelseedpy.core.rast_client import RastClient
rast = RastClient()

In [11]:
rast.annotate_genome(genome)

[{'id': 'C54F08A4-CDB3-11ED-A7E9-CAF09D6086F0',
  'parameters': ['-a',
   '-g',
   200,
   '-m',
   5,
   '-d',
   '/opt/patric-common/data/kmer_metadata_v2',
   '-u',
   'http://pear.mcs.anl.gov:6100/query'],
  'hostname': 'pear',
  'tool_name': 'kmer_search',
  'execution_time': 1680040751.14837},
 {'id': 'C5638324-CDB3-11ED-A7E9-CAF09D6086F0',
  'parameters': ['annotate_hypothetical_only=1',
   'dataset_name=Release70',
   'kmer_size=8'],
  'tool_name': 'KmerAnnotationByFigfam',
  'hostname': 'pear',
  'execution_time': 1680040751.28257},
 {'parameters': [],
  'id': 'C5944E1E-CDB3-11ED-8217-51F29F6086F0',
  'execute_time': 1680040751.60236,
  'tool_name': 'annotate_proteins_similarity',
  'hostname': 'pear'}]

The RAST annotation is stored under the ontology term **RAST**, which is used by default when building metabolic models with the ModelSEED templates.

In [12]:
gene.ontology_terms

{'annotation': ['thr operon leader peptide [Escherichia coli str. K-12 substr. MG1655]'],
 'RAST': ['Thr operon leader peptide']}
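
After annotation it can be useful to see which features actually received a term for a given ontology. The helper below is a sketch: `terms_by_feature` is a hypothetical name, and it assumes only that each feature exposes `.id` and `.ontology_terms` as in the cells above. It runs here on stand-in objects, and the second feature is invented to show an unannotated gene.

```python
from types import SimpleNamespace


def terms_by_feature(features, ontology="RAST"):
    """Map feature ID to its terms for one ontology, skipping features without it."""
    return {
        feature.id: list(feature.ontology_terms[ontology])
        for feature in features
        if ontology in feature.ontology_terms
    }


# Stand-in features; only the first mirrors output shown in this section.
stand_ins = [
    SimpleNamespace(
        id="NP_414542.1",
        ontology_terms={
            "annotation": [
                "thr operon leader peptide "
                "[Escherichia coli str. K-12 substr. MG1655]"
            ],
            "RAST": ["Thr operon leader peptide"],
        },
    ),
    SimpleNamespace(id="hypothetical_id_2", ontology_terms={}),
]

rast_terms = terms_by_feature(stand_ins)
print(rast_terms)
```

Under the same assumptions, passing `genome.features` instead of `stand_ins` should give the RAST terms for the annotated genome.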