Introduction to Pygenprop
=========================
A Python library for interactive programmatic usage of Genome Properties
------------------------------------------------------------------------

The InterProScan (.tsv) and FASTA (.faa) files used in this tutorial can be found at:
- https://raw.githubusercontent.com/Micromeda/pygenprop/master/docs/source/_static/tutorial/E_coli_K12.tsv
- https://raw.githubusercontent.com/Micromeda/pygenprop/master/docs/source/_static/tutorial/E_coli_K12.faa
- https://raw.githubusercontent.com/Micromeda/pygenprop/master/docs/source/_static/tutorial/E_coli_O157_H7.tsv
- https://raw.githubusercontent.com/Micromeda/pygenprop/master/docs/source/_static/tutorial/E_coli_O157_H7.faa

### Creation and use of GenomePropertyTree objects
GenomePropertyTree objects allow for the programmatic exploration of the Genome Properties database.

In [1]:
import requests
from io import StringIO
from pygenprop.results import GenomePropertiesResults, GenomePropertiesResultsWithMatches, \
    load_assignment_caches_from_database, load_assignment_caches_from_database_with_matches
from pygenprop.database_file_parser import parse_genome_properties_flat_file
from pygenprop.assignment_file_parser import parse_interproscan_file, \
    parse_interproscan_file_and_fasta_file
from sqlalchemy import create_engine

In [2]:
# Genome Properties is a flat-file database that can be found on GitHub.
# The latest release of the database is available at the following URL.

genome_properties_database_url = 'https://raw.githubusercontent.com/ebi-pf-team/genome-properties/master/flatfiles/genomeProperties.txt'

# For this tutorial, we will stream the file directly into the Jupyter notebook. Alternatively,
# one could download the file with the UNIX wget or curl commands.

with requests.Session() as current_download:
    response = current_download.get(genome_properties_database_url, stream=True)
    tree = parse_genome_properties_flat_file(StringIO(response.text))
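
For orientation, the flat file consists of records built from tagged lines. The sketch below is not Pygenprop's parser. It assumes each line starts with a two-letter tag followed by two spaces, that `AC`, `DE`, and `TP` hold the identifier, name, and type, and that a `//` line ends a record. The sample text is hand-written, the identifier `GenProp9999` is a placeholder, and the helper name `read_flat_file_records` is hypothetical.

```python
from io import StringIO

# Hand-written sample in the assumed layout; not copied from the real database file.
# GenProp9999 is a placeholder identifier used only for illustration.
SAMPLE_FLAT_FILE = """\
AC  GenProp0074
DE  Virulence
TP  CATEGORY
//
AC  GenProp9999
DE  CRISPR region
TP  GUILD
//
"""


def read_flat_file_records(handle):
    """Yield one dict per record, keyed by the two-letter line tag.

    Repeated tags overwrite earlier values, which is enough for this sample.
    """
    record = {}
    for line in handle:
        line = line.rstrip('\n')
        if line.startswith('//'):
            # End of the current record.
            if record:
                yield record
            record = {}
        elif line.strip():
            tag, _, value = line.partition('  ')
            record[tag] = value.strip()
    if record:
        yield record


records = list(read_flat_file_records(StringIO(SAMPLE_FLAT_FILE)))
guild_names = [record['DE'] for record in records if record.get('TP') == 'GUILD']
print(guild_names)
```

In practice `parse_genome_properties_flat_file` handles the full format, including steps and parent-child links, so a hand-rolled reader like this is only useful for a quick look at the raw file.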

In [3]:
# There are 1286 properties in the Genome Properties tree.
len(tree)

1286

In [4]:
# Find all properties of type "GUILD".
for genome_property in tree:
    if genome_property.type == 'GUILD':
        print(genome_property.name)

Coenzyme F420 utilization
CRISPR region
Reduction of oxidized methionine
Phage: major features
Resistance to Reactive Oxygen Species (ROS)
tRNA aminoacylation
Toxin-antitoxin system, type II
Protein-coding palindromic elements
Flagellar components of unknown function
Bacillithiol utilization
Toxin-antitoxin system, type I
Toxin-antitoxin system, type III
Abortive infection proteins
Energy-coupling factor transporters
Initiator caspases of the apoptosis extrinsic pathway
Executor caspases of apoptosis


In [5]:
# Get property by identifier
virulence = tree['GenProp0074']

In [6]:
virulence

GenProp0074, Type: CATEGORY, Name: Virulence, Thresh: 0, References: False, Databases: False, Steps: True, Parents: True, Children: True, Public: True

In [7]:
# Iterate to get the identifiers of child properties of virulence
types_of_vir = [genprop.id for genprop in virulence.children]

In [8]:
# Get the step names of the first child property of virulence (Type III secretion)
steps_of_type_3_secretion = [step.name for step in virulence.children[0].steps]

In [9]:
steps_of_type_3_secretion

['Type III secretion protein HpaP',
 'Type III secretion system, HrpB1/HrpK',
 'Type III secretion protein HrpB2',
 'Type III secretion protein HrpB4',
 'Type III secretion protein HrpB7',
 'Type III secretion regulator, YopN/LcrE/InvE/MxiC',
 'Type III secretion protein, LcrG',
 'Type III secretion system, low calcium response, chaperone LcrH/SycD',
 'Type III secretion system regulator, LcrR',
 'Type III secretion apparatus protein OrgA/MxiK',
 'Type III secretion system, PrgH/EprH',
 'Surface presentation of antigens protein SpaK',
 'Secretion system effector C, SseC-like',
 'Tir chaperone protein (CesT) family',
 'Type III secretion system chaperone SycN',
 'Type III secretion system effector delivery regulator TyeA-related',
 'YopD-like',
 'Type III secretion system needle length determinant',
 'Type III secretion system, needle protein',
 'Proximal regulatory components',
 'NolW-like',
 'Secreted proteins, effectors',
 'Type III secretion system chaperone YscB',
 'Type III secret

### Creation and use of GenomePropertiesResults objects
GenomePropertiesResults objects are used to compare property and step assignments across organisms programmatically.

In [10]:
# Parse InterProScan files
with open('E_coli_K12.tsv') as ipr5_file_one:
    assignment_cache_1 = parse_interproscan_file(ipr5_file_one)

In [11]:
with open('E_coli_O157_H7.tsv') as ipr5_file_two:
    assignment_cache_2 = parse_interproscan_file(ipr5_file_two)

In [12]:
# Create results comparison object
results = GenomePropertiesResults(assignment_cache_1, assignment_cache_2, properties_tree=tree)

In [13]:
results.sample_names

['E_coli_K12', 'E_coli_O157_H7']

In [14]:
# The property_results attribute holds the property assignments of every sample, side by side.
results.property_results

Property_Identifier,E_coli_K12,E_coli_O157_H7
GenProp0001,YES,YES
GenProp0002,NO,NO
GenProp0007,YES,YES
GenProp0010,NO,NO
GenProp0011,NO,NO
...,...,...
GenProp2095,NO,NO
GenProp2096,NO,NO
GenProp2097,NO,NO
GenProp2098,NO,NO


In [15]:
# The step_results attribute holds the step assignments of every sample, side by side.
results.step_results

Property_Identifier,Step_Number,E_coli_K12,E_coli_O157_H7
GenProp0001,1,YES,YES
GenProp0001,2,YES,YES
GenProp0001,3,YES,YES
GenProp0001,4,YES,YES
GenProp0001,5,YES,YES
...,...,...,...
GenProp2099,7,NO,NO
GenProp2099,8,NO,NO
GenProp2099,9,NO,NO
GenProp2099,10,NO,NO


In [16]:
# Get properties with differing assignments
results.differing_property_results

Property_Identifier,E_coli_K12,E_coli_O157_H7
GenProp0051,NO,YES
GenProp0052,NO,PARTIAL
GenProp0059,NO,YES
GenProp0111,YES,PARTIAL
GenProp0139,NO,PARTIAL
GenProp0176,YES,NO
GenProp0183,YES,PARTIAL
GenProp0232,PARTIAL,YES
GenProp0236,PARTIAL,YES
GenProp0283,YES,NO
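
Conceptually, a differing-results view keeps only the rows in which the samples disagree. The following is a minimal sketch of that idea in plain Python, using a few assignments copied from the tables above. It is not Pygenprop's implementation, and the function name `differing_assignments` is hypothetical.

```python
# Assignments copied from the tables above: property -> (E_coli_K12, E_coli_O157_H7).
property_assignments = {
    'GenProp0001': ('YES', 'YES'),
    'GenProp0002': ('NO', 'NO'),
    'GenProp0051': ('NO', 'YES'),
    'GenProp0052': ('NO', 'PARTIAL'),
    'GenProp0111': ('YES', 'PARTIAL'),
}


def differing_assignments(assignments):
    """Keep only properties whose assignment is not identical in every sample."""
    return {
        identifier: values
        for identifier, values in assignments.items()
        if len(set(values)) > 1
    }


differing = differing_assignments(property_assignments)
for identifier, values in differing.items():
    print(identifier, *values)
```

Because the check uses a set of the per-sample values, the same filter works unchanged for more than two samples.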


In [17]:
# Get property assignments for virulence properties
results.get_results(*types_of_vir, steps=False)

Property_Identifier,E_coli_K12,E_coli_O157_H7
GenProp0052,NO,PARTIAL
GenProp0648,YES,YES
GenProp0707,NO,NO


In [18]:
# Get step assignments for virulence properties
results.get_results(*types_of_vir, steps=True)

Property_Identifier,Step_Number,E_coli_K12,E_coli_O157_H7
GenProp0052,1,NO,NO
GenProp0052,2,NO,NO
GenProp0052,3,NO,NO
GenProp0052,4,NO,NO
GenProp0052,5,NO,NO
GenProp0052,6,NO,YES
GenProp0052,7,NO,NO
GenProp0052,8,NO,YES
GenProp0052,9,NO,NO
GenProp0052,10,YES,YES


In [19]:
# Get counts of virulence properties assigned YES, NO, and PARTIAL per organism
results.get_results_summary(*types_of_vir, steps=False, normalize=False)

,E_coli_K12,E_coli_O157_H7
NO,2.0,1
PARTIAL,0.0,1
YES,1.0,1


In [20]:
# Get counts of virulence steps assigned YES, NO, and PARTIAL per organism
results.get_results_summary(*types_of_vir, steps=True, normalize=False)

,E_coli_K12,E_coli_O157_H7
NO,46,27
YES,9,28


In [21]:
# Get percentages of virulence steps assigned YES, NO, and PARTIAL per organism
results.get_results_summary(*types_of_vir, steps=True, normalize=True)

,E_coli_K12,E_coli_O157_H7
NO,83.636364,49.090909
YES,16.363636,50.909091
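
The relationship between the raw counts and the percentages can be checked by hand. The sketch below rebuilds the E_coli_K12 step assignments from the counts reported above (46 NO, 9 YES) and summarizes them with and without normalization. The `summarize` helper is hypothetical and only illustrates the arithmetic; it is not how Pygenprop computes its summaries.

```python
from collections import Counter


def summarize(assignments, normalize=False):
    """Count each assignment label, optionally as a percentage of all assignments."""
    counts = Counter(assignments)
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    return {label: 100 * count / total for label, count in counts.items()}


# E_coli_K12 virulence step assignments, rebuilt from the counts shown above.
k12_steps = ['NO'] * 46 + ['YES'] * 9

k12_counts = summarize(k12_steps)
k12_percentages = summarize(k12_steps, normalize=True)
print(k12_counts)
print({label: round(value, 6) for label, value in k12_percentages.items()})
```

With 55 steps in total, 46 NO assignments correspond to 46 / 55, or roughly 83.6 percent, which agrees with the normalized table.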


### Creation and use of GenomePropertiesResultsWithMatches objects
GenomePropertiesResultsWithMatches objects extend GenomePropertiesResults objects with methods for comparing the InterProScan match information and protein sequences that support the existence of property steps.

In [22]:
# Parse InterProScan files and FASTA files
with open('./E_coli_K12.tsv') as ipr5_file_one:
    with open('./E_coli_K12.faa') as fasta_file_one:
        extended_cache_one = parse_interproscan_file_and_fasta_file(ipr5_file_one, fasta_file_one)

In [23]:
# Parse InterProScan files and FASTA files
with open('./E_coli_O157_H7.tsv') as ipr5_file_two:
    with open('./E_coli_O157_H7.faa') as fasta_file_two:
        extended_cache_two = parse_interproscan_file_and_fasta_file(ipr5_file_two, fasta_file_two)

In [24]:
extended_results = GenomePropertiesResultsWithMatches(extended_cache_one,
                                                      extended_cache_two,
                                                      properties_tree=tree)

In [25]:
# GenomePropertiesResultsWithMatches objects possess the same 
# results comparison methods as GenomePropertiesResults objects
extended_results.property_results

Property_Identifier,E_coli_K12,E_coli_O157_H7
GenProp0001,YES,YES
GenProp0002,NO,NO
GenProp0007,YES,YES
GenProp0010,NO,NO
GenProp0011,NO,NO
...,...,...
GenProp2095,NO,NO
GenProp2096,NO,NO
GenProp2097,NO,NO
GenProp2098,NO,NO


In [26]:
extended_results.step_results

Property_Identifier,Step_Number,E_coli_K12,E_coli_O157_H7
GenProp0001,1,YES,YES
GenProp0001,2,YES,YES
GenProp0001,3,YES,YES
GenProp0001,4,YES,YES
GenProp0001,5,YES,YES
...,...,...,...
GenProp2099,7,NO,NO
GenProp2099,8,NO,NO
GenProp2099,9,NO,NO
GenProp2099,10,NO,NO


In [27]:
# Get matches and protein sequences that support properties and steps. 
extended_results.step_matches

Sample_Name,Property_Identifier,Step_Number,Signature_Accession,Protein_Accession,E-value,Sequence
E_coli_K12,GenProp0001,1,TIGR00034,P00888,6.700000e-176,MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISD...
E_coli_K12,GenProp0001,1,TIGR00034,P0AB91,1.000000e-183,MNYQNDDLRIKEIKELLPPVALLEKFPATENAANTVAHARKAIHKI...
E_coli_K12,GenProp0001,1,TIGR00034,P00887,1.400000e-177,MNRTDELRTARIESLVTPAELALRYPVTPGVATHVTDSRRRIEKIL...
E_coli_K12,GenProp0001,2,TIGR01357,P07639,3.400000e-123,MERIVVTLGERSYPITIASGLFNEPASFLPLKSGEQVMLVTNETLA...
E_coli_K12,GenProp0001,3,TIGR01093,P05194,1.000000e-87,MKTVTVKDLVIGTGAPKIIVSLMAKDIASVKSEALAYREADFDILE...
...,...,...,...,...,...,...
E_coli_O157_H7,GenProp1764,2,PF01687,P0AG42,4.200000e-37,MKLIRGIHNLSQAPQEGCVLTIGNFDGVHRGHRALLQGLQEEGRKR...
E_coli_O157_H7,GenProp2089,11,TIGR02093,Q8X708,0.000000e+00,MSQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSEALAEM...
E_coli_O157_H7,GenProp2089,11,TIGR02093,Q8X6Y1,0.000000e+00,MNAPFTYSSPTLSVEALKHSIAYKLMFTIGKDPVVANKHEWLNATL...
E_coli_O157_H7,GenProp2089,11,cd04300,Q8X708,0.000000e+00,MSQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSEALAEM...


In [28]:
# Get only matches for K12
extended_results.get_sample_matches('E_coli_K12', top=False)

Property_Identifier,Step_Number,Signature_Accession,Protein_Accession,E-value,Sequence
GenProp0001,1,TIGR00034,P00888,6.700000e-176,MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISD...
GenProp0001,1,TIGR00034,P0AB91,1.000000e-183,MNYQNDDLRIKEIKELLPPVALLEKFPATENAANTVAHARKAIHKI...
GenProp0001,1,TIGR00034,P00887,1.400000e-177,MNRTDELRTARIESLVTPAELALRYPVTPGVATHVTDSRRRIEKIL...
GenProp0001,2,TIGR01357,P07639,3.400000e-123,MERIVVTLGERSYPITIASGLFNEPASFLPLKSGEQVMLVTNETLA...
GenProp0001,3,TIGR01093,P05194,1.000000e-87,MKTVTVKDLVIGTGAPKIIVSLMAKDIASVKSEALAYREADFDILE...
...,...,...,...,...,...
GenProp1764,2,PF01687,P0AG40,4.200000e-37,MKLIRGIHNLSQAPQEGCVLTIGNFDGVHRGHRALLQGLQEEGRKR...
GenProp2089,11,TIGR02093,P00490,0.000000e+00,MSQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSEALAEM...
GenProp2089,11,TIGR02093,P0AC86,0.000000e+00,MNAPFTYSSPTLSVEALKHSIAYKLMFTIGKDPVVANKHEWLNATL...
GenProp2089,11,cd04300,P00490,0.000000e+00,MSQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSEALAEM...


In [29]:
type_three_secretion_property_id = types_of_vir[0] # From section above.


# Get matches for each Type III Secretion System component across both organisms.
extended_results.get_property_matches(type_three_secretion_property_id)

Sample_Name,Step_Number,Signature_Accession,Protein_Accession,E-value,Sequence
E_coli_K12,10,TIGR02555,Q46795,2.2999999999999998e-48,MSGNIGANPINNWNLLPLICLLSGCHFYRERFAERGFFYKVPDVLR...
E_coli_K12,22,PF03958,P45758,9.6e-16,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
E_coli_K12,22,PF03958,P45758,7.5e-11,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
E_coli_K12,22,PF03958,P45758,7e-17,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
E_coli_K12,22,PF03958,P34749,2.5e-09,MKQWIAALLLMLIPGVQAAKPQKVTLMVDDVPVAQVLQALAEQEKL...
E_coli_O157_H7,6,TIGR02568,Q8X6D8,2.7e-31,MAIHVEHVGVLERAREVSRLEDIITEDNEDIEAEMPKMRDDPAGKE...
E_coli_O157_H7,8,TIGR02552,Q7DB63,2.7e-46,MSRKFSSLEDIYDFYQDGGTLASLTNLTQQDLNDLHSYAYTAYQSG...
E_coli_O157_H7,10,TIGR02555,A0A0H3JNS9,1.8e-73,MNLALRKIIYAPISYIHPQRVSLNNTPINNPVLRSITNEMILLQYN...
E_coli_O157_H7,13,PF04888,Q7DB81,1.6e-09,MLNVNNDTLSVTSGVNTASGTSGITQSETGLSLDLQLVKSMNSSAG...
E_coli_O157_H7,14,PF05932,P58233,7e-22,MSSRSELLLEKFAEKIGIGSISFNENRLCSFAIDEIYYISLSDAND...


In [30]:
# Get lowest E-value matches for each Type III Secretion System component for E_coli_O157_H7.
extended_results.get_property_matches(type_three_secretion_property_id, sample='E_coli_O157_H7', top=True)

Step_Number,Signature_Accession,Protein_Accession,E-value,Sequence
6,TIGR02568,Q8X6D8,2.7e-31,MAIHVEHVGVLERAREVSRLEDIITEDNEDIEAEMPKMRDDPAGKE...
8,TIGR02552,Q7DB63,2.7e-46,MSRKFSSLEDIYDFYQDGGTLASLTNLTQQDLNDLHSYAYTAYQSG...
10,TIGR02555,A0A0H3JNS9,1.8e-73,MNLALRKIIYAPISYIHPQRVSLNNTPINNPVLRSITNEMILLQYN...
13,PF04888,Q7DB81,1.6e-09,MLNVNNDTLSVTSGVNTASGTSGITQSETGLSLDLQLVKSMNSSAG...
14,PF05932,P58233,7e-22,MSSRSELLLEKFAEKIGIGSISFNENRLCSFAIDEIYYISLSDAND...
16,TIGR02511,Q7DB79,9.5e-20,MANGIEFNQNPASVFNSNSLDFELESQQLTQKNSSNISSPLINLQN...
19,TIGR02105,Q7DB83,5e-33,MNLSEITQQMGEVGKTLSDSVPELLNSTDLVNDPEKMLELQFAVQQ...
22,PF03958,Q7DB64,1.3e-18,MKKISFFIFTALFCCSAQAAPSSLEKRLGKSEYFIITKSSPVRAIL...
24,PF04888,Q7DB81,1.6e-09,MLNVNNDTLSVTSGVNTASGTSGITQSETGLSLDLQLVKSMNSSAG...
26,TIGR02516,Q7DB64,6.5e-185,MKKISFFIFTALFCCSAQAAPSSLEKRLGKSEYFIITKSSPVRAIL...


In [31]:
# Get lowest E-value matches for step 22 of Type III Secretion across both organisms. 
extended_results.get_step_matches(type_three_secretion_property_id, 22, top=True)

Sample_Name,Signature_Accession,Protein_Accession,E-value,Sequence
E_coli_K12,PF03958,P45758,7e-17,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
E_coli_O157_H7,PF03958,Q7DB64,1.3e-18,MKKISFFIFTALFCCSAQAAPSSLEKRLGKSEYFIITKSSPVRAIL...
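
The effect of `top=True` can be illustrated by selecting the lowest E-value match per sample by hand. This is a minimal sketch on rows copied from the outputs above, not the library's own selection logic, and the helper name `top_match_per_sample` is hypothetical. Only the top E_coli_O157_H7 match for step 22 appears in the outputs, so it is the only row listed for that sample.

```python
# (sample, protein accession, E-value) rows for step 22, copied from the outputs above.
step_22_matches = [
    ('E_coli_K12', 'P45758', 9.6e-16),
    ('E_coli_K12', 'P45758', 7.5e-11),
    ('E_coli_K12', 'P45758', 7e-17),
    ('E_coli_K12', 'P34749', 2.5e-09),
    ('E_coli_O157_H7', 'Q7DB64', 1.3e-18),
]


def top_match_per_sample(matches):
    """Return the (protein, E-value) pair with the lowest E-value for each sample."""
    best = {}
    for sample, protein, e_value in matches:
        if sample not in best or e_value < best[sample][1]:
            best[sample] = (protein, e_value)
    return best


top_matches = top_match_per_sample(step_22_matches)
for sample, (protein, e_value) in top_matches.items():
    print(sample, protein, e_value)
```

A lower E-value indicates a more significant match, which is why the minimum rather than the maximum is kept.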


In [32]:
# Get all matches for step 22 of Type III Secretion for E. coli K12. 
extended_results.get_step_matches(type_three_secretion_property_id, 22, top=False, sample='E_coli_K12')

Step_Number,Signature_Accession,Protein_Accession,E-value,Sequence
22,PF03958,P45758,9.6e-16,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
22,PF03958,P45758,7.5e-11,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
22,PF03958,P45758,7e-17,MKGLNKITCCLLAALLMPCAGHAENEQYGANFNNADIRQFVEIVGQ...
22,PF03958,P34749,2.5e-09,MKQWIAALLLMLIPGVQAAKPQKVTLMVDDVPVAQVLQALAEQEKL...


In [33]:
# Get skbio protein objects for a particular step.
extended_results.get_supporting_proteins_for_step(type_three_secretion_property_id, 22, top=True)

[Protein
 ---------------------------------------------------------------------
 Metadata:
     'description': '(From E_coli_K12)'
     'id': 'P45758'
 Stats:
     length: 650
     has gaps: False
     has degenerates: False
     has definites: True
     has stops: False
 ---------------------------------------------------------------------
 0   MKGLNKITCC LLAALLMPCA GHAENEQYGA NFNNADIRQF VEIVGQHLGK TILIDPSVQG
 60  TISVRSNDTF SQQEYYQFFL SILDLYGYSV ITLDNGFLKV VRSANVKTSP GMIADSSRPG
 ...
 540 ETVVLGGLLD DFSKEQVSKV PLLGDIPLVG QLFRYTSTER AKRNLMVFIR PTIIRDDDVY
 600 RSLSKEKYTR YRQEQQQRID GKSKALVGSE DLPVLDENTF NSHAPAPSSR, Protein
 ---------------------------------------------------------------------
 Metadata:
     'description': '(From E_coli_O157_H7)'
     'id': 'Q7DB64'
 Stats:
     length: 512
     has gaps: False
     has degenerates: False
     has definites: True
     has stops: False
 ---------------------------------------------------------------------
 0   MKKISFFIFT ALFCCSAQAA PSSLE

#### A note on Scikit-Bio
Scikit-Bio is a NumPy-based bioinformatics library that is a competitor to Biopython and is optimized for building bioinformatics software. Because it is NumPy-based, it is quite fast and can be used to perform operations such as alignments and phylogenetic tree building. Pygenprop uses Scikit-Bio to read and write FASTA files, and the get_supporting_proteins_for_step() method of GenomePropertiesResultsWithMatches objects returns a list of Scikit-Bio Protein objects (a type of Sequence object), as shown in the output above. These objects can be aligned using Scikit-Bio and used to build phylogenetic trees that compare proteins supporting the same pathway step. Alignment and tree construction can be performed inside a Jupyter Notebook.

See the following documentation and tutorials for more information:

- http://scikit-bio.org/docs/0.5.5/alignment.html
- http://scikit-bio.org/docs/0.5.5/tree.html
- https://nbviewer.jupyter.org/github/biocore/scikit-bio-cookbook/blob/master/Progressive%20multiple%20sequence%20alignment.ipynb
- https://nbviewer.jupyter.org/github/biocore/scikit-bio-cookbook/blob/master/Alignments%20and%20phylogenetic%20reconstruction.ipynb

In [34]:
# Write FASTA file containing the sequences of the lowest E-value matches for 
# Type III Secretion System component 22 across both organisms.
with open('type_3_step_22_top.faa', 'w') as out_put_fasta_file:
    extended_results.write_supporting_proteins_for_step_fasta(out_put_fasta_file, 
                                                              type_three_secretion_property_id, 
                                                              22, top=True)

In [35]:
# Write FASTA file containing the sequences of all matches for 
# Type III Secretion System component 22 across both organisms.
with open('type_3_step_22_all.faa', 'w') as out_put_fasta_file:
    extended_results.write_supporting_proteins_for_step_fasta(out_put_fasta_file, 
                                                              type_three_secretion_property_id, 
                                                              22, top=False)
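
To check what was written without any extra dependencies, a FASTA file can be read back with a few lines of plain Python. The reader below is a minimal sketch with a hypothetical name, `read_fasta`. The in-memory example stands in for an output file, and its header layout (identifier followed by the description) is an assumption about how the records are written. The sequences are shortened to the first residues shown in the output above.

```python
from io import StringIO


def read_fasta(handle):
    """Return a dict mapping each header line (without '>') to its full sequence."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            records[header] = ''
        elif header is not None:
            # Sequence lines may be wrapped, so join them back together.
            records[header] += line
    return records


# Stand-in for an output file; the header layout is assumed and sequences are truncated.
example_fasta = StringIO(
    '>P45758 (From E_coli_K12)\n'
    'MKGLNKITCC\n'
    'LLAALLMPCA\n'
    '>Q7DB64 (From E_coli_O157_H7)\n'
    'MKKISFFIFT\n'
)

fasta_records = read_fasta(example_fasta)
for header, sequence in fasta_records.items():
    print(header, len(sequence))
```

The same function can be applied to a file opened with `open('type_3_step_22_top.faa')` in place of the `StringIO` object.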

### Reading and writing Micromeda files
Micromeda files are a new SQLite3-based pathway annotation storage format that allows for the simultaneous transfer of multiple organisms' Genome Properties assignments and supporting information such as InterProScan annotations and protein sequences. These files allow for the transfer of complete Genome Properties datasets between researchers and software applications. 

In [36]:
# Create a SQLAlchemy engine object for a SQLite3 Micromeda file.  
engine_no_proteins = create_engine('sqlite:///ecoli_compare_no_proteins.micro')

# Write the results to the file.
results.to_assignment_database(engine_no_proteins)

In [37]:
# Create a SQLAlchemy engine object for a SQLite3 Micromeda file.  
engine_proteins = create_engine('sqlite:///ecoli_compare.micro')

# Write the results to the file.
extended_results.to_assignment_database(engine_proteins)
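
Because a Micromeda file is an SQLite3 database, its contents can be inspected with Python's built-in `sqlite3` module. The sketch below lists the tables in a database file. The `list_tables` helper is hypothetical, and the demonstration runs on a throwaway database with a made-up table name, since the tables a real Micromeda file contains are defined by Pygenprop.

```python
import os
import sqlite3
import tempfile


def list_tables(database_path):
    """Return the names of all tables in an SQLite3 file, sorted alphabetically."""
    connection = sqlite3.connect(database_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demonstration on a throwaway database with a made-up table name;
# a real Micromeda file defines its own schema.
with tempfile.TemporaryDirectory() as directory:
    demo_path = os.path.join(directory, 'demo.micro')
    connection = sqlite3.connect(demo_path)
    connection.execute('CREATE TABLE example_table (identifier TEXT)')
    connection.commit()
    connection.close()
    demo_tables = list_tables(demo_path)

print(demo_tables)
```

Calling `list_tables('ecoli_compare.micro')` after the cell above should show the tables that were written. Note that `sqlite3.connect` creates an empty file if the path does not exist, so it is worth checking the path first.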

#### A note on SQLAlchemy
Because Pygenprop uses SQLAlchemy to write Micromeda files (SQLite3), it can also write assignment results and supporting information to a variety of relational databases.

For example:

```python
create_engine('postgresql://scott:tiger@localhost/mydatabase')
```

See the following documentation for more information:

- https://docs.sqlalchemy.org/en/13/core/engines.html

In [38]:
# Load results from a Micromeda file.
assignment_caches = load_assignment_caches_from_database(engine_no_proteins)
results_reconstituted = GenomePropertiesResults(*assignment_caches, properties_tree=tree)

In [39]:
# Load results from a Micromeda file with protein sequences.
assignment_caches_with_proteins = load_assignment_caches_from_database_with_matches(engine_proteins)
results_reconstituted_with_proteins = GenomePropertiesResultsWithMatches(*assignment_caches_with_proteins, 
                                                                         properties_tree=tree)

### Pygenprop command-line interface
Pygenprop also includes a command-line interface. A command-line tutorial can be found [here](https://github.com/Micromeda/pygenprop/blob/improve-documentation/README.md#example-workflow).