## Standard Numbering

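The standard numbering tool assigns position numbers to the residues of a sequence relative to a chosen base sequence. Because every sequence is numbered in the same reference frame, equivalent positions in different sequences can be compared directly, even when the sequences differ in length or contain insertions and deletions.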

It can be run in two different modes:

1. **Pairwise alignment**: Each sequence is aligned to the base sequence on its own, one pair at a time, and its residues are numbered relative to the base sequence.
2. **Clustal alignment**: The sequences are placed in a multiple sequence alignment, and the residues are numbered relative to the base sequence within that alignment.
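To make the idea of a common reference frame concrete, here is a minimal sketch of how position labels could be derived once two sequences are already aligned. It is not the pyeed implementation: the function names `base_numbering` and `number_target` are hypothetical, and the dotted labels for insertions (for example `3.1`) are an assumed convention chosen for illustration.

```python
def base_numbering(aligned_base: str) -> list[str]:
    """Label every alignment column relative to the base sequence.

    Assumed convention: base residues are numbered 1, 2, 3, ...;
    gap columns in the base (insertions in other sequences) get
    dotted labels such as "3.1", "3.2" after the last base residue.
    """
    labels = []
    base_pos = 0      # index of the last base residue seen
    insert_count = 0  # consecutive gap columns since that residue
    for char in aligned_base:
        if char == "-":
            insert_count += 1
            labels.append(f"{base_pos}.{insert_count}")
        else:
            base_pos += 1
            insert_count = 0
            labels.append(str(base_pos))
    return labels


def number_target(aligned_base: str, aligned_target: str) -> list[str]:
    """Return one label per residue of the target sequence."""
    if len(aligned_base) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    columns = base_numbering(aligned_base)
    # Gap columns in the target carry no residue, so they get no label.
    return [
        label
        for label, char in zip(columns, aligned_target)
        if char != "-"
    ]


# Toy alignment: the target has an insertion after base position 3
# and a deletion at base position 5.
labels = number_target("MKT-AYL", "MRTGA-L")
print(labels)  # ['1', '2', '3', '3.1', '4', '6']
```

With labels like these, residue `4` refers to the same alignment column in every numbered sequence, regardless of where that residue sits in the raw sequence.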


In [1]:
%reload_ext autoreload
%autoreload 2
import sys
from loguru import logger

from pyeed import Pyeed
from pyeed.analysis.mutation_detection import MutationDetection
from pyeed.analysis.standard_numbering import StandardNumberingTool

logger.remove()
level = logger.add(sys.stderr, level="WARNING")

In [2]:
uri = "bolt://129.69.129.130:7687"
user = "neo4j"
password = "12345678"

eedb = Pyeed(uri, user=user, password=password)
eedb.db.wipe_database(date="2025-03-19")

eedb.db.initialize_db_constraints(user=user, password=password)


📡 Connected to database.
All data has been wiped from the database.
the connection url is bolt://neo4j:12345678@129.69.129.130:7687
Loaded /home/nab/Niklas/pyeed/src/pyeed/model.py
Connecting to bolt://neo4j:12345678@129.69.129.130:7687
Setting up indexes and constraints...

Found model.StrictStructuredNode
 ! Skipping class model.StrictStructuredNode is abstract
Found model.Organism
 + Creating node unique constraint for taxonomy_id on label Organism for class model.Organism
{code: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists} {message: An equivalent constraint already exists, 'Constraint( id=12, name='constraint_unique_Organism_taxonomy_id', type='UNIQUENESS', schema=(:Organism {taxonomy_id}), ownedIndex=5 )'.}
Found model.Site
 + Creating node unique constraint for site_id on label Site for class model.Site
{code: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists} {message: An equivalent constraint already exists, 'Constraint( id=14, name='constraint_unique_Site_s

In [3]:
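# NCBI protein accessions used throughout this example
ids = ["AAM15527.1", "AAF05614.1", "AFN21551.1", "CAA76794.1", "AGQ50511.1"]

# Fetch the protein records, then the DNA entries linked to them,
# and create the coding sequence regions
eedb.fetch_from_primary_db(ids, db="ncbi_protein")
eedb.fetch_dna_entries_for_proteins()
eedb.create_coding_sequences_regions()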

In [4]:
sn = StandardNumberingTool(name="test_standard_numbering_pairwise")


sn.apply_standard_numbering_pairwise(
    base_sequence_id="AAM15527.1", db=eedb.db, list_of_seq_ids=ids[0:5]
)


Output()

In [5]:
sn.apply_standard_numbering_pairwise(
    base_sequence_id="AAM15527.1", db=eedb.db, list_of_seq_ids=ids
)


Output()

In [6]:
sn_clustal = StandardNumberingTool(name="test_standard_numbering_clustal")

sn_clustal.apply_standard_numbering(
    base_sequence_id="AAM15527.1", db=eedb.db, list_of_seq_ids=ids
)

In [7]:
sn_dna = StandardNumberingTool(name="test_standard_numbering_dna")

sn_dna.apply_standard_numbering(
    base_sequence_id="AF190695.1", db=eedb.db, node_type="DNA"
)

In [8]:
sn_dna_pairwise = StandardNumberingTool(name="test_standard_numbering_dna_pairwise")

sn_dna_pairwise.apply_standard_numbering_pairwise(
    base_sequence_id="AF190695.1", db=eedb.db, node_type="DNA"
)

Output()

In [9]:
sn_dna_region = StandardNumberingTool(name="test_standard_numbering_dna_pairwise_region")

sn_dna_region.apply_standard_numbering_pairwise(
    base_sequence_id="AF190695.1",
    db=eedb.db,
    node_type="DNA",
    region_based_sequence="coding sequence",
)

Output()

In both modes, a standard numbering node is now connected to every numbered sequence, and the edge between the node and each sequence stores the standard numbering data for that sequence.