## Standard Numbering

The standard numbering tool assigns residue positions to protein sequences in a common reference frame. Each sequence is aligned to a user-provided base sequence and its residues are numbered relative to that base, so that equivalent positions in different proteins can be compared directly.

It can be run in two modes:

1. **Pairwise alignment**: Each sequence is aligned individually to the base sequence, and its residues are numbered relative to the base.
2. **Clustal alignment**: The sequences are aligned together with the base sequence in a single multiple sequence alignment (computed with Clustal Omega), and the residues are numbered relative to the base sequence within that alignment.
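
To make the idea of a common reference frame concrete, here is a minimal sketch of how position labels could be derived from two already-aligned (gapped) sequences. It is not pyeed's implementation: the function name `number_against_base` and the `"<base position>.<n>"` label scheme for insertions are assumptions chosen for illustration.

```python
def number_against_base(aligned_base: str, aligned_target: str) -> list[str]:
    """Label each residue of the target with a position in the base frame.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Assumed convention (hypothetical, for illustration only): a target
    residue inserted after base position p gets the labels "p.1", "p.2", ...
    """
    if len(aligned_base) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")

    labels: list[str] = []
    base_pos = 0      # last base residue seen so far (1-based)
    insert_count = 0  # target residues inserted since that base residue

    for base_char, target_char in zip(aligned_base, aligned_target):
        if base_char != "-":
            base_pos += 1
            insert_count = 0
            if target_char != "-":
                labels.append(str(base_pos))
            # a gap in the target is a deletion: no label is emitted
        elif target_char != "-":
            insert_count += 1
            labels.append(f"{base_pos}.{insert_count}")
        # columns where both rows are gaps are skipped
    return labels


# Insertion in the target relative to the base
print(number_against_base("MK-LV", "MKALV"))  # ['1', '2', '2.1', '3', '4']
# Deletion in the target relative to the base
print(number_against_base("MKLV", "M-LV"))    # ['1', '3', '4']
```

In the pairwise mode the two rows would come from one pairwise alignment per sequence. In the Clustal mode the same walk could be applied to each row of the multiple sequence alignment, using the base sequence's row as `aligned_base`.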


In [42]:
%reload_ext autoreload
%autoreload 2
import sys
from loguru import logger

from pyeed import Pyeed
from pyeed.analysis.mutation_detection import MutationDetection
from pyeed.analysis.standard_numbering import StandardNumberingTool

logger.remove()
level = logger.add(sys.stderr, level="INFO")

In [43]:
uri = "bolt://129.69.129.130:7687"
user = "neo4j"
password = "12345678"

eedb = Pyeed(uri, user=user, password=password)
eedb.db.wipe_database(date="2025-02-28")

eedb.db.initialize_db_constraints(user=user, password=password)


📡 Connected to database.
All data has been wiped from the database.
the connection url is bolt://neo4j:12345678@129.69.129.130:7687
Loaded /home/nab/Niklas/pyeed/src/pyeed/model.py
Connecting to bolt://neo4j:12345678@129.69.129.130:7687
Setting up indexes and constraints...

Found model.StrictStructuredNode
 ! Skipping class model.StrictStructuredNode is abstract
Found model.Organism
 + Creating node unique constraint for taxonomy_id on label Organism for class model.Organism
{code: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists} {message: An equivalent constraint already exists, 'Constraint( id=12, name='constraint_unique_Organism_taxonomy_id', type='UNIQUENESS', schema=(:Organism {taxonomy_id}), ownedIndex=5 )'.}
Found model.Site
 + Creating node unique constraint for site_id on label Site for class model.Site
{code: Neo.ClientError.Schema.EquivalentSchemaRuleAlreadyExists} {message: An equivalent constraint already exists, 'Constraint( id=14, name='constraint_unique_Site_s

In [44]:
ids = ["KJO56189.1", "KLP91446.1", "AAM15527.1", "AAF05614.1", "AFN21551.1", "CAA76794.1", "AGQ50511.1", "KJO56289.1"]

eedb.fetch_from_primary_db(ids, db="ncbi_protein")

2025-02-28 17:20:32.533 | INFO     | pyeed.main:fetch_from_primary_db:87 - Found 0 sequences in the database.
2025-02-28 17:20:32.533 | INFO     | pyeed.main:fetch_from_primary_db:89 - Fetching 8 sequences from ncbi_protein.
2025-02-28 17:20:32.560 | INFO     | pyeed.adapter.primary_db_adapter:execute_requests:140 - Starting requests for 1 batches.
2025-02-28 17:20:33.682 | INFO     | pyeed.adapter.ncbi_protein_mapper:add_to_db:301 - Added/updated NCBI protein KJO56189.1 in database
2025-02-28 17:20:33.711 | INFO     | pyeed.adapter.ncbi_protein_mapper:add_to_db:301 - Added/updated NCBI protein KLP91446.1 in database
2025-02-28 17:20:33.741 | INFO     | pyeed.adapter.ncbi_protein_mapper:add_to_db

In [45]:
sn = StandardNumberingTool(name="test_standard_numbering_pairwise")


sn.apply_standard_numbering_pairwise(
    base_sequence_id="KJO56189.1", db=eedb.db, list_of_seq_ids=ids[0:5]
)


Output()

In [46]:
sn.apply_standard_numbering_pairwise(
    base_sequence_id="KJO56189.1", db=eedb.db, list_of_seq_ids=ids
)


2025-02-28 17:20:35.248 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering_pairwise:372 - Pair KJO56189.1 and KLP91446.1 already exists under the same standard numbering node
2025-02-28 17:20:35.250 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering_pairwise:372 - Pair KJO56189.1 and AAM15527.1 already exists under the same standard numbering node
2025-02-28 17:20:35.252 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering_pairwise:372 - Pair KJO56189.1 and AAF05614.1 already exists under the same standard numbering node
2025-02-28 17:20:35.254 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering_pairwise:372 - Pair KJO56189.1 and AFN21551.1 already exists under the same standard numbering node


Output()
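
The log lines above report that the pairs numbered by the first call (`ids[0:5]`) are skipped when the tool is run again on the full list. The sketch below illustrates that kind of incremental bookkeeping in plain Python. It is an assumption about the general pattern, not pyeed's code: `pairs_to_number` and `already_numbered` are hypothetical names, and the real tool tracks existing pairs in the database.

```python
def pairs_to_number(
    base_id: str,
    seq_ids: list[str],
    already_numbered: set[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return the (base, sequence) pairs that still need a numbering."""
    pending = []
    for seq_id in seq_ids:
        if seq_id == base_id:
            continue  # the base sequence is not paired with itself
        if (base_id, seq_id) in already_numbered:
            continue  # numbered by an earlier run, so skip it
        pending.append((base_id, seq_id))
    return pending


ids = ["KJO56189.1", "KLP91446.1", "AAM15527.1", "AAF05614.1",
       "AFN21551.1", "CAA76794.1", "AGQ50511.1", "KJO56289.1"]
base = "KJO56189.1"

# Pairs covered by the first call on ids[0:5]
done = {(base, seq_id) for seq_id in ids[1:5]}

# Only the three sequences added in the second call remain
print(pairs_to_number(base, ids, done))
```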

In [47]:
sn_clustal = StandardNumberingTool(name="test_standard_numbering_clustal")

sn_clustal.apply_standard_numbering(
    base_sequence_id="KJO56189.1", db=eedb.db, list_of_seq_ids=ids
)

2025-02-28 17:20:36.255 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering:470 - Using 7 sequences for standard numbering
2025-02-28 17:20:36.414 | INFO     | pyeed.analysis.standard_numbering:apply_standard_numbering:490 - Alignment received from ClustalOmega:
KJO56189.1  -----------------------------------------MSIQHFRVALIPFFAAFC-LPVFAHPE--TLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDSWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKV---AGP---LLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW
KLP91446.1  -----------------------------------------MSIQHFRVALIPFFAAFC-LPVFAHPE--TLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVKYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEAD