Structure Align

A minimal Python package for structural alignment of protein structures with different lengths.

Overview

Structure Align first runs a pairwise sequence alignment, then superimposes the two structures using only the residues that alignment matched, so the proteins do not need the same number of amino acids. It's designed to be simple, clean, and efficient for structural biology applications.

Key Features

Sequence-based alignment: Uses BioPython's pairwise aligner to find matching residues
Flexible selection: Support for different atom selections (CA, backbone, etc.)
Structured results: Uses Pydantic models for clean, validated data structures
RMSD calculation: Provides before/after RMSD values
Position-wise analysis: Calculate per-residue distances after alignment
Residue ID mapping: Query distances by original residue IDs (e.g., "residue 22")
Gap handling: Automatically handles sequence alignment gaps
Visualization: Built-in plotting functionality for distance analysis
Interactive plots: Rich hover information with Plotly integration
Chain-aware display: Visual separation and coloring of different protein chains

Installation

# Install the package
pip install structurealign

Quick Start

import MDAnalysis as mda
from structure_align import StructuralAligner

# Load your structures
reference = mda.Universe("reference.pdb")
mobile = mda.Universe("mobile.pdb")

# Initialize aligner
aligner = StructuralAligner()

# Perform alignment
result = aligner.align(reference, mobile, selection="name CA")

# Print results
print(f"RMSD: {result.rmsd_before:.2f} → {result.rmsd_after:.2f} Å")
print(f"Aligned residues: {result.n_aligned_residues}")

# Query specific residue distances
distance = result.get_distance_by_residue(22)  # Distance for residue 22
if distance is not None:  # a distance of 0.0 is valid, so test against None
    print(f"Residue 22 distance: {distance:.2f} Å")

Residue ID Mapping

You can query distances by original residue ID, even when the sequence alignment contains gaps:

# Get distance for specific residue
distance = result.get_distance_by_residue(22)

# Get all aligned residue pairs
pairs = result.get_aligned_residue_pairs()
for ref_resid, mob_resid, distance in pairs:
    print(f"Ref {ref_resid} ↔ Mob {mob_resid}: {distance:.2f} Å")

# Get formatted table of results
df = result.get_residue_info_table()  # Returns pandas DataFrame
print(df.head())

# Or get formatted string (backward compatibility)
table_str = result.get_residue_info_table_formatted()
print(table_str)

# Get residue mappings
ref_mapping, mob_mapping = result.get_residue_mapping()
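
To make the lookup concrete, here is a minimal sketch of how a residue-ID query could work on top of the aligned pairs. It is not the package's implementation: `build_distance_lookup`, `distance_for`, and the sample data are hypothetical, and the sketch assumes that a residue which fell into a gap yields `None`.

```python
from typing import Optional


def build_distance_lookup(pairs):
    """Map each reference residue ID to its (mobile residue ID, distance)."""
    return {ref_resid: (mob_resid, dist) for ref_resid, mob_resid, dist in pairs}


def distance_for(lookup, ref_resid) -> Optional[float]:
    """Return the distance for ref_resid, or None if it was not aligned."""
    entry = lookup.get(ref_resid)
    return None if entry is None else entry[1]


# Hypothetical aligned pairs: (ref_resid, mob_resid, distance in Å).
pairs = [(21, 19, 0.0), (22, 20, 1.37), (24, 21, 4.02)]
lookup = build_distance_lookup(pairs)

for resid in (21, 22, 23):
    d = distance_for(lookup, resid)
    # Compare against None: a perfectly superposed residue has distance 0.0,
    # which is falsy and would be misreported by a bare `if d:`.
    print(resid, "not aligned" if d is None else f"{d:.2f} Å")
```

Keying the lookup on the original residue ID rather than the alignment column is what lets a query such as "residue 22" survive any gaps that precede it.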

API Reference

StructuralAligner

Main class for performing structural alignments.

Methods

__init__(gap_open=-10.0, gap_extend=-0.5): Initialize with gap penalties
align(reference, mobile, selection="name CA"): Perform alignment
calculate_position_distances(result): Get per-residue distances
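plot_distances(result, **kwargs): Create static (matplotlib) distance plot
plot_distances_interactive(result, **kwargs): Create interactive (Plotly) distance plot; accepts options such as chain_colors, height, and width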
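
To illustrate what the gap_open and gap_extend defaults mean, the sketch below scores an already-gapped alignment with affine gap penalties. It assumes the common convention that the first column of a gap costs gap_open and each further column costs gap_extend. The function `score_alignment` and the match/mismatch scores are hypothetical; the package itself delegates the alignment to BioPython's pairwise aligner.

```python
def score_alignment(ref_aln, mob_aln, match=1.0, mismatch=0.0,
                    gap_open=-10.0, gap_extend=-0.5):
    """Score two equal-length gapped strings with affine gap penalties."""
    if len(ref_aln) != len(mob_aln):
        raise ValueError("aligned strings must have equal length")
    score = 0.0
    in_gap = None  # which sequence the current gap run sits in, if any
    for a, b in zip(ref_aln, mob_aln):
        if a == "-" or b == "-":
            side = "ref" if a == "-" else "mob"
            # First column of a gap run pays gap_open; later columns in the
            # same run pay gap_extend.
            score += gap_extend if in_gap == side else gap_open
            in_gap = side
        else:
            score += match if a == b else mismatch
            in_gap = None
    return score


two_short_gaps = score_alignment("ACDEFGHIK", "A-DEF-HIK")
one_long_gap = score_alignment("ACDEFGHIK", "A--EFGHIK")
print(two_short_gaps, one_long_gap)
```

With these defaults, one two-residue gap is penalized far less than two separate one-residue gaps, which is why a large gap_open relative to gap_extend tends to produce fewer, longer gaps.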

AlignmentResult

Complete alignment result with residue mapping capabilities.

Key Methods

get_distance_by_residue(ref_resid): Get distance for specific residue ID
get_aligned_residue_pairs(): Get all (ref_resid, mob_resid, distance) tuples
get_residue_mapping(): Get residue ID to position mappings
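get_residue_info_table(): Get aligned residues as a pandas DataFrame
get_residue_info_table_formatted(): Get the same table as a formatted string
get_distances_array(): Get per-residue distances as an array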

Properties

rmsd_before/rmsd_after: RMSD values before and after alignment
n_aligned_residues: Number of successfully aligned residues
position_distances: Per-position distances after alignment
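
The relationship between per-position distances and RMSD can be sketched as follows: RMSD is the square root of the mean of the squared per-position distances. This is a standalone illustration with made-up coordinates; the helper names `rmsd` and `per_position_distances` are hypothetical and not part of the package.

```python
import math


def per_position_distances(coords_a, coords_b):
    """Euclidean distance between each pair of matched (x, y, z) points."""
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = per_position_distances(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Made-up CA coordinates for three matched residues (Å).
ref_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
mob_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]

distances = per_position_distances(ref_coords, mob_coords)
value = rmsd(ref_coords, mob_coords)
print(distances, round(value, 3))
```

A single displaced residue dominates the RMSD here, which is why the per-position distances are often more informative than the single summary value.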

Examples

Basic Usage

from structure_align import StructuralAligner
import MDAnalysis as mda

# Load structures
ref = mda.Universe("protein1.pdb")
mob = mda.Universe("protein2.pdb")

# Align using CA atoms
aligner = StructuralAligner()
result = aligner.align(ref, mob)

print(f"RMSD: {result.rmsd_before:.2f} → {result.rmsd_after:.2f} Å")

Query Specific Residues

# Query distance for residue 22
distance = result.get_distance_by_residue(22)
if distance is not None:  # a distance of 0.0 is valid, so test against None
    print(f"Residue 22: {distance:.2f} Å")
else:
    print("Residue 22 not found in alignment")

# Find high-distance residues
pairs = result.get_aligned_residue_pairs()
high_distance = [(r1, r2, d) for r1, r2, d in pairs if d > 3.0]
print(f"Found {len(high_distance)} residues with distance > 3.0 Å")

Analysis and Visualization

# Get comprehensive analysis as DataFrame
df = result.get_residue_info_table()
print(f"Alignment shape: {df.shape}")
print(df.describe())

# Find high-distance residues
high_distance = df[df['distance'] > 3.0]
print(f"Found {len(high_distance)} residues with distance > 3.0 Å")

# Find specific amino acid combinations
cys_pairs = df[(df['ref_aa'] == 'C') & (df['mob_aa'] == 'C')]
print("Cysteine-Cysteine alignments:")
print(cys_pairs[['ref_resid', 'mob_resid', 'distance']])

# Sort by distance
worst_aligned = df.nlargest(10, 'distance')
print("10 worst aligned residues:")
print(worst_aligned)

# Plot distances with residue information
fig = aligner.plot_distances(result, title="Residue Distance Analysis")
fig.show()

# Statistical analysis
distances = result.get_distances_array()
print(f"Mean distance: {distances.mean():.2f} Å")
print(f"Std deviation: {distances.std():.2f} Å")

DataFrame Operations

The get_residue_info_table() method returns a pandas DataFrame with the following columns:

ref_resid: Reference residue ID
mob_resid: Mobile residue ID
distance: Distance between aligned residues (Å)
ref_aa: Reference amino acid (single letter)
mob_aa: Mobile amino acid (single letter)

This makes standard pandas filtering, grouping, and export straightforward:

df = result.get_residue_info_table()

# Filter by distance threshold
high_distance = df[df['distance'] > 2.0]

# Filter by amino acid type
aromatics = df[df['ref_aa'].isin(['F', 'W', 'Y'])]

# Group by amino acid and get statistics
aa_stats = df.groupby('ref_aa')['distance'].agg(['mean', 'std', 'count'])

# Export to CSV for further analysis
df.to_csv('alignment_results.csv', index=False)

# Merge with other data
# df = df.merge(other_data, on='ref_resid')
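
For readers who want to see what the grouped statistics compute, here is a standard-library sketch of the same mean/std/count aggregation. The rows are made-up and only mirror the column names listed above; `distance_stats_by_aa` is a hypothetical helper, not part of the package. It uses the sample standard deviation, which matches the pandas default.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up rows using the documented column names.
rows = [
    {"ref_resid": 10, "mob_resid": 8, "distance": 1.0, "ref_aa": "F", "mob_aa": "F"},
    {"ref_resid": 11, "mob_resid": 9, "distance": 0.5, "ref_aa": "C", "mob_aa": "C"},
    {"ref_resid": 12, "mob_resid": 10, "distance": 3.0, "ref_aa": "F", "mob_aa": "Y"},
]


def distance_stats_by_aa(rows):
    """Group distances by reference amino acid and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ref_aa"]].append(row["distance"])
    stats = {}
    for aa, dists in sorted(groups.items()):
        stats[aa] = {
            "mean": mean(dists),
            # Sample standard deviation is undefined for a single value.
            "std": stdev(dists) if len(dists) > 1 else None,
            "count": len(dists),
        }
    return stats


stats = distance_stats_by_aa(rows)
for aa, summary in stats.items():
    print(aa, summary)
```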

Handling Sequence Gaps

The package automatically handles sequence alignment gaps:

Sequence alignment: Creates optimal alignment with gaps (-)
Residue mapping: Only aligned residues (no gaps) are used for structural alignment
ID preservation: Original residue IDs are preserved for querying
Gap tracking: You can see which residues were aligned vs. skipped

# Example with gaps
# Reference: ACDEFGHIK
# Mobile:    A-DEF-HIK
# Result:    Only A, DEF, and HIK are structurally aligned (C and G have no partner)
# But you can still query by original residue IDs
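
The gap handling described above can be sketched as a single pass over the two gapped strings, pairing residue IDs only at columns where neither sequence has a gap. This is an illustration under assumed inputs, not the package's code: `pair_aligned_residues` and the residue numbering are hypothetical.

```python
def pair_aligned_residues(ref_aln, mob_aln, ref_resids, mob_resids):
    """Pair original residue IDs at every gap-free alignment column."""
    pairs = []
    i = j = 0  # next unused index into ref_resids / mob_resids
    for a, b in zip(ref_aln, mob_aln):
        if a != "-" and b != "-":
            pairs.append((ref_resids[i], mob_resids[j]))
        # Advance a sequence's index whenever it contributes a residue,
        # so IDs stay in step even across gaps.
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


ref_aln = "ACDEFGHIK"
mob_aln = "A-DEF-HIK"
ref_resids = list(range(20, 29))  # hypothetical: reference residues 20..28
mob_resids = list(range(5, 12))   # hypothetical: mobile residues 5..11

pairs = pair_aligned_residues(ref_aln, mob_aln, ref_resids, mob_resids)
print(pairs)
```

Reference residues 21 (C) and 25 (G) do not appear in the output because they face a gap in the mobile sequence, while every paired residue keeps its original ID.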

Dependencies

MDAnalysis: Structure handling and analysis
BioPython: Sequence alignment
NumPy: Numerical calculations
Pandas: Data analysis and DataFrame operations
Matplotlib: Static plotting
Plotly: Interactive plotting with rich hover information
Pydantic: Data validation and models

License

MIT License

Interactive Plotting

The package provides both static (matplotlib) and interactive (Plotly) plotting options:

# Static matplotlib plot
fig_static = aligner.plot_distances(result)
fig_static.show()

# Interactive Plotly plot with rich hover information
fig_interactive = aligner.plot_distances_interactive(result)
fig_interactive.show()

# Save interactive plot as HTML
fig_interactive.write_html("interactive_plot.html")

Interactive Features

Rich Hover Tooltips: Each point shows:

Residue information: LEU123 (Chain A)
Amino acid conservation
Precise distance measurements
Alignment position

Interactive Controls:

🔍 Zoom: Click and drag to zoom into regions
🖱️ Pan: Shift+drag to navigate
📏 Range Slider: Navigate large proteins easily
🎨 Legend: Show/hide specific chains
💾 Export: HTML format for sharing

Chain Visualization:

Color-coded chains with boundaries
Custom color schemes
Chain-specific statistics

# Custom chain colors
colors = {'A': 'red', 'B': 'blue', 'C': 'green'}
fig = aligner.plot_distances_interactive(result, chain_colors=colors)

# Large plot for detailed analysis
fig = aligner.plot_distances_interactive(result, height=800, width=1400)
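
How the plotting functions locate chain boundaries is not documented here. One simple way to derive them, assuming a per-position list of chain IDs is available, is to group consecutive identical IDs. The helper `chain_segments` and the sample data below are hypothetical.

```python
from itertools import groupby


def chain_segments(chain_ids):
    """Return (chain_id, start, end) per run of equal IDs; end is exclusive."""
    segments = []
    position = 0
    for chain_id, run in groupby(chain_ids):
        length = sum(1 for _ in run)
        segments.append((chain_id, position, position + length))
        position += length
    return segments


# Made-up chain assignment for nine aligned positions.
chain_ids = ["A"] * 3 + ["B"] * 2 + ["C"] * 4
colors = {"A": "red", "B": "blue", "C": "green"}

segments = chain_segments(chain_ids)
for chain_id, start, end in segments:
    color = colors.get(chain_id, "gray")  # fall back for unlisted chains
    print(f"Chain {chain_id}: positions {start}-{end - 1}, color {color}")
```

Each segment's start position marks where a boundary line could be drawn, and the color lookup mirrors the chain_colors dictionary shown above.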
