[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RecognitionAnalytics/NovoSmithy/blob/main/ProteinLinker.ipynb)


🧱 Designing the Initial Structure
Before running the Protein Chain Stitcher, build an initial scaffold by assembling the relevant structural fragments:

Start with the RCSB PDB
Download one or more protein structures from the RCSB Protein Data Bank that are similar to the target protein you're designing.
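
The download step can also be scripted. The snippet below is a minimal sketch, assuming the entry has a classic four-character PDB ID and is available in PDB format from the RCSB file server at files.rcsb.org. The helper names `rcsb_pdb_url` and `download_pdb` are hypothetical and are not part of ChainStitcher.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL of the RCSB file download service
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a PDB-format entry (hypothetical helper)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"Expected a 4-character PDB ID, got {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id}.pdb"


def download_pdb(pdb_id: str, out_dir: str = ".") -> Path:
    """Fetch one entry into out_dir and return the local path (needs network access)."""
    out_path = Path(out_dir) / f"{pdb_id.strip().upper()}.pdb"
    urlretrieve(rcsb_pdb_url(pdb_id), str(out_path))
    return out_path


# Example (1UBQ is only a placeholder ID; substitute a structure similar to your target):
# local_file = download_pdb("1UBQ")
```

Very large entries may not be distributed in PDB format, in which case the download will fail and the mmCIF file has to be used instead.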

Use ChimeraX for Structural Design
Tools like ChimeraX allow you to:

Position chains into desired orientations or conformations.

Delete unwanted segments or extraneous domains.

Assemble new architectures by combining parts of different proteins.

Manually inspect and edit chain connectivity, clashes, and spatial fit.

Export Your Designer Structure
Once you've created a rough design (with the intended connectivity but likely chain breaks), export the structure as a PDB file. This file is the input for the Chain Stitcher, which bridges the broken regions with polyG linkers.

In [1]:
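# Biopython PDB utilities
from Bio import PDB
from Bio.PDB.PDBIO import PDBIO
# Stitching, loading, saving, and viewing helpers from the ChainStitcher module
from ChainStitcher import ChainStich, PDBLoader, PDBSaver, PDBViewer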

🧬 Protein Chain Stitcher
Protein Chain Stitcher repairs fragmented protein structures in PDB files by joining disjointed segments with flexible poly-glycine (polyG) linkers.

🔧 What It Does
Analyzes a PDB file to identify continuous protein segments, chain breaks, and structural issues.

Determines the optimal stitching strategy by calculating the longest possible path through all valid segment connections.

Bridges chain breaks with polyG linkers when the gap between segment ends falls within a user-defined distance threshold (connection_threshold_Ang, in Ångströms).

Outputs a modified PDB with a unified, continuous backbone for downstream modeling, simulation, or structure prediction.
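
The threshold and longest-path ideas above can be illustrated with a small, self-contained sketch. It is not ChainStitcher's actual implementation: the `Segment` tuple, the function names, and the toy coordinates are all hypothetical, and the exhaustive search is an assumed stand-in for whatever strategy the tool uses. A segment may connect to another when the distance from its C-terminal end to the other's N-terminal start is within the threshold.

```python
import math
from collections import namedtuple

# Hypothetical representation: a segment name plus the coordinates (in Å) of its
# N-terminal start and C-terminal end.
Segment = namedtuple("Segment", "name start end")


def connectable_pairs(segments, threshold):
    """Map each segment to the segments whose start lies within threshold Å of its end."""
    edges = {s.name: [] for s in segments}
    for a in segments:
        for b in segments:
            if a.name != b.name and math.dist(a.end, b.start) <= threshold:
                edges[a.name].append(b.name)
    return edges


def longest_path(edges):
    """Exhaustive depth-first search for the longest chain of connections.

    Brute force is acceptable for a handful of segments; it scales poorly beyond that.
    """
    best = []

    def dfs(node, path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in edges[node]:
            if nxt not in path:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    for start in edges:
        dfs(start, [start])
    return best


# Toy coordinates along the x axis, chosen only for illustration
segments = [
    Segment("A", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    Segment("B", (18.0, 0.0, 0.0), (30.0, 0.0, 0.0)),
    Segment("C", (40.0, 0.0, 0.0), (50.0, 0.0, 0.0)),
    Segment("D", (100.0, 0.0, 0.0), (110.0, 0.0, 0.0)),
]
edges = connectable_pairs(segments, threshold=15.0)
path = longest_path(edges)
print(path)  # ['A', 'B', 'C']
```

In this toy case segment D sits 50 Å beyond the end of C, so it exceeds the 15 Å threshold and is left out of the stitched path.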

🧠 Use Case
This tool is particularly useful when working with:

Fragmented predictions from structure prediction tools like AlphaFold or ESMFold

Engineered chimeric proteins requiring chain fusion

Loop modeling or de novo backbone design workflows

🧵 Why PolyG?
Poly-glycine segments provide a minimal, flexible linker that can later be refined or rebuilt using loop modeling tools, making them ideal for bridging uncertain or flexible regions in a structure.
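
As a rough guide to how many glycines a given gap needs, the sketch below estimates a lower bound from the roughly 3.8 Å spacing between consecutive Cα atoms. It assumes the gap is measured between the Cα atoms of the two flanking residues, which may differ from how ChainStitcher measures it, and `min_linker_length` is a hypothetical helper rather than part of the tool.

```python
import math

# Approximate distance between consecutive C-alpha atoms in a trans peptide chain (Å)
CA_CA_DISTANCE = 3.8


def min_linker_length(gap: float, per_residue: float = CA_CA_DISTANCE) -> int:
    """Smallest number of inserted residues that could span `gap` Å when fully extended.

    n inserted residues create n + 1 C-alpha to C-alpha steps between the two
    flanking residues, so we need (n + 1) * per_residue >= gap.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    return max(0, math.ceil(gap / per_residue) - 1)


print(min_linker_length(15.0))  # 3
```

A fully extended linker is strained, so in practice a few extra residues beyond this lower bound are usually sensible.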

In [None]:
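# Input scaffold and destination for the stitched result
input_pdb_path = r"./p96_Left.pdb"
output_pdb_path = r"./connected_protein.pdb"

# Load the structure; exclude no chains and bridge only breaks within 15 Å
model = PDBLoader(input_pdb_path)
chainStich = ChainStich(model, excluded_chains=[], connection_threshold_Ang=15)

# View the input, stitch the segments, then view and save the result
PDBViewer(model)
connectedModel = chainStich.ConnectClosest()
PDBViewer(connectedModel)
PDBSaver(connectedModel, output_pdb_path)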

Identified 8 segments in total.


Found 6 chains


Connected structure saved to C:\Users\bashc\Desktop\working\ssBinding\connected_protein.pdb
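
A quick way to sanity-check the saved file is to list the chain IDs it contains and compare them with the input. The sketch below reads the chain ID from column 22 of each ATOM/HETATM record, as defined by the PDB fixed-column format. The helper names are hypothetical, and whether the stitched output carries fewer chain IDs than the input is an assumption about how the tool writes its result.

```python
from typing import List


def chain_ids(pdb_text: str) -> List[str]:
    """Return the distinct chain IDs in order of first appearance."""
    ids: List[str] = []
    for line in pdb_text.splitlines():
        # Chain ID is column 22 (index 21) of ATOM and HETATM records
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            cid = line[21]
            if cid not in ids:
                ids.append(cid)
    return ids


def atom_line(serial: int, name: str, resname: str, chain: str, resseq: int) -> str:
    """Build a truncated ATOM record (no coordinates) for the demo below."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


# Tiny synthetic example with two chains
demo = "\n".join([
    atom_line(1, " N  ", "GLY", "A", 1),
    atom_line(2, " CA ", "GLY", "A", 1),
    "TER",
    atom_line(3, " N  ", "GLY", "B", 5),
])
print(chain_ids(demo))  # ['A', 'B']

# On the real output (requires the file written above):
# from pathlib import Path
# print(chain_ids(Path("./connected_protein.pdb").read_text()))
```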


In [None]:
from IPython.display import clear_output

# Install dependencies and fetch LigandMPNN
!pip install biopython
!pip install py3Dmol
!git clone https://github.com/dauparas/LigandMPNN.git

# Download the model parameters into ./model_params
%cd LigandMPNN
!bash get_model_params.sh "./model_params"

# Optionally, set up a conda (or other) environment first, for example:
# conda create -n ligandmpnn_env python=3.11
!pip3 install -r requirements.txt

# Hide the installation logs
clear_output()