
In [1]:
!pip install biopython

Collecting biopython
  Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Installing collected packages: biopython
Successfully installed biopython-1.85


In [2]:
from Bio import Entrez, SeqIO
import time  # delay to prevent request limits

# email for NCBI access
Entrez.email = "sana.raza.eng@example.com"

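# NCBI protein accession IDs for fibrinogen in each species.
# Verify each ID against NCBI before use: in the run shown below, the mouse
# accession is rejected with HTTP 400 and that species is skipped.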
protein_ids = {
    "human_fibrinogen": "NP_000499.1",
    "mouse_fibrinogen": "NP_034361.2",
    "rat_fibrinogen": "NP_001012187.1"
}

# Fetch sequences from NCBI
sequences = {}
for species, protein_id in protein_ids.items():
    try:
        print(f"Fetching {species} sequence...")
        handle = Entrez.efetch(db="protein", id=protein_id, rettype="fasta", retmode="text")
        seq_record = SeqIO.read(handle, "fasta")  # expects exactly one FASTA record
        handle.close()  # release the network connection
        sequences[species] = str(seq_record.seq)
        time.sleep(1)  # Pause 1 second to avoid NCBI request limits
    except Exception as e:
        print(f"Error fetching {species}: {e}")

# Print the first 50 amino acids of each retrieved sequence
for species, seq in sequences.items():
    print(f"{species}: {seq[:50]}...")


Fetching human_fibrinogen sequence...
Fetching mouse_fibrinogen sequence...
Error fetching mouse_fibrinogen: HTTP Error 400: Bad Request
Fetching rat_fibrinogen sequence...
human_fibrinogen: MFSMRIVCLVLSVVGTAWTADSGEGDFLAEGGGVRGPRVVERHQSACKDS...
rat_fibrinogen: MATSGVEKSSKKKTEKKLAAREEAKLLAGFMGVMNNMRKQRTLCDVILMV...


Entrez API:	Fetches data from NCBI's online databases.
FASTA Format:	Standard text-based format for representing sequences.
Protein Accession IDs (NP_000499.1, etc.):	Unique identifiers used to retrieve protein sequences.
SeqIO.read():	Parses a single-record FASTA file into a Biopython sequence record.
Error Handling:	The try/except block reports a failed request and moves on to the next species instead of crashing.
time.sleep(1):	Spaces out requests so NCBI does not throttle or block them.
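
To make the FASTA row concrete, here is a minimal sketch of what parsing a single-record FASTA text involves. `parse_single_fasta` is a hypothetical helper written for illustration, not Biopython's implementation, and the header in the example is made up. Like `SeqIO.read()`, it rejects input that holds zero records or more than one.

```python
def parse_single_fasta(text):
    """Return (header, sequence) for FASTA text holding exactly one record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one record")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # sequence lines are simply concatenated
    return header, sequence


# Illustrative record: the header is invented, the residues are the first 20
# of the human sequence printed above.
example = """>example_id illustrative description
MFSMRIVCLV
LSVVGTAWTA
"""
header, sequence = parse_single_fasta(example)
print(header)
print(sequence)
```

The point of the sketch is only that a FASTA record is a `>` header line followed by wrapped sequence lines; the real parser also builds a record object with an ID and description.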

In [4]:
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

# Align human and rat fibrinogen (the mouse sequence was not retrieved above)
alignments = pairwise2.align.globalxx(sequences["human_fibrinogen"], sequences["rat_fibrinogen"])

# Print best alignment
print(format_alignment(*alignments[0]))


MF--SMRIVCL-VL--SVVG----T----AWTADSG-EGDF---L-AEGG--GV----RGP--R----V---V-ER----HQS----ACKD--SDWP-FCSDEDW-NYKCPSGCRMKGLIDEVNQD-FTNRI-NK-L--KNSLFEYQKNN---KDSHSLTTNIM-E--I---LRGD--FSSANN---RD---NTY-NR-V-SEDLRSRIEVLKRKV-------IEK-VQHIQLLQKNVRAQL---VDMKR-L-E-VDIDIKIR-S-CR-GSC-SRA-LAREV--D---LK---DYEDQQK---QL---EQVIA-K-D--L-LPSRD--RQ--HLPLIKMKPVPDLVPGNFKSQ--LQK-VPPEWK-ALT-DMPQM------RME-LERPGGNEITRGGSTS-YGTGSET-ESPRNPSSAGSWNSGSSGPGSTGNRN-PGSSGTGGT------ATWKPGSSGPGSTGSWNSGSSGTGSTGNQNPGSP-R-PG--STGTW-NPGSSERG--SAGHW-TSESSVSGSTGQWHS-ESGSFRP---DSPGSGNARPNNPDWGTFEEVSGN----V-SPGTR-REYHTEKLVT-SKG--DK-E-LRT-GKEKVTSGSTTTTR-RSCS-KTVTKTV----I---GPDGHKEVTKEVVTSEDG---SDC----PEAM-DLGTL-SGIG-TLDG----F--RHRHPDEAAF--F-DTASTGKTFPGFFSPM-----LGEFVSETESRGSESGI--FTNTKESSSHHPGIAEFPS-RG-------KSS-S-YSKQFTSS----TSYN-RGDSTFESKSYKM-ADEAGSE-ADH-EG---THSTKRGHAK-SRP-VRD---------CD-DVLQT-----H--PSG--TQ--S-GIF----N--IK------LPG---S----SKIFS--VYCDQ---ETSLGGWL--L---IQQ--RMD---GSLN-FNRTWQ--DYKRG-F--GSLNDEGE--GEFWLGNDYLHLLTQR
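
The `xx` in `globalxx` selects the simplest scoring scheme: 1 point for each identical pair of residues, 0 for a mismatch, and no gap penalties. Under that scheme the best global score equals the length of the longest common subsequence of the two sequences, which explains why the alignment above is so heavily gapped. The sketch below computes that score with a small dynamic-programming table; `globalxx_score` is a hypothetical helper for illustration, not how Biopython computes it internally.

```python
def globalxx_score(a, b):
    """Best global alignment score with match=1, mismatch=0, gaps=0."""
    prev = [0] * (len(b) + 1)          # scores for the previous row of the table
    for ch_a in a:
        curr = [0]                     # aligning a prefix against nothing scores 0
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (1 if ch_a == ch_b else 0)  # pair the residues
            curr.append(max(diagonal, prev[j], curr[j - 1]))     # or open a free gap
        prev = curr
    return prev[-1]


print(globalxx_score("MFSM", "MATS"))
```

Because gaps are free, this score rewards any shared letters in order, even between unrelated proteins. A scheme with mismatch and gap penalties (for example `globalms` in the same module) is usually a better basis for judging similarity.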

In [5]:
import urllib.request

# Download the human fibrinogen structure (PDB ID 3GHG) from RCSB
pdb_id = "3GHG"
url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
urllib.request.urlretrieve(url, f"{pdb_id}.pdb")

print(f"Downloaded PDB structure: {pdb_id}.pdb")


Downloaded PDB structure: 3GHG.pdb


1️⃣ Import urllib: 	Allows Python to fetch files from the internet.

2️⃣ Define PDB ID:	Specifies the protein to download (e.g., "3GHG" for human fibrinogen).

3️⃣ Construct URL:	Generates the correct RCSB PDB download link.

4️⃣ Download File:	Saves the PDB structure as "3GHG.pdb".

5️⃣ Print Confirmation:	Confirms the download was successful.
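
The steps above always print a confirmation, even if the server returned something unexpected. A slightly more defensive variant could look like the sketch below. The helper names (`rcsb_pdb_url`, `count_atom_records`, `download_pdb`) and the 30-second timeout are assumptions made for this example; only the URL pattern comes from the cell above.

```python
import urllib.request


def rcsb_pdb_url(pdb_id):
    """Build the RCSB download URL used in the cell above."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def count_atom_records(pdb_text):
    """Count ATOM/HETATM lines as a cheap check that the text is a PDB file."""
    return sum(line.startswith(("ATOM", "HETATM")) for line in pdb_text.splitlines())


def download_pdb(pdb_id, timeout=30):
    """Return the PDB file text, or None if the request fails."""
    try:
        with urllib.request.urlopen(rcsb_pdb_url(pdb_id), timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
        print(f"Could not download {pdb_id}: {exc}")
        return None


# Usage (requires network access, so it is left commented out here):
# text = download_pdb("3GHG")
# if text and count_atom_records(text) > 0:
#     with open("3GHG.pdb", "w") as fh:
#         fh.write(text)
```

Counting coordinate records before saving catches the case where the download succeeds but the body is an error page rather than a structure.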

In [6]:
!pip install py3Dmol  # Install Py3Dmol for visualization

import py3Dmol

# Fetch the structure from RCSB by PDB ID
# (this does not read the local 3GHG.pdb file downloaded earlier)
view = py3Dmol.view(query="pdb:3GHG")
view.setStyle({"cartoon": {"color": "cyan"}})
view.zoomTo()
view.show()


Collecting py3Dmol
  Downloading py3Dmol-2.4.2-py2.py3-none-any.whl.metadata (1.9 kB)
Downloading py3Dmol-2.4.2-py2.py3-none-any.whl (7.0 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.4.2


1️⃣ Install py3Dmol:	Enables 3D visualization in Colab.

2️⃣ Import py3Dmol:	Loads the visualization library.

3️⃣ Load the PDB Structure:	Fetches the protein’s 3D model from RCSB.

4️⃣ Set the Cartoon Style:	Displays the protein’s secondary structure.

5️⃣ Zoom In:	Adjusts the view to fit the protein.

6️⃣ Show the Viewer:	Displays the interactive 3D model.
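
The viewer can also be fed the text of a local PDB file instead of fetching by ID, which makes it possible to style each chain separately. The sketch below is one way to do that under stated assumptions: `chain_ids` and `show_by_chain` are hypothetical helpers, the colour palette is arbitrary, and the chain identifier is read from column 22 of each ATOM record as laid out in the PDB file format.

```python
def chain_ids(pdb_text):
    """Return chain identifiers in order of first appearance in ATOM records."""
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]  # column 22 (1-based) holds the chain ID
            if chain.strip() and chain not in seen:
                seen.append(chain)
    return seen


def show_by_chain(pdb_text, palette=("cyan", "orange", "magenta", "green")):
    """Build a py3Dmol view with one cartoon colour per chain."""
    import py3Dmol  # imported lazily so chain_ids() works without py3Dmol installed

    view = py3Dmol.view(width=600, height=400)
    view.addModel(pdb_text, "pdb")
    for i, chain in enumerate(chain_ids(pdb_text)):
        colour = palette[i % len(palette)]  # cycle through the palette
        view.setStyle({"chain": chain}, {"cartoon": {"color": colour}})
    view.zoomTo()
    return view


# Usage in a notebook, assuming 3GHG.pdb was downloaded earlier:
# show_by_chain(open("3GHG.pdb").read()).show()
```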

In [11]:
from Bio import motifs
from Bio.Seq import Seq

# Convert sequences to Biopython format
seqs = [Seq(sequences["human_fibrinogen"]), Seq(sequences["rat_fibrinogen"])]

# Get the minimum length of the sequences
min_len = min(len(seq) for seq in seqs)

# Truncate sequences to the minimum length
seqs = [seq[:min_len] for seq in seqs]

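# Build a motif from the truncated sequences. Caveats: the sequences are only
# cut to equal length, not aligned, and motifs.create() defaults to the DNA
# alphabet "ACGT", so the consensus below counts only A, C, G and T residues.
motif = motifs.create(seqs)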

# Display motif consensus
print("Consensus Motif:", motif.consensus)

Consensus Motif: AATAGAACAAAAATGTAAAAAAGAGAAAAAGGGAAGAAAAATACAACAAAAAAACAAAAAAAAAAAGCAAAGATTAAAAAATAAAAAAAAAAAAAAAAAAAAATATTAAAAAAAGAAAAAAAAAATAAAAAACAAAAAAAAAAAACAGAAAAAACAACAAAAATAAAAAAAAATAAAATCAGACAAAAAATAAAAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAATAAAAATAAAAAACAAAAAGGAAATAGGATAAGTGAGGTAAAAAAAAGAAAAGGGGAGATCAAAAGAAGTGGTATCAAGAAGAAACGAAAAGAAGTGGTGAAAAGAAACGATGTAAAGAAAGGAAGAATAAACAAGGTGATAGGAGAGAAAAAGAGCAATATAAAGTAAAAAGAACAGGAAAAAGAAAACGGGAGAAATGAAAATCGATTTATATCTATCTATAAGAAGGAAATAAAATAAGGAACAGAAAAGTAAGAGTAAGAAAAAAAAAAATATAAAGGTAAGAAAGAAGAGAAGTAAAGATAGAATAAAAAAAAAAGTACAACAGATCGAAAATTAT
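
The consensus above contains only A, C, G and T because `motifs.create()` uses the DNA alphabet unless an `alphabet` argument is supplied, so it does not describe the protein sequences. As a rough sketch of what a protein-level consensus involves, the hypothetical helper below picks the most common residue in each column and writes `X` (an arbitrary placeholder chosen for this example) where there is no unique winner. It assumes the inputs are already aligned, for instance the gapped strings produced by the alignment step; on unaligned sequences the columns do not correspond and the result is not meaningful.

```python
from collections import Counter


def column_consensus(seqs, ambiguous="X"):
    """Most common residue per column; `ambiguous` where the top count is tied."""
    length = min(len(s) for s in seqs)  # compare only the shared columns
    consensus = []
    for column in zip(*(s[:length] for s in seqs)):
        ranked = Counter(column).most_common(2)
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        consensus.append(ambiguous if tied else ranked[0][0])
    return "".join(consensus)


print(column_consensus(["MKV", "MKL", "MRL"]))
```

With only two input sequences every differing column is a tie, so a consensus becomes informative mainly once three or more species are included.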
