# Orphan GPCR Similarity Search Pipeline

This notebook runs the end-to-end bioinformatics pipeline that predicts the function of 20 orphan GPCRs by finding their closest relatives, measured in ESM-2 embedding space, in the Swiss-Prot database.

**Instructions:**
1.  Ensure your runtime is set to use a **T4 GPU** (`Runtime` -> `Change runtime type`).
2.  Run the cells sequentially.

**Pipeline Steps:**
1.  **Setup**: Clones the GitHub repository and installs all necessary dependencies.
2.  **Data Acquisition**: Downloads the Swiss-Prot database.
3.  **Embedding Computation (The Long Step)**: Computes ESM-2 embeddings for the entire Swiss-Prot database. **This will take many hours.**
4.  **Index & Search**: Builds a FAISS search index and finds the top hits for the 20 orphan GPCRs.
5.  **Results**: Displays the final, readable table of predictions.

In [None]:
# 1. Setup: Clone Repo and Install Dependencies

print("Cloning the GitHub repository...")
!git clone https://github.com/PorkRelatives/orphan-gpcr-meta-analysis.git

# Change directory into the repository
import os
repo_name = "orphan-gpcr-meta-analysis"  # directory created by the git clone above
os.chdir(repo_name)

print("Installing all necessary dependencies...")
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 -q
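# Note: faiss-gpu wheels may be unavailable for some Python versions. If this install fails,
# faiss-cpu exposes the same CPU index API (GPU-specific calls would not work with it).
!pip install fair-esm biopython requests faiss-gpu -q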

print("\nSetup complete. All dependencies are installed.")
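
Because the embedding step depends on the GPU runtime requested in the instructions, it can help to confirm which device is visible before starting. The following is a minimal sketch, not part of the repository; `describe_accelerator` is a hypothetical helper name, and it only relies on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def describe_accelerator() -> str:
    """Return a short description of the compute device PyTorch can see.

    Hypothetical helper for illustration; the pipeline scripts may perform
    their own device selection.
    """
    # Avoid an ImportError if the install step has not been run yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if torch.cuda.is_available():
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "no GPU detected; running on CPU"


print(describe_accelerator())
```

If this reports no GPU, switching the runtime type before the long embedding step avoids starting it on the CPU by mistake.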

In [None]:
# 2. Data Acquisition

print("Running script to download the Swiss-Prot database...")
!python scripts/01_download_swissprot.py
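
Swiss-Prot is distributed as FASTA, where each header has the form `>sp|ACCESSION|ENTRY_NAME description`. The sketch below shows one way to read such records with the standard library only. It is an illustration, not the repository's parser: `read_fasta` is a hypothetical name, the records in the demo are toy data, and in practice Biopython's `SeqIO.parse(handle, "fasta")` covers the same ground.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# UniProt FASTA headers look like: >sp|ACCESSION|ENTRY_NAME free-text description
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (accession, entry_name, sequence) for each UniProt-style record."""
    accession = entry_name = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                yield accession, entry_name, "".join(chunks)
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"Unrecognized FASTA header: {line!r}")
            accession, entry_name = match.group(2), match.group(3)
            chunks = []
        else:
            chunks.append(line)
    # Emit the final record once the file ends.
    if accession is not None:
        yield accession, entry_name, "".join(chunks)


# Toy input with made-up accessions and sequences, for illustration only.
toy = io.StringIO(
    ">sp|TOY001|TOY1_HUMAN Toy protein one\nMKT\nAAA\n"
    ">sp|TOY002|TOY2_MOUSE Toy protein two\nGG\n"
)
for acc, name, seq in read_fasta(toy):
    print(acc, name, len(seq))
```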

In [None]:
# 3. Compute Database Embeddings (The Long Step)

print("\nStarting the main computation. This will take many hours.")
print("The script is resumable and will save checkpoints periodically.")

!python scripts/02_compute_database_embeddings.py
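
The messages above say the script is resumable and checkpoints periodically. As a rough illustration of that pattern (not the repository's actual implementation), the sketch below skips items already present in a checkpoint file and writes the file atomically. The names `process_resumable`, `load_done`, and `save_atomic`, the JSON format, and the `every` parameter are all assumptions, and `len` stands in for the real embedding computation.

```python
import json
import os
import tempfile


def load_done(checkpoint_path):
    """Load previously saved results, or start empty if no checkpoint exists."""
    if not os.path.exists(checkpoint_path):
        return {}
    with open(checkpoint_path) as fh:
        return json.load(fh)


def save_atomic(results, checkpoint_path):
    """Write to a temp file first so an interruption never leaves a half-written checkpoint."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(results, fh)
    os.replace(tmp_path, checkpoint_path)


def process_resumable(items, compute, checkpoint_path, every=2):
    """Apply `compute` to each item not yet in the checkpoint, saving every `every` items."""
    results = load_done(checkpoint_path)
    pending = [key for key in items if key not in results]
    for n, key in enumerate(pending, start=1):
        results[key] = compute(items[key])
        if n % every == 0:
            save_atomic(results, checkpoint_path)
    save_atomic(results, checkpoint_path)  # final save covers the remainder
    return results


# Toy run: sequence length stands in for an expensive embedding computation.
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
toy_items = {"TOY001": "MKTAAA", "TOY002": "GG"}
print(process_resumable(toy_items, len, checkpoint))
```

Running the same call a second time recomputes nothing, which is the property that makes a long job safe to restart after a runtime disconnect.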

In [None]:
# 4. Build the FAISS Search Index

print("\nComputation complete. Building the FAISS index for efficient search...")
!python scripts/03_build_search_index.py
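
To make the search step concrete, here is a brute-force sketch of what a nearest-neighbor lookup computes, assuming embeddings are compared by cosine similarity (inner product of L2-normalized vectors). That metric is an assumption: the repository's script may use a different distance or an approximate index. FAISS exists to do this kind of search efficiently at database scale; the pure-Python version below, with the hypothetical names `normalize` and `top_k` and toy 2-D vectors, only shows the underlying calculation.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k(query, database, k=3):
    """Return the k database keys most similar to `query`, as (key, score) pairs."""
    q = normalize(query)
    scored = []
    for key, vector in database.items():
        score = sum(a * b for a, b in zip(q, normalize(vector)))
        scored.append((key, score))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real ESM-2 embeddings have far more dimensions.
toy_db = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
for key, score in top_k([2.0, 0.1], toy_db, k=2):
    print(key, round(score, 3))
```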

In [None]:
# 5. Find and Display Top Hits

print("\nIndex built. Searching for the top hits for each orphan GPCR...")
!python scripts/04_find_top_hits.py