## Drug Repurposing from Predicted Genes

This notebook:
- Maps COVID-related genes predicted by the ML and DL models to known drugs
- Joins the genes to drugs through a UniProt–DrugBank mapping file
- Saves:
  - Drug–gene pairs (`*_druggable_genes.csv`)
  - Unique drugs (`*_unique_drugs.csv`)


Load Libraries and Paths

In [None]:
import pandas as pd
import os

# === Set paths ===
BASE_DIR = "../results"                 # where gene prediction CSVs are stored
MAP_DIR = "../data/drug"               # path to drug mapping file
OUT_DIR = "../results/drug_repurposing"
os.makedirs(OUT_DIR, exist_ok=True)

# Input files
ml_file     = os.path.join(BASE_DIR, "ml_positive_genes.csv")
dl_file     = os.path.join(BASE_DIR, "dl_positive_genes.csv")
common_file = os.path.join(BASE_DIR, "common_positive_genes.csv")
map_file    = os.path.join(MAP_DIR, "uniprot_drugid_map.txt")  # or .csv/.tsv


Load Data

In [None]:
# Load gene predictions
ml_df     = pd.read_csv(ml_file)
dl_df     = pd.read_csv(dl_file)
common_df = pd.read_csv(common_file)

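# Load drug mapping (tab-separated by default).
# Reading a comma-separated file with sep="\t" does not raise an error;
# it yields a single column, so check for that instead of using try/except.
map_df = pd.read_csv(map_file, sep="\t")
if map_df.shape[1] == 1:
    map_df = pd.read_csv(map_file)            # fallback to comma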

# Quick look
print("✅ Loaded ML genes:", ml_df.shape)
print("✅ Loaded DL genes:", dl_df.shape)
print("✅ Loaded Common genes:", common_df.shape)
print("✅ Loaded Drug mapping:", map_df.shape)
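
The merge further down relies on a `Gene ID` column in each prediction file and on `UniProt ID` and `Drug IDs` columns in the mapping file. Below is a minimal sketch of a check that fails early when one of them is missing. The helper name `require_columns` is hypothetical, and the toy frames with placeholder IDs stand in for the loaded data.

```python
import pandas as pd

def require_columns(df, required, label):
    """Raise a ValueError naming any required column missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{label} is missing columns: {missing}")

# Toy frames standing in for the loaded CSVs (placeholder values only)
toy_genes = pd.DataFrame({"Gene ID": ["GENE_A", "GENE_B"]})
toy_map = pd.DataFrame({"UniProt ID": ["GENE_A"], "Drug IDs": ["DRUG_1"]})

# Both calls pass silently because the columns are present
require_columns(toy_genes, ["Gene ID"], "gene predictions")
require_columns(toy_map, ["UniProt ID", "Drug IDs"], "drug mapping")
```

In the notebook itself, the same calls would presumably take `ml_df`, `dl_df`, `common_df` and `map_df` in place of the toy frames.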


Define Merge Function

In [None]:
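def extract_druggable(positive_df, map_df, output_name):
    """
    Merge a predicted-gene list with the drug mapping and export the result as CSV.

    Expects a 'Gene ID' column in positive_df (joined against UniProt IDs)
    and 'UniProt ID' and 'Drug IDs' columns in map_df. Genes with no entry
    in the mapping are dropped (inner join). Writes to OUT_DIR.
    """
    merged_df = pd.merge(
        positive_df,
        map_df,
        how='inner',
        left_on='Gene ID',
        right_on='UniProt ID'
    )
    # Keep one row per distinct (gene, drug IDs) combination
    druggable_df = merged_df[['Gene ID', 'Drug IDs']].drop_duplicates()
    out_path = os.path.join(OUT_DIR, output_name)
    druggable_df.to_csv(out_path, index=False)
    print(f"✅ Saved {output_name}: {druggable_df.shape[0]} gene–drug pairs.")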
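
Because the merge is an inner join (the pandas default), predicted genes with no entry in the mapping drop out without any message. Below is a sketch of how the unmatched genes could be listed with a left join and `indicator=True`, shown on toy data with placeholder IDs. The function name `unmatched_genes` is hypothetical; the column names follow `extract_druggable`.

```python
import pandas as pd

def unmatched_genes(positive_df, map_df):
    """Return the Gene IDs that have no row in the drug mapping."""
    merged = pd.merge(
        positive_df[["Gene ID"]].drop_duplicates(),
        map_df[["UniProt ID"]].drop_duplicates(),
        how="left",
        left_on="Gene ID",
        right_on="UniProt ID",
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    return merged.loc[merged["_merge"] == "left_only", "Gene ID"].tolist()

# Toy data: GENE_B has no mapping entry
toy_genes = pd.DataFrame({"Gene ID": ["GENE_A", "GENE_B", "GENE_C"]})
toy_map = pd.DataFrame({
    "UniProt ID": ["GENE_A", "GENE_C"],
    "Drug IDs": ["DRUG_1", "DRUG_2"],
})
print(unmatched_genes(toy_genes, toy_map))  # ['GENE_B']
```

Comparing the length of this list with the size of each gene set gives a rough idea of how much of the prediction output the mapping file covers.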


Run on All Gene Sets

In [None]:
extract_druggable(ml_df, map_df, "ml_druggable_genes.csv")
extract_druggable(dl_df, map_df, "dl_druggable_genes.csv")
extract_druggable(common_df, map_df, "common_druggable_genes.csv")
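
The overview also lists `*_unique_drugs.csv` outputs, which none of the cells above write. Below is a minimal sketch of one way to derive them. It assumes each `Drug IDs` cell holds one or more IDs separated by `;`; that delimiter is an assumption and should be checked against the mapping file. The helper name `unique_drugs` and the output column `Drug ID` are hypothetical, and the data is a toy example.

```python
import pandas as pd

def unique_drugs(druggable_df, sep=";"):
    """Return a one-column DataFrame of distinct drug IDs, sorted."""
    ids = (
        druggable_df["Drug IDs"]
        .dropna()
        .astype(str)
        .str.split(sep)   # one list of IDs per row
        .explode()        # one ID per row
        .str.strip()
    )
    ids = ids[ids != ""]  # drop empties left by trailing separators
    return pd.DataFrame({"Drug ID": sorted(ids.unique())})

# Toy gene-drug table (placeholder IDs; ';' delimiter is assumed)
toy = pd.DataFrame({
    "Gene ID": ["GENE_A", "GENE_B"],
    "Drug IDs": ["DRUG_1; DRUG_2", "DRUG_2;DRUG_3;"],
})
drugs_df = unique_drugs(toy)
print(drugs_df["Drug ID"].tolist())  # ['DRUG_1', 'DRUG_2', 'DRUG_3']
```

To produce the files, each `*_druggable_genes.csv` could be read back with `pd.read_csv`, passed through `unique_drugs`, and saved with `to_csv(..., index=False)` under `OUT_DIR`.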


## Output Summary

The cells above write the following files to `../results/drug_repurposing/` (`OUT_DIR`):

- `ml_druggable_genes.csv`
- `dl_druggable_genes.csv`
- `common_druggable_genes.csv`

Each contains:
- `Gene ID` from ML/DL/common predictions
- Corresponding `Drug IDs` from UniProt–DrugBank mapping
