<p style="font-size:18px; font-weight:bold;"> 2025 Olivia Debnath</p>
<p style="font-size:14px;">Dana-Farber Cancer Institute & Harvard Medical School</p>

Moving forward from PPI_CellxGene_downstream_S0_12032025.ipynb (basic filtering & QC for genes & cell types) 

**Step 1 (Hypothesis Testing): evaluating strong co-expression of the protein of interest (POI) & its partners in a cell type/state** 


**Hypothesis 1 (H₁): Strong co-expression & functional disruption** 

If a gene & at least one of its interactors are moderately to highly co-expressed (each detected in ≥30% of cells) in a specific cell type/state, then perturbation of the PPI (through mutation, knockdown, or inhibition) is likely to disrupt a specific cellular function.

🔹 Why this hypothesis?

1. Cell type specificity matters:
   - Many biological processes depend on tightly regulated co-expression of genes in specific cell types or functional states. E.g., synaptic vesicle trafficking in neurons requires co-expression of essential interactors like STXBP1 & SYTL4/MAPK1 in neuronal subpopulations.
  
2. Pathogenicity & functional impact:
    - Disrupting PPIs in highly co-expressed modules can lead to disease phenotypes. E.g., STXBP1 mutations impair synaptic function, contributing to neurodevelopmental disorders (e.g., epilepsy, intellectual disability).
      
    - Similar disruptions in cardiomyocytes, hepatocytes, or immune cells can drive distinct disease mechanisms.

  
🔹 Applications:

- Prioritizing functionally relevant PPIs for therapeutic targeting.
- Identifying tissue/cell-type-specific vulnerabilities to genetic mutations.
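
The H₁ criterion can be sketched on a toy table before running it on real files. The snippet below is a minimal illustration, not the notebook's implementation: the percentages are made-up numbers, and `h1_clusters` is a hypothetical helper name. It keeps only the tissue/cell type pairs where the POI and at least one interactor both reach the 30% cut-off.

```python
import pandas as pd

MODERATE_EXP = 30  # same lower bound as H1

# Toy data (hypothetical values): one row per gene per tissue/cell type
toy = pd.DataFrame({
    "Tissue": ["brain", "brain", "brain", "liver", "liver"],
    "Cell Type": ["neuron", "neuron", "neuron", "hepatocyte", "hepatocyte"],
    "Gene Symbol": ["STXBP1", "SYTL4", "MAPK1", "STXBP1", "MAPK1"],
    "%Cells Expressing Gene": [62.0, 12.0, 41.0, 8.0, 55.0],
})
poi = "STXBP1"

def h1_clusters(df, poi, cutoff=MODERATE_EXP):
    """Return (Tissue, Cell Type) pairs where the POI and >=1 interactor pass the cutoff."""
    keys = ["Tissue", "Cell Type"]
    passing = df[df["%Cells Expressing Gene"] >= cutoff]
    is_poi = passing["Gene Symbol"] == poi
    poi_keys = passing.loc[is_poi, keys].drop_duplicates()
    partner_keys = passing.loc[~is_poi, keys].drop_duplicates()
    # Inner merge keeps only clusters present on both sides
    return poi_keys.merge(partner_keys, on=keys, how="inner")

h1_hits = h1_clusters(toy, poi)
print(h1_hits)
```

In this toy case only the neuron cluster survives: the hepatocyte cluster has a well-expressed partner, but the POI itself falls below the cut-off there.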

In [1]:
import os
print(os.path.exists("./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/Level2/filtered_S0/"))

True


In [2]:
#Define input directory: 
input_dir = "./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/Level2/filtered_S0/"  

#Dynamically find all relevant input files
input_files = sorted([f for f in os.listdir(input_dir) if f.endswith("_filtered_S0_17042025.csv")]) 
print(input_files) 

#Print total count of files
print(f"\n✅ Total Input Files Found: {len(input_files)}")

['ACSF3_PPI_filtered_S0_17042025.csv', 'ACTB_PPI_filtered_S0_17042025.csv', 'ACY1_PPI_filtered_S0_17042025.csv', 'ADIPOQ_PPI_filtered_S0_17042025.csv', 'AGXT_PPI_filtered_S0_17042025.csv', 'AHCY_PPI_filtered_S0_17042025.csv', 'AIPL1_PPI_filtered_S0_17042025.csv', 'ALAS2_PPI_filtered_S0_17042025.csv', 'ALDOA_PPI_filtered_S0_17042025.csv', 'ALOX5_PPI_filtered_S0_17042025.csv', 'AMPD2_PPI_filtered_S0_17042025.csv', 'ANKRD1_PPI_filtered_S0_17042025.csv', 'ANXA11_PPI_filtered_S0_17042025.csv', 'AP2S1_PPI_filtered_S0_17042025.csv', 'APOA1_PPI_filtered_S0_17042025.csv', 'APOD_PPI_filtered_S0_17042025.csv', 'ASNS_PPI_filtered_S0_17042025.csv', 'ATPAF2_PPI_filtered_S0_17042025.csv', 'BAG3_PPI_filtered_S0_17042025.csv', 'BANF1_PPI_filtered_S0_17042025.csv', 'BCL10_PPI_filtered_S0_17042025.csv', 'BFSP2_PPI_filtered_S0_17042025.csv', 'BLK_PPI_filtered_S0_17042025.csv', 'C1QA_PPI_filtered_S0_17042025.csv', 'C1QB_PPI_filtered_S0_17042025.csv', 'C1QC_PPI_filtered_S0_17042025.csv', 'CA8_PPI_filtered_S

In [3]:
#Specify output directory: 
output_dir = "./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/"
os.makedirs(output_dir, exist_ok=True)  #Ensure output directory exists

In [4]:
import pandas as pd
import numpy as np

In [5]:
#Define expression cutoffs - can be adjusted later. 
HIGH_EXP = 50  #Highly co-expressed threshold
MODERATE_EXP = 30  #Moderately co-expressed (H1 lower bound)


#We will use the Step 0 filtered files (*_filtered_S0_17042025.csv) found above as input & test for co-expression (Hypothesis 1): 
def process_H1_file(df, protein_of_interest):
    """ 
    Filters for co-expression (H1) and labels each retained gene by expression tier.
    Keeps only tissue/cell type pairs where:
    - POI ≥30% expression & at least one interactor ≥30% (moderate co-expression)
    - POI ≥50% expression & at least one interactor ≥50% (strong co-expression; a subset of the ≥30% case)
    Returns None if the POI does not reach ≥30% in any cluster.
    """

    #**Ensure POI is moderately expressed (≥30%) in at least some cells**
    main_protein_present = df[(df["Gene Symbol"] == protein_of_interest) & (df["%Cells Expressing Gene"] >= MODERATE_EXP)]
    
    if main_protein_present.empty:
        print(f"⚠️ WARNING: {protein_of_interest} does not meet ≥30% expression in any valid cluster. Skipping file.")
        return None

    #**Identify valid interaction cases**
    interactors_present_30 = df[(df["Gene Symbol"] != protein_of_interest) & (df["%Cells Expressing Gene"] >= MODERATE_EXP)]

    #**Scenario 1: POI ≥30%, Interactor ≥30%**
    valid_case_1 = pd.merge(
        main_protein_present[["Tissue", "Cell Type"]],
        interactors_present_30[["Tissue", "Cell Type"]],
        on=["Tissue", "Cell Type"], how="inner"
    ).drop_duplicates()

    #**Scenario 2: Both POI & at least one interactor are highly expressed (≥50%)**
    main_protein_present_50 = df[(df["Gene Symbol"] == protein_of_interest) & (df["%Cells Expressing Gene"] >= HIGH_EXP)]
    interactors_present_50 = df[(df["Gene Symbol"] != protein_of_interest) & (df["%Cells Expressing Gene"] >= HIGH_EXP)]
    valid_case_2 = pd.merge(
        main_protein_present_50[["Tissue", "Cell Type"]],
        interactors_present_50[["Tissue", "Cell Type"]],
        on=["Tissue", "Cell Type"], how="inner"
    ).drop_duplicates()

    #**Print filtered results per scenario**
    print(f"\n📊 Scenario 1: {protein_of_interest} ≥30%, Interactor ≥30% (Moderate Co-Expression)")
    print(valid_case_1)

    print(f"\n📊 Scenario 2: Both {protein_of_interest} and at least one interactor ≥50% (Highly Co-Expressed)")
    print(valid_case_2)

    #**Combine all valid cases & filter data**
    valid_tissue_cell_types = pd.concat([valid_case_1, valid_case_2]).drop_duplicates()
    df = df.merge(valid_tissue_cell_types, on=["Tissue", "Cell Type"], how="inner")

    #**Define Co-Expression Category**
    conditions = [
        df["%Cells Expressing Gene"] >= HIGH_EXP,   #Highly Co-Expressed
        df["%Cells Expressing Gene"] >= MODERATE_EXP  #Moderately Co-Expressed
    ]
    categories = ["Highly Co-Expressed (≥50%)", "Moderately Co-Expressed (30-50%)"]

    df["Co-Expression Category"] = np.select(conditions, categories, default="Low Co-Expression (<30%)")

    #**Remove Lowly Expressed Genes (<30%) since they are not part of H1**
    df = df[df["Co-Expression Category"] != "Low Co-Expression (<30%)"]

    return df
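
The category labels in `process_H1_file` depend on the order of the conditions passed to `np.select`. A minimal sketch with hypothetical percentages, using shortened labels for readability:

```python
import numpy as np
import pandas as pd

HIGH_EXP, MODERATE_EXP = 50, 30

pct = pd.Series([10.0, 30.0, 49.9, 50.0, 80.0])  # hypothetical % of expressing cells

# np.select returns the choice for the FIRST condition that is True,
# so the stricter (>=50) test must come before the looser (>=30) one.
labels = np.select(
    [pct >= HIGH_EXP, pct >= MODERATE_EXP],
    ["high", "moderate"],
    default="low",
)
print(list(labels))
```

If the two conditions were swapped, every value ≥50 would also satisfy the ≥30 test first and would be labelled "moderate".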


In [6]:
#Exporting results as .xlsx to avoid character encoding issues. 

def process_all_H1_files():
    """ Processes all Step 0 filtered files and saves the H1-filtered output as Excel (.xlsx). """
    for file in input_files:
        input_path = os.path.join(input_dir, file)
        print(f"\n📂 Processing: {input_path}...")

        #Extract POI dynamically from filename
        protein_of_interest = file.split("_")[0]  
        print(f"🧬 Identified Protein of Interest: {protein_of_interest}")

        #Read file
        df = pd.read_csv(input_path)

        #Process file
        processed_df = process_H1_file(df, protein_of_interest)

        if processed_df is not None and not processed_df.empty:
            #**Modify output filename for Step 1 H1 (now Excel)**
            output_file = os.path.join(output_dir, file.replace("_S0_17042025", "_S1_H1_17042025").replace(".csv", ".xlsx"))
            
            #**Save as Excel (without encoding argument)**
            processed_df.to_excel(output_file, index=False)

            print(f"✅ Saved: {output_file}")
        else:
            print(f"⚠️ WARNING: No valid data after processing {file}. Skipping...")

#Run the function to process all files
process_all_H1_files()


📂 Processing: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/Level2/filtered_S0/ACSF3_PPI_filtered_S0_17042025.csv...
🧬 Identified Protein of Interest: ACSF3

📂 Processing: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/Level2/filtered_S0/ACTB_PPI_filtered_S0_17042025.csv...
🧬 Identified Protein of Interest: ACTB

📊 Scenario 1: ACTB ≥30%, Interactor ≥30% (Moderate Co-Expression)
              Tissue                               Cell Type
0     adipose tissue                                  B cell
1     adipose tissue         CD4-positive, alpha-beta T cell
2     adipose tissue         CD8-positive, alpha-beta T cell
3     adipose tissue                                  T cell
4     adipose tissue                       alpha-beta T cell
...              ...                                     ...
1819          uterus                            stromal cell
1820          uterus                   stromal cell of ovary
1821          uterus                
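
The batch loop relies on two filename conventions: the POI is the text before the first underscore, and the output name swaps the step tag and extension. A small sketch of both, with hypothetical helper names (`poi_from_filename`, `h1_output_name`) and the assumption that gene symbols never contain an underscore:

```python
def poi_from_filename(filename):
    """Gene symbol is the text before the first underscore (assumes symbols contain no '_')."""
    return filename.split("_")[0]

def h1_output_name(filename):
    """Map a Step 0 CSV name to the Step 1 H1 Excel name."""
    return filename.replace("_S0_17042025", "_S1_H1_17042025").replace(".csv", ".xlsx")

example = "ACTB_PPI_filtered_S0_17042025.csv"
print(poi_from_filename(example))
print(h1_output_name(example))
```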

In [7]:
#Same as per: PPI_CellxGene_coexpression_S1_H1_13032025.ipynb 

#LITAF: needs re-processing with UBA52 removed (a ubiquitin gene that is dominantly expressed in all clusters).
#If a cell type/state shows up only because of LITAF & UBA52, it should probably be skipped. 
#Save output as _V2_noUBA52.xlsx for clarity. 

def process_LITAF_without_UBA52():
    """ Reprocesses LITAF while removing UBA52 and saves as '_V2_noUBA52.xlsx'. """

    file = "LITAF_PPI_filtered_S1_H1_17042025.xlsx"  #Step 1 H1 output file
    input_path = os.path.join(output_dir, file)  

    if not os.path.exists(input_path):
        print(f"❌ ERROR: {input_path} not found!")
        return
    
    print(f"\n🔄 Reprocessing: {input_path} (Removing UBA52)...")

    #Read Excel file
    df = pd.read_excel(input_path)

    #**Step 1: Remove UBA52 from data**
    df = df[df["Gene Symbol"] != "UBA52"]

    #**Step 2: Identify & remove cell types where only LITAF & UBA52 were present**
    cell_type_counts = df.groupby(["Tissue", "Cell Type"])["Gene Symbol"].nunique().reset_index()
    only_litaf_clusters = cell_type_counts[cell_type_counts["Gene Symbol"] == 1]  # Clusters with only one gene (LITAF)
    
    #**Filter out these clusters**
    df = df[~df.set_index(["Tissue", "Cell Type"]).index.isin(only_litaf_clusters.set_index(["Tissue", "Cell Type"]).index)]

    #**Step 3: Save output as new version**
    output_file = os.path.join(output_dir, file.replace("_S1_H1", "_V2_noUBA52"))
    df.to_excel(output_file, index=False)

    print(f"✅ Successfully saved: {output_file}")

#Run the function
process_LITAF_without_UBA52()


🔄 Reprocessing: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/LITAF_PPI_filtered_S1_H1_17042025.xlsx (Removing UBA52)...
✅ Successfully saved: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/LITAF_PPI_filtered_V2_noUBA52_17042025.xlsx
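
The UBA52 clean-up can be checked on a toy table. This is a sketch under assumptions: `GENE_X` is a placeholder partner, and `transform("nunique")` is used here as an alternative to the index-based filter in the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Tissue": ["heart", "heart", "heart", "blood", "blood"],
    "Cell Type": ["neutrophil", "neutrophil", "neutrophil", "B cell", "B cell"],
    "Gene Symbol": ["LITAF", "UBA52", "GENE_X", "LITAF", "UBA52"],
})

# Drop the ubiquitously expressed partner first
no_uba52 = toy[toy["Gene Symbol"] != "UBA52"]

# Count distinct genes left per cluster and keep clusters with >=2 (POI + a partner)
genes_left = no_uba52.groupby(["Tissue", "Cell Type"])["Gene Symbol"].transform("nunique")
kept = no_uba52[genes_left >= 2]
print(kept)
```

The blood/B cell cluster is dropped because LITAF is the only gene left there once UBA52 is removed.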


Fix required:

- POI expression (`POI_Expression`) is extracted correctly. 
- Prioritized PPI pairs are generated without self-interactions.
- Files are processed dynamically, with LITAF excluded when needed.

1. Priority Score fix: 
Instead of normalizing each feature independently, we first compute the raw Priority Score, then apply MinMax scaling to the final scores so that every score is >0.
The aim is to keep all scores comparable without one feature dominating.

2. Adjust weights: co-expression strength is crucial, so Mean Robustness gets the highest weight.

3. Recalculate Mean Robustness using %Cells Expressing POI & %Cells Expressing Interactors.

4. Apply the final MinMax scaling to the raw Priority Scores (ensuring no 0s). If there are not many PPIs to visualize, it is better to ignore this field.
    - If MinMax scaling would produce zeros, shift values slightly above 0 so that rankings are maintained.
    - Check columns like Contributing_Interactors & Prioritized_PPI for the number of interactors contributing to co-expression. A higher number of partners presumably indicates a tighter regulatory role. 
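
A worked example of the scoring plan, as a minimal sketch: the weights are the ones used in the scoring cell of this notebook, the three clusters are hypothetical, and `minmax` is a hand-written stand-in for `MinMaxScaler(feature_range=(0.05, 1))` applied to a single column.

```python
import numpy as np

# Weights as used in the scoring cell of this notebook
W_INTERACTIONS, W_SCRNA, W_ROBUSTNESS = 0.3, 0.25, 0.45

# Hypothetical per-cluster inputs (three clusters)
num_robust = np.array([5, 2, 1])
mean_scrna = np.array([0.8, 0.6, 0.4])
pct_poi = np.array([70.0, 45.0, 35.0])
pct_partners = np.array([60.0, 55.0, 31.0])

# Mean Robustness: average of the POI and interactor percentages
mean_robustness = (pct_poi + pct_partners) / 2

# Raw Priority Score: weighted sum of the three unscaled features
raw = W_INTERACTIONS * num_robust + W_SCRNA * mean_scrna + W_ROBUSTNESS * mean_robustness

def minmax(x, lo=0.05, hi=1.0):
    """Rescale x linearly to [lo, hi]; a constant vector maps to lo."""
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, lo, dtype=float)
    return lo + (x - x.min()) / span * (hi - lo)

scaled = minmax(raw)
print(np.round(raw, 3), np.round(scaled, 3))
```

The lowest-scoring cluster lands on 0.05 rather than 0, and the ranking is unchanged by the rescaling. In this toy case most of the raw score comes from the robustness term, because it is on a 0-100 scale while the other two features are much smaller.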

In [8]:
from sklearn.preprocessing import MinMaxScaler

#**Define input directory**
input_dir = "./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/"

#**Find all POI files dynamically**
input_files = [f for f in os.listdir(input_dir) if f.endswith("_S1_H1_17042025.xlsx")]

#**Check if the reprocessed LITAF file exists (it does not match the suffix above, so it is processed separately)**
litaf_filename = "LITAF_PPI_filtered_V2_noUBA52_17042025.xlsx"

if os.path.exists(os.path.join(input_dir, litaf_filename)):
    print("✅ LITAF (noUBA52) file detected & will be processed separately.")
else:
    print("⚠️ WARNING: LITAF (noUBA52) file not found in input directory!")

#**Check if any valid POI files exist**
if not input_files:
    raise SystemExit("❌ ERROR: No valid POI files found in the directory!")

print(f"📂 Found {len(input_files)} files for processing.")

#**Processing all POI files independently**
for file in input_files:
    input_path = os.path.join(input_dir, file)

    #**Extract POI dynamically from filename**
    protein_of_interest = file.split("_")[0]

    print(f"\n📂 Processing Priority Ranking for: {input_path}")
    print(f"🧬 Identified Protein of Interest: {protein_of_interest}")

    #**Read the dataset**
    df = pd.read_excel(input_path)

    #**Ensure necessary columns exist**
    required_columns = {"Tissue", "Cell Type", "Gene Symbol", "%Cells Expressing Gene", "Expression", "Cell Count", "scRNA_score"}  #scRNA_score is aggregated in Step 3
    if not required_columns.issubset(df.columns):
        print(f"❌ ERROR: Missing required columns in {file}. Skipping...")
        continue

    #**Step 1: Identify Robust Interactions (≥30% Expression)**
    interaction_threshold = 30
    df["Robust Interaction"] = df["%Cells Expressing Gene"] >= interaction_threshold

    #**Step 2: Extract POI Expression & %Cells Expressing POI BEFORE groupby**
    #Use .loc + rename (no inplace) to avoid modifying a slice of df
    poi_df = df.loc[df["Gene Symbol"] == protein_of_interest, ["Tissue", "Cell Type", "Expression", "%Cells Expressing Gene"]].rename(
        columns={"Expression": "POI_Expression", "%Cells Expressing Gene": "%Cells_Expressing_POI"}
    )

    #**Step 3: Compute Cluster-Level Metrics**
    cluster_priority = df.groupby(["Tissue", "Cell Type"]).agg(
        Num_Robust_Interactions=("Gene Symbol", lambda x: df.loc[x.index, "Robust Interaction"].sum()),
        Contributing_Interactors=("Gene Symbol", lambda x: ", ".join(sorted(x[df.loc[x.index, "Robust Interaction"]]))),
        Mean_scRNA_score=("scRNA_score", "mean"),
        Total_Cell_Count=("Cell Count", "max")
    ).reset_index()

    #**Step 4: Extract %Cells Expressing Interactors**
    interactors_df = df[df["Robust Interaction"]][["Tissue", "Cell Type", "%Cells Expressing Gene"]]
    interactors_avg = interactors_df.groupby(["Tissue", "Cell Type"])["%Cells Expressing Gene"].mean().reset_index()
    interactors_avg.rename(columns={"%Cells Expressing Gene": "%Cells_Expressing_Interactors"}, inplace=True)

    #**Step 5: Merge POI Expression & %Cells Expressing POI Values**
    cluster_priority = cluster_priority.merge(poi_df, on=["Tissue", "Cell Type"], how="left")
    cluster_priority = cluster_priority.merge(interactors_avg, on=["Tissue", "Cell Type"], how="left")

    #**Step 6: Compute Mean Robustness**
    cluster_priority["Mean_Robustness"] = (cluster_priority["%Cells_Expressing_POI"] + cluster_priority["%Cells_Expressing_Interactors"]) / 2

    #**Step 7: Ensure valid data remains after processing**
    if cluster_priority.empty:
        print(f"⚠️ No valid data remaining for {file}. Skipping...")
        continue

    #**Step 8: Compute Priority Score**
    W_interactions = 0.3
    W_scRNA = 0.25
    W_robustness = 0.45

    cluster_priority["Priority Score"] = (
        W_interactions * cluster_priority["Num_Robust_Interactions"] +
        W_scRNA * cluster_priority["Mean_scRNA_score"] +
        W_robustness * cluster_priority["Mean_Robustness"]
    )

    #**Step 9: Apply MinMax Scaling to Priority Score**
    scaler = MinMaxScaler(feature_range=(0.05, 1))
    cluster_priority["Priority Score"] = scaler.fit_transform(cluster_priority[["Priority Score"]])

    #**Step 10: Generate Prioritized PPI pairs**
    def generate_ppi_pairs(row, poi):
        """Generates PPI pairs while ensuring no self-interactions."""
        interactors = row["Contributing_Interactors"].split(", ")
        interactors = [partner for partner in interactors if partner != poi]
        return ", ".join([f"{poi}-{partner}" for partner in interactors])

    cluster_priority["Prioritized_PPI"] = cluster_priority.apply(lambda row: generate_ppi_pairs(row, protein_of_interest), axis=1)

    #**Step 11: Keep output format consistent**
    cluster_priority = cluster_priority[[
        "Tissue", "Cell Type", "Total_Cell_Count",
        "POI_Expression", "%Cells_Expressing_POI", "%Cells_Expressing_Interactors",
        "Num_Robust_Interactions", "Contributing_Interactors", "Prioritized_PPI",
        "Mean_scRNA_score", "Mean_Robustness", "Priority Score"
    ]]

    #**Step 12: Sort by Num_Robust_Interactions (primary) and Priority Score (secondary)**
    cluster_priority = cluster_priority.sort_values(by=["Num_Robust_Interactions", "Priority Score"], ascending=[False, False]).reset_index(drop=True)

    #**Step 13: Save output file with correct naming**
    output_file = os.path.join(input_dir, file.replace("_S1_H1", "_Ranked_Priority_Final"))
    cluster_priority.to_excel(output_file, index=False)

    print(f"✅ Successfully saved prioritized clusters: {output_file}")

    #**Step 14: Display Top Ranked Clusters**
    print("\n🏆 Top 10 Ranked Clusters:\n", cluster_priority.head(10))

📂 Found 51 files for processing.

📂 Processing Priority Ranking for: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/TPM3_PPI_filtered_S1_H1_17042025.xlsx
🧬 Identified Protein of Interest: TPM3
✅ Successfully saved prioritized clusters: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/TPM3_PPI_filtered_Ranked_Priority_Final_17042025.xlsx

🏆 Top 10 Ranked Clusters:
         Tissue                                  Cell Type  Total_Cell_Count  \
0  bone marrow              megakaryocyte progenitor cell              2656   
1  bone marrow            pre-conventional dendritic cell              3704   
2        blood  activated CD4-positive, alpha-beta T cell             14124   
3       kidney   effector CD4-positive, alpha-beta T cell              2205   
4        blood  activated CD8-positive, alpha-beta T cell              6248   
5  bone marrow                  common myeloid pr
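
In the cell above, the POI row itself appears to pass the ≥30% flag, so it is counted in `Num_Robust_Interactions` and averaged into `%Cells_Expressing_Interactors` alongside its partners. If interactor-only summaries are wanted, one option is to drop the POI rows before grouping. A minimal sketch with hypothetical partner names, values, and output column names:

```python
import pandas as pd

poi = "TPM3"
toy = pd.DataFrame({
    "Tissue": ["blood"] * 3 + ["kidney"] * 2,
    "Cell Type": ["T cell"] * 3 + ["podocyte"] * 2,
    "Gene Symbol": [poi, "PARTNER_A", "PARTNER_B", poi, "PARTNER_A"],
    "%Cells Expressing Gene": [80.0, 40.0, 60.0, 55.0, 35.0],
})

keys = ["Tissue", "Cell Type"]
partners = toy[toy["Gene Symbol"] != poi]  # exclude the POI's own rows

# Named aggregation: one output column per (input column, function) pair
summary = partners.groupby(keys).agg(
    Num_Partners=("Gene Symbol", "nunique"),
    Partners=("Gene Symbol", lambda s: ", ".join(sorted(s))),
    Pct_Partners=("%Cells Expressing Gene", "mean"),
).reset_index()
print(summary)
```

Filtering first also removes the need for the lambdas that look back into `df` by index.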

In [9]:
#Manually compute ranked PPI list for LITAF_PPI_filtered_V2_noUBA52_17042025.xlsx 

input_dir = "./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/"

#**Check if LITAF file exists in input directory**
litaf_filename = "LITAF_PPI_filtered_V2_noUBA52_17042025.xlsx"
litaf_path = os.path.join(input_dir, litaf_filename)

if not os.path.exists(litaf_path):
    raise SystemExit(f"❌ ERROR: LITAF file not found in {input_dir}!")

print(f"\n📂 Processing Priority Ranking for: {litaf_path}")
protein_of_interest = "LITAF"  # Manually set POI

#**Read the dataset**
df = pd.read_excel(litaf_path)

#**Ensure necessary columns exist**
required_columns = {"Tissue", "Cell Type", "Gene Symbol", "%Cells Expressing Gene", "Expression", "Cell Count", "scRNA_score"}  #scRNA_score is aggregated in Step 3

if not required_columns.issubset(df.columns):
    raise SystemExit(f"❌ ERROR: Missing required columns in {litaf_filename}.")

#**Step 1: Identify Robust Interactions (≥30% Expression)**
interaction_threshold = 30
df["Robust Interaction"] = df["%Cells Expressing Gene"] >= interaction_threshold

#**Step 2: Extract POI Expression & %Cells Expressing POI BEFORE groupby**
#Use .loc + rename (no inplace) to avoid modifying a slice of df
poi_df = df.loc[df["Gene Symbol"] == protein_of_interest, ["Tissue", "Cell Type", "Expression", "%Cells Expressing Gene"]].rename(
    columns={"Expression": "POI_Expression", "%Cells Expressing Gene": "%Cells_Expressing_POI"}
)

#**Step 3: Compute Cluster-Level Metrics**
cluster_priority = df.groupby(["Tissue", "Cell Type"]).agg(
    Num_Robust_Interactions=("Gene Symbol", lambda x: df.loc[x.index, "Robust Interaction"].sum()),
    Contributing_Interactors=("Gene Symbol", lambda x: ", ".join(sorted(x[df.loc[x.index, "Robust Interaction"]]))),
    Mean_scRNA_score=("scRNA_score", "mean"),
    Total_Cell_Count=("Cell Count", "max")
).reset_index()

#**Step 4: Extract %Cells Expressing Interactors**
interactors_df = df[df["Robust Interaction"]][["Tissue", "Cell Type", "%Cells Expressing Gene"]]
interactors_avg = interactors_df.groupby(["Tissue", "Cell Type"])["%Cells Expressing Gene"].mean().reset_index()
interactors_avg.rename(columns={"%Cells Expressing Gene": "%Cells_Expressing_Interactors"}, inplace=True)

#**Step 5: Merge POI Expression & %Cells Expressing POI Values**
cluster_priority = cluster_priority.merge(poi_df, on=["Tissue", "Cell Type"], how="left")
cluster_priority = cluster_priority.merge(interactors_avg, on=["Tissue", "Cell Type"], how="left")

#**Step 6: Compute Mean Robustness**
cluster_priority["Mean_Robustness"] = (cluster_priority["%Cells_Expressing_POI"] + cluster_priority["%Cells_Expressing_Interactors"]) / 2

#**Step 7: Ensure valid data remains after processing**
if cluster_priority.empty:
    raise SystemExit(f"⚠️ No valid data remaining for {litaf_filename}.")

#**Step 8: Compute Priority Score**
W_interactions = 0.3
W_scRNA = 0.25
W_robustness = 0.45

cluster_priority["Priority Score"] = (
    W_interactions * cluster_priority["Num_Robust_Interactions"] +
    W_scRNA * cluster_priority["Mean_scRNA_score"] +
    W_robustness * cluster_priority["Mean_Robustness"]
)

#**Step 9: Apply MinMax Scaling to Priority Score**
scaler = MinMaxScaler(feature_range=(0.05, 1))
cluster_priority["Priority Score"] = scaler.fit_transform(cluster_priority[["Priority Score"]])

#**Step 10: Generate Prioritized PPI pairs**
def generate_ppi_pairs(row, poi):
    """Generates PPI pairs while ensuring no self-interactions."""
    interactors = row["Contributing_Interactors"].split(", ")
    interactors = [partner for partner in interactors if partner != poi]
    return ", ".join([f"{poi}-{partner}" for partner in interactors])

cluster_priority["Prioritized_PPI"] = cluster_priority.apply(lambda row: generate_ppi_pairs(row, protein_of_interest), axis=1)

#**Step 11: Keep output format consistent**
cluster_priority = cluster_priority[[
    "Tissue", "Cell Type", "Total_Cell_Count",
    "POI_Expression", "%Cells_Expressing_POI", "%Cells_Expressing_Interactors",
    "Num_Robust_Interactions", "Contributing_Interactors", "Prioritized_PPI",
    "Mean_scRNA_score", "Mean_Robustness", "Priority Score"
]]

#**Step 12: Sort by Num_Robust_Interactions & Priority Score**
cluster_priority = cluster_priority.sort_values(by=["Num_Robust_Interactions", "Priority Score"], ascending=[False, False]).reset_index(drop=True)

#**Step 13: Save output file with correct naming**
output_file = os.path.join(input_dir, "LITAF_CELLxGENE_gene_expression_Ranked_Priority_Final_V2_noUBA52_17042025.xlsx")
cluster_priority.to_excel(output_file, index=False)

print(f"✅ Successfully saved prioritized clusters for LITAF: {output_file}")

#**Step 14: Display Top ranked Clusters**
print("\n🏆 Top 10 Ranked Clusters for LITAF:\n", cluster_priority.head(10))


📂 Processing Priority Ranking for: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/LITAF_PPI_filtered_V2_noUBA52_17042025.xlsx
✅ Successfully saved prioritized clusters for LITAF: ./results_11032025/Jess_PPI_21032025/PPI_preprocessed_15042025/PPI_contextualization/filtered_Step1_H1/LITAF_CELLxGENE_gene_expression_Ranked_Priority_Final_V2_noUBA52_17042025.xlsx

🏆 Top 10 Ranked Clusters for LITAF:
            Tissue                                        Cell Type  \
0           heart                                       neutrophil   
1           heart                                      granulocyte   
2           heart                                       blood cell   
3   adrenal gland  effector memory CD8-positive, alpha-beta T cell   
4   adrenal gland                                    memory T cell   
5  adipose tissue                             innate lymphoid cell   
6  adipose tissue                              natural

Each {POI}_PPI_filtered_Ranked_Priority_Final_17042025.xlsx file has the following structure:

| Column Name                     | Description                                                                 |
|--------------------------------|-----------------------------------------------------------------------------|
| Tissue                         | Name of the tissue where the cluster was identified in CZ CELLxGENE                         |
| Cell Type                      | Specific cell type/state within the tissue                                       |
| Total_Cell_Count               | Total number of cells profiled in that tissue–cell type combination        |
| POI_Expression                 | Average normalized expression level of the protein of interest (POI) in this cluster  |
| %Cells_Expressing_POI         | Percentage of cells in the cluster expressing the POI                      |
| %Cells_Expressing_Interactors | Average percentage of expressing cells across genes passing the ≥30% cut-off in the cluster (as computed, the POI row is included)      |
| Num_Robust_Interactions        | Number of genes passing the ≥30% expression cut-off in the cluster (as computed, the count includes the POI)     |
| Contributing_Interactors       | List of genes contributing to robust co-expression (includes the POI itself)      |
| Prioritized_PPI                | Formatted PPI pairs (POI–Interactor, self-pairs removed) prioritized for each cluster          |
| Mean_scRNA_score               | Average scRNA-based functional score for all retained genes in the cluster          |
| Mean_Robustness                | Mean of `%Cells_Expressing_POI` & `%Cells_Expressing_Interactors`        |
| Priority Score                 | Final score scaled to [0.05, 1], combining robustness, interactions, and expression (optional to use)      |
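
For downstream use it can help to reshape `Prioritized_PPI` so that each POI-interactor pair gets its own row. A minimal sketch on a hypothetical two-row table (the partner names are placeholders, and `long` / `PPI` are illustrative names):

```python
import pandas as pd

ranked = pd.DataFrame({
    "Tissue": ["bone marrow", "blood"],
    "Cell Type": ["megakaryocyte progenitor cell", "T cell"],
    "Prioritized_PPI": ["TPM3-PARTNER_A, TPM3-PARTNER_B", "TPM3-PARTNER_A"],
})

# Split the comma-separated string into a list, then emit one row per pair
long = (
    ranked.assign(PPI=ranked["Prioritized_PPI"].str.split(", "))
          .explode("PPI")
          .drop(columns="Prioritized_PPI")
          .reset_index(drop=True)
)
print(long)
```

The long format makes it straightforward to count how many clusters support a given pair, for example with `long["PPI"].value_counts()`.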
