## Copairs
* **Details of the analysis in this notebook:**
* **Data from:** CDoT
* **Plates compared:**
    * BR00122248 -
    * BR00122249 -
* **Objective:** To assess the mean average precision (mAP) of the plates stained with the new set of dyes.
* **Normalization:** Profiles are normalized to the negative controls (negcon).
* **mAP calculation:** mAP is calculated with respect to the negative controls (DMSO wells).
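
As a reminder of what the metric measures, here is a minimal sketch of average precision (AP) for a single ranked list; mAP is the mean of these AP values over a group of profiles. The function name `average_precision` and the toy ranking are illustrative assumptions, not the copairs implementation.

```python
import numpy as np


def average_precision(relevance):
    """AP for one ranked list; relevance[k] is 1 if the k-th ranked item is a positive."""
    rel = np.asarray(relevance, dtype=float)
    if rel.sum() == 0:
        # No positives in the list: AP is undefined.
        return float("nan")
    # Precision at each rank k = number of positives in the top k, divided by k.
    precision_at_k = np.cumsum(rel) / np.arange(1, rel.size + 1)
    # Average the precision values taken at the ranks where a positive occurs.
    return float((precision_at_k * rel).sum() / rel.sum())


# Toy ranking (assumed data): positives at ranks 1 and 3, negatives at ranks 2 and 4.
ap = average_precision([1, 0, 1, 0])
print(ap)
```

A profile whose replicates all rank above the controls gets an AP of 1; the value drops as control wells intrude higher in the ranking.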

In [1]:
import logging
from pathlib import Path

import numpy as np
import pandas as pd
from copairs.map import run_pipeline

logging.basicConfig(format="%(levelname)s:%(asctime)s:%(name)s:%(message)s")
logging.getLogger("copairs").setLevel(logging.INFO)

In [2]:
### Reading the dataframe

### Load batches

In [3]:

names_batches = {"batch3": "2023_05_17_Batch3", "batch5": "2023_08_02_Batch5"}
batches = {
    name: pd.read_csv(
        Path("gct") / batch / f"{batch}_normalized_feature_select_negcon_batch.csv.gz"
    )
    for name, batch in names_batches.items()
}

### Analysis - Plate-wise with respect to control DMSO wells
#### Defining parameters to compute mAP

In [4]:
pert_col = "Metadata_broad_sample"
control_col = "Metadata_control_type"

In [5]:
pos_sameby = [pert_col]
pos_diffby = []

neg_sameby = []
neg_diffby = [control_col]
null_size = 10000
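
To make these settings concrete, the sketch below enumerates pairs on a toy metadata table: positive pairs share the same `Metadata_broad_sample`, and negative pairs differ in `Metadata_control_type`. The toy rows and the brute-force loop are assumptions for illustration only; copairs uses its own matching logic and may treat edge cases differently.

```python
from itertools import combinations

import pandas as pd

# Toy metadata (assumed values): two replicates of cpd_A, one of cpd_B, two DMSO wells.
meta = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_A", "cpd_A", "cpd_B", "DMSO", "DMSO"],
        "Metadata_control_type": ["trt", "trt", "trt", "negcon", "negcon"],
    }
)

pos_pairs, neg_pairs = [], []
for i, j in combinations(meta.index, 2):
    # pos_sameby = [pert_col]: both profiles carry the same perturbation.
    if meta.at[i, "Metadata_broad_sample"] == meta.at[j, "Metadata_broad_sample"]:
        pos_pairs.append((i, j))
    # neg_diffby = [control_col]: one profile is a treatment, the other a control.
    if meta.at[i, "Metadata_control_type"] != meta.at[j, "Metadata_control_type"]:
        neg_pairs.append((i, j))

print(pos_pairs)
print(neg_pairs)
```

With these toy rows, every treatment well is paired with every DMSO well as a negative, which is what "with respect to controls" means in this analysis.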

### Batches 3 and 5


In [6]:
from copairs.map import aggregate

copairs_dir = Path("copairs_csv")
aggregated = {}
for name, batch in batches.items():
    # Split the columns into metadata and morphological features.
    metadata_names = [c for c in batch.columns if c.startswith("Metadata")]
    feature_names = [c for c in batch.columns if not c.startswith("Metadata")]
    feats = batch[feature_names].values
    # Work on a copy so that filling values does not trigger SettingWithCopyWarning.
    dframe = batch[metadata_names].copy()
    # Wells without a control type are treatments.
    dframe[control_col] = dframe[control_col].fillna("trt")
    result = run_pipeline(
        dframe, feats, pos_sameby, pos_diffby, neg_sameby, neg_diffby, null_size
    )
    result.to_csv(copairs_dir / f"Result_Negcon_wrt_Controls_{name}.csv")

    # Aggregate per-profile results to one row per perturbation.
    aggregated[name] = aggregate(result, sameby=pos_sameby, threshold=0.05)
    aggregated[name].to_csv(
        copairs_dir / f"Aggregate_result_Negcon_wrt_Controls_{name}.csv"
    )

INFO:2023-10-20 10:10:51,350:copairs:Indexing metadata...
INFO:2023-10-20 10:10:51,357:copairs:Finding positive pairs...
INFO:2023-10-20 10:10:51,359:copairs:Finding negative pairs...
INFO:2023-10-20 10:10:51,388:copairs:Computing positive similarities...
INFO:2023-10-20 10:10:51,417:copairs:Computing negative similarities...
INFO:2023-10-20 10:10:51,600:copairs:Building rank lists...
INFO:2023-10-20 10:10:51,605:copairs:Computing average precision...
INFO:2023-10-20 10:10:51,607:copairs:Computing p-values...
INFO:2023-10-20 10:10:51,628:copairs:Creating result DataFrame...
INFO:2023-10-20 10:10:51,630:copairs:Finished.

INFO:2023-10-20 10:10:51,681:copairs:Indexing metadata...
INFO:2023-10-20 10:10:51,688:copairs:Finding positive pairs...
INFO:2023-10-20 10:10:51,690:copairs:Finding negative pairs...
INFO:2023-10-20 10:10:51,765:copairs:Computing positive similarities...
INFO:2023-10-20 10:10:51,787:copairs:Computing negative similarities...
INFO:2023-10-20 10:10:51,935:copairs:Building rank lists...
INFO:2023-10-20 10:10:51,940:copairs:Computing average precision...
INFO:2023-10-20 10:10:51,941:copairs:Computing p-values...
INFO:2023-10-20 10:10:51,963:copairs:Creating result DataFrame...
INFO:2023-10-20 10:10:51,964:copairs:Finished.
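
For intuition about the aggregation step, the sketch below collapses per-profile AP scores to one mAP value per perturbation. It assumes the per-profile table has an `average_precision` column and ignores the p-value handling controlled by `threshold`; it is a simplified stand-in, not the `aggregate` function itself.

```python
import pandas as pd

# Toy per-profile scores (assumed values and assumed column name "average_precision").
toy_result = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_A", "cpd_A", "cpd_B", "cpd_B"],
        "average_precision": [1.0, 0.5, 0.25, 0.75],
    }
)

# One row per perturbation: the mean of its replicates' AP scores.
toy_map = (
    toy_result.groupby("Metadata_broad_sample", as_index=False)["average_precision"]
    .mean()
    .rename(columns={"average_precision": "mean_average_precision"})
)
print(toy_map)
```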


#### Merge all results

In [7]:

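# pd.merge joins exactly two frames, so this relies on there being two batches.
# Shared columns get a per-batch suffix, e.g. mean_average_precision_batch3.
combined_df = pd.merge(
    *aggregated.values(),
    on=pert_col,
    suffixes=tuple(f"_{name}" for name in aggregated),
)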
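
The effect of `on` and `suffixes` can be seen on a toy example. The frames and values below are made up; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Toy aggregated results for two batches (assumed values).
toy = {
    "batch3": pd.DataFrame(
        {
            "Metadata_broad_sample": ["cpd_A", "cpd_B"],
            "mean_average_precision": [0.9, 0.4],
        }
    ),
    "batch5": pd.DataFrame(
        {
            "Metadata_broad_sample": ["cpd_A", "cpd_C"],
            "mean_average_precision": [0.7, 0.6],
        }
    ),
}

merged = pd.merge(
    toy["batch3"],
    toy["batch5"],
    on="Metadata_broad_sample",
    suffixes=("_batch3", "_batch5"),
)
# Per-compound difference in mAP between the two batches.
merged["batch3_vs_batch5"] = (
    merged["mean_average_precision_batch3"] - merged["mean_average_precision_batch5"]
)
print(merged)
```

Because `pd.merge` defaults to an inner join, compounds present in only one batch (`cpd_B` and `cpd_C` here) are dropped from the merged table.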

In [8]:
moa_metadata = pd.read_csv(copairs_dir / "LC00009948_MoA_Common_Names.csv")
moa_metadata = moa_metadata.rename(columns={"BRD with batch": "Metadata_broad_sample"})

##### Extracting the BRD ID from the Broad sample name

In [9]:
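def BRD_ID(i):
    """Return the second dash-separated field of a Broad sample name.

    Missing values (NaN) and other non-string entries return None.
    """
    if not isinstance(i, str):
        return None
    return i.split("-")[1]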
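
A standalone sketch of the same idea, with an extra guard for names that contain no dash. The helper name `extract_brd_id` and the sample identifier are hypothetical and only illustrate the expected input shape.

```python
def extract_brd_id(sample):
    """Return the second dash-separated field of a sample name, or None."""
    # NaN (a float) and other non-string entries have no ID to extract.
    if not isinstance(sample, str):
        return None
    parts = sample.split("-")
    # Guard against names without a dash, which have no second field.
    return parts[1] if len(parts) > 1 else None


# Hypothetical sample name in "BRD-<ID>-<batch fields>" form.
example_id = extract_brd_id("BRD-K00000000-001-01-1")
print(example_id)
```

Returning `None` for missing values means those rows get no ID and simply fail to match when merging on the ID column.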

In [10]:
combined_df["BRD ID"] = combined_df["Metadata_broad_sample"].map(BRD_ID)
combined_moa_df = pd.merge(combined_df, moa_metadata, on="BRD ID")

### Generating a column for the difference in mAP

In [11]:

combined_moa_df["batch3_vs_batch5"] = (
    combined_moa_df["mean_average_precision_batch3"]
    - combined_moa_df["mean_average_precision_batch5"]
)

In [12]:
combined_moa_df.to_csv(
    copairs_dir / "PrecisionValues_with_MoA_allplates_Negcon_wrt_Controls_48and49.csv"
)