# m3DinAI — Demo Notebook (Synthetic Data)

This demo shows the **end-to-end idea** of the m3DinAI pipeline using **synthetic data**.

**What you will see:**
- Create a small toy dataset that mimics a table of extracted features with treatment labels.
- Standardize features and compute a UMAP embedding.
- Visualize the embedding with matplotlib.
- Run a simple Welch's t-test vs DMSO on a toy feature.

> For real experiments, run the Python scripts in `src/` on your imaging data and adjust paths as described in the README.


```python
# Imports
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from scipy.stats import ttest_ind
import umap  # provided by the umap-learn package

print("Libraries loaded.")
```

## 1) Create a synthetic feature table
We simulate 180 samples with three features and three treatment labels (DMSO, Anthracyclines, Taxane), then save the table as an Excel file to mimic the pipeline outputs.

```python
# Fix the random seed so the synthetic data are reproducible
np.random.seed(42)
n = 180
treatments = np.random.choice(["DMSO", "Anthracyclines", "Taxane"], size=n, p=[0.34, 0.33, 0.33])

# Simulate three simple features with slight treatment effects.
# Values are unitless toy numbers (they can be negative), not real measurements.
base = np.random.normal(0, 1, (n, 3))
shift = np.zeros((n, 3))
shift[treatments == "Anthracyclines", 0] += 0.6
shift[treatments == "Taxane", 1]       += 0.6
shift[treatments == "DMSO", 2]         += 0.2
X = base + shift

df = pd.DataFrame(
    X, columns=["Area", "Perimeter", "Circularity"]
)
df["Treatment"] = treatments

# Create results folder and save example file
# (writing .xlsx requires an Excel engine such as openpyxl)
os.makedirs("results/demo", exist_ok=True)
df.to_excel("results/demo/demo_spheroid_features_trat.xlsx", index=False)
df.head()
```
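A quick sanity check on the synthetic table is to look at per-treatment means: the shifts injected above should show up as higher `Area` for Anthracyclines and higher `Perimeter` for Taxane. The sketch below uses a hypothetical helper, `summarize_by_treatment`, on a tiny hand-made table so that it runs on its own; in the notebook you would call it on `df`.

```python
import pandas as pd

def summarize_by_treatment(table, feature_cols, label_col="Treatment"):
    """Return the mean of each feature and the sample count per treatment group.

    Hypothetical helper for illustration; not part of the m3DinAI scripts.
    """
    grouped = table.groupby(label_col)[feature_cols]
    summary = grouped.mean()
    summary["n"] = grouped.size()  # number of rows in each group
    return summary

# Tiny hand-made table standing in for the synthetic `df`
toy = pd.DataFrame({
    "Area":      [1.0, 3.0, 0.0, 2.0],
    "Perimeter": [0.5, 0.5, 2.0, 4.0],
    "Treatment": ["Anthracyclines", "Anthracyclines", "DMSO", "DMSO"],
})
summary = summarize_by_treatment(toy, ["Area", "Perimeter"])
print(summary)
```

On the notebook's data, `summarize_by_treatment(df, ["Area", "Perimeter", "Circularity"])` would give one row per treatment.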

## 2) UMAP embedding on standardized features
We standardize the feature columns to zero mean and unit variance, then compute a 2D UMAP embedding for quick visualization.

```python
num_cols = ["Area", "Perimeter", "Circularity"]
X_scaled = StandardScaler().fit_transform(df[num_cols])

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
emb = reducer.fit_transform(X_scaled)
emb[:3]
```
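`StandardScaler` rescales each column as z = (x - mean) / std, where std is the population standard deviation (ddof=0). The sketch below reproduces this with NumPy on a small made-up matrix and checks that both results agree, which can help when debugging an unexpected embedding.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up feature matrix: rows are samples, columns are features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Manual z-score per column; NumPy's std defaults to ddof=0, like StandardScaler
mean = X.mean(axis=0)
std = X.std(axis=0)
X_manual = (X - mean) / std

X_sklearn = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))
```

One difference: for a constant column `StandardScaler` leaves the scale at 1 instead of dividing by zero, so the manual version is only equivalent when every feature varies.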

## 3) Plot UMAP (matplotlib)
Points are colored by integer treatment codes with matplotlib's default colormap (no custom colors).

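```python
# Integer codes follow the sorted order of the treatment names
cat = pd.Categorical(df["Treatment"])
plt.figure(figsize=(6, 5))
sc = plt.scatter(emb[:, 0], emb[:, 1], c=cat.codes, s=25)
# Legend that maps each color back to its treatment
handles, _ = sc.legend_elements()
plt.legend(handles, list(cat.categories), title="Treatment")
plt.xlabel("UMAP 1")
plt.ylabel("UMAP 2")
plt.title("UMAP demo (synthetic data)")
plt.grid(True)
plt.show()
```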
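`pd.Categorical` assigns integer codes in the sorted order of the unique labels, so with the three treatment names used here Anthracyclines gets 0, DMSO 1, and Taxane 2. Printing that mapping is one way to read the colors in the plot above. A minimal sketch on a handful of labels:

```python
import pandas as pd

labels = pd.Series(["Taxane", "DMSO", "Anthracyclines", "DMSO"])
cat = pd.Categorical(labels)

# Categories are sorted; each code is an index into that sorted list
code_to_treatment = dict(enumerate(cat.categories))
print(code_to_treatment)
print(cat.codes.tolist())
```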

## 4) Simple Welch's t-test vs DMSO
We compare `Area` between each treatment and DMSO using Welch's t-test, which does not assume equal variances in the two groups.

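```python
# Map a p-value to the usual significance-star notation
def p_to_star(p):
    return ("****" if p < 1e-4 else
            "***"  if p < 1e-3 else
            "**"   if p < 1e-2 else
            "*"    if p < 0.05 else
            "ns")

# Area values for each treatment group
groups = {t: df.loc[df["Treatment"] == t, "Area"].values for t in df["Treatment"].unique()}
dmso = groups["DMSO"]

# Welch's t-test (equal_var=False) of each treatment vs DMSO.
# Note: p-values are not corrected for multiple comparisons.
results = {}
for t in ["Anthracyclines", "Taxane"]:
    stat, p_raw = ttest_ind(groups[t], dmso, equal_var=False)
    results[t] = {"t": float(stat), "p_value": float(p_raw), "sig": p_to_star(p_raw)}

results
```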
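Welch's statistic is t = (mean_a - mean_b) / sqrt(s_a²/n_a + s_b²/n_b), computed from sample variances (ddof=1), and its degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below computes both by hand on random toy samples (sizes, means, and spreads are arbitrary) and compares the two-sided p-value with `scipy.stats.ttest_ind(..., equal_var=False)`. The helper name `welch_t_test` is hypothetical.

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Two-sided Welch's t-test computed from the textbook formulas."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), dof)  # two-sided p-value
    return t, dof, p

rng = np.random.default_rng(0)
treated = rng.normal(0.6, 1.0, 60)  # toy "treatment" sample
control = rng.normal(0.0, 1.5, 45)  # toy "DMSO" sample with a larger spread

t_manual, dof, p_manual = welch_t_test(treated, control)
t_scipy, p_scipy = stats.ttest_ind(treated, control, equal_var=False)
print(f"t = {t_manual:.3f} (scipy {t_scipy:.3f}), df = {dof:.1f}, p = {p_manual:.4f}")
```

The non-integer degrees of freedom are what distinguish Welch's test from Student's t-test, which would use n_a + n_b - 2.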

## 5) Save quick summary to CSV
We save the summary table under `results/demo/`.

```python
# One row per treatment; from_dict keeps t and p_value as numeric columns
summary_df = (
    pd.DataFrame.from_dict(results, orient="index")
    .rename_axis("Treatment")
    .reset_index()
)
summary_path = "results/demo/welch_summary_demo.csv"
summary_df.to_csv(summary_path, index=False)
summary_df
```
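Transposing `pd.DataFrame(results)` goes through a frame whose columns mix floats (`t`, `p_value`) and strings (`sig`), so every column of the transposed table has `object` dtype. Building the table with `pd.DataFrame.from_dict(..., orient="index")` infers dtypes per column and keeps the statistics as `float64`, which matters if you later sort or filter on `p_value`. A minimal comparison using placeholder values in the same shape as `results`:

```python
import pandas as pd

# Placeholder values in the same nested-dict shape as `results` (not real results)
results = {
    "Anthracyclines": {"t": 3.1, "p_value": 0.002, "sig": "**"},
    "Taxane": {"t": 0.4, "p_value": 0.69, "sig": "ns"},
}

# Transpose route: every column ends up with object dtype
via_transpose = pd.DataFrame(results).T

# from_dict route: one row per treatment, dtypes inferred per column
via_from_dict = (
    pd.DataFrame.from_dict(results, orient="index")
    .rename_axis("Treatment")
    .reset_index()
)

print(via_transpose.dtypes)
print(via_from_dict.dtypes)
```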

### Done
- Synthetic features written to: `results/demo/demo_spheroid_features_trat.xlsx`
- Welch summary: `results/demo/welch_summary_demo.csv`
- Example UMAP figure displayed above.

You can adapt this notebook to load real feature tables produced by `src/1_feature_extraction.py` and apply the same analysis steps.
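Real feature tables are likely to contain more, and differently named, columns than this demo, so hard-coding `num_cols` may not carry over. One option is to take every numeric column except the label. The helper below is hypothetical (not part of `src/`), assumes the table has a `Treatment` column as in this demo, and chooses `read_excel` or `read_csv` from the file extension.

```python
from pathlib import Path

import pandas as pd

def load_feature_table(path, label_col="Treatment"):
    """Load a feature table and return (table, numeric feature column names).

    Hypothetical helper: assumes one row per object and a treatment label column.
    """
    path = Path(path)
    if path.suffix.lower() in {".xlsx", ".xls"}:
        table = pd.read_excel(path)  # needs an Excel engine such as openpyxl
    else:
        table = pd.read_csv(path)
    if label_col not in table.columns:
        raise KeyError(f"expected a {label_col!r} column in {path}")
    # Every numeric column except the label is treated as a feature
    feature_cols = [
        c for c in table.select_dtypes(include="number").columns if c != label_col
    ]
    return table, feature_cols
```

For example, `table, num_cols = load_feature_table("results/demo/demo_spheroid_features_trat.xlsx")` would return the demo table together with its three feature columns.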