
# 🧪 Autonomous Activity: Dimensionality Reduction on Embryo Development Timelapse

In this hands-on activity, you will apply several dimensionality reduction techniques to real microscopy data. The dataset contains time-lapse image stacks of one control (normal) embryo and two mutant embryos. Each image stack has approximately 450 frames.

---

**🎯 Objectives:**
- Explore, visualize, and preprocess multi-frame `.tif` images.
- Test and compare different data normalization strategies (e.g., [0,1] scaling vs StandardScaler).
- Use PCA, SVD, t-SNE, UMAP, and Autoencoders to extract and visualize developmental trajectories.
- Identify biological differences in developmental dynamics between embryo types.
- Reflect on the advantages, limitations, and behaviors of each technique.

📁 **Dataset:** [Google Drive Link](https://drive.google.com/drive/folders/1_qxqm-v5yCrme3pAW2rjyOOXIeQDuV54?usp=drive_link)



## 1. Load and Explore the Dataset

Each `.tif` file contains ~450 grayscale frames. Your first task is to:
- Load the 3 `.tif` files using `tifffile.imread`.
- Normalize each image stack using two strategies:
  - [0, 1] Min-Max normalization
  - Standardization using `StandardScaler` (it expects a 2D array, so flatten each frame to a row first)
- Plot a few representative frames across time for each embryo.
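
The loading and normalization steps above could be sketched as follows. This is a minimal sketch, not a reference solution: the helper names (`load_stack`, `minmax_normalize`, `standardize`) are hypothetical, Min-Max scaling is assumed to use the global minimum and maximum of each stack, and a small synthetic array stands in for a real `.tif` stack so the example runs without the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def load_stack(path):
    """Read a multi-frame .tif into a float32 array of shape (n_frames, H, W)."""
    import tifffile  # imported here so the helpers below also work without it
    return tifffile.imread(path).astype(np.float32)


def minmax_normalize(stack):
    """Rescale a whole stack to [0, 1] using its global min and max (an assumption)."""
    lo, hi = stack.min(), stack.max()
    return (stack - lo) / (hi - lo + 1e-8)  # epsilon guards against a constant stack


def standardize(stack):
    """Standardize each pixel across time: zero mean, unit variance per pixel."""
    n_frames = stack.shape[0]
    flat = stack.reshape(n_frames, -1)             # StandardScaler expects 2D input
    scaled = StandardScaler().fit_transform(flat)  # one feature per pixel
    return scaled.reshape(stack.shape)


# Synthetic stand-in for one embryo stack (hypothetical size: 20 frames of 16x16).
rng = np.random.default_rng(0)
demo_stack = rng.integers(0, 4096, size=(20, 16, 16)).astype(np.float32)

demo_minmax = minmax_normalize(demo_stack)
demo_standard = standardize(demo_stack)
print("min-max range:", demo_minmax.min(), demo_minmax.max())
print("standardized mean/std:", demo_standard.mean(), demo_standard.std())
```

With the real data, `load_stack` would be called once per `.tif` file and the result passed to both normalization helpers. Note that `StandardScaler` standardizes each pixel independently over time, which is a different operation from rescaling the whole stack by one global range.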



## 2. Preprocess the Data

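Flatten each frame and stack the frames from all three embryos into a matrix `X` with shape `(n_frames_total, n_pixels)`, one matrix per normalization method (`X_minmax` and `X_standard`). Then create a label vector with one entry per row: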

- `label = 0` → control embryo
- `label = 1` → mutant 1
- `label = 2` → mutant 2

This step will prepare the input data for dimensionality reduction.
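
One way to build the matrix and the label vector is sketched below. The function name `build_dataset` is hypothetical, the synthetic stacks are stand-ins for the normalized embryo stacks, and the sketch assumes all stacks share the same frame height and width so their flattened rows can be stacked.

```python
import numpy as np


def build_dataset(stacks):
    """Flatten frames and combine embryos into X (n_frames_total, n_pixels) and labels."""
    X_parts, y_parts = [], []
    for label, stack in enumerate(stacks):
        n_frames = stack.shape[0]
        X_parts.append(stack.reshape(n_frames, -1))  # one row per frame
        y_parts.append(np.full(n_frames, label))     # same label for every frame
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-ins ordered as control, mutant 1, mutant 2 (hypothetical sizes).
rng = np.random.default_rng(0)
stacks = [rng.random((n, 8, 8)) for n in (12, 10, 11)]

X, labels = build_dataset(stacks)
print(X.shape, labels.shape)
```

Because the label is taken from the position in the list, the stacks must be passed in the order control, mutant 1, mutant 2 to match the label convention above.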



> 🧪 **Experiment**: Try running all dimensionality reduction methods with both versions of the input (`X_minmax` and `X_standard`) and compare how the normalization affects the embeddings and class separability.



## 3. PCA (Principal Component Analysis)

Apply PCA on both normalized datasets:
- Visualize the 2D PCA embedding colored by embryo type.
- Compare the separation between classes for `X_minmax` and `X_standard`.
- Plot the **explained variance ratio** and **cumulative variance** for each.

> 🧠 Tip: Use `PCA(n_components=2)` for plotting and `PCA(n_components=50)` to analyze cumulative variance.
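
The PCA steps above might look like the sketch below, which uses scikit-learn's `PCA`. The random matrix `X` is a hypothetical stand-in for `X_minmax` or `X_standard`, so the numbers it produces carry no biological meaning.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for X_minmax / X_standard (hypothetical: 90 frames, 256 pixels).
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 256))
labels = np.repeat([0, 1, 2], 30)

# 2D embedding for the scatter plot.
Z = PCA(n_components=2).fit_transform(X)

# Variance profile; n_components cannot exceed min(n_frames, n_pixels).
n_comp = min(50, *X.shape)
pca_full = PCA(n_components=n_comp).fit(X)
explained = pca_full.explained_variance_ratio_
cumulative = np.cumsum(explained)

print("embedding shape:", Z.shape)
print("variance captured by", n_comp, "components:", cumulative[-1])
```

For the plots, `Z[:, 0]` against `Z[:, 1]` colored by `labels` gives the embedding, while `explained` and `cumulative` plotted against the component index give the two variance curves.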



## 4. SVD (Singular Value Decomposition)

Apply SVD to both datasets and analyze:
- The decay of singular values on a log scale.
- The cumulative energy of the singular values (the running sum of squared singular values, normalized to 1).
- How quickly the cumulative energy saturates under each normalization.

> 🔍 Insight: SVD reveals the linear structure of your dataset. A sharper drop in the singular values suggests that a low-rank approximation compresses the data well.
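
The singular value analysis can be sketched with NumPy alone. Two choices here are assumptions rather than requirements of the activity: the columns are mean-centered first (which makes the squared singular values proportional to PCA variances), and the 95% threshold is only an illustrative cutoff. The matrix `X` is again a synthetic stand-in.

```python
import numpy as np

# Synthetic stand-in for X_minmax / X_standard (hypothetical: 60 frames, 200 pixels).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))

# Center each pixel column (an assumption; skip this to analyze the raw matrix).
Xc = X - X.mean(axis=0)

# Only the singular values are needed, so U and Vt are not computed.
s = np.linalg.svd(Xc, compute_uv=False)

# Energy = squared singular values, normalized so the total is 1.
energy = s**2 / np.sum(s**2)
cumulative_energy = np.cumsum(energy)

# Smallest number of components whose cumulative energy reaches 95%.
k95 = int(np.searchsorted(cumulative_energy, 0.95) + 1)
print("components for 95% energy:", k95)
```

Plotting `s` with a logarithmic y-axis shows the decay, and comparing `cumulative_energy` curves for the two normalizations shows which one concentrates energy in fewer components.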



## 5. t-SNE

Use t-SNE to capture local structure and dynamics:
- Run t-SNE with different perplexities `[5, 30, 100]`.
- Plot the 2D embeddings and analyze how clusters behave.
- Compare results between both normalized inputs.

> ⏳ Note: t-SNE is computationally intensive and sensitive to hyperparameters.
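
A perplexity sweep could be written as below with scikit-learn's `TSNE`. The synthetic `X` is a stand-in for the real matrices, and `init="pca"` with a fixed `random_state` is an assumed choice made for reproducibility. scikit-learn requires the perplexity to be smaller than the number of samples, so the loop skips values that are too large.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for X_minmax / X_standard (hypothetical: 120 frames, 30 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))

embeddings = {}
for perplexity in (5, 30, 100):
    if perplexity >= X.shape[0]:
        continue  # perplexity must be smaller than the number of samples
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

for perplexity, Z in embeddings.items():
    print(perplexity, Z.shape)
```

Each entry of `embeddings` can be plotted side by side, colored by embryo label, to see how the perplexity changes the apparent clustering.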



## 6. UMAP

Use UMAP to visualize both local and global structure:
- Try combinations of `n_neighbors` and `min_dist`.
- Compare embeddings across normalization methods.
- Observe both local clustering and trajectory smoothness.

> 📌 UMAP is typically faster than t-SNE and often preserves the continuity of developmental trajectories better.
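
A parameter sweep over `n_neighbors` and `min_dist` might be organized as in this sketch, which assumes the `umap-learn` package is installed. The grid values and the function name `run_umap_grid` are hypothetical and only illustrate the structure of the experiment.

```python
from itertools import product

# Hypothetical search grid; the values are illustrative, not recommendations.
n_neighbors_values = (5, 15, 50)
min_dist_values = (0.0, 0.1, 0.5)
param_grid = list(product(n_neighbors_values, min_dist_values))


def run_umap_grid(X, grid=param_grid, random_state=42):
    """Return {(n_neighbors, min_dist): 2D embedding} for every grid entry."""
    import umap  # from the umap-learn package; imported lazily because it loads slowly

    embeddings = {}
    for n_neighbors, min_dist in grid:
        reducer = umap.UMAP(
            n_neighbors=n_neighbors,
            min_dist=min_dist,
            n_components=2,
            random_state=random_state,  # fixed seed for repeatable layouts
        )
        embeddings[(n_neighbors, min_dist)] = reducer.fit_transform(X)
    return embeddings
```

Calling `run_umap_grid(X_minmax)` and `run_umap_grid(X_standard)` would give one embedding per parameter pair for each normalization. Small `n_neighbors` values emphasize local neighborhoods, while larger values give more weight to the overall layout.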



## 7. Autoencoder

Train a neural autoencoder to learn a 2D latent space:
- Architecture: `input → 128 → 32 → 2 → 32 → 128 → output`
- Use ReLU activations and MSE loss.
- Plot the 2D latent representations colored by embryo type.

> 💡 Autoencoders are flexible nonlinear methods that may capture dynamics not seen by linear projections.
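
The architecture above could be implemented in PyTorch roughly as follows. This is a sketch under several assumptions that the activity does not specify: the 2D bottleneck and the output layer are left linear (ReLU is applied only to the other hidden layers), training is full-batch with Adam at a learning rate of `1e-3`, and a small synthetic matrix stands in for `X_minmax`.

```python
import numpy as np
import torch
from torch import nn


class Autoencoder(nn.Module):
    """input -> 128 -> 32 -> 2 -> 32 -> 128 -> output, ReLU on hidden layers."""

    def __init__(self, n_pixels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_pixels, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 2),          # linear 2D bottleneck (an assumption)
        )
        self.decoder = nn.Sequential(
            nn.Linear(2, 32), nn.ReLU(),
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, n_pixels),  # linear output, paired with MSE loss
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
rng = np.random.default_rng(0)
X = rng.random((90, 64)).astype(np.float32)  # hypothetical stand-in for X_minmax
X_t = torch.from_numpy(X)

model = Autoencoder(n_pixels=X.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(50):  # full-batch training; a short demo run only
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), X_t)  # reconstruction error
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

model.eval()
with torch.no_grad():
    latent = model.encoder(X_t).numpy()  # shape (n_frames, 2)

print("first/last loss:", losses[0], losses[-1])
print("latent shape:", latent.shape)
```

On the real data, more epochs and mini-batches would likely be needed, and `latent` can be plotted the same way as the PCA embedding. Keeping the bottleneck linear avoids restricting the latent coordinates to non-negative values.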



## 8. Final Reflection

Write a short report answering the following:

1. What differences in developmental dynamics did you observe?
2. Which method best captured biologically relevant features?
3. Which normalization (Min-Max or StandardScaler) led to better embeddings?
4. At which point in time do the mutant trajectories diverge from the control trajectory?
5. How consistent were the results across methods (PCA, SVD, t-SNE, UMAP, Autoencoders)?

> 📝 Submit this as a 1-page summary or short presentation.
