
## **Assignment: Dimensionality Reduction on Suitable Datasets**  

**Objective:** Apply dimensionality reduction techniques to datasets where they are most appropriate and analyze their effectiveness.  

---

### **Task Description**  
You will apply **8 dimensionality reduction methods** to **4 datasets** chosen for their compatibility with specific techniques. Your goal is to:  
1. Reduce dimensions while preserving structure.  
2. Critically evaluate why certain methods suit specific datasets.  

---


### **Datasets & Methods**  
| **Dataset**               | **Techniques**                                  | **Reason**                                                                 | **Load Data**                                                                                     |  
|---------------------------|------------------------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|  
| **MNIST**                  | PCA, t-SNE, UMAP, KPCA, Autoencoder            | Image data with spatial correlations; ideal for linear/non-linear methods.| ``` from tensorflow.keras.datasets import mnist; (X_train, y_train), (X_test, y_test) = mnist.load_data() ``` |  
| **Titanic (categorical)** | MCA                                            | Mixed categorical variables (e.g., class, sex, embarked).                 | ``` import seaborn as sns; titanic = sns.load_dataset('titanic') ```                        |  
| **EEG Signals**            | ICA                                            | Blind source separation for time-series signals.                          | ``` import mne; from mne.datasets import sample; data_path = sample.data_path(); raw = mne.io.read_raw_fif(data_path / 'MEG' / 'sample' / 'sample_audvis_filt-0-40_raw.fif') ``` |  
| **Psychological Survey**  | FA                                             | Latent factor discovery in Likert-scale questionnaire data.               | ``` import numpy as np; X = np.random.randint(1, 6, size=(500, 20)) ```                     |  

---

#### **Explanation of Load Data Column**
1. **MNIST**:  
   - Use `tensorflow.keras.datasets.mnist` to load the dataset.  
   - The dataset is split into training and testing sets by default.  

2. **Titanic**:  
   - Use `seaborn.load_dataset('titanic')` to load the dataset directly.  
   - Alternatively, you can download it from Kaggle.  

3. **EEG Signals**:  
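   - Use the `mne` library to load its sample dataset, which is a combined MEG/EEG recording.  
   - The `sample_audvis_filt-0-40_raw.fif` file is low-pass filtered at 40 Hz and contains both MEG and EEG channels; keep only the EEG channels (e.g., `raw.pick("eeg")` in recent MNE versions) before applying ICA.  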

4. **Psychological Survey**:  
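   - Simulate synthetic data using `numpy.random.randint` to create a dataset of 500 respondents and 20 Likert-scale questions.  
   - Note that `randint` draws every answer independently and uniformly, so this data has no latent structure and FA will not recover interpretable factors from it. For meaningful factors, generate the items from a small number of latent variables plus noise.  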
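
One way to simulate survey data that actually contains latent factors is sketched below. This is an illustrative assumption, not part of the original brief: the helper `simulate_likert`, the two-block loading pattern, the noise level, and the cut points are all hypothetical choices. Each block of items is driven by one latent variable, and the continuous scores are then binned into the 1–5 range.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis


def simulate_likert(n_respondents=500, n_items=20, n_factors=2, noise=0.6, seed=0):
    """Simulate 1-5 Likert answers driven by a few latent factors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_respondents, n_factors))

    # Loading matrix: items are split into contiguous blocks, one block per factor.
    loadings = np.zeros((n_factors, n_items))
    block = n_items // n_factors
    for k in range(n_factors):
        stop = n_items if k == n_factors - 1 else (k + 1) * block
        loadings[k, k * block:stop] = 1.0

    # Continuous item scores = factor signal + independent noise.
    latent = factors @ loadings + noise * rng.normal(size=(n_respondents, n_items))

    # Bin the continuous scores into five ordered categories, coded 1..5.
    cuts = [-1.5, -0.5, 0.5, 1.5]
    return np.digitize(latent, cuts) + 1


X = simulate_likert()

# Fit a two-factor model; varimax rotation pushes the loadings toward simple structure.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(X.astype(float))   # factor scores, shape (500, 2)
loadings = fa.components_                    # loadings, shape (2, 20)
print(np.round(loadings, 2))
```

With this setup, items 1–10 should load mainly on one factor and items 11–20 on the other, which gives you something concrete to interpret. Strictly speaking, Likert answers are ordinal, so treating them as continuous in FA is itself a simplification worth mentioning in your reflection.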

---


### **Requirements**  
####  **Data Preparation**  
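   - **MNIST**: Load 10,000 samples, flatten each 28×28 image into a 784-dimensional vector, and scale pixel values to `[0, 1]` by dividing by 255.  
   - **Titanic**: Use categorical features (e.g., `pclass`, `sex`, `embarked`) and drop or impute rows with missing values before fitting.  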
   - **EEG Signals**: Use `mne.datasets.sample.data_path()` or a synthetic signal dataset.  
   - **Psychological Survey**: Simulate data with 20 Likert-scale questions (1–5) for 500 respondents.  
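
The MNIST preparation step could look roughly like the sketch below. The helper name `prepare_images` and the random subsampling are assumptions made for illustration; taking the first 10,000 images would satisfy the requirement just as well. The function only needs NumPy, so it is shown here on a small random stand-in array, with the real loading call left as a comment.

```python
import numpy as np


def prepare_images(images, labels, n_samples=10_000, seed=0):
    """Subsample, flatten, and scale uint8 images to [0, 1] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = min(n_samples, len(images))
    idx = rng.choice(len(images), size=n, replace=False)
    # (n, 28, 28) uint8 -> (n, 784) float32 in [0, 1]
    X = images[idx].reshape(n, -1).astype(np.float32) / 255.0
    return X, labels[idx]


# Stand-in data with the same layout as MNIST (uint8 images, integer labels).
fake_images = np.random.default_rng(1).integers(0, 256, size=(50, 28, 28), dtype=np.uint8)
fake_labels = np.arange(50) % 10
X_small, y_small = prepare_images(fake_images, fake_labels, n_samples=20)

# With the real data (downloads on first use):
# from tensorflow.keras.datasets import mnist
# (X_train, y_train), _ = mnist.load_data()
# X, y = prepare_images(X_train, y_train)   # X has shape (10000, 784)
```

Fixing the seed keeps the same subset across methods, so runtime and accuracy comparisons are made on identical data.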

####  **Dimensionality Reduction**  
   - **MNIST**: Apply PCA, t-SNE, UMAP, KPCA (RBF kernel), and a simple autoencoder (2D/3D latent space).  
   - **Titanic**: Use MCA (`prince.MCA`) on categorical features.  
   - **EEG Signals**: Apply ICA (`sklearn.decomposition.FastICA`) to separate 2–3 latent sources.  
   - **Psychological Survey**: Use FA (`sklearn.decomposition.FactorAnalysis`) to extract 2–3 latent factors.  
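
To make the MCA step less of a black box, here is a minimal NumPy sketch of what MCA computes: correspondence analysis applied to the one-hot indicator matrix of the categorical variables. This is not the `prince.MCA` implementation, and the helpers `indicator_matrix` and `mca` as well as the six-row toy table are hypothetical. For the assignment itself, `prince.MCA` remains the intended tool.

```python
import numpy as np


def indicator_matrix(columns):
    """One-hot encode a dict of {name: 1-D categorical values} into a 0/1 matrix."""
    blocks, names = [], []
    for var, values in columns.items():
        levels, codes = np.unique(np.asarray(values), return_inverse=True)
        blocks.append(np.eye(len(levels))[codes])
        names += [f"{var}={lvl}" for lvl in levels]
    return np.hstack(blocks), names


def mca(Z, n_components=2):
    """Correspondence analysis of an indicator matrix Z (rows are observations)."""
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of observations (rows) and categories (columns).
    rows = (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]
    cols = (Vt[:n_components].T * sing[:n_components]) / np.sqrt(c)[:, None]
    return rows, cols, sing ** 2          # last item: inertia per dimension


# Toy table with Titanic-like columns (made-up rows, for illustration only).
toy = {
    "sex": ["male", "female", "female", "male", "female", "male"],
    "pclass": [3, 1, 3, 2, 1, 3],
    "embarked": ["S", "C", "S", "Q", "C", "S"],
}
Z, names = indicator_matrix(toy)
rows, cols, inertia = mca(Z)
```

The `cols` array holds one point per category (for example `sex=female`), which is what you inspect when interpreting category contributions to each dimension.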

####  **Visualization & Evaluation**  
   - **MNIST**:  
     - Visualize 2D/3D embeddings colored by digit labels.  
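     - Compute **reconstruction error (MSE)** for PCA/KPCA/Autoencoder. For scikit-learn's `KernelPCA`, set `fit_inverse_transform=True` so that `inverse_transform` is available.  
     - For each method, train a logistic regression classifier on the 2D features and report accuracy on a held-out split of the embedded points.  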
   - **Titanic**:  
     - Plot MCA embeddings colored by `survived` status.  
     - Interpret category contributions to dimensions.  
   - **EEG Signals**:  
     - Plot separated ICA components and compare to raw signals.  
   - **Psychological Survey**:  
     - Interpret FA factors (e.g., "neuroticism" vs. "openness").  
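
For the ICA comparison, it can help to first check the pipeline on a synthetic mixture where the true sources are known (the Data Preparation section allows a synthetic signal dataset). The sketch below assumes three hand-made sources and an arbitrary mixing matrix; none of these values come from real EEG.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three non-Gaussian sources: sine, square wave, sawtooth (assumed for illustration).
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
s3 = 2 * ((0.7 * t) % 1) - 1
S = np.c_[s1, s2, s3]
S += 0.05 * rng.normal(size=S.shape)   # small sensor noise
S /= S.std(axis=0)                     # unit variance per source

# Arbitrary mixing matrix: each observed channel is a weighted sum of the sources.
A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
X = S @ A.T                            # observed "channels", shape (2000, 3)

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0, max_iter=1000)
S_est = ica.fit_transform(X)           # estimated sources, shape (2000, 3)

# ICA recovers sources only up to order and sign, so match them by |correlation|.
corr = np.abs(np.corrcoef(S.T, S_est.T)[:3, 3:])
print(np.round(corr, 2))
```

Plotting the columns of `X` next to the columns of `S_est` gives the raw-versus-separated comparison asked for above. Because order and sign are arbitrary, label the recovered components by their best-matching source rather than by column index.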

####  **Comparison**  
   - For MNIST methods: Compare runtime, reconstruction error, and classification accuracy.  
   - For all methods: Write a brief reflection on why the dataset was suitable for the technique.  
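
A small harness along the following lines can collect runtime, reconstruction error, and classification accuracy in one place. It is a sketch under stated assumptions: scikit-learn's bundled 8×8 `load_digits` data stands in for MNIST so that the example runs offline, and the helper `evaluate` is hypothetical. It covers only PCA and KPCA, because methods such as t-SNE have no `inverse_transform` and need separate handling.

```python
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def evaluate(reducer, X_tr, X_te, y_tr, y_te):
    """Return fit time, test reconstruction MSE, and 2D classification accuracy."""
    start = time.perf_counter()
    Z_tr = reducer.fit_transform(X_tr)
    seconds = time.perf_counter() - start

    Z_te = reducer.transform(X_te)
    X_rec = reducer.inverse_transform(Z_te)
    mse = float(np.mean((X_te - X_rec) ** 2))

    clf = LogisticRegression(max_iter=2000).fit(Z_tr, y_tr)
    return {"seconds": seconds, "mse": mse, "accuracy": float(clf.score(Z_te, y_te))}


# Stand-in for MNIST: 1,797 digit images of 8x8 pixels with values 0..16.
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

reducers = {
    "PCA": PCA(n_components=2, random_state=0),
    # fit_inverse_transform=True is required for KernelPCA.inverse_transform.
    "KPCA (RBF)": KernelPCA(n_components=2, kernel="rbf",
                            fit_inverse_transform=True, random_state=0),
}
results = {name: evaluate(r, X_tr, X_te, y_tr, y_te) for name, r in reducers.items()}
for name, row in results.items():
    print(name, row)
```

Timing a single fit is noisy, so consider repeating each run a few times and reporting the median when you write up the comparison.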

####  **Critical Analysis**  
Answer:  
   - Why is MCA a better fit for Titanic than MNIST?  
   - How does ICA’s assumption of non-Gaussianity help with EEG signals?  
   - When would you prefer FA over PCA for survey data?  
   - Why do autoencoders outperform PCA on MNIST (if they do)?  

---

### **Deliverables**  
1. **Code**: Separate Python scripts/Jupyter notebooks for each dataset.  
2. **Visualizations**: Embeddings, component plots, and factor interpretations.  

---


### **Tips**  
- For ICA on EEG data, use `mne.preprocessing.ICA` (which runs FastICA by default) for practical relevance.  
- Simulate survey data using `np.random.randint(1, 6, size=(500, 20))`.  
- Use `plotly` for interactive 3D visualizations of MNIST embeddings.  

---


**Good Luck!**  
*“Without data, you’re just a person with an opinion.” – W. Edwards Deming*  

---
