# <h3 align="center">__Module 10 Activity__</h3>
# <h3 align="center">__Assigned at the start of Module 10__</h3>
# <h3 align="center">__Due at the end of Module 10__</h3><br>



# Weekly Discussion Forum Participation

Each week, you are required to participate in the module’s discussion forum. The forum centers on the week's Module Activity, which is released at the beginning of the module. You must complete, or at least attempt, the activity before you post about it or about anything related to the topic.

## Grading of the Discussion

### 1. Initial Post:
Create your thread by **Day 5 (Saturday night at midnight, PST).**

### 2. Responses:
Respond to at least two other posts by **Day 7 (Monday night at midnight, PST).**

---

## Grading Criteria:

Your participation will be graded as follows:

### Full Credit (100 points):
- Submit your initial post by **Day 5.**
- Respond to at least two other posts by **Day 7.**

### Half Credit (50 points):
- If your initial post is late but you respond to two other posts.
- If your initial post is on time but you fail to respond to at least two other posts.

### No Credit (0 points):
- If both your initial post and responses are late.
- If you fail to submit an initial post and do not respond to any others.

---

## Additional Notes:

- **Late Initial Posts:** Late posts will automatically receive half credit if two responses are completed on time.
- **Substance Matters:** Responses must be thoughtful and constructive. Comments like “Great post!” or “I agree!” without further explanation will not earn credit.
- **Balance Participation:** Aim to engage with threads that have fewer or no responses to ensure a balanced discussion.

---

## Avoid:
- Posting several times within a very short time frame, especially immediately before the posting deadline.
- Posts that merely compliment another post and then summarize it.


# Module Activity: Exploring and Applying Unsupervised Learning Techniques

## Objectives:
1. Practice implementing basic unsupervised learning algorithms using Python.
2. Visualize and interpret results from clustering, dimensionality reduction, and anomaly detection.
3. Understand the role of distance metrics, eigenvalues, and thresholds in algorithm performance.

---

## Instructions:

### Step 1: Data Exploration
- Work with the Iris dataset, loaded with `scikit-learn`'s `load_iris`.
- Explore the dataset by plotting the data points to observe natural groupings and potential outliers.

#### Code Snippet:
```python
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

data = load_iris()
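X = data.data[:, :2]  # Use the first two features (sepal length and sepal width)
plt.scatter(X[:, 0], X[:, 1], c='gray')
plt.xlabel(data.feature_names[0])  # label the axes with the dataset's own feature names
plt.ylabel(data.feature_names[1])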
plt.title('Data Exploration')
plt.show()
```


### Step 2: Clustering with K-Means
- Implement K-Means clustering using `scikit-learn`, choosing \( k=3 \).
- Visualize the results with cluster centroids and assigned labels.
- Discuss how initialization affects the results by trying multiple random seeds.

#### Code Snippet:
```python
from sklearn.cluster import KMeans

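# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_              # cluster index assigned to each sample
centroids = kmeans.cluster_centers_  # one centroid per cluster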

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=100)
plt.title('K-Means Clustering')
plt.show()
```
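
To start the discussion of initialization, the sketch below refits K-Means with several seeds and prints the inertia (within-cluster sum of squared distances) of each run. The choices `init='random'` and `n_init=1` are assumptions made here so that each seed corresponds to exactly one random initialization; they are not part of the snippet above, and the range of seeds is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Same two features as in the data exploration step.
X = load_iris().data[:, :2]

# Assumption: a single random initialization per seed, so differences
# between runs come from the starting centroids alone.
inertias = {}
for seed in range(5):
    km = KMeans(n_clusters=3, init='random', n_init=1, random_state=seed).fit(X)
    inertias[seed] = km.inertia_
    print(f"seed={seed}  inertia={km.inertia_:.3f}")
```

If the printed values differ, the runs converged to different local optima. Raising `n_init` makes each fit keep the best of several initializations, which usually reduces this spread.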


### Step 3: Dimensionality Reduction with PCA (20 Minutes)
- Perform Principal Component Analysis (PCA) to reduce the four-feature dataset to two dimensions.
- Visualize the transformed dataset and discuss how much of the original variance the two components retain.

#### Code Snippet:
```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(data.data)  # project all four features onto two components

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=data.target, cmap='viridis', s=50)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Dimensionality Reduction')
plt.show()
```
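
One way to put a number on "retained variance" is the `explained_variance_ratio_` attribute of a fitted `PCA` object. The following is a minimal sketch, not a required part of the activity; the variable names `X_full` and `ratios` are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_full = load_iris().data  # all four features

pca = PCA(n_components=2).fit(X_full)

# Share of the total variance captured by each principal component.
ratios = pca.explained_variance_ratio_
print("Variance share per component:", [f"{r:.3f}" for r in ratios])
print(f"Total variance retained: {ratios.sum():.3f}")

# explained_variance_ holds the matching eigenvalues of the covariance matrix.
print("Eigenvalues:", [f"{v:.3f}" for v in pca.explained_variance_])
```

The eigenvalues printed last connect this step to the objective about eigenvalues: each one is the variance of the data along its principal component.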


### Step 4: Anomaly Detection with DBSCAN (20 Minutes)
- Apply DBSCAN to detect anomalies in the dataset. DBSCAN assigns the label `-1` to noise points, which serve as the candidate anomalies.
- Tune the `eps` and `min_samples` parameters and visualize the outliers.
- Discuss how varying these parameters impacts the results.

#### Code Snippet:
```python
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)  # noise points (candidate anomalies) get the label -1

plt.scatter(X[:, 0], X[:, 1], c=dbscan_labels, cmap='viridis', s=50)
plt.title('DBSCAN Anomaly Detection')
plt.show()
```
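
To compare parameter settings systematically, you could count clusters and noise points over a small grid. The sketch below assumes the same two features as above; the grid values are arbitrary illustrations, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]

# Hypothetical grid, chosen only to show the trend.
results = []
for eps in (0.2, 0.3, 0.5):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_noise = int(np.sum(labels == -1))  # DBSCAN marks noise with -1
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        results.append((eps, min_samples, n_clusters, n_noise))
        print(f"eps={eps:<4} min_samples={min_samples:<3} "
              f"clusters={n_clusters}  noise={n_noise}")
```

In general, a smaller `eps` or a larger `min_samples` makes it harder for a point to qualify as a core point, so more points tend to be flagged as noise.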


## Discussion Questions:
1. What differences did you observe when changing the parameters for K-Means and DBSCAN?
2. How does PCA simplify data while retaining key patterns?
3. Why are distance metrics crucial in these algorithms, and how would results change with different metrics?
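
As a possible starting point for question 3, the sketch below reruns DBSCAN with three different distance metrics through its `metric` parameter while keeping `eps` and `min_samples` fixed. The particular metrics compared are a choice made here for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]

# Same eps and min_samples for every metric, so only the notion of
# "distance" changes between runs.
noise_counts = {}
for metric in ("euclidean", "manhattan", "chebyshev"):
    labels = DBSCAN(eps=0.5, min_samples=5, metric=metric).fit_predict(X)
    noise_counts[metric] = int(np.sum(labels == -1))
    print(f"{metric:<10} noise points: {noise_counts[metric]}")
```

For any pair of points, the Manhattan distance is at least as large as the Euclidean distance, which is at least as large as the Chebyshev distance. A fixed `eps` therefore tends to be strictest under Manhattan and most permissive under Chebyshev.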

---

## Deliverables:
You will:
- Submit your modified code and plots for each task.
- Write a brief reflection (2-3 sentences per task) on what you learned and what you found challenging.
