# Module 4: Practice Exercises

Welcome to the practice exercises for Module 4. These questions will test your understanding of the core Scikit-Learn workflow, from data preparation to model training and evaluation.

**Instructions:**
1.  For each problem, read the description and write the necessary code in the cell below it.
2.  Run the cell to see the output and check if your solution is correct.
3.  Solutions are provided at the end of the notebook for you to compare against.

---

### Exercise 1: Train-Test Split

**Task:** You are given features `X` and a target `y` from the Iris dataset. Use Scikit-Learn to split this data into training and testing sets. Use 30% of the data for testing and set `random_state=42` for reproducibility. Print the shapes of `X_train` and `X_test`.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data
y = iris.target

# Your code here

### Exercise 2: Training a Regression Model

**Task:** You are given a simple dataset where `X` is the number of hours studied and `y` is the exam score. Train a `LinearRegression` model on this data. After training, predict the score for a student who studied for 9 hours.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Data: Hours Studied vs. Exam Score
X_study = np.array([2, 3, 5, 6, 8]).reshape(-1, 1)
y_score = np.array([65, 70, 78, 80, 88])

# Your code here

### Exercise 3: Training a Classification Model

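**Task:** Using the `Social_Network_Ads.csv` dataset, train a `RandomForestClassifier` to predict whether a user will purchase a product (the `Purchased` column). Use `Age` and `EstimatedSalary` as features. Train the model on the full dataset (no train-test split for this exercise), then compute its accuracy on that same data. Because the model is scored on rows it has already seen, this is a training accuracy and will usually overstate how well the model does on new data.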

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df_ads = pd.read_csv('../02_Data_Analysis_and_Wrangling/data/Social_Network_Ads.csv')

# Your code here

### Exercise 4: Evaluating a Classifier

**Task:** You are given the true labels and the predicted labels from a classification model. Generate and print the **classification report** to see the precision, recall, and f1-score for each class.

In [None]:
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

# Your code here

### Exercise 5: Feature Scaling

**Task:** You have a small dataset whose two features are on very different scales. Use `StandardScaler` to standardize each feature to zero mean and unit variance. Print the original and the scaled data.

In [None]:
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[25, 50000], [35, 120000], [45, 80000]])

# Your code here

---

## Solutions

**Solution 1:**
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X = iris.data
y = iris.target
# Hold out 30% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
```
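
To see what a 30% split means in terms of row counts, the sketch below reproduces the idea by hand with NumPy: shuffle the row indices, then cut them into two disjoint groups. It is only an illustration. `manual_split` is a hypothetical helper, the placeholder arrays stand in for the Iris data, and the shuffling is not scikit-learn's, so the rows selected will not match `train_test_split` even with the same seed.

```python
import math

import numpy as np


def manual_split(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: shuffle row indices, then cut them into train and test."""
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)  # 30% of 150 rows -> 45
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder data with the same shape as the Iris features (150 rows, 4 columns).
X_demo = np.arange(600).reshape(150, 4)
y_demo = np.arange(150)  # doubles as a row id, handy for checking the split

X_train_demo, X_test_demo, y_train_demo, y_test_demo = manual_split(X_demo, y_demo)
print(X_train_demo.shape, X_test_demo.shape)
```

The shapes come out as `(105, 4)` and `(45, 4)`, the same sizes Solution 1 prints, and no row appears in both sets.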

**Solution 2:**
```python
import numpy as np
from sklearn.linear_model import LinearRegression
X_study = np.array([2, 3, 5, 6, 8]).reshape(-1, 1)
y_score = np.array([65, 70, 78, 80, 88])
model = LinearRegression()
model.fit(X_study, y_score)
predicted_score = model.predict([[9]])
print(f"Predicted score for 9 hours of study: {predicted_score[0]:.2f}")
```
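
As a cross-check on Solution 2, the same line can be fitted by hand. `LinearRegression` performs ordinary least squares, and with a single feature the slope and intercept have a simple closed form. The sketch below uses only NumPy; the variable names are illustrative and not part of the exercise.

```python
import numpy as np

hours = np.array([2, 3, 5, 6, 8], dtype=float)
scores = np.array([65, 70, 78, 80, 88], dtype=float)

# Ordinary least squares for one feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
x_centered = hours - hours.mean()
y_centered = scores - scores.mean()
slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
intercept = scores.mean() - slope * hours.mean()

manual_prediction = intercept + slope * 9
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"Predicted score for 9 hours of study: {manual_prediction:.2f}")
```

The slope and intercept should agree with `model.coef_[0]` and `model.intercept_` from Solution 2 up to floating-point rounding, giving a predicted score of about 91.89.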

**Solution 3:**
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
df_ads = pd.read_csv('../02_Data_Analysis_and_Wrangling/data/Social_Network_Ads.csv')
# Features and target
X = df_ads[['Age', 'EstimatedSalary']]
y = df_ads['Purchased']
model = RandomForestClassifier(random_state=42)
model.fit(X, y)
# Score on the same rows used for training, so this is training accuracy
predictions = model.predict(X)
accuracy = accuracy_score(y, predictions)
print(f"Training accuracy: {accuracy:.2f}")
```

**Solution 4:**
```python
from sklearn.metrics import classification_report
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
print(classification_report(y_true, y_pred))
```
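
To make the numbers in the report less of a black box, the sketch below counts true positives, false positives, and false negatives by hand for each class and derives precision, recall, and F1 from them. `per_class_metrics` is a hypothetical helper written for this illustration. It assumes every class is both present and predicted at least once, so it has no guard against division by zero.

```python
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]


def per_class_metrics(y_true, y_pred, label):
    """Hypothetical helper: precision, recall and F1 for one class, counted by hand."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp)  # of the rows predicted as `label`, how many were right
    recall = tp / (tp + fn)     # of the rows that truly are `label`, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for label in (0, 1):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For class 1 there are 4 true positives, 1 false positive, and 2 false negatives, so precision is 4/5 = 0.80 and recall is 4/6 ≈ 0.67. These should match the per-class rows of the classification report.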

**Solution 5:**
```python
from sklearn.preprocessing import StandardScaler
import numpy as np
data = np.array([[25, 50000], [35, 120000], [45, 80000]])
scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation, then applies the scaling
scaled_data = scaler.fit_transform(data)
print("Original Data:")
print(data)
print("\nScaled Data:")
print(scaled_data)
```
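
Standardization itself is a one-line formula, so the result of Solution 5 can be checked by hand. The sketch below subtracts each column's mean and divides by its standard deviation using NumPy only. The variable names are illustrative, and `ddof=0` is chosen because `StandardScaler` uses the population standard deviation.

```python
import numpy as np

data = np.array([[25, 50000], [35, 120000], [45, 80000]], dtype=float)

# Standardize each column: subtract its mean, divide by its standard deviation.
# ddof=0 (population standard deviation) matches what StandardScaler computes.
column_means = data.mean(axis=0)
column_stds = data.std(axis=0, ddof=0)
manual_scaled = (data - column_means) / column_stds

print(manual_scaled)
print("Column means:", manual_scaled.mean(axis=0).round(6))
print("Column stds:", manual_scaled.std(axis=0).round(6))
```

After scaling, both columns have a mean of about 0 and a standard deviation of 1, so the large gap between the two original ranges no longer matters. The values should agree with `scaled_data` from Solution 5 up to floating-point rounding.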