# 👩‍💻 AutoML vs. Manual Modeling: Which One Wins?

Time Estimate: 60 minutes

## 📋 Overview

In this lab, you'll compare an AutoML tool with traditional manual modeling on the same dataset. Working through both approaches side by side lets you weigh their strengths and weaknesses in accuracy, effort, and run time. By the end, you'll have a basis for choosing one method over the other given your data, context, and objectives.

## 🎯 Learning Outcomes

By the end of this lab, you will be able to:

- Implement manual modeling techniques using scikit-learn  
- Perform hyperparameter tuning using GridSearchCV or RandomizedSearchCV  
- Set up and run AutoML experiments using auto-sklearn or PyCaret  
- Compare the effectiveness of AutoML and manual modeling approaches

## 🖥️ Tasks

### Task 1: Dataset Preprocessing

**Context:** Proper dataset selection and preprocessing ensure the data is clean and ready for modeling.  
**Steps:**

1. Ensure the dataset is preprocessed: handle missing data, normalize features as needed, and split into training and testing sets.

**💡 Tip:** Use `train_test_split` from `sklearn.model_selection` for data splitting.
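
If you are unsure how the split and normalization fit together, here is a minimal sketch. It assumes the Iris dataset loaded in the starter cell, a 70/30 split, and `StandardScaler` for normalization; the variable names and the choice of scaler are illustrative assumptions, not requirements of the lab.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Stratify so each class keeps the same proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Iris has no missing values, so no imputation step appears here; a dataset with gaps would need one before scaling.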

```py
# Import necessary libraries
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import autosklearn.classification

# Load data
iris = load_iris()

# Task 1: Dataset Preprocessing
# ... your code here
```

**⚙️ Test Your Work:**  
Printing the training and testing sets should show the features alongside their corresponding labels, with the preprocessing steps (missing-value handling, normalization, splitting) applied.

### Task 2: Manual Modeling Approach

**Context:** Manual modeling involves constructing pipelines, choosing algorithms, and performing hyperparameter tuning.  
**Steps:**

1. Construct a pipeline manually using scikit-learn. Choose a few algorithms such as logistic regression, decision trees, or SVM for initial model construction.  
2. Perform hyperparameter tuning using `GridSearchCV` or `RandomizedSearchCV` to optimize model performance.  
3. Record the results, including accuracy, precision, recall, and computational time.

**💡 Tip:** Use `GridSearchCV` for systematic hyperparameter tuning.
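
One possible shape for this task is sketched below: a two-step pipeline tuned with `GridSearchCV`, timed with `time.perf_counter`, and scored on the held-out split. The step names (`scale`, `clf`), the SVM parameter grid, and the `results` dictionary are assumptions chosen for illustration; any of the suggested algorithms would work the same way.

```python
import time

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is refit on each CV training fold.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Grid keys follow the "<step name>__<parameter>" convention.
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")

start = time.perf_counter()
search.fit(X_train, y_train)
elapsed = time.perf_counter() - start

y_pred = search.predict(X_test)
results = {
    "best_params": search.best_params_,
    "accuracy": accuracy_score(y_test, y_pred),
    # Iris has three classes, so precision and recall need an averaging strategy.
    "precision_macro": precision_score(y_test, y_pred, average="macro"),
    "recall_macro": recall_score(y_test, y_pred, average="macro"),
    "fit_seconds": elapsed,
}
print(results)
```

`RandomizedSearchCV` follows the same pattern but takes `param_distributions` and an `n_iter` budget instead of an exhaustive grid, which tends to matter more as the search space grows.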

```py
# Task 2: Manual Modeling Approach
# ... your code here
```

**⚙️ Test Your Work:**  
Plots should clearly show the comparison of actual vs. predicted values for the manually tuned models.  
Legends should correctly identify each model and data series.

### Task 3: AutoML Approach using auto-sklearn

**Context:** AutoML tools automate feature engineering, model selection, and hyperparameter tuning.  
**Steps:**

1. Set up and run an AutoML experiment with your chosen dataset using auto-sklearn.
2. Evaluate the best model it finds using the same metrics as in Task 2 (accuracy, precision, recall), and note how long the search takes.

**💡 Tip:** Use `AutoSklearnClassifier` from `autosklearn.classification`.
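
As a rough sketch of how such an experiment might be wrapped, the helper below fits auto-sklearn under a time budget and returns the same metrics used for the manual models. The function name `run_automl`, the `time_budget` parameter, and the choice to cap each candidate at half the budget are hypothetical; only `time_left_for_this_task`, `per_run_time_limit`, and `seed` are auto-sklearn arguments.

```python
import time

from sklearn.metrics import accuracy_score, precision_score, recall_score


def run_automl(X_train, y_train, X_test, y_test, time_budget=60):
    """Fit auto-sklearn within `time_budget` seconds and return test metrics."""
    # Imported lazily so the rest of the notebook still runs when
    # auto-sklearn is not installed in the current environment.
    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget,   # total search budget in seconds
        per_run_time_limit=time_budget // 2,   # cap for any single candidate
        seed=42,
    )

    start = time.perf_counter()
    automl.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    y_pred = automl.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision_macro": precision_score(y_test, y_pred, average="macro"),
        "recall_macro": recall_score(y_test, y_pred, average="macro"),
        "fit_seconds": elapsed,
    }
```

Calling `run_automl(X_train, y_train, X_test, y_test)` with the splits from Task 1 returns a dictionary you can place next to your manual results. Expect the fit time to sit close to the budget, since auto-sklearn generally uses the time it is given.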

```py
# Task 3: AutoML Approach using auto-sklearn
# ... your code here
```

**⚙️ Test Your Work:**  
Plots should clearly show the comparison of actual vs. predicted values for the AutoML models.  
Legends should correctly identify each AutoML model and data series.


### Task 4: Compare and Analyze Results

**Context:** Comparing results helps evaluate the strengths and weaknesses of AutoML vs. manual modeling approaches.  
**Steps:**

1. Compare the results from both the manual and AutoML approaches. Identify differences in performance, time efficiency, resource usage, and overall experience.  
2. Consider the variety of models tried, accuracy trade-offs, and ease of implementation.

**💡 Tip:** Use visualizations or statistical summaries to aid comparison.
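
One way to build a statistical summary is to run every approach through the same evaluation helper and print the rows side by side. The sketch below is an assumption-laden illustration: `evaluate` is a hypothetical helper, and the two scikit-learn models are stand-ins for your tuned manual model and your AutoML classifier.

```python
import time

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(name, model, X_train, y_train, X_test, y_test):
    """Fit `model`, time the fit, and return one row of comparison metrics."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    y_pred = model.predict(X_test)
    return {
        "approach": name,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "fit_seconds": fit_seconds,
    }


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Stand-in models: replace these with your GridSearchCV object and an
# unfitted AutoML classifier so both go through the same procedure.
candidates = {
    "manual (logistic regression)": LogisticRegression(max_iter=200),
    "manual (decision tree)": DecisionTreeClassifier(random_state=42),
}
rows = [
    evaluate(name, model, X_train, y_train, X_test, y_test)
    for name, model in candidates.items()
]

print(f"{'approach':<30}{'accuracy':>10}{'precision':>11}{'recall':>9}{'fit (s)':>10}")
for r in rows:
    print(
        f"{r['approach']:<30}{r['accuracy']:>10.3f}{r['precision']:>11.3f}"
        f"{r['recall']:>9.3f}{r['fit_seconds']:>10.4f}"
    )
```

Because every row has the same keys, the `rows` list can also feed a grouped bar chart if you prefer a visual comparison.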

```py
# Task 4: Compare and Analyze Results
# ... your code here
```

**⚙️ Test Your Work:**  
Plots should clearly illustrate the performance comparison between manual and AutoML approaches.

Legends should correctly identify each approach and the corresponding performance metrics.

## ✅ Success Checklist

- Successfully selected and preprocessed the dataset  
- Implemented and tuned manual modeling techniques  
- Set up and ran AutoML experiments  
- Compared and analyzed results from both approaches  
- Provided reflections and recommendations based on findings

## 🔍 Common Issues & Solutions

**Problem:** Dataset not loading correctly.  
**Solution:** Verify the data source and ensure proper loading using `pandas`.

**Problem:** Hyperparameter tuning errors.  
**Solution:** Check the parameter grid and ensure compatibility with the chosen algorithm.
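
A quick way to check a grid is to compare its keys against the names the estimator actually exposes through `get_params()`. In this sketch the pipeline, its step names, and the deliberately wrong `max_depth` key are all made up for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([("scale", StandardScaler()), ("clf", DecisionTreeClassifier())])

# The second key is wrong on purpose: inside a pipeline, parameters
# must be prefixed with the step name, as in "clf__max_depth".
param_grid = {"clf__max_depth": [3, 5, 7, None], "max_depth": [3, 5]}

valid = set(pipe.get_params().keys())
unknown = [key for key in param_grid if key not in valid]
print("Unknown grid keys:", unknown)
```

Any key reported as unknown would cause `GridSearchCV.fit` to raise an invalid-parameter error, so fixing those names first saves a failed run.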

**Problem:** AutoML tools not functioning.  
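**Solution:** Confirm that auto-sklearn or PyCaret is installed in the active environment and that its import succeeds. auto-sklearn is developed for Linux and does not run natively on Windows, so there you may need a Linux environment or PyCaret instead.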
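
A small diagnostic like the following can narrow the problem down before you dig further. It only checks whether the packages are importable from the current environment and reports the operating system; the `status` dictionary is an illustrative name.

```python
import importlib.util
import platform

# find_spec returns None when a top-level package cannot be located.
status = {
    package: importlib.util.find_spec(package) is not None
    for package in ("autosklearn", "pycaret")
}

for package, installed in status.items():
    print(f"{package}: {'installed' if installed else 'not found'}")
print("Platform:", platform.system())
```

If a package shows as not found, check that the notebook kernel uses the same environment you installed it into.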

## 🔑 Key Points

- Manual modeling allows precise control over model construction and tuning.  
- AutoML tools simplify the process by automating feature engineering, model selection, and hyperparameter tuning.  
- Comparing results helps understand the strengths, weaknesses, and suitable applications of each approach.

## Exemplar Solution

After completing this activity (or if you get stuck!), take a moment to review the exemplar solution. It can offer insight into different techniques and approaches.

Remember, multiple solutions can exist for some problems; the goal is to learn and grow as a programmer by exploring various approaches. Use the exemplar as a learning tool to deepen your understanding and refine your own approach to coding challenges.

<details>    
<summary><strong>Click HERE to see an exemplar solution</strong></summary>  

## 💻 Reference Solution

```py
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import autosklearn.classification

# Load the dataset and hold out 30% of it for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Manual model: logistic regression
# max_iter is raised above the default of 100 to help the solver converge on unscaled features
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
log_pred = log_reg.predict(X_test)
log_acc = accuracy_score(y_test, log_pred)

# Manual model: decision tree tuned with a 5-fold grid search over max_depth
param_grid = {'max_depth': [3, 5, 7, None]}
tree = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, scoring='accuracy', cv=5)
tree.fit(X_train, y_train)
# GridSearchCV refits the best estimator on the full training set, so predict() uses it
tree_pred = tree.predict(X_test)
tree_acc = accuracy_score(y_test, tree_pred)

# AutoML model: auto-sklearn with a 60-second total budget,
# a 30-second cap per candidate, and a 6000 MB memory limit
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,
    per_run_time_limit=30,
    memory_limit=6000,
)
automl.fit(X_train, y_train)
automl_pred = automl.predict(X_test)
automl_acc = accuracy_score(y_test, automl_pred)

# Report test-set accuracy for each approach
print(f"Manual Logistic Regression Accuracy: {log_acc}")
print(f"Best Decision Tree Accuracy: {tree_acc}")
print(f"AutoML Best Model Accuracy: {automl_acc}")
```
</details>
