<div style="border:solid blue 2px; padding: 20px">

**Overall Summary of the Project**

Hi Bailey! You’ve done a great job tuning a Random Forest and presenting validation/test accuracy along with classification metrics. Here’s some feedback, step by step:

1. **Introduction & Context**  
   - 🔴 *Missing:* Add a brief intro at the top explaining the business goal (recommend Smart vs. Ultra) and the 0.75 accuracy threshold.
     
   - 📋 *Why:* Sets context before launching into code.

2. **Data Exploration**  
   - ✅ You load and display the first rows with `df.head()`.  
   - 🔄 *Suggestion:* Show `df.info()` and `df.describe()` to confirm no missing values, and print `df['is_ultra'].value_counts(normalize=True)` to highlight class balance.

3. **Data Splitting**  
   <code>
   X_temp, X_test, y_temp, y_test = train_test_split(…)  
   X_train, X_valid, y_train, y_valid = train_test_split(…)  
   </code>  
   - ✅ Correct 60/20/20 split with `random_state=42`.  
   - 📌 *Tip:* Pull these two splits into one consolidated block at the top so all models share the same subsets.

4. **Model & Hyperparameter Tuning**  
   <code>
   grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3, scoring='accuracy')  
   grid_search.fit(X_train, y_train)  
   best_model = grid_search.best_estimator_  
   </code>  
   - ✅ You use `GridSearchCV` to tune `n_estimators`, `max_depth`, and `min_samples_split`.  
   - 🔍 *Suggestion:* Print out `grid_search.best_params_` so readers know exactly which combination was selected.

5. **Validation & Test Evaluation**  
   <code>
   valid_accuracy = accuracy_score(y_valid, valid_preds)  
   test_accuracy  = accuracy_score(y_test,  test_preds)  
   </code>  
   - ✅ You report both validation (79.5%) and test (82.1%) accuracy.  
   - 📈 *Next step:* Compare against a simple baseline:  
     <code>
     from sklearn.dummy import DummyClassifier  
     dummy = DummyClassifier(strategy='most_frequent')  
     dummy.fit(X_train, y_train)  
     print('Baseline accuracy:', dummy.score(X_test, y_test))  
     </code>  
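     This shows how much you’ve improved over “always predict Smart.” Smart is the majority class (455 of the 643 test rows), so that baseline would score about 70.8% on the test set. Your model therefore adds roughly 11 percentage points.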

6. **Detailed Metrics & Insights**  
   <code>
   print(classification_report(y_test, test_preds))  
   print(confusion_matrix(y_test, test_preds))  
   </code>  
   - ✅ Great use of precision, recall, and F1.  
   - 🔄 *Suggestion:* Comment on what these numbers mean, e.g., “Recall for Ultra is only 0.54, so 87 of the 188 true Ultra users (about 46%) are misclassified as Smart.”

7. **Conclusion & Next Steps**  
   - 🔄 *Critical:* Add a closing section that:
     1. **Restates** the final test accuracy and best hyperparameters.
     2. **Reflects** on model strengths (high recall on the Smart plan, 0.94) and weaknesses (Ultra recall of only 0.54).
     3. **Proposes** improvements such as class weighting, SMOTE oversampling of the Ultra class, or trying other models (e.g., XGBoost).
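Points 2 to 5 above fit into one short workflow. Below is a minimal sketch of that workflow. It assumes the feature columns of `users_behavior.csv` are `calls`, `minutes`, `messages` and `mb_used`, which is an assumption you should confirm with `df.columns`. It builds random stand-in data so it runs without the CSV. The `stratify=` argument is an addition that the notebook does not use: it keeps the Smart/Ultra ratio the same in all three subsets. The grid is also trimmed for speed.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in for users_behavior.csv: the column names are assumed,
# and the values are random, not real Megaline data.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "calls": rng.poisson(60, n),
    "minutes": np.clip(rng.normal(430, 120, n), 0, None),
    "messages": rng.poisson(35, n),
    "mb_used": np.clip(rng.normal(17000, 5000, n), 0, None),
})
# Heavy data users are made more likely to be Ultra (1) so there is a signal to learn
df["is_ultra"] = (df["mb_used"] + rng.normal(0, 4000, n) > 20000).astype(int)

# Point 2: check types, missing values, ranges, and class balance
df.info()
print(df.describe())
class_share = df["is_ultra"].value_counts(normalize=True)
print(class_share)

# Point 3: one 60/20/20 split at the top, reused by every model below
features = df.drop(columns="is_ultra")
target = df["is_ultra"]
X_temp, X_test, y_temp, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)  # 0.25 * 0.8 = 0.2

# Point 4: tune, then print the selected hyperparameters
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}  # reduced grid for speed
grid_search = GridSearchCV(RandomForestClassifier(random_state=42),
                           param_grid, cv=3, scoring="accuracy")
grid_search.fit(X_train, y_train)
print("Best params:", grid_search.best_params_)
print(f"Best CV accuracy: {grid_search.best_score_:.4f}")
best_model = grid_search.best_estimator_

# Point 5: compare with a baseline that always predicts the most frequent training class
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = dummy.score(X_test, y_test)
model_acc = accuracy_score(y_test, best_model.predict(X_test))
print(f"Baseline test accuracy: {baseline_acc:.4f}")
print(f"Model test accuracy:    {model_acc:.4f}")
```

With the real CSV, replace the stand-in block with `df = pd.read_csv('users_behavior.csv')` and keep the rest. The printed `best_params_` line is what readers need in order to reproduce the chosen model.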

---

**Overall Summary**

- **Strengths:** Solid GridSearchCV tuning, clear evaluation, and thorough metrics.  
- **Opportunities:** Context-setting intro, baseline comparison, deeper discussion of confusion matrix insights, and a richer conclusion with next steps.

You’re on the right track—these tweaks will make your report even clearer and more actionable. Keep up the great work! 🚀 

---

**Status: waiting for changes**

Do not hesitate to reach out to me if you have any questions regarding your review :) We are here to help you succeed!

**Reviewer: Matias - Discord: mcoustasse**

<div style="border:solid blue 2px; padding: 20px">

**Overall Summary of the Project, Iteration 2**

Congrats on your approval, Bailey! ;)

Megaline, a mobile phone company, has noticed that many of its customers are still on legacy plans. It wants to recommend one of its newer plans, Smart or Ultra, to each user based on how they use their phone. In this project, we're building a model that looks at user behavior and predicts whether a user should be on the Smart or the Ultra plan. It's a binary classification problem, and we're aiming for at least **75% accuracy** on the test set so the model is reliable enough to use. We use a Random Forest classifier, tune its hyperparameters with GridSearchCV, and evaluate it with accuracy, a classification report, and a confusion matrix.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the data; is_ultra is the target (1 = Ultra, 0 = Smart)
df = pd.read_csv('users_behavior.csv')
df.head()  # only rendered when it is the last line of a notebook cell

features = df.drop(columns='is_ultra')
target = df['is_ultra']

# 60/20/20 split: hold out 20% for test, then 25% of the remaining 80% for validation
X_temp, X_test, y_temp, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

# Tune the Random Forest with 3-fold cross-validation on the training set
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5]
}
model = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(model, param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Refit on the full training set with the best parameter combination
best_model = grid_search.best_estimator_

# Accuracy on the validation set
valid_preds = best_model.predict(X_valid)
valid_accuracy = accuracy_score(y_valid, valid_preds)
print(f"Validation Accuracy: {valid_accuracy:.4f}")

# Final check on the held-out test set
test_preds = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Per-class precision, recall, and F1 on the test set
print("Classification Report:")
print(classification_report(y_test, test_preds))

# Rows = true class, columns = predicted class
print("Confusion Matrix:")
print(confusion_matrix(y_test, test_preds))
```

```text
Validation Accuracy: 0.7947
Test Accuracy: 0.8212
Classification Report:
              precision    recall  f1-score   support

           0       0.83      0.94      0.88       455
           1       0.78      0.54      0.64       188

    accuracy                           0.82       643
   macro avg       0.81      0.74      0.76       643
weighted avg       0.82      0.82      0.81       643

Confusion Matrix:
[[427  28]
 [ 87 101]]
```
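Point 6 of the review asks what these numbers mean. One way to answer is to recompute the report directly from the confusion matrix. The small worked example below uses only the counts printed above. It follows scikit-learn's `confusion_matrix` layout: rows are true classes and columns are predicted classes, with 0 = Smart and 1 = Ultra.

```python
import numpy as np

# Confusion matrix from the output above.
# Rows = true class, columns = predicted class; 0 = Smart, 1 = Ultra.
cm = np.array([[427, 28],
               [87, 101]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()       # all correct predictions / all test rows
ultra_precision = tp / (tp + fp)      # predicted Ultra that really are Ultra
ultra_recall = tp / (tp + fn)         # true Ultra users the model finds
smart_recall = tn / (tn + fp)         # true Smart users the model finds
missed_ultra = fn / (tp + fn)         # true Ultra users labeled Smart

# Baseline that always predicts Smart: its accuracy is the Smart share of the test set
baseline_accuracy = cm[0].sum() / cm.sum()

print(f"Accuracy:          {accuracy:.4f}")
print(f"Ultra precision:   {ultra_precision:.4f}")
print(f"Ultra recall:      {ultra_recall:.4f}")
print(f"Smart recall:      {smart_recall:.4f}")
print(f"Ultra missed:      {missed_ultra:.4f} ({fn} of {tp + fn})")
print(f"Always-Smart acc.: {baseline_accuracy:.4f}")
```

The values match the report:

- accuracy is 0.8212;
- Ultra precision is about 0.78 and Ultra recall about 0.54, so 87 of 188 Ultra users are labeled Smart;
- Smart recall is about 0.94;
- an always-Smart baseline reaches 0.7076, which the model beats by about 11 percentage points.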


In this project, I built a machine learning model to predict whether a Megaline user should be on the Ultra or the Smart plan, based on their monthly call, message, and internet usage. After an initial look at the data, I split the dataset into training (60%), validation (20%), and test (20%) sets. I trained a Random Forest classifier and used GridSearchCV to tune the hyperparameters `n_estimators`, `max_depth`, and `min_samples_split`. The final model reached a test accuracy of **82.1%** (validation: 79.5%), above the required threshold of 75%. It classified Smart-plan users well (recall 0.94), but it recognized only 54% of Ultra-plan users (recall 0.54), so the Ultra class leaves the most room for improvement.
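The conclusion identifies Ultra recall as the weak spot, and point 7 of the review suggests class weighting as one remedy. The minimal sketch below shows how to check whether `class_weight='balanced'` helps. It runs on synthetic imbalanced data from `make_classification`. The roughly 70/30 class ratio is an assumption chosen to resemble the Smart/Ultra split in the test set (455 vs. 188); this is not the Megaline data. The fixed `n_estimators=100, max_depth=10` values are also illustrative, not the grid search result. The sketch only demonstrates the measurement. The effect on `users_behavior.csv` has to be checked on the validation set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 70% class 0 = "Smart", 30% class 1 = "Ultra").
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

results = {}
for weighting in [None, "balanced"]:
    # Hyperparameters are illustrative, not the tuned values from the notebook
    model = RandomForestClassifier(n_estimators=100, max_depth=10,
                                   class_weight=weighting, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # Recall for class 1: share of true "Ultra" rows the model detects
    results[str(weighting)] = recall_score(y_test, preds, pos_label=1)

for name, recall in results.items():
    print(f"class_weight={name}: Ultra recall = {recall:.3f}")
```

Balanced weighting makes mistakes on the rarer class cost more while the trees are built. This can raise Ultra recall, but it may lower Smart recall or overall accuracy. Any weighted model should therefore still be checked against the 75% accuracy requirement.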