Megaline, a mobile phone company, has noticed that many of its customers are still on legacy plans. The company wants to work out which of its newer plans, Smart or Ultra, best fits each user based on how they use their phone. In this project, we build a model that looks at user behavior and predicts whether a user should be on the Smart or Ultra plan. It's a binary classification problem, and we're aiming for at least **75% accuracy** on the test data. We'll use a Random Forest classifier, tune it with a grid search, and check performance with accuracy, a classification report, and a confusion matrix.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

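# Load the monthly usage data; 'is_ultra' is the target (1 = Ultra, 0 = Smart)
df = pd.read_csv('users_behavior.csv')
df.head()  # preview only; a notebook displays this only when it is the cell's last expression

features = df.drop(columns='is_ultra')
target = df['is_ultra']

# Hold out 20% for testing, then split the remaining 80% into 60% train / 20% validation
X_temp, X_test, y_temp, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42) # 0.25 * 0.8 = 0.2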

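# Tune the forest with a 3-fold cross-validated grid search on the training set
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5]
}
model = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(model, param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best configuration, refit on the whole training set by GridSearchCV
best_model = grid_search.best_estimator_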

# Check the tuned model on the validation set
valid_preds = best_model.predict(X_valid)
valid_accuracy = accuracy_score(y_valid, valid_preds)
print(f"Validation Accuracy: {valid_accuracy:.4f}")

# Final evaluation on the held-out test set
test_preds = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Per-class precision, recall and F1 on the test set
print("Classification Report:")
print(classification_report(y_test, test_preds))

# Rows are true classes, columns are predicted classes
print("Confusion Matrix:")
print(confusion_matrix(y_test, test_preds))

Validation Accuracy: 0.7947
Test Accuracy: 0.8212
Classification Report:
              precision    recall  f1-score   support

           0       0.83      0.94      0.88       455
           1       0.78      0.54      0.64       188

    accuracy                           0.82       643
   macro avg       0.81      0.74      0.76       643
weighted avg       0.82      0.82      0.81       643

Confusion Matrix:
[[427  28]
 [ 87 101]]
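
To make the report easier to read, here is a small sketch that recomputes the headline numbers directly from the confusion matrix above. It assumes the usual scikit-learn layout (rows are true classes, columns are predictions) and that class 0 is Smart and class 1 is Ultra, as the `is_ultra` column name suggests. The `baseline` variable is an addition for comparison only: the accuracy of always predicting the larger class.

```python
# Confusion matrix copied from the output above.
# Rows are true classes, columns are predicted classes (0 = Smart, 1 = Ultra).
cm = [[427, 28],
      [87, 101]]

total = sum(sum(row) for row in cm)

# Accuracy: correct predictions (the diagonal) divided by all predictions.
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall: correct predictions divided by the number of true members of the class.
recall = [cm[i][i] / sum(cm[i]) for i in range(2)]

# Precision: correct predictions divided by the number of predictions of the class.
precision = [cm[i][i] / (cm[0][i] + cm[1][i]) for i in range(2)]

# Baseline (not in the original notebook): always predict the larger true class.
baseline = max(sum(row) for row in cm) / total

print(f"accuracy: {accuracy:.4f}")
print(f"baseline: {baseline:.4f}")
for label, name in enumerate(["Smart", "Ultra"]):
    print(f"{name}: precision={precision[label]:.2f}, recall={recall[label]:.2f}")
```

The Ultra recall is the weak spot: 87 of the 188 Ultra users in the test set end up in the Smart column. The baseline comes out at roughly 70.8%, so the 75% target is only a few points above what a constant prediction would score on this test set.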


In this project, I built a machine learning model to predict which plan, Smart or Ultra, fits a Megaline user based on their monthly call, message, and internet usage. After a quick look at the data, I split the dataset into training (60%), validation (20%), and test (20%) sets. I trained a Random Forest classifier and used GridSearchCV with 3-fold cross-validation to tune `n_estimators`, `max_depth`, and `min_samples_split`. The final model reached **79.5%** accuracy on the validation set and **82.1%** on the test set, exceeding the required threshold of 75%. The model performed especially well on Smart plan users (recall 0.94), while Ultra plan users showed room for improvement (recall 0.54): 87 of the 188 Ultra users in the test set were predicted as Smart.
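
As a quick sanity check on the setup described above, the sketch below works out the split fractions implied by the two `train_test_split` calls and counts how many model fits the grid search performs. The split sizes, grid values, and `cv=3` are copied from the notebook; the variable names are only illustrative, and the extra fit assumes GridSearchCV's default `refit=True`.

```python
from fractions import Fraction
from itertools import product

# Split fractions taken from the two train_test_split calls in the notebook.
test_frac = Fraction(1, 5)                      # test_size=0.2 on the full data
valid_frac = Fraction(1, 4) * (1 - test_frac)   # test_size=0.25 on the remaining 80%
train_frac = 1 - test_frac - valid_frac

# Hyperparameter grid copied from the notebook.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5],
}

# Every combination of grid values is one candidate model.
n_candidates = len(list(product(*param_grid.values())))
cv_folds = 3

# Each candidate is fit once per fold, plus one final refit of the best
# candidate on the whole training set (assuming the default refit=True).
n_fits = n_candidates * cv_folds + 1

print(f"train/valid/test: {float(train_frac):.0%} / {float(valid_frac):.0%} / {float(test_frac):.0%}")
print(f"candidates: {n_candidates}, total fits: {n_fits}")
```

The grid is small enough that an exhaustive search is cheap here. With a larger grid the fit count grows multiplicatively, which is the point at which a randomized search usually becomes worth considering.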