# Behaviour analysis

## Table of Contents

1. [Introduction](#introduction)
2. [Objectives](#objectives)
3. [Import libraries](#import-libraries)
4. [Read the dataset](#read-the-dataset)
5. [Define features and target](#define-features-and-target)
6. [Split the data](#split-the-data)
7. [Train the models](#train-the-models)
   - [Training Model](#training-model)
8. [Validation](#validation)
9. [Evaluate the model](#evaluate-the-model)
10. [Hyperparameter Tuning](#hyperparameter-tuning)
11. [New Model (Random Forest)](#new-model-random-forest)
12. [Conclusions](#conclusions)


## Introduction


In this project, we will develop various models to analyze customer behavior and recommend one of Megaline's new plans: Smart or Ultra. Only models with an accuracy greater than or equal to 0.75 will be considered.

## Objectives

- Develop different models for decision-making.

- Select and analyze the model with the highest possible accuracy.

## Import libraries

In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import set_config
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier


## Read the dataset

In [3]:
# Read the dataset
df = pd.read_csv("users_behavior.csv")

## Define features and target

In [5]:
# Create a new column
df['ideal_plan'] = 0

# Define features and target
features = df.drop(['is_ultra', 'mb_used', 'messages', 'minutes', 'ideal_plan'], axis=1)
target = df['is_ultra']  # Changed to 'is_ultra' to have a meaningful target

## Split the data

In [6]:
# Split the data into training and temporary set
X_train, X_temp, y_train, y_temp = train_test_split(features, target, test_size=0.25, random_state=12345)

# Split the temporary set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.25, random_state=12345)

## Train the models

### Training Model

In [7]:
# Training model
model = DecisionTreeClassifier(random_state=12345)
model.fit(features, target)

## Validation

In [8]:
# Validate the model on the validation set
val_predictions = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)
print(f'Validation Accuracy: {val_accuracy:.2f}')

Validation Accuracy: 0.76


## Evaluate the model

In [9]:
# Evaluate the model on the test set
test_predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_predictions)
print(f'Test Accuracy: {test_accuracy:.2f}')

Test Accuracy: 0.79


## Hyperparameter Tuning

In [10]:
for depth in range(1, 6):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(X_train, y_train)
    predictions_val = model.predict(X_val)
    accuracy_val = accuracy_score(y_val, predictions_val)
    print(f'max_depth = {depth}: Validation Accuracy = {accuracy_val:.2f}')

    # Evaluar en el conjunto de prueba
    predictions_test = model.predict(X_test)
    accuracy_test = accuracy_score(y_test, predictions_test)
    print(f'max_depth = {depth}: Test Accuracy = {accuracy_test:.2f}')

max_depth = 1: Validation Accuracy = 0.75
max_depth = 1: Test Accuracy = 0.77
max_depth = 2: Validation Accuracy = 0.76
max_depth = 2: Test Accuracy = 0.79
max_depth = 3: Validation Accuracy = 0.76
max_depth = 3: Test Accuracy = 0.79
max_depth = 4: Validation Accuracy = 0.75
max_depth = 4: Test Accuracy = 0.79
max_depth = 5: Validation Accuracy = 0.75
max_depth = 5: Test Accuracy = 0.79


## New Model (Random Forest)

In [11]:
# Initialize variables to store the best score and the best number of estimators
best_score = 0
best_est = 0

# Loop to evaluate different values of n_estimators
for est in range(1, 11):  # Range from 1 to 10
    model = RandomForestClassifier(random_state=54321, n_estimators=est)  # Set the number of trees to the current value
    model.fit(X_train, y_train)  # Train the model on the training set
    score = model.score(X_val, y_val)  # Calculate the accuracy score on the validation set
    if score > best_score:
        best_score = score  # Save the best accuracy score
        best_est = est  # Save the number of estimators that correspond to the best accuracy score

print("The accuracy of the best model on the validation set (n_estimators = {}): {:.2f}".format(best_est, best_score))

The accuracy of the best model on the validation set (n_estimators = 10): 0.75


In [12]:
# Train the final model with the optimal number of estimators using the entire training set
final_model = RandomForestClassifier(random_state=54321, n_estimators=best_est)
final_model.fit(X_train, y_train)

# Make predictions on the test set
test_predictions = final_model.predict(X_test)

# Calculate the accuracy on the test set
test_accuracy = accuracy_score(y_test, test_predictions)
print(f'Accuracy of the model on the test set: {test_accuracy:.2f}')

Accuracy of the model on the test set: 0.78


Although it is not a significant improvement, we can use our random forest model as the final result, as it analyzes more trees to make decisions.

## Conclusions

Our project showed that all the implemented models met the minimum acceptance criteria, validated with the test sets, although additional models were created.