# Megaline

### Description
Mobile company Megaline is not happy to see that many of its customers are using legacy plans. They want to develop a model that can analyze customer behavior and recommend one of Megaline's new plans: Smart or Ultra.

You have access to behavioral data for subscribers who have already switched to the new plans (from the Statistical Data Analysis course project). For this classification task you must create a model that chooses the correct plan. Since you have already done the step of processing the data, you can jump right into creating the model.

Develop a model with the greatest possible accuracy. In this project, the accuracy threshold is 0.75. Use the dataset to check accuracy.

In [1]:
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [13]:
# Load the CSV file into a DataFrame
df = pd.read_csv("datasets/users_behavior.csv")

In [11]:
df.head()
#df.info()

#df.describe()

In [15]:
features = df.drop(['mb_used'], axis=1)
target = df['mb_used']

# Segment the data into training and validation sets
features_train, features_valid, target_train, target_valid = train_test_split(
    features, target, test_size=0.25, random_state=12345)

# Split training set into training and test set
features_train, features_test, target_train, target_test = train_test_split(
    features_train, target_train, test_size=0.33, random_state=12345)

# Check the resulting sets
print("Training set size:", features_train.shape)
print("Validation set size:", features_valid.shape)
print("Test set size:", features_test.shape)

In [35]:
best_model = None
best_result = 0
best_depth = 0

#Hyperparameters
for depth in range(1, 6):
    model = DecisionTreeRegressor(max_depth=depth, random_state=12345) # initializes the model builder with the parameters random_state=12345 and max_depth=depth
    model.fit(features_train, target_train) # Train the model
    predictions_valid = model.predict(features_valid) # Get the predictions
    result = mean_squared_error(target_valid, predictions_valid)**0.5 # Evaluate the model


    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth


print(f"RECM = {best_result}")



# Investigate the quality of different models by changing the hyperparameters.

In [36]:
#Hyperparameters
for depth in range(3, 8):
    model = DecisionTreeRegressor(max_depth=depth, random_state=12345) # initializes the model builder with the parameters random_state=12345 and max_depth=depth
    model.fit(features_train, target_train) # Train the model
    predictions_valid = model.predict(features_valid) # Get the predictions
    result = mean_squared_error(target_valid, predictions_valid)**0.5 # Evaluate the model


    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth


print(f"RECM = {best_result}")


In [38]:
#Hyperparameters
for depth in range(1, 9):
    model = DecisionTreeRegressor(max_depth=depth, random_state=12345) # initializes the model builder with the parameters random_state=12345 and max_depth=depth
    model.fit(features_train, target_train) # Entrenar el modelo
    predictions_valid = model.predict(features_valid) # Get the predictions
    result = mean_squared_error(target_valid, predictions_valid)**0.5 # Evaluate the model


    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth


print(f"RECM = {best_result}")

# It seems that changing the hyperparameters does not actually alter the validation set

# Additional task: do a sanity test on the model.

In [41]:
worst_model = None
worst_result = 0
worst_depth = 0

for depth in range(1, 10):
    model = DecisionTreeRegressor(max_depth=depth, random_state=12345)
    model.fit(features_train, target_train)
    predictions_valid = model.predict(features_valid)
    result = mean_squared_error(target_valid, predictions_valid)**0.5
    
    # Compare and save the worst model
    if result > worst_result:
        worst_model = model
        worst_result = result
        worst_depth = depth

print(f"Worst model = {worst_depth}): {worst_result}")

print(f"Best model = {best_depth}): {best_result}")

# Conclusion
It seems that in this case, both the "worst sanity model" model performs as well as the "best sanity model"