# Smart or Ultra?
* Mobile operator Megaline is unhappy that many of its customers are still on legacy plans. It wants a model that analyzes customer behavior and recommends one of Megaline's newer plans: Smart or Ultra.
* You have access to behavioral data from subscribers who have already switched to the new plans (from the Statistical Data Analysis course project). For this classification task, you need to develop a model that will choose the right plan. As you have already carried out the data pre-processing stage, you can go straight to creating the model.
* Develop a model with the highest possible accuracy. In this project, the minimum acceptable accuracy is 0.75. Check the accuracy using the test data set.


## Initialization & Visualization
* Load the libraries and preview the data

In [1]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib
from joblib import dump

In [2]:
df = pd.read_csv("C:\\Users\\Guilherme\\Downloads\\users_behavior.csv")
df

,calls,minutes,messages,mb_used,is_ultra
0,40.0,311.90,83.0,19915.42,0
1,85.0,516.75,56.0,22696.96,0
2,77.0,467.66,86.0,21060.45,0
3,106.0,745.53,81.0,8437.39,1
4,66.0,418.74,1.0,14502.75,0
...,...,...,...,...,...
3209,122.0,910.98,20.0,35124.90,1
3210,25.0,190.36,0.0,3275.61,0
3211,97.0,634.44,70.0,13974.06,0
3212,64.0,462.32,90.0,31239.78,0


## Feature and Target creation

In [3]:
features = df.drop(['is_ultra'], axis=1)
target = df['is_ultra']
print(features.shape)
print(target.shape)

(3214, 4)
(3214,)


## Creating Training models

In [4]:
# Initial fit on the full dataset; the models compared below are refit after the data is split
model = DecisionTreeClassifier(random_state=12345)
model.fit(features, target)

## Splitting the Data into Training, Validation and Test Sets

In [5]:
# Hold out 40% of the rows, then halve the holdout: 20% validation, 20% test
df_train, df_meio = train_test_split(df, test_size=0.4, random_state=12345)
df_valid, df_test = train_test_split(df_meio, test_size=0.5, random_state=12345)

features_train = df_train.drop(['is_ultra'], axis=1)
target_train = df_train['is_ultra']

features_test = df_test.drop(['is_ultra'], axis=1)
target_test = df_test['is_ultra'] 

features_valid = df_valid.drop(['is_ultra'], axis=1)
target_valid = df_valid['is_ultra'] 


print(features_train.shape)
print(target_train.shape)
print(features_test.shape)
print(target_test.shape)
print(features_valid.shape)
print(target_valid.shape)

(1928, 4)
(1928,)
(643, 4)
(643,)
(643, 4)
(643,)


* The data is split into a training set (1928 rows, 60%), a validation set (643 rows, 20%) and a test set (643 rows, 20%).
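
The set sizes above follow from the two-step split. Below is a minimal sketch of the arithmetic, assuming `train_test_split` rounds a fractional `test_size` up to a whole number of rows and gives the remainder to the other part. The helper `split_sizes` is hypothetical and only illustrates that rule; it is not part of scikit-learn.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_first, n_second) for a fractional test_size.

    Assumption: the second part gets ceil(test_size * n_rows) rows
    and the first part gets whatever is left.
    """
    n_second = math.ceil(test_size * n_rows)
    return n_rows - n_second, n_second


n_total = 3214                                # rows in the dataset shown above
n_train, n_rest = split_sizes(n_total, 0.4)   # 60% training, 40% held out
n_valid, n_test = split_sizes(n_rest, 0.5)    # holdout halved: 20% / 20%

for name, n in [("train", n_train), ("valid", n_valid), ("test", n_test)]:
    print(f"{name}: {n} rows ({n / n_total:.1%})")
```

Under this assumption the computed sizes agree with the shapes printed by the notebook.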

## DecisionTreeClassifier test



In [6]:
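best_model = None
best_result = 0
best_depth = 0
for depth in range(1, 11):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_train, target_train)
    # Scored on the same rows the tree was fit on, so this measures fit, not generalization
    predictions = model.predict(features_train)
    result = accuracy_score(target_train, predictions)
    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth  # remember the depth of the best model, not the last loop value

print("Best model accuracy:", best_result)
print(best_depth)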

Best model accuracy: 0.8890041493775933
10


In [7]:
best_model = None
best_result = 0
best_depth = 0
for depth in range(1, 11):
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_test, target_test)
    # Fit and scored on the test split itself, so this is again a measure of fit
    predictions = model.predict(features_test)
    result = accuracy_score(target_test, predictions)
    if result > best_result:
        best_model = model
        best_result = result
        best_depth = depth  # remember the depth of the best model, not the last loop value

print("Best model accuracy:", best_result)
print(best_depth)

Best model accuracy: 0.9097978227060654
10


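* The training set (88.9%) and the test set (91.0%) give similar best accuracies. In both cells the tree is scored on the same data it was fit on, so these figures show how closely a tree can fit the data rather than how well it generalizes.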

In [8]:
for depth in range(1, 11):
    # Note: the tree is fit on the test split here and scored on the validation split
    model = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    model.fit(features_test, target_test)
    predictions_valid = model.predict(features_valid)

    print("max_depth =", depth, ": ", end='')
    print(accuracy_score(target_valid, predictions_valid))

max_depth = 1 : 0.7511664074650077
max_depth = 2 : 0.7869362363919129
max_depth = 3 : 0.7838258164852255
max_depth = 4 : 0.7869362363919129
max_depth = 5 : 0.76049766718507
max_depth = 6 : 0.7698289269051322
max_depth = 7 : 0.7636080870917574
max_depth = 8 : 0.7682737169517885
max_depth = 9 : 0.7620528771384136
max_depth = 10 : 0.7589424572317263


* On the validation set, the best DecisionTreeClassifier results are at max_depth=2 and max_depth=4, which tie at **78.69%** accuracy.
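
The depth search above fits each tree on `features_test`. The more usual pattern is to fit on the training split and compare depths on the validation split, so the test split stays untouched until the end. Below is a minimal sketch of that pattern. It runs on synthetic data from `make_classification` as a stand-in for the Megaline file (an assumption), so its numbers are not comparable with the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 4 features, binary target
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=12345)

# 60% training, 20% validation, 20% test (the test split is not used in this search)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=12345)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=12345)

best_depth, best_score = None, 0.0
for depth in range(1, 11):
    tree = DecisionTreeClassifier(random_state=12345, max_depth=depth)
    tree.fit(X_train, y_train)                               # learn from the training split only
    score = accuracy_score(y_valid, tree.predict(X_valid))   # compare depths on unseen rows
    if score > best_score:
        best_depth, best_score = depth, score

print(f"best max_depth = {best_depth}, validation accuracy = {best_score:.4f}")
```

With the real data, `features_train`, `target_train`, `features_valid` and `target_valid` would take the place of the synthetic arrays.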

### Final model with the best hyperparameter

In [9]:
# max_depth=4 is one of the two depths tied for the best validation accuracy
finalmodelt = DecisionTreeClassifier(random_state=12345, max_depth=4)
finalmodelt.fit(features_test, target_test)

## RandomForestClassifier test

In [10]:
best_score = 0
best_est = 0
for est in range(1, 21): 
    model = RandomForestClassifier(random_state=12345, n_estimators=est) 
    model.fit(features_train,target_train) 
    score = model.score(features_valid,target_valid) 
    if score > best_score:
        best_score = score
        best_est = est

print("The accuracy of the best model in the validation set (n_estimators = {}): {}".format(best_est, best_score))

The accuracy of the best model in the validation set (n_estimators = 18): 0.7931570762052877


* On the validation set, the best RandomForestClassifier model uses n_estimators=18 and reaches **79.32%** accuracy.

### Final model with the best hyperparameter

In [11]:
# Same random_state as in the search above, so this forest matches the one that scored best
final_modelrt = RandomForestClassifier(random_state=12345, n_estimators=18)
final_modelrt.fit(features_train, target_train)

## LogisticRegression test

In [12]:
model = LogisticRegression(random_state=12345, solver='liblinear')
model.fit(features_train, target_train) 
score_train = model.score(features_train,target_train) 
score_valid = model.score(features_valid, target_valid) 

print("Accuracy of the logistic regression model on the training set:", score_train)
print("Accuracy of the logistic regression model in the validation set:", score_valid)

Accuracy of the logistic regression model on the training set: 0.7479253112033195
Accuracy of the logistic regression model in the validation set: 0.7542768273716952


* The LogisticRegression model reached **75.43%** accuracy on the validation set and 74.79% on the training set.

## Example of how to save models


* As an example, the best DecisionTreeClassifier model is saved with joblib and then loaded back.

In [18]:
dump(finalmodelt, 'finalmodelt.joblib') 

['finalmodelt.joblib']

In [19]:
modelo = joblib.load('finalmodelt.joblib')
modelo
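
A quick way to confirm that a saved model survives the round trip is to compare predictions before and after loading. The sketch below does this with a small tree fit on synthetic data (an assumption, used only so the example runs on its own) and a hypothetical file name inside a temporary directory.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=12345)
model = DecisionTreeClassifier(random_state=12345, max_depth=4).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree_example.joblib")  # hypothetical file name
    joblib.dump(model, path)        # write the fitted model to disk
    restored = joblib.load(path)    # read it back as a new object

# The restored model should predict exactly what the original predicts
same_predictions = bool((model.predict(X) == restored.predict(X)).all())
print("Predictions identical after reload:", same_predictions)
```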

## Conclusions

### Conclusion DecisionTreeClassifier

* This model reached **78.69%** accuracy on the validation set: not the highest, but a good result. A single decision tree is usually fast and less accurate than an ensemble, and it was fast here, although training time was not measured.

### Conclusion RandomForestClassifier

* This model had the highest validation accuracy, **79.32%**. A random forest usually trades slower training for higher accuracy, which matches what was observed in this project.

### Conclusion LogisticRegression

* This model had the lowest validation accuracy, **75.43%**. Logistic regression usually lands between the other two models in accuracy, which did not hold in this project, but it was fast as expected.

## General conclusion

* The model that stood out the most was RandomForestClassifier: it lived up to expectations with the highest validation accuracy, 79.32%. It was also the slowest of the three, but I don't think this is an obstacle, because the gain in accuracy is worth it. I also prefer it for how it works: it combines the votes of many decision trees, which suits this task.
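
The brief asks for a final accuracy check on the test set against the 0.75 threshold. Below is a minimal sketch of that step for the chosen model type. It uses synthetic data from `make_classification` as a stand-in for the Megaline file (an assumption), so the printed accuracy says nothing about the real data; the constant `ACCURACY_THRESHOLD` is an illustrative name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

ACCURACY_THRESHOLD = 0.75  # minimum accuracy stated in the project brief

# Synthetic stand-in for the real data (assumption): 4 features, binary target
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=12345)

# 60% training, 20% validation, 20% test, as in the notebook
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=12345)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=12345)

# Fit on the training split with the hyperparameter picked on validation
forest = RandomForestClassifier(random_state=12345, n_estimators=18)
forest.fit(X_train, y_train)

# The test split is used once, only for this final check
test_accuracy = forest.score(X_test, y_test)
meets_threshold = test_accuracy >= ACCURACY_THRESHOLD

print(f"test accuracy = {test_accuracy:.4f}, meets threshold: {meets_threshold}")
```

With the real data, `features_train`, `target_train`, `features_test` and `target_test` would replace the synthetic arrays.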