# Arkon Technical Test

Author: Juan Carlos Hernández Rangel<br>
Developed: 30 November 2022

## Problem

Los Angeles has a bike-sharing system that publishes anonymized data about how the service is used.
The provided table contains the history of trips taken since 2016. One of its columns is of
particular interest and is analyzed in more depth here: Passholder_type.

## Methodology

### Import libraries

In [18]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import yaml
import joblib

from feature_engine import transformation as vt
from scipy.stats import zscore
from scipy.stats import skew, kurtosis

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

from xgboost import XGBClassifier

### Import the dataset

In [6]:
path_entrenamiento = '../Data/Train_Data_Clean.csv'
dataFrame = pd.read_csv(path_entrenamiento, low_memory=False)

In [7]:
# Project settings (e.g. the name of the target column used below)
with open('../Modelo/config.yml', encoding='utf-8') as file:
    val = yaml.safe_load(file)

### Model selection

In [8]:
# Split into features (X) and the target column named in config.yml (y)
X = dataFrame.drop(val["variable_dependiente"], axis=1)
y = dataFrame[val["variable_dependiente"]]

In [9]:
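# Hold out 20 % of the rows for evaluation; without random_state, the split (and the scores below) change between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)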
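The classes are heavily imbalanced (the reports below show 77,456 test trips of class 1 but only 2,339 of class 4), and the split above draws rows purely at random. A stratified split keeps each class at the same share in the training and test data. In scikit-learn this is the `stratify=y` argument of `train_test_split`, which this notebook does not use. The sketch below illustrates the idea in plain Python on hypothetical toy labels. The function name `stratified_split` is illustrative and not part of any library.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                         # shuffle within each class only
        n_test = round(len(idx) * test_size)     # same test fraction for every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, imbalanced toy labels
labels = [1] * 60 + [0] * 30 + [4] * 10
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=42)

# The test part holds 12 trips of class 1, 6 of class 0 and 2 of class 4 (20 % of each)
print(Counter(labels[i] for i in test_idx))
```

Any change to the split would also change the reported scores, so a stratified run would have to be re-evaluated from scratch.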

#### Decision Tree

In [10]:
modelo_DT = DecisionTreeClassifier()
modelo_DT.fit(X_train, y_train)
pred_DT = modelo_DT.predict(X_test)
print(classification_report(y_test, pred_DT))

              precision    recall  f1-score   support

           0       0.59      0.60      0.60     41030
           1       0.76      0.74      0.75     77456
           2       0.29      0.32      0.30      7793
           3       0.40      0.42      0.41      6531
           4       0.17      0.19      0.18      2339

    accuracy                           0.65    135149
   macro avg       0.44      0.45      0.45    135149
weighted avg       0.66      0.65      0.65    135149
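A model that always predicts the most frequent class gives a useful floor for these accuracies. The sketch below derives that majority-class baseline from the supports in the report above alone. No model is trained, and it assumes the test split is representative of the full data.

```python
# Test-set supports copied from the classification report above (labels 0-4)
support = {0: 41030, 1: 77456, 2: 7793, 3: 6531, 4: 2339}

total = sum(support.values())                    # 135149 test trips
majority_class = max(support, key=support.get)   # the most frequent label
baseline_accuracy = support[majority_class] / total

print(majority_class, round(baseline_accuracy, 3))  # 1 0.573
```

Against a baseline of about 0.57, the decision tree's 0.65 accuracy is a modest gain. For that reason, per-class recall says more than accuracy about these models.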



#### K-Nearest Neighbors

In [11]:
modelo_KNN = KNeighborsClassifier()
modelo_KNN.fit(X_train, y_train)
pred_KNN = modelo_KNN.predict(X_test)
print(classification_report(y_test, pred_KNN))

              precision    recall  f1-score   support

           0       0.53      0.51      0.52     41030
           1       0.67      0.81      0.73     77456
           2       0.27      0.05      0.08      7793
           3       0.18      0.03      0.05      6531
           4       0.08      0.00      0.01      2339

    accuracy                           0.62    135149
   macro avg       0.34      0.28      0.28    135149
weighted avg       0.57      0.62      0.58    135149
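The K-Nearest Neighbors report shows a large gap between weighted F1 (0.58) and macro F1 (0.28). The weighted average scales each class by its support, so classes 0 and 1 dominate it. The macro average treats all five classes equally, so the near-zero F1 of classes 2 to 4 pulls it down. Recomputing both averages from the per-class values makes the difference concrete. The inputs are already rounded, so the results match the report only to two decimals.

```python
# Per-class F1 and support for K-Nearest Neighbors, copied from the report above
f1 = {0: 0.52, 1: 0.73, 2: 0.08, 3: 0.05, 4: 0.01}
support = {0: 41030, 1: 77456, 2: 7793, 3: 6531, 4: 2339}

# Macro average: every class counts the same, regardless of size
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.28 0.58
```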



#### Random Forest

In [12]:
modelo_RF = RandomForestClassifier()
modelo_RF.fit(X_train, y_train)
pred_RF = modelo_RF.predict(X_test)
print(classification_report(y_test, pred_RF))

              precision    recall  f1-score   support

           0       0.73      0.67      0.70     41030
           1       0.77      0.91      0.84     77456
           2       0.69      0.25      0.37      7793
           3       0.81      0.32      0.46      6531
           4       0.74      0.11      0.18      2339

    accuracy                           0.76    135149
   macro avg       0.75      0.45      0.51    135149
weighted avg       0.75      0.76      0.74    135149
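The random forest has the best scores so far, but its recall on the minority classes remains low (0.25, 0.32 and 0.11 for classes 2, 3 and 4) even though its precision is high. A common adjustment, not tried in this notebook, is `RandomForestClassifier(class_weight="balanced")`. scikit-learn documents this option as weighting each class by `n_samples / (n_classes * count)`. The sketch below computes those weights in plain Python and assumes the test supports above are a fair stand-in for the training class counts.

```python
# Class counts; the test supports above stand in for the training counts (assumption)
counts = {0: 41030, 1: 77456, 2: 7793, 3: 6531, 4: 2339}

n_samples = sum(counts.values())
n_classes = len(counts)

# Formula documented by scikit-learn for class_weight="balanced":
# n_samples / (n_classes * count_of_class)
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}

for c, w in weights.items():
    print(c, round(w, 2))
```

Rare classes receive the largest weights, so misclassifying one class-4 trip costs roughly as much as misclassifying about 33 class-1 trips. Whether this actually raises macro F1 here would have to be measured.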



#### AdaBoost

In [19]:
modelo_AB = AdaBoostClassifier()
modelo_AB.fit(X_train, y_train)
pred_AB = modelo_AB.predict(X_test)
print(classification_report(y_test, pred_AB))

              precision    recall  f1-score   support

           0       0.63      0.51      0.56     41030
           1       0.70      0.71      0.71     77456
           2       0.33      0.00      0.00      7793
           3       0.11      0.27      0.16      6531
           4       0.06      0.18      0.09      2339

    accuracy                           0.58    135149
   macro avg       0.37      0.34      0.30    135149
weighted avg       0.62      0.58      0.59    135149
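In AdaBoost's class 2 row, precision is 0.33 but recall is 0.00, which means the model almost never predicts class 2. Because F1 is the harmonic mean 2PR/(P+R), it collapses to 0.00. The notebook imports `confusion_matrix` but never calls it. Running `confusion_matrix(y_test, pred_AB)` would show which classes the class 2 trips are assigned to instead. The plain-Python sketch below builds a confusion matrix with the same layout as scikit-learn's (rows are true labels, columns are predicted labels) and derives per-class precision, recall and F1 from it. The helper names and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels (same layout as sklearn's confusion_matrix)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix):
    """(precision, recall, F1) per class; 0.0 when a ratio is undefined."""
    k = len(matrix)
    results = []
    for c in range(k):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(k))  # column sum: predicted as c
        actual = sum(matrix[c])                          # row sum: truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results


# Toy example (hypothetical): most true class-2 items are predicted as class 1
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 1, 1]
m = confusion_counts(y_true, y_pred, labels=[0, 1, 2])
for label, (p, r, f) in zip([0, 1, 2], per_class_metrics(m)):
    print(label, round(p, 2), round(r, 2), round(f, 2))
```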



#### XGBClassifier

In [14]:
modelo_XGB = XGBClassifier(objective='multi:softmax')  # predictions are class labels, not probabilities
modelo_XGB.fit(X_train, y_train)
pred_XGB = modelo_XGB.predict(X_test)
print(classification_report(y_test, pred_XGB))



              precision    recall  f1-score   support

           0       0.69      0.60      0.64     41030
           1       0.73      0.91      0.81     77456
           2       0.49      0.06      0.11      7793
           3       0.74      0.17      0.27      6531
           4       0.78      0.02      0.04      2339

    accuracy                           0.72    135149
   macro avg       0.69      0.35      0.38    135149
weighted avg       0.71      0.72      0.68    135149
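With `objective='multi:softmax'`, XGBoost outputs the predicted class label directly. The related `multi:softprob` objective returns one probability per class instead. The sketch below illustrates the softmax-then-argmax step in plain Python on hypothetical raw scores for a single trip. It does not reproduce XGBoost's internal scores, only the mapping from scores to probabilities to a label.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one trip across the five classes 0-4
scores = [1.2, 2.5, -0.3, 0.1, -1.0]
probs = softmax(scores)

# The predicted label is the class with the highest probability
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs], predicted_class)
```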



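#### Comparisons

Of the five models, the random forest scores highest in the reports above on accuracy (0.76), macro-averaged F1 (0.51) and weighted F1 (0.74), so it is the model that is saved. All five models still struggle with the three minority classes (2, 3 and 4).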

In [20]:
# Persist the selected model (random forest) for later reuse
joblib.dump(modelo_RF, '../Modelo/modelo_random_forest.joblib')

['../Modelo/modelo_random_forest.joblib']
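The model choice can be checked by collecting the summary rows of the five classification reports and ranking them. The sketch below copies those reported values without re-running any model. It sorts by macro F1, a metric that gives the rare pass types as much weight as the common ones. The dictionary layout is illustrative.

```python
# Test-set metrics copied from the five classification reports above
reported = {
    "Decision Tree":       {"accuracy": 0.65, "macro_f1": 0.45, "weighted_f1": 0.65},
    "K-Nearest Neighbors": {"accuracy": 0.62, "macro_f1": 0.28, "weighted_f1": 0.58},
    "Random Forest":       {"accuracy": 0.76, "macro_f1": 0.51, "weighted_f1": 0.74},
    "AdaBoost":            {"accuracy": 0.58, "macro_f1": 0.30, "weighted_f1": 0.59},
    "XGBClassifier":       {"accuracy": 0.72, "macro_f1": 0.38, "weighted_f1": 0.68},
}

# Rank by macro F1, highest first
ranking = sorted(reported, key=lambda name: reported[name]["macro_f1"], reverse=True)
for name in ranking:
    m = reported[name]
    print(f"{name:<20} acc={m['accuracy']:.2f} "
          f"macro_f1={m['macro_f1']:.2f} weighted_f1={m['weighted_f1']:.2f}")
```

The random forest also leads on accuracy and weighted F1, so the ranking's top choice does not depend on which of the three metrics is used.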