# GridSearch draft (classification)

## Pre-gridsearch: choosing which models to use

**Candidates**
- Linear SVC (baseline)
- SVC (non-linear)
- KNeighbors
- RandomForestClassifier
- DecisionTreeClassifier
- MLP (sklearn neural network)
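Once the candidate list is narrowed down, the grid search itself can be run with scikit-learn's `GridSearchCV`. Below is a minimal sketch for a single candidate (`SVC`). It makes three assumptions: synthetic data from `make_classification` stands in for the real `train.pickle` features, a `StandardScaler` step is included, and the `C`/`gamma` values are illustrative rather than tuned.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features: 5 classes, like the 'rating' labels.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8,
    n_classes=5, random_state=0,
)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scaling matters for SVC; make_pipeline names the step 'svc', hence the key prefix.
pipe = make_pipeline(StandardScaler(), SVC(random_state=0))
param_grid = {
    "svc__C": [0.1, 1, 10],          # illustrative values only
    "svc__gamma": ["scale", 0.01],
}

# 3-fold CV on the training split, scored with macro F1 (all classes weigh the same).
search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("CV macro F1:", round(search.best_score_, 3))
print("Eval macro F1:", round(search.score(X_eval, y_eval), 3))
```

The same pattern applies to the other candidates. Only the pipeline's last step and the `param_grid` keys change.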

In [1]:
from preprocessing import train_and_evaluate_clf, custom_features
from sklearn.model_selection import train_test_split
import pandas as pd

import time

def timeSince(since):
    """Return the seconds elapsed since `since`, a time.time_ns() timestamp."""
    return (time.time_ns() - since) / 1e9
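`timeSince` measures wall-clock time with `time.time_ns()`, which can jump if the system clock is adjusted. The Python docs recommend `time.perf_counter()` for timing short durations. This alternative sketch is not part of the original notebook: it wraps that clock in a hypothetical `timed` context manager.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Hypothetical helper: print and record the seconds spent inside the `with` block."""
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"Time elapsed for {label}: {result['seconds']:.3f} seconds")

with timed("sum of squares") as t:
    total = sum(i * i for i in range(100_000))
```

In the notebook, the body of the `with` block would be the `train_and_evaluate_clf(...)` call.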

In [2]:
df_train = pd.read_pickle('train.pickle')
df_train = custom_features(df_train)
# 70/30 split, stratified on the 'rating' label so every class keeps its share in both parts
X_train, X_eval, y_train, y_eval = train_test_split(
    df_train, df_train['rating'],
    test_size=0.3, random_state=0, stratify=df_train['rating'],
)
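`stratify=df_train['rating']` keeps the class proportions of `rating` approximately equal in the training and evaluation splits. Here is a toy check with made-up, imbalanced labels. The class counts are invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels standing in for df_train['rating'].
y = pd.Series(["Positive"] * 60 + ["Mixed"] * 30 + ["Negative"] * 10)
X = pd.DataFrame({"feature": range(len(y))})

_, _, y_tr, y_ev = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Class shares stay close to 60/30/10 in both splits.
print(y_tr.value_counts(normalize=True).round(2))
print(y_ev.value_counts(normalize=True).round(2))
```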

In [3]:
from sklearn.svm import LinearSVC

baseline = LinearSVC(random_state=0, max_iter=10000)
train_and_evaluate_clf(baseline, X_train, y_train, X_eval, y_eval)

Classification results: LinearSVC
                 precision    recall  f1-score   support

          Mixed       0.30      0.31      0.30       497
Mostly Positive       0.26      0.23      0.24       512
       Negative       0.42      0.33      0.37       387
       Positive       0.33      0.44      0.38       610
  Very Positive       0.41      0.30      0.35       359

       accuracy                           0.33      2365
      macro avg       0.34      0.32      0.33      2365
   weighted avg       0.33      0.33      0.33      2365
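The two summary rows can be recomputed from the per-class rows. `macro avg` is the unweighted mean over the five classes. `weighted avg` weights each class by its `support`. The weighted recall is Σ_k (n_k/N)(TP_k/n_k) = Σ_k TP_k / N, so it always equals the accuracy. The check below uses the rounded values from the LinearSVC table, so the last digit can differ slightly from the printed report.

```python
# Per-class rows of the LinearSVC report above: (precision, recall, f1, support).
report = {
    "Mixed":           (0.30, 0.31, 0.30, 497),
    "Mostly Positive": (0.26, 0.23, 0.24, 512),
    "Negative":        (0.42, 0.33, 0.37, 387),
    "Positive":        (0.33, 0.44, 0.38, 610),
    "Very Positive":   (0.41, 0.30, 0.35, 359),
}

n_total = sum(row[3] for row in report.values())

def macro(idx):
    """Unweighted mean of one metric column over all classes."""
    return sum(row[idx] for row in report.values()) / len(report)

def weighted(idx):
    """Support-weighted mean of one metric column."""
    return sum(row[idx] * row[3] for row in report.values()) / n_total

for name, idx in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name:>9}: macro={macro(idx):.3f} weighted={weighted(idx):.3f}")
```

The weighted recall rounds to 0.33, which matches the `accuracy` row.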



In [4]:
from sklearn.svm import SVC
# from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

clasificadores = [
    SVC(random_state=0),
    KNeighborsClassifier(),
    RandomForestClassifier(random_state=0),
    DecisionTreeClassifier(random_state=0),
    MLPClassifier(early_stopping=True, max_iter=100, random_state=0)
]
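The next cell scores each classifier on a single hold-out split and times it by hand. As an alternative, `sklearn.model_selection.cross_validate` returns per-fold `fit_time`, `score_time` and test scores, which gives a comparison that depends less on one particular split. This sketch is not what the notebook does. Synthetic data stands in for the real features, and the two models are an arbitrary subset of the candidates.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features (5 classes, like 'rating').
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)

models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

rows = []
for model in models:
    res = cross_validate(model, X, y, cv=cv, scoring="f1_macro")
    rows.append({
        "model": type(model).__name__,
        "f1_macro_mean": res["test_score"].mean(),
        "f1_macro_std": res["test_score"].std(),
        "fit_seconds": res["fit_time"].sum(),   # total fit time over the 5 folds
    })

summary = pd.DataFrame(rows).sort_values("f1_macro_mean", ascending=False)
print(summary.to_string(index=False))
```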

In [5]:
for clf in clasificadores:
    start = time.time_ns()
    train_and_evaluate_clf(clf, X_train, y_train, X_eval, y_eval)
    print(f"Time elapsed for {type(clf).__name__} method: {timeSince(start)} seconds\n")

Classification results: SVC
                 precision    recall  f1-score   support

          Mixed       0.33      0.28      0.30       497
Mostly Positive       0.27      0.15      0.20       512
       Negative       0.38      0.33      0.35       387
       Positive       0.32      0.65      0.43       610
  Very Positive       0.58      0.14      0.23       359

       accuracy                           0.34      2365
      macro avg       0.38      0.31      0.30      2365
   weighted avg       0.36      0.34      0.31      2365

Time elapsed for SVC method: 5.831981818 seconds

Classification results: KNeighborsClassifier
                 precision    recall  f1-score   support

          Mixed       0.27      0.35      0.31       497
Mostly Positive       0.25      0.32      0.28       512
       Negative       0.36      0.22      0.28       387
       Positive       0.32      0.36      0.34       610
  Very Positive       0.36      0.14      0.20       359

       accuracy

# GridSearch draft (regression)

**Candidates**
- Lasso
- ElasticNet
- Ridge
- Linear SVR
- Polynomial SVR
- RBF SVR
- Bagging
- DecisionTree
- RandomForest
- GradientBoosting
- ExtraTreesRegressor
- AdaBoostRegressor
- etc.
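One way to turn a candidate list like this into a single grid search is a `Pipeline` whose final step is itself a grid parameter. `param_grid` then becomes a list of dicts, one per model family. The following is a minimal sketch on synthetic `make_regression` data. It uses three of the candidates, and the grid values are made-up assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# The 'model' step is a placeholder; each dict in param_grid swaps in a candidate.
pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
param_grid = [
    {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
    {"model": [ElasticNet(max_iter=10_000)], "model__alpha": [0.1, 1.0],
     "model__l1_ratio": [0.2, 0.8]},
    {"model": [RandomForestRegressor(n_estimators=50, random_state=0)],
     "model__max_depth": [None, 5]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
search.fit(X, y)
print(type(search.best_estimator_.named_steps["model"]).__name__)
print("Best CV R2:", round(search.best_score_, 3))
```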

In [2]:
from sklearn.svm import SVR
from sklearn.linear_model import Lasso, ElasticNet, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor, RandomForestRegressor, ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor, HistGradientBoostingRegressor, VotingRegressor, StackingRegressor
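`VotingRegressor` and `StackingRegressor` are imported here but left commented out in the regressor list, because both need an `estimators` list of `(name, estimator)` pairs. Here is a minimal sketch of how that list could be filled in, using synthetic data. Choosing two tree ensembles as base models is an assumption for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor, VotingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.3, random_state=0)

# Base models as (name, estimator) pairs; names must be unique.
base = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
]

# Voting: unweighted average of the base models' predictions.
voting = VotingRegressor(estimators=base).fit(X_train, y_train)
# Stacking: a Ridge meta-model learns how to combine the base models' cross-validated predictions.
stacking = StackingRegressor(estimators=base, final_estimator=Ridge()).fit(X_train, y_train)

for name, model in [("voting", voting), ("stacking", stacking)]:
    print(name, "R2 =", round(model.score(X_eval, y_eval), 3))
```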

In [3]:
regresores = [
    # Lasso(random_state=0),
    ElasticNet(random_state=0),
    Ridge(random_state=0),
    SVR(kernel='linear'),
    SVR(kernel='poly'),
    SVR(kernel='rbf'),
    KNeighborsRegressor(),
    DecisionTreeRegressor(random_state=0),
    BaggingRegressor(random_state=0),
    GradientBoostingRegressor(random_state=0),
    RandomForestRegressor(random_state=0),
    ExtraTreesRegressor(random_state=0),
    AdaBoostRegressor(random_state=0),
    HistGradientBoostingRegressor(random_state=0),  # needs dense input: raises TypeError on the sparse features (see output)
    # VotingRegressor(estimators=[]),
    # StackingRegressor(estimators=[]),
]

In [4]:
df_train = pd.read_pickle('train.pickle')
df_train = custom_features(df_train)
X_train, X_eval, y_train, y_eval = train_test_split(df_train, df_train['estimated_sells'], test_size=0.3, random_state=0)
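Unlike the classification split, this one is not stratified: `stratify` needs discrete labels, and `estimated_sells` is continuous. If the target is heavily skewed, a common workaround is to stratify on quantile bins of the target while still regressing on the raw values. This technique is not used in the notebook. The sketch below uses a synthetic log-normal target, and the choice of 5 bins is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic, heavily right-skewed stand-in for 'estimated_sells'.
y = pd.Series(rng.lognormal(mean=10, sigma=2, size=1000))
X = pd.DataFrame({"feature": rng.normal(size=1000)})

# Stratify on 5 quantile bins of the target, but keep y itself continuous.
bins = pd.qcut(y, q=5, labels=False)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=bins
)

# Each bin keeps roughly a 20% share in both splits.
print(bins.loc[y_train.index].value_counts(normalize=True).sort_index().round(3))
print(bins.loc[y_eval.index].value_counts(normalize=True).sort_index().round(3))
```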

In [5]:
from preprocessing import train_and_evaluate_reg

for reg in regresores:
    start = time.time_ns()
    train_and_evaluate_reg(reg, X_train, y_train, X_eval, y_eval)
    print(f"Time elapsed for {type(reg).__name__} method: {timeSince(start)} seconds\n")

Regression results: ElasticNet

  f = msb / msw

Mean squared error = 1671862250468.4912
R2 score = 0.06743416144268821
Time elapsed for ElasticNet method: 3.20046093 seconds

Regression results: Ridge

Mean squared error = 1671714745834.077
R2 score = 0.0675164396226311
Time elapsed for Ridge method: 2.1293839510000003 seconds

Regression results: SVR

Mean squared error = 1826399676596.286
R2 score = -0.018766914241011623
Time elapsed for SVR method: 7.227484970000001 seconds

Regression results: SVR

Mean squared error = 1828406595677.1208
R2 score = -0.01988637499503687
Time elapsed for SVR method: 7.291538781000001 seconds

Regression results: SVR

Mean squared error = 1828419560737.7725
R2 score = -0.01989360691420683
Time elapsed for SVR method: 7.567372447 seconds

Regression results: KNeighborsRegressor

Mean squared error = 1618006553204.5938
R2 score = 0.09747490401338832
Time elapsed for KNeighborsRegressor method: 2.641175048 seconds

Regression results: DecisionTreeRegressor

Mean squared error = 4151985370512.4297
R2 score = -1.3159801099908788
Time elapsed for DecisionTreeRegressor method: 3.583755426 seconds

Regression results: BaggingRegressor

Mean squared error = 849260002373.9049
R2 score = 0.5262822244804723
Time elapsed for BaggingRegressor method: 10.100534665000001 seconds

Regression results: GradientBoostingRegressor

Mean squared error = 873959181886.295
R2 score = 0.5125050062633643
Time elapsed for GradientBoostingRegressor method: 3.123649127 seconds

Regression results: RandomForestRegressor

Mean squared error = 848514036795.0035
R2 score = 0.5266983245601458
Time elapsed for RandomForestRegressor method: 78.73350649700001 seconds

Regression results: ExtraTreesRegressor

Mean squared error = 3731250678040.489
R2 score = -1.081294027937588
Time elapsed for ExtraTreesRegressor method: 120.09851663100001 seconds

Regression results: AdaBoostRegressor

Mean squared error = 4798858708829.279
R2 score = -1.6768064741358777
Time elapsed for AdaBoostRegressor method: 2.9039158520000004 seconds

Regression results: HistGradientBoostingRegressor

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
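Every model above reports both an MSE and an R² on the same `y_eval`, and the two are linked. `r2_score` computes R² = 1 - SS_res / SS_tot. Dividing both sums by the number of evaluation samples gives R² = 1 - MSE / Var(y_eval), where Var is the population variance. So MSE / (1 - R²) should recover the same Var(y_eval) for every model. The check below uses four of the printed (MSE, R²) pairs.

```python
# (MSE, R2) pairs copied from the printed results above.
results = {
    "ElasticNet": (1671862250468.4912, 0.06743416144268821),
    "Ridge": (1671714745834.077, 0.0675164396226311),
    "KNeighborsRegressor": (1618006553204.5938, 0.09747490401338832),
    "RandomForestRegressor": (848514036795.0035, 0.5266983245601458),
}

# Since R2 = 1 - MSE / Var(y_eval), each pair implies Var(y_eval) = MSE / (1 - R2).
implied_var = {name: mse / (1 - r2) for name, (mse, r2) in results.items()}
for name, var in implied_var.items():
    print(f"{name:>22}: implied Var(y_eval) = {var:.4e}")
```

All models share the same denominator, so ranking them by R² or by MSE gives the same order on this split. A negative R² (the SVRs, DecisionTreeRegressor, ExtraTreesRegressor, AdaBoostRegressor) means the model does worse than always predicting the mean of `y_eval`.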