# Project: Don Rene's Challenge

**MDS7202: Laboratorio de Programación Científica para Ciencia de Datos**

### Teaching Staff:

- Instructors: Matías Rojas, Mauricio Araneda
- Teaching assistant: Ignacio Meza D.
- Assistant: Rodrigo Guerra

*Please read the assignment instructions carefully before you start writing.*




# Project

### Team:

- Nicolás Cabello
- Esteban Muñoz

- CodaLab user: NicoCabello

- CodaLab team: Never Gonna Give You Up

### GitHub repository link: `https://github.com/NicoCabello/Laboratorio_Progra_Cientifica`




## 1. Introduction

The goal of this project is to build two predictive models on video game data.
The first model addresses a classification problem: predicting the *rating* of a game, whose possible values are `('Negative', 'Mixed', 'Mostly Positive', 'Positive', 'Very Positive')`. The second model addresses a regression problem: predicting the number of *sales* a game reaches.

The metrics used to evaluate the models are `f1_weighted` for classification and `r_2` for regression.
The `f1` metric is the harmonic mean of `precision` and `recall`, so it drops when the two values differ widely. The weighted variant computes `f1` per class and averages the results using the number of true samples of each class as weights, which accounts for class imbalance.
This makes it well suited to the classification problem: it takes into account the imbalance that may exist between classes and combines two commonly used metrics into a single value.

The `r_2` metric represents the proportion of the variance of the true targets that is explained by the model's predictions.
It indicates how close the predicted values are to the actual ones, so it is an appropriate metric for the sales regression problem.
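As an illustration of how both metrics behave, the following minimal sketch computes them with scikit-learn on toy values. The labels and sales figures are hypothetical and are not taken from the project data.

```python
from sklearn.metrics import f1_score, r2_score

# Toy labels (hypothetical, for illustration only).
y_true = ["Mixed", "Mixed", "Positive", "Positive", "Positive", "Negative"]
y_pred = ["Mixed", "Positive", "Positive", "Positive", "Mixed", "Negative"]
labels = ["Mixed", "Negative", "Positive"]

# Per-class F1 and the weighted average reported as f1_weighted.
per_class = f1_score(y_true, y_pred, average=None, labels=labels)
weighted = f1_score(y_true, y_pred, average="weighted")

# The weighted score is the per-class F1 averaged with the class supports as weights.
support = [y_true.count(label) for label in labels]
manual = sum(f * s for f, s in zip(per_class, support)) / sum(support)

# R^2 on toy sales values: 1.0 would mean a perfect fit.
sales_true = [3600.0, 9724.0, 21508.0, 73573.0]
sales_pred = [5000.0, 9000.0, 25000.0, 70000.0]
r2 = r2_score(sales_true, sales_pred)

print(per_class, weighted, manual, r2)
```

Here `weighted` and `manual` coincide, which shows that larger classes contribute more to the final score.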


The data provided is a dataset of 7881 examples, each describing one video game: its name, developers and publishers, a short description, the categories it belongs to, the minimum age, the platforms it is available on, among other data.
The dataset has 16 columns, including the two target variables.
The variables of interest are numeric for *sales* and *ordinal* for *rating* (categorical data with an order).

---
## 2. Exploratory Data Analysis

In [1]:
def mount_drive(path):
    # Mount Google Drive when running on Colab; otherwise continue locally.
    try:
        from google.colab import drive

        drive.mount("/content/drive")
        %cd {path}
    except Exception:
        print('Skipping drive-colab connection')

In [2]:
mount_drive("/content/drive/My Drive/MDS7202/Proyecto")

Mounted at /content/drive
/content/drive/My Drive/MDS7202/Proyecto


In [3]:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline

from sklearn.model_selection import train_test_split 

# Preprocessing
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import CountVectorizer

# Classification
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Regression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, median_absolute_error
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score

# Plotting library
!pip install --upgrade plotly
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# NLP library
!pip install nltk
import nltk
from nltk import word_tokenize  
from nltk.stem import PorterStemmer
nltk.download('punkt')

# Grid search
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV

# Figure export
!pip install -U kaleido

from collections import Counter




In [336]:
df_train = pd.read_pickle("train.pickle")
df_test = pd.read_pickle("test.pickle")

In [337]:
df_train.head()

| | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | tags | achievements | average_playtime | price | short_description | estimated_sells | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | An Aspie Life | 2018-03-29 | 1 | Bradley Hennessey;Joe Watson | EnderLost Studios | windows | 0 | Single-player;Steam Achievements | Adventure;Casual;Free to Play;Indie;Simulation | Free to Play;Adventure;Indie | 23 | 0 | 0.0 | One day your roommate Leaves for no reason. Yo... | 3914 | Mixed |
| 1 | GhostControl Inc. | 2014-06-06 | 1 | bumblebee | Application Systems Heidelberg | windows;mac;linux | 0 | Single-player;Steam Achievements;Steam Trading... | Casual;Indie;Simulation;Strategy | Turn-Based;Indie;Simulation | 53 | 65 | 10.99 | Manage a team of ghosthunters and free London ... | 10728 | Mixed |
| 2 | Deponia | 2012-08-06 | 1 | Daedalic Entertainment | Daedalic Entertainment | windows;mac;linux | 0 | Single-player;Steam Achievements;Steam Trading... | Adventure;Indie | Adventure;Point & Click;Comedy | 19 | 217 | 6.99 | In Deponia, the world has degenerated into a v... | 635792 | Positive |
| 3 | Atlas Reactor | 2016-10-04 | 1 | Trion Worlds | Trion Worlds | windows | 0 | Multi-player;Online Multi-Player;Steam Achieve... | Free to Play;Strategy | Free to Play;Multiplayer;Strategy | 121 | 1240 | 0.0 | SEASON 6 NOW LIVE! The battle for Atlas contin... | 253864 | Positive |
| 4 | CHUCHEL | 2018-03-07 | 1 | Amanita Design | Amanita Design | windows;mac | 0 | Single-player;Steam Achievements;Steam Trading... | Adventure;Casual;Indie | Adventure;Indie;Casual | 7 | 245 | 7.99 | CHUCHEL is a comedy adventure game from the cr... | 49818 | Mostly Positive |


When inspecting the data in each column, it is easy to notice that some text variables contain the ";" character. The following cell shows some examples.

In [338]:
df_train[["developer", "platforms", "genres", "tags"]].head()

| | developer | platforms | genres | tags |
|---|---|---|---|---|
| 0 | Bradley Hennessey;Joe Watson | windows | Adventure;Casual;Free to Play;Indie;Simulation | Free to Play;Adventure;Indie |
| 1 | bumblebee | windows;mac;linux | Casual;Indie;Simulation;Strategy | Turn-Based;Indie;Simulation |
| 2 | Daedalic Entertainment | windows;mac;linux | Adventure;Indie | Adventure;Point & Click;Comedy |
| 3 | Trion Worlds | windows | Free to Play;Strategy | Free to Play;Multiplayer;Strategy |
| 4 | Amanita Design | windows;mac | Adventure;Casual;Indie | Adventure;Indie;Casual |


The variables with this format are categorical, and each game may have more than one category in each of these columns. This means that a plain encoding cannot be applied directly to these variables: they must be preprocessed first, or handled with an NLP technique that captures as much information as possible from these categories.

In [339]:
print("Duplicated games in the dataset:", df_train.duplicated(subset=["name"]).any())
print("Dataset dimensions: ", df_train.shape, end="\n\n")

missing = df_train.isna().sum()
unique = df_train.nunique()
dtype = df_train.dtypes

pd.DataFrame({"Missing": missing, "Unique": unique, "Dtype": dtype})

Duplicated games in the dataset: False
Dataset dimensions:  (7881, 16)



| | Missing | Unique | Dtype |
|---|---|---|---|
| name | 0 | 7881 | object |
| release_date | 0 | 2251 | object |
| english | 0 | 2 | int64 |
| developer | 0 | 5365 | object |
| publisher | 0 | 3992 | object |
| platforms | 0 | 5 | object |
| required_age | 0 | 6 | int64 |
| categories | 0 | 1933 | object |
| genres | 0 | 844 | object |
| tags | 0 | 3981 | object |


There are no duplicated records and no missing values in the dataset. As expected, most text variables have high cardinality. Even so, some text columns have low cardinality, so applying some kind of encoding to them is feasible.

The dataset provided for training contains 7881 samples and 16 columns. According to the information above, 9 of these columns hold string data, 5 hold integers, 1 holds floats and 1 holds categorical data.

The *release_date* column actually contains dates, while the *english* column holds nominal binary data. In addition, encoding strategies are needed for text variables such as *categories* or *genres*, since their values are collections of categorical data. Finally, the values of the *rating* column have a natural order, so strictly speaking they are ordinal data.

- **Text columns**: name, developer, publisher, platforms, categories, genres, tags, short_description.
- **Integer columns**: required_age, achievements, average_playtime, estimated_sells.
- **Date columns**: release_date.
- **Nominal columns**: english.
- **Float columns**: price.
- **Ordinal columns**: rating.

With this in mind, the english and release_date variables are converted to their respective data types.

In [340]:
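# An explicit unit ("datetime64[ns]") is used because recent pandas versions
# reject the unit-less "datetime64" dtype in astype.
df_train = df_train.astype({
    "release_date": "datetime64[ns]",
    "english": "category"
    })

df_test = df_test.astype({
    "release_date": "datetime64[ns]",
    "english": "category"
    })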

Correlation is one of the important quantities to consider, so the correlation matrix between the numeric variables of the dataset is computed next.

In [16]:
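# Restrict to numeric columns explicitly so the call does not depend on the pandas version
df_corr = df_train.select_dtypes(include="number").corr()
fig = px.imshow(
    df_corr,
    text_auto=".2f",
    title="Correlation matrix",
    color_continuous_scale=px.colors.sequential.Blues
)
fig.show()

# fig.write_image("corr_matrix.png")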

The numeric variables show no significant correlations with one another. On the one hand, this is good: including two or more of these features does not bias the model toward redundant information, so tools such as PCA are not required. On the other hand, *estimated_sells* is one of the variables to predict, so ideally some numeric variable would be correlated with it, but this is not the case.

To better understand the distribution of the numeric variables, some of their statistics are computed.

In [20]:
df_train.describe()

| | required_age | achievements | average_playtime | price | estimated_sells |
|---|---|---|---|---|---|
| count | 7881.0 | 7881.0 | 7881.0 | 7881.0 | 7881.0 |
| mean | 0.78924 | 43.170156 | 439.29679 | 8.431342 | 210576.7 |
| std | 3.55538 | 265.399206 | 3303.162083 | 8.755668 | 1513926.0 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 3600.0 |
| 25% | 0.0 | 0.0 | 0.0 | 1.99 | 9724.0 |
| 50% | 0.0 | 15.0 | 27.0 | 6.99 | 21508.0 |
| 75% | 0.0 | 35.0 | 251.0 | 11.39 | 73573.0 |
| max | 18.0 | 9821.0 | 190625.0 | 78.99 | 79441290.0 |


From the information displayed in the previous cell, the *required_age* variable takes values between 0 and 18. It is natural to treat it as an ordinal variable with a limited number of values that depend on the age classification of the game. The *achievements* variable is concentrated in values below 35, and at least 25% of the samples are games without achievements. The maximum is far from the rest of the distribution, so when reviewing this variable it must be checked whether it is a genuine outlier or an incorrectly entered value.

The *average_playtime* variable has a minimum of 0, from which it is inferred that the variable is measured in hours. Half of the games have an average playtime of 27 hours or less, although 25% of them exceed 251 hours. The maximum value is far beyond the rest, but in this case there is no way to verify whether the value is correct.

Game prices are concentrated at the low end. Half of the games are free or cost less than 7 dollars, which indicates a strong concentration of values in a small interval. The third quartile is 11.39 dollars and the maximum price is 79 dollars, which is known from prior knowledge to be an entirely plausible price for a game.

Finally, the target variable *estimated_sells* ranges between values of the order of $10^3$ and $10^7$. This indicates a large gap between best-selling games and the rest, which makes training a regressor difficult.
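The size of this gap is easier to appreciate on a logarithmic scale, which is also the scale used in the plots of this section (`log_y=True`). The following minimal sketch applies `np.log10` to the minimum, quartiles and maximum of *estimated_sells* reported by `describe()`; it is only an illustration and not a transformation applied by the models in this notebook.

```python
import numpy as np

# min, 25%, 50%, 75% and max of estimated_sells, as reported by describe().
sells_summary = np.array([3600.0, 9724.0, 21508.0, 73573.0, 79441290.0])

# On a log10 scale each unit corresponds to one order of magnitude.
log_sells = np.log10(sells_summary)
span = log_sells[-1] - log_sells[0]

print(log_sells.round(2), round(span, 2))
```

The three quartiles fall within roughly one order of magnitude of each other, while the maximum lies several orders of magnitude above them.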

In [352]:
df_train.loc[df_train["achievements"] == df_train.achievements.max()]

| | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | tags | achievements | average_playtime | price | short_description | estimated_sells | rating | release_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3503 | LOGistICAL | 2017-02-15 | 1 | Sacada | Sacada | windows | 0 | Single-player;Steam Achievements;Steam Trading... | Casual;Indie;Strategy | Casual;Strategy;Indie | 9821 | 0 | 6.99 | LOGistICAL is a strategy puzzle game where you... | 6880 | Mostly Positive | 2 |


The game with the most achievements is LOGistICAL. Checking this game on the Steam site confirmed that it does have that number of achievements, so the value was not entered incorrectly. This means it is entirely plausible for some games to have more than 1000 achievements.

In [21]:
df_train.describe(include=["O", "category"])

| | name | english | developer | publisher | platforms | categories | genres | tags | short_description | rating |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 7881 | 7881 | 7881 | 7881 | 7881 | 7881 | 7881 | 7881 | 7881 | 7881 |
| unique | 7881 | 2 | 5365 | 3992 | 5 | 1933 | 844 | 3981 | 7848 | 5 |
| top | An Aspie Life | 1 | KOEI TECMO GAMES CO., LTD. | Ubisoft | windows | Single-player | Action;Indie | Action;Adventure;Indie | Minimal physical puzzle with explosions | Positive |
| freq | 1 | 7769 | 32 | 94 | 4589 | 756 | 505 | 68 | 11 | 2031 |


Text and categorical variables generally have high cardinality. This is expected since, as noted earlier, some columns hold categorical variables that admit one or more categories. For the *english* variable, most games are available in English and very few are not.

The platforms are limited to the 3 main desktop operating systems, so the cardinality of this variable is low and its encoding is easier.

For the later analysis of some text variables, a function was implemented that encodes their values into One Hot Encoding style columns. This is useful for analysing each value individually and its relationship with the target variables.
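The function itself is not shown in this section. A minimal sketch of what such a helper could look like is given below, assuming the ";" separator seen in the data; the name `multi_hot` and the toy dataframe are hypothetical.

```python
import pandas as pd

# Toy rows in the same format as the `platforms` column.
toy = pd.DataFrame({
    "platforms": ["windows", "windows;mac;linux", "windows;mac"],
    "estimated_sells": [3914, 10728, 49818],
})

def multi_hot(df, column, sep=";"):
    """Hypothetical helper: one 0/1 indicator column per distinct value in `column`."""
    return df[column].str.get_dummies(sep=sep).add_prefix(f"{column}_")

encoded = multi_hot(toy, "platforms")

# With indicator columns, a target can be summarised per individual value.
mean_sells = {
    col: toy.loc[encoded[col] == 1, "estimated_sells"].mean()
    for col in encoded.columns
}

print(encoded)
print(mean_sells)
```

With the indicators in place, each platform can be compared against the target on its own, even when a game lists several platforms.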

### Sales prediction

The hypothesis is that the month in which a game was released may be useful, since there may be holiday periods in which people tend to buy or gift more video games.

In [343]:
df_train["release_month"] = df_train.release_date.dt.month
df_test["release_month"] = df_test.release_date.dt.month

In [None]:
fig = px.histogram(df_train, x="release_month", y="estimated_sells", title="Estimated sales per month")
fig.update_layout(bargap=0.2)
fig.show()
fig.write_image("ventas_mes.png")

The previous plot shows a sufficiently large difference in sales between months, so this variable is used for training.
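When `px.histogram` receives both `x` and `y`, it sums `y` within each bin by default, so the bars show total sales per month. Totals can be driven by the number of releases or by a few very large titles, so it can help to look at the median and the count alongside the sum. The sketch below does this with a pandas `groupby` on the five rows shown by `df_train.head()`; it is an illustration, not part of the original analysis.

```python
import pandas as pd

# release_date and estimated_sells of the five rows shown by df_train.head().
toy = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["2018-03-29", "2014-06-06", "2012-08-06", "2016-10-04", "2018-03-07"]
    ),
    "estimated_sells": [3914, 10728, 635792, 253864, 49818],
})
toy["release_month"] = toy["release_date"].dt.month

# Sum (what the histogram shows), plus median and number of games per month.
monthly = toy.groupby("release_month")["estimated_sells"].agg(["sum", "median", "count"])

print(monthly)
```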

In [None]:
fig = px.box(df_train, y="estimated_sells", color="english", log_y=True, title="estimated_sells by English availability")
fig.show()
fig.write_image("english_sells.png")

There is a difference between the estimated sales of games with and without English support, so this variable could be useful for the regressor.

In [354]:
fig = px.box(df_train, y="estimated_sells", color="platforms", log_y=True)
fig.show()

The sales statistics by platform availability show a slight difference. Although it is not large, the platforms on which a game is available may be a good indicator for estimating its sales.

In [None]:
fig = px.scatter(df_train, y="estimated_sells", x="required_age", log_y=True)
fig.show()

Estimated sales differ by minimum age. Although the difference between games without an age restriction and those requiring 18+ is not appreciable, it is believed that this variable, combined with others, can help estimate the number of sales.

In [None]:
from plotly.io import write_image
fig = px.scatter(df_train, y="estimated_sells", x="achievements", log_y=True, log_x=True, title="Scatter plot: estimated_sells vs achievements")
fig.show()
fig.write_image("achievements_sells.png")

In [None]:
fig = px.scatter(df_train, y="estimated_sells", x="average_playtime", log_y=True, log_x=True, title ="Scatter plot: estimated_sells vs average_playtime")
fig.show()
fig.write_image("avg_playtime_sells.png")

The last 2 numeric variables show scatter rather than any kind of relationship with estimated sales. These variables are not expected to be useful for this problem.

In [None]:
fig = px.scatter(df_train, y="estimated_sells", x="price", log_y=True)

fig.show()

Estimated sales do not vary strongly with price. However, for some prices the distribution of sales is noticeably different.

### Rating prediction

In [None]:
fig = px.histogram(df_train, x="rating", color="required_age", barmode="group", title="Number of ratings per required_age")
fig.show()
fig.write_image("age_rating.png")

In [None]:
fig = px.box(df_train, y="achievements", color="rating", log_y=True, title="Distribution of achievements by rating")
fig.show()
fig.write_image("achievements_rating.png")

In [None]:
fig = px.box(df_train, y="average_playtime", color="rating", log_y=True, title="Distribution of average_playtime by rating")
fig.show()
fig.write_image("avg_playtime_rating.png")

In [None]:
fig = px.box(df_train, y="price", color="rating", title="Distribution of price by rating")
fig.show()
fig.write_image("price_rating.png")

---

## 3. Data Preparation

To prepare the data, the original dataframe is first copied so that it can be modified.
The month column is extracted from the dates.
The `developer_list` column is created and used to obtain the top `N` developers in the set (those with the most games), while the rest are labelled `Other`.
This is done to keep only those previously considered relevant.

A similar procedure is applied to the publishers, creating the `publisher_list` column, which in the end contains only the top `publishers`.

In [None]:
def bestN(df, col_name, n):
  '''
  Returns the n most frequent values of a list-valued column.
  '''
  all_items = []
  for row in df[col_name]:  # uses the dataframe passed in, not the global df_train2
    all_items.extend(row)

  count = Counter(all_items)

  # (value, count) pairs sorted from most to least frequent
  ordered_count = sorted(count.items(), key=lambda item: item[1], reverse=True)
  unique = len(ordered_count)
  topN = ordered_count[:n]

  topNnames = [el[0] for el in topN]

  print(f'There are {unique} different values in {col_name}')
  print(f'The {n} entries of {col_name} with the most games are {topN}')

  return topNnames

In [None]:
def count_best(list_in, list_best_N):
  '''
  Receives a list with the values of one dataframe row and a list with the
  top N. Returns the elements of the first list that are also in the
  second, in their original order.
  '''
  best_elements = []
  for el in list_in:
    if el in list_best_N:
      best_elements.append(el)
  return best_elements

In [None]:
df_train2 = df_train.copy()
df_train2["developer_list"] = df_train2.developer.apply(lambda x: [dev for dev in x.split(';')])

# preparation of the date data
df_train2["month"] = df_train2.release_date.dt.month.astype("category")

In [None]:
# Grouping of special cases (name variants of the same developer)
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: ["Ubisoft" if "Ubisoft" in dev else dev for dev in x])
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: ["Feral Interactive" if "Feral Interactive" in dev else dev for dev in x])
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: ["Capcom" if ("Capcom" in dev or "CAPCOM" in dev) else dev for dev in x])

# Avoid repeating the special cases within a ';'-separated string
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: list(set(x)))  # merges repeated names within the same record
df_train2["developer"] = df_train2.developer_list.apply(lambda x: ';'.join(list(set(x))))

In [None]:
# If a record contains one of the top developers, it absorbs the rest
n_devs = 10
topNdevs = bestN(df_train2, "developer_list", n=n_devs)
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: count_best(x, topNdevs) if len( count_best(x, topNdevs) ) >= 1 else ["Other"])

df_top_devs = df_train2.loc[df_train2.developer_list.apply(lambda x: any(dev in x for dev in topNdevs))]  # new df used to plot the top n
df_train2["developer_list"] = df_train2.developer_list.apply(lambda x: x[0])  # keeps only the first of the top developers

In [None]:
# publisher transformation
n_pub = 20
# split publisher into one list per row
df_train2["publisher_list"] = df_train2.publisher.apply(lambda x: [pub for pub in x.split(';')])

# Get the top N publishers
topNpublisher = bestN(df_train2, "publisher_list", n_pub)
df_train2["publisher_list"] = df_train2.publisher_list.apply(lambda x: x[0])

# Group those outside the top N into the "Other" category
df_train2["publisher_list"] = df_train2.publisher_list.apply(lambda x: x if x in topNpublisher else "Other")

In [None]:
features = df_train2.drop(columns=["estimated_sells", "rating", "short_description"])
target_sells = df_train2["estimated_sells"]
target_rating = df_train2["rating"]

In [None]:
# plot of the most frequent developers
fig = px.box(df_top_devs,
              y="estimated_sells",
              color="developer",
              title=f"Estimated sales for the {n_devs} most frequent developers",
              log_y=True)
fig.show()

In [None]:
# plot of the most frequent publishers
df_top_publisher = df_train2.loc[df_train2.publisher_list.apply(lambda x: any(dev in x for dev in topNpublisher))]

fig = px.box(df_top_publisher,
              y="estimated_sells",
              color="publisher_list",  # color by publisher so the groups match the title
              title=f"Estimated sales for the {n_pub} most frequent publishers",
              log_y=True)
fig.show()

In [None]:
fig = px.histogram(df_train2, x="rating", color="developer_list", barmode="group", title="Number of ratings per developer")
fig.show()

In [None]:
fig = px.histogram(df_train2, x="rating", color="publisher_list", barmode="group", title="Number of ratings per publisher")
fig.show()

### Transforming the test data to make a prediction

The same procedure is replicated for the test set.

In [None]:
df_test2 = df_test.copy()

df_test2["developer_list"] = df_test2.developer.apply(lambda x: [dev for dev in x.split(';')])

# preparation of the date data
df_test2["month"] = df_test2.release_date.dt.month.astype("category")

In [None]:
# Grouping of special cases (name variants of the same developer)
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: ["Ubisoft" if "Ubisoft" in dev else dev for dev in x])
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: ["Feral Interactive" if "Feral Interactive" in dev else dev for dev in x])
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: ["Capcom" if ("Capcom" in dev or "CAPCOM" in dev) else dev for dev in x])

# Avoid repeating the special cases within a ';'-separated string
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: list(set(x)))  # merges repeated names within the same record
df_test2["developer"] = df_test2.developer_list.apply(lambda x: ';'.join(list(set(x))))

In [None]:
# If a record contains one of the top developers, it absorbs the rest
# If 2 or more of the top N are present, only the first one is kept
n_devs = 10
topNdevs = bestN(df_test2, "developer_list", n=n_devs)
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: count_best(x, topNdevs) if len( count_best(x, topNdevs) ) >= 1 else ["Other"])

df_top_devs = df_test2.loc[df_test2.developer_list.apply(lambda x: any(dev in x for dev in topNdevs))]  # new df used to plot the top n
df_test2["developer_list"] = df_test2.developer_list.apply(lambda x: x[0])  # keeps only the first of the top developers

In [None]:
# publisher transformation
n_pub = 20
# split publisher into one list per row
df_test2["publisher_list"] = df_test2.publisher.apply(lambda x: [pub for pub in x.split(';')])

# Get the top N publishers
topNpublisher = bestN(df_test2, "publisher_list", n_pub)
df_test2["publisher_list"] = df_test2.publisher_list.apply(lambda x: x[0])

# Group those outside the top N into the "Other" category
df_test2["publisher_list"] = df_test2.publisher_list.apply(lambda x: x if x in topNpublisher else "Other")
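The cells above recompute the top-N lists on the test data. An alternative worth considering is to learn the lists on the training set only and reuse them on the test set, so that both sets share the same categories before one-hot encoding. The sketch below illustrates that idea on toy data; `fit_top_n` and `apply_top_n` are hypothetical helpers, not functions from the notebook.

```python
import pandas as pd

def fit_top_n(series, n, sep=";"):
    """Hypothetical helper: the n most frequent values, learned from one column."""
    counts = series.str.split(sep).explode().value_counts()
    return list(counts.head(n).index)

def apply_top_n(series, top, sep=";"):
    """Hypothetical helper: first value of each row found in `top`, else 'Other'."""
    def pick(text):
        return next((value for value in text.split(sep) if value in top), "Other")
    return series.apply(pick)

# Toy train and test frames in the same ';'-separated format.
train = pd.DataFrame({"publisher": ["Ubisoft", "Ubisoft", "Sacada", "Ubisoft;Sacada", "Trion Worlds"]})
test = pd.DataFrame({"publisher": ["Sacada", "New Studio", "Trion Worlds;Ubisoft"]})

# The list is learned once on train and applied unchanged to test.
top_publishers = fit_top_n(train["publisher"], n=2)
train["publisher_list"] = apply_top_n(train["publisher"], top_publishers)
test["publisher_list"] = apply_top_n(test["publisher"], top_publishers)

print(top_publishers)
print(test)
```

In this toy case a publisher unseen during training ends up in `Other`, the same bucket the training data uses for infrequent publishers.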

---

## 4. Baseline

An evaluation function is defined for the regression.
The columns are transformed as follows:
- MinMaxScaler: `required_age`.
- StandardScaler: `achievements`, `average_playtime`, `price`.
- OrdinalEncoder: `month`.
- One Hot Encoder: `english`, `developer_list`, `publisher_list`, `platforms`.

The remaining columns are ignored in this first approach.

In [None]:
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, median_absolute_error

# evaluation function for the regression
def evaluate(y_test, y_pred):

    print('MSE:', mean_squared_error(y_test, y_pred), '\n')
    # RMSE as the square root of the MSE; the squared=False argument is
    # deprecated in recent scikit-learn releases
    print('RMSE:', mean_squared_error(y_test, y_pred) ** 0.5)
    print('MAE:', mean_absolute_error(y_test, y_pred))
    print('MedAE:', median_absolute_error(y_test, y_pred), '\n')
    print('R²:', r2_score(y_test, y_pred))

In [None]:
# train/test split
from sklearn.model_selection import train_test_split

X_train_reg, X_test_reg, y_train_reg, y_test_reg  = train_test_split(
    features, target_sells, shuffle=True, test_size=0.3, random_state=33
)

X_train_clf, X_test_clf, y_train_clf, y_test_clf  = train_test_split(
    features, target_rating, shuffle=True, test_size=0.3, random_state=33
)

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.svm import SVC
from sklearn.metrics import classification_report
# column transformations

ct1 = ColumnTransformer(
    [
        (
            "MinMax",
            MinMaxScaler(),
            [
                "required_age",
            ],
        ),
        (
            "Standard",
            StandardScaler(),
            [
                "achievements",
                "average_playtime",
                "price",
            ]
        ),
     
        ("Ordinal", OrdinalEncoder(), ["month"]),
     
        ( "Ohe",
          # In scikit-learn >= 1.2 the `sparse` argument is named `sparse_output`
          OneHotEncoder(sparse=False, handle_unknown='ignore'),
          [
              "english",
              "developer_list",
              "platforms",
              "publisher_list",
          ]),
    ]
)

# pipeline for regression
# NOTE: f_classif is an ANOVA F-test intended for categorical targets; for the
# continuous target estimated_sells, scikit-learn also provides f_regression.
pipe_reg = Pipeline([("Preprocesamiento", ct1),
                    ('Feature selection', SelectPercentile(f_classif, percentile=90)),
                    ('regresor', LinearRegression())])

# pipeline for classification
pipe_clf = Pipeline([("Preprocesamiento", ct1),
                   ('Feature selection', SelectPercentile(f_classif, percentile=90)),
                    ('Classifier', SVC(C=1, kernel="rbf"))])
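The regression pipeline above scores features with `f_classif`, which assumes a categorical target. For a continuous target, scikit-learn offers `f_regression` as the univariate score function. The sketch below shows how such a selection step could look on synthetic data from `make_regression`; the data and step names are illustrative assumptions, and the results reported in this notebook were obtained with the pipeline as written above.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Synthetic regression problem (not the project data): 10 features, 3 informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Univariate selection with a score function suited to continuous targets.
pipe = Pipeline([
    ("select", SelectPercentile(f_regression, percentile=50)),
    ("regressor", LinearRegression()),
])
pipe.fit(X, y)

n_kept = int(pipe.named_steps["select"].get_support().sum())
score = pipe.score(X, y)  # R^2 on the training data

print(n_kept, score)
```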

Once the column transformer and the corresponding pipelines are defined, the grids below are built. For classification they consider the `SVC` and `RandomForestClassifier` models, while for regression they consider `LinearRegression`, `RandomForestRegressor`, `Ridge` and `Lasso`.

In [None]:
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

# parameter grid for classification
params_clf = [
    {
    'Feature selection__percentile': [20, 40, 60, 80],
     'Classifier': [SVC()],
     'Classifier__C': [0.1, 1, 10]
    },
    {
    'Feature selection__percentile': [20, 40, 60, 80],
     'Classifier': [RandomForestClassifier(random_state=42)],
    'Classifier__n_estimators': [100, 500],
    'Classifier__min_samples_split': [2, 3]
    }
]

# parameter grid for regression
params_reg = [
    {
    'Feature selection__percentile': [20, 40, 60, 80],
     'regresor': [LinearRegression()],
    },
    {
    'Feature selection__percentile': [20, 40, 60, 80],
    'regresor': [RandomForestRegressor(random_state=42)],
    'regresor__n_estimators': [100, 200, 300],
    'regresor__min_samples_split': [2, 3, 4]
    },
    {
    'Feature selection__percentile': [20, 40, 60, 80],
    'regresor': [Ridge()],
    'regresor__alpha': [0.1, 0.5, 1, 5],
    },
    {
    'Feature selection__percentile': [20, 40, 60, 80],
    'regresor': [Lasso()],
    'regresor__alpha': [0.1, 0.5, 1, 5],
    }
]

The grid searches are fitted to the data.

In [None]:
# fit the grid search to find the best parameters for classification
search_clf = HalvingGridSearchCV(
              pipe_clf,
              params_clf,
              cv=3,
              random_state=42,
              verbose=10).fit(X_train_clf, y_train_clf)

n_iterations: 4
n_required_iterations: 4
n_possible_iterations: 4
min_resources_: 204
max_resources_: 5516
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 28
n_resources: 204
Fitting 3 folds for each of 28 candidates, totalling 84 fits
[CV 1/3; 1/28] START Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20
[CV 1/3; 1/28] END Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20;, score=(train=0.259, test=0.162) total time=   0.0s
[CV 2/3; 1/28] START Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20
[CV 2/3; 1/28] END Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20;, score=(train=0.237, test=0.221) total time=   0.0s
[CV 3/3; 1/28] START Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20
[CV 3/3; 1/28] END Classifier=SVC(), Classifier__C=0.1, Feature selection__percentile=20;, score=(train=0.301, test=0.343) total time=   0.0s
[CV 1/3; 2/28] START Classifier=SVC(), C
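The search above does not pass a `scoring` argument, so candidates are ranked with the estimator's default `score` method (accuracy for classifiers, R² for regressors). If the ranking should follow the project metric instead, `scoring="f1_weighted"` can be passed. The sketch below shows this on synthetic data from `make_classification`, with a deliberately small grid; it is an assumption about how the search could be configured, not the configuration used for the results shown here.

```python
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class problem (not the project data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Same search strategy, but candidates are ranked by weighted F1.
search = HalvingGridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10]},
    scoring="f1_weighted",
    cv=3,
    random_state=42,
)
search.fit(X, y)

best_c = search.best_params_["C"]
best_score = search.best_score_

print(best_c, best_score)
```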

The classification results on the test set (a split of df_train) are shown below.

In [None]:
# evaluacion sobre conjunto test
y_pred_clf = search_clf.predict(X_test_clf)

# resultados de la prediccion
print(classification_report(y_test_clf, y_pred_clf))

print('Best classifier parameters:', search_clf.best_params_)

                 precision    recall  f1-score   support

          Mixed       0.26      0.25      0.25       497
Mostly Positive       0.19      0.02      0.04       521
       Negative       0.34      0.38      0.36       389
       Positive       0.28      0.65      0.39       588
  Very Positive       0.42      0.03      0.06       370

       accuracy                           0.29      2365
      macro avg       0.30      0.27      0.22      2365
   weighted avg       0.29      0.29      0.23      2365

Best classifier parameters: {'Classifier': SVC(C=10), 'Classifier__C': 10, 'Feature selection__percentile': 20}
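
The `weighted avg` F1 in the report is the support-weighted mean of the per-class F1 scores, which is the `f1_weighted` metric named in the introduction. As a quick check (an illustrative sketch, with the values copied by hand from the report above), the arithmetic can be redone as follows:

```python
# per-class (f1-score, support) pairs copied from the classification report
report = {
    'Mixed': (0.25, 497),
    'Mostly Positive': (0.04, 521),
    'Negative': (0.36, 389),
    'Positive': (0.39, 588),
    'Very Positive': (0.06, 370),
}

total_support = sum(support for _, support in report.values())

# weighted F1: each class contributes in proportion to its support
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

# macro F1: plain mean, every class counts the same
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

print(f'support = {total_support}')
print(f'weighted F1 = {weighted_f1:.2f}, macro F1 = {macro_f1:.2f}')
```

Both values agree with the report (0.23 and 0.22). Note that the search cell above does not pass a `scoring` argument, so `HalvingGridSearchCV` ranks candidates with the estimator's default score (accuracy for classifiers); passing `scoring='f1_weighted'` would be one way to select on the same metric used for evaluation.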


The fitting and prediction process is carried out analogously for the regression.
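
As a point of reference, the sketch below runs the same kind of search on synthetic data. It is not the project's pipeline: the data comes from `make_regression`, the selector is assumed to be `SelectPercentile(f_regression)`, and the grid is a reduced version of `params_reg`. It passes `scoring='r2'` explicitly and sets `error_score='raise'`, so a fold whose fit fails would raise an exception instead of being silently recorded as `nan`.

```python
import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline

# synthetic stand-in for X_train_reg / y_train_reg (assumption)
X, y = make_regression(n_samples=600, n_features=20, n_informative=8,
                       noise=10.0, random_state=42)

# step names follow the notebook; the selector itself is an assumption
pipe = Pipeline([
    ('Feature selection', SelectPercentile(f_regression)),
    ('regresor', LinearRegression()),
])

params = [
    {'Feature selection__percentile': [20, 40, 60, 80],
     'regresor': [Ridge()],
     'regresor__alpha': [0.1, 0.5, 1, 5]},
    {'Feature selection__percentile': [20, 40, 60, 80],
     'regresor': [Lasso()],
     'regresor__alpha': [0.1, 0.5, 1, 5]},
]

search = HalvingGridSearchCV(
    pipe, params, cv=3,
    scoring='r2',           # same metric as the regressors' default score
    error_score='raise',    # surface failing fits instead of storing nan
    random_state=42,
).fit(X, y)

# count fold scores recorded as nan across all iterations
test_cols = [k for k in search.cv_results_
             if k.startswith('split') and k.endswith('_test_score')]
n_nan = int(sum(np.isnan(search.cv_results_[c]).sum() for c in test_cols))
print('nan fold scores:', n_nan)
print('best params:', search.best_params_)
```

With the default `error_score=np.nan`, the same count over `cv_results_` is a quick way to see how many folds were dropped from a candidate's mean score.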

In [None]:
# fit the search grid for regression
search_reg = HalvingGridSearchCV(
              pipe_reg,
              params_reg,
              cv=3,
              random_state=42,
              verbose=10).fit(X_train_reg, y_train_reg)

n_iterations: 4
n_required_iterations: 4
n_possible_iterations: 4
min_resources_: 204
max_resources_: 5516
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 72
n_resources: 204
Fitting 3 folds for each of 72 candidates, totalling 216 fits
[CV 1/3; 1/72] START Feature selection__percentile=20, regresor=LinearRegression()
[CV 1/3; 1/72] END Feature selection__percentile=20, regresor=LinearRegression();, score=(train=0.071, test=0.102) total time=   0.0s
[CV 2/3; 1/72] START Feature selection__percentile=20, regresor=LinearRegression()
[CV 2/3; 1/72] END Feature selection__percentile=20, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 1/72] START Feature selection__percentile=20, regresor=LinearRegression()
[CV 3/3; 1/72] END Feature selection__percentile=20, regresor=LinearRegression();, score=(train=0.418, test=-9.179) total time=   0.0s
[CV 1/3; 2/72] START Feature selection__percentile=40, regresor=LinearRegression()
[CV 1/3


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


[CV 3/3; 2/72] END Feature selection__percentile=40, regresor=LinearRegression();, score=(train=0.420, test=-9.143) total time=   0.0s
[CV 1/3; 3/72] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 1/3; 3/72] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.154, test=0.088) total time=   0.0s
[CV 2/3; 3/72] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 2/3; 3/72] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 3/72] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 3/3; 3/72] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.433, test=-9.824) total time=   0.0s
[CV 1/3; 4/72] START Feature selection__percentile=80, regresor=LinearRegression()
[CV 1/3; 4/72] END Feature selection__percentile=80, regresor=LinearRegression();, score=(train=0.158, test=0.077) total time=   0.0s



[CV 3/3; 4/72] END Feature selection__percentile=80, regresor=LinearRegression();, score=(train=0.461, test=-11.518) total time=   0.0s
[CV 1/3; 5/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 1/3; 5/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.878, test=0.100) total time=   0.2s
[CV 2/3; 5/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



[CV 2/3; 5/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 5/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



[CV 3/3; 5/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.844, test=-4.470) total time=   0.2s
[CV 1/3; 6/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 1/3; 6/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.876, test=0.128) total time=   0.3s
[CV 2/3; 6/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 2/3; 6/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 6/72] START Feature selection__


[CV 3/3; 6/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.846, test=-5.093) total time=   0.3s
[CV 1/3; 7/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



[CV 1/3; 7/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.872, test=0.160) total time=   0.5s
[CV 2/3; 7/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300
[CV 2/3; 7/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 7/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



[CV 3/3; 7/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.846, test=-4.766) total time=   0.5s
[CV 1/3; 8/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



[CV 1/3; 8/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.846, test=0.087) total time=   0.2s
[CV 2/3; 8/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 8/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 8/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



[CV 3/3; 8/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.609, test=-5.502) total time=   0.2s
[CV 1/3; 9/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



[CV 1/3; 9/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.841, test=0.124) total time=   0.3s
[CV 2/3; 9/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 9/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 9/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



[CV 3/3; 9/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.604, test=-6.550) total time=   0.3s
[CV 1/3; 10/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



[CV 1/3; 10/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.837, test=0.148) total time=   0.5s
[CV 2/3; 10/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300
[CV 2/3; 10/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 10/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



[CV 3/3; 10/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.594, test=-6.367) total time=   0.5s
[CV 1/3; 11/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 1/3; 11/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.799, test=0.095) total time=   0.2s
[CV 2/3; 11/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



[CV 2/3; 11/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 11/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 3/3; 11/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.564, test=-5.322) total time=   0.2s
[CV 1/3; 12/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



[CV 1/3; 12/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.793, test=0.141) total time=   0.3s
[CV 2/3; 12/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 2/3; 12/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 12/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



[CV 3/3; 12/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.565, test=-6.346) total time=   0.3s
[CV 1/3; 13/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



[CV 1/3; 13/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.789, test=0.150) total time=   0.5s
[CV 2/3; 13/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 13/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 13/72] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



[CV 3/3; 13/72] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.562, test=-6.188) total time=   0.5s
[CV 1/3; 14/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 1/3; 14/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.878, test=-0.002) total time=   0.2s
[CV 2/3; 14/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



[CV 2/3; 14/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 14/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



[CV 3/3; 14/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.840, test=-3.716) total time=   0.2s
[CV 1/3; 15/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



[CV 1/3; 15/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.878, test=0.065) total time=   0.4s
[CV 2/3; 15/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 2/3; 15/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 15/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



[CV 3/3; 15/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.843, test=-4.449) total time=   0.4s
[CV 1/3; 16/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



[CV 1/3; 16/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.875, test=0.089) total time=   0.5s
[CV 2/3; 16/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300
[CV 2/3; 16/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 16/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



[CV 3/3; 16/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.843, test=-4.239) total time=   0.5s
[CV 1/3; 17/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



[CV 1/3; 17/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.843, test=0.017) total time=   0.2s
[CV 2/3; 17/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 17/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 17/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



[CV 3/3; 17/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.610, test=-5.592) total time=   0.2s
[CV 1/3; 18/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



[CV 1/3; 18/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.841, test=0.074) total time=   0.3s
[CV 2/3; 18/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 18/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 18/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



[CV 3/3; 18/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.608, test=-6.631) total time=   0.4s
[CV 1/3; 19/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



[CV 1/3; 19/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.837, test=0.095) total time=   0.5s
[CV 2/3; 19/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300
[CV 2/3; 19/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 19/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



[CV 3/3; 19/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.596, test=-6.399) total time=   0.5s
[CV 1/3; 20/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



[CV 1/3; 20/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.798, test=0.053) total time=   0.2s
[CV 2/3; 20/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 2/3; 20/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 20/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



[CV 3/3; 20/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.565, test=-5.282) total time=   0.2s
[CV 1/3; 21/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



[CV 1/3; 21/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.793, test=0.096) total time=   0.3s
[CV 2/3; 21/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 2/3; 21/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 21/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



[CV 3/3; 21/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.567, test=-6.350) total time=   0.3s
[CV 1/3; 22/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



[CV 1/3; 22/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.787, test=0.103) total time=   0.5s
[CV 2/3; 22/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 22/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 22/72] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



[CV 3/3; 22/72] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.562, test=-6.173) total time=   0.4s
[CV 1/3; 23/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 1/3; 23/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.877, test=-0.029) total time=   0.2s
[CV 2/3; 23/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



[CV 2/3; 23/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 23/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 3/3; 23/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.840, test=-8.432) total time=   0.2s
[CV 1/3; 24/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



[CV 1/3; 24/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.879, test=0.040) total time=   0.6s
[CV 2/3; 24/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 2/3; 24/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.1s
[CV 3/3; 24/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



[CV 3/3; 24/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.843, test=-8.675) total time=   0.5s
[CV 1/3; 25/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 25/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.873, test=0.070) total time=   1.0s
[CV 2/3; 25/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300
[CV 2/3; 25/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 25/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 25/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.841, test=-8.337) total time=   0.5s
[CV 1/3; 26/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



invalid value encountered in true_divide



[CV 1/3; 26/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.848, test=-0.001) total time=   0.2s
[CV 2/3; 26/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 26/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 26/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 26/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.627, test=-7.231) total time=   0.2s
[CV 1/3; 27/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



invalid value encountered in true_divide



[CV 1/3; 27/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.845, test=0.074) total time=   0.3s
[CV 2/3; 27/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 27/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 27/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 27/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.618, test=-8.201) total time=   0.3s
[CV 1/3; 28/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 28/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.840, test=0.093) total time=   0.5s
[CV 2/3; 28/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300
[CV 2/3; 28/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 28/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 28/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.604, test=-7.563) total time=   0.5s
[CV 1/3; 29/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 1/3; 29/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.804, test=0.036) total time=   0.2s
[CV 2/3; 29/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 29/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 29/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



invalid value encountered in true_divide



[CV 3/3; 29/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.589, test=-6.594) total time=   0.2s
[CV 1/3; 30/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 1/3; 30/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.798, test=0.090) total time=   0.3s
[CV 2/3; 30/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 2/3; 30/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
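[CV 3/3; 30/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200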


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 30/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.584, test=-7.507) total time=   0.3s
[CV 1/3; 31/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 31/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.793, test=0.102) total time=   0.5s
[CV 2/3; 31/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 31/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 31/72] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 31/72] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.574, test=-7.080) total time=   0.5s
[CV 1/3; 32/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



invalid value encountered in true_divide



[CV 1/3; 32/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.879, test=0.004) total time=   0.2s
[CV 2/3; 32/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 2/3; 32/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 32/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 32/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.836, test=-7.708) total time=   0.2s
[CV 1/3; 33/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



invalid value encountered in true_divide



[CV 1/3; 33/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.881, test=0.066) total time=   0.4s
[CV 2/3; 33/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 2/3; 33/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 33/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 33/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.842, test=-7.943) total time=   0.4s
[CV 1/3; 34/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 34/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.876, test=0.088) total time=   0.5s
[CV 2/3; 34/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300
[CV 2/3; 34/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 34/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 34/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.841, test=-7.130) total time=   0.5s
[CV 1/3; 35/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



invalid value encountered in true_divide



[CV 1/3; 35/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.846, test=0.008) total time=   0.2s
[CV 2/3; 35/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 35/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 35/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 35/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.630, test=-6.708) total time=   0.2s
[CV 1/3; 36/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



invalid value encountered in true_divide



[CV 1/3; 36/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.845, test=0.072) total time=   0.4s
[CV 2/3; 36/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 36/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 36/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 36/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.618, test=-7.776) total time=   0.4s
[CV 1/3; 37/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 37/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.840, test=0.086) total time=   0.5s
[CV 2/3; 37/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300
[CV 2/3; 37/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 37/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 37/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.605, test=-7.328) total time=   0.5s
[CV 1/3; 38/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



invalid value encountered in true_divide



[CV 1/3; 38/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.805, test=0.031) total time=   0.2s
[CV 2/3; 38/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 2/3; 38/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 38/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 38/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.590, test=-6.444) total time=   0.2s
[CV 1/3; 39/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



invalid value encountered in true_divide



[CV 1/3; 39/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.799, test=0.091) total time=   0.4s
[CV 2/3; 39/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 2/3; 39/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 39/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 39/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.586, test=-7.562) total time=   0.3s
[CV 1/3; 40/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



invalid value encountered in true_divide



[CV 1/3; 40/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.794, test=0.104) total time=   0.5s
[CV 2/3; 40/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 40/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 40/72] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 40/72] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.576, test=-7.118) total time=   0.5s
[CV 1/3; 41/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1
[CV 1/3; 41/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.071, test=0.103) total time=   0.1s
[CV 2/3; 41/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1
[CV 2/3; 41/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 41/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1
[CV 3/3; 41/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.418, test=-9.151) total time=   0.0s
[CV 1/3; 42/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.5


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 42/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 42/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.5
[CV 3/3; 42/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.417, test=-9.044) total time=   0.0s
[CV 1/3; 43/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=1
[CV 1/3; 43/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=1;, score=(train=0.071, test=0.108) total time=   0.0s
[CV 2/3; 43/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=1
[CV 2/3; 43/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 43/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=1
[CV 3/3; 43/72] END Feature selection__percentile=20, regreso


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide



[CV 1/3; 44/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=5;, score=(train=0.065, test=0.114) total time=   0.0s
[CV 2/3; 44/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=5
[CV 2/3; 44/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 44/72] START Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=5
[CV 3/3; 44/72] END Feature selection__percentile=20, regresor=Ridge(), regresor__alpha=5;, score=(train=0.414, test=-8.174) total time=   0.0s
[CV 1/3; 45/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.1
[CV 1/3; 45/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.136, test=0.013) total time=   0.0s
[CV 2/3; 45/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.1
[CV 2/3; 45/72] END Feature selection__percentile=40, reg


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 45/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.420, test=-9.111) total time=   0.0s
[CV 1/3; 46/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5
[CV 1/3; 46/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.117, test=0.116) total time=   0.0s
[CV 2/3; 46/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5
[CV 2/3; 46/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 46/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5
[CV 3/3; 46/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.420, test=-8.999) total time=   0.0s
[CV 1/3; 47/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=1
[CV 1/3; 47/72] END Feature selection__percentil


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 47/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=1;, score=(train=0.419, test=-8.876) total time=   0.0s
[CV 1/3; 48/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5
[CV 1/3; 48/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5;, score=(train=0.085, test=0.138) total time=   0.0s
[CV 2/3; 48/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5
[CV 2/3; 48/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 48/72] START Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5
[CV 3/3; 48/72] END Feature selection__percentile=40, regresor=Ridge(), regresor__alpha=5;, score=(train=0.415, test=-8.138) total time=   0.0s
[CV 1/3; 49/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.1
[CV 1/3; 49/72] END Feature selection__percentile=60, regres


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide



[CV 3/3; 49/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.433, test=-9.782) total time=   0.0s
[CV 1/3; 50/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5
[CV 1/3; 50/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.149, test=0.101) total time=   0.0s
[CV 2/3; 50/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5
[CV 2/3; 50/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 50/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5
[CV 3/3; 50/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.432, test=-9.641) total time=   0.0s
[CV 1/3; 51/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=1
[CV 1/3; 51/72] END Feature selection__percentil


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


invalid value encountered in true_divide


invalid value encountered in true_divide


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 51/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 51/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=1
[CV 3/3; 51/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=1;, score=(train=0.432, test=-9.497) total time=   0.0s
[CV 1/3; 52/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=5
[CV 1/3; 52/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=5;, score=(train=0.109, test=0.120) total time=   0.0s
[CV 2/3; 52/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=5
[CV 2/3; 52/72] END Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 52/72] START Feature selection__percentile=60, regresor=Ridge(), regresor__alpha=5
[CV 3/3; 52/72] END Feature selection__percentile=60, regresor=Ridg


invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 1/3; 53/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.158, test=0.082) total time=   0.0s
[CV 2/3; 53/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.1
[CV 2/3; 53/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 53/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.1
[CV 3/3; 53/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.1;, score=(train=0.461, test=-11.333) total time=   0.0s
[CV 1/3; 54/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.5
[CV 1/3; 54/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.5;, score=(train=0.152, test=0.095) total time=   0.0s
[CV 2/3; 54/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=0.5
[CV 2/3; 54/72] END Feature selection__percent


invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
[CV 1/3; 55/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=1;, score=(train=0.145, test=0.104) total time=   0.0s
[CV 2/3; 55/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=1
[CV 2/3; 55/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 55/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=1
[CV 3/3; 55/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=1;, score=(train=0.456, test=-10.378) total time=   0.0s
[CV 1/3; 56/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=5
[CV 1/3; 56/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=5;, score=(train=0.110, test=0.119) total time=   0.0s
[CV 2/3; 56/72] START Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=5
[CV 2/3; 56/72] END Feature selection__percentile=80, regresor


divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
[CV 3/3; 56/72] END Feature selection__percentile=80, regresor=Ridge(), regresor__alpha=5;, score=(train=0.439, test=-8.938) total time=   0.0s
[CV 1/3; 57/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1
[CV 1/3; 57/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.071, test=0.102) total time=   0.1s
[CV 2/3; 57/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1
[CV 2/3; 57/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 57/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1
[CV 3/3; 57/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.418, test=-9.179) total time=   0.0s
[CV 1/3; 58/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.5
[CV 1/3; 58/72] END Feature selection__percentil


invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 2/3; 58/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 58/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.5
[CV 3/3; 58/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.418, test=-9.179) total time=   0.0s
[CV 1/3; 59/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=1
[CV 1/3; 59/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=1;, score=(train=0.071, test=0.102) total time=   0.0s
[CV 2/3; 59/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=1
[CV 2/3; 59/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 59/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=1
[CV 3/3; 59/72] END Feature selection__percentile=20, regreso


invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
[CV 1/3; 60/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=5;, score=(train=0.071, test=0.102) total time=   0.0s
[CV 2/3; 60/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=5
[CV 2/3; 60/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 60/72] START Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=5
[CV 3/3; 60/72] END Feature selection__percentile=20, regresor=Lasso(), regresor__alpha=5;, score=(train=0.418, test=-9.178) total time=   0.0s
[CV 1/3; 61/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.1
[CV 1/3; 61/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.141, test=-0.155) total time=   0.0s
[CV 2/3; 61/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.1
[CV 2/3; 61/72] END Feature selection__percentile=40, re


divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
[CV 3/3; 61/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.420, test=-9.136) total time=   0.0s
[CV 1/3; 62/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5
[CV 1/3; 62/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.141, test=-0.155) total time=   0.0s
[CV 2/3; 62/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5
[CV 2/3; 62/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 62/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5
[CV 3/3; 62/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.420, test=-9.136) total time=   0.0s
[CV 1/3; 63/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=1
[CV 1/3; 63/72] END Feature selection__percenti


invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.669e+11, tolerance: 7.590e+09
[CV 3/3; 63/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=1;, score=(train=0.420, test=-9.138) total time=   0.0s
[CV 1/3; 64/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5
[CV 1/3; 64/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5;, score=(train=0.141, test=-0.153) total time=   0.0s
[CV 2/3; 64/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5
[CV 2/3; 64/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 64/72] START Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5
[CV 3/3; 64/72] END Feature selection__percentile=40, regresor=Lasso(), regresor__alpha=5;, score=(train=0.420, test=-9.142) total time=   0.0s
[CV 1/3; 65/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.1
[CV 1/3; 65/72] END Feature selection__percentile=60, regre


invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.930e+10, tolerance: 1.759e+09
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
[CV 3/3; 65/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.433, test=-9.826) total time=   0.0s
[CV 1/3; 66/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5
[CV 1/3; 66/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.154, test=0.087) total time=   0.0s
[CV 2/3; 66/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5
[CV 2/3; 66/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 66/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5
[CV 3/3; 66/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.433, test=-9.826) total time=   0.0s
[CV 1/3; 67/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=1
[CV 1/3; 67/72] END Feature selection__percentil


invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.934e+10, tolerance: 1.759e+09
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.567e+11, tolerance: 7.590e+09
[CV 3/3; 67/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=1;, score=(train=0.433, test=-9.825) total time=   0.0s
[CV 1/3; 68/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5
[CV 1/3; 68/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5;, score=(train=0.154, test=0.063) total time=   0.0s
[CV 2/3; 68/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5
[CV 2/3; 68/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 68/72] START Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5
[CV 3/3; 68/72] END Feature selection__percentile=60, regresor=Lasso(), regresor__alpha=5;, score=(train=0.433, test=-9.823) total time=   0.0s
[CV 1/3; 69/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.1
[CV 1/3; 69/72] END Feature selection__percentile=80, regres


invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.506e+10, tolerance: 1.759e+09
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.211e+12, tolerance: 7.590e+09
[CV 3/3; 69/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.1;, score=(train=0.461, test=-11.516) total time=   0.0s
[CV 1/3; 70/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5
[CV 1/3; 70/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.158, test=0.076) total time=   0.0s
[CV 2/3; 70/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5
[CV 2/3; 70/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 70/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5
[CV 3/3; 70/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=0.5;, score=(train=0.461, test=-11.517) total time=   0.0s
[CV 1/3; 71/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1
[CV 1/3; 71/72] END Feature selection__percent


invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.510e+10, tolerance: 1.759e+09
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.211e+12, tolerance: 7.590e+09
invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 6.514e+10, tolerance: 1.759e+09
divide by zero encountered in true_divide
invalid value encountered in subtract
No features w
[CV 2/3; 71/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 71/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1
[CV 3/3; 71/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1;, score=(train=0.461, test=-11.518) total time=   0.1s
[CV 1/3; 72/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5
[CV 1/3; 72/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5;, score=(train=0.158, test=0.053) total time=   0.0s
[CV 2/3; 72/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5
[CV 2/3; 72/72] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 72/72] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5
[CV 3/3; 72/72] END Feature selection__percentile=80, regresor=Las


invalid value encountered in true_divide
Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.212e+12, tolerance: 7.590e+09
invalid value encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
invalid value encountered in true_divide

72 fits failed out of a total of 216.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
4 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sklearn/model_selection/_validati
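
The summary above reports that 72 of 216 fits failed, and the repeated "No features were selected" warnings suggest that in some folds the selection step passed an empty feature matrix to the regressor. One plausible cause, not confirmed by this log, is that some columns are constant within a training fold, which makes univariate scores undefined. The sketch below shows one way to test that hypothesis: drop zero-variance columns before selection and set `error_score="raise"` so any remaining failure surfaces as an exception instead of a `nan` score. The step names `Feature selection` and `regresor` are taken from the log; the score function `f_regression`, the extra step name `Drop constants`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, VarianceThreshold, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (hypothetical): five informative columns plus one
# constant column, which would give an undefined univariate score.
rng = np.random.default_rng(42)
X = np.hstack([rng.normal(size=(120, 5)), np.ones((120, 1))])
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=120)

pipe = Pipeline([
    # Assumed extra step: remove zero-variance columns inside each fold.
    ("Drop constants", VarianceThreshold(threshold=0.0)),
    # Assumed score function; the log only shows the `percentile` parameter.
    ("Feature selection", SelectPercentile(score_func=f_regression)),
    ("regresor", Ridge()),
])

param_grid = {
    "Feature selection__percentile": [20, 40, 60, 80],
    "regresor__alpha": [0.1, 0.5, 1, 5],
}

# error_score="raise" turns a failed fit into an exception with a full traceback.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2", error_score="raise")
search.fit(X, y)
print(search.best_params_)
```

This does not address the Lasso "Objective did not converge" warnings; the message itself points to the number of iterations and the scale of the features as things to check.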

[CV 1/3; 1/24] START Feature selection__percentile=80, regresor=LinearRegression()
[CV 1/3; 1/24] END Feature selection__percentile=80, regresor=LinearRegression();, score=(train=0.172, test=-1.067) total time=   0.1s
[CV 2/3; 1/24] START Feature selection__percentile=80, regresor=LinearRegression()



divide by zero encountered in true_divide
divide by zero encountered in true_divide
[CV 2/3; 1/24] END Feature selection__percentile=80, regresor=LinearRegression();, score=(train=0.398, test=-10.092) total time=   0.1s
[CV 3/3; 1/24] START Feature selection__percentile=80, regresor=LinearRegression()
[CV 3/3; 1/24] END Feature selection__percentile=80, regresor=LinearRegression();, score=(train=0.128, test=-0.052) total time=   0.1s
[CV 1/3; 2/24] START Feature selection__percentile=60, regresor=LinearRegression()



divide by zero encountered in true_divide
divide by zero encountered in true_divide
[CV 1/3; 2/24] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.149, test=-1.037) total time=   0.1s
[CV 2/3; 2/24] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 2/3; 2/24] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.1s
[CV 3/3; 2/24] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 3/3; 2/24] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.107, test=-0.073) total time=   0.1s
[CV 1/3; 3/24] START Feature selection__percentile=40, regresor=LinearRegression()



divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 1/3; 3/24] END Feature selection__percentile=40, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.1s
[CV 2/3; 3/24] START Feature selection__percentile=40, regresor=LinearRegression()
[CV 2/3; 3/24] END Feature selection__percentile=40, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 3/24] START Feature selection__percentile=40, regresor=LinearRegression()
[CV 3/3; 3/24] END Feature selection__percentile=40, regresor=LinearRegression();, score=(train=nan, test=nan) total time=   0.0s
[CV 1/3; 4/24] START Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 1/3; 4/24] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 2/3; 4/24] START Feature selection__percentile=20, regresor=RandomFor


divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 3/3; 4/24] END Feature selection__percentile=20, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.1s
[CV 1/3; 5/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 1/3; 5/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.1s
[CV 2/3; 5/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 5/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 5/24] START Feature selection__percentil


divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 1/3; 6/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 2/3; 6/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 6/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 6/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 3/3; 6/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 1/3; 7/24] START Feature selection__percentil


divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
[CV 3/3; 7/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.1s
[CV 1/3; 8/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide
[CV 1/3; 8/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.833, test=-3.031) total time=   0.8s
[CV 2/3; 8/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide
[CV 2/3; 8/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.871, test=-0.750) total time=   0.8s
[CV 3/3; 8/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide
[CV 3/3; 8/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.853, test=-1.178) total time=   0.8s
[CV 1/3; 9/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide
[CV 1/3; 9/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.836, test=-3.390) total time=   0.5s
[CV 2/3; 9/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide
[CV 2/3; 9/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.871, test=-0.798) total time=   0.5s
[CV 3/3; 9/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide
[CV 3/3; 9/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.848, test=-1.243) total time=   0.5s
[CV 1/3; 10/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide
[CV 1/3; 10/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.838, test=-3.795) total time=   0.3s
[CV 2/3; 10/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide
[CV 2/3; 10/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.873, test=-1.006) total time=   0.3s
[CV 3/3; 10/24] START Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide
[CV 3/3; 10/24] END Feature selection__percentile=80, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.841, test=-1.490) total time=   0.3s
[CV 1/3; 11/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide
[CV 1/3; 11/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.223, test=-0.848) total time=   0.5s
[CV 2/3; 11/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 11/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 11/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide
invalid value encountered in subtract
No features were selected: either the data is too noisy or the selection test too strict.
divide by zero encountered in true_divide
[CV 3/3; 11/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.113, test=-0.095) total time=   0.5s
[CV 1/3; 12/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide
[CV 1/3; 12/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.223, test=-0.864) total time=   0.3s
[CV 2/3; 12/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 2/3; 12/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 12/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 12/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.113, test=-0.067) total time=   0.4s
[CV 1/3; 13/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 1/3; 13/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.222, test=-0.875) total time=   0.2s
[CV 2/3; 13/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 2/3; 13/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 13/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 13/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.112, test=-0.046) total time=   0.2s
[CV 1/3; 14/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 1/3; 14/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.224, test=-0.849) total time=   0.5s
[CV 2/3; 14/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300
[CV 2/3; 14/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 14/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 14/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.115, test=-0.103) total time=   0.5s
[CV 1/3; 15/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 15/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.223, test=-0.864) total time=   0.4s
[CV 2/3; 15/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200
[CV 2/3; 15/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 15/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 15/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.115, test=-0.079) total time=   0.3s
[CV 1/3; 16/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 1/3; 16/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.222, test=-0.874) total time=   0.2s
[CV 2/3; 16/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100
[CV 2/3; 16/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.1s
[CV 3/3; 16/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 16/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.114, test=-0.059) total time=   0.2s
[CV 1/3; 17/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 1/3; 17/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.224, test=-0.849) total time=   0.5s
[CV 2/3; 17/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300
[CV 2/3; 17/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 17/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 17/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=300;, score=(train=0.116, test=-0.125) total time=   0.5s
[CV 1/3; 18/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 18/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.224, test=-0.863) total time=   0.3s
[CV 2/3; 18/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200
[CV 2/3; 18/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 18/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 18/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=200;, score=(train=0.115, test=-0.072) total time=   0.3s
[CV 1/3; 19/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 1/3; 19/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.223, test=-0.873) total time=   0.2s
[CV 2/3; 19/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100
[CV 2/3; 19/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 19/24] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 19/24] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=2, regresor__n_estimators=100;, score=(train=0.114, test=-0.057) total time=   0.2s
[CV 1/3; 20/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 1/3; 20/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 2/3; 20/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300
[CV 2/3; 20/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 20/24] START Feature selection


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 21/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.0s
[CV 3/3; 21/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200
[CV 3/3; 21/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=nan, test=nan) total time=   0.1s
[CV 1/3; 22/24] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 1/3; 22/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 2/3; 22/24] START Feature selection__per


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.


divide by zero encountered in true_divide



[CV 3/3; 22/24] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=nan, test=nan) total time=   0.0s
[CV 1/3; 23/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1
[CV 1/3; 23/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1;, score=(train=0.172, test=-1.066) total time=   0.1s
[CV 2/3; 23/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1



divide by zero encountered in true_divide


divide by zero encountered in true_divide



[CV 2/3; 23/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1;, score=(train=0.398, test=-10.091) total time=   0.1s
[CV 3/3; 23/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1
[CV 3/3; 23/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=1;, score=(train=0.128, test=-0.052) total time=   0.1s
[CV 1/3; 24/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5



divide by zero encountered in true_divide


divide by zero encountered in true_divide



[CV 1/3; 24/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5;, score=(train=0.172, test=-1.065) total time=   0.1s
[CV 2/3; 24/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5
[CV 2/3; 24/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5;, score=(train=0.398, test=-10.090) total time=   0.1s
[CV 3/3; 24/24] START Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5



divide by zero encountered in true_divide



34 fits failed out of a total of 72.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
3 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_base.py", line 662, in fit
    X, y = self._validate_data(
  File "/usr/local/lib/python3.8/dist-packages/sklearn/base.py", line 581, in _validate_data
    X, 

[CV 3/3; 24/24] END Feature selection__percentile=80, regresor=Lasso(), regresor__alpha=5;, score=(train=0.128, test=-0.051) total time=   0.1s
----------
iter: 2
n_candidates: 8
n_resources: 1836
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[CV 1/3; 1/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100
[CV 1/3; 1/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.475, test=-0.260) total time=   0.5s
[CV 2/3; 1/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 2/3; 1/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.528, test=-0.056) total time=   0.4s
[CV 3/3; 1/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 3/3; 1/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.791, test=-0.003) total time=   0.4s
[CV 1/3; 2/8] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 1/3; 2/8] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.303, test=-0.901) total time=   0.1s
[CV 2/3; 2/8] START Feature selection__percentile=60, regresor=LinearRegression()



divide by zero encountered in true_divide


divide by zero encountered in true_divide



[CV 2/3; 2/8] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.311, test=-0.235) total time=   0.1s
[CV 3/3; 2/8] START Feature selection__percentile=60, regresor=LinearRegression()
[CV 3/3; 2/8] END Feature selection__percentile=60, regresor=LinearRegression();, score=(train=0.072, test=0.052) total time=   0.1s
[CV 1/3; 3/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide


divide by zero encountered in true_divide



[CV 1/3; 3/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.494, test=-0.150) total time=   1.1s
[CV 2/3; 3/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 2/3; 3/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.508, test=-0.070) total time=   1.2s
[CV 3/3; 3/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 3/3; 3/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.790, test=-0.002) total time=   1.0s
[CV 1/3; 4/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 1/3; 4/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.258, test=-0.559) total time=   0.6s
[CV 2/3; 4/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 2/3; 4/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.635, test=0.112) total time=   0.8s
[CV 3/3; 4/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 3/3; 4/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=300;, score=(train=0.798, test=0.011) total time=   0.9s
[CV 1/3; 5/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 5/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.259, test=-0.613) total time=   0.5s
[CV 2/3; 5/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 2/3; 5/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.639, test=0.063) total time=   0.6s
[CV 3/3; 5/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 3/3; 5/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.794, test=0.015) total time=   0.6s
[CV 1/3; 6/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 1/3; 6/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.259, test=-0.639) total time=   0.3s
[CV 2/3; 6/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 2/3; 6/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.616, test=0.131) total time=   0.4s
[CV 3/3; 6/8] START Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 3/3; 6/8] END Feature selection__percentile=40, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=100;, score=(train=0.803, test=0.011) total time=   0.3s
[CV 1/3; 7/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 7/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.751, test=0.029) total time=   0.8s
[CV 2/3; 7/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 2/3; 7/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.653, test=-0.057) total time=   0.9s
[CV 3/3; 7/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 3/3; 7/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.825, test=-0.002) total time=   0.8s
[CV 1/3; 8/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 8/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.505, test=-0.198) total time=   0.8s
[CV 2/3; 8/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 2/3; 8/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.529, test=-0.124) total time=   0.9s
[CV 3/3; 8/8] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 3/3; 8/8] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=200;, score=(train=0.784, test=-0.003) total time=   0.8s
----------
iter: 3
n_candidates: 3
n_resources: 5508
Fitting 3 folds for each of 3 candidates, totalling 9 fits
[CV 1/3; 1/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



One or more of the test scores are non-finite: [        nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
 -3.73681522         nan         nan         nan         nan         nan
   

[CV 1/3; 1/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.447, test=-0.145) total time=   0.8s
[CV 2/3; 1/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 2/3; 1/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.813, test=-0.058) total time=   1.1s
[CV 3/3; 1/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100



divide by zero encountered in true_divide



[CV 3/3; 1/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=100;, score=(train=0.777, test=-0.003) total time=   1.0s
[CV 1/3; 2/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 1/3; 2/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.453, test=-0.147) total time=   1.9s
[CV 2/3; 2/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 2/3; 2/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.826, test=-0.056) total time=   2.8s
[CV 3/3; 2/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300



divide by zero encountered in true_divide



[CV 3/3; 2/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=4, regresor__n_estimators=300;, score=(train=0.779, test=0.009) total time=   2.7s
[CV 1/3; 3/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 1/3; 3/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.481, test=-0.178) total time=   1.3s
[CV 2/3; 3/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 2/3; 3/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.846, test=-0.064) total time=   1.9s
[CV 3/3; 3/3] START Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200



divide by zero encountered in true_divide



[CV 3/3; 3/3] END Feature selection__percentile=60, regresor=RandomForestRegressor(random_state=42), regresor__min_samples_split=3, regresor__n_estimators=200;, score=(train=0.823, test=-0.003) total time=   1.9s




In [None]:
# prediction results
y_pred_reg = search_reg.predict(X_test_reg)

# evaluation of the regression
evaluate(y_test_reg, y_pred_reg)

print('Best regressor parameters:', search_reg.best_params_)

MSE: 1260778043487.012 

RMSE: 1122843.730662024
MAE: 225365.90775289663
MedAE: 21619.284579773113 

R²: 0.231664057714149
Best regressor parameters: {'Feature selection__percentile': 60, 'regresor': RandomForestRegressor(min_samples_split=4, n_estimators=300, random_state=42), 'regresor__min_samples_split': 4, 'regresor__n_estimators': 300}


### Prediction on the test set

In [None]:
# Predictions on the test set
y_pred_test_clf = search_clf.predict(df_test2)

y_pred_test_reg = search_reg.predict(df_test2)

---

## 5. Model Optimization

To improve the models, we incorporated the text variables that had previously been left out.
We also dropped some variables that we believed could hurt model performance.
To reduce the computation time of the grid search, we narrowed the search space to a few parameters: the search covers only the percentile of features kept by the selection step, together with the choice of estimator.

In [217]:
# Core libraries for the lab.
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline

from sklearn.model_selection import train_test_split

# Preprocessing
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import CountVectorizer

# Classification
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Regression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

# Successive-halving search (experimental API; the enabling import must come first)
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

# Evaluation metrics
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics import median_absolute_error, r2_score

# Plotting library
!pip install --upgrade plotly
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Low-dimensional projections: UMAP
!pip install umap-learn

# NLP library
!pip install nltk
import nltk
from nltk.corpus import stopwords
from nltk import word_tokenize
from nltk.stem import PorterStemmer
nltk.download('stopwords')
nltk.download('punkt')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [218]:
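class StemmerTokenizer:
    """Callable tokenizer: splits a document into tokens and stems each one."""
    def __init__(self):
        self.ps = PorterStemmer()
    def __call__(self, doc):
        # Tokenize with NLTK, then reduce each token to its Porter stem
        doc_tok = word_tokenize(doc)
        return [self.ps.stem(t) for t in doc_tok]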
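
To make the role of this tokenizer concrete, the following is a minimal pure-Python sketch of the unigram-plus-bigram counting that a bag-of-words vectorizer configured with `ngram_range = (1,2)` performs on the tokens it receives. The helper `count_ngrams`, the sample string, and the plain whitespace split are hypothetical stand-ins for illustration only; this is not `CountVectorizer` or NLTK code, and no stemming is applied.

```python
from collections import Counter

def count_ngrams(tokens, n_min=1, n_max=2):
    """Count every n-gram with n_min <= n <= n_max in a list of tokens."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Made-up tag string; a whitespace split stands in for the real tokenizer.
tokens = "action indie action".split()
ngram_counts = count_ngrams(tokens)
print(dict(ngram_counts))
```

Each distinct n-gram becomes one column of the resulting feature matrix, which is why adding bigrams makes the feature space grow quickly.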

In [321]:
df_train.columns

Index(['name', 'release_date', 'english', 'developer', 'publisher',
       'platforms', 'required_age', 'categories', 'genres', 'tags',
       'achievements', 'average_playtime', 'price', 'short_description',
       'estimated_sells', 'rating', 'release_month'],
      dtype='object')

In [322]:
normalize_reg = [
    'release_month',
    'price'
    ]

normalize_clf = [
    'required_age',
    'price'
]

standardize_reg = [
    'achievements',
    'average_playtime'
]

standardize_clf = [
    'average_playtime'
]

bow = CountVectorizer(
    tokenizer = StemmerTokenizer(),
    ngram_range = (1,2),
    stop_words = [";"]
)

transformer_reg = ColumnTransformer(
    transformers=[
        ('Devs_bow', bow, 'developer'),
        ('Pubs_bow', bow, 'publisher'),
        ('Cat_bow', bow, 'categories'),
        ('Genres_bow', bow, 'genres'),
        ('tags_bow', bow, 'tags'),
        ('MinMaxScaler', MinMaxScaler(), normalize_reg),
        ('StandardScaler', StandardScaler(), standardize_reg)
    ]
)

transformer_clf = ColumnTransformer(
    transformers=[
        ('Devs_bow', bow, 'developer'),
        ('Pubs_bow', bow, 'publisher'),
        ('Cat_bow', bow, 'categories'),
        ('Genres_bow', bow, 'genres'),
        ('tags_bow', bow, 'tags'),
        ('MinMaxScaler', MinMaxScaler(), normalize_clf),
        ('StandardScaler', StandardScaler(), standardize_clf)
    ]
)
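
The two scalers used in these transformers rescale numeric columns in different ways. The sketch below reproduces both transformations by hand on made-up prices; the helper names `min_max_scale` and `standard_scale` are illustrative assumptions and are not scikit-learn's own code.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Map values linearly onto [0, 1], as MinMaxScaler does by default."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    """Center to mean 0 and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

prices = [0.0, 5.0, 10.0, 25.0]  # made-up values, not taken from the dataset
scaled_01 = min_max_scale(prices)
scaled_z = standard_scale(prices)
print(scaled_01)
print([round(z, 3) for z in scaled_z])
```

Min-max scaling keeps the values inside a fixed range, while standardization leaves them unbounded but centered, so a single extreme value compresses the rest of a min-max scaled column more visibly.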

In [323]:
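# NOTE: f_classif is an ANOVA F-test meant for categorical targets. For the
# continuous target `estimated_sells`, sklearn's f_regression is likely the
# better fit; the outputs recorded below were produced with f_classif as written.
regressor_pipe = Pipeline([
    ('Preprocessing', transformer_reg),
    ('Feature selection', SelectPercentile(f_classif, percentile=90)),
    ('Regressor', RandomForestRegressor())
])

# The final step is named 'Regressor' here as well, even though it holds a
# classifier; params_clf refers to it by that key.
classification_pipe = Pipeline([
    ('Preprocessing', transformer_clf),
    ('Feature selection', SelectPercentile(f_classif, percentile=90)),
    ('Regressor', RandomForestClassifier())
])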
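
One plausible reading of the `divide by zero` warnings and the `No features were selected` messages that appear in the regression search (an assumption; it was not verified against the notebook's data) is that an ANOVA-style score treats every distinct sales value as its own class, which leaves no within-class variation to divide by. The sketch below computes a one-way ANOVA F-statistic by hand on made-up data to show the effect; `anova_f` is a hypothetical helper, not scikit-learn's implementation.

```python
from statistics import mean

def anova_f(feature, target):
    """One-way ANOVA F-statistic of `feature` grouped by the values of `target`.

    Returns None when the statistic is undefined: fewer than two groups,
    zero within-group degrees of freedom, or zero within-group variance.
    """
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    if k < 2:
        return None
    grand_mean = mean(feature)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_within = n - k
    if df_within == 0 or ss_within == 0:
        return None
    return (ss_between / (k - 1)) / (ss_within / df_within)

feature = [1.0, 2.0, 2.5, 6.0, 7.0, 7.5]                  # made-up feature column
labels = ["low", "low", "low", "high", "high", "high"]    # categorical target
sales = [120.0, 340.0, 560.0, 9100.0, 15000.0, 42000.0]   # continuous, all distinct

f_labels = anova_f(feature, labels)
f_sales = anova_f(feature, sales)
print(f_labels, f_sales)
```

With the categorical target the statistic is finite, while with all-distinct continuous values it is undefined. scikit-learn provides `f_regression` as a scoring function for continuous targets.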

In [324]:
X = df_train.drop(['rating', 'estimated_sells'], axis=1)
y_reg = df_train.estimated_sells
y_clf = df_train.rating

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X, y_reg, shuffle=True, test_size=0.3, random_state=33
)

X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X, y_clf, shuffle=True, test_size=0.3, random_state=33
)

In [250]:
def evaluate(y_test, y_pred):
    """Print regression error metrics (MSE, RMSE, MAE, MedAE) and R²."""
    print('MSE:', mean_squared_error(y_test, y_pred), '\n')
    print('RMSE:', mean_squared_error(y_test, y_pred, squared=False))
    print('MAE:', mean_absolute_error(y_test, y_pred))
    print('MedAE:', median_absolute_error(y_test, y_pred), '\n')
    print('R²:', r2_score(y_test, y_pred))
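
As a reference for reading these numbers, the error metrics printed by `evaluate` can be computed by hand. The sketch below uses made-up values and a hypothetical helper `regression_errors`; it is meant only to show how one large miss dominates MSE and RMSE while the median absolute error stays small, a pattern similar to the gap between RMSE and MedAE in the outputs of this notebook.

```python
import math
from statistics import median

def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, MAE and MedAE for two equal-length sequences."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    mse = sum(r * r for r in residuals) / len(residuals)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),                      # same units as the target
        "MAE": sum(abs_residuals) / len(residuals),
        "MedAE": median(abs_residuals),              # robust to a few large misses
    }

# Made-up sales figures: three close predictions and one large miss.
y_true = [100.0, 200.0, 300.0, 10000.0]
y_pred = [110.0, 190.0, 320.0, 4000.0]
metrics = regression_errors(y_true, y_pred)
print(metrics)
```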

In [270]:
params_reg = [
    {
    'Feature selection__percentile': [40, 60, 80],
    'Regressor': [SVR(), RandomForestRegressor(random_state=42)],
    }
]

search_reg = HalvingGridSearchCV(
    regressor_pipe,
    params_reg,
    cv=3,
    random_state=42,
    verbose=10).fit(X_train_reg, y_train_reg)

n_iterations: 2
n_required_iterations: 2
n_possible_iterations: 2
min_resources_: 1838
max_resources_: 5516
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 6
n_resources: 1838
Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV 1/3; 1/6] START Feature selection__percentile=40, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 1/3; 1/6] END Feature selection__percentile=40, Regressor=SVR();, score=(train=nan, test=nan) total time=   3.3s
[CV 2/3; 1/6] START Feature selection__percentile=40, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 1/6] END Feature selection__percentile=40, Regressor=SVR();, score=(train=nan, test=nan) total time=   1.9s
[CV 3/3; 1/6] START Feature selection__percentile=40, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 3/3; 1/6] END Feature selection__percentile=40, Regressor=SVR();, score=(train=nan, test=nan) total time=   1.9s
[CV 1/3; 2/6] START Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 1/3; 2/6] END Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   1.9s
[CV 2/3; 2/6] START Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 2/6] END Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   2.0s
[CV 3/3; 2/6] START Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 3/3; 2/6] END Feature selection__percentile=40, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   2.0s
[CV 1/3; 3/6] START Feature selection__percentile=60, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 1/3; 3/6] END Feature selection__percentile=60, Regressor=SVR();, score=(train=nan, test=nan) total time=   1.8s
[CV 2/3; 3/6] START Feature selection__percentile=60, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 3/6] END Feature selection__percentile=60, Regressor=SVR();, score=(train=nan, test=nan) total time=   1.8s
[CV 3/3; 3/6] START Feature selection__percentile=60, Regressor=SVR()...........



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 3/3; 3/6] END Feature selection__percentile=60, Regressor=SVR();, score=(train=nan, test=nan) total time=   1.8s
[CV 1/3; 4/6] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 1/3; 4/6] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   1.8s
[CV 2/3; 4/6] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 2/3; 4/6] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   1.8s
[CV 3/3; 4/6] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide


invalid value encountered in subtract


No features were selected: either the data is too noisy or the selection test too strict.



[CV 3/3; 4/6] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=nan, test=nan) total time=   1.8s
[CV 1/3; 5/6] START Feature selection__percentile=80, Regressor=SVR()...........



divide by zero encountered in true_divide



[CV 1/3; 5/6] END Feature selection__percentile=80, Regressor=SVR();, score=(train=-0.010, test=-0.031) total time=   3.1s
[CV 2/3; 5/6] START Feature selection__percentile=80, Regressor=SVR()...........



divide by zero encountered in true_divide



[CV 2/3; 5/6] END Feature selection__percentile=80, Regressor=SVR();, score=(train=-0.009, test=-0.024) total time=   3.1s
[CV 3/3; 5/6] START Feature selection__percentile=80, Regressor=SVR()...........



divide by zero encountered in true_divide



[CV 3/3; 5/6] END Feature selection__percentile=80, Regressor=SVR();, score=(train=-0.031, test=-0.007) total time=   3.0s
[CV 1/3; 6/6] START Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 1/3; 6/6] END Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.864, test=-0.113) total time=  18.6s
[CV 2/3; 6/6] START Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 2/3; 6/6] END Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.870, test=0.162) total time=  18.4s
[CV 3/3; 6/6] START Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 3/3; 6/6] END Feature selection__percentile=80, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.892, test=0.019) total time=  20.3s
----------
iter: 1
n_candidates: 2
n_resources: 5514
Fitting 3 folds for each of 2 candidates, totalling 6 fits
[CV 1/3; 1/2] START Feature selection__percentile=60, Regressor=SVR()...........




12 fits failed out of a total of 18.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
6 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/svm/_base.py", line 190, in fit
    X, y = self._validate_data(
  File "/usr/local/lib/python3.8/dist-packages/sklearn/base.py", line 581, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/usr/loca

[CV 1/3; 1/2] END Feature selection__percentile=60, Regressor=SVR();, score=(train=-0.012, test=-0.034) total time=  12.2s
[CV 2/3; 1/2] START Feature selection__percentile=60, Regressor=SVR()...........



divide by zero encountered in true_divide



[CV 2/3; 1/2] END Feature selection__percentile=60, Regressor=SVR();, score=(train=-0.011, test=-0.031) total time=  12.3s
[CV 3/3; 1/2] START Feature selection__percentile=60, Regressor=SVR()...........



divide by zero encountered in true_divide



[CV 3/3; 1/2] END Feature selection__percentile=60, Regressor=SVR();, score=(train=-0.033, test=-0.010) total time=  12.0s
[CV 1/3; 2/2] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 1/3; 2/2] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.873, test=-0.409) total time= 1.7min
[CV 2/3; 2/2] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 2/3; 2/2] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.878, test=-0.401) total time= 1.4min
[CV 3/3; 2/2] START Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42)



divide by zero encountered in true_divide



[CV 3/3; 2/2] END Feature selection__percentile=60, Regressor=RandomForestRegressor(random_state=42);, score=(train=0.880, test=0.056) total time= 1.4min



One or more of the test scores are non-finite: [        nan         nan         nan         nan -0.02070131  0.02261739
 -0.02485879 -0.25137517]


One or more of the train scores are non-finite: [        nan         nan         nan         nan -0.01665023  0.87518904
 -0.01845471  0.87683114]


divide by zero encountered in true_divide
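
The iteration header printed by the search (`n_candidates`, `n_resources`, `factor`) follows the successive-halving idea: each round keeps roughly one in `factor` candidates and multiplies the number of training samples by `factor`. The sketch below is a simplified model of that schedule, using the values reported in the log above (6 candidates, `min_resources_` 1838, `max_resources_` 5516, factor 3). The function `halving_schedule` is a hypothetical helper and ignores options such as `aggressive_elimination`.

```python
import math

def halving_schedule(n_candidates, min_resources, max_resources, factor=3):
    """List (iteration, n_candidates, n_resources) for a simple halving run."""
    schedule = []
    iteration, resources = 0, min_resources
    while resources <= max_resources:
        schedule.append((iteration, n_candidates, resources))
        if n_candidates <= 1:
            break
        # Keep about 1/factor of the candidates and give them factor x more data
        n_candidates = math.ceil(n_candidates / factor)
        resources *= factor
        iteration += 1
    return schedule

schedule = halving_schedule(n_candidates=6, min_resources=1838, max_resources=5516)
for iteration, candidates, resources in schedule:
    print(f"iter: {iteration}  n_candidates: {candidates}  n_resources: {resources}")
```

Under these assumptions the sketch yields two iterations, 6 candidates on 1838 samples and then 2 candidates on 5514 samples, which agrees with the log.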



In [325]:
params_clf = [
    {
    'Feature selection__percentile': [40, 60, 80],
    'Regressor': [SVC(), RandomForestClassifier(random_state=42)],
    }
]

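# NOTE: this search is built on regressor_pipe (and therefore transformer_reg);
# classification_pipe was presumably intended. The output recorded below
# comes from the call exactly as written.
search_clf = HalvingGridSearchCV(
    regressor_pipe,
    params_clf,
    cv=3,
    random_state=42,
    verbose=10).fit(X_train_clf, y_train_clf)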

n_iterations: 2
n_required_iterations: 2
n_possible_iterations: 2
min_resources_: 1838
max_resources_: 5516
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 6
n_resources: 1838
Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV 1/3; 1/6] START Feature selection__percentile=40, Regressor=SVC()...........
[CV 1/3; 1/6] END Feature selection__percentile=40, Regressor=SVC();, score=(train=0.674, test=0.312) total time=   2.6s
[CV 2/3; 1/6] START Feature selection__percentile=40, Regressor=SVC()...........
[CV 2/3; 1/6] END Feature selection__percentile=40, Regressor=SVC();, score=(train=0.676, test=0.319) total time=   2.5s
[CV 3/3; 1/6] START Feature selection__percentile=40, Regressor=SVC()...........
[CV 3/3; 1/6] END Feature selection__percentile=40, Regressor=SVC();, score=(train=0.652, test=0.268) total time=   2.6s
[CV 1/3; 2/6] START Feature selection__percentile=40, Regressor=RandomForestClassifier(random_state=42)
[CV 1/3; 2/6] END Feature sel

In [307]:
y_search_reg = search_reg.predict(X_test_reg)
evaluate(y_test_reg, y_search_reg)

MSE: 1681365069334.453 

RMSE: 1296674.6196846967
MAE: 211596.22982794192
MedAE: 14488.033281318541 

R²: -0.024647614659153882
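
A negative R² means the predictions have a larger squared error than simply predicting the mean of the targets for every example. The sketch below computes R² by hand on made-up values to show both the zero point and a negative case; the helper `r2` is illustrative and is not scikit-learn's `r2_score`.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]          # made-up targets
mean_baseline = [2.5, 2.5, 2.5, 2.5]   # always predict the mean
reversed_pred = [4.0, 3.0, 2.0, 1.0]   # worse than predicting the mean

r2_mean = r2(y_true, mean_baseline)
r2_bad = r2(y_true, reversed_pred)
print(r2_mean, r2_bad)
```

Read this way, the slightly negative value above indicates that on this held-out split the optimized regressor does marginally worse than a constant prediction.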


In [326]:
y_search_clf = search_clf.predict(X_test_clf)
print(classification_report(y_test_clf, y_search_clf))

                 precision    recall  f1-score   support

          Mixed       0.32      0.26      0.29       497
Mostly Positive       0.24      0.19      0.21       521
       Negative       0.46      0.35      0.39       389
       Positive       0.31      0.57      0.40       588
  Very Positive       0.46      0.23      0.31       370

       accuracy                           0.33      2365
      macro avg       0.36      0.32      0.32      2365
   weighted avg       0.35      0.33      0.32      2365
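
The `weighted avg` row can be recovered from the per-class rows: each class's F1 is weighted by its support. The sketch below redoes that arithmetic with the rounded values printed in the report above, so it only approximates what the library computes from unrounded scores.

```python
# (f1-score, support) per class, copied from the classification report above
report = {
    "Mixed": (0.29, 497),
    "Mostly Positive": (0.21, 521),
    "Negative": (0.39, 389),
    "Positive": (0.40, 588),
    "Very Positive": (0.31, 370),
}

total_support = sum(support for _, support in report.values())
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

print(total_support, round(weighted_f1, 2), round(macro_f1, 2))
```

The weighted average gives more influence to the larger classes, whereas the macro average treats all five classes equally.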



In [344]:
y_final_reg = search_reg.predict(df_test)
y_final_clf = search_clf.predict(df_test)

Relative to the baseline, the model improves in classification (weighted F1 of 0.32 versus 0.23) but gets worse in regression (R² of about -0.02 versus 0.23).

---

## 6. Conclusions

Based on the results obtained, we conclude that the problem was solved successfully, since both of our competition results exceed the established baseline.
For the same reason, we consider the results acceptable in the context of the problem.

Our baseline reached a weighted `f1-score` of 0.23 and an `r_2` of 0.23.
The optimized model took a different approach based on `bag of words`, reaching a weighted `f1-score` of 0.32 and an `r_2` of -0.024.
The classification result therefore improves, while the regression result gets worse.

We are satisfied with the results: we believe the goals of the course were met, since we understood and applied a range of concepts to obtain good results.
We also believe that deeper `NLP` approaches and a more careful analysis of the variables could yield better results.
As for the competition, although it creates some stress to reach a certain bar, once that level is reached it pushes you to improve and shows how others solving the same problem are performing.
The lessons from the project were varied, starting with learning to analyze the data from a more global perspective and to consider model performance in every decision, which let us integrate many of the topics covered in the course.
Among the things we did not get to learn are deeper NLP analyses that would have allowed better performance.



---

<br>

### Appendix: Generating the Competition Submission File

To upload your results to the CodaLab page, use the `generateFiles` function provided below. You must generate files that strictly follow the CodaLab format; otherwise the results will not show up on the competition page.

The results of your classification and regression models are saved in a zip file containing `predictions_clf.txt` for classification and `predictions_rgr.txt` for regression. As mentioned earlier, the results must be obtained from the `test.pickle` dataset, and each line of a file must contain one prediction.

Example files:

- [ ] `predictions_clf.txt`

        Mostly Positive
        Mostly Positive
        Negative
        Positive
        Negative
        Positive
        ...

- [ ] `predictions_rgr.txt`

        16103.58
        16103.58
        16041.89
        9328.62
        107976.03
        194374.08
        ...



In [349]:
from zipfile import ZipFile
import os

def generateFiles(predict_data, clf_pipe, rgr_pipe):
    """Generate the files to upload to CodaLab.

    Input
    predict_data: DataFrame with the input data to predict on
    clf_pipe: classification pipeline
    rgr_pipe: regression pipeline

    Output
    predictions.zip containing predictions_clf.txt and predictions_rgr.txt
    """
    y_pred_clf = clf_pipe.predict(predict_data)
    y_pred_rgr = rgr_pipe.predict(predict_data)

    # One prediction per line, in the order of the input rows
    with open('./predictions_clf.txt', 'w') as f:
        for item in y_pred_clf:
            f.write("%s\n" % item)

    with open('./predictions_rgr.txt', 'w') as f:
        for item in y_pred_rgr:
            f.write("%s\n" % item)

    with ZipFile('predictions.zip', 'w') as zipObj2:
        zipObj2.write('predictions_rgr.txt')
        zipObj2.write('predictions_clf.txt')

    # Remove the intermediate text files once they are inside the archive
    os.remove("predictions_rgr.txt")
    os.remove("predictions_clf.txt")

In [350]:
generateFiles(df_test, search_clf, search_reg)
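
Before uploading, it can help to check that the archive has the expected shape. The sketch below is one possible sanity check, assuming the only requirements are the two file names used by `generateFiles` and one prediction per input row. The function `check_submission` is a hypothetical helper, and the small archive built here is a stand-in that uses example rows from the appendix rather than real predictions.

```python
import os
import tempfile
from zipfile import ZipFile

def check_submission(zip_path, expected_rows):
    """Return True if the zip holds both prediction files with expected_rows lines each."""
    expected_names = {"predictions_clf.txt", "predictions_rgr.txt"}
    with ZipFile(zip_path) as archive:
        if set(archive.namelist()) != expected_names:
            return False
        for name in expected_names:
            lines = archive.read(name).decode("utf-8").splitlines()
            if len(lines) != expected_rows:
                return False
    return True

# Build a tiny stand-in archive in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "predictions.zip")
    with ZipFile(zip_path, "w") as archive:
        archive.writestr("predictions_clf.txt", "Mostly Positive\nNegative\nPositive\n")
        archive.writestr("predictions_rgr.txt", "16103.58\n16041.89\n9328.62\n")
    ok = check_submission(zip_path, expected_rows=3)
    wrong_count = check_submission(zip_path, expected_rows=4)

print(ok, wrong_count)
```

In the notebook, the same check could be pointed at `predictions.zip` with `expected_rows=len(df_test)`.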