## Kaggle – DataTops®
Luismi has decided on a change of scenery and has bought a laptop shop. However, his only expertise is Data Science, so he has decided to build an ML model to set the best prices.

Could you help Luismi improve that model?

## Metric
Root mean squared error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals measure how far the data points lie from the regression line; RMSE measures how spread out those residuals are. In other words, it tells you how tightly the data cluster around the line of best fit.


$$ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{\Big(d_i - f_i\Big)^2}}$$

where $d_i$ is the observed value, $f_i$ is the predicted value and $n$ is the number of observations.
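As a quick sanity check of the metric, here is a minimal sketch of the unweighted RMSE, that is, the square root of the mean squared residual. The helper name `rmse` and the three prices are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical prices in euros, for illustration only
actual = [500.0, 900.0, 1500.0]
predicted = [520.0, 880.0, 1400.0]

# Residuals are -20, 20 and 100, so the mean squared error is 3600
score = rmse(actual, predicted)
print(score)  # 60.0
```

Because the errors are squared before averaging, the single 100-euro miss dominates the score.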


## Libraries

In [26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
import urllib.request
from sklearn.preprocessing import LabelEncoder, StandardScaler
from catboost import CatBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.feature_selection import RFE




## Data

In [27]:
# To run this cell you need to download the data files from Kaggle
df_train = pd.read_excel("df_joint.xlsx", index_col=0)
df_train.index.name = None
df_train

| | Company | Product | TypeName | Ram | CPU_Brand | CPU_Speed | CPU_Model_number | Memory_Capacity | Memory_Type | GPU_Brand | ... | OpSys_type | OpSys_number | Pixel_Count | DPI | Weight_per_Inch | CPU_Ratio | Ram_per_CPU | Pixel_Density | Storage_Efficiency | Price_in_euros |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 755 | HP | G6 | Notebook | 8 | Intel | 2.0 | 1 | 256 | SSD | Intel | ... | Windows | 10 | 2073600 | 8520.710059 | 0.119231 | Intel_8 | 3.809524 | 8520.710059 | 130.612245 | 539.00 |
| 618 | Dell | Inspiron | Gaming | 16 | Intel | 2.6 | 3 | 1000 | HDD | Nvidia | ... | Windows | 10 | 2073600 | 8520.710059 | 0.166026 | Intel_16 | 5.925926 | 8520.710059 | 371.747212 | 879.01 |
| 909 | HP | ProBook | Notebook | 8 | Intel | 2.7 | 3 | 1000 | HDD | Nvidia | ... | Windows | 10 | 2073600 | 8520.710059 | 0.130769 | Intel_8 | 2.857143 | 8520.710059 | 467.289720 | 900.00 |
| 2 | Apple | Macbook | Ultrabook | 8 | Intel | 1.8 | 2 | 128 | Storage | Intel | ... | Mac | 0 | 1296000 | 7326.587145 | 0.100752 | Intel_8 | 4.210526 | 7326.587145 | 88.888889 | 898.94 |
| 286 | Dell | Inspiron | Notebook | 4 | Intel | 2.0 | 1 | 1000 | HDD | AMD | ... | Linux | 0 | 2073600 | 8520.710059 | 0.144231 | Intel_4 | 1.904762 | 8520.710059 | 425.531915 | 428.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 820 | MSI | 7RG | Gaming | 16 | Intel | 2.8 | 3 | 512 | HDD | Nvidia | ... | Windows | 10 | 2073600 | 6928.397207 | 0.167630 | Intel_16 | 5.517241 | 6928.397207 | 170.666667 | 2079.20 |
| 948 | Toshiba | Tecra | Notebook | 4 | Intel | 2.3 | 2 | 128 | SSD | Intel | ... | Windows | 10 | 2073600 | 10579.591837 | 0.105000 | Intel_4 | 1.666667 | 10579.591837 | 81.528662 | 965.77 |
| 483 | Dell | M5520 | Workstation | 8 | Intel | 2.8 | 3 | 256 | SSD | Nvidia | ... | Windows | 10 | 2073600 | 8520.710059 | 0.114103 | Intel_8 | 2.758621 | 8520.710059 | 136.170213 | 1538.63 |
| 1017 | HP | Probook | Notebook | 4 | Intel | 2.5 | 2 | 500 | HDD | Intel | ... | Windows | 10 | 1049088 | 5352.489796 | 0.117143 | Intel_4 | 1.538462 | 5352.489796 | 287.356322 | 900.67 |


## Data exploration

In [28]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1303 entries, 755 to 421
Data columns (total 22 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Company             1303 non-null   object 
 1   Product             1303 non-null   object 
 2   TypeName            1303 non-null   object 
 3   Ram                 1303 non-null   int64  
 4   CPU_Brand           1303 non-null   object 
 5   CPU_Speed           1303 non-null   float64
 6   CPU_Model_number    1303 non-null   int64  
 7   Memory_Capacity     1303 non-null   int64  
 8   Memory_Type         1303 non-null   object 
 9   GPU_Brand           1303 non-null   object 
 10  GPU_Type            1303 non-null   object 
 11  GPU_Model           1303 non-null   object 
 12  OpSys_type          1303 non-null   object 
 13  OpSys_number        1303 non-null   int64  
 14  Pixel_Count         1303 non-null   int64  
 15  DPI                 1303 non-null   float64
 16  Weight_per

## Data processing

Our target is the `Price_in_euros` column.

-----------------------------------------------------------------------------------------------------------------

In [29]:
df_test = pd.read_csv("./data/test.csv", index_col=0)
df_test.index.name = None
df_test

| | Company | Product | TypeName | Inches | ScreenResolution | Cpu | Ram | Memory | Gpu | OpSys | Weight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 209 | Lenovo | Legion Y520-15IKBN | Gaming | 15.6 | Full HD 1920x1080 | Intel Core i7 7700HQ 2.8GHz | 16GB | 512GB SSD | Nvidia GeForce GTX 1060 | No OS | 2.4kg |
| 1281 | Acer | Aspire ES1-531 | Notebook | 15.6 | 1366x768 | Intel Celeron Dual Core N3060 1.6GHz | 4GB | 500GB HDD | Intel HD Graphics 400 | Linux | 2.4kg |
| 1168 | Lenovo | V110-15ISK (i3-6006U/4GB/1TB/No | Notebook | 15.6 | 1366x768 | Intel Core i3 6006U 2.0GHz | 4GB | 1TB HDD | Intel HD Graphics 520 | No OS | 1.9kg |
| 1231 | Dell | Inspiron 7579 | 2 in 1 Convertible | 15.6 | IPS Panel Full HD / Touchscreen 1920x1080 | Intel Core i5 7200U 2.5GHz | 8GB | 256GB SSD | Intel HD Graphics 620 | Windows 10 | 2.191kg |
| 1020 | HP | ProBook 640 | Notebook | 14.0 | Full HD 1920x1080 | Intel Core i5 7200U 2.5GHz | 4GB | 256GB SSD | Intel HD Graphics 620 | Windows 10 | 1.95kg |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 820 | MSI | GE72MVR 7RG | Gaming | 17.3 | Full HD 1920x1080 | Intel Core i7 7700HQ 2.8GHz | 16GB | 512GB SSD + 1TB HDD | Nvidia GeForce GTX 1070 | Windows 10 | 2.9kg |
| 948 | Toshiba | Tecra Z40-C-12X | Notebook | 14.0 | IPS Panel Full HD 1920x1080 | Intel Core i5 6200U 2.3GHz | 4GB | 128GB SSD | Intel HD Graphics 520 | Windows 10 | 1.47kg |
| 483 | Dell | Precision M5520 | Workstation | 15.6 | Full HD 1920x1080 | Intel Core i7 7700HQ 2.8GHz | 8GB | 256GB SSD | Nvidia Quadro M1200 | Windows 10 | 1.78kg |
| 1017 | HP | Probook 440 | Notebook | 14.0 | 1366x768 | Intel Core i5 7200U 2.5GHz | 4GB | 500GB HDD | Intel HD Graphics 620 | Windows 10 | 1.64kg |


## 2. Replicate the processing for ``test.csv``

In [30]:
import re

# Work on a copy so the raw test dataframe stays untouched
df_test_processed = df_test.copy()

# 1) Keep Company as is (CatBoost can handle categorical features)

# 2) Reduce Product to a keyword: the last word if it mixes letters and digits, otherwise the first word
def extract_product_keyword(product):
    words = product.split()
    last = words[-1]
    if last.isalnum() and any(c.isalpha() for c in last) and any(c.isdigit() for c in last):
        return last  # alphanumeric model code, e.g. "M5520"
    return words[0]  # otherwise the product family, e.g. "Inspiron"

df_test_processed["Product"] = df_test_processed["Product"].apply(extract_product_keyword)

# 3) Keep TypeName as is

# 4) Reduce ScreenResolution to its last "<width>x<height>" token
def extract_screen_res(resolution):
    matches = re.findall(r'\d+x\d+', resolution)
    return matches[-1] if matches else "Unknown"

df_test_processed["ScreenResolution"] = df_test_processed["ScreenResolution"].apply(extract_screen_res)

# 5) Split Cpu into three columns: brand (first word), model (third word) and speed (last word)
def extract_cpu_parts(cpu):
    parts = cpu.split()
    first_word = parts[0] if len(parts) > 0 else "Unknown"
    third_word = parts[2] if len(parts) > 2 else "Unknown"
    last_word = parts[-1] if len(parts) > 0 else "Unknown"
    return first_word, third_word, last_word

df_test_processed[["CPU_Brand", "CPU_Model", "CPU_Speed"]] = df_test_processed["Cpu"].apply(lambda x: pd.Series(extract_cpu_parts(x)))

# Map CPU_Model to an ordinal tier: i3 -> 1, i5 -> 2, i7 -> 3
def categorize_cpu(cpu):
    if "i7" in cpu:
        return 3
    elif "i5" in cpu:
        return 2
    elif "i3" in cpu:
        return 1
    else:
        return 2  # any other CPU family falls back to the middle tier

df_test_processed["CPU_Model_number"] = df_test_processed["CPU_Model"].apply(categorize_cpu)

# 6) Split Memory into two columns: capacity (first word) and type (last word).
# For combined drives such as "512GB SSD + 1TB HDD" this keeps the first capacity and the last type.
def extract_memory_parts(memory):
    parts = memory.split()
    first_word = parts[0] if len(parts) > 0 else "Unknown"
    last_word = parts[-1] if len(parts) > 1 else "Unknown"
    return first_word, last_word

df_test_processed[["Memory_Capacity", "Memory_Type"]] = df_test_processed["Memory"].apply(lambda x: pd.Series(extract_memory_parts(x)))

# 7) Split Gpu into three columns: brand (first word), type (second word) and model (last word)
def extract_gpu_parts(gpu):
    parts = gpu.split()
    first_word = parts[0] if len(parts) > 0 else "Unknown"
    second_word = parts[1] if len(parts) > 1 else "Unknown"
    last_word = parts[-1] if len(parts) > 2 else "Unknown"
    return first_word, second_word, last_word

df_test_processed[["GPU_Brand", "GPU_Type", "GPU_Model"]] = df_test_processed["Gpu"].apply(lambda x: pd.Series(extract_gpu_parts(x)))

# 8) Reduce OpSys to six categories
def categorize_os(os):
    os = os.lower()
    if "windows" in os:
        return "Windows"
    elif "linux" in os:
        return "Linux"
    elif "mac" in os:  # also matches "macos"
        return "Mac"
    elif "android" in os:
        return "Android"
    elif "chrome" in os:  # also matches "chromebook"
        return "Chrome"
    else:
        return "Others"  # e.g. "No OS"

df_test_processed["OpSys_type"] = df_test_processed["OpSys"].apply(categorize_os)
# Extract the OS version number when there is one
df_test_processed["OpSys_number"] = df_test_processed["OpSys"].str.extract(r'(\d+)')

# Convert to integer (rows without a version number get 0)
df_test_processed["OpSys_number"] = df_test_processed["OpSys_number"].fillna(0).astype(int)

# Clean and convert numeric columns

# 1) Convert Ram to an integer (dropping the "GB" suffix)
df_test_processed["Ram"] = df_test_processed["Ram"].str.replace("GB", "", regex=False).astype(int)

# 2) Convert Weight to a float (dropping the "kg" suffix)
df_test_processed["Weight"] = df_test_processed["Weight"].str.replace("kg", "", regex=False).astype(float)

# Fill missing values in categorical columns with the mode.
# Assigning the result back avoids the chained `inplace=True` call,
# which pandas warns will stop working in pandas 3.0.
for col in df_test_processed.select_dtypes(include=['object']).columns:
    df_test_processed[col] = df_test_processed[col].fillna(df_test_processed[col].mode()[0])

# Convert the remaining string columns to numbers
df_test_processed["CPU_Speed"] = df_test_processed["CPU_Speed"].str[:-3].astype(float)  # strip the trailing "GHz"
# "1TB" becomes "1000" (GB). Caveat: a decimal value such as "1.0TB" would become "1.0000", i.e. 1.0.
df_test_processed["Memory_Capacity"] = df_test_processed["Memory_Capacity"].str.replace('TB', '000').str.replace('GB', '').astype(float)

# 3) Create new features
# Total number of pixels
df_test_processed["Pixel_Count"] = df_test_processed["ScreenResolution"].apply(lambda x: int(x.split('x')[0]) * int(x.split('x')[1]) if 'x' in x else 0)

# Pixels per squared diagonal inch (named DPI to match the training columns)
df_test_processed["DPI"] = df_test_processed["Pixel_Count"] / (df_test_processed["Inches"] ** 2)

# Weight relative to screen size
df_test_processed["Weight_per_Inch"] = df_test_processed["Weight"] / df_test_processed["Inches"]

# CPU brand and RAM combined into one categorical label, e.g. "Intel_8"
df_test_processed["CPU_Ratio"] = df_test_processed["CPU_Brand"].astype(str) + "_" + df_test_processed["Ram"].astype(str)

# RAM per GHz of CPU speed (the 0.1 guards against division by zero)
df_test_processed["Ram_per_CPU"] = df_test_processed["Ram"] / (df_test_processed["CPU_Speed"] + 0.1)

# Same quantity as DPI; kept because the training data has both columns
df_test_processed["Pixel_Density"] = df_test_processed["Pixel_Count"] / (df_test_processed["Inches"] ** 2)

# Storage capacity relative to weight (the 0.1 guards against division by zero)
df_test_processed["Storage_Efficiency"] = df_test_processed["Memory_Capacity"] / (df_test_processed["Weight"] + 0.1)

# Keep the categorical columns as strings
categorical_cols = ["Company", "Product", "TypeName", "CPU_Brand", "Memory_Type", "GPU_Brand", "GPU_Model"]
df_test_processed[categorical_cols] = df_test_processed[categorical_cols].astype(str)

# Drop the original columns that have already been transformed
df_test_processed.drop(columns=["OpSys", "Cpu", "CPU_Model", "Memory", "Gpu", "ScreenResolution", "Weight", "Inches"], inplace=True)

df_test_processed


| | Company | Product | TypeName | Ram | CPU_Brand | CPU_Speed | CPU_Model_number | Memory_Capacity | Memory_Type | GPU_Brand | ... | GPU_Model | OpSys_type | OpSys_number | Pixel_Count | DPI | Weight_per_Inch | CPU_Ratio | Ram_per_CPU | Pixel_Density | Storage_Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 209 | Lenovo | Legion | Gaming | 16 | Intel | 2.8 | 3 | 512.0 | SSD | Nvidia | ... | 1060 | Others | 0 | 2073600 | 8520.710059 | 0.153846 | Intel_16 | 5.517241 | 8520.710059 | 204.800000 |
| 1281 | Acer | Aspire | Notebook | 4 | Intel | 1.6 | 2 | 500.0 | HDD | Intel | ... | 400 | Linux | 0 | 1049088 | 4310.848126 | 0.153846 | Intel_4 | 2.352941 | 4310.848126 | 200.000000 |
| 1168 | Lenovo | V110-15ISK | Notebook | 4 | Intel | 2.0 | 1 | 1000.0 | HDD | Intel | ... | 520 | Others | 0 | 1049088 | 4310.848126 | 0.121795 | Intel_4 | 1.904762 | 4310.848126 | 500.000000 |
| 1231 | Dell | Inspiron | 2 in 1 Convertible | 8 | Intel | 2.5 | 2 | 256.0 | SSD | Intel | ... | 620 | Windows | 10 | 2073600 | 8520.710059 | 0.140449 | Intel_8 | 3.076923 | 8520.710059 | 111.741598 |
| 1020 | HP | ProBook | Notebook | 4 | Intel | 2.5 | 2 | 256.0 | SSD | Intel | ... | 620 | Windows | 10 | 2073600 | 10579.591837 | 0.139286 | Intel_4 | 1.538462 | 10579.591837 | 124.878049 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 820 | MSI | 7RG | Gaming | 16 | Intel | 2.8 | 3 | 512.0 | HDD | Nvidia | ... | 1070 | Windows | 10 | 2073600 | 6928.397207 | 0.167630 | Intel_16 | 5.517241 | 6928.397207 | 170.666667 |
| 948 | Toshiba | Tecra | Notebook | 4 | Intel | 2.3 | 2 | 128.0 | SSD | Intel | ... | 520 | Windows | 10 | 2073600 | 10579.591837 | 0.105000 | Intel_4 | 1.666667 | 10579.591837 | 81.528662 |
| 483 | Dell | M5520 | Workstation | 8 | Intel | 2.8 | 3 | 256.0 | SSD | Nvidia | ... | M1200 | Windows | 10 | 2073600 | 8520.710059 | 0.114103 | Intel_8 | 2.758621 | 8520.710059 | 136.170213 |
| 1017 | HP | Probook | Notebook | 4 | Intel | 2.5 | 2 | 500.0 | HDD | Intel | ... | 620 | Windows | 10 | 1049088 | 5352.489796 | 0.117143 | Intel_4 | 1.538462 | 5352.489796 | 287.356322 |


In [31]:
df_test_processed.info()
df_test_processed.to_excel("df_test_processed.xlsx")

<class 'pandas.core.frame.DataFrame'>
Index: 391 entries, 209 to 421
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Company             391 non-null    object 
 1   Product             391 non-null    object 
 2   TypeName            391 non-null    object 
 3   Ram                 391 non-null    int32  
 4   CPU_Brand           391 non-null    object 
 5   CPU_Speed           391 non-null    float64
 6   CPU_Model_number    391 non-null    int64  
 7   Memory_Capacity     391 non-null    float64
 8   Memory_Type         391 non-null    object 
 9   GPU_Brand           391 non-null    object 
 10  GPU_Type            391 non-null    object 
 11  GPU_Model           391 non-null    object 
 12  OpSys_type          391 non-null    object 
 13  OpSys_number        391 non-null    int32  
 14  Pixel_Count         391 non-null    int64  
 15  DPI                 391 non-null    float64
 16  Weight_per_

## Modelling

In [32]:

# Label-encode every categorical column, fitting on train and reusing the same mapping on test
for col in df_train.select_dtypes(include=["object"]).columns:
    le = LabelEncoder()
    df_train[col] = le.fit_transform(df_train[col].astype(str))
    # Categories never seen in train are encoded as -1
    code_of = {label: code for code, label in enumerate(le.classes_)}
    df_test_processed[col] = df_test_processed[col].astype(str).map(code_of).fillna(-1).astype(int)

# After encoding every feature is numeric; exclude only the target
numerical_cols = df_train.select_dtypes(include=["number"]).columns.tolist()
numerical_cols.remove("Price_in_euros")

# Standardize with StandardScaler, fitting on train and reusing the same parameters on test
scaler = StandardScaler()
df_train[numerical_cols] = scaler.fit_transform(df_train[numerical_cols])
df_test_processed[numerical_cols] = scaler.transform(df_test_processed[numerical_cols])

print("Numeric features standardized.")

# Split features and target
X = df_train.drop(columns=["Price_in_euros"])
y = df_train["Price_in_euros"]

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Models to try
models = {
    "CatBoost": CatBoostRegressor(loss_function="RMSE", verbose=0, random_state=42),
}

# Hyperparameter search space
param_grids = {
    "CatBoost": {"depth": [6, 8, 10], "learning_rate": [0.01, 0.05, 0.1], "iterations": [500, 1000, 1500]}
}

best_models = {}

for name, model in models.items():
    if name in param_grids:  # tune only the models that have a search space
        search = RandomizedSearchCV(model, param_grids[name], n_iter=15, cv=3, scoring="neg_root_mean_squared_error", verbose=1, n_jobs=-1, random_state=42)
        search.fit(X_train, y_train)
        best_models[name] = search.best_estimator_
        print(f"{name} best hyperparameters: {search.best_params_}")
    else:  # models without a search space are fitted directly
        model.fit(X_train, y_train)
        best_models[name] = model

# Evaluate the tuned models on the validation set
scores = {}
for name, model in best_models.items():
    scores[name] = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    print(f"{name} RMSE: {scores[name]}")

# Select the model with the lowest validation RMSE
best_model_name = min(scores, key=scores.get)
best_model = best_models[best_model_name]
print(f"Best model selected: {best_model_name}")

# Predict on the processed test set, with the columns in the same order as in training
X_test_filtered = df_test_processed[X.columns]
y_test_pred = best_model.predict(X_test_filtered)

# Save the predictions to a CSV file for Kaggle
df_submission = pd.DataFrame({"laptop_ID": df_test.index, "Price_in_euros": y_test_pred})
df_submission.to_csv("submission_best_model.csv", index=False)

print(f"Predictions saved to 'submission_best_model.csv' using {best_model_name}.")

### 4. Compute metrics and evaluate the models

Remember that the competition is scored with the ``RMSE`` metric.
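One way to get a less noisy estimate than a single validation split is cross-validated RMSE. The sketch below is only an illustration: the data are synthetic, `X_demo`, `y_demo` and `cv_rmse` are made-up names, and a random forest stands in for whatever model you are evaluating.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the laptop features and prices
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
y_demo = 1000 + 300 * X_demo[:, 0] - 150 * X_demo[:, 1] + rng.normal(scale=50, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=42)

# scikit-learn maximizes scores, so RMSE is reported with a negative sign
neg_rmse = cross_val_score(model, X_demo, y_demo, cv=3, scoring="neg_root_mean_squared_error")
cv_rmse = -neg_rmse

print(f"RMSE per fold: {np.round(cv_rmse, 1)}; mean: {cv_rmse.mean():.1f}")
```

The spread between folds gives a rough idea of how much a single validation RMSE can move by chance.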

------------------------------------------------------------

## Optimization
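The modelling cell tunes CatBoost with `RandomizedSearchCV`. The same pattern is sketched here on synthetic data, with scikit-learn's `GradientBoostingRegressor` standing in so that the example needs no extra dependency. The search space and the names `X_demo`, `y_demo` and `best_cv_rmse` are assumptions chosen for illustration, not tuned recommendations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for the processed laptop features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 3))
y_demo = 800 + 200 * X_demo[:, 0] + rng.normal(scale=30, size=150)

# Small illustrative search space (8 combinations in total)
param_distributions = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=4,                              # sample 4 of the 8 combinations
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,                        # makes the sampled combinations reproducible
)
search.fit(X_demo, y_demo)

best_cv_rmse = -search.best_score_
print(search.best_params_, round(best_cv_rmse, 1))
```

Fixing `random_state` on the search itself makes reruns comparable, because the same parameter combinations are sampled each time.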

-----------------------------------------------------------------

## Once the model is ready, it is time to predict ``test.csv``

**REMEMBER: APPLY TO `test.csv` EVERY TRANSFORMATION YOU APPLIED TO `train.csv`.**


For example:
- Standardization/Normalization
- Outlier removal
- Column removal
- Creation of new columns
- Handling of null values
- And a long list of other techniques that you, as a Data Scientist, judged best for your dataset.
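A minimal sketch of what "the same transformation" means in practice: every parameter (category codes, mean, standard deviation) is learned from the training data only and then reused on the test data. The two tiny dataframes and the name `code_of` are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the train and test dataframes
train = pd.DataFrame({"Company": ["HP", "Dell", "HP", "Apple"], "Ram": [8, 16, 8, 8]})
test = pd.DataFrame({"Company": ["Dell", "Lenovo"], "Ram": [4, 16]})

# Category codes are learned from train only: Apple -> 0, Dell -> 1, HP -> 2
categories = sorted(train["Company"].unique())
code_of = {label: code for code, label in enumerate(categories)}
train["Company"] = train["Company"].map(code_of)
# A brand never seen in train ("Lenovo") gets the sentinel code -1
test["Company"] = test["Company"].map(code_of).fillna(-1).astype(int)

# The scaler is fitted on train and only applied to test
scaler = StandardScaler().fit(train[["Ram"]])
train["Ram"] = scaler.transform(train[["Ram"]]).ravel()
test["Ram"] = scaler.transform(test[["Ram"]]).ravel()

print(test)
```

Fitting a second encoder or scaler on the test set would silently assign different codes and scales to the same values, which is the kind of mismatch this reminder warns about.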

### 1. Load the `test.csv` data to predict.


### 3. **What will you upload to Kaggle?**

**The prediction you upload to Kaggle must have a specific shape.**

In this case, the **SAME** shape as `sample_submission.csv`. 

In [None]:
sample = pd.read_csv("./data/sample_submission.csv")
sample

| | laptop_ID | Price_in_euros |
|---|---|---|
| 0 | 209 | 1949.1 |
| 1 | 1281 | 805.0 |
| 2 | 1168 | 1101.0 |
| 3 | 1231 | 1293.8 |
| 4 | 1020 | 1832.6 |
| ... | ... | ... |
| 386 | 820 | 474.3 |
| 387 | 948 | 1468.8 |
| 388 | 483 | 520.4 |
| 389 | 1017 | 515.1 |


### 4. Put your predictions in a dataframe called ``submission``.
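A minimal sketch of one way to build such a dataframe so that the ids follow the order of the sample file. The names `sample_demo`, `preds_demo` and `submission_demo` and all values are hypothetical; in the notebook the ids would come from `sample` and the prices from your model.

```python
import pandas as pd

# Hypothetical stand-in for sample_submission.csv
sample_demo = pd.DataFrame({"laptop_ID": [209, 1281, 1168], "Price_in_euros": [0.0, 0.0, 0.0]})

# Hypothetical predictions indexed by laptop id, deliberately in a different order
preds_demo = pd.Series([1500.0, 400.0, 450.0], index=[1281, 209, 1168])

# Reindexing by the sample ids puts each prediction on the right row
submission_demo = pd.DataFrame({
    "laptop_ID": sample_demo["laptop_ID"],
    "Price_in_euros": preds_demo.reindex(sample_demo["laptop_ID"]).to_numpy(),
})

print(submission_demo)
```

Aligning by id rather than by position protects against a silent mismatch if the test rows were reordered at any point.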

### 5. Run it through the CHECKER to confirm it is ready to upload to Kaggle.

In [None]:
def chequeador(df_to_submit):
    """
    Checks that your submission has the shape Kaggle requires.
    
    If it does, the dataframe is saved as a `csv`, ready to upload to Kaggle.
    
    If it does not, READ THE MESSAGE AND DO WHAT IT SAYS.
    
    If it still fails:
    - switch off your computer, 
    - go for a walk, 
    - switch it back on, 
    - open this notebook and 
    - read it all again. 
    We all deserve a second chance. You too.
    """
    if df_to_submit.shape == sample.shape:
        # Compare names and ids element by element; `.all()` alone only tests truthiness
        if list(df_to_submit.columns) == list(sample.columns):
            if df_to_submit.laptop_ID.tolist() == sample.laptop_ID.tolist():
                print("You're ready to submit!")
                df_to_submit.to_csv("submission.csv", index=False)  # index=False is essential
                urllib.request.urlretrieve("https://www.mihaileric.com/static/evaluation-meme-e0a350f278a36346e6d46b139b1d0da0-ed51e.jpg", "gfg.png")
                img = Image.open("gfg.png")
                img.show()
            else:
                print("Check the ids and try again")
        else:
            print("Check the names of the columns and try again")
    else:
        print("Check the number of rows and/or columns and try again")
        print("\nSecret message from the TA: I can't believe that after this whole notebook you changed the rows of `test.csv`. I weep.")
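The checker's three conditions (shape, column names, ids) can also be tried in isolation. Note that `Series.all()` returns a single boolean saying whether every value is truthy, so it takes an element-wise comparison to detect reordered ids. The helper name `same_layout` and the toy dataframes below are hypothetical.

```python
import pandas as pd

def same_layout(candidate, reference, id_col="laptop_ID"):
    """Return True when shape, column names and ids match element by element."""
    return (
        candidate.shape == reference.shape
        and list(candidate.columns) == list(reference.columns)
        and candidate[id_col].tolist() == reference[id_col].tolist()
    )

# Toy reference file and two candidate submissions
reference = pd.DataFrame({"laptop_ID": [209, 1281], "Price_in_euros": [1949.1, 805.0]})
good = pd.DataFrame({"laptop_ID": [209, 1281], "Price_in_euros": [1000.0, 700.0]})
shuffled = good.iloc[::-1].reset_index(drop=True)  # same rows, reversed order

print(same_layout(good, reference))      # True
print(same_layout(shuffled, reference))  # False
```

The shuffled frame has the same shape and the same column names, so only the id comparison tells the two apart.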