# Laptops market 💻
# Laptop Prices 💻
## Let's see how good we are at predicting the value of the products.

### Overview
Our boss was looking for some experts to gather data on the competition and use what they learned to set prices for our "MERIMARKT" store.
Unfortunately, they had all gone on vacation, so the boss asked us instead …
💥🪓🔪

It is time to roll up our sleeves and apply what we have learned in ML to build a model that predicts laptop prices from brand and specifications, so we can launch them on the market at a competitive price.

### Evaluation
In this task, we will use the mean absolute error (MAE) to evaluate the model's performance.
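
To make the metric concrete, here is a minimal sketch of MAE computed by hand. The prices are made-up values for illustration, not rows from the dataset, and `mean_absolute_error_manual` is a hypothetical helper name.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical prices in euros (illustrative only)
actual = [749.0, 449.0, 1460.0]
predicted = [800.0, 400.0, 1500.0]

# Absolute errors are 51, 49 and 40, so the MAE is 140 / 3
mae_example = mean_absolute_error_manual(actual, predicted)
print(mae_example)
```

MAE is expressed in the same unit as the target, so here it reads directly as the average number of euros by which a prediction misses.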

---

In [977]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, mean_absolute_percentage_error, root_mean_squared_error


#### DATASET

In [978]:
data = pd.read_csv('./data/train.csv')

In [979]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   ID                912 non-null    int64  
 1   Company           912 non-null    object 
 2   Product           912 non-null    object 
 3   TypeName          912 non-null    object 
 4   Inches            912 non-null    float64
 5   ScreenResolution  912 non-null    object 
 6   Cpu               912 non-null    object 
 7   Ram               912 non-null    object 
 8   Memory            912 non-null    object 
 9   Gpu               912 non-null    object 
 10  OpSys             912 non-null    object 
 11  Weight            912 non-null    object 
 12  Price_euros       912 non-null    float64
dtypes: float64(2), int64(1), object(10)
memory usage: 92.8+ KB


In [980]:
data.describe()

Unnamed: 0,ID,Inches,Price_euros
count,912.0,912.0,912.0
mean,666.192982,15.011404,1108.122873
std,384.873846,1.411744,714.597741
min,1.0,10.1,174.0
25%,330.5,14.0,589.0
50%,673.5,15.6,949.0
75%,998.5,15.6,1458.5
max,1320.0,18.4,6099.0


In [981]:
data.describe(include='all')

Unnamed: 0,ID,Company,Product,TypeName,Inches,ScreenResolution,Cpu,Ram,Memory,Gpu,OpSys,Weight,Price_euros
count,912.0,912,912,912,912.0,912,912,912,912,912,912,912,912.0
unique,,19,475,6,,35,104,8,36,91,9,158,
top,,Lenovo,Inspiron 3567,Notebook,,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,Windows 10,2.2kg,
freq,,208,21,522,,346,142,434,293,199,746,86,
mean,666.192982,,,,15.011404,,,,,,,,1108.122873
std,384.873846,,,,1.411744,,,,,,,,714.597741
min,1.0,,,,10.1,,,,,,,,174.0
25%,330.5,,,,14.0,,,,,,,,589.0
50%,673.5,,,,15.6,,,,,,,,949.0
75%,998.5,,,,15.6,,,,,,,,1458.5


In [982]:
data.head()

Unnamed: 0,ID,Company,Product,TypeName,Inches,ScreenResolution,Cpu,Ram,Memory,Gpu,OpSys,Weight,Price_euros
0,1002,Dell,Inspiron 5567,Notebook,15.6,1366x768,Intel Core i7 7500U 2.7GHz,8GB,1TB HDD,AMD Radeon R7 M445,Windows 10,2.36kg,749.0
1,867,Asus,X541NA (N4200/4GB/1TB/W10),Notebook,15.6,1366x768,Intel Pentium Quad Core N4200 1.1GHz,4GB,1TB HDD,Intel HD Graphics 505,Windows 10,2kg,449.0
2,966,Toshiba,Portege Z30-C-1CW,Notebook,13.3,Full HD 1920x1080,Intel Core i5 6200U 2.3GHz,8GB,256GB SSD,Intel HD Graphics 520,Windows 7,1.2kg,1460.0
3,767,Dell,Alienware 17,Gaming,15.6,IPS Panel 4K Ultra HD 3840x2160,Intel Core i7 7700HQ 2.8GHz,16GB,256GB SSD + 1TB HDD,Nvidia GeForce GTX 1070,Windows 10,4.42kg,2868.99
4,1241,Dell,Latitude E7270,Ultrabook,12.5,Full HD / Touchscreen 1920x1080,Intel Core i5 6300U 2.4GHz,8GB,256GB SSD,Intel HD Graphics 520,Windows 7,1.26kg,1713.37


#### DATA CLEANING

In [983]:
data = data.rename(columns=str.lower)

In [984]:

data = data.rename(columns={'price_euros' : 'price'})

In [985]:
data = data.drop('id', axis=1)

In [986]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   company           912 non-null    object 
 1   product           912 non-null    object 
 2   typename          912 non-null    object 
 3   inches            912 non-null    float64
 4   screenresolution  912 non-null    object 
 5   cpu               912 non-null    object 
 6   ram               912 non-null    object 
 7   memory            912 non-null    object 
 8   gpu               912 non-null    object 
 9   opsys             912 non-null    object 
 10  weight            912 non-null    object 
 11  price             912 non-null    float64
dtypes: float64(2), object(10)
memory usage: 85.6+ KB


In [987]:
# pd.get_dummies(data['company'], dtype=int)

In [988]:
data_object = 'company'
data = pd.concat([data, pd.get_dummies(data[data_object], prefix=data_object, dtype=int)], axis=1)
data.drop(columns=[data_object], inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 30 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   product            912 non-null    object 
 1   typename           912 non-null    object 
 2   inches             912 non-null    float64
 3   screenresolution   912 non-null    object 
 4   cpu                912 non-null    object 
 5   ram                912 non-null    object 
 6   memory             912 non-null    object 
 7   gpu                912 non-null    object 
 8   opsys              912 non-null    object 
 9   weight             912 non-null    object 
 10  price              912 non-null    float64
 11  company_Acer       912 non-null    int64  
 12  company_Apple      912 non-null    int64  
 13  company_Asus       912 non-null    int64  
 14  company_Chuwi      912 non-null    int64  
 15  company_Dell       912 non-null    int64  
 16  company_Fujitsu    912 non
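
A toy frame shows what this encoding step does. This is only a sketch: the two rows are made up, and passing `columns=` to `pd.get_dummies` is a shortcut that should produce the same columns as the concat-and-drop pattern in the cell above, not the notebook's own code.

```python
import pandas as pd

# Toy data (illustrative values, not taken from train.csv)
toy = pd.DataFrame({'company': ['Dell', 'Asus'], 'inches': [15.6, 13.3]})

# One 0/1 column per category; the original 'company' column is removed
encoded = pd.get_dummies(toy, columns=['company'], dtype=int)
print(encoded.columns.tolist())
```

Each brand becomes its own `company_<name>` column, which is why the column count jumps after this cell.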

In [989]:
data_object = 'product'
data.drop(columns=[data_object], inplace=True)
data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 29 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   typename           912 non-null    object 
 1   inches             912 non-null    float64
 2   screenresolution   912 non-null    object 
 3   cpu                912 non-null    object 
 4   ram                912 non-null    object 
 5   memory             912 non-null    object 
 6   gpu                912 non-null    object 
 7   opsys              912 non-null    object 
 8   weight             912 non-null    object 
 9   price              912 non-null    float64
 10  company_Acer       912 non-null    int64  
 11  company_Apple      912 non-null    int64  
 12  company_Asus       912 non-null    int64  
 13  company_Chuwi      912 non-null    int64  
 14  company_Dell       912 non-null    int64  
 15  company_Fujitsu    912 non-null    int64  
 16  company_Google     912 non

In [990]:
data_object = 'typename'
data = pd.concat([data, pd.get_dummies(data[data_object], prefix=data_object, dtype=int)], axis=1)
data.drop(columns=[data_object], inplace=True)
data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 34 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   screenresolution             912 non-null    object 
 2   cpu                          912 non-null    object 
 3   ram                          912 non-null    object 
 4   memory                       912 non-null    object 
 5   gpu                          912 non-null    object 
 6   opsys                        912 non-null    object 
 7   weight                       912 non-null    object 
 8   price                        912 non-null    float64
 9   company_Acer                 912 non-null    int64  
 10  company_Apple                912 non-null    int64  
 11  company_Asus                 912 non-null    int64  
 12  company_Chuwi                912 non-null    int64  
 13  company_Dell        

In [991]:
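def process_screen_resolution(resolution):
    # Binary flags for screen keywords, plus the parsed pixel dimensions.
    # NOTE: 'screen_Touchscreen' and 'screen_Ultra' are initialised here but
    # never set below, so both columns stay 0 for every row.
    result = {
        'screen_4K': 0,
        'screen_HD': 0,
        'screen_Touchscreen': 0,
        'screen_Retina': 0,
        'screen_Ultra': 0,
        'screen_width': None,
        'screen_height': None
    }

    if '4K' in resolution:
        result['screen_4K'] = 1
    # 'HD' also matches 'Full HD' and 'Ultra HD'
    if 'HD' in resolution:
        result['screen_HD'] = 1
    if 'Retina' in resolution:
        result['screen_Retina'] = 1

    # Width and height in pixels, e.g. '1920x1080' -> 1920, 1080
    match = re.search(r'(\d{3,4})x(\d{3,4})', resolution)
    if match:
        width, height = match.groups()
        result['screen_width'] = int(width)
        result['screen_height'] = int(height)

    return result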

data_screen = data['screenresolution'].apply(process_screen_resolution)

data_screen = pd.DataFrame(data_screen.tolist())

data = pd.concat([data, data_screen], axis=1)

data.drop(columns=['screenresolution'], inplace=True)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 40 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   cpu                          912 non-null    object 
 2   ram                          912 non-null    object 
 3   memory                       912 non-null    object 
 4   gpu                          912 non-null    object 
 5   opsys                        912 non-null    object 
 6   weight                       912 non-null    object 
 7   price                        912 non-null    float64
 8   company_Acer                 912 non-null    int64  
 9   company_Apple                912 non-null    int64  
 10  company_Asus                 912 non-null    int64  
 11  company_Chuwi                912 non-null    int64  
 12  company_Dell                 912 non-null    int64  
 13  company_Fujitsu     
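
The `describe()` output further down shows `screen_Touchscreen` and `screen_Ultra` at 0 for every row, because `process_screen_resolution` initialises both keys but never sets them. Below is a minimal sketch of how they could be populated, assuming the keywords `Touchscreen` and `Ultra` appear literally in the resolution string (as in the `head()` rows shown earlier). `screen_flags` is a hypothetical helper, not part of the notebook.

```python
import re

def screen_flags(resolution):
    """Return touchscreen/ultra flags plus the parsed width and height."""
    match = re.search(r'(\d{3,4})x(\d{3,4})', resolution)
    width, height = (int(g) for g in match.groups()) if match else (None, None)
    return {
        'screen_Touchscreen': int('Touchscreen' in resolution),
        'screen_Ultra': int('Ultra' in resolution),
        'screen_width': width,
        'screen_height': height,
    }

sample = screen_flags('Full HD / Touchscreen 1920x1080')
print(sample)
```

Using these flags would change the feature matrix, so the metrics reported later in the notebook would no longer apply as printed.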

In [992]:
def extract_cpu_info(cpu_string):
    # Vendor flags; the clock speed is parsed only for Intel or AMD CPUs
    cpu_intel = int('Intel' in cpu_string)
    cpu_amd = int('AMD' in cpu_string)
    cpu_ghz = None

    if cpu_intel or cpu_amd:
        # e.g. 'Intel Core i5 7200U 2.5GHz' -> 2.5
        match = re.search(r'(\d+\.?\d*)GHz', cpu_string)
        if match:
            cpu_ghz = float(match.group(1))

    return pd.Series([cpu_intel, cpu_amd, cpu_ghz], index=['cpu_intel', 'cpu_amd', 'cpu_ghz'])

data[['cpu_intel', 'cpu_amd', 'cpu_ghz']] = data['cpu'].apply(extract_cpu_info)

data['cpu_intel'] = data['cpu_intel'].astype(int)
data['cpu_amd'] = data['cpu_amd'].astype(int)

data.drop(columns=['cpu'], inplace=True)

data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 42 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   ram                          912 non-null    object 
 2   memory                       912 non-null    object 
 3   gpu                          912 non-null    object 
 4   opsys                        912 non-null    object 
 5   weight                       912 non-null    object 
 6   price                        912 non-null    float64
 7   company_Acer                 912 non-null    int64  
 8   company_Apple                912 non-null    int64  
 9   company_Asus                 912 non-null    int64  
 10  company_Chuwi                912 non-null    int64  
 11  company_Dell                 912 non-null    int64  
 12  company_Fujitsu              912 non-null    int64  
 13  company_Google      

In [993]:
data.describe()


Unnamed: 0,inches,price,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,company_Fujitsu,company_Google,company_HP,...,screen_4K,screen_HD,screen_Touchscreen,screen_Retina,screen_Ultra,screen_width,screen_height,cpu_intel,cpu_amd,cpu_ghz
count,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,...,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0
mean,15.011404,1108.122873,0.082237,0.01864,0.118421,0.002193,0.226974,0.003289,0.002193,0.207237,...,0.037281,0.691886,0.0,0.015351,0.0,1890.050439,1067.899123,0.95614,0.04386,2.287895
std,1.411744,714.597741,0.274876,0.135325,0.323283,0.046804,0.419105,0.057291,0.046804,0.405549,...,0.189553,0.461967,0.0,0.123011,0.0,506.007321,290.338307,0.204895,0.204895,0.513277
min,10.1,174.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1366.0,768.0,0.0,0.0,0.9
25%,14.0,589.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1366.0,768.0,1.0,0.0,2.0
50%,15.6,949.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1920.0,1080.0,1.0,0.0,2.5
75%,15.6,1458.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1920.0,1080.0,1.0,0.0,2.7
max,18.4,6099.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,0.0,1.0,0.0,3840.0,2160.0,1.0,1.0,3.6


In [994]:
def extract_ram(ram_string):
    match = re.search(r'(\d+)GB', ram_string)
    if match:
        return int(match.group(1))
    return None

data['ram_numeric'] = data['ram'].apply(extract_ram)

data.drop(columns=['ram'], inplace=True)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 42 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   memory                       912 non-null    object 
 2   gpu                          912 non-null    object 
 3   opsys                        912 non-null    object 
 4   weight                       912 non-null    object 
 5   price                        912 non-null    float64
 6   company_Acer                 912 non-null    int64  
 7   company_Apple                912 non-null    int64  
 8   company_Asus                 912 non-null    int64  
 9   company_Chuwi                912 non-null    int64  
 10  company_Dell                 912 non-null    int64  
 11  company_Fujitsu              912 non-null    int64  
 12  company_Google               912 non-null    int64  
 13  company_HP          

In [995]:
def extract_memory_info(memory_string):
    memory_types = {
        'memory_HDD': 0,
        'memory_SSD': 0,
        'memory_Flash': 0
    }
    memory_sizes = {    
        'memory_HDD_GB': 0,
        'memory_SSD_GB': 0,
        'memory_Flash_GB': 0
    }
    
    matches = re.findall(r'(\d+)([A-Za-z ]+)', memory_string)
    
    for match in matches:
        size = int(match[0])
        unit = match[1].strip().upper()
        
        if 'TB' in unit:
            size *= 1000  # convert TB to GB (1 TB = 1000 GB)
        
        if 'HDD' in unit:
            memory_types['memory_HDD'] = 1
            memory_sizes['memory_HDD_GB'] += size
        elif 'SSD' in unit:
            memory_types['memory_SSD'] = 1
            memory_sizes['memory_SSD_GB'] += size
        elif 'FLASH STORAGE' in unit:
            memory_types['memory_Flash'] = 1
            memory_sizes['memory_Flash_GB'] += size
    
    result = {**memory_types, **memory_sizes}
    return pd.Series(result)

data[['memory_HDD', 'memory_SSD', 'memory_Flash', 'memory_HDD_GB', 'memory_SSD_GB', 'memory_Flash_GB']] = data['memory'].apply(extract_memory_info)

data.drop(columns=['memory'], inplace=True)

data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 47 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   gpu                          912 non-null    object 
 2   opsys                        912 non-null    object 
 3   weight                       912 non-null    object 
 4   price                        912 non-null    float64
 5   company_Acer                 912 non-null    int64  
 6   company_Apple                912 non-null    int64  
 7   company_Asus                 912 non-null    int64  
 8   company_Chuwi                912 non-null    int64  
 9   company_Dell                 912 non-null    int64  
 10  company_Fujitsu              912 non-null    int64  
 11  company_Google               912 non-null    int64  
 12  company_HP                   912 non-null    int64  
 13  company_Huawei      
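
To see what `extract_memory_info` works with, it helps to run its regex on a couple of strings. The first comes from the `head()` output above; the second is a hypothetical value used only to show an edge case and may not occur in this dataset.

```python
import re

pattern = r'(\d+)([A-Za-z ]+)'

# Combined drive: one (size, unit) tuple per storage device
combined = re.findall(pattern, '256GB SSD + 1TB HDD')
print(combined)

# Hypothetical decimal size: only the digits after the dot are captured
decimal = re.findall(pattern, '1.0TB Hybrid')
print(decimal)
```

With the function above, the second string would contribute a size of 0 and match none of the HDD/SSD/Flash branches, so it is worth scanning the raw `memory` values for such cases before relying on the `_GB` columns.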

In [996]:
data.describe()

Unnamed: 0,inches,price,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,company_Fujitsu,company_Google,company_HP,...,cpu_intel,cpu_amd,cpu_ghz,ram_numeric,memory_HDD,memory_SSD,memory_Flash,memory_HDD_GB,memory_SSD_GB,memory_Flash_GB
count,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,...,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0
mean,15.011404,1108.122873,0.082237,0.01864,0.118421,0.002193,0.226974,0.003289,0.002193,0.207237,...,0.95614,0.04386,2.287895,8.188596,0.4375,0.64364,0.059211,400.254386,180.074561,5.22807
std,1.411744,714.597741,0.274876,0.135325,0.323283,0.046804,0.419105,0.057291,0.046804,0.405549,...,0.204895,0.204895,0.513277,4.899827,0.496351,0.479186,0.236148,503.985613,184.580513,34.250715
min,10.1,174.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.9,2.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,14.0,589.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,15.6,949.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,2.5,8.0,0.0,1.0,0.0,0.0,180.0,0.0
75%,15.6,1458.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,2.7,8.0,1.0,1.0,0.0,1000.0,256.0,0.0
max,18.4,6099.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,3.6,32.0,1.0,1.0,1.0,2000.0,1024.0,512.0


In [997]:
data.head()

Unnamed: 0,inches,gpu,opsys,weight,price,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,...,cpu_intel,cpu_amd,cpu_ghz,ram_numeric,memory_HDD,memory_SSD,memory_Flash,memory_HDD_GB,memory_SSD_GB,memory_Flash_GB
0,15.6,AMD Radeon R7 M445,Windows 10,2.36kg,749.0,0,0,0,0,1,...,1,0,2.7,8,1,0,0,1000,0,0
1,15.6,Intel HD Graphics 505,Windows 10,2kg,449.0,0,0,1,0,0,...,1,0,1.1,4,1,0,0,1000,0,0
2,13.3,Intel HD Graphics 520,Windows 7,1.2kg,1460.0,0,0,0,0,0,...,1,0,2.3,8,0,1,0,0,256,0
3,15.6,Nvidia GeForce GTX 1070,Windows 10,4.42kg,2868.99,0,0,0,0,1,...,1,0,2.8,16,1,1,0,1000,256,0
4,12.5,Intel HD Graphics 520,Windows 7,1.26kg,1713.37,0,0,0,0,1,...,1,0,2.4,8,0,1,0,0,256,0


In [998]:
data_object = 'gpu'
# data = pd.concat([data, pd.get_dummies(data[data_object], prefix=data_object, dtype=int)], axis=1)
data.drop(columns=[data_object], inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 46 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   opsys                        912 non-null    object 
 2   weight                       912 non-null    object 
 3   price                        912 non-null    float64
 4   company_Acer                 912 non-null    int64  
 5   company_Apple                912 non-null    int64  
 6   company_Asus                 912 non-null    int64  
 7   company_Chuwi                912 non-null    int64  
 8   company_Dell                 912 non-null    int64  
 9   company_Fujitsu              912 non-null    int64  
 10  company_Google               912 non-null    int64  
 11  company_HP                   912 non-null    int64  
 12  company_Huawei               912 non-null    int64  
 13  company_LG          

In [999]:
data_object = 'opsys'
data = pd.concat([data, pd.get_dummies(data[data_object], prefix=data_object, dtype=int)], axis=1)
data.drop(columns=[data_object], inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 54 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   weight                       912 non-null    object 
 2   price                        912 non-null    float64
 3   company_Acer                 912 non-null    int64  
 4   company_Apple                912 non-null    int64  
 5   company_Asus                 912 non-null    int64  
 6   company_Chuwi                912 non-null    int64  
 7   company_Dell                 912 non-null    int64  
 8   company_Fujitsu              912 non-null    int64  
 9   company_Google               912 non-null    int64  
 10  company_HP                   912 non-null    int64  
 11  company_Huawei               912 non-null    int64  
 12  company_LG                   912 non-null    int64  
 13  company_Lenovo      

In [1000]:
def extract_weight(weight_string):
    match = re.search(r'(\d+(\.\d+)?)', weight_string)
    if match:
        return float(match.group(1))
    return None

data['weight_kg'] = data['weight'].apply(extract_weight)

data.drop(columns=['weight'], inplace=True)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 54 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   price                        912 non-null    float64
 2   company_Acer                 912 non-null    int64  
 3   company_Apple                912 non-null    int64  
 4   company_Asus                 912 non-null    int64  
 5   company_Chuwi                912 non-null    int64  
 6   company_Dell                 912 non-null    int64  
 7   company_Fujitsu              912 non-null    int64  
 8   company_Google               912 non-null    int64  
 9   company_HP                   912 non-null    int64  
 10  company_Huawei               912 non-null    int64  
 11  company_LG                   912 non-null    int64  
 12  company_Lenovo               912 non-null    int64  
 13  company_MSI         

In [1001]:
data.head()

Unnamed: 0,inches,price,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,company_Fujitsu,company_Google,company_HP,...,opsys_Android,opsys_Chrome OS,opsys_Linux,opsys_Mac OS X,opsys_No OS,opsys_Windows 10,opsys_Windows 10 S,opsys_Windows 7,opsys_macOS,weight_kg
0,15.6,749.0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,2.36
1,15.6,449.0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,2.0
2,13.3,1460.0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,1.2
3,15.6,2868.99,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,4.42
4,12.5,1713.37,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,1,0,1.26


In [1002]:
columns = list(data.columns)

index_inches = columns.index('inches')
index_price = columns.index('price')

columns[index_inches], columns[index_price] = columns[index_price], columns[index_inches]

data = data[columns]

In [1003]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 54 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   price                        912 non-null    float64
 1   inches                       912 non-null    float64
 2   company_Acer                 912 non-null    int64  
 3   company_Apple                912 non-null    int64  
 4   company_Asus                 912 non-null    int64  
 5   company_Chuwi                912 non-null    int64  
 6   company_Dell                 912 non-null    int64  
 7   company_Fujitsu              912 non-null    int64  
 8   company_Google               912 non-null    int64  
 9   company_HP                   912 non-null    int64  
 10  company_Huawei               912 non-null    int64  
 11  company_LG                   912 non-null    int64  
 12  company_Lenovo               912 non-null    int64  
 13  company_MSI         

#### TRAIN


In [1004]:
# Dummy columns excluded from the model inputs
features = ['company_Fujitsu', 'company_Mediacom', 'opsys_Android']
data = data.drop(features, axis=1)
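
The three columns dropped here are presumably categories with no counterpart in `test.csv` (the test `info()` output later in the notebook lists no `company_Fujitsu` column). An alternative to picking such columns by hand is to reindex the test matrix to the training columns. The sketch below uses made-up frames and hypothetical names (`train_X`, `test_X`, `company_Vero`); it is not the approach the notebook takes.

```python
import pandas as pd

# Toy feature matrices with mismatched dummy columns (illustrative only)
train_X = pd.DataFrame({
    'inches': [15.6, 13.3],
    'company_Dell': [1, 0],
    'company_Fujitsu': [0, 1],
})
test_X = pd.DataFrame({
    'inches': [14.0],
    'company_Dell': [1],
    'company_Vero': [0],
})

# Keep exactly the training columns, in the same order:
# columns missing from the test set are filled with 0,
# columns unseen during training are discarded.
aligned_test = test_X.reindex(columns=train_X.columns, fill_value=0)
print(aligned_test.columns.tolist())
```

This keeps every training category in the model and guarantees that `predict` receives the columns in the order used by `fit`.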

In [1005]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 51 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   price                        912 non-null    float64
 1   inches                       912 non-null    float64
 2   company_Acer                 912 non-null    int64  
 3   company_Apple                912 non-null    int64  
 4   company_Asus                 912 non-null    int64  
 5   company_Chuwi                912 non-null    int64  
 6   company_Dell                 912 non-null    int64  
 7   company_Google               912 non-null    int64  
 8   company_HP                   912 non-null    int64  
 9   company_Huawei               912 non-null    int64  
 10  company_LG                   912 non-null    int64  
 11  company_Lenovo               912 non-null    int64  
 12  company_MSI                  912 non-null    int64  
 13  company_Microsoft   

In [1006]:
X = data.drop('price', axis=1)
y = data['price']


In [1007]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 912 entries, 0 to 911
Data columns (total 50 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       912 non-null    float64
 1   company_Acer                 912 non-null    int64  
 2   company_Apple                912 non-null    int64  
 3   company_Asus                 912 non-null    int64  
 4   company_Chuwi                912 non-null    int64  
 5   company_Dell                 912 non-null    int64  
 6   company_Google               912 non-null    int64  
 7   company_HP                   912 non-null    int64  
 8   company_Huawei               912 non-null    int64  
 9   company_LG                   912 non-null    int64  
 10  company_Lenovo               912 non-null    int64  
 11  company_MSI                  912 non-null    int64  
 12  company_Microsoft            912 non-null    int64  
 13  company_Razer       

In [1008]:
X.describe()

Unnamed: 0,inches,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,company_Google,company_HP,company_Huawei,company_LG,...,memory_Flash_GB,opsys_Chrome OS,opsys_Linux,opsys_Mac OS X,opsys_No OS,opsys_Windows 10,opsys_Windows 10 S,opsys_Windows 7,opsys_macOS,weight_kg
count,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,...,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0,912.0
mean,15.011404,0.082237,0.01864,0.118421,0.002193,0.226974,0.002193,0.207237,0.001096,0.001096,...,5.22807,0.02193,0.051535,0.006579,0.048246,0.817982,0.004386,0.035088,0.012061,2.032112
std,1.411744,0.274876,0.135325,0.323283,0.046804,0.419105,0.046804,0.405549,0.033113,0.033113,...,34.250715,0.146535,0.221208,0.080888,0.214402,0.386071,0.066117,0.184103,0.10922,0.653772
min,10.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.69
25%,14.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.5
50%,15.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.04
75%,15.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.3
max,18.4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,512.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.6


In [1009]:
# scaler = StandardScaler()
# X = scaler.fit_transform(X)

# X_df = pd.DataFrame(X, columns=data.columns[1:])
# X_df.describe()

In [1010]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [1011]:
print('X_train:', X_train.shape)
print('X_test:', X_test.shape)
print('y_train:', y_train.shape)
print('y_test:', y_test.shape)


X_train: (729, 50)
X_test: (183, 50)
y_train: (729,)
y_test: (183,)


In [1012]:
model = LinearRegression()

model.fit(X_train, y_train)



In [1013]:
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae}')

mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

rmse = root_mean_squared_error(y_test, y_pred)
print(f'Root Mean Squared Error: {rmse}')

r2 = r2_score(y_test, y_pred)
print(f'R-squared: {r2}')


Mean Absolute Error: 230.34785024428106
Mean Squared Error: 111957.27595093877
Root Mean Squared Error: 334.6001732679449
R-squared: 0.7407832968079258


In [1014]:
predicciones = model.predict(X_test)

print(predicciones)

[1622.22153965 1111.15792254  971.84083928 1015.53601218 1685.4578946
  337.20087218  394.63549575  319.00596486  487.69272013  838.98551545
  263.55238477 2477.27969567  702.22779865 1118.54772872  397.69293824
 1299.99555375  462.38968649 1329.26832124  137.11302763  882.97298223
  600.1504275   984.49722647 1616.8685592   604.75633689 1432.44336371
  617.91267377 1077.04742053  942.16054238 1425.36576429 1141.84806198
  861.38699436 2087.12082358  421.01815068  988.74371404  324.4737917
  561.42727221  -81.56739764 2086.19072702 1222.54670573 1995.6516151
  857.33761277  919.68488544 2121.88466868  192.26096162  571.26303264
  303.74701108 1384.32589074  635.68354348 1072.10630178 1114.37325799
 1840.03341855 1811.65467872  804.48479896 1267.32162067 1203.00725582
 1036.9845628   257.66034529  441.35156309  727.04357087 1243.97205416
 1271.19203704  533.19554327 1154.10541925 1103.72229408 1864.39007412
  893.21960684 1039.41191021  522.45049574 1013.74960762  712.34236272
  305.591

#### PREDICT

In [1015]:
data_test = pd.read_csv('./data/test.csv')

In [1016]:
data_test = data_test.rename(columns=str.lower)

In [1017]:
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                391 non-null    int64  
 1   company           391 non-null    object 
 2   product           391 non-null    object 
 3   typename          391 non-null    object 
 4   inches            391 non-null    float64
 5   screenresolution  391 non-null    object 
 6   cpu               391 non-null    object 
 7   ram               391 non-null    object 
 8   memory            391 non-null    object 
 9   gpu               391 non-null    object 
 10  opsys             391 non-null    object 
 11  weight            391 non-null    object 
dtypes: float64(1), int64(1), object(10)
memory usage: 36.8+ KB


In [1018]:
data_object = 'company'
data_test = pd.concat([data_test, pd.get_dummies(data_test[data_object], prefix=data_object, dtype=int)], axis=1)
data_test.drop(columns=[data_object], inplace=True)
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 28 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 391 non-null    int64  
 1   product            391 non-null    object 
 2   typename           391 non-null    object 
 3   inches             391 non-null    float64
 4   screenresolution   391 non-null    object 
 5   cpu                391 non-null    object 
 6   ram                391 non-null    object 
 7   memory             391 non-null    object 
 8   gpu                391 non-null    object 
 9   opsys              391 non-null    object 
 10  weight             391 non-null    object 
 11  company_Acer       391 non-null    int64  
 12  company_Apple      391 non-null    int64  
 13  company_Asus       391 non-null    int64  
 14  company_Chuwi      391 non-null    int64  
 15  company_Dell       391 non-null    int64  
 16  company_Google     391 non

In [1019]:
data_object = 'product'
data_test.drop(columns=[data_object], inplace=True)
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 391 non-null    int64  
 1   typename           391 non-null    object 
 2   inches             391 non-null    float64
 3   screenresolution   391 non-null    object 
 4   cpu                391 non-null    object 
 5   ram                391 non-null    object 
 6   memory             391 non-null    object 
 7   gpu                391 non-null    object 
 8   opsys              391 non-null    object 
 9   weight             391 non-null    object 
 10  company_Acer       391 non-null    int64  
 11  company_Apple      391 non-null    int64  
 12  company_Asus       391 non-null    int64  
 13  company_Chuwi      391 non-null    int64  
 14  company_Dell       391 non-null    int64  
 15  company_Google     391 non-null    int64  
 16  company_HP         391 non

In [1020]:
data_object = 'typename'
data_test = pd.concat([data_test, pd.get_dummies(data_test[data_object], prefix=data_object, dtype=int)], axis=1)
data_test.drop(columns=[data_object], inplace=True)
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 32 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   screenresolution             391 non-null    object 
 3   cpu                          391 non-null    object 
 4   ram                          391 non-null    object 
 5   memory                       391 non-null    object 
 6   gpu                          391 non-null    object 
 7   opsys                        391 non-null    object 
 8   weight                       391 non-null    object 
 9   company_Acer                 391 non-null    int64  
 10  company_Apple                391 non-null    int64  
 11  company_Asus                 391 non-null    int64  
 12  company_Chuwi                391 non-null    int64  
 13  company_Dell        

In [1021]:
def process_screen_resolution(resolution):
    # Flags default to 0; width and height stay None when no "WxH" pattern is found.
    result = {
        'screen_4K': 0,
        'screen_HD': 0,
        'screen_Touchscreen': 0,  # always 0: no check below sets this flag
        'screen_Retina': 0,
        'screen_Ultra': 0,  # always 0: no check below sets this flag
        'screen_width': None,
        'screen_height': None
    }

    if '4K' in resolution:
        result['screen_4K'] = 1
    # 'HD' also matches 'Full HD', so a single substring check is enough.
    if 'HD' in resolution:
        result['screen_HD'] = 1
    if 'Retina' in resolution:
        result['screen_Retina'] = 1

    # Extract the pixel dimensions, e.g. "1920x1080" -> (1920, 1080).
    match = re.search(r'(\d{3,4})x(\d{3,4})', resolution)
    if match:
        width, height = match.groups()
        result['screen_width'] = int(width)
        result['screen_height'] = int(height)

    return result

data_screen = data_test['screenresolution'].apply(process_screen_resolution)

data_screen = pd.DataFrame(data_screen.tolist())

data_test = pd.concat([data_test, data_screen], axis=1)

data_test.drop(columns=['screenresolution'], inplace=True)

data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 38 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   cpu                          391 non-null    object 
 3   ram                          391 non-null    object 
 4   memory                       391 non-null    object 
 5   gpu                          391 non-null    object 
 6   opsys                        391 non-null    object 
 7   weight                       391 non-null    object 
 8   company_Acer                 391 non-null    int64  
 9   company_Apple                391 non-null    int64  
 10  company_Asus                 391 non-null    int64  
 11  company_Chuwi                391 non-null    int64  
 12  company_Dell                 391 non-null    int64  
 13  company_Google      

In [1022]:
def extract_cpu_info(cpu_string):
    # Vendor flags; a CPU from any other vendor leaves both at 0.
    cpu_intel = int('Intel' in cpu_string)
    cpu_amd = int('AMD' in cpu_string)
    cpu_ghz = None

    # Clock speed is only parsed for Intel and AMD parts; other vendors keep None.
    if cpu_intel or cpu_amd:
        match = re.search(r'(\d+\.?\d*)GHz', cpu_string)
        if match:
            cpu_ghz = float(match.group(1))

    return pd.Series([cpu_intel, cpu_amd, cpu_ghz], index=['cpu_intel', 'cpu_amd', 'cpu_ghz'])

data_test[['cpu_intel', 'cpu_amd', 'cpu_ghz']] = data_test['cpu'].apply(extract_cpu_info)

data_test['cpu_intel'] = data_test['cpu_intel'].astype(int)
data_test['cpu_amd'] = data_test['cpu_amd'].astype(int)

data_test.drop(columns=['cpu'], inplace=True)

data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 40 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   ram                          391 non-null    object 
 3   memory                       391 non-null    object 
 4   gpu                          391 non-null    object 
 5   opsys                        391 non-null    object 
 6   weight                       391 non-null    object 
 7   company_Acer                 391 non-null    int64  
 8   company_Apple                391 non-null    int64  
 9   company_Asus                 391 non-null    int64  
 10  company_Chuwi                391 non-null    int64  
 11  company_Dell                 391 non-null    int64  
 12  company_Google               391 non-null    int64  
 13  company_HP          

In [1023]:
def extract_ram(ram_string):
    match = re.search(r'(\d+)GB', ram_string)
    if match:
        return int(match.group(1))
    return None

data_test['ram_numeric'] = data_test['ram'].apply(extract_ram)

data_test.drop(columns=['ram'], inplace=True)

data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 40 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   memory                       391 non-null    object 
 3   gpu                          391 non-null    object 
 4   opsys                        391 non-null    object 
 5   weight                       391 non-null    object 
 6   company_Acer                 391 non-null    int64  
 7   company_Apple                391 non-null    int64  
 8   company_Asus                 391 non-null    int64  
 9   company_Chuwi                391 non-null    int64  
 10  company_Dell                 391 non-null    int64  
 11  company_Google               391 non-null    int64  
 12  company_HP                   391 non-null    int64  
 13  company_Huawei      

In [1024]:
def extract_memory_info(memory_string):
    # Presence flag per storage type
    memory_types = {
        'memory_HDD': 0,
        'memory_SSD': 0,
        'memory_Flash': 0
    }
    # Total capacity per storage type, in GB
    memory_sizes = {
        'memory_HDD_GB': 0,
        'memory_SSD_GB': 0,
        'memory_Flash_GB': 0
    }

    # Capture "<integer><text>" pairs such as "256GB SSD" or "1TB HDD".
    # Only integer sizes are matched, so "1.0TB HDD" is read as "0TB HDD".
    matches = re.findall(r'(\d+)([A-Za-z ]+)', memory_string)

    for match in matches:
        size = int(match[0])
        unit = match[1].strip().upper()

        # Convert terabytes to gigabytes (decimal convention: 1 TB = 1000 GB)
        if 'TB' in unit:
            size *= 1000

        # Storage types not listed here (e.g. hybrid drives) are ignored.
        if 'HDD' in unit:
            memory_types['memory_HDD'] = 1
            memory_sizes['memory_HDD_GB'] += size
        elif 'SSD' in unit:
            memory_types['memory_SSD'] = 1
            memory_sizes['memory_SSD_GB'] += size
        elif 'FLASH STORAGE' in unit:
            memory_types['memory_Flash'] = 1
            memory_sizes['memory_Flash_GB'] += size

    result = {**memory_types, **memory_sizes}
    return pd.Series(result)

data_test[['memory_HDD', 'memory_SSD', 'memory_Flash', 'memory_HDD_GB', 'memory_SSD_GB', 'memory_Flash_GB']] = data_test['memory'].apply(extract_memory_info)

data_test.drop(columns=['memory'], inplace=True)

data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 45 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   gpu                          391 non-null    object 
 3   opsys                        391 non-null    object 
 4   weight                       391 non-null    object 
 5   company_Acer                 391 non-null    int64  
 6   company_Apple                391 non-null    int64  
 7   company_Asus                 391 non-null    int64  
 8   company_Chuwi                391 non-null    int64  
 9   company_Dell                 391 non-null    int64  
 10  company_Google               391 non-null    int64  
 11  company_HP                   391 non-null    int64  
 12  company_Huawei               391 non-null    int64  
 13  company_LG          
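
The pattern `(\d+)([A-Za-z ]+)` used in `extract_memory_info` captures only integer sizes. If a value such as `1.0TB HDD` appears in the `memory` column, it is read as `0TB HDD` and contributes 0 GB. The following is a minimal sketch of a decimal-aware parser; `parse_memory_gb` and `_MEMORY_PATTERN` are hypothetical names, not part of the notebook, and swapping this in would require re-running the same extraction on the training data so both sets stay consistent.

```python
import re

# Hypothetical pattern: size with optional decimals, unit, then storage type.
_MEMORY_PATTERN = re.compile(
    r'(\d+(?:\.\d+)?)\s*(GB|TB)\s+(HDD|SSD|Flash Storage)',
    re.IGNORECASE,
)

def parse_memory_gb(memory_string):
    """Return total capacity in GB per storage type (illustrative helper)."""
    sizes = {'memory_HDD_GB': 0.0, 'memory_SSD_GB': 0.0, 'memory_Flash_GB': 0.0}
    for number, unit, kind in _MEMORY_PATTERN.findall(memory_string):
        # Decimal convention, matching the notebook: 1 TB = 1000 GB
        size_gb = float(number) * (1000 if unit.upper() == 'TB' else 1)
        if kind.upper().startswith('FLASH'):
            key = 'memory_Flash_GB'
        else:
            key = f'memory_{kind.upper()}_GB'
        sizes[key] += size_gb
    return sizes

print(parse_memory_gb('128GB SSD +  1TB HDD'))
print(parse_memory_gb('1.0TB HDD'))
```

As in the original function, storage types outside HDD, SSD and Flash Storage are skipped.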

In [1025]:
data_object = 'gpu'
# data = pd.concat([data, pd.get_dummies(data[data_object], prefix=data_object, dtype=int)], axis=1)
data_test.drop(columns=[data_object], inplace=True)
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 44 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   opsys                        391 non-null    object 
 3   weight                       391 non-null    object 
 4   company_Acer                 391 non-null    int64  
 5   company_Apple                391 non-null    int64  
 6   company_Asus                 391 non-null    int64  
 7   company_Chuwi                391 non-null    int64  
 8   company_Dell                 391 non-null    int64  
 9   company_Google               391 non-null    int64  
 10  company_HP                   391 non-null    int64  
 11  company_Huawei               391 non-null    int64  
 12  company_LG                   391 non-null    int64  
 13  company_Lenovo      

In [1026]:
data_object = 'opsys'
data_test = pd.concat([data_test, pd.get_dummies(data_test[data_object], prefix=data_object, dtype=int)], axis=1)
data_test.drop(columns=[data_object], inplace=True)
data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 51 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   weight                       391 non-null    object 
 3   company_Acer                 391 non-null    int64  
 4   company_Apple                391 non-null    int64  
 5   company_Asus                 391 non-null    int64  
 6   company_Chuwi                391 non-null    int64  
 7   company_Dell                 391 non-null    int64  
 8   company_Google               391 non-null    int64  
 9   company_HP                   391 non-null    int64  
 10  company_Huawei               391 non-null    int64  
 11  company_LG                   391 non-null    int64  
 12  company_Lenovo               391 non-null    int64  
 13  company_MSI         

In [1027]:
def extract_weight(weight_string):
    match = re.search(r'(\d+(\.\d+)?)', weight_string)
    if match:
        return float(match.group(1))
    return None

data_test['weight_kg'] = data_test['weight'].apply(extract_weight)

data_test.drop(columns=['weight'], inplace=True)

data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 51 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           391 non-null    int64  
 1   inches                       391 non-null    float64
 2   company_Acer                 391 non-null    int64  
 3   company_Apple                391 non-null    int64  
 4   company_Asus                 391 non-null    int64  
 5   company_Chuwi                391 non-null    int64  
 6   company_Dell                 391 non-null    int64  
 7   company_Google               391 non-null    int64  
 8   company_HP                   391 non-null    int64  
 9   company_Huawei               391 non-null    int64  
 10  company_LG                   391 non-null    int64  
 11  company_Lenovo               391 non-null    int64  
 12  company_MSI                  391 non-null    int64  
 13  company_Microsoft   

In [1028]:
data_test.isna().sum()

id                             0
inches                         0
company_Acer                   0
company_Apple                  0
company_Asus                   0
company_Chuwi                  0
company_Dell                   0
company_Google                 0
company_HP                     0
company_Huawei                 0
company_LG                     0
company_Lenovo                 0
company_MSI                    0
company_Microsoft              0
company_Razer                  0
company_Samsung                0
company_Toshiba                0
company_Vero                   0
company_Xiaomi                 0
typename_2 in 1 Convertible    0
typename_Gaming                0
typename_Netbook               0
typename_Notebook              0
typename_Ultrabook             0
typename_Workstation           0
screen_4K                      0
screen_HD                      0
screen_Touchscreen             0
screen_Retina                  0
screen_Ultra                   0
screen_wid

In [1029]:
median_values = data_test.median()

data_test = data_test.fillna(median_values)
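
The cell above imputes missing values with medians computed on the test set itself. A common alternative is to reuse the medians learned from the training features, so the test rows are filled with the same values the model saw during fitting. A minimal sketch on toy data follows; `train_features` and `test_features` are assumed stand-ins for the processed training and test frames, not names from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the processed training and test features (assumed names).
train_features = pd.DataFrame({
    'cpu_ghz': [2.0, 2.5, 3.0],
    'screen_width': [1366, 1920, 1920],
})
test_features = pd.DataFrame({
    'cpu_ghz': [np.nan, 2.7],
    'screen_width': [1920, np.nan],
})

# Medians come from the training data only.
train_medians = train_features.median()

# fillna with a Series fills each column with the value under the same label.
test_imputed = test_features.fillna(train_medians)
print(test_imputed)
```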

In [1030]:
data_test.isna().sum()

id                             0
inches                         0
company_Acer                   0
company_Apple                  0
company_Asus                   0
company_Chuwi                  0
company_Dell                   0
company_Google                 0
company_HP                     0
company_Huawei                 0
company_LG                     0
company_Lenovo                 0
company_MSI                    0
company_Microsoft              0
company_Razer                  0
company_Samsung                0
company_Toshiba                0
company_Vero                   0
company_Xiaomi                 0
typename_2 in 1 Convertible    0
typename_Gaming                0
typename_Netbook               0
typename_Notebook              0
typename_Ultrabook             0
typename_Workstation           0
screen_4K                      0
screen_HD                      0
screen_Touchscreen             0
screen_Retina                  0
screen_Ultra                   0
screen_wid

In [1031]:

data_test_copy = data_test.drop('id', axis=1)
# features = ['company_Fujitsu', 'company_Mediacom', 'opsys_Android']
# for feature in features:
#     if feature not in data_test.columns:
#         data_test[feature] = 0
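
The commented-out loop above points at a real risk: `pd.get_dummies` only creates columns for categories present in the frame it is given, so the test set can lack columns the model was trained on (or contain extra ones). One way to handle this is `DataFrame.reindex`, sketched below on toy data. `train_columns` is an assumed stand-in for the column list of the training feature matrix; the values shown are illustrative only.

```python
import pandas as pd

# Assumed: the exact column order used when fitting the model.
train_columns = ['inches', 'company_Acer', 'company_Fujitsu', 'opsys_Android']

# Toy test features: 'company_Fujitsu' and 'opsys_Android' are missing,
# and 'company_Vero' never appeared in training.
test_features = pd.DataFrame({
    'inches': [15.6, 14.0],
    'company_Acer': [1, 0],
    'company_Vero': [0, 1],
})

# Add missing dummy columns as 0, drop unseen ones, and match the column order.
aligned = test_features.reindex(columns=train_columns, fill_value=0)
print(aligned)
```

Matching the order matters as much as matching the names, since the fitted coefficients are tied to column positions.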

In [1032]:
data_test_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 50 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       391 non-null    float64
 1   company_Acer                 391 non-null    int64  
 2   company_Apple                391 non-null    int64  
 3   company_Asus                 391 non-null    int64  
 4   company_Chuwi                391 non-null    int64  
 5   company_Dell                 391 non-null    int64  
 6   company_Google               391 non-null    int64  
 7   company_HP                   391 non-null    int64  
 8   company_Huawei               391 non-null    int64  
 9   company_LG                   391 non-null    int64  
 10  company_Lenovo               391 non-null    int64  
 11  company_MSI                  391 non-null    int64  
 12  company_Microsoft            391 non-null    int64  
 13  company_Razer       

In [1033]:
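X_test = data_test_copy

# No-op here because missing values were already imputed; dropping rows
# would otherwise break the row-for-row match with data_test['id'].
X_test = X_test.dropna()

X_test.info()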


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 50 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   inches                       391 non-null    float64
 1   company_Acer                 391 non-null    int64  
 2   company_Apple                391 non-null    int64  
 3   company_Asus                 391 non-null    int64  
 4   company_Chuwi                391 non-null    int64  
 5   company_Dell                 391 non-null    int64  
 6   company_Google               391 non-null    int64  
 7   company_HP                   391 non-null    int64  
 8   company_Huawei               391 non-null    int64  
 9   company_LG                   391 non-null    int64  
 10  company_Lenovo               391 non-null    int64  
 11  company_MSI                  391 non-null    int64  
 12  company_Microsoft            391 non-null    int64  
 13  company_Razer       

In [1034]:
X_test.describe()



Unnamed: 0,inches,company_Acer,company_Apple,company_Asus,company_Chuwi,company_Dell,company_Google,company_HP,company_Huawei,company_LG,...,memory_Flash_GB,opsys_Chrome OS,opsys_Linux,opsys_Mac OS X,opsys_No OS,opsys_Windows 10,opsys_Windows 10 S,opsys_Windows 7,opsys_macOS,weight_kg
count,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0,...,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0,391.0
mean,15.030691,0.071611,0.01023,0.127877,0.002558,0.230179,0.002558,0.217391,0.002558,0.005115,...,2.987212,0.017903,0.038363,0.005115,0.056266,0.83376,0.01023,0.033248,0.005115,2.054179
std,1.461446,0.258173,0.100755,0.334381,0.050572,0.421487,0.050572,0.412999,0.050572,0.071428,...,17.775735,0.132768,0.192318,0.071428,0.23073,0.372773,0.100755,0.179513,0.071428,0.692613
min,11.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.91
25%,14.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.49
50%,15.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.05
75%,15.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.345
max,17.3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,256.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.7


In [1035]:
predicciones = model.predict(X_test)

In [1036]:
print(predicciones.shape)
print(predicciones)


(391,)
[2513.57594925  635.57618921  381.25626391 1460.51985635 1082.20423449
 1700.84043763 3072.80658892  490.60349646 4725.90406996 1402.51200874
 1440.95789144  665.57796313  337.00217327  580.38986987  341.41155819
  999.64163631  715.17497473 1518.03860352 1229.53176067 1613.03407163
 1550.93846153  988.95890115 1476.29590771 1078.99301309 1792.34231966
  436.70313592  781.66119487  338.0719289  1101.75769017  466.60952436
 1222.55523149 1119.03531795 1716.3183712   377.91792868  452.48338428
 1813.39792708  853.72444064  665.72021171 2850.51528195  377.48507089
  759.78334898  791.21980803 1523.87599475 1765.87965771 1113.73837239
 1328.497615    393.37416902 3178.83967748   88.95702337 1681.44570751
 1105.02469621  768.70384724  854.31614781  310.61494237 1262.00882335
  577.59884705  705.22667901  270.5977999   963.36016977 1284.50126775
 1438.03246618 1193.26804759  815.92806846 1019.05165563 1066.94114695
 1559.68334654  695.28128276 2098.84815534  464.20164991 2153.92671739

In [1037]:
# Pair each test ID with its predicted price, rounded to one decimal.
df = pd.DataFrame({'ID': data_test['id'], 'Price_euros': predicciones})
df['Price_euros'] = df['Price_euros'].round(1)
df.describe()

               ID  Price_euros
count       391.0        391.0
mean   646.074169  1166.706394
std      372.5007   647.456799
min          18.0        -84.8
25%         335.0        672.6
50%         629.0       1074.5
75%         955.5      1521.75
max        1319.0       4725.9
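
The summary shows a minimum predicted price of -84.8, which is possible because a linear model's output is unbounded. One simple post-processing option is to clip predictions at a floor before writing the submission. A minimal sketch, assuming a floor of 0 (the `min_price` value is a hypothetical choice, not something the notebook specifies):

```python
import numpy as np

# Toy predictions, including one negative value like the one in the summary.
raw_predictions = np.array([2513.6, -84.8, 381.3])

# Hypothetical floor; a realistic minimum price could be used instead.
min_price = 0.0

# Values below the floor are raised to it; there is no upper bound.
clipped = np.clip(raw_predictions, min_price, None)
print(clipped)
```

Clipping only hides the symptom; negative outputs usually suggest the model extrapolates poorly for some rows.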


In [1038]:
import os
from datetime import datetime

# Get the current date and time
current_datetime = datetime.now()

# Construct the filename
filename = f"submission_dgerwig_{current_datetime.strftime('%Y_%m_%d__%H_%M')}.csv"

# Directory where the file will be saved
directory = "submissions"

# Create the directory if it doesn't exist
os.makedirs(directory, exist_ok=True)

# Full path for the file
filepath = os.path.join(directory, filename)

# Write the submission without the DataFrame index
df.to_csv(filepath, index=False)

print(f"✅ File '{filepath}' generated successfully.")


✅ File 'submissions\submission_dgerwig_2024_07_20__12_19.csv' generated successfully.


#### EVALUATE
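
The test set has no `Price_euros` column, so the mean absolute error (MAE) can only be measured on labeled rows held out from training. A minimal sketch of the computation on toy values follows; `y_val` and `y_val_pred` are assumed names for validation targets and the model's predictions on them.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy validation targets and predictions (illustrative values only).
y_val = np.array([1000.0, 500.0, 1500.0])
y_val_pred = np.array([900.0, 650.0, 1500.0])

# MAE is the mean of the absolute differences between target and prediction.
mae = mean_absolute_error(y_val, y_val_pred)

# The same quantity computed by hand, as a cross-check.
mae_manual = np.mean(np.abs(y_val - y_val_pred))

print(f"MAE: {mae:.2f} euros")
```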


#### SUBMIT

In [1039]:
import os
import glob

def get_most_recent_file(directory):
    # Ensure the directory path uses the correct separator
    directory = os.path.abspath(directory)
    files = glob.glob(os.path.join(directory, "*"))
    if not files:
        return None
    return max(files, key=os.path.getmtime)

directory = "./submissions"
most_recent_file = get_most_recent_file(directory)

if most_recent_file is None:
    print("No files found in the submissions directory.")
else:
    print(f"Most recent file: {most_recent_file}")


Most recent file: c:\Users\diego\OneDrive\code_dgerwig\Kaggle\TheBridge\laptops_market\submissions\submission_dgerwig_2024_07_20__12_19.csv
