# Random Forest - Example - Cellphone Prices

**Context**  
This dataset contains cellphone prices together with several hardware characteristics of each phone.

**Content**  
The dataset comes from Kaggle: [Mobile Price Prediction](https://www.kaggle.com/datasets/mohannapd/mobile-price-prediction).  
It contains 161 rows with the following columns:

| Variable     | Definition                  | Value / unit    |
| ------------ | --------------------------- | --------------- |
| Product_id   | Cellphone identifier        | Numeric id      |
| Price        | Price **(target variable)** | USD             |
| Sale         | Ticket number               | Numeric id      |
| weight       | Weight                      | Grams           |
| resoloution  | Screen size (diagonal)      | Inches          |
| ppi          | Pixel density               | Pixels per inch |
| cpu core     | CPU cores                   | Integer         |
| cpu freq     | CPU frequency               | GHz             |
| internal mem | Internal memory             | GB              |
| ram          | RAM                         | GB              |
| RearCam      | Rear camera resolution      | Megapixels      |
| Front_Cam    | Front camera resolution     | Megapixels      |
| battery      | Battery capacity            | mAh             |
| thickness    | Thickness                   | Millimeters     |

**Problem statement**  
The goal is to identify which factors have the greatest influence on cellphone prices.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

## Load Data

In [2]:
# Load the data
df = pd.read_csv('Cellphone.csv')
df.head()

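|   | Product_id | Price | Sale | weight | resoloution | ppi | cpu core | cpu freq | internal mem | ram   | RearCam | Front_Cam | battery | thickness |
| - | ---------- | ----- | ---- | ------ | ----------- | --- | -------- | -------- | ------------ | ----- | ------- | --------- | ------- | --------- |
| 0 | 203        | 2357  | 10   | 135.0  | 5.2         | 424 | 8        | 1.35     | 16.0         | 3.0   | 13.0    | 8.0       | 2610    | 7.4       |
| 1 | 880        | 1749  | 10   | 125.0  | 4.0         | 233 | 2        | 1.3      | 4.0          | 1.0   | 3.15    | 0.0       | 1700    | 9.9       |
| 2 | 40         | 1916  | 10   | 110.0  | 4.7         | 312 | 4        | 1.2      | 8.0          | 1.5   | 13.0    | 5.0       | 2000    | 7.6       |
| 3 | 99         | 1315  | 11   | 118.5  | 4.0         | 233 | 2        | 1.3      | 4.0          | 0.512 | 3.15    | 0.0       | 1400    | 11.0      |
| 4 | 880        | 1749  | 11   | 125.0  | 4.0         | 233 | 2        | 1.3      | 4.0          | 1.0   | 3.15    | 0.0       | 1700    | 9.9       |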


In [3]:
# Rename columns by position, one name per original column
# (note: 'ancho' holds the original 'thickness' column)
df.columns = ['id_producto', 'precio', 'ticket', 'peso', 'resolución', 'densidad_pixel', 'nucleos_cpu',
              'frec_cpu', 'mem_interna', 'ram', 'camara_tras', 'camara_frontal', 'bateria', 'ancho']

## Modeling

In [4]:
# Independent variables (the identifiers id_producto and ticket are left out)
X = df[['peso', 'resolución', 'densidad_pixel', 'nucleos_cpu',
        'frec_cpu', 'mem_interna', 'ram', 'camara_tras', 'camara_frontal', 'bateria', 'ancho']]
X.head()

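|   | peso  | resolución | densidad_pixel | nucleos_cpu | frec_cpu | mem_interna | ram   | camara_tras | camara_frontal | bateria | ancho |
| - | ----- | ---------- | -------------- | ----------- | -------- | ----------- | ----- | ----------- | -------------- | ------- | ----- |
| 0 | 135.0 | 5.2        | 424            | 8           | 1.35     | 16.0        | 3.0   | 13.0        | 8.0            | 2610    | 7.4   |
| 1 | 125.0 | 4.0        | 233            | 2           | 1.3      | 4.0         | 1.0   | 3.15        | 0.0            | 1700    | 9.9   |
| 2 | 110.0 | 4.7        | 312            | 4           | 1.2      | 8.0         | 1.5   | 13.0        | 5.0            | 2000    | 7.6   |
| 3 | 118.5 | 4.0        | 233            | 2           | 1.3      | 4.0         | 0.512 | 3.15        | 0.0            | 1400    | 11.0  |
| 4 | 125.0 | 4.0        | 233            | 2           | 1.3      | 4.0         | 1.0   | 3.15        | 0.0            | 1700    | 9.9   |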


In [5]:
# Dependent variable (target)
y = df['precio']
y.head()

0    2357
1    1749
2    1916
3    1315
4    1749
Name: precio, dtype: int64

In [6]:
print('X:', len(X), 'y:', len(y))

X: 161 y: 161


In [7]:
# Training and test sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [8]:
print('X_train:', len(X_train), 'y_train:', len(y_train))
print('X_test:',  len(X_test),  'y_test:',  len(y_test))

X_train: 112 y_train: 112
X_test: 49 y_test: 49
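
The first rows of the data show the same product (id 880) twice with identical specifications and price, differing only in the ticket number. If such repeats are common across the file (an assumption, not verified here), a purely random split can place the same phone in both sets and make the test score look better than it should. A minimal sketch of a group-aware alternative follows; `group_train_test_split` is a hypothetical helper, and the last three rows of the toy frame are invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for df: the first five rows mirror the head() output above,
# the last three are invented so the split has enough distinct products.
toy = pd.DataFrame({
    "id_producto": [203, 880, 40, 99, 880, 17, 55, 61],
    "peso":        [135.0, 125.0, 110.0, 118.5, 125.0, 150.0, 140.0, 160.0],
    "ram":         [3.0, 1.0, 1.5, 0.512, 1.0, 2.0, 4.0, 2.0],
    "precio":      [2357, 1749, 1916, 1315, 1749, 2100, 2600, 2200],
})


def group_train_test_split(features, target, groups, test_size=0.3, random_state=0):
    """Split so that every group (here: product id) lands entirely in one set."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(features, target, groups=groups))
    return (features.iloc[train_idx], features.iloc[test_idx],
            target.iloc[train_idx], target.iloc[test_idx])


X_toy = toy[["peso", "ram"]]
y_toy = toy["precio"]
X_tr, X_te, y_tr, y_te = group_train_test_split(X_toy, y_toy, toy["id_producto"])

# Product ids present on each side of the split; the intersection should be empty.
train_ids = set(toy.loc[X_tr.index, "id_producto"])
test_ids = set(toy.loc[X_te.index, "id_producto"])
print("products in both sets:", train_ids & test_ids)
```

With the notebook's data, the equivalent call would be `group_train_test_split(X, y, df['id_producto'])`.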


In [9]:
# Train the model
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
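
Since the stated goal is to find which factors influence price the most, one way to read that off a fitted forest is its `feature_importances_` attribute. The sketch below is not part of the original notebook: `rank_feature_importances` is a hypothetical helper, and the data is synthetic (one informative column plus two noise columns), used only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importances(fitted_model, feature_names):
    """Return impurity-based importances as a Series, most important first."""
    importances = pd.Series(fitted_model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)


# Synthetic data for illustration only (not the cellphone dataset):
# the target depends on "signal" and ignores the two noise columns.
rng = np.random.default_rng(0)
demo_X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["signal", "noise_a", "noise_b"])
demo_y = 10.0 * demo_X["signal"] + rng.normal(scale=0.1, size=200)

demo_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(demo_X, demo_y)
ranking = rank_feature_importances(demo_model, demo_X.columns)
print(ranking)
```

In this notebook the equivalent call would be `rank_feature_importances(model, X.columns)`. Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is worth considering as a cross-check.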

In [10]:
# Predictions
prediction = model.predict(X_test)
prediction

array([2916.48333333,  789.48      , 1953.41      , 2246.53      ,
       2975.54333333, 2631.49666667, 2076.37666667, 3043.24      ,
       1502.3       ,  752.82      , 2170.86666667, 1788.78      ,
       1947.61666667, 2821.04      , 3363.95      , 1641.82333333,
       1641.82333333, 2545.42      , 2624.89      , 1611.77666667,
       2080.23333333, 3489.43333333, 1577.21666667, 1447.29      ,
       2261.71666667, 2698.81333333, 3240.79333333, 3043.24      ,
       1503.22666667, 2076.37666667, 2353.10666667, 2058.03      ,
       1951.75      , 2599.23      , 1295.21      , 2456.11333333,
       2624.89      , 2212.47666667, 2729.98      , 1569.09333333,
       2261.71666667, 2994.43      , 2225.89666667, 4005.16666667,
       2916.48333333, 2821.04      , 2385.64333333, 2934.37      ,
       1569.09333333])

## Evaluation
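
For reference, the four metrics reported in this section follow the usual definitions, where $y_i$ is the actual price, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean of the actual test prices, and $n$ the number of test rows (49 here):

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

$$
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

MAE and RMSE are expressed in the same units as the price, while $R^2$ is unitless and equals 1 for a perfect fit.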

In [11]:
print('MAE:', metrics.mean_absolute_error(y_test, prediction))
print('MSE:', metrics.mean_squared_error(y_test, prediction))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, prediction)))
print('R2:', metrics.r2_score(y_test, prediction))

MAE: 119.91374149659866
MSE: 22070.59431564626
RMSE: 148.5617525328988
R2: 0.9561688833852803
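
These figures come from a single split with 49 test rows, so they may vary noticeably with a different `random_state`. A k-fold cross-validation gives a rough idea of that spread. The sketch below is an illustration under assumptions: `cross_validated_r2` is a hypothetical helper, and the data is generated with `make_regression` to match only the shape of the notebook's data (161 rows, 11 features), not its content.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cross_validated_r2(features, target, n_splits=5, random_state=0):
    """Return one R2 score per fold for a random forest regressor."""
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, features, target, cv=folds, scoring="r2")


# Synthetic stand-in with the same shape as X and y (illustration only).
demo_X, demo_y = make_regression(n_samples=161, n_features=11, noise=10.0, random_state=0)

scores = cross_validated_r2(demo_X, demo_y)
print("R2 per fold:", np.round(scores, 3))
print("mean:", scores.mean(), "std:", scores.std())
```

With the notebook's data the call would be `cross_validated_r2(X, y)`; a large standard deviation across folds would suggest treating the single-split R2 with some caution.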
