# Multiple Linear Regression

This is an example of multiple linear regression in Python. It uses a dataset of vehicle CO2 emissions together with engine and fuel-consumption attributes; the model trained below predicts `CO2EMISSIONS` from three fuel-consumption features.

This is the hypothesis function used:

<p align="center">
  <img src="./assets/funcion-regresion-lineal-multiple.png" />
</p>
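
The image is not reproduced in text form, so the following is only a sketch that assumes it shows the usual multiple linear regression hypothesis, h(x) = θ0 + θ1·x1 + ... + θn·xn. The function name `hypothesis` and the parameter values are hypothetical and chosen purely for illustration.

```python
import numpy as np

def hypothesis(X, theta0, theta):
    """Evaluate h(x) = theta0 + theta_1*x_1 + ... + theta_n*x_n for each row of X."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return theta0 + X @ theta

# Two examples with two features each (made-up values, not from the dataset)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
predictions = hypothesis(X, theta0=0.5, theta=[2.0, -1.0])
print(predictions)
```

Training a linear regression model amounts to finding the intercept `theta0` and the weights `theta` that make these predictions match the observed targets as closely as possible.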



In [9]:
# Install the required libraries
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [12]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [3]:
# Read the dataset
df = pd.read_csv('./assets/FuelConsumptionCo2.csv')
df.head()  # Show the first 5 rows

Unnamed: 0,MODELYEAR,MAKE,MODEL,VEHICLECLASS,ENGINESIZE,CYLINDERS,TRANSMISSION,FUELTYPE,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
0,2014,ACURA,ILX,COMPACT,2.0,4,AS5,Z,9.9,6.7,8.5,33,196
1,2014,ACURA,ILX,COMPACT,2.4,4,M6,Z,11.2,7.7,9.6,29,221
2,2014,ACURA,ILX HYBRID,COMPACT,1.5,4,AV7,Z,6.0,5.8,5.9,48,136
3,2014,ACURA,MDX 4WD,SUV - SMALL,3.5,6,AS6,Z,12.7,9.1,11.1,25,255
4,2014,ACURA,RDX AWD,SUV - SMALL,3.5,6,AS6,Z,12.1,8.7,10.6,27,244


In [5]:
# Keep the numeric engine and fuel-consumption columns plus the target
df = df[[
    'ENGINESIZE', 'CYLINDERS',
    'FUELCONSUMPTION_CITY', 'FUELCONSUMPTION_HWY',
    'FUELCONSUMPTION_COMB', 'FUELCONSUMPTION_COMB_MPG',
    'CO2EMISSIONS',
]]
df.head()  # Show the first 5 rows

Unnamed: 0,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
0,2.0,4,9.9,6.7,8.5,33,196
1,2.4,4,11.2,7.7,9.6,29,221
2,1.5,4,6.0,5.8,5.9,48,136
3,3.5,6,12.7,9.1,11.1,25,255
4,3.5,6,12.1,8.7,10.6,27,244


In [6]:
# Show the pairwise correlation matrix
df.corr()

Unnamed: 0,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
ENGINESIZE,1.0,0.934011,0.832225,0.778746,0.819482,-0.808554,0.874154
CYLINDERS,0.934011,1.0,0.796473,0.724594,0.776788,-0.77043,0.849685
FUELCONSUMPTION_CITY,0.832225,0.796473,1.0,0.965718,0.995542,-0.935613,0.898039
FUELCONSUMPTION_HWY,0.778746,0.724594,0.965718,1.0,0.985804,-0.893809,0.861748
FUELCONSUMPTION_COMB,0.819482,0.776788,0.995542,0.985804,1.0,-0.927965,0.892129
FUELCONSUMPTION_COMB_MPG,-0.808554,-0.77043,-0.935613,-0.893809,-0.927965,1.0,-0.906394
CO2EMISSIONS,0.874154,0.849685,0.898039,0.861748,0.892129,-0.906394,1.0
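
One way to read the matrix above is to rank the features by the absolute value of their correlation with `CO2EMISSIONS`. The sketch below shows the idea on a stand-in DataFrame built from the five rows printed by `df.head()` earlier; the variable names `sample`, `target`, and `ranking` are illustrative, and the values it prints will differ from the full-dataset matrix because only five rows are used.

```python
import pandas as pd

# Stand-in data: the five rows shown by df.head(), not the full dataset
sample = pd.DataFrame({
    "ENGINESIZE": [2.0, 2.4, 1.5, 3.5, 3.5],
    "FUELCONSUMPTION_CITY": [9.9, 11.2, 6.0, 12.7, 12.1],
    "FUELCONSUMPTION_COMB": [8.5, 9.6, 5.9, 11.1, 10.6],
    "FUELCONSUMPTION_COMB_MPG": [33, 29, 48, 25, 27],
    "CO2EMISSIONS": [196, 221, 136, 255, 244],
})

target = "CO2EMISSIONS"

# Correlation of every column with the target, strongest first
ranking = (
    sample.corr()[target]
    .drop(target)                 # the target always correlates 1.0 with itself
    .abs()                        # a strong negative correlation is as useful as a positive one
    .sort_values(ascending=False)
)
print(ranking)
```

In the full-dataset matrix above, the three largest absolute correlations with `CO2EMISSIONS` belong to `FUELCONSUMPTION_COMB_MPG` (0.906), `FUELCONSUMPTION_CITY` (0.898), and `FUELCONSUMPTION_COMB` (0.892).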


In [7]:
# Keep the three features with the strongest absolute correlation with CO2EMISSIONS
# (FUELCONSUMPTION_CITY, FUELCONSUMPTION_COMB, FUELCONSUMPTION_COMB_MPG), plus the target
df = df[['FUELCONSUMPTION_CITY', 'FUELCONSUMPTION_COMB', 'FUELCONSUMPTION_COMB_MPG', 'CO2EMISSIONS']]
df.head()  # Show the first 5 rows

Unnamed: 0,FUELCONSUMPTION_CITY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
0,9.9,8.5,33,196
1,11.2,9.6,29,221
2,6.0,5.9,48,136
3,12.7,11.1,25,255
4,12.1,10.6,27,244


In [8]:
# Total number of examples
total_examples = df.shape[0]
total_examples

1067

In [11]:
# Split the data into training (80%) and test (20%) sets
features = ['FUELCONSUMPTION_CITY', 'FUELCONSUMPTION_COMB', 'FUELCONSUMPTION_COMB_MPG']
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df['CO2EMISSIONS'], test_size=0.2, random_state=42
)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((853, 3), (214, 3), (853,), (214,))
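
The shapes above follow from how a fractional `test_size` is applied to the 1067 examples. As a small check, the sketch below splits a plain index array of the same length instead of the real DataFrame; `indices`, `train_idx`, and `test_idx` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 1067                  # total number of examples reported above
indices = np.arange(n_samples)    # stand-in for the rows of the dataset

# Same split settings as in the notebook
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)

print(len(train_idx), len(test_idx))

# Every example lands in exactly one of the two sets
assert len(set(train_idx) & set(test_idx)) == 0
assert len(train_idx) + len(test_idx) == n_samples
```

With 1067 examples, 20% is 213.4, which is rounded up to 214 test rows and leaves 853 for training. Fixing `random_state` makes the shuffle, and therefore the split, reproducible between runs.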

In [13]:
# Train the model
model = LinearRegression()
model.fit(x_train, y_train)
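
After fitting, the learned parameters of a `LinearRegression` are available as `intercept_` and `coef_`. The sketch below illustrates this on synthetic, noise-free data with known parameters rather than on the CO2 dataset, so the names `demo_model`, `X`, and `y` and all numeric values are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 (no noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

demo_model = LinearRegression()
demo_model.fit(X, y)

# The fitted parameters should recover the ones used to generate the data
print("intercept:", demo_model.intercept_)
print("coefficients:", demo_model.coef_)

# Predict for a new example with x1 = 1 and x2 = 2
new_point = np.array([[1.0, 2.0]])
prediction = demo_model.predict(new_point)
print("prediction:", prediction)
```

The same attributes can be read from the `model` trained above, where `coef_` holds one weight per feature in the order the columns were passed to `fit`.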

In [14]:
# Evaluate the model: score() returns the coefficient of determination (R^2) on the test set
model.score(x_test, y_test)

0.8479478947695134
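
For a regressor, `score` reports the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand on a few made-up values to show what the number measures; `r_squared` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after the model's predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions, unrelated to the CO2 dataset
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
r2 = r_squared(y_true, y_pred)
print(round(r2, 4))
```

A value of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of the targets. Read this way, the score of about 0.848 above indicates that the model explains roughly 85% of the variance of `CO2EMISSIONS` in the test set.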