## Car Retail Price Prediction

Given *data about various cars*, let's try to predict each car's **manufacturer's suggested retail price** (the `MSRP` column).

We will use a variety of regression models to make our predictions.

Data source: https://www.kaggle.com/datasets/CooperUnion/cardataset

### Getting Started

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.decomposition import PCA

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import LinearSVR, SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

In [2]:
data = pd.read_csv('archive/data.csv')
data

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,Acura,ZDX,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11914 entries, 0 to 11913
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Make               11914 non-null  object 
 1   Model              11914 non-null  object 
 2   Year               11914 non-null  int64  
 3   Engine Fuel Type   11911 non-null  object 
 4   Engine HP          11845 non-null  float64
 5   Engine Cylinders   11884 non-null  float64
 6   Transmission Type  11914 non-null  object 
 7   Driven_Wheels      11914 non-null  object 
 8   Number of Doors    11908 non-null  float64
 9   Market Category    8172 non-null   object 
 10  Vehicle Size       11914 non-null  object 
 11  Vehicle Style      11914 non-null  object 
 12  highway MPG        11914 non-null  int64  
 13  city mpg           11914 non-null  int64  
 14  Popularity         11914 non-null  int64  
 15  MSRP               11914 non-null  int64  
dtypes: float64(3), int64(5), object(8)

### Preprocessing

In [4]:
df = data.copy()

In [5]:
df

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,Acura,ZDX,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [6]:
{column: len(df[column].unique()) for column in df.select_dtypes('object').columns}

{'Make': 48,
 'Model': 915,
 'Engine Fuel Type': 11,
 'Transmission Type': 5,
 'Driven_Wheels': 4,
 'Market Category': 72,
 'Vehicle Size': 3,
 'Vehicle Style': 16}
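
One detail worth keeping in mind when reading these counts: `unique()` treats a missing value as its own entry, so columns with missing values (such as `Market Category`) are counted one higher than their number of real categories. The sketch below shows the difference on a small made-up frame (`toy` and its values are illustrative, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the car dataset
toy = pd.DataFrame({
    'fuel': ['diesel', 'electric', np.nan, 'diesel'],
    'size': ['Compact', 'Large', 'Compact', 'Midsize'],
})

# Distinct values with NaN counted as its own entry (same as len(unique()))
with_missing = toy.nunique(dropna=False).to_dict()

# Distinct values with NaN ignored (the nunique() default)
without_missing = toy.nunique().to_dict()

print(with_missing)
print(without_missing)
```

Only the `fuel` column differs between the two dictionaries, because it is the only one containing a missing value.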

In [7]:
{column: df[column].unique() for column in df.select_dtypes('object').columns}

{'Make': array(['BMW', 'Audi', 'FIAT', 'Mercedes-Benz', 'Chrysler', 'Nissan',
        'Volvo', 'Mazda', 'Mitsubishi', 'Ferrari', 'Alfa Romeo', 'Toyota',
        'McLaren', 'Maybach', 'Pontiac', 'Porsche', 'Saab', 'GMC',
        'Hyundai', 'Plymouth', 'Honda', 'Oldsmobile', 'Suzuki', 'Ford',
        'Cadillac', 'Kia', 'Bentley', 'Chevrolet', 'Dodge', 'Lamborghini',
        'Lincoln', 'Subaru', 'Volkswagen', 'Spyker', 'Buick', 'Acura',
        'Rolls-Royce', 'Maserati', 'Lexus', 'Aston Martin', 'Land Rover',
        'Lotus', 'Infiniti', 'Scion', 'Genesis', 'HUMMER', 'Tesla',
        'Bugatti'], dtype=object),
 'Model': array(['1 Series M', '1 Series', '100', '124 Spider', '190-Class',
        '2 Series', '200', '200SX', '240SX', '240', '2',
        '3 Series Gran Turismo', '3 Series', '300-Class', '3000GT', '300',
        '300M', '300ZX', '323', '350-Class', '350Z', '360', '370Z', '3',
        '4 Series Gran Coupe', '4 Series', '400-Class', '420-Class',
        '456M', '458 Italia', '4C'

In [8]:
# Drop the high-cardinality Model column (915 distinct values)
df = df.drop('Model', axis=1)

In [9]:
# Fill missing values in the multi-hot column with a placeholder category
df['Market Category'] = df['Market Category'].fillna('Missing')

In [10]:
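def multihot_encode(df, column):
    df = df.copy()

    # Split each comma-separated string into a list of categories
    df[column] = df[column].apply(lambda x: x.split(','))

    # Concatenate all lists and keep the sorted distinct categories
    all_categories = np.unique(df[column].sum())

    # Add one 0/1 indicator column per category
    for category in all_categories:
        df[column + '_' + category] = df[column].apply(lambda categories: int(category in categories))

    # The original list column is no longer needed
    df = df.drop(column, axis=1)

    return df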

In [11]:
# Multi-hot encoding
df = multihot_encode(df, 'Market Category')
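
As a possible alternative to the hand-written helper, pandas can build the same kind of indicator columns directly from a delimited string column with `Series.str.get_dummies`. The sketch below is a minimal illustration on a made-up frame (`toy` is hypothetical); it is not the code this notebook runs.

```python
import pandas as pd

# Hypothetical toy column in the same comma-separated format
toy = pd.DataFrame({
    'Market Category': ['Luxury,Performance', 'Luxury', 'Missing', 'Crossover,Luxury'],
})

# One 0/1 indicator column per distinct category, prefixed with the column name
indicators = toy['Market Category'].str.get_dummies(sep=',').add_prefix('Market Category_')

# Replace the original string column with its indicator columns
encoded = pd.concat([toy.drop('Market Category', axis=1), indicators], axis=1)

print(encoded)
```

This avoids the Python-level loop over categories, which may matter on larger datasets.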

In [12]:
df

Unnamed: 0,Make,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP,Market Category_Crossover,Market Category_Diesel,Market Category_Exotic,Market Category_Factory Tuner,Market Category_Flex Fuel,Market Category_Hatchback,Market Category_High-Performance,Market Category_Hybrid,Market Category_Luxury,Market Category_Missing,Market Category_Performance
0,BMW,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,Compact,Coupe,26,19,3916,46135,0,0,0,1,0,0,1,0,1,0,0
1,BMW,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,Compact,Convertible,28,19,3916,40650,0,0,0,0,0,0,0,0,1,0,1
2,BMW,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,Compact,Coupe,28,20,3916,36350,0,0,0,0,0,0,1,0,1,0,0
3,BMW,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Compact,Coupe,28,18,3916,29450,0,0,0,0,0,0,0,0,1,0,1
4,BMW,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Compact,Convertible,28,18,3916,34500,0,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,Acura,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,Midsize,4dr Hatchback,23,16,204,46120,1,0,0,0,0,1,0,0,1,0,0
11910,Acura,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,Midsize,4dr Hatchback,23,16,204,56670,1,0,0,0,0,1,0,0,1,0,0
11911,Acura,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,Midsize,4dr Hatchback,23,16,204,50620,1,0,0,0,0,1,0,0,1,0,0
11912,Acura,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,Midsize,4dr Hatchback,23,16,204,50920,1,0,0,0,0,1,0,0,1,0,0


In [13]:
def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df

In [14]:
# One-hot encoding
for column in df.select_dtypes('object').columns:
    df = onehot_encode(df, column=column)
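
The loop above could also be expressed as a single `pd.get_dummies` call by passing the columns to encode. A minimal sketch on made-up data (the `toy` frame and the `categorical_columns` list are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy frame with two categorical columns and one numeric column
toy = pd.DataFrame({
    'Make': ['BMW', 'Audi', 'BMW'],
    'Vehicle Size': ['Compact', 'Midsize', 'Compact'],
    'Year': [2011, 2012, 2013],
})

categorical_columns = ['Make', 'Vehicle Size']

# Encode all listed columns at once; untouched columns come first in the result
encoded = pd.get_dummies(toy, columns=categorical_columns, dtype=int)

print(encoded.columns.tolist())
```

The resulting layout matches what the loop produces: the remaining original columns first, followed by one block of indicator columns per encoded column.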

In [15]:
df

Unnamed: 0,Year,Engine HP,Engine Cylinders,Number of Doors,highway MPG,city mpg,Popularity,MSRP,Market Category_Crossover,Market Category_Diesel,Market Category_Exotic,Market Category_Factory Tuner,Market Category_Flex Fuel,Market Category_Hatchback,Market Category_High-Performance,Market Category_Hybrid,Market Category_Luxury,Market Category_Missing,Market Category_Performance,Make_Acura,Make_Alfa Romeo,Make_Aston Martin,Make_Audi,Make_BMW,Make_Bentley,Make_Bugatti,Make_Buick,Make_Cadillac,Make_Chevrolet,Make_Chrysler,Make_Dodge,Make_FIAT,Make_Ferrari,Make_Ford,Make_GMC,Make_Genesis,Make_HUMMER,Make_Honda,Make_Hyundai,Make_Infiniti,Make_Kia,Make_Lamborghini,Make_Land Rover,Make_Lexus,Make_Lincoln,Make_Lotus,Make_Maserati,Make_Maybach,Make_Mazda,Make_McLaren,Make_Mercedes-Benz,Make_Mitsubishi,Make_Nissan,Make_Oldsmobile,Make_Plymouth,Make_Pontiac,Make_Porsche,Make_Rolls-Royce,Make_Saab,Make_Scion,Make_Spyker,Make_Subaru,Make_Suzuki,Make_Tesla,Make_Toyota,Make_Volkswagen,Make_Volvo,Engine Fuel Type_diesel,Engine Fuel Type_electric,Engine Fuel Type_flex-fuel (premium unleaded recommended/E85),Engine Fuel Type_flex-fuel (premium unleaded required/E85),Engine Fuel Type_flex-fuel (unleaded/E85),Engine Fuel Type_flex-fuel (unleaded/natural gas),Engine Fuel Type_natural gas,Engine Fuel Type_premium unleaded (recommended),Engine Fuel Type_premium unleaded (required),Engine Fuel Type_regular unleaded,Transmission Type_AUTOMATED_MANUAL,Transmission Type_AUTOMATIC,Transmission Type_DIRECT_DRIVE,Transmission Type_MANUAL,Transmission Type_UNKNOWN,Driven_Wheels_all wheel drive,Driven_Wheels_four wheel drive,Driven_Wheels_front wheel drive,Driven_Wheels_rear wheel drive,Vehicle Size_Compact,Vehicle Size_Large,Vehicle Size_Midsize,Vehicle Style_2dr Hatchback,Vehicle Style_2dr SUV,Vehicle Style_4dr Hatchback,Vehicle Style_4dr SUV,Vehicle Style_Cargo Minivan,Vehicle Style_Cargo Van,Vehicle Style_Convertible,Vehicle Style_Convertible SUV,Vehicle Style_Coupe,Vehicle Style_Crew Cab Pickup,Vehicle Style_Extended Cab Pickup,Vehicle Style_Passenger Minivan,Vehicle Style_Passenger Van,Vehicle Style_Regular Cab Pickup,Vehicle Style_Sedan,Vehicle Style_Wagon
0,2011,335.0,6.0,2.0,26,19,3916,46135,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,2011,300.0,6.0,2.0,28,19,3916,40650,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,2011,300.0,6.0,2.0,28,20,3916,36350,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,2011,230.0,6.0,2.0,28,18,3916,29450,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,2011,230.0,6.0,2.0,28,18,3916,34500,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,300.0,6.0,4.0,23,16,204,46120,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
11910,2012,300.0,6.0,4.0,23,16,204,56670,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
11911,2012,300.0,6.0,4.0,23,16,204,50620,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
11912,2013,300.0,6.0,4.0,23,16,204,50920,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


In [16]:
df.isna().sum()[df.isna().sum() > 0]

Engine HP           69
Engine Cylinders    30
Number of Doors      6
dtype: int64

In [17]:
# Fill remaining missing values
df['Engine HP'] = df['Engine HP'].fillna(df['Engine HP'].mean())

for column in ['Engine Cylinders', 'Number of Doors']:
    df[column] = df[column].fillna(df[column].mode()[0])
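
Note that the fill values above are computed on the full dataset, before the train/test split, so rows that later land in the test set contribute to the mean and mode. One alternative (not what this notebook does) is to learn the fill values from the training rows only. A minimal sketch using scikit-learn's `SimpleImputer`, with made-up `train` and `test` frames standing in for a real split:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frames standing in for a train/test split
train = pd.DataFrame({
    'Engine HP': [100.0, 200.0, np.nan, 300.0],
    'Number of Doors': [2.0, 4.0, 4.0, np.nan],
})
test = pd.DataFrame({
    'Engine HP': [np.nan, 150.0],
    'Number of Doors': [np.nan, 2.0],
})

mean_imputer = SimpleImputer(strategy='mean')
mode_imputer = SimpleImputer(strategy='most_frequent')

# Learn the fill values from the training rows only
mean_imputer.fit(train[['Engine HP']])
mode_imputer.fit(train[['Number of Doors']])

# Apply the learned fill values to the test rows
test_filled = test.copy()
test_filled['Engine HP'] = mean_imputer.transform(test[['Engine HP']]).ravel()
test_filled['Number of Doors'] = mode_imputer.transform(test[['Number of Doors']]).ravel()

print(test_filled)
```

With only a handful of missing values, as in this dataset, the practical difference is likely small, but the pattern keeps test data out of the preprocessing statistics.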

In [18]:
df.isna().sum().sum()

np.int64(0)

In [19]:
# Split df into X and y
y = df['MSRP'].copy()
X = df.drop('MSRP', axis=1).copy()

In [20]:
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

In [21]:
X_train.shape, X_test.shape

((8339, 104), (3575, 104))
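
These row counts follow from the 70% training share applied to the 11914 rows: 0.7 × 11914 = 8339.8, which is rounded down to 8339 training rows, leaving 3575 for testing. The sketch below reproduces the sizes on a placeholder array (`placeholder` is an assumption used only to get the shapes; it contains no real data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 11914  # row count reported by data.info()

# Placeholder array with the same number of rows as the dataset
placeholder = np.zeros((n_rows, 1))

train_part, test_part = train_test_split(
    placeholder, train_size=0.7, shuffle=True, random_state=1
)

print(train_part.shape[0], test_part.shape[0])
```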

In [22]:
# Scale X
scaler = StandardScaler()

scaler.fit(X_train)

X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

In [23]:
X_train

Unnamed: 0,Year,Engine HP,Engine Cylinders,Number of Doors,highway MPG,city mpg,Popularity,Market Category_Crossover,Market Category_Diesel,Market Category_Exotic,Market Category_Factory Tuner,Market Category_Flex Fuel,Market Category_Hatchback,Market Category_High-Performance,Market Category_Hybrid,Market Category_Luxury,Market Category_Missing,Market Category_Performance,Make_Acura,Make_Alfa Romeo,Make_Aston Martin,Make_Audi,Make_BMW,Make_Bentley,Make_Bugatti,Make_Buick,Make_Cadillac,Make_Chevrolet,Make_Chrysler,Make_Dodge,Make_FIAT,Make_Ferrari,Make_Ford,Make_GMC,Make_Genesis,Make_HUMMER,Make_Honda,Make_Hyundai,Make_Infiniti,Make_Kia,Make_Lamborghini,Make_Land Rover,Make_Lexus,Make_Lincoln,Make_Lotus,Make_Maserati,Make_Maybach,Make_Mazda,Make_McLaren,Make_Mercedes-Benz,Make_Mitsubishi,Make_Nissan,Make_Oldsmobile,Make_Plymouth,Make_Pontiac,Make_Porsche,Make_Rolls-Royce,Make_Saab,Make_Scion,Make_Spyker,Make_Subaru,Make_Suzuki,Make_Tesla,Make_Toyota,Make_Volkswagen,Make_Volvo,Engine Fuel Type_diesel,Engine Fuel Type_electric,Engine Fuel Type_flex-fuel (premium unleaded recommended/E85),Engine Fuel Type_flex-fuel (premium unleaded required/E85),Engine Fuel Type_flex-fuel (unleaded/E85),Engine Fuel Type_flex-fuel (unleaded/natural gas),Engine Fuel Type_natural gas,Engine Fuel Type_premium unleaded (recommended),Engine Fuel Type_premium unleaded (required),Engine Fuel Type_regular unleaded,Transmission Type_AUTOMATED_MANUAL,Transmission Type_AUTOMATIC,Transmission Type_DIRECT_DRIVE,Transmission Type_MANUAL,Transmission Type_UNKNOWN,Driven_Wheels_all wheel drive,Driven_Wheels_four wheel drive,Driven_Wheels_front wheel drive,Driven_Wheels_rear wheel drive,Vehicle Size_Compact,Vehicle Size_Large,Vehicle Size_Midsize,Vehicle Style_2dr Hatchback,Vehicle Style_2dr SUV,Vehicle Style_4dr Hatchback,Vehicle Style_4dr SUV,Vehicle Style_Cargo Minivan,Vehicle Style_Cargo Van,Vehicle Style_Convertible,Vehicle Style_Convertible SUV,Vehicle Style_Coupe,Vehicle Style_Crew Cab Pickup,Vehicle Style_Extended Cab Pickup,Vehicle Style_Passenger Minivan,Vehicle Style_Passenger Van,Vehicle Style_Regular Cab Pickup,Vehicle Style_Sedan,Vehicle Style_Wagon
10660,0.747903,-0.737725,-0.909115,0.643471,0.259193,0.141748,2.858336,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,-0.616396,1.460345,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,3.559563,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,-0.494785,-0.364714,1.220654,-0.630306,1.220654,-0.549131,-0.760958,-0.209621,-0.106199,-0.254697,-0.507895,12.747933,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
4140,0.090904,0.763695,1.324400,0.643471,-0.396376,0.030925,0.051096,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,5.772077,1.622334,-0.684770,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,5.374838,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,-0.494785,-0.364714,-0.819233,1.586531,-0.819233,1.821058,-0.760958,-0.209621,-0.106199,-0.254697,1.968910,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
11414,-1.091696,-0.636402,0.207643,0.643471,-0.505638,-0.412365,-0.374201,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,-0.616396,1.460345,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,7.680677,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,3.480177,-0.024494,-0.010951,-0.378508,-0.451679,-1.233231,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,-0.494785,-0.364714,1.220654,-0.630306,-0.819233,1.821058,-0.760958,-0.209621,-0.106199,-0.254697,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,5.287236,-0.103865,-0.191839,-0.584043,-0.230298
5119,0.353704,0.726850,0.207643,0.643471,-0.177853,-0.301542,-0.947065,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,2.754642,-0.173248,1.622334,-0.684770,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,5.834194,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,2.213962,-1.233231,-0.238272,-1.497827,-0.076881,1.746432,-0.04245,-0.494785,-0.364714,-0.819233,1.586531,-0.819233,-0.549131,1.314133,-0.209621,-0.106199,-0.254697,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,1.712202,-0.230298
2639,-1.617295,-0.452179,0.207643,-1.619190,-0.833422,-0.634010,-0.115264,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,-0.616396,1.460345,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,3.076197,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,-1.497827,-0.076881,1.746432,-0.04245,-0.494785,2.741874,-0.819233,-0.630306,-0.819233,1.821058,-0.760958,-0.209621,-0.106199,-0.254697,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,4.226880,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7813,0.747903,0.211025,0.207643,0.643471,-0.068591,-0.190720,1.081972,2.233979,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,1.622334,-0.684770,-0.463562,-0.145552,-0.018971,-0.092007,5.834194,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,2.213962,-1.233231,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,2.021078,-0.364714,-0.819233,-0.630306,-0.819233,-0.549131,1.314133,-0.209621,-0.106199,-0.254697,1.968910,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
10955,0.616503,1.215042,1.324400,0.643471,-1.051945,-0.744832,0.334395,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,-0.616396,1.460345,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,3.846240,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,-0.494785,2.741874,-0.819233,-0.630306,-0.819233,1.821058,-0.760958,-0.209621,-0.106199,-0.254697,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,4.077992,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
905,-1.748695,-0.912737,-0.909115,0.643471,-0.068591,-0.190720,-0.817596,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,2.950898,-0.363024,-0.173248,1.622334,-0.684770,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,-0.127795,-0.10562,-0.051431,10.427065,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,-1.497827,-0.076881,1.746432,-0.04245,-0.494785,-0.364714,1.220654,-0.630306,1.220654,-0.549131,-0.760958,-0.209621,-0.106199,3.926239,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,-0.336677,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298
5192,-0.171896,-0.783780,-0.909115,-1.619190,0.696239,0.252570,-0.933143,-0.447632,-0.128761,-0.213334,-0.23317,-0.333578,-0.338880,-0.363024,-0.173248,-0.616396,1.460345,-0.463562,-0.145552,-0.018971,-0.092007,-0.171403,-0.166523,-0.079214,-0.015489,-0.130197,-0.186052,-0.325077,-0.130197,-0.232025,-0.064922,-0.078444,-0.280933,-0.208371,-0.015489,-0.037962,-0.200087,-0.16112,-0.171403,-0.14031,-0.065847,-0.109611,-0.129241,-0.116149,-0.051431,-0.071995,-0.03465,-0.186741,-0.018971,-0.173981,-0.138971,-0.220307,-0.115617,-0.088634,7.825046,-0.10562,-0.051431,-0.095904,-0.071995,-0.015489,-0.147686,-0.173615,-0.041008,-0.259994,-0.269589,-0.158357,-0.11017,-0.076881,-0.049032,-0.065847,-0.287342,-0.024494,-0.010951,-0.378508,-0.451679,0.810878,-0.238272,0.667634,-0.076881,-0.572596,-0.04245,-0.494785,-0.364714,1.220654,-0.630306,-0.819233,-0.549131,1.314133,-0.209621,-0.106199,-0.254697,-0.507895,-0.078444,-0.087249,-0.269077,-0.051431,2.970209,-0.245219,-0.236581,-0.189135,-0.103865,-0.191839,-0.584043,-0.230298


In [24]:
X_train.mean()

Year                                2.161706e-15
Engine HP                          -1.261066e-16
Engine Cylinders                   -1.508167e-16
Number of Doors                    -1.328700e-17
highway MPG                        -1.890534e-18
                                        ...     
Vehicle Style_Passenger Minivan     4.558585e-17
Vehicle Style_Passenger Van        -1.704144e-18
Vehicle Style_Regular Cab Pickup   -4.758289e-17
Vehicle Style_Sedan                 7.902967e-17
Vehicle Style_Wagon                -3.195270e-18
Length: 104, dtype: float64

In [25]:
X_train.var()

Year                                1.00012
Engine HP                           1.00012
Engine Cylinders                    1.00012
Number of Doors                     1.00012
highway MPG                         1.00012
                                     ...   
Vehicle Style_Passenger Minivan     1.00012
Vehicle Style_Passenger Van         1.00012
Vehicle Style_Regular Cab Pickup    1.00012
Vehicle Style_Sedan                 1.00012
Vehicle Style_Wagon                 1.00012
Length: 104, dtype: float64
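
The variances come out as 1.00012 rather than exactly 1 because `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas' `var()` reports the sample variance (`ddof=1`). With n = 8339 training rows the ratio is n / (n - 1) = 8339 / 8338 ≈ 1.00012. A minimal sketch on a made-up column (`toy` is hypothetical random data):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

n_rows = 8339  # number of training rows in this notebook

# Hypothetical random feature, used only to show the ddof effect
rng = np.random.default_rng(0)
toy = pd.DataFrame({'feature': rng.normal(loc=50.0, scale=10.0, size=n_rows)})

scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

population_variance = scaled['feature'].var(ddof=0)  # what StandardScaler normalises to 1
sample_variance = scaled['feature'].var()            # pandas default, ddof=1
expected_ratio = n_rows / (n_rows - 1)

print(population_variance, sample_variance, expected_ratio)
```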

#### Dimensionality Reduction

After encoding, we have 104 feature columns. Let's use PCA to reduce the data to 100 components, which should speed up training slightly.

In [26]:
n_components = 100

pca = PCA(n_components=n_components)
pca.fit(X_train)

X_train = pd.DataFrame(pca.transform(X_train), index=X_train.index, columns=["PC" + str(i) for i in range(1, n_components + 1)])
X_test = pd.DataFrame(pca.transform(X_test), index=X_test.index, columns=["PC" + str(i) for i in range(1, n_components + 1)])
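
In the transformed `X_train` output that follows, the last four components (PC97 to PC100) hold values on the order of 1e-13, which suggests they carry essentially no variance. A likely cause is that fully one-hot encoded groups are linearly dependent (their indicator columns sum to a constant). One way to choose `n_components` from the data rather than by hand is to inspect the cumulative explained variance. The sketch below illustrates this on a small made-up dataset (`toy`, the 99% threshold, and `n_needed` are assumptions for illustration, not values from this notebook).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical toy data: one numeric feature plus a fully one-hot encoded category
toy = pd.DataFrame({
    'hp': rng.normal(200.0, 50.0, size=300),
    'size': rng.choice(['Compact', 'Midsize', 'Large'], size=300),
})
features = pd.get_dummies(toy, columns=['size'], dtype=int)

scaled = StandardScaler().fit_transform(features)

# Keep every component so the full variance spectrum is visible
pca = PCA()
pca.fit(scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components explaining at least 99% of the variance
n_needed = int(np.searchsorted(cumulative, 0.99) + 1)

print(pca.explained_variance_ratio_.round(4), n_needed)
```

Here the last component explains practically nothing, because the three indicator columns always sum to one. scikit-learn's `PCA` also accepts a float between 0 and 1 for `n_components` to select the number of components by explained variance.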

In [27]:
X_train

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25,PC26,PC27,PC28,PC29,PC30,PC31,PC32,PC33,PC34,PC35,PC36,PC37,PC38,PC39,PC40,PC41,PC42,PC43,PC44,PC45,PC46,PC47,PC48,PC49,PC50,PC51,PC52,PC53,PC54,PC55,PC56,PC57,PC58,PC59,PC60,PC61,PC62,PC63,PC64,PC65,PC66,PC67,PC68,PC69,PC70,PC71,PC72,PC73,PC74,PC75,PC76,PC77,PC78,PC79,PC80,PC81,PC82,PC83,PC84,PC85,PC86,PC87,PC88,PC89,PC90,PC91,PC92,PC93,PC94,PC95,PC96,PC97,PC98,PC99,PC100
10660,-2.207051,-1.731002,-0.672992,0.735962,0.089681,0.805733,3.850911,2.393780,-2.574375,-0.234215,1.389818,1.121890,0.735338,0.405410,1.692671,-0.796962,-2.080482,-0.487751,-0.136007,1.663321,-1.066629,3.044523,0.692072,-1.352602,-0.559076,5.245817,-1.781763,1.564536,-0.082835,0.355929,-0.403278,-0.273223,2.550713,-0.147681,1.219848,1.237814,0.205473,-2.558826,-2.008032,-1.331898,0.913178,1.140081,0.953897,0.833307,0.070478,1.143713,2.782761,1.266500,-0.504445,1.194861,-0.203016,3.098487,-2.251272,4.329809,0.419076,-0.223972,0.511519,3.947264,-0.023448,-2.031068,-0.316026,-0.135716,-0.127001,-1.244754,0.923157,-0.943006,0.485847,0.961867,0.035192,0.031170,-0.346470,0.772689,-0.524787,0.062135,-0.895099,0.240055,-0.547814,-0.066982,-0.198027,-0.188397,0.916599,0.217476,0.578359,-0.167068,-0.124812,-0.046364,0.180806,0.028738,0.186745,0.062465,0.082588,-0.106493,-0.057495,0.158484,0.012805,-0.001758,1.665255e-14,1.327654e-15,-1.903289e-13,-2.693363e-13
4140,1.693145,-1.095496,1.816292,0.587013,0.390592,-0.147049,-0.863307,-0.279210,-0.840215,-2.233810,-1.623650,0.001733,1.589230,-0.364651,1.116328,-0.150461,2.715235,0.723604,1.458263,1.171158,-1.131596,1.096160,-0.771912,0.749367,1.359572,0.530613,1.404011,1.286141,0.496152,-1.979677,-0.784886,1.222193,-0.251269,-1.243484,0.634861,1.205381,-1.027579,1.379532,0.016777,0.441020,1.182603,-0.649188,-0.227331,0.269929,-0.383046,0.173592,0.689515,-0.345789,0.009572,0.019422,0.013786,-0.717416,-0.648711,0.035128,0.975285,0.466501,-0.003924,-1.178650,1.479297,1.180854,-0.255332,1.707660,-0.715125,-2.089821,0.631304,0.526881,0.027521,0.086603,-0.375033,2.110204,-0.578617,0.211776,0.155274,0.646671,-1.628802,-0.010902,0.058245,-0.553166,0.951555,-0.283118,0.932163,-0.343689,-0.446648,-0.539809,0.631569,-0.560820,0.318433,0.001338,-0.015725,-0.162709,0.012185,-0.171557,-0.527907,-0.086115,-0.119489,0.000046,-1.939498e-13,-2.233815e-13,1.256431e-14,2.600180e-13
11414,-0.270184,-2.718327,-0.045718,1.041036,-0.157851,2.415403,-1.283411,2.186850,0.487343,1.898638,0.472812,0.326319,-1.745511,-1.040533,-0.172832,3.067100,-2.228750,-0.311949,1.674115,-1.237651,1.421185,0.780201,1.137327,2.139966,1.389465,-0.056930,1.531619,-2.336277,-0.095538,0.190968,-0.219467,1.312092,-0.833541,-0.106568,-0.880883,-1.804052,-0.756324,-1.070964,-2.825582,1.867682,1.210335,-0.563787,-0.151830,-1.553503,-0.338508,0.160139,0.124381,0.436240,-0.133952,0.210392,0.026390,-0.682861,-1.306553,0.594717,-0.264051,0.513695,-1.777995,-0.289616,-0.886084,-0.277679,-1.790867,1.350405,-0.411589,-0.281302,-0.517420,0.354359,0.744730,0.393120,-0.877787,0.217220,-0.029268,-0.043208,0.823412,0.481189,-1.309710,0.584854,-0.708625,-0.532766,-0.347208,-0.082384,0.038115,0.126218,-1.219319,-0.328798,0.597440,1.042681,-1.148890,0.056568,-0.884762,-1.705906,-0.068591,0.141449,0.006712,0.327975,-0.003728,0.004031,3.059489e-13,9.973505e-14,-2.936630e-13,-1.112960e-13
5119,2.743703,2.306195,0.328075,-0.906494,0.925906,1.865861,-0.827424,-1.592953,-0.492798,-0.467818,-0.528387,-1.048576,-0.436110,1.692417,-0.670348,-1.807972,-0.277314,-0.374270,-0.566143,-0.440075,0.839806,-0.631305,-0.444690,-0.014224,0.228230,0.245512,0.023492,-1.157936,-1.312686,-1.147275,0.534970,-0.002933,-1.206467,0.516525,0.682086,-1.401310,1.563724,1.165014,-1.600857,-0.873710,-1.243872,0.755237,0.685802,0.020445,-0.565897,0.454184,0.143776,-0.192418,0.217947,-0.282604,-0.232143,0.029611,-0.922444,1.436279,0.001324,-0.283562,0.184244,-0.452941,-0.201598,0.684978,0.972516,-0.105430,0.552853,-1.042321,-0.376445,0.683755,0.508671,1.479354,1.191864,-0.452571,-1.038356,-1.187776,0.337419,-0.325883,-0.691598,-0.170814,0.411940,0.013140,-0.131281,0.620510,1.269893,-0.325728,0.786315,1.168587,-0.513226,0.265292,-0.730042,0.169864,0.249342,0.041979,-0.281796,0.020256,-0.143379,0.019731,0.010881,-0.000210,5.890094e-16,-1.382621e-13,1.160510e-13,-2.945661e-13
2639,-0.052020,-2.754489,-3.401689,0.551487,0.694003,-0.284351,-0.374278,-2.025633,1.086520,-1.357062,1.371472,0.095781,-0.068593,0.445255,-0.710495,-0.658829,0.405817,0.123542,0.568171,-1.229815,1.340132,-0.822233,0.983025,-0.181723,-0.191315,0.579151,-0.417157,1.770810,-0.948908,0.134848,-0.084110,-0.473767,-0.457290,-0.088233,-0.010943,0.025283,-0.218387,-0.087606,0.269231,-0.283767,0.332653,0.110674,-0.186689,-0.336827,-0.056706,-0.047609,0.012360,-0.013670,0.021522,0.033420,-0.066801,0.335367,-0.104369,-0.273722,0.034103,0.499332,-2.105038,0.658060,0.592696,1.678695,-0.023132,0.748486,0.819857,0.387090,-0.002277,0.572633,0.021154,-0.806265,0.565357,0.745568,1.021180,0.364028,0.659822,-0.152463,-0.219430,0.387518,-0.103842,0.261139,0.057350,0.798042,-0.400264,0.967090,-0.620703,-0.093086,-0.792666,0.443017,-0.538841,0.117839,0.017537,-0.055600,0.017261,0.177650,0.135814,-0.993351,0.005957,-0.000582,2.448242e-13,-1.443751e-14,2.341648e-13,7.009986e-13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7813,1.898281,0.938080,3.675175,-1.084082,-0.401794,-1.477482,2.690240,-0.984664,-0.638534,0.029391,0.985928,0.453746,-1.288266,-0.943904,-1.115572,0.440457,0.481356,-1.152554,0.162204,0.193704,-0.036194,-0.008558,0.120523,-0.446539,0.277450,0.188869,0.408524,0.004672,-0.853739,1.302307,-0.113955,-0.120196,0.302924,-0.792128,-1.449858,-1.167321,0.467008,1.111209,-0.395644,-0.703271,1.013077,1.142183,-0.883924,0.260044,-0.524077,-0.235333,0.353820,0.060430,-0.306367,0.357281,0.344645,-0.716551,1.281756,-0.685789,-0.187834,-0.150905,-0.201269,1.187475,-0.292271,0.300517,-0.284961,-0.607663,-0.083940,-0.710461,-0.417967,-0.355544,-0.026817,0.048354,0.110803,0.848436,-0.474364,-1.123433,-0.075497,-0.859779,0.290349,0.785833,0.415462,-0.865150,1.288125,-1.296891,1.263341,0.544490,-0.869586,0.119062,-0.042694,0.169652,0.107599,-0.015007,-0.121295,0.196631,-0.137275,0.112493,0.024362,0.010042,0.005209,0.001394,-1.864560e-14,4.726313e-14,1.857591e-13,8.793348e-14
10955,1.227278,-3.587466,-0.356626,1.419937,0.768819,-0.991572,-0.400437,1.075504,1.029675,-3.432590,1.161339,-0.192087,1.531897,-0.084952,0.148680,1.387949,0.644425,-0.325111,-0.124376,-0.570964,1.120785,-0.825418,-0.575853,0.225108,-1.681978,0.016251,0.043247,-0.657100,-0.145745,0.092636,0.158286,0.272146,-0.216396,-0.088492,0.087808,0.237001,0.832840,-0.452637,0.443415,0.473532,-0.259125,0.053927,0.338708,0.478557,-0.040881,-0.030113,-0.042495,-0.017435,-0.009549,-0.069322,0.112351,-0.459423,-0.040079,-0.313696,0.000689,-0.872283,1.300778,0.033854,-0.701054,-0.334976,-0.023110,-0.660216,-0.374673,0.117431,-0.697530,-2.185235,0.026115,0.724637,0.429400,-0.326269,0.454276,-0.071265,0.137516,0.069863,-0.844711,0.528192,-0.210056,0.130303,0.005324,0.138247,-0.025362,0.519599,-0.058702,0.019478,0.043885,0.168325,-0.108085,-0.135914,0.029783,-0.015110,0.134321,-0.036934,0.041927,0.007665,-0.014724,-0.000120,1.111672e-13,3.364902e-14,2.361103e-14,-2.120488e-13
905,-3.099300,1.697673,-1.532061,-1.405756,-1.691139,-0.648080,-2.033424,-0.923332,-0.407306,-1.262148,-1.250782,2.601860,-2.113718,-1.099885,3.129160,-0.739555,0.911350,-2.804349,-1.892458,3.294606,3.113493,-2.076986,1.792834,-0.703794,1.868960,-0.179623,0.247127,0.219317,-1.285046,-0.441376,1.105451,-0.556315,-0.273241,1.288433,0.671021,-0.122196,0.327724,-0.719850,0.548652,-0.346653,-1.026057,-0.119551,0.536826,-0.490810,0.689196,-0.439787,-0.231441,0.517937,-0.118356,0.081500,0.264107,0.687019,-0.682600,0.805165,0.035090,0.839326,-1.513956,-1.103507,-0.170107,-0.560360,-2.051243,0.774708,-0.047561,0.687040,0.005732,-2.155573,-3.923086,1.343693,-2.478146,1.028518,-0.093987,-1.890046,0.094593,0.755909,-1.094739,-0.845544,0.767493,0.454214,1.658788,0.214621,0.594230,0.010183,0.298532,-0.007875,0.174273,0.170394,-0.360874,-0.046399,-0.087074,0.223941,0.136860,-0.150351,-0.051981,0.038526,-0.001568,-0.002475,1.436551e-13,-1.159952e-13,-2.912456e-14,-1.467677e-13
5192,-1.544841,-0.092466,-0.744454,-1.342110,2.005974,1.204834,-0.403097,0.686877,-0.150184,2.179172,-0.485080,-0.169998,0.180413,1.119320,-0.649628,2.449930,0.947568,1.196361,-1.311149,0.124452,1.526005,0.165767,-0.909000,0.914494,0.553796,0.107837,-0.093785,0.621668,1.305082,1.017246,-0.438246,2.454487,0.906850,-0.110037,-1.868399,1.238738,2.270342,0.024183,2.118492,-2.063657,0.555152,0.854985,-0.365448,-0.899092,0.390680,0.371591,0.680682,0.191855,0.078417,0.327142,-0.354640,1.788834,-1.077235,0.070938,-1.241656,0.416514,0.599940,-0.868457,-0.882316,1.478785,-0.177728,-1.228708,-0.698851,-0.468986,1.120085,0.001718,-0.588869,-1.611988,0.286607,-0.590837,1.891150,-0.553734,-0.762644,-0.360802,-0.059922,-0.042974,0.508284,-0.062243,0.103704,0.510727,0.717021,0.225056,-0.675059,-0.683618,0.004620,-0.154703,-0.380331,0.081952,0.063564,0.110272,0.093182,-0.044453,-0.074846,-0.027504,0.044361,0.001209,-5.508174e-14,-4.635643e-14,1.709099e-13,-5.239871e-14


### Training

In [28]:
models = {
    "                     Linear Regression": LinearRegression(),
    " Linear Regression (L2 Regularization)": Ridge(),
    " Linear Regression (L1 Regularization)": Lasso(),
    "                   K-Nearest Neighbors": KNeighborsRegressor(),
    "                        Neural Network": MLPRegressor(),
    "Support Vector Machine (Linear Kernel)": LinearSVR(),
    "   Support Vector Machine (RBF Kernel)": SVR(),
    "                         Decision Tree": DecisionTreeRegressor(),
    "                         Random Forest": RandomForestRegressor(),
    "                     Gradient Boosting": GradientBoostingRegressor()
}

In [29]:
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

                     Linear Regression trained.
 Linear Regression (L2 Regularization) trained.
 Linear Regression (L1 Regularization) trained.
                   K-Nearest Neighbors trained.
                        Neural Network trained.
Support Vector Machine (Linear Kernel) trained.
   Support Vector Machine (RBF Kernel) trained.
                         Decision Tree trained.
                         Random Forest trained.
                     Gradient Boosting trained.


In [30]:
X_train.var()

PC1      6.272760e+00
PC2      4.775776e+00
PC3      4.388434e+00
PC4      3.298823e+00
PC5      2.620420e+00
             ...     
PC96     6.054559e-04
PC97     8.836150e-26
PC98     9.433111e-26
PC99     3.940199e-25
PC100    4.698604e-26
Length: 100, dtype: float64
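
The variances above drop to roughly 1e-25 for PC97 through PC100, which is zero up to floating-point error, so those components carry no usable information. As a minimal sketch (not the notebook's own preprocessing), one way to pick the number of components is to look at the cumulative explained variance ratio of a full PCA fit. The data below is a synthetic stand-in, and the names `X_demo`, `threshold`, and `n_keep` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a scaled feature matrix (hypothetical data):
# 6 columns, but the last two are exact linear combinations of the others,
# so only 4 directions carry any variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
X_demo = np.hstack([
    base,
    base[:, :1] + base[:, 1:2],
    base[:, 2:3] - base[:, 3:4],
])

# Fit a full PCA first and inspect the cumulative explained variance ratio.
full_pca = PCA().fit(X_demo)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.999  # assumed cutoff, tune as needed
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

# Refit with only the informative components.
reduced = PCA(n_components=n_keep).fit_transform(X_demo)
print(n_keep, reduced.shape)
```

scikit-learn's `PCA` can also take a float between 0 and 1 as `n_components` together with `svd_solver="full"`, which selects the component count by explained variance directly.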

### Results

In [32]:
for name, model in models.items():
    print(name + " R^2 Score: {:.5f}".format(model.score(X_test, y_test)))

                     Linear Regression R^2 Score: 0.79505
 Linear Regression (L2 Regularization) R^2 Score: 0.79506
 Linear Regression (L1 Regularization) R^2 Score: 0.79508
                   K-Nearest Neighbors R^2 Score: 0.76867
                        Neural Network R^2 Score: 0.48400
Support Vector Machine (Linear Kernel) R^2 Score: -0.25656
   Support Vector Machine (RBF Kernel) R^2 Score: -0.02933
                         Decision Tree R^2 Score: 0.87070
                         Random Forest R^2 Score: 0.87457
                     Gradient Boosting R^2 Score: 0.86735

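
A negative R^2 means the model does worse on the test set than always predicting the mean of `y_test`. One possible cause for the two SVM scores, assuming the target was left in raw dollars, is that the default `C=1.0` and `epsilon=0.1` of `SVR` are tiny compared with prices in the tens of thousands. The sketch below shows the idea on synthetic stand-in data (not the car dataset): wrapping the regressor in `TransformedTargetRegressor` standardizes the target before fitting and maps predictions back afterwards. The names `X_demo`, `y_demo`, `plain`, and `scaled` are hypothetical, and whether this improves the scores above is untested here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical): standardized features and a
# price-like target measured in tens of thousands.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 5))
y_demo = (
    40000
    + 15000 * X_demo[:, 0]
    - 8000 * X_demo[:, 1]
    + rng.normal(scale=2000, size=400)
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=1
)

# Default SVR fit directly on the raw target.
plain = SVR().fit(X_tr, y_tr)

# Same estimator, but the target is standardized before fitting and
# predictions are mapped back to the original units automatically.
scaled = TransformedTargetRegressor(regressor=SVR(), transformer=StandardScaler())
scaled.fit(X_tr, y_tr)

plain_r2 = plain.score(X_te, y_te)
scaled_r2 = scaled.score(X_te, y_te)
print("   raw target R^2 Score: {:.5f}".format(plain_r2))
print("scaled target R^2 Score: {:.5f}".format(scaled_r2))
```

The same wrapper could be tried around `LinearSVR()` and `MLPRegressor()`, which are also sensitive to the scale of the target.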