## Lyft/Uber Price Prediction

Given *data about Lyft and Uber rides*, let's try to predict the **price** of a given ride.

We will use a linear regression model to make our predictions.

Data source: https://www.kaggle.com/datasets/ravi72munde/uber-lyft-cab-prices?select=cab_rides.csv

### Importing Libraries

In [24]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LinearRegression

In [25]:
rides_df = pd.read_csv('cab_rides.csv')
weather_df = pd.read_csv('weather.csv')

In [26]:
rides_df

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,id,product_id,name
0,0.44,Lyft,1544952607890,North Station,Haymarket Square,5.0,1.0,424553bb-7174-41ea-aeb4-fe06d4f4b9d7,lyft_line,Shared
1,0.44,Lyft,1543284023677,North Station,Haymarket Square,11.0,1.0,4bd23055-6827-41c6-b23b-3c491f24e74d,lyft_premier,Lux
2,0.44,Lyft,1543366822198,North Station,Haymarket Square,7.0,1.0,981a3613-77af-4620-a42a-0c0866077d1e,lyft,Lyft
3,0.44,Lyft,1543553582749,North Station,Haymarket Square,26.0,1.0,c2d88af2-d278-4bfd-a8d0-29ca77cc5512,lyft_luxsuv,Lux Black XL
4,0.44,Lyft,1543463360223,North Station,Haymarket Square,9.0,1.0,e0126e1f-8ca9-4f2e-82b3-50505a09db9a,lyft_plus,Lyft XL
...,...,...,...,...,...,...,...,...,...,...
693066,1.00,Uber,1543708385534,North End,West End,13.0,1.0,616d3611-1820-450a-9845-a9ff304a4842,6f72dfc5-27f1-42e8-84db-ccc7a75f6969,UberXL
693067,1.00,Uber,1543708385534,North End,West End,9.5,1.0,633a3fc3-1f86-4b9e-9d48-2b7132112341,55c66225-fbe7-4fd5-9072-eab1ece5e23e,UberX
693068,1.00,Uber,1543708385534,North End,West End,,1.0,64d451d0-639f-47a4-9b7c-6fd92fbd264f,8cf7e821-f0d3-49c6-8eba-e679c0ebcf6a,Taxi
693069,1.00,Uber,1543708385534,North End,West End,27.0,1.0,727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e,6d318bcc-22a3-4af6-bddd-b409bfce1546,Black SUV


In [27]:
weather_df

Unnamed: 0,temp,location,clouds,pressure,rain,time_stamp,humidity,wind
0,42.42,Back Bay,1.00,1012.14,0.1228,1545003901,0.77,11.25
1,42.43,Beacon Hill,1.00,1012.15,0.1846,1545003901,0.76,11.32
2,42.50,Boston University,1.00,1012.15,0.1089,1545003901,0.76,11.07
3,42.11,Fenway,1.00,1012.13,0.0969,1545003901,0.77,11.09
4,43.13,Financial District,1.00,1012.14,0.1786,1545003901,0.75,11.49
...,...,...,...,...,...,...,...,...
6271,44.72,North Station,0.89,1000.69,,1543819974,0.96,1.52
6272,44.85,Northeastern University,0.88,1000.71,,1543819974,0.96,1.54
6273,44.82,South Station,0.89,1000.70,,1543819974,0.96,1.54
6274,44.78,Theatre District,0.89,1000.70,,1543819974,0.96,1.54


In [28]:
rides_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 693071 entries, 0 to 693070
Data columns (total 10 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   distance          693071 non-null  float64
 1   cab_type          693071 non-null  object 
 2   time_stamp        693071 non-null  int64  
 3   destination       693071 non-null  object 
 4   source            693071 non-null  object 
 5   price             637976 non-null  float64
 6   surge_multiplier  693071 non-null  float64
 7   id                693071 non-null  object 
 8   product_id        693071 non-null  object 
 9   name              693071 non-null  object 
dtypes: float64(3), int64(1), object(6)
memory usage: 52.9+ MB


In [29]:
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6276 entries, 0 to 6275
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   temp        6276 non-null   float64
 1   location    6276 non-null   object 
 2   clouds      6276 non-null   float64
 3   pressure    6276 non-null   float64
 4   rain        894 non-null    float64
 5   time_stamp  6276 non-null   int64  
 6   humidity    6276 non-null   float64
 7   wind        6276 non-null   float64
dtypes: float64(6), int64(1), object(1)
memory usage: 392.4+ KB


#### Cleaning Ride Data

In [30]:
rides_df

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,id,product_id,name
0,0.44,Lyft,1544952607890,North Station,Haymarket Square,5.0,1.0,424553bb-7174-41ea-aeb4-fe06d4f4b9d7,lyft_line,Shared
1,0.44,Lyft,1543284023677,North Station,Haymarket Square,11.0,1.0,4bd23055-6827-41c6-b23b-3c491f24e74d,lyft_premier,Lux
2,0.44,Lyft,1543366822198,North Station,Haymarket Square,7.0,1.0,981a3613-77af-4620-a42a-0c0866077d1e,lyft,Lyft
3,0.44,Lyft,1543553582749,North Station,Haymarket Square,26.0,1.0,c2d88af2-d278-4bfd-a8d0-29ca77cc5512,lyft_luxsuv,Lux Black XL
4,0.44,Lyft,1543463360223,North Station,Haymarket Square,9.0,1.0,e0126e1f-8ca9-4f2e-82b3-50505a09db9a,lyft_plus,Lyft XL
...,...,...,...,...,...,...,...,...,...,...
693066,1.00,Uber,1543708385534,North End,West End,13.0,1.0,616d3611-1820-450a-9845-a9ff304a4842,6f72dfc5-27f1-42e8-84db-ccc7a75f6969,UberXL
693067,1.00,Uber,1543708385534,North End,West End,9.5,1.0,633a3fc3-1f86-4b9e-9d48-2b7132112341,55c66225-fbe7-4fd5-9072-eab1ece5e23e,UberX
693068,1.00,Uber,1543708385534,North End,West End,,1.0,64d451d0-639f-47a4-9b7c-6fd92fbd264f,8cf7e821-f0d3-49c6-8eba-e679c0ebcf6a,Taxi
693069,1.00,Uber,1543708385534,North End,West End,27.0,1.0,727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e,6d318bcc-22a3-4af6-bddd-b409bfce1546,Black SUV


In [31]:
rides_df.isna().sum()

distance                0
cab_type                0
time_stamp              0
destination             0
source                  0
price               55095
surge_multiplier        0
id                      0
product_id              0
name                    0
dtype: int64

In [32]:
rides_df = rides_df.dropna(axis=0).reset_index(drop=True)
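
Since `price` is the only column with missing values, `dropna(axis=0)` removes exactly the rides that have no price (such as the `Taxi` row in the preview above). As a minimal sketch on a made-up three-row frame (the `toy_rides` values are illustrative, not taken from the dataset), the same intent can be stated explicitly with `subset`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for rides_df; the values are made up for illustration.
toy_rides = pd.DataFrame({
    'distance': [0.44, 1.00, 1.00],
    'price': [5.0, np.nan, 27.0],
    'name': ['Shared', 'Taxi', 'Black SUV'],
})

# Drop only the rows whose target is missing, then renumber the index.
cleaned = toy_rides.dropna(subset=['price']).reset_index(drop=True)

# When price is the only column with NaNs, this matches dropna(axis=0).
same = cleaned.equals(toy_rides.dropna(axis=0).reset_index(drop=True))

print(cleaned)
print(same)
```

Using `subset` would keep the behaviour stable if another column later gained missing values.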

#### Cleaning Weather Data

In [33]:
weather_df.isna().sum()

temp             0
location         0
clouds           0
pressure         0
rain          5382
time_stamp       0
humidity         0
wind             0
dtype: int64

In [34]:
weather_df = weather_df.fillna(0)
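
Only `rain` has missing values, so `fillna(0)` amounts to treating a missing rain reading as zero rainfall. That is a modelling assumption rather than something the data states. A small sketch on made-up rows (the `toy_weather` frame is hypothetical) shows how the fill could be restricted to that one column:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for weather_df; the values are made up for illustration.
toy_weather = pd.DataFrame({
    'location': ['Back Bay', 'Back Bay', 'Fenway'],
    'temp': [42.42, 44.72, 42.11],
    'rain': [0.1228, np.nan, 0.0969],
})

# Passing a dict fills only the named column and leaves the others untouched.
filled = toy_weather.fillna({'rain': 0})

print(filled)
```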

#### Creating Average Weather DataFrame

In [35]:
weather_df

Unnamed: 0,temp,location,clouds,pressure,rain,time_stamp,humidity,wind
0,42.42,Back Bay,1.00,1012.14,0.1228,1545003901,0.77,11.25
1,42.43,Beacon Hill,1.00,1012.15,0.1846,1545003901,0.76,11.32
2,42.50,Boston University,1.00,1012.15,0.1089,1545003901,0.76,11.07
3,42.11,Fenway,1.00,1012.13,0.0969,1545003901,0.77,11.09
4,43.13,Financial District,1.00,1012.14,0.1786,1545003901,0.75,11.49
...,...,...,...,...,...,...,...,...
6271,44.72,North Station,0.89,1000.69,0.0000,1543819974,0.96,1.52
6272,44.85,Northeastern University,0.88,1000.71,0.0000,1543819974,0.96,1.54
6273,44.82,South Station,0.89,1000.70,0.0000,1543819974,0.96,1.54
6274,44.78,Theatre District,0.89,1000.70,0.0000,1543819974,0.96,1.54


In [36]:
weather_df.groupby('location').mean()

location,temp,clouds,pressure,rain,time_stamp,humidity,wind
Back Bay,39.082122,0.678432,1008.44782,0.007925,1543857000.0,0.764073,6.778528
Beacon Hill,39.047285,0.677801,1008.448356,0.008297,1543857000.0,0.765048,6.810325
Boston University,39.047744,0.679235,1008.459254,0.007738,1543857000.0,0.763786,6.69218
Fenway,38.964379,0.679866,1008.453289,0.007343,1543857000.0,0.767266,6.711721
Financial District,39.410822,0.67673,1008.435793,0.008563,1543857000.0,0.754837,6.860019
Haymarket Square,39.067897,0.676711,1008.445239,0.00866,1543857000.0,0.764837,6.843193
North End,39.090841,0.67673,1008.441912,0.008644,1543857000.0,0.764054,6.853117
North Station,39.035315,0.676998,1008.442811,0.008649,1543857000.0,0.765545,6.835755
Northeastern University,38.975086,0.678317,1008.444168,0.007358,1543857000.0,0.767648,6.749426
South Station,39.394092,0.677495,1008.438031,0.00831,1543857000.0,0.755468,6.848948


In [37]:
avg_weather_df = weather_df.groupby('location').mean().reset_index(drop=False)
avg_weather_df = avg_weather_df.drop('time_stamp', axis=1)
avg_weather_df

Unnamed: 0,location,temp,clouds,pressure,rain,humidity,wind
0,Back Bay,39.082122,0.678432,1008.44782,0.007925,0.764073,6.778528
1,Beacon Hill,39.047285,0.677801,1008.448356,0.008297,0.765048,6.810325
2,Boston University,39.047744,0.679235,1008.459254,0.007738,0.763786,6.69218
3,Fenway,38.964379,0.679866,1008.453289,0.007343,0.767266,6.711721
4,Financial District,39.410822,0.67673,1008.435793,0.008563,0.754837,6.860019
5,Haymarket Square,39.067897,0.676711,1008.445239,0.00866,0.764837,6.843193
6,North End,39.090841,0.67673,1008.441912,0.008644,0.764054,6.853117
7,North Station,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
8,Northeastern University,38.975086,0.678317,1008.444168,0.007358,0.767648,6.749426
9,South Station,39.394092,0.677495,1008.438031,0.00831,0.755468,6.848948
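
The averaging step collapses all readings for a location into one row. As a rough sketch on made-up readings (the `toy_weather` frame and its values are hypothetical), the same result can be produced in one chain by dropping `time_stamp` first and keeping `location` as a regular column with `as_index=False`:

```python
import pandas as pd

# Toy readings: two per location, values made up for illustration.
toy_weather = pd.DataFrame({
    'location': ['Back Bay', 'Back Bay', 'Fenway', 'Fenway'],
    'temp': [40.0, 42.0, 38.0, 39.0],
    'rain': [0.0, 0.2, 0.0, 0.0],
    'time_stamp': [1545003901, 1543819974, 1545003901, 1543819974],
})

avg = (
    toy_weather
    .drop('time_stamp', axis=1)           # a mean timestamp is not a useful feature
    .groupby('location', as_index=False)  # keep location as a column, not the index
    .mean()
)

print(avg)
```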


#### Merging DataFrames

In [38]:
rides_df

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,id,product_id,name
0,0.44,Lyft,1544952607890,North Station,Haymarket Square,5.0,1.0,424553bb-7174-41ea-aeb4-fe06d4f4b9d7,lyft_line,Shared
1,0.44,Lyft,1543284023677,North Station,Haymarket Square,11.0,1.0,4bd23055-6827-41c6-b23b-3c491f24e74d,lyft_premier,Lux
2,0.44,Lyft,1543366822198,North Station,Haymarket Square,7.0,1.0,981a3613-77af-4620-a42a-0c0866077d1e,lyft,Lyft
3,0.44,Lyft,1543553582749,North Station,Haymarket Square,26.0,1.0,c2d88af2-d278-4bfd-a8d0-29ca77cc5512,lyft_luxsuv,Lux Black XL
4,0.44,Lyft,1543463360223,North Station,Haymarket Square,9.0,1.0,e0126e1f-8ca9-4f2e-82b3-50505a09db9a,lyft_plus,Lyft XL
...,...,...,...,...,...,...,...,...,...,...
637971,1.00,Uber,1543708385534,North End,West End,9.5,1.0,353e6566-b272-479e-a9c6-98bd6cb23f25,9a0e7b09-b92b-4c41-9779-2ad22b4d779d,WAV
637972,1.00,Uber,1543708385534,North End,West End,13.0,1.0,616d3611-1820-450a-9845-a9ff304a4842,6f72dfc5-27f1-42e8-84db-ccc7a75f6969,UberXL
637973,1.00,Uber,1543708385534,North End,West End,9.5,1.0,633a3fc3-1f86-4b9e-9d48-2b7132112341,55c66225-fbe7-4fd5-9072-eab1ece5e23e,UberX
637974,1.00,Uber,1543708385534,North End,West End,27.0,1.0,727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e,6d318bcc-22a3-4af6-bddd-b409bfce1546,Black SUV


In [39]:
rides_df['source'].unique()

array(['Haymarket Square', 'Back Bay', 'North End', 'North Station',
       'Beacon Hill', 'Boston University', 'Fenway', 'South Station',
       'Theatre District', 'West End', 'Financial District',
       'Northeastern University'], dtype=object)

In [40]:
source_weather_df = avg_weather_df.rename(
    columns={
        'location': 'source',
        'temp': 'source_temp',
        'clouds': 'source_clouds',
        'pressure': 'source_pressure',
        'rain': 'source_rain',
        'humidity': 'source_humidity',
        'wind': 'source_wind'
    }
)

source_weather_df

Unnamed: 0,source,source_temp,source_clouds,source_pressure,source_rain,source_humidity,source_wind
0,Back Bay,39.082122,0.678432,1008.44782,0.007925,0.764073,6.778528
1,Beacon Hill,39.047285,0.677801,1008.448356,0.008297,0.765048,6.810325
2,Boston University,39.047744,0.679235,1008.459254,0.007738,0.763786,6.69218
3,Fenway,38.964379,0.679866,1008.453289,0.007343,0.767266,6.711721
4,Financial District,39.410822,0.67673,1008.435793,0.008563,0.754837,6.860019
5,Haymarket Square,39.067897,0.676711,1008.445239,0.00866,0.764837,6.843193
6,North End,39.090841,0.67673,1008.441912,0.008644,0.764054,6.853117
7,North Station,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
8,Northeastern University,38.975086,0.678317,1008.444168,0.007358,0.767648,6.749426
9,South Station,39.394092,0.677495,1008.438031,0.00831,0.755468,6.848948


In [41]:
destination_weather_df = avg_weather_df.rename(
    columns={
        'location': 'destination',
        'temp': 'destination_temp',
        'clouds': 'destination_clouds',
        'pressure': 'destination_pressure',
        'rain': 'destination_rain',
        'humidity': 'destination_humidity',
        'wind': 'destination_wind'
    }
)

destination_weather_df

Unnamed: 0,destination,destination_temp,destination_clouds,destination_pressure,destination_rain,destination_humidity,destination_wind
0,Back Bay,39.082122,0.678432,1008.44782,0.007925,0.764073,6.778528
1,Beacon Hill,39.047285,0.677801,1008.448356,0.008297,0.765048,6.810325
2,Boston University,39.047744,0.679235,1008.459254,0.007738,0.763786,6.69218
3,Fenway,38.964379,0.679866,1008.453289,0.007343,0.767266,6.711721
4,Financial District,39.410822,0.67673,1008.435793,0.008563,0.754837,6.860019
5,Haymarket Square,39.067897,0.676711,1008.445239,0.00866,0.764837,6.843193
6,North End,39.090841,0.67673,1008.441912,0.008644,0.764054,6.853117
7,North Station,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
8,Northeastern University,38.975086,0.678317,1008.444168,0.007358,0.767648,6.749426
9,South Station,39.394092,0.677495,1008.438031,0.00831,0.755468,6.848948


In [42]:
data = rides_df\
    .merge(source_weather_df, on='source')\
    .merge(destination_weather_df, on='destination')

data

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,id,product_id,name,source_temp,source_clouds,source_pressure,source_rain,source_humidity,source_wind,destination_temp,destination_clouds,destination_pressure,destination_rain,destination_humidity,destination_wind
0,0.44,Lyft,1544952607890,North Station,Haymarket Square,5.0,1.0,424553bb-7174-41ea-aeb4-fe06d4f4b9d7,lyft_line,Shared,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
1,0.44,Lyft,1543284023677,North Station,Haymarket Square,11.0,1.0,4bd23055-6827-41c6-b23b-3c491f24e74d,lyft_premier,Lux,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
2,0.44,Lyft,1543366822198,North Station,Haymarket Square,7.0,1.0,981a3613-77af-4620-a42a-0c0866077d1e,lyft,Lyft,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
3,0.44,Lyft,1543553582749,North Station,Haymarket Square,26.0,1.0,c2d88af2-d278-4bfd-a8d0-29ca77cc5512,lyft_luxsuv,Lux Black XL,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
4,0.44,Lyft,1543463360223,North Station,Haymarket Square,9.0,1.0,e0126e1f-8ca9-4f2e-82b3-50505a09db9a,lyft_plus,Lyft XL,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
637971,1.00,Uber,1543708385534,North End,West End,9.5,1.0,353e6566-b272-479e-a9c6-98bd6cb23f25,9a0e7b09-b92b-4c41-9779-2ad22b4d779d,WAV,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637972,1.00,Uber,1543708385534,North End,West End,13.0,1.0,616d3611-1820-450a-9845-a9ff304a4842,6f72dfc5-27f1-42e8-84db-ccc7a75f6969,UberXL,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637973,1.00,Uber,1543708385534,North End,West End,9.5,1.0,633a3fc3-1f86-4b9e-9d48-2b7132112341,55c66225-fbe7-4fd5-9072-eab1ece5e23e,UberX,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637974,1.00,Uber,1543708385534,North End,West End,27.0,1.0,727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e,6d318bcc-22a3-4af6-bddd-b409bfce1546,Black SUV,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
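
`merge` defaults to an inner join, so a ride whose source or destination had no weather row would be dropped silently. The sketch below, on made-up frames (`toy_rides`, `toy_avg`, and the helper `weather_for` are hypothetical names), shows one way to guard against that: join with `how='left'`, ask pandas to check the key relationship with `validate='many_to_one'`, and then confirm that no rows were lost or left unmatched.

```python
import pandas as pd

# Toy stand-ins for rides_df and avg_weather_df; the values are made up.
toy_rides = pd.DataFrame({
    'source': ['Haymarket Square', 'West End', 'West End'],
    'destination': ['North Station', 'North End', 'Haymarket Square'],
    'price': [5.0, 9.5, 13.0],
})
toy_avg = pd.DataFrame({
    'location': ['Haymarket Square', 'North End', 'North Station', 'West End'],
    'temp': [39.07, 39.09, 39.04, 38.98],
})

def weather_for(role):
    # role is 'source' or 'destination'; prefix the weather columns accordingly.
    return toy_avg.rename(columns={'location': role, 'temp': f'{role}_temp'})

merged = (
    toy_rides
    .merge(weather_for('source'), on='source', how='left', validate='many_to_one')
    .merge(weather_for('destination'), on='destination', how='left', validate='many_to_one')
)

# Every ride should survive the merge and find weather on both ends.
assert len(merged) == len(toy_rides)
assert merged.notna().all().all()

print(merged)
```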


### Preprocessing

In [43]:
df = data.copy()

In [44]:
# Drop the ride id, a unique identifier with no predictive value
df = df.drop('id', axis=1)
df

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,product_id,name,source_temp,source_clouds,source_pressure,source_rain,source_humidity,source_wind,destination_temp,destination_clouds,destination_pressure,destination_rain,destination_humidity,destination_wind
0,0.44,Lyft,1544952607890,North Station,Haymarket Square,5.0,1.0,lyft_line,Shared,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
1,0.44,Lyft,1543284023677,North Station,Haymarket Square,11.0,1.0,lyft_premier,Lux,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
2,0.44,Lyft,1543366822198,North Station,Haymarket Square,7.0,1.0,lyft,Lyft,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
3,0.44,Lyft,1543553582749,North Station,Haymarket Square,26.0,1.0,lyft_luxsuv,Lux Black XL,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
4,0.44,Lyft,1543463360223,North Station,Haymarket Square,9.0,1.0,lyft_plus,Lyft XL,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
637971,1.00,Uber,1543708385534,North End,West End,9.5,1.0,9a0e7b09-b92b-4c41-9779-2ad22b4d779d,WAV,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637972,1.00,Uber,1543708385534,North End,West End,13.0,1.0,6f72dfc5-27f1-42e8-84db-ccc7a75f6969,UberXL,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637973,1.00,Uber,1543708385534,North End,West End,9.5,1.0,55c66225-fbe7-4fd5-9072-eab1ece5e23e,UberX,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117
637974,1.00,Uber,1543708385534,North End,West End,27.0,1.0,6d318bcc-22a3-4af6-bddd-b409bfce1546,Black SUV,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117


In [46]:
{column: df[column].unique() for column in df.select_dtypes('object').columns}

{'cab_type': array(['Lyft', 'Uber'], dtype=object),
 'destination': array(['North Station', 'Northeastern University', 'West End',
        'Haymarket Square', 'South Station', 'Fenway', 'Theatre District',
        'Beacon Hill', 'Back Bay', 'North End', 'Financial District',
        'Boston University'], dtype=object),
 'source': array(['Haymarket Square', 'Back Bay', 'North End', 'North Station',
        'Beacon Hill', 'Boston University', 'Fenway', 'South Station',
        'Theatre District', 'West End', 'Financial District',
        'Northeastern University'], dtype=object),
 'product_id': array(['lyft_line', 'lyft_premier', 'lyft', 'lyft_luxsuv', 'lyft_plus',
        'lyft_lux', '6f72dfc5-27f1-42e8-84db-ccc7a75f6969',
        '6c84fd89-3f11-4782-9b50-97c468b19529',
        '55c66225-fbe7-4fd5-9072-eab1ece5e23e',
        '9a0e7b09-b92b-4c41-9779-2ad22b4d779d',
        '6d318bcc-22a3-4af6-bddd-b409bfce1546',
        '997acbb5-e102-41e1-b155-9df7de0a73f2'], dtype=object),
 'name': arr

In [47]:
# Binary encode cab_type column
df['cab_type'] = df['cab_type'].map({'Lyft': 0, 'Uber': 1})

In [48]:
# One hot encode remaining categorical columns
def onehot_encode(df, column, prefix):
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=prefix)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df

In [49]:
for column, prefix in [('destination', 'dest'), ('source', 'src'), ('product_id', 'pid'), ('name', 'nm')]:
    df = onehot_encode(df, column, prefix)
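
To make the effect of `onehot_encode` concrete, here is a minimal sketch that applies the same helper to a made-up three-row frame (`toy` is hypothetical). Each category becomes its own indicator column, and every row has exactly one indicator set:

```python
import pandas as pd

def onehot_encode(df, column, prefix):
    # Same helper as above: replace `column` with one indicator column per category.
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=prefix)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df

# Toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    'price': [5.0, 9.5, 27.0],
    'name': ['Shared', 'UberX', 'Black SUV'],
})

encoded = onehot_encode(toy, 'name', 'nm')
indicator_columns = [c for c in encoded.columns if c.startswith('nm_')]

print(encoded.columns.tolist())
print(encoded[indicator_columns].sum(axis=1).tolist())
```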

In [50]:
df

Unnamed: 0,distance,cab_type,time_stamp,price,surge_multiplier,source_temp,source_clouds,source_pressure,source_rain,source_humidity,source_wind,destination_temp,destination_clouds,destination_pressure,destination_rain,destination_humidity,destination_wind,dest_Back Bay,dest_Beacon Hill,dest_Boston University,dest_Fenway,dest_Financial District,dest_Haymarket Square,dest_North End,dest_North Station,dest_Northeastern University,dest_South Station,dest_Theatre District,dest_West End,src_Back Bay,src_Beacon Hill,src_Boston University,src_Fenway,src_Financial District,src_Haymarket Square,src_North End,src_North Station,src_Northeastern University,src_South Station,src_Theatre District,src_West End,pid_55c66225-fbe7-4fd5-9072-eab1ece5e23e,pid_6c84fd89-3f11-4782-9b50-97c468b19529,pid_6d318bcc-22a3-4af6-bddd-b409bfce1546,pid_6f72dfc5-27f1-42e8-84db-ccc7a75f6969,pid_997acbb5-e102-41e1-b155-9df7de0a73f2,pid_9a0e7b09-b92b-4c41-9779-2ad22b4d779d,pid_lyft,pid_lyft_line,pid_lyft_lux,pid_lyft_luxsuv,pid_lyft_plus,pid_lyft_premier,nm_Black,nm_Black SUV,nm_Lux,nm_Lux Black,nm_Lux Black XL,nm_Lyft,nm_Lyft XL,nm_Shared,nm_UberPool,nm_UberX,nm_UberXL,nm_WAV
0,0.44,0,1544952607890,5.0,1.0,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False
1,0.44,0,1543284023677,11.0,1.0,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False
2,0.44,0,1543366822198,7.0,1.0,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
3,0.44,0,1543553582749,26.0,1.0,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False
4,0.44,0,1543463360223,9.0,1.0,39.067897,0.676711,1008.445239,0.008660,0.764837,6.843193,39.035315,0.676998,1008.442811,0.008649,0.765545,6.835755,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
637971,1.00,1,1543708385534,9.5,1.0,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
637972,1.00,1,1543708385534,13.0,1.0,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
637973,1.00,1,1543708385534,9.5,1.0,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False
637974,1.00,1,1543708385534,27.0,1.0,38.983403,0.677247,1008.441090,0.008657,0.767266,6.816233,39.090841,0.676730,1008.441912,0.008644,0.764054,6.853117,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False


In [51]:
# Split df into X and y
y = df['price']
X = df.drop('price', axis=1)

In [52]:
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

In [53]:
X_train.shape, X_test.shape

((446583, 64), (191393, 64))
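
The two shapes add up to the 637,976 rides that remained after dropping missing prices, split roughly 70/30. A small sketch on made-up arrays (`X_toy` and `y_toy` are hypothetical) shows the same call on ten samples:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy samples with two features each; the values are arbitrary.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, train_size=0.7, shuffle=True, random_state=1
)

print(X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in each split on every run.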

In [54]:
# Scale X
scaler = StandardScaler()
scaler.fit(X_train)

X_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
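
The scaler is fitted on the training rows only and then applied to both splits, so no statistics from the test set leak into preprocessing. A minimal sketch with one made-up feature (the `train` and `test` frames are hypothetical) illustrates the consequence: the scaled training column has mean 0 and unit variance, while the test column generally does not.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy splits with non-default indices; the values are made up for illustration.
train = pd.DataFrame({'distance': [0.5, 1.0, 1.5, 3.0]}, index=[10, 11, 12, 13])
test = pd.DataFrame({'distance': [2.0, 4.0]}, index=[20, 21])

scaler = StandardScaler()
scaler.fit(train)  # mean and standard deviation come from the training rows only

# Wrapping the result in a DataFrame keeps the column names and original row labels.
train_scaled = pd.DataFrame(scaler.transform(train), columns=train.columns, index=train.index)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)

print(train_scaled['distance'].mean(), train_scaled['distance'].std(ddof=0))
print(test_scaled)
```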

In [55]:
X_train

Unnamed: 0,distance,cab_type,time_stamp,surge_multiplier,source_temp,source_clouds,source_pressure,source_rain,source_humidity,source_wind,destination_temp,destination_clouds,destination_pressure,destination_rain,destination_humidity,destination_wind,dest_Back Bay,dest_Beacon Hill,dest_Boston University,dest_Fenway,dest_Financial District,dest_Haymarket Square,dest_North End,dest_North Station,dest_Northeastern University,dest_South Station,dest_Theatre District,dest_West End,src_Back Bay,src_Beacon Hill,src_Boston University,src_Fenway,src_Financial District,src_Haymarket Square,src_North End,src_North Station,src_Northeastern University,src_South Station,src_Theatre District,src_West End,pid_55c66225-fbe7-4fd5-9072-eab1ece5e23e,pid_6c84fd89-3f11-4782-9b50-97c468b19529,pid_6d318bcc-22a3-4af6-bddd-b409bfce1546,pid_6f72dfc5-27f1-42e8-84db-ccc7a75f6969,pid_997acbb5-e102-41e1-b155-9df7de0a73f2,pid_9a0e7b09-b92b-4c41-9779-2ad22b4d779d,pid_lyft,pid_lyft_line,pid_lyft_lux,pid_lyft_luxsuv,pid_lyft_plus,pid_lyft_premier,nm_Black,nm_Black SUV,nm_Lux,nm_Lux Black,nm_Lux Black XL,nm_Lyft,nm_Lyft XL,nm_Shared,nm_UberPool,nm_UberX,nm_UberXL,nm_WAV
114027,-0.782551,0.965841,-0.818626,-0.158351,2.075265,-0.286937,-1.161925,0.203658,-2.015513,0.842019,-0.716205,-0.013349,-0.074138,0.401278,0.956172,0.574275,-0.302080,-0.300701,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,-0.301475,3.318972,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,3.314645,-0.301484,-0.301249,-0.307093,-0.307045,-0.308335,-0.306700,-0.306923,3.25690,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,-0.307093,-0.306700,3.25690
597473,-0.350616,0.965841,-0.822991,-0.158351,-0.002167,-1.066915,-0.532527,0.901710,0.020939,0.918199,-0.300904,0.025680,0.511918,0.176479,0.257154,0.136007,-0.302080,3.325565,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,3.318193,-0.30001,-0.300431,-0.301691,-0.301484,-0.301249,3.256344,-0.307045,-0.308335,-0.306700,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,3.256344,-0.306700,-0.30704
342796,0.592589,-1.035367,1.501498,-0.158351,-0.738175,-0.540430,-0.665848,0.929696,0.782908,0.244114,-0.297758,1.489255,2.279388,-0.993513,-0.042425,-2.023527,-0.302080,-0.300701,3.311301,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,-0.301484,3.319508,-0.307093,-0.307045,-0.308335,-0.306700,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,3.377703,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,3.377703,-0.295912,-0.295549,-0.295768,-0.306923,-0.307093,-0.306700,-0.30704
64532,0.901114,0.965841,-0.842908,-0.158351,-0.795154,0.551539,-0.166669,-1.788149,0.873618,-0.976858,-0.382968,-0.793922,-0.387322,0.912474,0.375170,0.600837,-0.302080,-0.300701,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,3.339845,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,-0.301369,-0.30001,3.328555,-0.301691,-0.301484,-0.301249,3.256344,-0.307045,-0.308335,-0.306700,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,3.256344,-0.306700,-0.30704
189601,0.239989,0.965841,-0.456494,-0.158351,-0.868506,2.130993,1.312262,-1.818534,0.782908,-1.665970,-0.300904,0.025680,0.511918,0.176479,0.257154,0.136007,-0.302080,3.325565,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,3.309318,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,-0.301484,-0.301249,3.256344,-0.307045,-0.308335,-0.306700,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,3.256344,-0.306700,-0.30704
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
371403,4.647491,0.965841,-0.464752,-0.158351,2.189878,-1.066915,-1.524682,0.733394,-2.165185,1.044350,-0.795909,0.552567,-0.167163,-1.789107,0.874469,-0.977137,-0.302080,-0.300701,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,3.324195,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,3.27976,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,-0.301484,-0.301249,-0.307093,-0.307045,3.243221,-0.306700,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,3.243221,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,-0.307093,-0.306700,-0.30704
491263,-0.606251,0.965841,1.004871,-0.158351,-0.868506,2.130993,1.312262,-1.818534,0.782908,-1.665970,-0.062054,0.669653,0.425095,-0.601515,0.025662,-0.445204,3.310382,-0.300701,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,3.309318,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,-0.301484,-0.301249,-0.307093,-0.307045,-0.308335,3.260518,-0.306923,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,-0.306923,-0.307093,3.260518,-0.30704
470924,-0.817811,0.965841,1.459287,-0.158351,-0.738175,-0.540430,-0.665848,0.929696,0.782908,0.244114,2.076848,-0.286549,-1.162528,0.203679,-2.016923,0.841989,-0.302080,-0.300701,-0.301996,-0.301073,-0.304655,-0.302005,-0.302098,-0.299415,-0.300825,3.317026,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,-0.301484,3.319508,-0.307093,-0.307045,-0.308335,-0.306700,3.258151,-0.30704,-0.295912,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,-0.295912,-0.295549,-0.295768,3.258151,-0.307093,-0.306700,-0.30704
491755,-0.429951,-1.035367,-0.703720,-0.158351,-0.715515,-0.013945,-0.073655,0.401160,0.955258,0.574342,-0.159587,-1.086637,0.006483,0.935274,0.207225,0.736790,-0.302080,-0.300701,-0.301996,-0.301073,-0.304655,3.311204,-0.302098,-0.299415,-0.300825,-0.301475,-0.301298,-0.300497,-0.301422,-0.300391,-0.301546,-0.302177,-0.30490,-0.301448,-0.301369,-0.30001,-0.300431,-0.301691,3.316929,-0.301249,-0.307093,-0.307045,-0.308335,-0.306700,-0.306923,-0.30704,3.379388,-0.295768,-0.295813,-0.296059,-0.295549,-0.295464,-0.307045,-0.308335,-0.295464,-0.295813,-0.296059,3.379388,-0.295549,-0.295768,-0.306923,-0.307093,-0.306700,-0.30704


### Training

In [56]:
model = LinearRegression()
model.fit(X_train, y_train)

print("Test R^2 Score: {:.5f}".format(model.score(X_test, y_test)))

Test R^2 Score: 0.92854
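
For a regressor, `model.score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below uses synthetic data (the arrays, coefficients, and noise level are made up, not taken from the ride dataset) to compute R^2 by hand, compare it with `model.score`, and report RMSE, which is expressed in the same units as the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression problem; the coefficients and noise level are arbitrary.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + 10.0 + rng.normal(scale=0.5, size=200)

# Hold out the last 50 rows as a test set.
X_fit, X_hold = X_toy[:150], X_toy[150:]
y_fit, y_hold = y_toy[:150], y_toy[150:]

model = LinearRegression().fit(X_fit, y_fit)
pred = model.predict(X_hold)

ss_res = np.sum((y_hold - pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_hold - y_hold.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
rmse = np.sqrt(np.mean((y_hold - pred) ** 2))   # typical error in target units

print("Manual R^2: {:.5f}".format(r2_manual))
print("model.score: {:.5f}".format(model.score(X_hold, y_hold)))
print("RMSE: {:.5f}".format(rmse))
```

Applied to the ride data, an RMSE computed this way would read as a typical prediction error in the same units as `price`.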
