# Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
# Load the aggregated taxi data saved as a pickle file
taxi_grouped_by_region = pd.read_pickle('taxi_grouped_by_region.pkl')


In [3]:
# Preview the first 5 rows to get a glimpse of the data

taxi_grouped_by_region.head()

|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 |



 **PULocationID**: This column represents the taxi zone in which the passenger was picked up. It is an ID assigned to the location.

 **transaction_date**: This column represents the date of the taxi ride.

**transaction_month**: This column represents the month of the transaction. In this case, all the transactions are from January, so this column contains only 1.

 **transaction_day**: This column represents the day of the month on which the transaction took place.

 **transaction_hour**: This column represents the hour of the day when the transaction took place, in 24-hour format.

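**trip_distance**: This column represents the trip distance in miles. Because each row aggregates all rides from one pickup zone in one hour, this appears to be the average distance of those rides rather than the distance of a single ride.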

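**total_amount**: This column represents the total fare charged for a ride. Because each row aggregates several rides, this appears to be the average fare per ride in that group.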

 **count_of_transactions**: This column represents the number of taxi transactions that occurred during a specific hour of a specific day in a specific location. It is calculated as the number of transactions grouped by **PULocationID**, **transaction_month**, **transaction_day**, and **transaction_hour**.
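
The aggregation that produced this table happens in an earlier part of the project and is not shown in this notebook. As a hedged sketch only, the snippet below rolls hypothetical trip-level rows up to one row per pickup zone and hour. The raw column name tpep_pickup_datetime follows the public NYC TLC yellow-taxi schema and is an assumption here, as is averaging trip_distance and total_amount (large counts paired with small amounts in the data suggest means rather than sums).

```python
import pandas as pd

# Hypothetical trip-level rows; column names assume the NYC TLC yellow-taxi schema.
trips = pd.DataFrame({
    'tpep_pickup_datetime': pd.to_datetime([
        '2019-01-01 10:05', '2019-01-01 10:40', '2019-01-01 11:15', '2019-01-01 10:20',
    ]),
    'PULocationID': ['1', '1', '1', '4'],
    'trip_distance': [10.0, 20.0, 3.0, 1.5],
    'total_amount': [40.0, 60.0, 12.0, 8.0],
})

pickup = trips['tpep_pickup_datetime']
trips['transaction_date'] = pickup.dt.normalize()  # the date, at midnight
trips['transaction_month'] = pickup.dt.month
trips['transaction_day'] = pickup.dt.day
trips['transaction_hour'] = pickup.dt.hour

keys = ['PULocationID', 'transaction_date', 'transaction_month',
        'transaction_day', 'transaction_hour']

# One row per zone and hour: average distance and fare, plus the number of trips
# ('count' counts non-missing fares, which here equals the number of trips).
grouped = (
    trips.groupby(keys, as_index=False)
    .agg(trip_distance=('trip_distance', 'mean'),
         total_amount=('total_amount', 'mean'),
         count_of_transactions=('total_amount', 'count'))
)
print(grouped)
```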
   


# Feature Engineering

* We will add three sets of new features to the model.

1. The first set is **time-based features**: a weekend flag and a holiday flag, both booleans.

2. The second set is **location-based information**. The data has a location ID per taxi zone, and zones roll up into a higher-level grouping called boroughs. This zone-to-borough mapping comes from the same source as the main data.

3. The last set is **weather-related data**.

In [4]:
# Deep copy (the default for .copy()), so later changes do not affect taxi_grouped_by_region
data_with_new_features = taxi_grouped_by_region.copy()

In [5]:
# Preview the first 5 rows to get a glimpse of the data

data_with_new_features.head()

|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 |


## 1. Time-Based Features



**transaction_week_day**: This column represents the day of the week on which the transaction took place.
It is computed with the dt.dayofweek accessor on the transaction_date column, which returns the day of the week as an integer, where Monday is 0 and Sunday is 6.

**Weekend**: This column is a boolean that indicates whether the transaction took place on the weekend (Saturday or Sunday). It is built with isin([5, 6]), which checks whether transaction_week_day is 5 (Saturday) or 6 (Sunday), wrapped in np.where(): if it is, the weekend column is set to True, otherwise it is set to False.
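
A quick sanity check of these two definitions on a few known dates (1 January 2019 was a Tuesday, which matches the transaction_week_day value of 1 in the previews of this notebook). The dates are illustrative; the logic mirrors the next cell.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-01-05', '2019-01-06', '2019-01-07']))

week_day = dates.dt.dayofweek                      # Monday = 0 ... Sunday = 6
weekend = np.where(week_day.isin([5, 6]), True, False)

print(week_day.tolist())  # [1, 5, 6, 0]
print(weekend.tolist())   # [False, True, True, False]
```

Because isin() already returns booleans, np.where(..., True, False) gives the same result as using the isin() output directly.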

In [6]:

# The .dt accessor needs a datetime column, so make sure transaction_date is one
data_with_new_features['transaction_date'] = pd.to_datetime(data_with_new_features['transaction_date'])

# Day of the week as an integer: Monday = 0, ..., Sunday = 6
data_with_new_features['transaction_week_day'] = data_with_new_features['transaction_date'].dt.dayofweek

# isin([5, 6]) checks whether each day is Saturday (5) or Sunday (6);
# np.where then assigns True for weekend days and False otherwise
data_with_new_features['weekend'] = np.where(data_with_new_features['transaction_week_day'].isin([5, 6]), True, False)



The reason behind creating these features is to see whether taxi transactions follow a different trend or pattern on weekends compared to weekdays. For example, there might be more taxi transactions on weekends because people go out more.
This information could potentially help improve the predictive performance of our model.
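
One simple way to check whether such a difference exists is to compare average demand per zone-hour across the new flag. A minimal sketch on made-up rows (the numbers are illustrative, not results from this dataset); in the notebook the frame would be data_with_new_features.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame({
    'weekend': [False, False, True, True, False],
    'count_of_transactions': [10, 14, 20, 30, 12],
})

# Mean number of transactions per zone-hour, split into weekday (False) and weekend (True).
demand_by_weekend = df.groupby('weekend')['count_of_transactions'].mean()
print(demand_by_weekend)
```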


In [None]:
# The holidays library is not part of Python's standard library or the default Anaconda distribution,
# so it needs to be installed before it can be imported.


In [15]:
!pip install holidays


Defaulting to user installation because normal site-packages is not writeable
Collecting holidays
  Downloading holidays-0.29-py3-none-any.whl (695 kB)
     ------------------------------------ 695.7/695.7 kB 744.3 kB/s eta 0:00:00
Installing collected packages: holidays
Successfully installed holidays-0.29


In [7]:
import holidays

# create an instance of UnitedStates() which contains US federal holidays
us_holidays = holidays.UnitedStates(years=[2018, 2019, 2020])

# convert dates to datetime format and then check if the date is a holiday
data_with_new_features['transaction_date'] = pd.to_datetime(data_with_new_features['transaction_date'])
data_with_new_features['is_holiday'] = data_with_new_features['transaction_date'].dt.date.apply(lambda x: x in us_holidays)


Adding a feature like **'is_holiday'** can be important in many predictive models because behavior often changes on holidays. In the context of predicting taxi fares or rides, it could be important because the volume of rides or the fare might be different on holidays. People may use taxis differently (more or less frequently, different distances, etc.) on holidays than on regular days. Hence, by including **'is_holiday'** as a feature, we might improve the model's ability to understand the data and make more accurate predictions.
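
As a hedged alternative that avoids the extra dependency, pandas ships a rule-based USFederalHolidayCalendar. Its holiday list is not guaranteed to match the holidays package exactly (observance rules and newly added holidays can differ), so treat this as a swapped-in technique rather than a drop-in replacement. The sketch also replaces the row-wise apply with a vectorized isin.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-01-02', '2019-07-04', '2019-11-28']))

# Federal holidays for 2019 as a DatetimeIndex (timestamps at midnight).
calendar = USFederalHolidayCalendar()
federal_holidays = calendar.holidays(start='2019-01-01', end='2019-12-31')

# normalize() drops any time-of-day part so each timestamp compares as a calendar date.
is_holiday = dates.dt.normalize().isin(federal_holidays)
print(is_holiday.tolist())  # [True, False, True, True]
```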

In [8]:
# Preview the first 5 rows to get a glimpse of the data

data_with_new_features.head()

|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | is_holiday |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 | 1 | False | True |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 | 1 | False | True |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 | 1 | False | True |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 | 1 | False | True |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 | 1 | False | True |


## 2. Location-based information

#### Borough Information
In the context of New York City, where this taxi dataset seems to originate, a borough is one of the five main administrative divisions of the city.

These five boroughs are:

* Manhattan
* Brooklyn
* Queens
* The Bronx
* Staten Island

Each borough is also a county within the state of New York. They each have their own distinct cultures, neighborhoods, and, to an extent, socioeconomic identities.

The Borough information in the dataset refers to the borough in which each taxi ride started (as given by the PULocationID field).

In [9]:

zone_lookup = pd.read_csv('taxi+_zone_lookup.csv')



In [10]:
# Preview the first 5 rows to get a glimpse of the data

zone_lookup.head()

|   | LocationID | Borough | Zone | service_zone |
|---|---|---|---|---|
| 0 | 1 | EWR | Newark Airport | EWR |
| 1 | 2 | Queens | Jamaica Bay | Boro Zone |
| 2 | 3 | Bronx | Allerton/Pelham Gardens | Boro Zone |
| 3 | 4 | Manhattan | Alphabet City | Yellow Zone |
| 4 | 5 | Staten Island | Arden Heights | Boro Zone |


In [11]:
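# Keep only LocationID and Borough; .copy() avoids a SettingWithCopyWarning on the next line
zone_lookup = zone_lookup[['LocationID', 'Borough']].copy()
# Convert LocationID to string so it matches PULocationID, which is stored as text in this data
zone_lookup['LocationID'] = zone_lookup['LocationID'].astype(str)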

In [12]:
zone_lookup.head()

|   | LocationID | Borough |
|---|---|---|
| 0 | 1 | EWR |
| 1 | 2 | Queens |
| 2 | 3 | Bronx |
| 3 | 4 | Manhattan |
| 4 | 5 | Staten Island |


In [13]:
# Merge 'data_with_new_features' with 'zone_lookup', matching the 'PULocationID' column
# of the first DataFrame against the 'LocationID' column of the second.
# how='left' keeps every row of the first (left) DataFrame and adds the matching rows
# from 'zone_lookup' where they exist.
# Rows without a match get NaN in the columns added from zone_lookup (LocationID, Borough).
data_with_new_features = data_with_new_features.merge(zone_lookup, left_on='PULocationID', right_on='LocationID', how='left')

# Drop the 'LocationID' column from the 'data_with_new_features' DataFrame, as it's now redundant after the merge. 
# 'axis=1' specifies that we want to drop a column (not a row), 
# and 'inplace=True' means that the change should be applied directly to the DataFrame
# without returning a new one.
data_with_new_features.drop('LocationID', axis=1, inplace=True)
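
The dtype conversion two cells earlier matters because pandas refuses to merge an object (string) key with an int64 key. A minimal sketch on made-up rows, using two optional merge arguments that this notebook does not use: validate='many_to_one' raises an error if a LocationID were duplicated in the lookup table, and indicator=True adds a _merge column that makes unmatched pickup zones easy to list. The zone ID '999' is hypothetical.

```python
import pandas as pd

# Made-up rows; PULocationID is text, as in this notebook's data.
trips = pd.DataFrame({'PULocationID': ['1', '4', '999'], 'total_amount': [21.8, 9.0, 5.0]})
zones = pd.DataFrame({'LocationID': [1, 4], 'Borough': ['EWR', 'Manhattan']})

# Match the key dtypes first; merging an object key with an int64 key raises a ValueError.
zones['LocationID'] = zones['LocationID'].astype(str)

merged = trips.merge(
    zones,
    left_on='PULocationID',
    right_on='LocationID',
    how='left',
    validate='many_to_one',  # raise if any LocationID appears more than once in zones
    indicator=True,          # add a _merge column: 'both' or 'left_only'
)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'PULocationID'].tolist()
print(unmatched)  # ['999']

merged = merged.drop(columns=['LocationID', '_merge'])
print(merged)
```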


In [14]:
# Preview the first 5 rows to get a glimpse of the data

data_with_new_features.head()


|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | is_holiday | Borough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 | 1 | False | True | EWR |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 | 1 | False | True | EWR |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 | 1 | False | True | EWR |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 | 1 | False | True | EWR |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 | 1 | False | True | EWR |


* **PULocationID**: The identifier for the pickup location. It is a categorical value, each number representing a different area in New York City.
* **transaction_date**: The date of the transaction.
* **transaction_month**: The month of the transaction.
* **transaction_day**: The day of the month of the transaction.
* **transaction_hour**: The hour of the day of the transaction.
* **trip_distance**: The distance of the trips for which the taxi was hired; since each row aggregates several rides, this appears to be the average distance per ride.
* **total_amount**: The total fare, which is the target variable for the model; like trip_distance, it appears to be averaged over the rides in the row.
* **count_of_transactions**: The number of transactions that happened in the specific region for the specific time.
* **transaction_week_day**: The day of the week of the transaction, where 0 is Monday and 6 is Sunday.
* **weekend**: A Boolean column where True represents that the transaction day is a weekend (Saturday or Sunday), and False represents a weekday.
* **is_holiday**: A Boolean column where True represents that the transaction date is a US holiday.
* **Borough**: The borough in which the pickup location (PULocationID) is situated.

Adding borough information to the main taxi dataset is a common practice in exploratory data analysis and feature engineering. The main reason here is to:

**Improve model performance**: The borough where the ride starts might be a significant predictor of the total amount.
For instance, rides starting in wealthier boroughs might tip more or use more expensive ride options, resulting in a higher total amount.

In [15]:
data_with_new_features['Borough'].value_counts()


Manhattan        45309
Brooklyn         23633
Queens           22002
Bronx             9586
Unknown           1453
Staten Island      302
EWR                271
Name: Borough, dtype: int64

## 3. Weather related data

#### Weather Information 

If it is raining, people are more likely to take a cab instead of walking.

In [16]:
nyc_weather = pd.read_csv('nyc_weather.csv')

# Preview the first 5 rows to get a glimpse of the data
nyc_weather.head()

|   | date and time | temperature | humidity | wind speed | cloud cover | amount of precipitation |
|---|---|---|---|---|---|---|
| 0 | 31.12.2019 22:00 | 6.1 | 65 | 6 | 100%. | 0.3 |
| 1 | 31.12.2019 19:00 | 6.7 | 71 | 5 | 70 – 80%. | NaN |
| 2 | 31.12.2019 16:00 | 7.2 | 66 | 5 | 50%. | NaN |
| 3 | 31.12.2019 13:00 | 6.1 | 76 | 3 | 100%. | Trace of precipitation |
| 4 | 31.12.2019 10:00 | 4.4 | 83 | 2 | 100%. | Trace of precipitation |


* **date and time**: The exact date and time when the weather data was recorded, written day-first (e.g. 31.12.2019 22:00).
* **temperature**: The air temperature at that time, presumably in degrees Celsius.
* **humidity**: The relative humidity at that time, in percent.
* **wind speed**: The speed of the wind at that time. The unit is not stated; depending on the data source it is likely meters per second or kilometers per hour.
* **cloud cover**: The percentage of the sky obscured by clouds at that time, stored as text. This column also contains range values (e.g., "70 – 80%.").
* **amount of precipitation**: The amount of precipitation (rain, snow, etc.) at that time. The unit isn't specified, but it is likely millimeters. The column also contains non-numeric values like "Trace of precipitation".

In [17]:
nyc_weather.shape


(2936, 6)

In [18]:
nyc_weather.dtypes


date and time               object
temperature                float64
humidity                     int64
wind speed                   int64
cloud cover                 object
amount of precipitation     object
dtype: object

Both **cloud cover** and **amount of precipitation** are stored as text (object dtype), so we will need to convert them into a numerical format.


In [19]:
nyc_weather['cloud cover'].value_counts()


70 – 80%.                                                     973
100%.                                                         896
20–30%.                                                       479
50%.                                                          413
no clouds                                                     168
Sky obscured by fog and/or other meteorological phenomena.      3
Name: cloud cover, dtype: int64

In [20]:
nyc_weather['amount of precipitation'].value_counts()


Trace of precipitation    266
0.3                        61
2.0                        49
1.0                        45
0.5                        42
0.8                        42
4.0                        33
5.0                        24
3.0                        23
6.0                        19
7.0                        13
8.0                        13
9.0                        11
10.0                       10
13.0                        7
12.0                        7
15.0                        6
11.0                        5
16.0                        3
22.0                        2
14.0                        2
29.0                        2
63.0                        1
21.0                        1
68.0                        1
17.0                        1
20.0                        1
30.0                        1
35.0                        1
34.0                        1
24.0                        1
18.0                        1
25.0                        1
Name: amou

Apart from 'Trace of precipitation' (an amount too small to measure), the values represent the amount of precipitation in some unit (possibly millimeters), ranging from 0.3 to 68. Each number tells you how much precipitation fell during the corresponding time period; higher numbers indicate heavier rain or snow.

In terms of how we might use this information for our model, weather conditions, particularly rain, could influence taxi usage. People might be more likely to take a taxi when it is raining, and trips might take longer due to slower traffic. Hence, this could be a valuable feature for our predictive model.

In [21]:
nyc_weather.isna().sum()


date and time                 0
temperature                   0
humidity                      0
wind speed                    0
cloud cover                   4
amount of precipitation    2240
dtype: int64

There are 4 missing values in the 'cloud cover' column and 2240 missing values in the 'amount of precipitation' column.

For this case, we treat '**Trace of precipitation**' as 0.1 and assume that a missing precipitation value means no precipitation was recorded, i.e. 0.
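
The rule above can be wrapped in a small helper. This is a sketch under the same assumptions (trace = 0.1, missing = 0); clean_precipitation is a hypothetical name, and pd.to_numeric(errors='coerce') is a swapped-in alternative to astype(float) that turns any unexpected text into NaN (and then 0) instead of raising. That is more forgiving, but it can also hide bad values.

```python
import pandas as pd

raw = pd.Series(['0.3', None, 'Trace of precipitation', '2.0', '68.0'])

def clean_precipitation(values, trace_value=0.1):
    """Hypothetical helper: map 'Trace of precipitation' to a small value,
    missing values to 0, and numeric text to float.

    trace_value=0.1 and 'missing means no precipitation' are assumptions.
    """
    cleaned = values.replace('Trace of precipitation', trace_value)
    # errors='coerce' turns any other unexpected text into NaN instead of raising
    cleaned = pd.to_numeric(cleaned, errors='coerce')
    return cleaned.fillna(0.0)

print(clean_precipitation(raw).tolist())  # [0.3, 0.0, 0.1, 2.0, 68.0]
```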

#### Dealing with 'amount of precipitation' missing values and others

In [22]:
nyc_weather['amount of precipitation'].value_counts()


Trace of precipitation    266
0.3                        61
2.0                        49
1.0                        45
0.5                        42
0.8                        42
4.0                        33
5.0                        24
3.0                        23
6.0                        19
7.0                        13
8.0                        13
9.0                        11
10.0                       10
13.0                        7
12.0                        7
15.0                        6
11.0                        5
16.0                        3
22.0                        2
14.0                        2
29.0                        2
63.0                        1
21.0                        1
68.0                        1
17.0                        1
20.0                        1
30.0                        1
35.0                        1
34.0                        1
24.0                        1
18.0                        1
25.0                        1
Name: amou

In [23]:
# Replace 'Trace of precipitation' with a negligible but non-zero value (0.1)
# and treat missing values as no precipitation (0)
nyc_weather['amount of precipitation'] = nyc_weather['amount of precipitation'].replace('Trace of precipitation', 0.1).fillna(0)

# Convert the column from text (object) to float
nyc_weather['amount of precipitation'] = nyc_weather['amount of precipitation'].astype(float)

# No interpolation is needed here: fillna(0) has already removed every NaN

#### Dealing with 'cloud cover' missing values and others

In [24]:
nyc_weather['cloud cover'].value_counts()


70 – 80%.                                                     973
100%.                                                         896
20–30%.                                                       479
50%.                                                          413
no clouds                                                     168
Sky obscured by fog and/or other meteorological phenomena.      3
Name: cloud cover, dtype: int64

* **'70 – 80%'**: This indicates that 70 to 80 percent of the sky was covered by clouds when the observation was made.
* **'100%'**: This indicates that the entire sky (100%) was covered by clouds at the time of observation.
* **'20–30%'**: This indicates that 20 to 30 percent of the sky was covered by clouds.
* **'50%'**: This indicates that half of the sky was covered by clouds.
*  **'no clouds'**: This indicates that there were no clouds in the sky at the time of the observation, i.e., 0% cloud cover.
* **'Sky obscured by fog and/or other meteorological phenomena'**: This is a categorical description indicating that the sky was obscured by fog or other weather phenomena, making it difficult to accurately observe and quantify cloud cover.
* This category(**Sky obscured by fog and/or other meteorological phenomena**) might require special handling during data preprocessing, as it doesn't directly correspond to a numeric percentage like the other categories.
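
Instead of a hand-written dictionary, the percentages can also be parsed from the strings themselves. The sketch below is a hypothetical helper (parse_cloud_cover) that takes the midpoint of a range, so '70 – 80%.' becomes 0.75 and '20–30%.' becomes 0.25; note that the notebook's dictionary uses 0.7 and 0.3 instead. Fog is treated as full cover, as in the notebook.

```python
import re

import numpy as np
import pandas as pd

def parse_cloud_cover(text):
    """Hypothetical helper: turn a cloud-cover description into a fraction in [0, 1]."""
    if pd.isna(text):
        return np.nan
    if text == 'no clouds':
        return 0.0
    if text.startswith('Sky obscured'):
        return 1.0  # same assumption as the notebook: fog counts as full cover
    numbers = [float(n) for n in re.findall(r'\d+', text)]
    if not numbers:
        return np.nan  # unknown description
    # A single value ('50%.') or the midpoint of a range ('70 – 80%.'), as a fraction
    return sum(numbers) / len(numbers) / 100

print(parse_cloud_cover('70 – 80%.'), parse_cloud_cover('20–30%.'))  # 0.75 0.25
```

Applied to the whole column, this would be nyc_weather['cloud cover'].map(parse_cloud_cover).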

In [25]:
# Map each description to a cloud-cover fraction in [0, 1].
# Ranges are approximated by a single value, and fog is treated as full cover.
cloud_cover_dict = {
    '70 – 80%.': 0.7,
    '100%.': 1,
    '20–30%.': 0.3,
    '50%.': 0.5,
    'no clouds': 0,
    'Sky obscured by fog and/or other meteorological phenomena.': 1
}

# Use the map function to replace the string descriptions with their corresponding numeric values
nyc_weather['cloud cover'] = nyc_weather['cloud cover'].map(cloud_cover_dict).astype(float)

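# Interpolate the few remaining missing values
# (assign the result back; inplace=True on a selected column may not update the DataFrame)
nyc_weather['cloud cover'] = nyc_weather['cloud cover'].interpolate()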


In [26]:
nyc_weather.dtypes


date and time               object
temperature                float64
humidity                     int64
wind speed                   int64
cloud cover                float64
amount of precipitation    float64
dtype: object

In [27]:
nyc_weather.head()

|   | date and time | temperature | humidity | wind speed | cloud cover | amount of precipitation |
|---|---|---|---|---|---|---|
| 0 | 31.12.2019 22:00 | 6.1 | 65 | 6 | 1.0 | 0.3 |
| 1 | 31.12.2019 19:00 | 6.7 | 71 | 5 | 0.7 | 0.0 |
| 2 | 31.12.2019 16:00 | 7.2 | 66 | 5 | 0.5 | 0.0 |
| 3 | 31.12.2019 13:00 | 6.1 | 76 | 3 | 1.0 | 0.1 |
| 4 | 31.12.2019 10:00 | 4.4 | 83 | 2 | 1.0 | 0.1 |


In [28]:
# Convert 'date and time' to datetime. The raw values are day-first (e.g. '31.12.2019 22:00'),
# so pass the format explicitly to avoid day/month mix-ups
nyc_weather['date and time'] = pd.to_datetime(nyc_weather['date and time'], format='%d.%m.%Y %H:%M')

# Create 'hour', 'month' and 'day' columns with the vectorized .dt accessor
nyc_weather['hour'] = nyc_weather['date and time'].dt.hour
nyc_weather['month'] = nyc_weather['date and time'].dt.month
nyc_weather['day'] = nyc_weather['date and time'].dt.day




By performing these transformations, the date and time information is broken down into separate components (hour, day, month), which makes the data easier to handle and manipulate. These components also serve as the keys for joining the weather data to the taxi data (transaction_month, transaction_day, transaction_hour), and they could potentially improve the performance of machine learning models, which can treat them as individual features.
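
Why the explicit format matters: depending on the pandas version, pd.to_datetime without a format may parse a value such as '01.02.2019' month-first (2 January) instead of day-first (1 February). A minimal check on two illustrative timestamps:

```python
import pandas as pd

stamps = pd.Series(['31.12.2019 22:00', '01.02.2019 10:00'])

# Explicit day-first format: '01.02.2019' is 1 February, not 2 January.
parsed = pd.to_datetime(stamps, format='%d.%m.%Y %H:%M')

print(parsed.dt.month.tolist(), parsed.dt.day.tolist(), parsed.dt.hour.tolist())
# [12, 2] [31, 1] [22, 10]
```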

In [29]:
nyc_weather.head()

|   | date and time | temperature | humidity | wind speed | cloud cover | amount of precipitation | hour | month | day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-12-31 22:00:00 | 6.1 | 65 | 6 | 1.0 | 0.3 | 22 | 12 | 31 |
| 1 | 2019-12-31 19:00:00 | 6.7 | 71 | 5 | 0.7 | 0.0 | 19 | 12 | 31 |
| 2 | 2019-12-31 16:00:00 | 7.2 | 66 | 5 | 0.5 | 0.0 | 16 | 12 | 31 |
| 3 | 2019-12-31 13:00:00 | 6.1 | 76 | 3 | 1.0 | 0.1 | 13 | 12 | 31 |
| 4 | 2019-12-31 10:00:00 | 4.4 | 83 | 2 | 1.0 | 0.1 | 10 | 12 | 31 |


In [30]:
#Borough information

data_with_new_features.head()


|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | is_holiday | Borough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 | 1 | False | True | EWR |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 | 1 | False | True | EWR |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 | 1 | False | True | EWR |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 | 1 | False | True | EWR |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 | 1 | False | True | EWR |


In [31]:
nyc_taxi_with_weather = data_with_new_features.merge(nyc_weather, left_on = ['transaction_month','transaction_day','transaction_hour'], right_on = ['month','day','hour'], how='left')

print(nyc_taxi_with_weather.shape)
nyc_taxi_with_weather.head()


(102556, 21)


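|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | ... | Borough | date and time | temperature | humidity | wind speed | cloud cover | amount of precipitation | hour | month | day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 | 1 | False | ... | EWR | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 | 1 | False | ... | EWR | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 | 1 | False | ... | EWR | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 | 1 | False | ... | EWR | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 | 1 | False | ... | EWR | 2019-01-01 10:00:00 | 15.6 | 62.0 | 11.0 | 0.7 | 0.0 | 10.0 | 1.0 | 1.0 |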


The reason we're doing this is to incorporate weather data into our taxi data. Weather can significantly influence taxi usage: for instance, if it's raining, people might be more likely to take a taxi rather than walk or cycle. By joining these DataFrames, we're able to analyze the relationship between weather and taxi usage.

The output shows that the merge succeeded, but many records have missing values (NaN) in the newly joined weather columns. The weather observations appear to be recorded every three hours (for example 10:00, 13:00, 16:00, 19:00, 22:00), so only taxi rows whose hour coincides with an observation find a match. That is consistent with roughly two-thirds of the 102,556 rows (68,371) having no weather values.
We need to fill these gaps before the data can be used for modeling.
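
One alternative to an exact match on month, day and hour (not what this notebook does) is pd.merge_asof, which attaches the nearest weather observation within a tolerance. The sketch below uses made-up rows and assumes a combined pickup-hour timestamp column called pickup_hour, a hypothetical name; the two-hour tolerance is likewise an illustrative choice.

```python
import pandas as pd

taxi = pd.DataFrame({
    'transaction_date': pd.to_datetime(['2019-01-01'] * 3),
    'transaction_hour': [10, 11, 12],
})
weather = pd.DataFrame({
    'date and time': pd.to_datetime(['2019-01-01 10:00', '2019-01-01 13:00']),
    'temperature': [15.6, 16.1],
})

# Hypothetical key: the start of the pickup hour as a full timestamp.
taxi['pickup_hour'] = taxi['transaction_date'] + pd.to_timedelta(taxi['transaction_hour'], unit='h')

# merge_asof needs both frames sorted on their keys.
matched = pd.merge_asof(
    taxi.sort_values('pickup_hour'),
    weather.sort_values('date and time'),
    left_on='pickup_hour',
    right_on='date and time',
    direction='nearest',              # closest observation, before or after
    tolerance=pd.Timedelta(hours=2),  # leave NaN if nothing is within 2 hours
)
print(matched['temperature'].tolist())  # [15.6, 15.6, 16.1]
```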

In [32]:
#The columns being dropped are 'date and time', 'hour', 'month', and 'day'. 
#These columns are dropped because they are redundant after the merge - 
#the information they contain is already represented in other columns ('transaction_month', 'transaction_day', and 'transaction_hour').

nyc_taxi_with_weather = nyc_taxi_with_weather.drop(['date and time','hour','month','day'], axis=1)


In [33]:
nyc_taxi_with_weather.head()


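|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | is_holiday | Borough | temperature | humidity | wind speed | cloud cover | amount of precipitation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2019-01-01 | 1 | 1 | 2 | 0.0 | 21.8 | 1 | 1 | False | True | EWR | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 2019-01-01 | 1 | 1 | 5 | 0.0 | 87.3 | 1 | 1 | False | True | EWR | NaN | NaN | NaN | NaN | NaN |
| 2 | 1 | 2019-01-01 | 1 | 1 | 6 | 0.0 | 80.3 | 1 | 1 | False | True | EWR | NaN | NaN | NaN | NaN | NaN |
| 3 | 1 | 2019-01-01 | 1 | 1 | 8 | 0.0 | 128.58 | 2 | 1 | False | True | EWR | NaN | NaN | NaN | NaN | NaN |
| 4 | 1 | 2019-01-01 | 1 | 1 | 10 | 16.9 | 43.245 | 4 | 1 | False | True | EWR | 15.6 | 62.0 | 11.0 | 0.7 | 0.0 |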


In [45]:
# We have many missing values

In [34]:
nyc_taxi_with_weather.isna().sum()

PULocationID                   0
transaction_date               0
transaction_month              0
transaction_day                0
transaction_hour               0
trip_distance                  0
total_amount                   0
count_of_transactions          0
transaction_week_day           0
weekend                        0
is_holiday                     0
Borough                        0
temperature                68371
humidity                   68371
wind speed                 68371
cloud cover                68371
amount of precipitation    68371
dtype: int64

In [35]:
# Sort chronologically so the interpolation below runs in time order;
# ignore_index=True relabels the rows 0..n-1, so no separate reset_index(drop=True) is needed
nyc_taxi_with_weather.sort_values(by=['transaction_date', 'transaction_hour'], ignore_index=True, inplace=True)


In [36]:
# Weather columns to fill
numeric_columns = ['temperature', 'humidity', 'wind speed', 'cloud cover', 'amount of precipitation']

# Linear interpolation along the sorted rows fills the hours between observations
nyc_taxi_with_weather[numeric_columns] = nyc_taxi_with_weather[numeric_columns].interpolate()

# Rows before the first observation are still NaN; fill them backwards
# (.bfill() replaces the deprecated fillna(method='bfill'))
nyc_taxi_with_weather[numeric_columns] = nyc_taxi_with_weather[numeric_columns].bfill()
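
Interpolating after the merge works along row positions: because several pickup zones share each hour, rows for the same hour can receive slightly different interpolated weather. A hedged alternative is to interpolate the weather series on an hourly time grid first and then merge, so every zone in a given hour gets the same value. The helper name hourly_weather is hypothetical, and the sketch assumes the weather frame holds only the timestamp plus numeric columns. Taxi hours before the first or after the last observation would still need filling (for example with bfill or ffill).

```python
import pandas as pd

def hourly_weather(weather, time_col='date and time'):
    """Hypothetical helper: resample observations to an hourly grid and interpolate in time."""
    return (
        weather.set_index(time_col)
        .sort_index()
        .resample('60min').mean()    # hours without an observation become NaN
        .interpolate(method='time')  # fill gaps in proportion to elapsed time
        .reset_index()
    )

weather = pd.DataFrame({
    'date and time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 10:00']),
    'temperature': [13.0, 10.0],
})
hourly = hourly_weather(weather)

taxi = pd.DataFrame({
    'PULocationID': ['1', '4', '4'],
    'transaction_date': pd.to_datetime(['2019-01-01'] * 3),
    'transaction_hour': [11, 11, 12],
})
# Build the same timestamp key on the taxi side, then join on it.
taxi['date and time'] = taxi['transaction_date'] + pd.to_timedelta(taxi['transaction_hour'], unit='h')
taxi = taxi.merge(hourly, on='date and time', how='left')

print(taxi['temperature'].round(6).tolist())  # [11.0, 11.0, 12.0]
```

Both zones at hour 11 receive the same temperature, which is the point of interpolating before the merge.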


In [37]:
nyc_taxi_with_weather.isna().sum()

PULocationID               0
transaction_date           0
transaction_month          0
transaction_day            0
transaction_hour           0
trip_distance              0
total_amount               0
count_of_transactions      0
transaction_week_day       0
weekend                    0
is_holiday                 0
Borough                    0
temperature                0
humidity                   0
wind speed                 0
cloud cover                0
amount of precipitation    0
dtype: int64

In [None]:
# No more missing values

In [38]:
# Display the dataframe
nyc_taxi_with_weather.head()

|   | PULocationID | transaction_date | transaction_month | transaction_day | transaction_hour | trip_distance | total_amount | count_of_transactions | transaction_week_day | weekend | is_holiday | Borough | temperature | humidity | wind speed | cloud cover | amount of precipitation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 2019-01-01 | 1 | 1 | 0 | 3.02 | 14.3 | 2 | 1 | False | True | Queens | 8.3 | 97.0 | 0.0 | 1.0 | 29.0 |
| 1 | 100 | 2019-01-01 | 1 | 1 | 0 | 2.801852 | 18.13 | 54 | 1 | False | True | Manhattan | 8.3 | 97.0 | 0.0 | 1.0 | 29.0 |
| 2 | 106 | 2019-01-01 | 1 | 1 | 0 | 2.593333 | 15.373333 | 3 | 1 | False | True | Brooklyn | 8.3 | 97.0 | 0.0 | 1.0 | 29.0 |
| 3 | 107 | 2019-01-01 | 1 | 1 | 0 | 2.437458 | 14.897458 | 421 | 1 | False | True | Manhattan | 8.3 | 97.0 | 0.0 | 1.0 | 29.0 |
| 4 | 11 | 2019-01-01 | 1 | 1 | 0 | 1.795 | 9.3 | 2 | 1 | False | True | Brooklyn | 8.3 | 97.0 | 0.0 | 1.0 | 29.0 |


#### Save this DataFrame to a pickle for the next part of the project ('Machine Learning part 3 - Model Training')

In [39]:
nyc_taxi_with_weather.to_pickle("nyc_taxi_with_weather.pkl")