# Ride duration model implementation

This notebook implements a simple application that loads the `lin_reg.bin` model built in the first module of this course ([Introduction to MLOps](../../01_introduction/)) and uses it to make predictions on the `fhv_tripdata_2021-02.parquet` dataset.

In [125]:
# Built-in imports
import os
from datetime import datetime

# External imports
import pickle
import pandas as pd

# Own imports
from scripts import get_path_dir as gpd

# Define some important directories
MODEL_DIR = os.path.join(
    gpd.get_desired_folder_path("01_introduction"), 
    "models"
)
DATA_DIR = gpd.get_desired_folder_path("data")

In [115]:
with open(os.path.join(MODEL_DIR,'lin_reg.bin'), 'rb') as f_in:
    dv, lr = pickle.load(f_in)
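
The cell above unpacks two objects from a single pickle file, which suggests the vectorizer and the regressor were saved together as a tuple. A minimal sketch of that round trip follows, assuming `dv` is a scikit-learn `DictVectorizer` and `lr` a `LinearRegression`. The toy data and the names `toy_dv`, `toy_lr`, and `buffer` are made up for illustration and are not part of the course model.

```python
import io
import pickle

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression

# Toy training data, invented for this sketch only
train_dicts = [
    {"PUlocationID": "1.0", "DOlocationID": "2.0"},
    {"PUlocationID": "3.0", "DOlocationID": "2.0"},
]
y_train = [10.0, 20.0]

# Fit a vectorizer and a regressor on the toy data
toy_dv = DictVectorizer()
X_train = toy_dv.fit_transform(train_dicts)
toy_lr = LinearRegression().fit(X_train, y_train)

# Serialize both objects as one tuple (an in-memory buffer stands in for the file)
buffer = io.BytesIO()
pickle.dump((toy_dv, toy_lr), buffer)

# Load them back the same way the notebook does
buffer.seek(0)
loaded_dv, loaded_lr = pickle.load(buffer)
```

Keeping both objects in one file makes it harder to pair a model with a vectorizer it was not trained with.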

In [120]:
CATEGORICAL = ['PUlocationID', 'DOlocationID']

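def read_data(filename):
    """Load trip data, compute the ride duration in minutes and prepare the categorical columns."""
    df = pd.read_parquet(filename)

    # Ride duration in minutes
    df['duration'] = df["dropOff_datetime"] - df["pickup_datetime"]
    df['duration'] = df["duration"].dt.total_seconds() / 60

    # Keep rides between 1 and 60 minutes; copy so the assignment below
    # does not trigger a SettingWithCopyWarning
    df = df[(df["duration"] >= 1) & (df["duration"] <= 60)].copy()

    # Missing location IDs become -1; values are cast to strings so they are treated as categories
    df[CATEGORICAL] = df[CATEGORICAL].fillna(-1).astype('float').astype('str')

    return df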
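
To make the effect of `read_data` concrete, here is a small sketch that applies the same steps to a three-row frame instead of the parquet file. The rows are invented for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Three invented rides: 30 seconds, 15 minutes and 2 hours long
toy = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2021-02-01 00:00:00", "2021-02-01 00:00:00", "2021-02-01 00:00:00"]
    ),
    "dropOff_datetime": pd.to_datetime(
        ["2021-02-01 00:00:30", "2021-02-01 00:15:00", "2021-02-01 02:00:00"]
    ),
    "PUlocationID": [None, 10.0, 20.0],
    "DOlocationID": [5.0, None, 7.0],
})

# Duration in minutes
toy["duration"] = (toy["dropOff_datetime"] - toy["pickup_datetime"]).dt.total_seconds() / 60

# Only the 15-minute ride falls inside the 1 to 60 minute window
toy = toy[(toy["duration"] >= 1) & (toy["duration"] <= 60)].copy()

# Missing IDs become "-1.0", present IDs become strings such as "10.0"
cols = ["PUlocationID", "DOlocationID"]
toy[cols] = toy[cols].fillna(-1).astype("float").astype("str")
```

After these steps a single row remains, with `PUlocationID` equal to `"10.0"` and `DOlocationID` equal to `"-1.0"`.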

In [121]:
df = read_data(os.path.join(DATA_DIR, "fhv_tripdata_2021-02.parquet"))

In [122]:
dicts = df[CATEGORICAL].to_dict(orient='records')
X_val = dv.transform(dicts)
y_pred = lr.predict(X_val)
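
The sketch below illustrates what `dv.transform` does with the list of dictionaries, assuming `dv` is a scikit-learn `DictVectorizer`. String values are one-hot encoded, and a category that was not seen during fitting is silently ignored rather than raising an error. The vectorizer `toy_dv` and its data are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

# Fit on two invented records so the vectorizer learns four one-hot features
fit_dicts = [
    {"PUlocationID": "1.0", "DOlocationID": "2.0"},
    {"PUlocationID": "-1.0", "DOlocationID": "3.0"},
]
toy_dv = DictVectorizer()
toy_dv.fit(fit_dicts)

# A new record: the pickup ID is known, the drop-off ID "99.0" was never seen
new_dicts = [{"PUlocationID": "1.0", "DOlocationID": "99.0"}]
X_new = toy_dv.transform(new_dicts)

# Dense view of the sparse result: one row, one column per learned feature
row = X_new.toarray()[0]
known_column = toy_dv.feature_names_.index("PUlocationID=1.0")
```

Only the column for `PUlocationID=1.0` is set in `row`; the unseen drop-off location contributes nothing to the prediction.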

## Let's analyze the predictions
This section works through some of the tasks from this module's homework.

In [123]:
print(f"The mean of the predicted values is {y_pred.mean()}")

The mean of the predicted values is 16.191691681964873


In [137]:
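# Build the output frame with a ride identifier and the predicted duration
today = datetime.today()
df_resulted = pd.DataFrame()
df_resulted['ride_id'] = f'{today.year:04d}/{today.month:02d}_' + df.index.astype('str')
df_resulted["predictions"] = y_pred

# Save the predictions (df_resulted, not the input df); the timestamp in the
# file name avoids spaces and colons, which are invalid on some file systems
df_resulted.to_parquet(
    os.path.join(
        DATA_DIR,
        "predictions",
        f"{today:%Y-%m-%d_%H-%M-%S}-fhv_tripdata_duration_ride.parquet"
    ),
    engine='pyarrow',
    compression=None,
    index=False
)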
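
The cell above builds `ride_id` from the date on which the notebook runs. If the identifier should instead point back to the month of the source data, one possible variant is sketched below. The `year` and `month` values are an assumption read off the file name `fhv_tripdata_2021-02.parquet`, and the frame and prediction values are invented.

```python
import pandas as pd

# Assumed to match the source file fhv_tripdata_2021-02.parquet
year, month = 2021, 2

# Invented stand-in for the filtered trip data; the index gaps mimic removed rows
toy = pd.DataFrame({"duration": [12.0, 7.5]}, index=[3, 8])
toy_pred = [11.2, 8.9]  # invented predictions

# Same construction as in the notebook: "<year>/<month>_<original row index>"
ride_id = f"{year:04d}/{month:02d}_" + toy.index.astype("str")

result = pd.DataFrame({"ride_id": list(ride_id), "predictions": toy_pred})
```

With this variant the identifiers stay stable across runs, since they depend only on the input file and the original row index.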

In [140]:
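# Note: nbconvert appends the .py extension itself, so passing a name that
# already ends in .py produces the doubled extension shown in the log below
!jupyter nbconvert --to script ride-duration-prediction.ipynb --output ../scripts/ride-duration.py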

[NbConvertApp] Converting notebook ride-duration-prediction.ipynb to script
[NbConvertApp] Writing 2219 bytes to ../scripts/ride-duration.py.py
