# Build a bento model for flight price prediction

The best model obtained in the earlier notebook, a RandomForestRegressor, will be built into a bento model. That is the purpose of this notebook.

## Load libraries

The following libraries are needed to create the bento model:

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer

from sklearn.ensemble import RandomForestRegressor

import bentoml

## Load the data

First, let's load the data into a pandas DataFrame.

In [2]:
data = 'https://raw.githubusercontent.com/FranciscoOrtizTena/ML_Zoomcamp/main/12_week_capstone_project/flight_price_prediction.csv'
df = pd.read_csv(data).set_index('ID')

## Data preparation

Let's prepare the data as we did in the earlier notebook: drop the `flight` column and lowercase the categorical columns.

In [3]:
df = df.drop(['flight'], axis=1)
categorical_columns = ['airline', 'source_city', 'departure_time',
                       'stops', 'arrival_time', 'destination_city', 'class']
for i in categorical_columns:
    df[i] = df[i].str.lower()

Now let's split the data into the train and test sets

In [4]:
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=7)
df_full_train = df_full_train.reset_index(drop=True)

y_full_train = df_full_train.price.values

del df_full_train['price']

Next, one-hot encode the categorical features using DictVectorizer.

In [5]:
dicts_full_train = df_full_train.to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X_full_train = dv.fit_transform(dicts_full_train)
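
To make the encoding concrete, here is a minimal sketch on two made-up records (they are not rows from the dataset, and the `toy_` names are only for illustration). It shows that `DictVectorizer` creates one binary column per category value for string features, passes numeric features through unchanged, and ignores category values it did not see during fitting.

```python
from sklearn.feature_extraction import DictVectorizer

# Two illustrative records; the values are made up for this sketch.
toy_records = [
    {"airline": "vistara", "duration": 12.42},
    {"airline": "indigo", "duration": 2.5},
]

toy_dv = DictVectorizer(sparse=False)
toy_X = toy_dv.fit_transform(toy_records)

# String features become one column per value; numeric ones stay as they are.
print(toy_dv.feature_names_)
print(toy_X)

# A category that was not present during fit is silently dropped:
# every airline column is 0 for this record.
unseen = toy_dv.transform({"airline": "spicejet", "duration": 1.0})
print(unseen)
```

The last call is worth keeping in mind at prediction time: a misspelled or differently cased category does not raise an error, it just produces a row with no airline set.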

## Random Forest Regressor

Now let's train the Random Forest Regressor

In [6]:
model = RandomForestRegressor(n_estimators=75,
                              max_depth=25,
                              min_samples_leaf=5,
                              random_state=7)
model.fit(X_full_train, y_full_train)

RandomForestRegressor(max_depth=25, min_samples_leaf=5, n_estimators=75,
                      random_state=7)

## BentoML

Finally, let's save the model to the local BentoML model store, keeping the fitted DictVectorizer alongside it as a custom object.

In [7]:
bentoml.sklearn.save_model(
    'flight_price_prediction',
    model,
    custom_objects={
        'dictVectorizer': dv
    }
)

Model(tag="flight_price_prediction:wqqm6cd7xcjtzzc6", path="C:\Users\10714681\bentoml\models\flight_price_prediction\wqqm6cd7xcjtzzc6\")
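
For reference, a saved model can be read back later, for example from a service file. The sketch below assumes the BentoML 1.x scikit-learn API (`bentoml.sklearn.get`, the `custom_objects` attribute, and `bentoml.sklearn.load_model`); the function name `load_flight_price_model` is hypothetical and not part of this project.

```python
def load_flight_price_model(tag="flight_price_prediction:latest"):
    """Return (model, dict_vectorizer) for a model saved as in the cell above."""
    # Imported here so that BentoML is only required when the function is called.
    import bentoml

    # Look up the stored model entry; ":latest" resolves to the newest version.
    bento_model = bentoml.sklearn.get(tag)
    # Objects passed as custom_objects at save time come back as a dictionary.
    dict_vectorizer = bento_model.custom_objects["dictVectorizer"]
    # Rebuild the scikit-learn estimator itself.
    sk_model = bentoml.sklearn.load_model(bento_model)
    return sk_model, dict_vectorizer
```

Saving the vectorizer together with the model means both are versioned under the same tag, so a prediction never mixes a model with an encoder fitted on different data.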

## Test data

Let's obtain the feature values of the first row of the test data (everything except the last column, `price`).

In [11]:
np.array(df_test.iloc[0])[:-1]

array(['vistara', 'delhi', 'early_morning', 'two_or_more', 'evening',
       'chennai', 'economy', 12.42, 16], dtype=object)

Now the name of each feature

In [13]:
df_test.columns[:-1]

Index(['airline', 'source_city', 'departure_time', 'stops', 'arrival_time',
       'destination_city', 'class', 'duration', 'days_left'],
      dtype='object')

Let's create a dictionary of them

In [14]:
dict(zip(df_test.columns[:-1], np.array(df_test.iloc[0])[:-1]))

{'airline': 'vistara',
 'source_city': 'delhi',
 'departure_time': 'early_morning',
 'stops': 'two_or_more',
 'arrival_time': 'evening',
 'destination_city': 'chennai',
 'class': 'economy',
 'duration': 12.42,
 'days_left': 16}
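
The same dictionary can also be built by dropping the target column by name, which does not rely on `price` being the last column. A minimal sketch on a one-row stand-in for `df_test` (the `toy_test` frame and its reduced set of columns are assumptions made for illustration):

```python
import pandas as pd

# A one-row stand-in for df_test, using a subset of the columns.
toy_test = pd.DataFrame(
    {"airline": ["vistara"], "duration": [12.42], "days_left": [16], "price": [14293]},
    index=pd.Index([37442], name="ID"),
)

# Drop the target by label, then convert the remaining row to a dictionary.
features = toy_test.iloc[0].drop("price").to_dict()
print(features)
```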

Now store these values in a dictionary named `price`, which will be the input for the prediction

In [15]:
price = {"airline": "vistara",
         "source_city": "delhi",
         "departure_time": "early_morning",
         "stops": "two_or_more",
         "arrival_time": "evening",
         "destination_city": "chennai",
         "class": "economy",
         "duration": 12.42,
         "days_left": 16}

Finally let's predict the price

In [19]:
model.predict(dv.transform(price))

array([10443.22634515])
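
Because the categorical columns were lowercased before training, an input such as `'Vistara'` would not match any column the DictVectorizer learned. A small helper can apply the same normalization before prediction. This is a sketch: the name `normalize_flight` is hypothetical, and it assumes every string field in the input is one of the categorical columns.

```python
def normalize_flight(flight):
    """Lowercase every string value, leaving numeric values untouched."""
    return {
        key: value.lower() if isinstance(value, str) else value
        for key, value in flight.items()
    }


# Mixed-case input, as it might arrive from a user or another system.
raw = {"airline": "Vistara", "source_city": "Delhi", "duration": 12.42, "days_left": 16}
clean = normalize_flight(raw)
print(clean)
```

With the objects from this notebook, the call would then read `model.predict(dv.transform(normalize_flight(price)))`.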

And the real price

In [21]:
df_test.iloc[[0]]

| ID | airline | source_city | departure_time | stops | arrival_time | destination_city | class | duration | days_left | price |
|---|---|---|---|---|---|---|---|---|---|---|
| 37442 | vistara | delhi | early_morning | two_or_more | evening | chennai | economy | 12.42 | 16 | 14293 |
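
To put the two numbers side by side, the gap between the prediction and the real price for this single row can be computed directly. The values below are copied from the outputs above; one row says little about overall model quality.

```python
predicted = 10443.22634515  # model output shown above
actual = 14293              # price of the same test row

absolute_error = abs(actual - predicted)
relative_error = absolute_error / actual

print(f"absolute error: {absolute_error:.2f}")
print(f"relative error: {relative_error:.1%}")
```

For this row the model underestimates the price by about 3850, roughly 27% of the real value.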
