***

# Taxi Trip Fare Prediction - Model 2

***

The goal of this example is to build on the Model 1 example and generate a better ML model. We will:
- enhance the training dataset using contextual features
- train an ML model based on historical taxi trip fare data and contextual features
- serve the ML model to predict the trip fare for new trips

### Prepare your data

In the Model 1 example we used the trip table from an S3 bucket as the training dataset. In this example we will use three more tables, hour_of_day_context, holiday_weekend_context and geo_area_context, to enrich the data so that the model can take context into account and provide better predictions.

Let us look at the first few lines of the CSV files that we will use to build Model 2.

##### hour_of_day_context.csv
    hour_of_day,hourly_segment
    0,early morning
    1,early morning
    2,early morning
    3,early morning

##### holiday_weekend_context.csv
    calendar_day,is_holiday_or_weekend
    2009-01-01,1
    2009-01-02,0
    2009-01-03,1
    2009-01-04,1
    
##### geo_area_context.csv
    zipcode,geo_area
    10023,Commercial
    10021,Residential
    10002,Suburbs
    11201,Commercial

### Static contextual feature data

We will enhance the data by adding three contextual feature tables. 

- an hourly segment table that maps an hour to an hourly-segment. 
- a holiday weekend table that maps a date to a flag indicating whether that date was a holiday-or-weekend or neither.
- a geo area table that maps a zipcode to a type of geo area.

The idea is that the hourly-segment, the holiday-or-weekend flag and the type of pickup and dropoff geo areas have an influence on the trip fare amount. We can create a more accurate ML model with these additional features.

Each contextual feature table is a CSV file in an S3 bucket containing the respective mapping.
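To make the enrichment idea concrete, here is a minimal Python sketch of the three lookups performed outside Foresight. It only uses the sample rows shown in the CSV excerpts above. The trip record and the `enrich_trip` helper are hypothetical, introduced for illustration, and this is not how Foresight implements the lookup internally.

```python
from datetime import datetime

# Sample rows copied from the CSV excerpts above (illustrative subset only).
HOURLY_SEGMENT = {0: "early morning", 1: "early morning", 2: "early morning", 3: "early morning"}
HOLIDAY_OR_WEEKEND = {"2009-01-01": 1, "2009-01-02": 0, "2009-01-03": 1, "2009-01-04": 1}
GEO_AREA = {"10023": "Commercial", "10021": "Residential", "10002": "Suburbs", "11201": "Commercial"}


def enrich_trip(trip):
    """Return a copy of the trip with the four contextual features added.

    Hypothetical helper: keys missing from a lookup table yield None.
    """
    pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    enriched = dict(trip)
    enriched["hourly_segment"] = HOURLY_SEGMENT.get(pickup.hour)
    enriched["is_holiday_or_weekend"] = HOLIDAY_OR_WEEKEND.get(pickup.strftime("%Y-%m-%d"))
    enriched["pickup_geo_area"] = GEO_AREA.get(trip["pickup_zipcode"])
    enriched["dropoff_geo_area"] = GEO_AREA.get(trip["dropoff_zipcode"])
    return enriched


# Hypothetical trip record, made up for this illustration.
trip = {
    "pickup_datetime": "2009-01-03 02:15:00",
    "pickup_zipcode": "10023",
    "dropoff_zipcode": "10002",
    "passenger_count": 2,
}
enriched = enrich_trip(trip)
print(enriched)
```

Each lookup is keyed on a single column (hour, date, or zipcode), which is why the geo area table can be used twice: once for the pickup zipcode and once for the dropoff zipcode.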

***

**We will reuse the `trip_fare` project from Model 1 for this example.**

In [None]:
set project trip_fare

***

# Connect your Data Sources

<html><img src="../../images/trip_fare_images/2_1.png"/></html>

In the Model 1 example we connected the S3 bucket as a data source to Foresight for the trip table. In this step we will similarly connect the S3 bucket as a data source for the three contextual feature tables.

### Create a Foresight ML sources file

Data sources are connected to Foresight via a Foresight ML sources file. In the Model 1 example we created a Foresight ML sources file to connect the S3 bucket to the Foresight platform for the trip table. Create another Foresight ML sources file to add the three new contextual feature sources. Use the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.
Alternatively, you may use the Foresight ML sources file from this tutorial.

<br> The relevant sections in the `trip_fare_2_data_sources.yml` file look like this:
    
            meta:
              source_type: aws
              source_format: csv
              path: s3a://foresight-tutorial/trip_fare/<table_name>.csv         <<<<                
              anon: true                                                              
              delimiter: ','                                                          
              s3_endpoint_url: https://foresight-tutorial.s3.us-west-2.amazonaws.com  


In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

#### Add column schema to your data sources file

Foresight can automatically infer column schema from your data sources and update the ML sources file. Use the `add columns` command to automatically infer and update the ML sources file with the data source column schema. After this command completes, you must review the column schema for correctness and if necessary edit the ML sources file to fix column names or data types. Alternatively you may manually edit the ML sources file and add all the column names and data types to match your data source schema.
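The review step matters because type inference from sample values is a heuristic. The following Python sketch shows one generic way such inference can work. It is not Foresight's algorithm, and the type names (`integer`, `double`, `date`, `string`) and function names are assumptions chosen for illustration.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess the narrowest type that fits every sampled value."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_schema(csv_text):
    """Map each column of a CSV sample (with header row) to a guessed type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


# Samples taken from the CSV excerpts shown earlier in this example.
holiday_sample = "calendar_day,is_holiday_or_weekend\n2009-01-01,1\n2009-01-02,0\n"
geo_sample = "zipcode,geo_area\n10023,Commercial\n10021,Residential\n"

holiday_schema = infer_schema(holiday_sample)
geo_schema = infer_schema(geo_sample)
print(holiday_schema)
print(geo_schema)
```

In this sketch the `zipcode` column is guessed as an integer because every sampled value parses as one, even though a zipcode is an identifier and is often better stored as a string. This is the kind of result worth checking by hand after automatic inference.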

In [None]:
add columns trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

***

# Create Feature Sets for contextual features

<html><img src="../../images/trip_fare_images/2_2.png"/></html>

In this step we will create three feature sets to generate and store the contextual feature tables in Foresight storage based on the three csv data sources. 

### Create Foresight ML job files to generate feature sets

The feature sets will be created using Foresight ML job files. The `using_foresight_options` section of the Foresight ML job file is where you specify the key entities for each feature set. Key entities indicate row uniqueness within a table, and they are used to look up the contextual feature. Create Foresight ML job files using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.
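Because key entities must identify rows uniquely, it can help to check a source table for duplicate keys before creating a feature set. The sketch below is a plain Python check, independent of Foresight. Treating `hour_of_day` as the key column is an assumption based on the table description, and `duplicate_keys` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_columns):
    """Return the key tuples that appear more than once in the table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return [key for key, count in counts.items() if count > 1]


# Excerpt of hour_of_day_context.csv as shown earlier.
clean_table = "hour_of_day,hourly_segment\n0,early morning\n1,early morning\n2,early morning\n"
# Made-up table with a repeated hour, to show what a violation looks like.
broken_table = "hour_of_day,hourly_segment\n0,early morning\n0,late night\n"

clean_duplicates = duplicate_keys(clean_table, ["hour_of_day"])
broken_duplicates = duplicate_keys(broken_table, ["hour_of_day"])
print(clean_duplicates)
print(broken_duplicates)
```

A non-empty result means the chosen columns do not uniquely identify a row, so a lookup on that key would be ambiguous.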

In [None]:
!cat trip_fare_prediction_model_2/hour_of_day_context.ml

In [None]:
!cat trip_fare_prediction_model_2/holiday_weekend_context.ml

In [None]:
!cat trip_fare_prediction_model_2/geo_area_context.ml

### Create feature sets

Use the `start featureset` command to execute the Foresight ML job file to create the feature set. This command will start a job that creates the feature set tables within Foresight, and inserts data into the Foresight tables from the data source. The job continues to run until all the data has been fetched. The `status featureset` command will show the status of the feature set.


In [None]:
start featureset trip_fare_prediction_model_2/hour_of_day_context

In [None]:
start featureset trip_fare_prediction_model_2/holiday_weekend_context

In [None]:
start featureset trip_fare_prediction_model_2/geo_area_context

In [None]:
status featureset hour_of_day_context

In [None]:
status featureset holiday_weekend_context

In [None]:
status featureset geo_area_context

In [None]:
display featureset hour_of_day_context

In [None]:
display featureset holiday_weekend_context

In [None]:
display featureset geo_area_context

***

# Create a Feature View to serve contextual features

<html><img src="../../images/trip_fare_images/2_3.png"/></html>

In this step we will create a feature view to serve four contextual features from the three feature sets that we created in internal Foresight storage. The feature view will output the following contextual features:
- the hourly_segment for a given hour of day
- the is_holiday_or_weekend flag for a given date
- the pickup_geo_area for a given pickup zipcode
- the dropoff_geo_area for a given dropoff zipcode

### Create a Foresight ML job file to generate a feature view

The feature view will be created using a Foresight ML job file. The `using_foresight_options` section of the Foresight ML job file is where you specify the feature name and source for the feature. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help. Make sure to update the models section of your Foresight ML job sources file as well.

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_feature_view.ml

### Start serving contextual features

Use the `start featureview` command to execute the Foresight ML job file to start serving contextual features for the feature view. This command starts a job to serve the feature view. Use the `offline` option to serve features for training dataset creation and the `online` option to serve features for prediction. 

The `status featureview` command will show the status of the feature view. The *`feature_status`* element indicates the availability of feature data. A feature status of "OK" indicates that feature data is available.

In [None]:
start featureview trip_fare_prediction_model_2/trip_fare_2_feature_view,offline

In [None]:
start featureview trip_fare_prediction_model_2/trip_fare_2_feature_view,online

In [None]:
status featureview trip_fare_2_feature_view,offline

In [None]:
status featureview trip_fare_2_feature_view,online

### Explore feature sets and feature views

Explore the feature sets and feature views that you created using `Foresight Explorer`. The `Foresight Explorer` tool can be opened by clicking on the following icon in the Launcher page. 

<html><img src="../../images/trip_fare_images/2_7.png"/></html>

Navigate to the `Foresight Explorer` web page and open the `trip_fare` project. Explore the feature sets and feature views within that project.

***

# Create a Training Dataset

<html><img src="../../images/trip_fare_images/2_4.png"/></html>

In this step we will create a training dataset using the trip table data source and the contextual features. We will use the pickup_zipcode, dropoff_zipcode and passenger_count as input features to the ML model. We will use the ***contextual_feature_fetch*** UDF to fetch the hourly_segment, the is_holiday_or_weekend flag, the pickup_geo_area and the dropoff_geo_area from the feature view and use those as additional inputs to the ML model. The fare_amount will be the target or label for the ML model to train.
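As a rough illustration of the dataset layout described above, the following Python sketch separates the seven input features from the label. The column names come from this example. The two rows, including their fare amounts, are made up for illustration, and `split_features_and_label` is a hypothetical helper that is not part of Foresight.

```python
# Input features: three from the trip table, four fetched from the feature view.
FEATURE_COLUMNS = [
    "pickup_zipcode",
    "dropoff_zipcode",
    "passenger_count",
    "hourly_segment",
    "is_holiday_or_weekend",
    "pickup_geo_area",
    "dropoff_geo_area",
]
TARGET_COLUMN = "fare_amount"


def split_features_and_label(rows):
    """Split dataset rows into a feature matrix and a label vector."""
    features = [[row[col] for col in FEATURE_COLUMNS] for row in rows]
    labels = [row[TARGET_COLUMN] for row in rows]
    return features, labels


# Made-up rows; real rows come from the generated training dataset.
rows = [
    {"pickup_zipcode": "10023", "dropoff_zipcode": "10002", "passenger_count": 2,
     "hourly_segment": "early morning", "is_holiday_or_weekend": 1,
     "pickup_geo_area": "Commercial", "dropoff_geo_area": "Suburbs", "fare_amount": 12.5},
    {"pickup_zipcode": "10021", "dropoff_zipcode": "11201", "passenger_count": 1,
     "hourly_segment": "early morning", "is_holiday_or_weekend": 0,
     "pickup_geo_area": "Residential", "dropoff_geo_area": "Commercial", "fare_amount": 8.0},
]
features, labels = split_features_and_label(rows)
print(features[0])
print(labels)
```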

### Create a Foresight ML job file to generate a training dataset

The training dataset will be created using a SQL command. SQL commands can be executed via Foresight ML job files. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_train_dataset.ml

### Create the dataset

Use the `start dataset` command to execute the Foresight ML job file to create the training dataset. The `status dataset` command will show the current status of dataset generation: "RUNNING", "COMPLETED" or "ERROR". The `list datasets` command will list the created datasets within a project. The `display dataset` command will display the first few rows of the training dataset.

**This command may take up to 10 minutes due to the size of the dataset.**

In [None]:
start dataset trip_fare_prediction_model_2/trip_fare_2_train_dataset

In [None]:
status dataset trip_fare_2_train_dataset

In [None]:
list datasets

In [None]:
display dataset trip_fare_2_train_dataset

### Explore data quality of the dataset

Use the `explore data-quality` command to visually explore the data quality of the dataset and also get summary statistics. The `target_column` is an optional field in the job file and represents the target or label for ML training. Click on the output url to visualize the data-quality report.

The data-quality report is generated asynchronously. You can run the `status data-quality` command to check the status of the report.

Click on the output url to visualize the final report.

**The final report may take a few minutes due to the size of the dataset.**
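For intuition about the kind of summary statistics such a report typically covers, here is a small Python sketch that summarizes one numeric column. It is a generic illustration, not the Foresight report. The `summarize` helper and the sample passenger counts are made up.

```python
import statistics


def summarize(values):
    """Compute basic quality statistics for one numeric column.

    None marks a missing value.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Made-up passenger_count values with one missing entry.
passenger_counts = [1, 2, None, 3, 2]
summary = summarize(passenger_counts)
print(summary)
```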

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_train_dataset_data_quality.ml

In [None]:
explore data-quality trip_fare_prediction_model_2/trip_fare_2_train_dataset_data_quality

In [None]:
status data-quality trip_fare_2_train_dataset

***

# Train an ML Model

<html><img src="../../images/trip_fare_images/2_5.png"/></html>

In this step we will train an ML model using the training dataset that was created. We will use the pickup_zipcode, dropoff_zipcode, passenger_count, hourly_segment, is_holiday_or_weekend, pickup_geo_area and dropoff_geo_area as input features to the ML model. The fare_amount will be the target or label for the ML model to train.

### Create a Foresight ML job file for model training

ML model training is initiated via a Foresight ML job file which specifies the ML training parameters. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.

### Start ML model training

Use the `start training` command to execute the Foresight ML job file to start the model training. The `status training` command will show the status of the model training.

### Machine Learning

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_ml_model_train.ml

In [None]:
start training trip_fare_prediction_model_2/trip_fare_2_ml_model_train,limit=2000

**Click the url shown in the output of the `status training` command to open an *ML-Flow* session that displays the training metrics.**

#### Wait for ML model training to complete

Use the `status training` command to check the status of the model training. Wait until the status shows COMPLETED.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_fare_2_ml_model_train

### Deep Learning

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_dl_model_train.ml

In [None]:
start training trip_fare_prediction_model_2/trip_fare_2_dl_model_train

**Click the url shown in the output to open a *TensorBoard* session that displays the training progress and metrics.** After opening the *TensorBoard* url, click on the reload button at the top right of the *TensorBoard* page.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_fare_2_dl_model_train

#### Note:
TensorBoard is only available for Deep Learning models.

In [None]:
list tensorboard "<model name>,<run_id>"

## Register a trained ML model

After the training is complete, the `status training` command will show COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command will list all the trained models within a project. The `register model` command will register a trained model. The `list registered-models` command will list all registered models within a project.

##### To list all the ML models that have been trained

In [None]:
list trained-models trip_fare_2_ml_model

##### To list all the DL models that have been trained

In [None]:
list trained-models trip_fare_2_dl_model

##### Run this cell to register the machine learning model

In [None]:
register model trip_fare_2_ml_model,1,PRODUCTION

##### Run this cell to register the deep learning model

In [None]:
register model trip_fare_2_dl_model,1,PRODUCTION

#### To list all registered models

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/2_6.png"/></html>

In this step we will deploy the trained ML model to serve prediction requests. 

### Create a Foresight ML job file for model serving

ML models are deployed via a Foresight ML job file which specifies the ML serving options. 

Create a Foresight ML job file using the registered-model version that you want to serve. 

The `create prediction` command takes two required parameters: the registered-model name and the model version. The `dir` parameter specifies the location where the generated files will be saved. The command will generate three files: a Foresight ML job file, a sources yaml and a sample curl command requests file. Refer to the Foresight User Manual for help.

The sources yaml will contain definitions for two REST sources, one for the prediction REST request and one for the prediction REST response and a definition for the prediction log table.

**In the command below, replace the version '1' with the version of the registered model you are using.**

The following cells use the machine learning model to serve predictions. If you choose to use the deep learning model instead, replace trip_fare_2_ml_model with trip_fare_2_dl_model.

In [None]:
create prediction trip_fare_2_ml_model,1,dir=trip_fare_prediction_model_2/

***

### Inspect the model serving files

Inspect the model serving ML job file and the definitions for the prediction REST request, prediction REST response and the prediction log table.

**Note: The generated file names include the model version number as shown below. In the commands below, replace the version '1' with the version of the registered model you are using.**

    Example : <model name>_<version>_serve.ml , <model name>_<version>_sources.yml
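If you script around these files, the naming pattern above can be captured in a small helper. This is a convenience sketch, not a Foresight API, and `serving_file_names` is a hypothetical name. It covers only the two file names whose pattern is stated above.

```python
def serving_file_names(model_name, version):
    """Build the serving job file and sources file names for a model version."""
    prefix = f"{model_name}_{version}"
    return {
        "job_file": f"{prefix}_serve.ml",
        "sources_file": f"{prefix}_sources.yml",
    }


names = serving_file_names("trip_fare_2_ml_model", 1)
print(names["job_file"])
print(names["sources_file"])
```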

    

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_ml_model_1_serve.ml

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_ml_model_1_sources.yml

### Deploy the model

Use the `start prediction` command to execute the Foresight ML job file to deploy a model. The `status prediction` command will show the status of the model serving. The url shown in the output is the endpoint to which REST prediction requests may be sent via `curl` or some other means.

In [None]:
start prediction trip_fare_prediction_model_2/trip_fare_2_ml_model_1_serve

In [None]:
status prediction trip_fare_2_ml_model_1_serve

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command uses the last 10 rows from the training dataset as prediction request data and sends curl requests to the deployed model. The prediction responses are collected and displayed.

Refer to the Foresight User Manual for help.

Note: Once you run the `start prediction` command, a prediction service starts running and is ready to serve requests. You can send curl requests to the URL that the prediction service gives you. Running `test prediction` also outputs an "Example Curl Request". Use this example to send data to the prediction service, or integrate it into the applications where the predictions will be served.

In [None]:
test prediction trip_fare_2_ml_model_1_serve

Below is a markdown cell that shows how to run the curl request to fetch predictions. Convert the cell to a Code cell, enter the prediction URL in the marked space, and execute the cell to get a response.

!curl -X GET ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 11:29:05", "pickup_zipcode": "10069", "dropoff_zipcode": "10107", "passenger_count": 3}]'
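For applications that prefer Python over curl, the same request can be sketched with the standard library. The payload and the GET method mirror the curl example above. The URL below is a placeholder and must be replaced with the URL reported by the prediction service. That the response body is JSON is an assumption, and `build_prediction_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_prediction_request(url, rows):
    """Build (but do not send) a request equivalent to the curl example."""
    payload = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors the -X GET flag in the curl example
    )


# Same request body as in the curl example.
rows = [{
    "rest_request_id": "prediction_test-1",
    "pickup_datetime": "2022-11-12 11:29:05",
    "pickup_zipcode": "10069",
    "dropoff_zipcode": "10107",
    "passenger_count": 3,
}]

# Placeholder only: substitute the URL reported by the prediction service.
request = build_prediction_request("http://example.invalid/predict", rows)

# Uncomment to send the request once a real prediction URL is in place.
# Parsing the response as JSON is an assumption about the service.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.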

### Stop the deployed model

Use the `stop prediction` command to stop ML model serving when you have completed the prediction requests. This step is optional; you may choose to leave the model deployed.

In [None]:
stop prediction trip_fare_2_ml_model_1_serve