***

# Taxi Trip Fare Prediction - Model 1

***

The goal of this example is to train and serve a taxi trip fare prediction model. We will:
- train an ML model on historical taxi trip fare data
- serve the ML model to predict the fare for new trips

***

### Create a Foresight project

We will create a project called `trip_fare` for this tutorial.

In [None]:
create project trip_fare

***

# Connect your Data Sources

<html><img src="../../images/trip_fare_images/1_1.png"/></html>

We will use `trip_table.csv`, available in an S3 bucket, as the data source. The data includes the pickup datetime, pickup and dropoff latitude and longitude, pickup and dropoff zip codes, passenger count and fare amount.

The first few lines of the CSV file are shown below:

    pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,pickup_zipcode,dropoff_zipcode,passenger_count,fare_amount
    2022-10-18 03:16:57,-73.9617,40.7628,-73.9748,40.7528,10065,10017,1,25.45
    2022-10-18 03:17:00,-73.9957,40.7594,-73.9758,40.7553,10199,10111,1,16.48
    2022-10-18 03:17:00,-73.9946,40.7259,-73.9915,40.7325,10003,10003,1,13.1
    2022-10-18 03:17:46,-73.9893,40.7419,-74.0018,40.7263,10010,10012,1,23.31
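
To make the column layout concrete, here is a minimal sketch that parses the sample rows above with Python's standard `csv` module. It is independent of Foresight; the helper name `parse_trip` and the choice of which columns to convert are illustrative assumptions.

```python
import csv
import io
from datetime import datetime

# First rows of trip_table.csv, copied from the sample above.
SAMPLE = """\
pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,pickup_zipcode,dropoff_zipcode,passenger_count,fare_amount
2022-10-18 03:16:57,-73.9617,40.7628,-73.9748,40.7528,10065,10017,1,25.45
2022-10-18 03:17:00,-73.9957,40.7594,-73.9758,40.7553,10199,10111,1,16.48
2022-10-18 03:17:00,-73.9946,40.7259,-73.9915,40.7325,10003,10003,1,13.1
2022-10-18 03:17:46,-73.9893,40.7419,-74.0018,40.7263,10010,10012,1,23.31
"""


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "pickup_datetime": datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S"),
        "pickup_zipcode": row["pickup_zipcode"],    # kept as text: zip codes are labels
        "dropoff_zipcode": row["dropoff_zipcode"],
        "passenger_count": int(row["passenger_count"]),
        "fare_amount": float(row["fare_amount"]),
    }


trips = [parse_trip(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for trip in trips:
    print(trip["pickup_zipcode"], "->", trip["dropoff_zipcode"], trip["fare_amount"])
```

Zip codes are kept as strings here because they identify areas rather than measure quantities.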

### Create a Foresight ML sources file

In this step we will connect AWS S3 as a data source to Foresight. This allows Foresight to read the trip table from the S3 bucket.

Data sources are connected to Foresight via a Foresight ML sources file. Create a Foresight ML sources file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.

Alternatively, you may use the Foresight ML sources file from this tutorial.

The relevant section in the `trip_fare_data_sources.yml` file looks like this:
    
            meta:
              source_type: aws
              source_format: csv
              path: s3a://foresight-tutorial/trip_fare/trip_table.csv                 
              anon: true                                                              
              infer_schema: true                                                      
              header: true                                                            
              delimiter: ','                                                          
              s3_endpoint_url: https://s3.us-west-2.amazonaws.com  
              batch_schedule: -1d
              
For the S3 bucket, or any other source you wish to use (MySQL, Kafka, etc.), the credentials must be specified in the sources file.
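
As a small illustration of what the `path` and `s3_endpoint_url` values encode, the sketch below splits them with Python's standard `urllib.parse`. It only takes the strings apart and does not contact S3; how Foresight itself interprets these fields is not shown in this tutorial, so treat the variable names as illustrative.

```python
from urllib.parse import urlparse

# Values copied from the `meta` section above.
path = "s3a://foresight-tutorial/trip_fare/trip_table.csv"
endpoint = "https://s3.us-west-2.amazonaws.com"

parsed = urlparse(path)
scheme = parsed.scheme                  # URI scheme of the path
bucket = parsed.netloc                  # bucket name
key = parsed.path.lstrip("/")           # object key inside the bucket
region = urlparse(endpoint).netloc.split(".")[1]   # region segment of the endpoint host

print(scheme, bucket, key, region)
```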

In [None]:
!cat trip_fare_data_sources.yml

#### Add column schema to your data sources file

Foresight can automatically infer the column schema from your data sources. Use the `add columns` command to infer the schema and update the ML sources file with it. After the command completes, review the column schema for correctness and, if necessary, edit the ML sources file to fix column names or data types. Alternatively, you may edit the ML sources file manually and add all the column names and data types to match your data source schema.

In [None]:
add columns trip_fare_data_sources.yml

In [None]:
!cat trip_fare_data_sources.yml
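
The idea behind schema inference, and the reason a manual review is still needed, can be sketched in a few lines of Python. This is not Foresight's implementation: the function `infer_type` and the type names (`integer`, `double`, `timestamp`, `string`) are assumptions made for illustration.

```python
from datetime import datetime


def infer_type(values):
    """Return the narrowest type name that fits every value in a column."""
    def all_parse(parser):
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S")):
        return "timestamp"
    return "string"


header = ["pickup_datetime", "pickup_longitude", "pickup_zipcode", "passenger_count", "fare_amount"]
rows = [
    ["2022-10-18 03:16:57", "-73.9617", "10065", "1", "25.45"],
    ["2022-10-18 03:17:00", "-73.9957", "10199", "1", "16.48"],
    ["2022-10-18 03:17:00", "-73.9946", "10003", "1", "13.1"],
]

# Infer one type per column by looking at every value in that column.
schema = {name: infer_type([row[i] for row in rows]) for i, name in enumerate(header)}
for name, type_name in schema.items():
    print(f"{name}: {type_name}")
```

In this sketch the zip code column is inferred as an integer because every value happens to look like a number, even though zip codes are categorical. This is the kind of result the review step is meant to catch.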

***

# Create a Training Dataset

<html><img src="../../images/trip_fare_images/1_2.png"/></html>

In this step we will create a training dataset from the trip table data source. We will use pickup_zipcode, dropoff_zipcode and passenger_count as input features to the ML model, so we import these columns from the trip table in the S3 bucket into the training dataset, along with fare_amount, which is the target (label) that the ML model learns to predict.

### Create a Foresight ML job file to generate a training dataset

The training dataset will be created using a SQL command. SQL commands can be executed via Foresight ML job files. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.

In [None]:
!cat trip_fare_prediction_model_1/trip_fare_1_train_dataset.ml
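
The contents of the job file are printed by the cell above. As a rough, Foresight-independent illustration of the kind of SQL selection involved, the sketch below runs a `SELECT` over an in-memory SQLite table using Python's standard `sqlite3` module. The table name `trip_table` and the exact query are assumptions; the SQL dialect and syntax of the real job file may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE trip_table (
           pickup_datetime TEXT, pickup_zipcode TEXT, dropoff_zipcode TEXT,
           passenger_count INTEGER, fare_amount REAL)"""
)
conn.executemany(
    "INSERT INTO trip_table VALUES (?, ?, ?, ?, ?)",
    [
        ("2022-10-18 03:16:57", "10065", "10017", 1, 25.45),
        ("2022-10-18 03:17:00", "10199", "10111", 1, 16.48),
        ("2022-10-18 03:17:00", "10003", "10003", 1, 13.1),
    ],
)

# Keep only the three input features and the label.
TRAIN_SQL = """
    SELECT pickup_zipcode, dropoff_zipcode, passenger_count, fare_amount
    FROM trip_table
"""
train_rows = conn.execute(TRAIN_SQL).fetchall()
for row in train_rows:
    print(row)
conn.close()
```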

### Create the dataset

Use the `start dataset` command to execute the Foresight ML job file and create the training dataset. The `status dataset` command shows the current status of dataset generation: "RUNNING", "COMPLETED" or "ERROR". The `list datasets` command lists the datasets created within a project. The `display dataset` command displays the first few rows of the training dataset.

**This command may take up to 10 minutes due to the size of the dataset.**

In [None]:
start dataset trip_fare_prediction_model_1/trip_fare_1_train_dataset

In [None]:
status dataset trip_fare_1_train_dataset

In [None]:
list datasets

In [None]:
display dataset trip_fare_1_train_dataset
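
Because dataset generation can take several minutes, checking the status is naturally a polling loop over the three states "RUNNING", "COMPLETED" and "ERROR". The sketch below shows that pattern in plain Python. The function `wait_for_dataset` and its `get_status` callback are hypothetical; Foresight's notebook commands are not called from this code, and a fake status source stands in for them.

```python
import time


def wait_for_dataset(get_status, poll_seconds=30.0, timeout_seconds=1200.0):
    """Poll get_status() until it returns COMPLETED or ERROR, or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("COMPLETED", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Stand-in for a real status check: reports RUNNING twice, then COMPLETED.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
final_status = wait_for_dataset(lambda: next(fake_statuses), poll_seconds=0.01)
print(final_status)
```

The same loop applies to model training, where the status is checked with `status training`.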

***

# Train an ML Model

<html><img src="../../images/trip_fare_images/1_3.png"/></html>

In this step we will train an ML model using the training dataset created above. We will use pickup_zipcode, dropoff_zipcode and passenger_count as input features to the ML model. The fare_amount column is the target (label) that the model is trained to predict.

### Create a Foresight ML job file for model training

ML model training is initiated via a Foresight ML job file which specifies the ML training parameters. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.

There are two ways to train a model in Foresight:

1\. Machine Learning: uses machine learning models such as linear regression, logistic regression and random forests for supervised and unsupervised learning tasks.

2\. Deep Learning: uses neural-network-based models such as CNN, LSTM and RNN.

Only machine learning models are available in this free version of Foresight. The licensed version of Foresight unlocks deep learning models and other computational features for working with live streaming data.

##### NOTE: 
We specified "pickup_zipcode" and "dropoff_zipcode" as "high_cardinality_features" because they are categorical columns with high cardinality. The training engine transforms categorical variables into a numerical representation using techniques like one-hot encoding, and a variable with high cardinality causes the encoded data to explode into a very large number of dimensions. To avoid this, list the high-cardinality columns in the "high_cardinality_features" section of the training job file.

In [None]:
!cat trip_fare_prediction_model_1/trip_fare_1_model_train.ml
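
To see why high cardinality matters, consider how one-hot encoding assigns one column to every distinct value. The following minimal sketch uses plain Python and the pickup zip codes from the sample rows. The helper names are illustrative, and the sketch says nothing about how Foresight actually handles columns listed under "high_cardinality_features".

```python
def one_hot_width(values):
    """Number of columns a one-hot encoding needs: one per distinct value."""
    return len(set(values))


def one_hot(value, categories):
    """Encode a single value as a 0/1 vector over the given category list."""
    return [1 if value == c else 0 for c in categories]


pickup_zipcodes = ["10065", "10199", "10003", "10010"]
passenger_counts = [1, 1, 1, 1]

categories = sorted(set(pickup_zipcodes))
print(one_hot("10010", categories))       # one column per distinct zip code
print(one_hot_width(pickup_zipcodes))     # grows with every new zip code
print(one_hot_width(passenger_counts))    # stays small for low-cardinality columns
```

With only four sample rows the zip code column already needs four columns. A full dataset covering many zip codes would need one column for each of them, for both the pickup and the dropoff column.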

### Start ML model training

Use the `start training` command to execute the Foresight ML job file to start the model training. The `status training` command will show the status of the model training.

**Training could take 10 minutes or more to complete.**

**Click the URL shown in the output of `status training` to open an *MLflow* session that displays the training metrics.**

In [None]:
start training trip_fare_prediction_model_1/trip_fare_1_model_train

#### Wait for ML model training to complete

Use the `status training` command to check the status of the model training. Wait until the status shows COMPLETED.

In [None]:
status training trip_fare_1_model_train

## Register a trained ML model

After training is complete, the `status training` command will show COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command lists all the trained models within a project. The `register model` command registers a trained model. The `list registered-models` command lists all registered models within a project.

In [None]:
list trained-models trip_fare_1_model

In [None]:
register model trip_fare_1_model,1,PRODUCTION

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/1_4.png"/></html>

In this step we will deploy a trained ML model to serve prediction requests. 

### Create a Foresight ML job file for model serving

ML models are deployed via a Foresight ML job file which specifies the ML serving options. 

Create a Foresight ML job file using the registered-model version that you want to serve. 

The `create prediction` command takes two required parameters: the registered-model name and the model version. The `dir` parameter specifies the location where the generated files will be saved. The command generates three files: a Foresight ML job file, a sources YAML file and a file of sample curl requests. Refer to the Foresight User Manual for help.

The sources YAML file contains definitions for two REST sources, one for the prediction REST request and one for the prediction REST response, and a definition for the prediction log table.

**In the command below, replace the version '1' with the version of the registered model you are using.**


In [None]:
create prediction trip_fare_1_model,1,dir=trip_fare_prediction_model_1/

***

### Inspect the model serving files

Inspect the model serving ML job file and the definitions for the prediction REST request, prediction REST response and the prediction log table.

**Note: The generated file names include the model version number, as shown below. In the commands below, replace the version '1' with the version of the registered model you are using.**

    Example : <model name>_<version>_serve.ml , <model name>_<version>_sources.yml

    

In [None]:
!cat trip_fare_prediction_model_1/trip_fare_1_model_1_serve.ml

In [None]:
!cat trip_fare_prediction_model_1/trip_fare_1_model_1_sources.yml
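
If you are working with a different model version, the file names follow from the naming pattern noted above. A minimal sketch, assuming only the `<model name>_<version>_serve.ml` and `<model name>_<version>_sources.yml` patterns from the note; the function `serving_file_names` is a hypothetical helper and not part of Foresight.

```python
def serving_file_names(model_name, version, directory=""):
    """Build the generated file names from the pattern <model name>_<version>_..."""
    stem = f"{directory}{model_name}_{version}"
    return {"job_file": f"{stem}_serve.ml", "sources_file": f"{stem}_sources.yml"}


names = serving_file_names("trip_fare_1_model", 1, "trip_fare_prediction_model_1/")
print(names["job_file"])
print(names["sources_file"])
```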

### Deploy the model

Use the `start prediction` command to execute the Foresight ML job file and deploy the model. The `status prediction` command shows the status of the model serving. The URL shown in the output is the endpoint to which REST prediction requests may be sent via `curl` or some other means.

In [None]:
start prediction trip_fare_prediction_model_1/trip_fare_1_model_1_serve

In [None]:
status prediction trip_fare_1_model_1_serve

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command uses the last 10 rows of the training dataset as prediction request data and sends curl requests to the deployed model. The prediction responses are collected and displayed.

Refer to the Foresight User Manual for help.

Note: Once you run the `start prediction` command, a prediction service starts and is ready to serve requests. You can send curl requests to the URL that the prediction service reports. Running `test prediction` also outputs an "Example Curl Request". Use this example to send data to the prediction service, or integrate it into the applications where the predictions will be served.

In [None]:
test prediction trip_fare_1_model_1_serve

Below is a markdown cell that shows how to run the curl request to fetch predictions. Convert the cell to a code cell, enter the prediction URL in the marked space, and execute the cell to get the response.

!curl -X GET ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 07:15:00", "pickup_zipcode": "10010", "dropoff_zipcode": "10011", "passenger_count": 3}]'
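
For integration into an application, the same request can be assembled in Python with the standard `json` and `urllib.request` modules. This is a sketch only: `PREDICTION_URL` is a hypothetical placeholder to be replaced with the URL reported by the prediction service, and the request is built but not sent. The method, header and payload mirror the curl command above.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the URL printed by `status prediction`.
PREDICTION_URL = "http://localhost:8080/predict"

records = [
    {
        "rest_request_id": "prediction_test-1",
        "pickup_datetime": "2022-11-12 07:15:00",
        "pickup_zipcode": "10010",
        "dropoff_zipcode": "10011",
        "passenger_count": 3,
    }
]
body = json.dumps(records).encode("utf-8")

# Same method, header and body as the curl command; the request is only built here.
request = urllib.request.Request(
    PREDICTION_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="GET",
)
print(request.get_method(), request.full_url)

# To send it once the service is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```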

### Stop the deployed model

Use the `stop prediction` command to stop ML model serving when you have completed the prediction requests. This step is optional; you may choose to leave the model deployed.

In [None]:
stop prediction trip_fare_1_model_1_serve