***

# Taxi Trip Fare Prediction - Scheduled DAG

***

The objective of this example is to use the Scheduled DAG functionality to train and deploy a taxi trip fare prediction model. Once the scheduled jobs start, the DAG runs the whole pipeline without manual intervention: featureset creation, contextual feature handling, dataset creation, and model training. We will run the Model 2 example using jobs scheduled by a DAG. As in the Model 2 example, we will
- create the training dataset using contextual features
- train an ML model based on historical taxi trip fare data and contextual features
- serve the ML model to predict the trip fare for new trips



***

**We will use the `trip_fare_dag` project for this example.**

In [None]:
create project trip_fare_dag

***

# Connect your Data Sources

<html><img src="../../images/trip_fare_images/2_1.png"/></html>

In the Model 2 example, we connected an S3 bucket to Foresight as the data source for the trip table. In this step, we connect the S3 bucket to Foresight as the data source for the three contextual feature tables.

### Create a Foresight ML sources file

Foresight connects to data sources through a Foresight ML sources file. In the Model 2 example, we created a Foresight ML sources file that connects the S3 bucket to the Foresight platform for the trip table, and another Foresight ML sources file that adds the three new contextual feature sources. This example uses the same files.
Use the templates and code snippets available from the icons on the left, and refer to the Foresight User Manual for help.
Alternatively, you may use the Foresight ML sources file from this tutorial.

The relevant sections of the `trip_fare_2_data_sources.yml` file look like this:

    meta:
      source_type: aws
      source_format: csv
      path: s3a://foresight-tutorial/trip_fare/<table_name>.csv   # <table_name> differs for each table
      anon: true
      infer_schema: true
      header: true
      delimiter: ','
      s3_endpoint_url: https://foresight-tutorial.s3.us-west-2.amazonaws.com
      batch_schedule: -1d

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

#### Add column schema to your data sources file

Foresight can infer the column schema from your data sources and update the ML sources file automatically. Use the `add columns` command to infer the schema and add it to the ML sources file. After the command completes, review the column schema for correctness and, if necessary, edit the ML sources file to fix column names or data types. Alternatively, you may edit the ML sources file manually and add all the column names and data types to match your data source schema.

In [None]:
add columns trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml
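Schema inference can only guess types from the values it sees, which is why the review step matters. The minimal Python sketch below shows naive per-column type inference over CSV text; it is an illustration, not Foresight's inference logic. The column names mirror the fields of the example prediction request later in this notebook, and the sample rows are hypothetical. Note that the zip code columns come out as `int`, which would drop the leading zero from a code such as `07030`; in a case like this you may prefer to edit the type to a string.

```python
import csv
import io

def infer_type(values):
    """Return 'int', 'float', or 'string': the narrowest type that parses every non-empty value."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"  # nothing to infer from; fall back to the safest type
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # some value does not parse; try the next, wider type
        return name
    return "string"

def infer_schema(csv_text, delimiter=","):
    """Infer a {column: type} mapping from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = list(reader)
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames or []}

# Hypothetical sample rows; column names taken from this tutorial's prediction request.
sample = (
    "pickup_datetime,pickup_zipcode,dropoff_zipcode,passenger_count\n"
    "2022-11-12 11:29:05,10069,10107,3\n"
    "2022-11-12 11:45:10,10001,10016,1\n"
)
print(infer_schema(sample))
# {'pickup_datetime': 'string', 'pickup_zipcode': 'int', 'dropoff_zipcode': 'int', 'passenger_count': 'int'}
```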

## Schedule Dataset Generation and Model Training

<html><img src="../../images/trip_fare_images/4_1.png"/></html>


The Scheduled DAG feature allows users to schedule jobs for each stage of the machine learning pipeline. Users can define when specific tasks, such as featureset updates, dataset generation, or model retraining, should run. This automation makes efficient use of resources and keeps models current by retraining them regularly on new data.

Jobs to be scheduled are defined in a DAG JSON file, which lists the jobs to run and the dependencies between them. The file also contains parameters that control how often the jobs are executed.



The following DAG JSON file contains jobs that create the featuresets, the featureviews, and a training dataset, and then train a model.

In [None]:
!cat trip_fare_dag/trip_fare_training_dag.json
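The dependencies in a DAG file determine the order in which jobs can run: a job waits until its parent jobs have completed. The Python sketch below illustrates that idea with Kahn's topological sort. The JSON layout (`jobs`, `name`, `depends_on`) and the job names are hypothetical and do not reflect the actual schema of `trip_fare_training_dag.json`; compare with the real file shown above.

```python
import json
from collections import deque

# Hypothetical DAG layout; the real Foresight DAG JSON schema may differ.
DAG_JSON = """
{
  "jobs": [
    {"name": "train_model",         "depends_on": ["create_dataset"]},
    {"name": "create_featuresets",  "depends_on": []},
    {"name": "create_dataset",      "depends_on": ["create_featureviews"]},
    {"name": "create_featureviews", "depends_on": ["create_featuresets"]}
  ]
}
"""

def execution_order(dag):
    """Return job names so that every parent precedes its children (Kahn's algorithm)."""
    parents = {job["name"]: set(job.get("depends_on", [])) for job in dag["jobs"]}
    children = {name: [] for name in parents}
    for name, deps in parents.items():
        for dep in deps:
            if dep not in parents:
                raise ValueError(f"{name} depends on unknown job {dep}")
            children[dep].append(name)
    # Count unfinished parents per job; jobs with none are ready to run.
    remaining = {name: len(deps) for name, deps in parents.items()}
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in children[job]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("the DAG contains a cycle")
    return order

print(execution_order(json.loads(DAG_JSON)))
# ['create_featuresets', 'create_featureviews', 'create_dataset', 'train_model']
```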

## Scheduling the DAG
The `schedule dag` command schedules a DAG JSON file for execution according to the parameters in the file.
In this example, we reuse the job files from the Model 2 example.

In [None]:
schedule dag trip_fare_dag/trip_fare_training_dag.json

The `status dag` command reports the current status of the DAG and of each job listed in the JSON file. For jobs that have started, it reports their current status. A job that is still waiting for its parent job to complete is reported as "job not found" until it starts. Use this command to monitor the progress of the DAG and of each job within it.

#### Wait for the DAG to complete

Use the `status dag` command to check the status of the DAG, and wait until the DAG has completed.

**The DAG could take 10 minutes or more to complete.**

In [None]:
status dag trip_fare_training_dag
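If you want to script this wait instead of rerunning the cell by hand, the logic is a simple polling loop. In the minimal Python sketch below, `get_status` is a hypothetical callable standing in for however you retrieve the DAG status; this tutorial does not show a programmatic Foresight interface. `COMPLETED` is the status described in the next section, while `FAILED` and `RUNNING` are assumed status values.

```python
import time

def wait_for_dag(get_status, poll_seconds=60, timeout_seconds=3600,
                 done="COMPLETED", failed=("FAILED",)):
    """Poll get_status() until it returns `done`, a failure state, or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == done:
            return status
        if status in failed:  # "FAILED" is an assumed terminal status
            raise RuntimeError(f"DAG ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"DAG still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)

# Usage with a fake status source (hypothetical values) that completes on the third poll.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_dag(lambda: next(fake_statuses), poll_seconds=0))  # COMPLETED
```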

## Register a trained ML model

After training is complete, the `status dag` command shows a COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command lists all the trained models within a project, the `register model` command registers a trained model, and the `list registered-models` command lists all registered models within a project.

##### To list all the models that have been trained

In [None]:
list trained-models trip_fare_2_dl_model

##### Run this cell to register the model

**In the command below, replace '1' with the run ID of the model you want to register and deploy. The run ID is shown in the output of the previous `list trained-models` command.**

In [None]:
register model trip_fare_2_dl_model,1,PRODUCTION

##### To list all registered models

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/2_6.png"/></html>

In this step, we will deploy the trained ML model to serve prediction requests.

### Create a Foresight ML job file for model serving

ML models are deployed through a Foresight ML job file, which specifies the serving options.

Create a Foresight ML job file using the registered model version that you want to serve.

The `create prediction` command takes two required parameters: the registered model name and the model version. The `dir` parameter specifies the directory where the generated files are saved. The command generates three files: a Foresight ML job file, a sources YAML file, and a file of sample curl requests. Refer to the Foresight User Manual for help.

The sources YAML file contains definitions for two REST sources (one for the prediction REST request and one for the prediction REST response) and a definition for the prediction log table.

**In the command below, replace the version '1' with the version of the registered model you are using.**

In [None]:
create prediction trip_fare_2_dl_model,1,dir=trip_fare_dag/

***

### Inspect the model serving files

Inspect the model serving ML job file and the definitions for the prediction REST request, prediction REST response and the prediction log table.

**Note: The generated file names include the model version number, as shown below. In the commands below, replace the version '1' with the version of the registered model you are using.**

    Example: <model name>_<version>_serve.ml, <model name>_<version>_sources.yml

    

In [None]:
!cat trip_fare_dag/trip_fare_2_dl_model_1_serve.ml

In [None]:
!cat trip_fare_dag/trip_fare_2_dl_model_1_sources.yml
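The naming convention from the note above can be captured in a small helper, which is useful if you script the version substitution. This is a minimal Python sketch based only on the patterns `<model name>_<version>_serve.ml` and `<model name>_<version>_sources.yml`; the function name `serving_files` is hypothetical. `PurePosixPath` is used so the printed paths use forward slashes on every platform.

```python
from pathlib import PurePosixPath

def serving_files(model_name, version, directory="trip_fare_dag"):
    """Return the (job file, sources file) paths expected for a model version."""
    stem = f"{model_name}_{version}"
    base = PurePosixPath(directory)
    return base / f"{stem}_serve.ml", base / f"{stem}_sources.yml"

job_file, sources_file = serving_files("trip_fare_2_dl_model", 1)
print(job_file)      # trip_fare_dag/trip_fare_2_dl_model_1_serve.ml
print(sources_file)  # trip_fare_dag/trip_fare_2_dl_model_1_sources.yml
```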

### Schedule Model Deployment

<html><img src="../../images/trip_fare_images/4_2.png"/></html>

We will schedule a DAG to deploy a model.

**Edit the DAG JSON file so that the version number in the model serving job file name matches the file name generated by the `create prediction` command.**

The following DAG JSON file contains a job for deploying a registered model.

In [None]:
!cat trip_fare_dag/trip_fare_prediction_dag.json
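If you prefer to script this edit before scheduling the DAG, a regular-expression substitution on the file text is enough. The Python sketch below assumes only that the serving job file name follows the `<model name>_<version>_serve.ml` pattern; the sample JSON fragment and its `job_file` key are hypothetical, so check the real file shown above before applying a similar change.

```python
import re

def set_serving_version(dag_text, model_name, new_version):
    """Replace the version in every '<model_name>_<version>_serve.ml' reference."""
    pattern = re.compile(rf"({re.escape(model_name)}_)\d+(_serve\.ml)")
    # \g<1> and \g<2> keep the surrounding name parts; only the digits change.
    updated, count = pattern.subn(rf"\g<1>{new_version}\g<2>", dag_text)
    if count == 0:
        raise ValueError(f"no serving job file for {model_name} found")
    return updated

# Hypothetical fragment; the real trip_fare_prediction_dag.json layout may differ.
sample = '{"job_file": "trip_fare_dag/trip_fare_2_dl_model_1_serve.ml"}'
print(set_serving_version(sample, "trip_fare_2_dl_model", 2))
# {"job_file": "trip_fare_dag/trip_fare_2_dl_model_2_serve.ml"}

# To apply it to the real file:
# from pathlib import Path
# path = Path("trip_fare_dag/trip_fare_prediction_dag.json")
# path.write_text(set_serving_version(path.read_text(), "trip_fare_2_dl_model", 2))
```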

In [None]:
schedule dag trip_fare_dag/trip_fare_prediction_dag.json

The `status dag` command reports the current status of model serving. The URL in its output is the endpoint for REST prediction requests, which you can send with curl or any other HTTP client.

In [None]:
status dag trip_fare_prediction_dag

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command uses the last 10 rows of the training dataset as prediction request data and sends them as curl requests to the deployed model. The prediction responses are collected and displayed.

Refer to the Foresight User Manual for help.

Note: Once the model is deployed, a prediction service is running and ready to serve requests, and you can send curl requests to the URL it provides. The `test prediction` command also outputs an "Example Curl Request". Use this example to send data to the prediction service or to integrate prediction requests into applications that consume the predictions.

In [None]:
test prediction trip_fare_2_dl_model_1_serve
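The default request data described above can be approximated as follows: take the last 10 training rows, keep only the model's input fields, and tag each record with a request ID. This Python sketch is an illustration, not the command's implementation, and whether `test prediction` sends the rows in one request or several is not stated here. The input field names and the `prediction_test-<n>` ID format are taken from the example curl request below; the training rows and the `fare_amount` label column are hypothetical.

```python
import json

# Input fields taken from the example curl request in this notebook.
INPUT_FIELDS = ("pickup_datetime", "pickup_zipcode", "dropoff_zipcode", "passenger_count")

def build_test_payload(training_rows, n=10):
    """Build a JSON list body from the last n rows, keeping only input fields."""
    payload = []
    for i, row in enumerate(training_rows[-n:], start=1):
        record = {"rest_request_id": f"prediction_test-{i}"}
        record.update({field: row[field] for field in INPUT_FIELDS})
        payload.append(record)
    return json.dumps(payload)

# Hypothetical training rows; "fare_amount" stands in for the label and is dropped.
rows = [
    {"pickup_datetime": f"2022-11-12 11:{m:02d}:05", "pickup_zipcode": "10069",
     "dropoff_zipcode": "10107", "passenger_count": 1 + m % 4, "fare_amount": 12.5}
    for m in range(15)
]
body = build_test_payload(rows)
print(len(json.loads(body)))  # 10
```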

The markdown cell below shows how to send a curl request to fetch predictions. Convert the cell to a code cell, replace the placeholder with the prediction URL, and run the cell to get the response.

!curl -X GET ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 11:29:05", "pickup_zipcode": "10069", "dropoff_zipcode": "10107", "passenger_count": 3}]'
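The same request can be sent from Python, for example when integrating predictions into an application. This standard-library sketch mirrors the curl command above: a GET request with a JSON body and a `Content-Type: application/json` header. `PREDICTION_URL` is a hypothetical placeholder for the URL reported by `status dag`, and the sketch assumes the service responds with JSON, which this tutorial does not document.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the prediction URL from `status dag`.
PREDICTION_URL = "https://prediction.example.com/predict"

def make_request(url, records):
    """Build a GET request with a JSON body, mirroring the curl example."""
    data = json.dumps(records).encode("utf-8")
    return urllib.request.Request(url, data=data, method="GET",
                                  headers={"Content-Type": "application/json"})

def predict(url, records, timeout=30):
    """Send the request and decode the response, assuming the service returns JSON."""
    with urllib.request.urlopen(make_request(url, records), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

records = [{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 11:29:05",
            "pickup_zipcode": "10069", "dropoff_zipcode": "10107", "passenger_count": 3}]
req = make_request(PREDICTION_URL, records)
# print(predict(PREDICTION_URL, records))  # run once PREDICTION_URL points at the deployed model
```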

### Stop the deployed model

Use the `stop dag` command to stop ML model serving when you have finished sending prediction requests. This step is optional; you may leave the model deployed.

In [None]:
stop dag trip_fare_prediction_dag