***

# Taxi Trip Fare Prediction - Scheduled DAG

***

The objective of this example is to use the Scheduled DAG functionality to train and deploy a taxi trip fare prediction model. Once the scheduled task starts, this functionality runs the entire process autonomously: featureset creation, handling of contextual features, dataset creation, and training. We will:
- create the training dataset using contextual features
- train an ML model based on historical taxi trip fare data and contextual features
- serve the ML model to predict the trip fare for new trips


### Prepare your data

In this example, we use the trip table retrieved from an S3 bucket to construct a training dataset. To give the model more context for its predictions, we incorporate three additional tables: hour_of_day_context, holiday_weekend_context, and geo_area_context.

Let us look at the first few lines of the CSV files that we will use to build the model.

##### hour_of_day_context.csv
    hour_of_day,hourly_segment
    0,early morning
    1,early morning
    2,early morning
    3,early morning

##### holiday_weekend_context.csv
    calendar_day,is_holiday_or_weekend
    2009-01-01,1
    2009-01-02,0
    2009-01-03,1
    2009-01-04,1
    
##### geo_area_context.csv
    zipcode,geo_area
    10023,Commercial
    10021,Residential
    10002,Suburbs
    11201,Commercial

### Static contextual feature data

We will enhance the data by adding three contextual feature tables. 

- an hourly segment table that maps an hour to an hourly-segment. 
- a holiday weekend table that maps a date to a flag indicating whether that date was a holiday-or-weekend or neither.
- a geo area table that maps a zipcode to a type of geo area.

The idea is that the hourly-segment, the holiday-or-weekend flag and the type of pickup and dropoff geo areas have an influence on the trip fare amount. We can create a more accurate ML model with these additional features.

Each contextual feature table is a CSV file in an S3 bucket containing the respective mapping.
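
To make the mappings concrete, here is a minimal Python sketch of how the three context tables could be attached to a single trip record. It only illustrates the idea and is not how Foresight builds the dataset. The sample rows are the ones shown above; the trip record, the `load_lookup` and `add_context` helpers, and the output column names `pickup_geo_area` and `dropoff_geo_area` are assumptions made for this example.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the three context CSV files shown above.
HOUR_OF_DAY_CSV = """hour_of_day,hourly_segment
0,early morning
1,early morning
2,early morning
3,early morning
"""

HOLIDAY_WEEKEND_CSV = """calendar_day,is_holiday_or_weekend
2009-01-01,1
2009-01-02,0
2009-01-03,1
2009-01-04,1
"""

GEO_AREA_CSV = """zipcode,geo_area
10023,Commercial
10021,Residential
10002,Suburbs
11201,Commercial
"""


def load_lookup(csv_text, key_column, value_column):
    """Read a two-column CSV into a dict mapping key -> value (all strings)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row[value_column] for row in reader}


hourly_segment = load_lookup(HOUR_OF_DAY_CSV, "hour_of_day", "hourly_segment")
holiday_weekend = load_lookup(HOLIDAY_WEEKEND_CSV, "calendar_day", "is_holiday_or_weekend")
geo_area = load_lookup(GEO_AREA_CSV, "zipcode", "geo_area")


def add_context(trip):
    """Return a copy of the trip with the contextual features attached."""
    pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    enriched = dict(trip)
    # A missing key yields None, similar in spirit to a left join.
    enriched["hourly_segment"] = hourly_segment.get(str(pickup.hour))
    enriched["is_holiday_or_weekend"] = holiday_weekend.get(pickup.strftime("%Y-%m-%d"))
    enriched["pickup_geo_area"] = geo_area.get(trip["pickup_zipcode"])
    enriched["dropoff_geo_area"] = geo_area.get(trip["dropoff_zipcode"])
    return enriched


# Hypothetical trip record, used only for illustration.
trip = {
    "pickup_datetime": "2009-01-03 02:15:00",
    "pickup_zipcode": "10023",
    "dropoff_zipcode": "10021",
    "passenger_count": 2,
}
enriched_trip = add_context(trip)
print(enriched_trip)
```

The pickup hour selects the hourly segment, the pickup date selects the holiday-or-weekend flag, and each zipcode selects a geo area, so one trip gains four contextual features.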

***

**We will use the `trip_fare_dag` project for this example.**

In [None]:
create project trip_fare_dag

***

# Connect your Data Sources

<html><img src="../../images/trip_fare_images/2_1.png"/></html>

In the Model 1 example, we connected the S3 bucket as a data source to Foresight for the trip table. Similarly, in this step we will connect the S3 bucket as a data source to Foresight for the three contextual feature tables.

### Create a Foresight ML sources file

Foresight establishes connections with data sources through a Foresight ML sources file. In the Model 2 example, we generated a Foresight ML sources file to connect the S3 bucket to the Foresight platform, enabling access to the trip table. Additionally, we created another Foresight ML sources file to incorporate three new contextual feature sources. These same files will be employed in this example.
Use the templates and code snippets available at the icons to the left. Refer to the Foresight User Manual for help.
Alternatively you may use the Foresight ML sources file from this tutorial.

<br> The relevant sections in the `trip_fare_2_data_sources.yml` file look like this:
    
            meta:
              source_type: aws
              source_format: csv
              path: s3a://foresight-tutorial/trip_fare/<table_name>.csv         <<<<                
              anon: true                                                              
              infer_schema: true                                                      
              header: true                                                            
              delimiter: ','                                                          
              s3_endpoint_url: https://foresight-tutorial.s3.us-west-2.amazonaws.com  
              batch_schedule: -1d

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

#### Add column schema to your data sources file

Foresight can automatically infer column schema from your data sources and update the ML sources file. Use the `add columns` command to automatically infer and update the ML sources file with the data source column schema. After this command completes, you must review the column schema for correctness and if necessary edit the ML sources file to fix column names or data types. Alternatively you may manually edit the ML sources file and add all the column names and data types to match your data source schema.

In [None]:
add columns trip_fare_prediction_model_2/trip_fare_2_data_sources.yml

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_data_sources.yml
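
The review step matters because any automatic inference works from the sample values alone. The following Python sketch is a simplified illustration of type inference over CSV text. It is not Foresight's inference logic, and the type names (`integer`, `double`, `date`, `string`) and the helper functions are assumptions made for this example.

```python
import csv
import io
from datetime import datetime

GEO_AREA_CSV = """zipcode,geo_area
10023,Commercial
10021,Residential
10002,Suburbs
11201,Commercial
"""

HOLIDAY_WEEKEND_CSV = """calendar_day,is_holiday_or_weekend
2009-01-01,1
2009-01-02,0
2009-01-03,1
2009-01-04,1
"""


def infer_type(values):
    """Return the narrowest (hypothetical) type name that fits every value."""
    if not values:
        return "string"

    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        column: infer_type([row[column] for row in rows])
        for column in (reader.fieldnames or [])
    }


geo_schema = infer_schema(GEO_AREA_CSV)
holiday_schema = infer_schema(HOLIDAY_WEEKEND_CSV)
print(geo_schema)
print(holiday_schema)
```

In this sketch the `zipcode` column is inferred as an integer because every sample value is numeric. A zipcode is an identifier rather than a quantity, and storing it as a number would drop leading zeros, so this is the kind of result you may want to correct by hand after inference.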

## Training DAG:

<html><img src="../../images/trip_fare_images/4_1.png"/></html>

The Scheduled DAG feature empowers users to set up automated schedules for each stage of the machine learning pipeline. Users can define when specific tasks, such as featureset updates, dataset generation, or model retraining, should occur. This level of automation not only optimizes resource utilization but also ensures that models are regularly updated with the latest data, enhancing their accuracy and relevance.



The Scheduled DAG is driven by a user-defined JSON file, which offers a flexible way to configure and schedule tasks. The JSON file acts as a detailed plan: it lists the tasks to run, and the DAG follows it to carry out the workflow automatically.

### The DAG JSON file structure is organized as follows:

**schedule_time:** Users specify the exact time they want the DAG to run.\
**schedule_interval:** Users define the frequency of DAG execution, that is, the interval between runs. A negative interval value indicates a one-time execution.\
**dag_jobs:** A list of jobs or tasks that the DAG will execute. Each item in this list is a dictionary containing:

- **name:** Users assign a name to the specific task.
- **command:** Users provide the command for the job, such as starting a featureset or creating a dataset.
- **parent_name:** Users can name a parent job whose status this task should check before executing. This establishes a dependency on the parent job.
- **start_delay_secs:** Users can introduce a delay in seconds before executing the job, providing a time buffer if needed.
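
As a rough illustration of how these fields fit together, the sketch below builds a DAG definition in Python, checks the `parent_name` dependencies, and serializes it to JSON. Only the field names come from the structure described above. The job names, the command placeholders, the `schedule_time` format, the use of `null` for a job without a parent, and the `execution_order` helper are all assumptions made for illustration; consult the Foresight User Manual for the accepted values.

```python
import json

# Field names follow the structure described above; every value is hypothetical.
dag = {
    "schedule_time": "2024-01-01 00:00:00",  # assumed timestamp format
    "schedule_interval": -1,                  # negative value: one-time execution
    "dag_jobs": [
        {
            "name": "create_featureset",
            "command": "<command that starts the featureset>",
            "parent_name": None,              # assumption: no parent for the first job
            "start_delay_secs": 0,
        },
        {
            "name": "create_dataset",
            "command": "<command that creates the dataset>",
            "parent_name": "create_featureset",
            "start_delay_secs": 30,
        },
        {
            "name": "train_model",
            "command": "<command that starts training>",
            "parent_name": "create_dataset",
            "start_delay_secs": 0,
        },
    ],
}


def execution_order(dag_jobs):
    """Order job names so each job comes after its parent; raise on bad references."""
    names = [job["name"] for job in dag_jobs]
    if len(set(names)) != len(names):
        raise ValueError("job names must be unique")
    parents = {job["name"]: job.get("parent_name") for job in dag_jobs}
    for name, parent in parents.items():
        if parent is not None and parent not in parents:
            raise ValueError(f"{name!r} refers to unknown parent {parent!r}")
    ordered = []
    remaining = list(names)
    while remaining:
        # A job is ready when it has no parent or its parent is already ordered.
        ready = [n for n in remaining if parents[n] is None or parents[n] in ordered]
        if not ready:
            raise ValueError("cyclic parent_name dependency")
        ordered.extend(ready)
        remaining = [n for n in remaining if n not in ready]
    return ordered


order = execution_order(dag["dag_jobs"])
dag_json = json.dumps(dag, indent=2)
print(order)
print(dag_json)
```

Checking the dependencies before scheduling catches a misspelled `parent_name` early, rather than leaving a job waiting on a parent that never runs.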



The JSON file that we will use for this example is as follows:

In [None]:
!cat trip_fare_dagfile.json

## Preparation and Scheduling DAG:
Before scheduling the DAG, we must first create job files for each task outlined in the JSON file.

In this example, we will use the same job files used in Model 1.

In [None]:
schedule dag trip_fare_prediction_model_2/trip_fare_training_dag.json

The **status** DAG command reports the current status of the DAG and of each task listed in the JSON file. For a job that has started, it returns the job's current status. For a job that is still waiting for its parent job to complete, the status is reported as "job not found" until the job starts. This command helps you monitor the progress of the DAG and of each task within it.

In [None]:
status dag trip_fare_dagfile

The **stop** DAG command halts the DAG immediately and forcefully terminates any of its jobs that are already in progress. It also prevents the DAG from executing again, which matters when the DAG is scheduled at intervals or set for recurring jobs.



In [None]:
stop dag trip_fare_dagfile

## Register a trained ML model

After the training is complete, the `status` command will show COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command lists all the trained models within a project, the `register model` command registers a trained model, and the `list registered-models` command lists all registered models within a project.

##### To list all the models that have been trained

In [None]:
list trained-models trip_fare_2_dl_model

##### Run this cell to register the model

In [None]:
register model trip_fare_2_dl_model,1,PRODUCTION

##### To list all registered models

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/2_6.png"/></html>

In this step we will deploy the trained ML model to serve prediction requests. 

### Create a Foresight ML job file for model serving

ML models are deployed via a Foresight ML job file which specifies the ML serving options. 

Create a Foresight ML job file using the registered-model version that you want to serve. 

The `create prediction` command takes 2 required parameters: the registered-model name and the model version. The `dir` parameter specifies the location where the generated files will be saved. The command generates 3 files: a Foresight ML job file, a sources YAML file, and a file of sample curl requests. Refer to the Foresight User Manual for help.

The sources YAML file will contain definitions for two REST sources (one for the prediction REST request and one for the prediction REST response) and a definition for the prediction log table.

**In the command below, replace the version '1' with the version of the registered model you are using.**

In [None]:
create prediction trip_fare_2_dl_model,1,dir=trip_fare_prediction_model_2/

***

### Inspect the model serving files

Inspect the model serving ML job file and the definitions for the prediction REST request, prediction REST response and the prediction log table.

**Note: The generated file names include the model version number, as shown below. In the commands below, replace the version '1' with the version of the registered model you are using.**

    Example : <model name>_<version>_serve.ml , <model name>_<version>_sources.yml

    

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_dl_model_1_serve.ml

In [None]:
!cat trip_fare_prediction_model_2/trip_fare_2_dl_model_1_sources.yml

### Deploy the model - Prediction DAG

<html><img src="../../images/trip_fare_images/4_2.png"/></html>

We will schedule a DAG that starts prediction and deploys the model.

In [None]:
schedule dag trip_fare_prediction_model_2/trip_fare_prediction_dag.json

The command '**status dag**' will provide the current status of the model serving. The URL presented in the output serves as the endpoint for sending REST prediction requests, which can be done using tools such as curl or other suitable methods.

In [None]:
status dag trip_fare_prediction_dagfile

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command uses the last 10 rows from the training dataset as prediction request data and sends curl requests to the deployed model. The prediction responses are collected and displayed.

Refer to the Foresight User Manual for help.

Note: Once you run the start prediction command, a prediction service starts running and is ready to serve requests. You can send curl requests to the URL that the prediction service reports. Running `test prediction` also outputs an "Example Curl Request". Use this example to send data to the prediction service, or integrate it into the applications where the predictions will be served.

In [None]:
test prediction trip_fare_2_dl_model_1_serve

Below is a markdown cell that shows how to run the curl request to fetch predictions. Convert the cell to a code cell, enter the prediction URL in the marked space, and execute the cell to get the response.

!curl -X GET ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 11:29:05", "pickup_zipcode": "10069", "dropoff_zipcode": "10107", "passenger_count": 3}]'
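
For applications that call the prediction service from Python rather than curl, the same request could be sketched with the standard library as shown below. The payload is copied from the curl example above. The helper names, the timeout, and the `http://localhost:8080/predict` URL are placeholders assumed for illustration, and the response is returned as plain text because its format is not described here.

```python
import json
import urllib.request

# Payload copied from the example curl request above.
PAYLOAD = [{
    "rest_request_id": "prediction_test-1",
    "pickup_datetime": "2022-11-12 11:29:05",
    "pickup_zipcode": "10069",
    "dropoff_zipcode": "10107",
    "passenger_count": 3,
}]


def build_prediction_request(prediction_url, payload=PAYLOAD):
    """Build the HTTP request without sending it."""
    return urllib.request.Request(
        prediction_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors the `-X GET` in the curl example
    )


def send_prediction_request(prediction_url, timeout_secs=30):
    """Send the request and return the response body as text."""
    request = build_prediction_request(prediction_url)
    with urllib.request.urlopen(request, timeout=timeout_secs) as response:
        return response.read().decode("utf-8")


# Placeholder URL for illustration only; use the URL reported by the prediction service.
example_request = build_prediction_request("http://localhost:8080/predict")
print(example_request.get_method(), example_request.full_url)
```

Calling `send_prediction_request` with the real prediction URL sends the request; building the request separately makes it easy to inspect the method, headers, and body before anything goes over the network.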

### Stop the deployed model

Use the `stop dag` command to stop ML model serving when you have completed the prediction requests. This step is optional; you may choose to leave the model deployed.

In [None]:
stop prediction trip_fare_2_dl_model_1_serve