# Snowflake - Set Up Automated Scoring Jobs End to End

### Scope
This notebook shows how to use the Python API to set up automated batch prediction scoring jobs. You might want to make a one-time batch prediction, or you might want to schedule recurring batch prediction jobs. The sections below cover how to create and schedule batch prediction jobs via the Python API.


### Background
Making predictions by hand on a daily or monthly basis is a manual, time-consuming, and cumbersome process. Batch predictions are typical when users have to score new records at a regular interval, for example scoring new leads monthly to predict who will convert, or refreshing predictions daily for which products someone is likely to purchase.

### Key Documentation
- Python API Documentation: https://datarobot-public-api-client.readthedocs-hosted.com
- Getting Started - https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/setup/getting_started.html#installation
- Schedule batch prediction jobs via UI - https://docs.datarobot.com/en/docs/predictions/batch/batch-dep/batch-pred-jobs.html#schedule-recurring-batch-prediction-jobs

### You Will Learn How To Use the Python API to:
1. Retrieve existing data store and credential information
2. Set up prediction job specifications
3. Set up the prediction job schedule
4. Run a test prediction job and enable the automated schedule for scoring

## ===================================================================

### Prerequisites
1. An established data connection for reading data and writing back predictions. To create a new data connection - https://docs.datarobot.com/en/docs/data/connect-data/data-conn.html
2. An existing deployment made in DataRobot - https://docs.datarobot.com/en/docs/mlops/deployment/deploy-methods/deploy-model.html#deploy-from-the-leaderboard

## ===================================================================

### 1. Import Libraries

In [0]:
import datarobot as dr
import pandas as pd

### 2. Set Up Connection to DataRobot
To make sure only authorized users access the DataRobot API, you need an API token. To get a token, log in to the DataRobot web UI, click your profile icon, and select Developer Tools. API tokens are shown under API Keys, and you can create a new one if needed.

In [0]:
endpoint = 'https://app.datarobot.com/api/v2'

# Found in the DataRobot UI under Developer Tools (profile menu, top right); copy the API token
api_token = 'API_TOKEN'

# Set up the client connection
dr.Client(token=api_token, endpoint=endpoint)
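
Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead is shown here. The helper name `load_datarobot_config` is hypothetical, and the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` follow the convention used by the DataRobot Python client; confirm them against your client version.

```python
import os

# Default endpoint taken from the cell above.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def load_datarobot_config(env=None):
    """Return {'token': ..., 'endpoint': ...} read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN")
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before connecting.")
    endpoint = env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT)
    return {"token": token, "endpoint": endpoint}
```

With this in place, the connection call could become `dr.Client(**load_datarobot_config())`, so the token never appears in the notebook itself.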

## ===================================================================

### 3. Retrieve existing data store and credential information

To enable integration with a variety of enterprise databases, DataRobot provides a “self-service” JDBC product for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This lets you train and retrain models on that data quickly, and avoids the unnecessary step of exporting data from your enterprise database to a CSV for ingest into DataRobot. It also gives access to more diverse data, which can lead to more accurate models.

- Python Docs - https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/database_connectivity.html?highlight=data%20store
- Data Connectivity - https://docs.datarobot.com/en/docs/data/connect-data/data-conn.html#create-a-new-connection
- Credentials Management - https://docs.datarobot.com/en/docs/data/connect-data/stored-creds.html#credentials-management

In [0]:
# List the data stores you have configured
for d in dr.DataStore.list():
    print(d.id, d.canonical_name, d.params)

In [0]:
# get list of credentials
dr.Credential.list()

### 4. Set up prediction job specifications - data store, credentials, intake / writeback settings, and deployment

- Supported Settings - https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/batch_predictions.html?highlight=output#supported-output-types
- Statement & Configurations - https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-options.html#statement-types

In [0]:
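# Names of the data connection and stored credentials to use
data_store_name = 'Snowflake Connection'
creds_name = 'CFDS_USER_AA'

# ID of the deployment that will score the data
deployment_id = '620219bb18f7f84dec6cec59'

# Look up IDs by name; [0] raises IndexError if no entry matches the name
data_store_id = [ds.id for ds in dr.DataStore.list() if ds.canonical_name == data_store_name][0]
credential_id = [cr.credential_id for cr in dr.Credential.list() if cr.name == creds_name][0]

print(credential_id, data_store_id, deployment_id)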
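
The `[0]` lookups above fail with a bare `IndexError` when a name is misspelled, and silently pick the first entry when two entries share a name. A small sketch of a stricter lookup follows. The helper `find_one` is hypothetical, and the `SimpleNamespace` objects are stand-ins for what `dr.DataStore.list()` and `dr.Credential.list()` would return.

```python
from types import SimpleNamespace


def find_one(items, attr, value):
    """Return the single item whose `attr` equals `value`, or raise ValueError."""
    matches = [item for item in items if getattr(item, attr, None) == value]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one item with {attr}={value!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in objects for illustration only; the IDs are made up.
stores = [
    SimpleNamespace(id="ds-1", canonical_name="Snowflake Connection"),
    SimpleNamespace(id="ds-2", canonical_name="Other Connection"),
]
data_store = find_one(stores, "canonical_name", "Snowflake Connection")
print(data_store.id)
```

In the notebook, the same helper could be called as `find_one(dr.DataStore.list(), "canonical_name", data_store_name).id`.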

In [0]:
# check deployment name
deployment = dr.Deployment.get(deployment_id)
deployment.label

In [0]:
# setup intake settings

intake_settings = {
    'type': 'jdbc',
    'table': 'LENDING_CLUB_10K',
    'schema': 'TRAINING', # optional, if supported by database
    'catalog': 'DEMO', # optional, if supported by database 
    'data_store_id': data_store_id,
    'credential_id': credential_id,
}

print(intake_settings)

In [0]:
# setup output settings

output_settings = {
    'type': 'jdbc',
    'table': 'LENDING_CLUB_10K_AA_Temp',
    'schema': 'SCORING', # optional, if supported by database
    'catalog': 'SANDBOX', # optional, if supported by database
    'statement_type': 'insert',
    'create_table_if_not_exists': True,
    'data_store_id': data_store_id,
    'credential_id': credential_id,
}

print(output_settings)

#For local file export
#output_settings={
#    'type': 'localFile',
#    'path': './predicted.csv',
#}

#print(output_settings)
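
The intake and output settings repeat the same connection details, so one option is to build both from a shared helper. The function `jdbc_settings` below is a hypothetical convenience, not part of the DataRobot client, and the IDs are placeholders; it only assembles the same dictionary keys used in the cells above.

```python
def jdbc_settings(table, data_store_id, credential_id, schema=None, catalog=None, **extra):
    """Build a JDBC intake or output settings dict, omitting unset optional keys."""
    settings = {
        "type": "jdbc",
        "table": table,
        "data_store_id": data_store_id,
        "credential_id": credential_id,
    }
    if schema is not None:
        settings["schema"] = schema
    if catalog is not None:
        settings["catalog"] = catalog
    # Output-only options such as statement_type are passed through unchanged.
    settings.update(extra)
    return settings


# Placeholder IDs for illustration; the notebook looks these up from DataRobot.
intake = jdbc_settings(
    "LENDING_CLUB_10K", "ds-id", "cred-id", schema="TRAINING", catalog="DEMO"
)
output = jdbc_settings(
    "LENDING_CLUB_10K_AA_Temp", "ds-id", "cred-id",
    schema="SCORING", catalog="SANDBOX",
    statement_type="insert", create_table_if_not_exists=True,
)
```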


### 5. Set up prediction job schedule
- Example - https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/job-scheduling.html#schedule-batch-prediction-jobs

In [0]:
# Set up the schedule to run the end-to-end pipeline monthly, on day 1 at 7:59 AM
schedule = {
    "minute": [59],
    "hour": [7],
    "month": ["*"],
    "dayOfWeek": ["*"],
    "dayOfMonth": [1],
}
schedule
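
A typo in the schedule (for example `"minute": [60]`) is easier to catch locally than after the definition is created. The sketch below is a hypothetical sanity check, not part of the DataRobot client. It assumes the five field names used above and common cron-style integer ranges, and it accepts only integers and `"*"`, so any other value forms the scheduler may support would be flagged.

```python
# Allowed integer ranges per field, following common cron conventions (an assumption).
FIELD_RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "dayOfMonth": (1, 31),
    "month": (1, 12),
    "dayOfWeek": (0, 6),
}


def validate_schedule(schedule):
    """Return a list of problems found in a schedule dict; empty means it looks fine."""
    problems = []
    for field, (low, high) in FIELD_RANGES.items():
        values = schedule.get(field)
        if not isinstance(values, list) or not values:
            problems.append(f"{field}: expected a non-empty list")
            continue
        for value in values:
            if value == "*":
                continue
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                problems.append(f"{field}: {value!r} is not '*' or an integer in {low}..{high}")
    for field in sorted(set(schedule) - set(FIELD_RANGES)):
        problems.append(f"{field}: unknown field")
    return problems


monthly = {"minute": [59], "hour": [7], "month": ["*"], "dayOfWeek": ["*"], "dayOfMonth": [1]}
print(validate_schedule(monthly))
```

Running it on the schedule from the cell above should report no problems.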

In [0]:
# Combine parameters for the prediction job
job = {
    "deployment_id": deployment_id,
    "num_concurrent": 4,
    "intake_settings": intake_settings,
    "output_settings": output_settings,
    # snake_case, matching the other keys and the client's score() parameters
    "passthrough_columns_set": "all",
}

### 6. Run Test Prediction Job & Enable Automated Schedule For Scoring

- Access Prediction Jobs In UI - https://docs.datarobot.com/en/docs/predictions/batch/batch-dep/batch-pred-jobs.html#filter-prediction-jobs
- Configure Prediction Jobs Python - https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/batch_prediction_job_definitions.html?highlight=schedule

In [0]:
# Set up the job definition for recurring, automatic scoring
definition = dr.BatchPredictionJobDefinition.create(
    enabled=True,
    batch_prediction_job=job,
    name="Monthly Prediction Job Snowflake",
    schedule=schedule,
)
definition


In [0]:
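# Test the job once if needed.
# Replace the ID below with your own definition ID, or skip the get() call
# and reuse `definition` from the previous cell.
definition = dr.BatchPredictionJobDefinition.get("63a1f7c2d32ed1f544fd467a")
job = definition.run_once()
job.wait_for_completion()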
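
Waiting for completion does not by itself tell you whether the test run succeeded. A minimal sketch of an explicit check is below. The helper `ensure_completed` is hypothetical, and it assumes the dict returned by `job.get_status()` carries a `status` key whose success value is `COMPLETED`; verify both against your client version.

```python
def ensure_completed(status):
    """Raise RuntimeError unless the status dict reports a completed job."""
    state = status.get("status")
    if state != "COMPLETED":
        raise RuntimeError(f"batch prediction job ended with status {state!r}")
    return state
```

In the notebook this could be called as `ensure_completed(job.get_status())` right after `job.wait_for_completion()`, before the schedule is switched on.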


In [0]:
# Enable the automated schedule (this also enables the definition if it was created disabled)
job_run_automatically = definition.run_on_schedule(schedule)
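
Once a schedule is live, you may also want a way to pause it without deleting the definition. The sketch below wraps the client's `update(enabled=...)` call on a job definition in a hypothetical helper, `pause_definition`; it works with any object exposing that method, which keeps it easy to try out with a stand-in.

```python
def pause_definition(definition):
    """Disable a job definition's schedule without deleting the definition."""
    return definition.update(enabled=False)
```

Calling `pause_definition(definition)` stops future scheduled runs; `definition.run_on_schedule(schedule)` turns them back on.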


## ========================================================================

- Dated: 12/20/2022
- Author: Arjun Arora