# Using the AI Catalog

This code example shows how to create and share datasets in the AI Catalog, then use them to create projects and generate predictions.

Download this notebook from the [code examples home page](index).

## Requirements

* Python 3.7+
* DataRobot API version 2.21+

### Import libraries

In [None]:
import yaml
import requests
import pandas as pd
import datarobot as dr

### Connect to DataRobot

Read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html).

In [None]:
# To connect from a Zepl notebook:
# dr.Client(token=z.getDatasource("datarobot_api")['token'], endpoint='https://app.datarobot.com/api/v2')

# To connect from a Jupyter notebook, point config_path at your own drconfig.yaml file
# (replace the placeholder path below):
dr.Client(config_path='/path/to/drconfig.yaml')
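The Jupyter option reads the endpoint and API token from a YAML config file. If you don't have one yet, here is a minimal sketch that writes a file containing the `endpoint` and `token` keys the client expects. `write_drconfig` is a hypothetical helper, and the token is a placeholder you must replace with your own API key.

```python
import os
import yaml

def write_drconfig(path, endpoint, token):
    """Write a minimal DataRobot client config file with `endpoint` and `token` keys."""
    directory = os.path.dirname(path)
    if directory:
        # Create the parent directory (e.g. ~/.config/datarobot) if it doesn't exist
        os.makedirs(directory, exist_ok=True)
    with open(path, 'w') as f:
        yaml.safe_dump({'endpoint': endpoint, 'token': token}, f)
    return path

# Example usage (placeholder token; use your own API key):
# write_drconfig(os.path.expanduser('~/.config/datarobot/drconfig.yaml'),
#                endpoint='https://app.datarobot.com/api/v2',
#                token='YOUR_API_TOKEN')
```

Keep this file out of version control, since it contains your API token.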

### Create a dataset or a data source

Use whichever of the following commands matches your dataset or data source type to upload it to the AI Catalog. You can also [connect to a database](#connect-to-a-database). Be sure to provide the correct path to your dataset.

In [None]:
path_to_data = 'data.csv' # Provide your dataset here

In [None]:
# From a local file
dataset = dr.Dataset.create_from_file(file_path=path_to_data)

In [None]:
# From a file object
with open(path_to_data, 'rb') as f:
    dataset = dr.Dataset.create_from_file(filelike=f)

In [None]:
# Load the CSV into pandas for the in-memory upload options below
df = pd.read_csv(path_to_data)
df_lst = df.to_dict(orient='records')  # one dictionary per row
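To make the "list of dictionaries" option concrete, here is what `to_dict(orient='records')` produces for a tiny, hypothetical two-row frame (the column names are made up for illustration):

```python
import pandas as pd

# A tiny stand-in for the CSV loaded above (hypothetical columns)
sample_df = pd.DataFrame({'id': [1, 2], 'target': ['yes', 'no']})

# orient='records' yields one dict per row, keyed by column name
sample_records = sample_df.to_dict(orient='records')
print(sample_records)
# [{'id': 1, 'target': 'yes'}, {'id': 2, 'target': 'no'}]
```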

In [None]:
# From a pandas data frame
dataset = dr.Dataset.create_from_in_memory_data(data_frame=df)

In [None]:
# From a list of dictionaries representing rows of data
dataset = dr.Dataset.create_from_in_memory_data(records=df_lst)

In [None]:
# Based on CSV data from a URL (replace with the URL of your CSV file)
dataset = dr.Dataset.create_from_url(url='https://example.com/data.csv')

### Connect to a database
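The database cells in this notebook reference a `creds` dictionary with `jdbc_url`, `db_user`, and `db_pass` keys that is never defined. Since `yaml` is imported at the top, one plausible approach (an assumption, not taken from the original) is to load these values from a YAML file. The file name `creds.yaml` and the `load_creds` helper are hypothetical.

```python
import yaml

# Hypothetical creds.yaml layout:
# jdbc_url: "jdbc:sqlserver://<host>:1433;databaseName=<db>"
# db_user: "<username>"
# db_pass: "<password>"

def load_creds(path='creds.yaml'):
    """Load database connection details from a YAML file and check the expected keys."""
    with open(path) as f:
        creds = yaml.safe_load(f) or {}  # an empty file yields None, so fall back to {}
    # The database and project cells read exactly these keys
    missing = {'jdbc_url', 'db_user', 'db_pass'} - set(creds)
    if missing:
        raise KeyError(f'creds file is missing keys: {sorted(missing)}')
    return creds

# creds = load_creds('creds.yaml')
```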

In [None]:
# Get the Microsoft SQL Server JDBC driver (the last match is used if several are listed)
ms_sql_driver = [drv for drv in dr.DataDriver.list() if drv.class_name == 'com.microsoft.sqlserver.jdbc.SQLServerDriver'][-1]

# Create a data store
datastore = dr.DataStore.create(data_store_type='jdbc', 
                                canonical_name='Demo DB', 
                                driver_id=ms_sql_driver.id, 
                                jdbc_url=creds['jdbc_url'])

# Create a data source based on a query
query = "select * from db.schema.table"
params = dr.DataSourceParameters(data_store_id=datastore.id, 
                                 query=query)

datasource = dr.DataSource.create(data_source_type='jdbc', 
                                  canonical_name='datasource_query', 
                                  params=params)

# Create a data source based on a table
params = dr.DataSourceParameters(data_store_id=datastore.id, 
                                 schema='schema',
                                 table='table')

datasource = dr.DataSource.create(data_source_type='jdbc', 
                                  canonical_name='datasource_table', 
                                  params=params)
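The driver lookup takes the last matching entry from `dr.DataDriver.list()` and raises a bare `IndexError` if no SQL Server driver is installed. A hypothetical `find_driver` helper keeps the same selection rule but fails with a clearer message. It only relies on each driver having a `class_name` attribute, as the original list comprehension does.

```python
def find_driver(drivers, class_name):
    """Return the last driver whose class_name matches, or fail with a clear message."""
    matches = [drv for drv in drivers if drv.class_name == class_name]
    if not matches:
        raise LookupError(f'No JDBC driver with class name {class_name!r} is available')
    return matches[-1]  # same choice as the [-1] in the original cell

# Usage with the client:
# ms_sql_driver = find_driver(dr.DataDriver.list(),
#                             'com.microsoft.sqlserver.jdbc.SQLServerDriver')
```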

### Share a dataset or data source

Specify the users to share the data with and the role to grant them.

In [None]:
users = ['user@domain.com']
role = dr.enums.SHARING_ROLE.READ_ONLY
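The REST-based sharing and batch prediction cells call a `dr_rest_call` helper that this notebook does not define. Here is a minimal hypothetical version, assuming bearer-token authentication and that `path` already includes the `/api/v2` prefix (as in the calls that use it). The host and token values are placeholders.

```python
# Assumptions: replace these placeholders with your own DataRobot host and API token.
DR_HOST = 'https://app.datarobot.com'
DR_TOKEN = 'YOUR_API_TOKEN'

def dr_rest_call(path, method, payload=None, host=DR_HOST, token=DR_TOKEN):
    """Send `payload` as JSON to host + path using a requests function such as requests.patch."""
    url = host.rstrip('/') + path
    headers = {'Authorization': f'Bearer {token}'}
    response = method(url, json=payload, headers=headers)
    response.raise_for_status()  # surface HTTP errors instead of failing later on .json()
    return response
```

Passing the requests function itself (`requests.patch`, `requests.post`) matches how the helper is called in this notebook and keeps the helper independent of the HTTP verb.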

In [None]:
# To share the dataset via a REST API call (dr_rest_call is a helper this notebook does not define):
data = {'data': [{'username': user, 'role': role} for user in users]}
sharing_resp = dr_rest_call(f'/api/v2/datasets/{dataset.id}/accessControl', requests.patch, payload=data)

In [None]:
# To share the data source with the Python client:
access_lst = [dr.SharingAccess(username=user, role=role) for user in users]
datasource.share(access_lst)

### Create a project

In [None]:
# Create a project from a dataset
project = dr.Project.create_from_dataset(dataset_id=dataset.id,
                                         project_name=dataset.name)

In [None]:
# Create a project from a data source (the database credentials are passed through)
project = dr.Project.create_from_data_source(data_source_id=datasource.id,
                                             username=creds['db_user'],
                                             password=creds['db_pass'],
                                             project_name=datasource.canonical_name)

### Use a dataset to generate batch predictions

You can use a dataset to generate batch predictions for a deployment. Before proceeding, select a deployment and obtain its [deployment ID](https://docs.datarobot.com/en/docs/predictions/predapi/dep-pred.html#predictions-for-deployments). Additionally, provide the dataset ID (obtained from the AI Catalog).

In [None]:
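deployment_id = 'deployment id'  # Replace with your deployment ID
dataset_id = 'dataset id'  # Replace with your AI Catalog dataset ID

# Prepare the parameters to run a batch prediction job
data = {
    'deploymentId': deployment_id,
    'passthroughColumnsSet': 'all',  # copy every input column into the output
    'intakeSettings': {'type': 'dataset', 'datasetId': dataset_id},  # score an AI Catalog dataset
    'outputSettings': {'type': 'localFile'},  # results are downloaded after the job finishes
}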
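If you score several AI Catalog datasets against deployments, you could wrap this payload in a small function. `build_batch_payload` is a hypothetical name, and it uses exactly the keys and values from the cell above, nothing more.

```python
def build_batch_payload(deployment_id, dataset_id):
    """Build the JSON body for POST /api/v2/batchPredictions with AI Catalog intake and local-file output."""
    return {
        'deploymentId': deployment_id,
        'passthroughColumnsSet': 'all',
        'intakeSettings': {'type': 'dataset', 'datasetId': dataset_id},
        'outputSettings': {'type': 'localFile'},
    }

# data = build_batch_payload('deployment id', 'dataset id')
```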

In [None]:
# Initiate a batch prediction job
batch_pred_resp = dr_rest_call('/api/v2/batchPredictions', requests.post, payload=data)

# Retrieve the job ID and its object
batch_pred_job_id = batch_pred_resp.json()['id']
batch_pred_job = dr.BatchPredictionJob.get(batch_pred_job_id)

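import os

# Wait for the job to complete, then download the results to a local CSV file
os.makedirs('data', exist_ok=True)  # the output directory must exist before writing
batch_pred_job.wait_for_completion()
with open('data/predictions.csv', 'wb') as f:
    batch_pred_job.download(f)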
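To check the results, you can load the downloaded file back into pandas. Because `passthroughColumnsSet` is `'all'`, the file should contain the input columns alongside the prediction columns. The exact prediction column names depend on the deployment, so this sketch doesn't assume any. `load_predictions` is a hypothetical helper.

```python
import pandas as pd

def load_predictions(path='data/predictions.csv'):
    """Read the downloaded batch predictions into a DataFrame and report its shape."""
    predictions = pd.read_csv(path)
    print(f'{len(predictions)} rows, columns: {list(predictions.columns)}')
    return predictions

# predictions = load_predictions()
# predictions.head()
```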