### 7. Python API Training - Using a Database [Solution]
**Author**: Thodoris Petropoulos

**Contributors**: Rajiv Shah

This is the 7th exercise in the Python API Training for DataRobot course. It teaches you how to read datasets from a database, either to train models or to score data, and how to store your predictions in a table.

Here are the sections of the notebook, alongside the estimated time to complete each:

1. Connect to DataRobot. [3min]
2. Connect to the SQLite database provided. [5min]
3. Load the `readmissions` dataset that needs scoring. [10min]
4. Use one of the deployments generated earlier to score the dataset. [15min]
5. Write predictions back to the database.

As always, consult:

[API Documentation](https://datarobot-public-api-client.readthedocs-hosted.com) <br>
[Samples](https://github.com/datarobot-community/examples-for-data-scientists) <br>
[Tutorials](https://github.com/datarobot-community/tutorials-for-data-scientists)

The last two links should provide you with the snippets you need to complete most of these exercises.

**Data**

The dataset used in this exercise is stored in the SQLite database at `databases/test_database.db`.

#### Import Libraries

In [None]:
import sqlite3
import pandas as pd
import datarobot as dr

#### 1. Connect to DataRobot

In [None]:
#Possible solution
dr.Client(config_path='../github/config.yaml')

#### 2. Connect to the SQLite database provided. [5min]

The SQLite database is located at `databases/test_database.db`. To create a connection, use the `sqlite3` library. With Python, it rarely matters where your data lives: you invoke the appropriate library for the data source and load the data from there.

In [None]:
#Possible Solution
conn = sqlite3.connect('databases/test_database.db')
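Once a connection exists, a quick sanity check is to list the tables it exposes. The following is a minimal sketch, not part of the original solution: it uses an in-memory database (`:memory:`) as a stand-in for `databases/test_database.db`, and the `patient_id` and `readmitted` columns, as well as the `list_tables` helper, are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in database: ':memory:' replaces 'databases/test_database.db'
# so that the sketch runs anywhere. The columns below are hypothetical.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE readmissions (patient_id INTEGER, readmitted INTEGER)"
)


def list_tables(connection):
    """Return the names of all tables stored in a SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Each row is a 1-tuple holding the table name
    return [name for (name,) in rows]


tables = list_tables(demo_conn)
print(tables)
```

Running the same helper against `conn` should show whether the `readmissions` table is present before you query it.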

#### 3. Load the `readmissions` dataset that needs scoring. [10min]
The readmissions dataset is saved within the `test_database`.

**Instructions** 
1. Query the first 100 observations.
2. Save them into a pandas DataFrame.

In [None]:
# Possible Solution
df = pd.read_sql_query('SELECT * FROM readmissions LIMIT 100', conn)
df.to_csv('dataset_to_be_scored.csv',index=False)
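If the number of rows needs to vary, the limit can be passed as a query parameter instead of being written into the SQL string. This is a sketch under stated assumptions: the in-memory table, its `patient_id` and `age` columns, and the `load_first_rows` helper are hypothetical stand-ins for the real `readmissions` table.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the readmissions table in test_database.db
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE readmissions (patient_id INTEGER, age INTEGER)")
demo_conn.executemany(
    "INSERT INTO readmissions VALUES (?, ?)",
    [(i, 40 + i % 50) for i in range(250)],
)
demo_conn.commit()


def load_first_rows(connection, n_rows):
    """Load up to n_rows rows from readmissions into a DataFrame."""
    # sqlite3 uses '?' placeholders; pandas forwards params to the driver
    return pd.read_sql_query(
        "SELECT * FROM readmissions LIMIT ?", connection, params=(n_rows,)
    )


sample_df = load_first_rows(demo_conn, 100)
print(sample_df.shape)
```

Note that SQL does not guarantee row order without an `ORDER BY` clause, so "the first 100 observations" is only well defined if the query sorts on some column.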

#### 4. Use one of the deployments generated earlier to score the dataset. [15min]
**Instructions**
1. Navigate to the `Deployments` page within DataRobot.
2. Under the `integrations` tab, find the Python code that lets you make predictions using the API.
3. Score the dataset and save the results in a new pandas dataframe.

In [None]:
# Possible Solution

"""
Usage:
    python datarobot-predict.py <input-file.csv>
 
This example uses the requests library which you can install with:
    pip install requests
We highly recommend that you update SSL certificates with:
    pip install -U urllib3[secure] certifi
"""
import sys
import json
import requests
 
API_KEY = ''
DATAROBOT_KEY = ''
 
DEPLOYMENT_ID = ''
 
MAX_PREDICTION_FILE_SIZE_BYTES = 52428800  # 50 MB
 
 
class DataRobotPredictionError(Exception):
    """Raised if there are issues getting predictions from DataRobot"""
 
 
def make_datarobot_deployment_predictions(data, deployment_id):
    """
    Make predictions on data provided using DataRobot deployment_id provided.
    See docs for details:
         https://app.eu.datarobot.com/docs/users-guide/predictions/api/new-prediction-api.html
 
    Parameters
    ----------
    data : str
        Feature1,Feature2
        numeric_value,string
    deployment_id : str
        The ID of the deployment to make predictions with.
 
    Returns
    -------
    Response schema:
        https://app.eu.datarobot.com/docs/users-guide/predictions/api/new-prediction-api.html#response-schema
 
    Raises
    ------
    DataRobotPredictionError if there are issues getting predictions from DataRobot
    """
    # Set HTTP headers. The charset should match the contents of the file.
    headers = {
        'Content-Type': 'text/plain; charset=UTF-8',
        'Authorization': 'Bearer {}'.format(API_KEY),
        'DataRobot-Key': DATAROBOT_KEY,
    }
 
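    # NOTE: the prediction server URL is left blank in this solution. Paste the URL
    # from your own deployment's integration snippet before running.
    url = ''\
          'predictions'.format(deployment_id=deployment_id)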
    # Make API request for predictions
    predictions_response = requests.post(
        url,
        data=data,
        headers=headers,
    )
    _raise_dataroboterror_for_status(predictions_response)
    # Return a Python dict following the schema in the documentation
    return predictions_response.json()
 
 
def _raise_dataroboterror_for_status(response):
    """Raise DataRobotPredictionError if the request fails along with the response returned"""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        err_msg = '{code} Error: {msg}'.format(
            code=response.status_code, msg=response.text)
        raise DataRobotPredictionError(err_msg)
 
 
def main(filename, deployment_id):
    """
    Return the predictions as a Python dict, or the error code 1 if scoring fails.
    Also useful as a usage demonstration of
    `make_datarobot_deployment_predictions(data, deployment_id)`
    """
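    if not filename:
        print(
            'Input file is a required argument. '
            'Usage: python datarobot-predict.py <input-file.csv>')
        return 1
    # Read the file as bytes and close it as soon as it has been read
    with open(filename, 'rb') as input_file:
        data = input_file.read()
    # len() gives the payload size in bytes; sys.getsizeof() would add object overhead
    data_size = len(data)
    if data_size >= MAX_PREDICTION_FILE_SIZE_BYTES:
        # Format the message before printing: print() returns None in Python 3
        print(
            ('Input file is too large: {} bytes. '
             'Max allowed size is: {} bytes.'
             ).format(data_size, MAX_PREDICTION_FILE_SIZE_BYTES))
        return 1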
    try:
        predictions = make_datarobot_deployment_predictions(data, deployment_id)
    except DataRobotPredictionError as exc:
        print(exc)
        return 1
    return predictions

filename = 'dataset_to_be_scored.csv'
result = main(filename, DEPLOYMENT_ID)
result_df = pd.DataFrame(result['data'])
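Each row of the response carries a `predictionValues` list with one entry per class. A minimal sketch of how the probability of a single class could be pulled into its own column follows. The `sample_result` payload is hypothetical and only assumes that every entry has a `label` and a `value` key, as in the response schema linked in the docstring above; `positive_class_probability` is an illustrative helper, not part of the DataRobot snippet.

```python
import pandas as pd

# Hypothetical response, shaped like the dict returned by the scoring code
sample_result = {
    "data": [
        {
            "rowId": 0,
            "prediction": 1.0,
            "predictionValues": [
                {"label": 1.0, "value": 0.7},
                {"label": 0.0, "value": 0.3},
            ],
        },
        {
            "rowId": 1,
            "prediction": 0.0,
            "predictionValues": [
                {"label": 1.0, "value": 0.2},
                {"label": 0.0, "value": 0.8},
            ],
        },
    ]
}


def positive_class_probability(prediction_values, positive_label=1.0):
    """Return the probability attached to positive_label, or None if absent."""
    for entry in prediction_values:
        if entry["label"] == positive_label:
            return entry["value"]
    return None


demo_result_df = pd.DataFrame(sample_result["data"])
# Flatten the nested list into one numeric column per row
demo_result_df["positive_probability"] = demo_result_df["predictionValues"].apply(
    positive_class_probability
)
print(demo_result_df[["rowId", "prediction", "positive_probability"]])
```

With a flat numeric column in place, the nested `predictionValues` column can be dropped without losing the predicted probability.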

#### 5. Write predictions back to the database

**Instructions**
1. Join the results table with the original data used for scoring.
2. Save the results in a table called `prediction_results` within `test_database.db`.

**Hint**: Pandas DataFrames have a method for saving results to a SQL table, and it can append to the table if it already exists.

In [None]:
# Possible Solution

#They can be joined based on index without an issue
final_scored_data = df.join(result_df)

#Drop the column of per-class probabilities, which holds lists that SQLite cannot store directly (it could instead be preprocessed to keep the actual predicted probability)
final_scored_data.drop('predictionValues',axis=1,inplace=True)

#Save results to the database
final_scored_data.to_sql('prediction_results',conn,if_exists = 'append', index = False)
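The choice of `if_exists` matters when a notebook cell is run more than once. The sketch below, which uses an in-memory database and hypothetical `features` and `scores` frames in place of `df` and `result_df`, shows that `'append'` adds the rows again on every run, while `'replace'` recreates the table.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for databases/test_database.db
demo_conn = sqlite3.connect(":memory:")

# Hypothetical stand-ins for df and result_df; both share the default 0..n-1 index
features = pd.DataFrame({"patient_id": [101, 102], "age": [54, 67]})
scores = pd.DataFrame({"rowId": [0, 1], "prediction": [1.0, 0.0]})

# Index-based join: row i of the scores lines up with row i of the features
scored = features.join(scores)


def count_rows(connection):
    """Count the rows currently stored in prediction_results."""
    query = "SELECT COUNT(*) AS n FROM prediction_results"
    return int(pd.read_sql_query(query, connection)["n"].iloc[0])


# Appending twice stores the same two rows twice
scored.to_sql("prediction_results", demo_conn, if_exists="append", index=False)
scored.to_sql("prediction_results", demo_conn, if_exists="append", index=False)
rows_after_append = count_rows(demo_conn)

# Replacing drops the existing table and writes only the current rows
scored.to_sql("prediction_results", demo_conn, if_exists="replace", index=False)
rows_after_replace = count_rows(demo_conn)

print(rows_after_append, rows_after_replace)
```

Appending suits a table that accumulates scoring runs over time; replacing is the safer option while you are still re-running cells during development.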