# UbiOps / WhyLabs
This cookbook showcases an example integration between UbiOps and WhyLabs. We will train a model, package it, and deploy it to the UbiOps environment, using whylogs to log our data for future monitoring.

## Creating the model
This model is trained on a modified version of the [Used cars dataset](https://www.kaggle.com/valchovalev/car-predictor-usa).

This is a deliberately simple model that predicts the price of a used car from features such as horsepower, mileage, and year. It could be a helpful tool for checking whether a car is worth its asking price.

**First we will install our dependencies**

In [None]:
import sys 
!{sys.executable} -m pip install -U pip
!{sys.executable} -m pip install pandas --user
!{sys.executable} -m pip install scikit-learn --user
!{sys.executable} -m pip install ubiops --user
!{sys.executable} -m pip install "whylogs<1.0" --user # This cookbook uses the pre-1.0 whylogs API (whylogs.app)

## Please fill in the configuration variables needed for this cookbook

In [None]:
import os

# Set WhyLabs config variables
WHYLABS_API_KEY = "whylabs.apikey"
WHYLABS_DEFAULT_ORG_ID = "org-1"
WHYLABS_DEFAULT_DATASET_ID = "model-1"


# Set UbiOps config variables
API_TOKEN = "Token ubiopsapitoken" # Make sure this is in the format "Token token-code"
PROJECT_NAME = "blog-post"

# Set environment variables
os.environ["WHYLABS_API_KEY"] = WHYLABS_API_KEY
os.environ["WHYLABS_DEFAULT_ORG_ID"] = WHYLABS_DEFAULT_ORG_ID
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = WHYLABS_DEFAULT_DATASET_ID
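Placeholder values are easy to leave in by accident. The following is a minimal sketch, not part of the original cookbook, of a check that could run before any API call is made. The helper name `check_config` and its rules are assumptions for illustration; only the "Token token-code" format comes from the comment in the cell above.

```python
def check_config(api_token, required):
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []

    # The UbiOps token is expected in the form "Token <token-code>"
    parts = api_token.split(" ", 1)
    if len(parts) != 2 or parts[0] != "Token" or not parts[1].strip():
        problems.append('API_TOKEN must have the form "Token <token-code>"')

    # Every other setting only needs to be a non-empty string
    for name, value in required.items():
        if not value or not value.strip():
            problems.append(f"{name} is empty")

    return problems


if __name__ == "__main__":
    # Hypothetical values, used only to demonstrate the helper
    issues = check_config(
        "Token abc123",
        {"WHYLABS_API_KEY": "some-key", "WHYLABS_DEFAULT_ORG_ID": "org-1"},
    )
    print(issues or "Configuration looks complete")
```

In the notebook, the same function could be called with `API_TOKEN` and the WhyLabs variables defined above.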

## Training the model
Run the cell below to train the model. It profiles the training data with whylogs, fits a linear regression, prints a few evaluation metrics, and saves a model file that we will use in the deployment in the next step.

In [None]:
import datetime
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from whylogs.app.writers import WhyLabsWriter
from whylogs.app import Session
from whylogs.app.session import get_or_create_session
import pickle

# Loading the data 
data = pd.read_csv("model/training_used_cars_data_modified.csv")

# Profile the data and write the profile to WhyLabs
today = datetime.datetime.now()
yesterday = today - datetime.timedelta(days=1)

writer = WhyLabsWriter("", formats=[])
session = Session(project="demo-project", pipeline="pipeline-id", writers=[writer])
# Date the training profile one day back
with session.logger(dataset_timestamp=yesterday) as ylog:
    ylog.log_dataframe(data)

# Remove rows that are missing data
data.dropna(subset=["horsepower", "mileage"], inplace=True)

# Separate the target column from the features
y = data.price.values
x_data = data.drop(['price'], axis=1)

# Split the data for testing
x_train, x_test, y_train, y_test = train_test_split(x_data, y, random_state=0)

# Create the linear regression and fit it to the training data
regr = LinearRegression()
regr.fit(x_train, y_train)

# Make predictions using the testing set
y_pred = regr.predict(x_test)

# The coefficients
print(f'Coefficients: \n{regr.coef_}')
# The mean squared error
print(f'Mean squared error: {mean_squared_error(y_test, y_pred)}')
# The coefficient of determination: 1 is perfect prediction
print(f'Coefficient of determination: {r2_score(y_test, y_pred)}')

# Save the trained model to our deployment folder
with open('deployment_folder/model.pkl', 'wb') as f:
    pickle.dump(regr, f)
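To make the two printed metrics concrete, here is a small worked example in plain Python. It follows the standard definitions of mean squared error and the coefficient of determination for a single target; the function names and the toy numbers are made up for illustration and are not taken from the used cars data.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy targets and predictions (illustrative only)
    y_true = [3, 5, 7, 9]
    y_pred = [2, 5, 8, 9]
    print(f"MSE: {mse(y_true, y_pred)}")
    print(f"R^2: {r2(y_true, y_pred)}")
```

A lower mean squared error and a coefficient of determination closer to 1 both indicate predictions that sit closer to the true prices.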

## Creating UbiOps deployment
Now that we have built our AI model and saved it, let's create a UbiOps deployment to serve requests.

In [None]:
DEPLOYMENT_NAME = 'used-cars-model'
DEPLOYMENT_VERSION = 'v1'

# Import all necessary libraries
import shutil
import os
import ubiops

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
api = ubiops.CoreApi(client)

**Create the deployment**

In [None]:
%%writefile deployment_folder/deployment.py
import os
import pickle
import pandas as pd
from whylogs.app.session import get_or_create_session


class Deployment:

    def __init__(self, base_directory, context):
        """
        Initialisation method for the deployment. It can for example be used for loading modules that have to be kept in
        memory or setting up connections. Load your external model files (such as pickles or .h5 files) here.
        :param str base_directory: absolute path to the directory where the deployment.py file is located
        :param dict context: a dictionary containing details of the deployment that might be useful in your code.
            It contains the following keys:
                - deployment (str): name of the deployment
                - version (str): name of the version
                - input_type (str): deployment input type, either 'structured' or 'plain'
                - output_type (str): deployment output type, either 'structured' or 'plain'
                - language (str): programming language the deployment is running
                - environment_variables (str): the custom environment variables configured for the deployment.
                    You can also access those as normal environment variables via os.environ
        """

        print("Initialising the model")
        self.wl_session = get_or_create_session()
        
        model_file_name = "model.pkl"
        model_file = os.path.join(base_directory, model_file_name)

        with open(model_file, 'rb') as file:
            self.model = pickle.load(file)

    def request(self, data):
        """
        Method for deployment requests, called separately for each individual request.
        :param dict/str data: request input data. In case of deployments with structured data, a Python dictionary
            with as keys the input fields as defined upon deployment creation via the platform. In case of a deployment
            with plain input, it is a string.
        :return dict/str: request output. In case of deployments with structured output data, a Python dictionary
            with as keys the output fields as defined upon deployment creation via the platform. In case of a deployment
            with plain output, it is a string. In this example, a dictionary with the key: output.
        """
        print('Loading data')
        X = pd.read_csv(data['data'])

        print("Prediction being made")
        prediction = self.model.predict(X)
        
        # Writing the prediction to a csv for further use
        print('Writing prediction to csv')
        pd.DataFrame(prediction).to_csv('prediction.csv', header=['target'], index_label='index')
        
        return {
            "prediction": 'prediction.csv',
        }
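Before the folder is zipped and uploaded, it can help to confirm that it holds the files the deployment relies on. The sketch below is an illustration rather than part of the UbiOps workflow: the helper `missing_files` is hypothetical, and the default file list assumes the deployment code lives in `deployment.py` next to the `model.pkl` saved earlier.

```python
import os
import tempfile


def missing_files(folder, expected=("deployment.py", "model.pkl")):
    """Return the expected file names that are absent from `folder`."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(folder, name))
    ]


if __name__ == "__main__":
    # Demonstrate on a throwaway folder that only contains deployment.py
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "deployment.py"), "w").close()
        print("Missing:", missing_files(tmp))
```

In the notebook, `missing_files('deployment_folder')` returning an empty list would suggest the package is ready to be zipped.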

**Deploy to our UbiOps environment**

In [None]:
import time

# Create the deployment
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Used cars predictions',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='data',
            data_type='blob',
        ),
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='prediction',
            data_type='blob'
        ),
    ],
    labels={"demo": "whylabs"}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.8',
    memory_allocation=512,
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800 # = 30 minutes
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package
shutil.make_archive('deployment_folder', 'zip', '.', 'deployment_folder')

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='deployment_folder.zip'
)

# Poll until every version of the deployment reports the 'available' status
ready = False
while not ready:
    time.sleep(60)
    response = api.deployment_versions_list(project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME)
    ready = all(d.status == 'available' for d in response)

    if not ready:
        print("Deployments are NOT ready")

print("Deployments are ready")
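The loop above waits indefinitely if a build never becomes available. As a hedged alternative, the sketch below shows a generic polling helper with a timeout. The name `wait_until` and its parameters are assumptions for illustration; it knows nothing about UbiOps and simply calls whatever check function it is given.

```python
import time


def wait_until(check, timeout=900, interval=60, sleep=time.sleep):
    """Call `check()` until it returns True or `timeout` seconds of waiting have passed.

    Returns True if the check succeeded and False if the timeout was reached.
    The `sleep` argument can be swapped out, which keeps the helper easy to test.
    """
    waited = 0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Stand-in check that succeeds on the third call (illustrative only)
    attempts = {"count": 0}

    def fake_check():
        attempts["count"] += 1
        return attempts["count"] >= 3

    print(wait_until(fake_check, timeout=10, interval=0.01))
```

In this cookbook, the check could be a function that lists the deployment versions with `api.deployment_versions_list` and returns whether all of them have the status `'available'`.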

## Making requests
If the previous steps were successful, we should now have a deployment ready to receive requests. There is a test file called `production_used_cars_data.csv`, which we will use to create a deployment request.

In [None]:
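# Load the production data and drop the target column before sending it to the deployment
production_data = pd.read_csv('production_used_cars_data.csv')
production_data = production_data.drop(columns='price')
# index=False keeps pandas from writing an extra index column that the model was not trained on
production_data.to_csv('production_used_cars_data.csv', index=False)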

In [None]:
pd.read_csv('production_used_cars_data.csv')

In [None]:
file_name = 'production_used_cars_data.csv'

# First upload the data to create a blob
blob = api.blobs_create(project_name=PROJECT_NAME, file=file_name)

# Make a request using the blob id as input.
data = {'data': blob.id}
res = api.deployment_version_requests_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    data=data
)

# Retrieve the resulting blob
res_blob_id = res.result['prediction']
res_blob = api.blobs_get(PROJECT_NAME, res_blob_id)
result_file_name = 'prediction.csv'

# Write it to a file for further examination
with open(result_file_name, 'w') as f:
    f.write(res_blob.read().decode('utf-8'))

In [None]:
# With our predictions made, we can write the inference data to WhyLabs and compare it against the training data

features = pd.read_csv('production_used_cars_data.csv')
predictions = pd.read_csv('prediction.csv')

# Work on a copy so the original features frame is left untouched
combined = features.copy()
combined['price'] = predictions['target']
with session.logger() as ylog:
    ylog.log_dataframe(combined)

# TODO: show some findings from WhyLabs here
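Combining the inputs with the predictions silently misaligns rows if the two files differ in length. The following is a minimal sketch of a guarded version of that step, assuming the `index,target` layout that the deployment writes to `prediction.csv`. The function name `combine_features_and_predictions` and the toy data are hypothetical.

```python
import io

import pandas as pd


def combine_features_and_predictions(features, predictions, target="target", label="price"):
    """Return a copy of `features` with the predicted values added as column `label`."""
    if len(features) != len(predictions):
        raise ValueError(
            f"Row count mismatch: {len(features)} feature rows vs {len(predictions)} predictions"
        )
    combined = features.copy()
    # Assign by position so differing index labels cannot shuffle the rows
    combined[label] = predictions[target].to_numpy()
    return combined


if __name__ == "__main__":
    # Toy stand-ins for the production data and the prediction file
    toy_features = pd.DataFrame({"horsepower": [120, 200], "mileage": [50000, 12000]})
    toy_predictions = pd.read_csv(io.StringIO("index,target\n0,9500.0\n1,21000.0\n"))
    print(combine_features_and_predictions(toy_features, toy_predictions))
```

The resulting frame can then be passed to `ylog.log_dataframe` in the same way as `combined` above.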

## Conclusion
We have now trained a model, profiled its training data with whylogs, saved the model file, and used it to create a deployment in our UbiOps environment that is ready to receive requests. The inference data and its predictions are logged to WhyLabs as well, so you can see what kind of data is coming in and use that to improve your model in the future.