# Algorithmia Integration Notebook

Let's get started on using Arize with Algorithmia!✨

Arize helps you visualize your model performance, understand drift and data quality issues, and share insights learned from your models. Algorithmia is a platform for model serving that helps you manage machine learning at scale.

In this notebook, we show how to quickly create a model, download it to serve on Algorithmia, and integrate with the Arize platform directly on Algorithmia. The user's environment needs no dependencies other than Algorithmia.

1. **Steps 1-2:** Create a basic model and download it as a `.pkl` file to be loaded on Algorithmia.
2. **Steps 3-4:** Import and test the Arize API.
3. **Step 5:** Develop and build on Algorithmia.
4. **Step 6:** Test and verify results from Algorithmia!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Arize-ai/client_python/blob/main/arize/examples/tutorials/Arize_Tutorial_Algorithmia_Integration.ipynb)

## Step 1: Load Data and Build Model

In [None]:
!wget https://storage.googleapis.com/arize-assets/tutorials/b_open_source_dataset.csv
import pandas as pd
import xgboost
import uuid
import concurrent.futures as cf
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

###############################################################################
# 1 Load data and split data
def load_dataset(file):
    data = pd.read_csv(file, delimiter=";", header='infer')
    data = pd.get_dummies(data, columns=['job','marital',
                                         'education','default',
                                         'housing','loan',
                                         'contact','month',
                                         'poutcome'])
    # Map the target to integers; assigning back avoids an in-place replace on a column accessor
    data['y'] = data['y'].map({'yes': 1, 'no': 0})
    return data

data = load_dataset('b_open_source_dataset.csv')
X, y = data.drop(['y'], axis=1), pd.Series(data['y'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2, stratify=y)

###############################################################################
# 2 Fit a classification model

clf = XGBClassifier().fit(X_train, y_train.values.ravel())

# 3 Use the model to generate predictions
y_train_pred = clf.predict(X_train)
y_test_pred = clf.predict(X_test)

print('Step 1 ✅: Load Data & Build Model Done!')
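
To make the preprocessing in `load_dataset` concrete, here is a minimal sketch on a tiny hand-made frame. The rows and values are invented for illustration and are not taken from the tutorial dataset. It shows how `pd.get_dummies` expands a categorical column into indicator columns and how the `yes`/`no` target becomes `1`/`0`.

```python
import pandas as pd

# Invented rows, for illustration only (not from b_open_source_dataset.csv)
raw = pd.DataFrame({
    "age": [30, 45, 52],
    "job": ["admin", "technician", "admin"],
    "y": ["yes", "no", "no"],
})

# One indicator column is created per distinct value of "job"
encoded = pd.get_dummies(raw, columns=["job"])

# Convert the text target into integer labels
encoded["y"] = encoded["y"].map({"yes": 1, "no": 0})

print(sorted(encoded.columns))
print(encoded["y"].tolist())
```

The original `job` column disappears and is replaced by `job_admin` and `job_technician`, which is why the model is trained on many more columns than the CSV contains.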

## Step 2: Download Model for Serving on Algorithmia

In [None]:
import pickle
from IPython.display import display, FileLink # if not using Google Colab
from google.colab import files

# Create the pickle file; the with-block closes the file handle once the dump finishes
model_name = "algorithmia_example"
filename = "{}.pkl".format(model_name)
with open(filename, 'wb') as f:
    pickle.dump(clf, f)

# Download the file from Colab
files.download(filename)
print('Step 2 ✅: The file should have been successfully downloaded!')
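
Before uploading, it can be worth checking that the pickle file loads back cleanly. The sketch below is one way to do that. `pickle_round_trip` is a hypothetical helper, and a plain dictionary stands in for the trained classifier so the example runs without any ML dependencies.

```python
import os
import pickle
import tempfile

def pickle_round_trip(obj, path):
    """Write obj to path with pickle, then read it back and return the copy."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical stand-in for the trained classifier
stand_in_model = {"name": "algorithmia_example", "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "algorithmia_example.pkl")
    restored = pickle_round_trip(stand_in_model, path)

# The restored object is an equal but separate copy
print(restored == stand_in_model)
```

In the notebook, the same idea would apply to `clf`: load the file back and compare the restored model's predictions on `X_test` with `y_test_pred`.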

## Step 3: Import and Setup Arize Client
You can find your `API_KEY` and `SPACE_KEY` by navigating to the settings page in your workspace (only space admins can see the keys). Copy those over to the set-up section. We will also be setting up some metadata to use across all logging.
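
As an alternative to pasting keys into the notebook, the sketch below reads them from environment variables. The variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helper `load_arize_keys` are assumptions made for this example, not names required by Arize.

```python
import os

def load_arize_keys(env=os.environ):
    """Return (space_key, api_key) from a mapping, or raise if either is missing."""
    names = ("ARIZE_SPACE_KEY", "ARIZE_API_KEY")  # hypothetical variable names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ARIZE_SPACE_KEY"], env["ARIZE_API_KEY"]

# Demonstration with an explicit mapping instead of the real environment
space_key, api_key = load_arize_keys(
    {"ARIZE_SPACE_KEY": "example-space", "ARIZE_API_KEY": "example-key"}
)
print(space_key, api_key)
```

The returned values could then be passed to `Client(space_key=..., api_key=...)` in place of the hard-coded strings.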

In [None]:
!pip install arize -q
from arize.api import Client
from arize.utils.types import ModelTypes

SPACE_KEY = 'YOUR_SPACE_KEY'
API_KEY = 'YOUR_API_KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Saving model metadata for passing in later
model_id = 'your_model_name'
model_version = '1.0'
model_type = ModelTypes.SCORE_CATEGORICAL

print('Step 3 ✅: Import and Setup Arize Client Done! Now we can start using Arize!')

## Step 4: Testing Arize API
We test one of the Arize client API methods, `log_bulk_predictions`. The same code segment goes into our `.py` file on Algorithmia, where it behaves the same way.

In [None]:
readable_features = X_test.loc[X_test.index]
pred = pd.Series([str(y) for y in y_test_pred])
ids = pd.DataFrame([str(uuid.uuid4()) for _ in range(len(y_test_pred))])

tfuture = arize_client.log_bulk_predictions(
    model_id=model_id, # defined in Step 3
    model_version=model_version, # defined in Step 3
    model_type=model_type, # ModelTypes.SCORE_CATEGORICAL; required in Arize version 2.0, soon to be optional
    features=readable_features,
    prediction_ids=ids,
    prediction_labels=pred)

# Helper that checks each response code to confirm successful delivery
def arize_responses_helper(responses):
  for response in cf.as_completed(responses):
    res = response.result()
    if res.status_code != 200:
      print(f'future failed with response code {res.status_code}, {res.text}')

arize_responses_helper(tfuture)

print('Step 4 ✅: If no errors show up, you can use this code on Algorithmia!')
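
The response-checking pattern above can be tried without sending anything to Arize. In this sketch, `fake_send` is a hypothetical stand-in for a network call that returns a response-like object, and `collect_failures` is a variation on the helper that returns the failures instead of printing them. It assumes, as the helper above does, that each future resolves to an object with `status_code` and `text` attributes.

```python
import concurrent.futures as cf
from types import SimpleNamespace

def collect_failures(futures):
    """Return (status_code, text) for every completed future whose response is not 200."""
    failures = []
    for future in cf.as_completed(futures):
        res = future.result()
        if res.status_code != 200:
            failures.append((res.status_code, res.text))
    return failures

def fake_send(status_code):
    """Hypothetical stand-in for a network call; returns a response-like object."""
    return SimpleNamespace(status_code=status_code, text=f"status {status_code}")

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fake_send, code) for code in (200, 200, 500)]
    failures = collect_failures(futures)

print(failures)  # [(500, 'status 500')]
```

Returning the failures makes it easier to assert on them in a test or to retry only the requests that did not succeed.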

## Step 5: Upload the Model & Develop on Algorithmia
Next, follow these steps to build and deploy on Algorithmia:

1. Create a [new algorithm](https://algorithmia.com/users) on Algorithmia
2. Upload the `.pkl` file to the [data center](https://algorithmia.com/data) so that it is accessible by your Algorithm. You can copy the relative path for developing on Algorithmia here as well.
3. Implement the `apply(input)` function and add any dependencies your environment needs.
4. Click `Build` and confirm that the model works in production.

**NOTE:** The following is example code to get you started on Algorithmia. You don't have to run it in this notebook.

In [None]:
!pip install algorithmia
import Algorithmia
import pandas as pd
from xgboost import XGBClassifier
from arize.api import Client
from arize.utils.types import ModelTypes

import sklearn
import joblib

import datetime
import pickle

# API calls will begin at the apply() method, with the request body passed as 'input'
# For more details, see algorithmia.com/developers/algorithm-development/languages

# setting up algorithmia portal
"""
Note: You need to uncomment this in Algorithmia

filename = "model/path/"
client = Algorithmia.client()
model_file = client.file(filename).getFile().name
versioned_model = joblib.load(model_file)
"""

# setting up arize client
space_key = 'ARIZE_SPACE_KEY'
api_key = 'ARIZE_API_KEY'
arize_client = Client(space_key=space_key, api_key=api_key)
logging = True

def apply(input):
    data = pd.read_json(input)
    result = versioned_model.predict(data)

    if logging:
        # creating names for logging
        model_name = "alan_algorithmia_client"
        model_name_versioned = "{}_ver_{}".format(model_name, datetime.datetime.today().strftime('%m_%d_%Y__%H_%M_%S'))
        readable_features = data.loc[data.index]
        pred = pd.DataFrame([str(y) for y in result])
        ids = pd.DataFrame([str(id) for id in data.index])

        tfuture = arize_client.log_bulk_predictions(
            model_id=model_name,
            model_version=model_name_versioned,
            model_type=ModelTypes.SCORE_CATEGORICAL,
            features=readable_features,
            prediction_ids=ids,
            prediction_labels=pred)

    return pd.DataFrame(result).to_json()

print('Step 5 ✅: This section is just sample code to get you started on Algorithmia!')
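
The overall shape of `apply` (parse JSON, predict, log, return JSON) can be exercised locally with stand-ins before building on Algorithmia. The sketch below is not the code that runs on Algorithmia: `StubModel`, `make_apply`, and the `log_fn` callback are hypothetical names, the standard `json` module replaces pandas, and the return shape is simplified. It also wraps the logging call in a `try` block, one possible design choice so that a logging failure does not break the prediction response.

```python
import json

class StubModel:
    """Hypothetical stand-in for the unpickled classifier."""

    def predict(self, rows):
        # Toy rule, for illustration only: predict 1 when balance is positive
        return [1 if row["balance"] > 0 else 0 for row in rows]

def make_apply(model, log_fn=None):
    """Build an apply(input) function around a model and an optional logging callback."""

    def apply(input):
        # Expect column-oriented JSON: {"column": {"row_id": value, ...}, ...}
        columns = json.loads(input)
        row_ids = list(next(iter(columns.values())))
        rows = [{col: values[rid] for col, values in columns.items()} for rid in row_ids]
        predictions = model.predict(rows)

        if log_fn is not None:
            try:
                log_fn(row_ids, rows, predictions)
            except Exception as exc:  # keep serving even if logging fails
                print(f"logging failed: {exc}")

        return json.dumps(dict(zip(row_ids, predictions)))

    return apply

logged = []
apply = make_apply(StubModel(), log_fn=lambda ids, rows, preds: logged.append((ids, preds)))
payload = json.dumps({"age": {"0": 30, "1": 45}, "balance": {"0": 100, "1": -20}})
result = apply(payload)
print(result)  # {"0": 1, "1": 0}
```

Swapping the lambda for a function that calls `arize_client.log_bulk_predictions` would connect this skeleton to the real logging path.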

## Step 6: Testing Algorithmia API
Next, we simulate an API call to Algorithmia to:
1. Obtain the prediction made by our model, deployed and run on Algorithmia.
2. Log our production features and results to the Arize platform. The caller does not need to install any Arize dependencies!

To get your Algorithmia API key and algorithm name, go to "ALGO_NAME/Install and Use/Python/Use".

In [None]:
import Algorithmia

# Create a small sample of the data from Step 1 to send as a test request
_, X_test_2, _, y_test_2 = train_test_split(X, y, test_size=0.04, stratify=y)

ALGO_API_KEY = 'ALGO_API_KEY' # TODO: put your Algorithmia API Key
ALGO_ALG_NAME = 'USERNAME/ALGO_ALG_NAME/VERSION' # TODO: put your algorithm name, as shown on API call documentation

# Note: input must be json formatted
input = X_test_2.to_json()
client = Algorithmia.client(ALGO_API_KEY)
algo = client.algo(ALGO_ALG_NAME)
algo.set_options(timeout=60) # optional, for your testing purposes

# The actual format and shape of the input to pipe(input) depend on your algorithm
res = algo.pipe(input).result

# Result should also be json formatted
pd.read_json(res)
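
To see what the JSON payload in this step looks like, the sketch below round-trips a small invented frame through `to_json` and `read_json`. The values are made up for illustration. Wrapping the string in `io.StringIO` is an assumption aimed at newer pandas releases, where passing a literal JSON string to `read_json` is deprecated.

```python
import io
import json

import pandas as pd

# Invented feature rows, for illustration only
features = pd.DataFrame({"age": [30, 45], "balance": [100, -20]}, index=[3, 7])

# By default, DataFrame.to_json maps each column to {index -> value}
payload = features.to_json()
print(json.loads(payload)["age"])

# Read the JSON string back into a DataFrame
restored = pd.read_json(io.StringIO(payload))
print(restored.shape)
```

Note that the index values become string keys in the JSON, so any consumer that parses the payload without pandas sees `"3"` and `"7"` rather than integers.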

### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
