# 🚀 **Arize and Neptune Walkthrough**

Let's get started on using Arize with Neptune! ✨

Arize and Neptune are MLOps tools that improve connected but different parts of your ML pipeline and workflow. Arize helps you visualize production model performance and understand drift and data quality issues. Neptune logs, stores, displays, and compares your MLOps metadata for better experiment tracking.

With Arize and Neptune, you can train the best model, validate it before launch, and compare the production performance of your models.


## ✔️ Steps for this Walkthrough
1. Initialize Neptune and set up Arize client
2. Logging training callbacks to Neptune
3. Logging training and validation records to Arize
4. Storing and versioning model weights with Neptune
5. Logging and versioning model in production with Arize


# Step 1: Initialize Neptune and set up Arize client

## Step 1.1: Set Up Neptune Project
First, create a Neptune account and follow these steps:
1. Sign up for an account and replace `NEPTUNE_USER_NAME` in the cell below with your Neptune username.
2. Copy your `API_TOKEN` from the top right of the Neptune nav bar and use it to replace `NEPTUNE_API_TOKEN`.
3. Create a new `Project` and name it `ArizeIntegration`.

In [None]:
!pip install neptune-client -q
!pip install neptune-tensorflow-keras -q

import neptune.new as neptune
from neptune.new.integrations.tensorflow_keras import NeptuneCallback

NEPTUNE_USER_NAME = 'NEPTUNE_USER_NAME'
NEPTUNE_API_TOKEN = 'NEPTUNE_API_TOKEN'

if NEPTUNE_USER_NAME == 'NEPTUNE_USER_NAME' or NEPTUNE_API_TOKEN == 'NEPTUNE_API_TOKEN': 
    raise ValueError("❌ NEED TO CHANGE USERNAME AND/OR API TOKEN")

# set parameters for initializing Neptune
PROJECT_NAME = f"{NEPTUNE_USER_NAME}/ArizeIntegration"
run = neptune.init(project=PROJECT_NAME, api_token=NEPTUNE_API_TOKEN)

print('Step 1.1 ✅: Initialize Neptune run and project complete!')

https://app.neptune.ai/alanschen/ArizeIntegration/e/AR3-1
Remember to stop your run once you’ve finished logging your metadata (https://docs.neptune.ai/api-reference/run#stop). It will be stopped automatically only when the notebook kernel/interactive console is terminated.
Step 1.1 ✅: Initialize Neptune run and project complete!
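
Pasting tokens directly into a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming you export your credentials as environment variables, the placeholder check from the cell above can be folded into a small helper. The helper name `read_credential` and the choice of environment variable name are hypothetical and not part of the Neptune or Arize SDKs.

```python
import os


def read_credential(env_var, placeholder):
    """Return the value of env_var, rejecting missing or placeholder values.

    Hypothetical helper: neither Neptune nor Arize provides this function.
    """
    value = os.environ.get(env_var, placeholder)
    if not value or value == placeholder:
        raise ValueError(f"Set the {env_var} environment variable before running")
    return value


# Dummy value for illustration; in practice, export the variable in your shell.
os.environ["NEPTUNE_USER_NAME"] = "example-user"

user_name = read_credential("NEPTUNE_USER_NAME", placeholder="NEPTUNE_USER_NAME")
project_name = f"{user_name}/ArizeIntegration"
print(project_name)
```

The same helper could be reused for the API token and for the Arize keys in Step 1.2, so that no secret appears in the notebook itself.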


## Step 1.2: Set Up Arize Client
To set up Arize, copy your Arize `API_KEY` and `ORG_KEY` from the admin page linked below, then paste them into `API_KEY` and `ORGANIZATION_KEY` in the next cell.

[![Button_Open.png](https://storage.googleapis.com/arize-assets/fixtures/Button_Open.png)](https://app.arize.com/admin)

In [None]:
!pip install arize -q
from arize.api import Client
from arize.types import ModelTypes

ORGANIZATION_KEY = 'ORGANIZATION_KEY'
API_KEY = 'API_KEY'

# Fail fast if the placeholders were not replaced, before creating the client
if ORGANIZATION_KEY == 'ORGANIZATION_KEY' or API_KEY == 'API_KEY':
    raise ValueError("❌ NEED TO CHANGE ORGANIZATION AND/OR API_KEY")

arize = Client(organization_key=ORGANIZATION_KEY, api_key=API_KEY)

model_id = 'neptune_cancer_prediction_model'
model_version = 'v1'
model_type = ModelTypes.BINARY

print("Step 1.2 ✅: Initialize Arize client complete!")

Step 1.2 ✅: Initialize Arize client complete!


# Step 2: Logging training callbacks to Neptune

Neptune tracks your model training callbacks, so training loss curves can be logged and visualized for each training iteration. In this example, we use a `tensorflow.keras` model to classify whether an individual has breast cancer.

## Step 2.1: Import Dataset

In [None]:
import numpy as np
import pandas as pd
import uuid
import os
import concurrent.futures as cf
from sklearn import datasets, preprocessing
from sklearn.model_selection import train_test_split
import datetime

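def process_data(X, y):
    # Convert features and labels to NumPy arrays; X is reshaped to (n_samples, 30).
    # NOTE: this helper is not called elsewhere in this walkthrough.
    X = np.array(X).reshape((len(X), 30))
    y = np.array(y)
    return X, y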

# 1 Load data and split data
data = datasets.load_breast_cancer()

X, y = datasets.load_breast_cancer(return_X_y=True)
X, y = X.astype(np.float32), y

X, y = pd.DataFrame(X, columns=data['feature_names']), pd.Series(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, random_state=42)

print('Step 2.1 ✅: Load Data Done!')

Step 2.1 ✅: Load Data Done!
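
The two `train_test_split` calls above produce three subsets. As a rough sketch, assuming scikit-learn's default `test_size=0.25` (the test count is rounded up and the remainder goes to training), the resulting sizes can be worked out by hand. The helper `split_sizes` is hypothetical and only mirrors that arithmetic; it does not call scikit-learn.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is ceil(test_size * n_samples) and the
    training count is whatever remains.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_total = 569  # rows in the scikit-learn breast cancer dataset
n_train_val, n_test = split_sizes(n_total)   # first split: train+val vs. test
n_train, n_val = split_sizes(n_train_val)    # second split: train vs. val
print(n_train, n_val, n_test)  # 319 107 143
```

The test count of 143 matches the number of predictions logged in Step 5.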


## Step 2.2: Logging Training Callbacks
When you pass the `run` instance and `base_namespace=PROJECT_NAME` defined earlier to `NeptuneCallback`, a live training curve should appear in Neptune under the **Charts** tab.

In [None]:
# Import everything from tensorflow.keras to avoid mixing it with standalone keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Step 1: Define and compile model
model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=((30,))))
model.add(Dropout(0.25))
model.add(Dense(20, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=keras.optimizers.Adam(), 
              loss=keras.losses.mean_squared_logarithmic_error)

# Step 2: Fit model and log callbacks

params = {'batch_size': 30,
          'epochs': 50,
          'verbose': 0,
         }

callbacked = model.fit(X_train, y_train, 
                batch_size=params['batch_size'], 
                epochs=params['epochs'], 
                verbose=params['verbose'], 
                # validate on the validation split; X_test stays unseen to simulate production in Step 5
                validation_data=(X_val, y_val),
                # log to Neptune using NeptuneCallback
                callbacks=[NeptuneCallback(run=run, base_namespace=PROJECT_NAME)]
                )

print('Step 2.2 ✅: Training callbacks successfully logged!')

Step 2.2 ✅: Training callbacks successfully logged!


# Step 3: Logging training and validation records to Arize
Arize allows you to log training and validation records to an **Evaluation Store** for pre-launch model validation, such as visualizing performance across different feature slices (e.g., model accuracy for lower-income vs. higher-income individuals).
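
To make the idea of a feature slice concrete, here is a small sketch that computes accuracy separately for each slice of a feature. It is an illustration only and does not reproduce how Arize computes slice metrics; the function `accuracy_by_slice`, the threshold of 15, and the four records are all made up for the example.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_fn):
    """Group records by slice_fn(record) and return accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        key = slice_fn(record)
        totals[key] += 1
        hits[key] += int(record["prediction"] == record["actual"])
    return {key: hits[key] / totals[key] for key in totals}


# Made-up records for illustration only; not drawn from the dataset.
records = [
    {"mean radius": 11.0, "prediction": 1, "actual": 1},
    {"mean radius": 12.5, "prediction": 1, "actual": 1},
    {"mean radius": 18.0, "prediction": 0, "actual": 0},
    {"mean radius": 20.1, "prediction": 1, "actual": 0},
]

result = accuracy_by_slice(
    records,
    lambda r: "radius < 15" if r["mean radius"] < 15 else "radius >= 15",
)
print(result)  # {'radius < 15': 1.0, 'radius >= 15': 0.5}
```

A model can look fine in aggregate (here 75% accuracy overall) while performing noticeably worse on one slice, which is what slice-level validation is meant to surface.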

The records you send can also serve as your model baseline. Arize compares this baseline against the features your model sees in production and informs you when the feature distributions have shifted. Click the button below to access the documentation for the Python SDK and `arize.log_training_records`.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://arize.gitbook.io/arize/apis/python-sdk-1/arize.log_training_records)
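
For intuition about what comparing a baseline against production means, the sketch below computes the population stability index (PSI), one common way to quantify a shift between two distributions of a single feature. This is an assumption chosen for illustration: the text above does not say which drift metric Arize uses, and the function name, bin edges, and sample values are all hypothetical.

```python
import math


def population_stability_index(baseline, production, bin_edges, eps=1e-6):
    """PSI between two samples of one feature, using shared sorted bin edges."""

    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # The number of edges that v has reached is its bin index.
            idx = sum(v >= edge for edge in bin_edges)
            counts[idx] += 1
        total = len(values)
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


edges = [3, 5, 7]                          # hypothetical bin edges
baseline = [1, 2, 3, 4, 5, 6, 7, 8]        # made-up training values
shifted = [5, 6, 7, 8, 9, 10, 11, 12]      # made-up production values

psi_same = population_stability_index(baseline, baseline, edges)
psi_shifted = population_stability_index(baseline, shifted, edges)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and the value grows as production data moves away from the baseline.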

In [None]:
# OPTIONAL: A quick helper function to validate Arize responses
def arize_responses_helper(responses):
    for response in cf.as_completed(responses):
        res = response.result()
        if res.status_code != 200:
            raise ValueError(f'future failed with response code {res.status_code}, {res.text}')
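
The helper above relies on `concurrent.futures.as_completed`, which yields each future as soon as it finishes. The following sketch exercises the same logic without contacting Arize. `FakeResponse` is a hypothetical stand-in for an HTTP response object and is not an Arize class; `check_responses` repeats the helper's logic under a different name so the example is self-contained.

```python
import concurrent.futures as cf
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; not an Arize class."""
    status_code: int
    text: str = ""


def check_responses(responses):
    """Raise ValueError on the first completed future with a non-200 status."""
    for future in cf.as_completed(responses):
        res = future.result()
        if res.status_code != 200:
            raise ValueError(
                f"future failed with response code {res.status_code}, {res.text}"
            )


error_message = None
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    ok = [pool.submit(FakeResponse, 200) for _ in range(3)]
    bad = ok + [pool.submit(FakeResponse, 500, "server error")]

    check_responses(ok)  # all 200s: returns without raising
    try:
        check_responses(bad)
    except ValueError as err:
        error_message = str(err)

print(error_message)
```

Because the helper raises on the first failure it encounters, a single bad response is enough to stop the cell, which is why the later steps print their success message only after the helper returns.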

## Step 3.1: Logging Training Records to Arize

In [None]:
# Use the model to generate predictions
y_train_pred = model.predict(X_train).T[0]
y_val_pred = model.predict(X_val).T[0]
y_test_pred = model.predict(X_test).T[0]

# Logging training
train_prediction_labels = pd.Series(y_train_pred)
train_actual_labels = pd.Series(y_train)
train_feature_df = pd.DataFrame(X_train, columns=data['feature_names'])

train_responses = arize.log_training_records(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type, # this will change depending on your model type
    prediction_labels=train_prediction_labels,
    actual_labels=train_actual_labels,
    features=train_feature_df,
    )

arize_responses_helper(train_responses)

print('Step 3.1 ✅: If no errors showed up, you have sent Training Inferences!')

Step 3.1 ✅: If no errors showed up, you have sent Training Inferences!


## Step 3.2: Logging Validation Records to Arize

In [None]:
val_prediction_labels = pd.Series(y_val_pred)
val_actual_labels = pd.Series(y_val)
val_features_df = pd.DataFrame(X_val, columns=data['feature_names'])

val_responses = arize.log_validation_records(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    batch_id='batch0',
    prediction_labels=val_prediction_labels,
    actual_labels=val_actual_labels,
    features=val_features_df,
    )

arize_responses_helper(val_responses)
print('Step 3.2 ✅: If no errors showed up, you have sent Validation Inferences!')

Step 3.2 ✅: If no errors showed up, you have sent Validation Inferences!


# Step 4: Storing and Versioning Model Weights with Neptune
Neptune lets you organize your models in a folder-like structure through the `run` instance of each project. You can organize different trained iterations using the same `model_version` tag you used to log training records to Arize, which keeps the two tools aligned.

**Note: The code for storing models differs between frameworks. The following applies only to tf.keras.**

In [None]:
import glob

# Storing model version 1
directory_name = f'keras_model_{model_version}'
model.save(directory_name)

run[f'{directory_name}/saved_model.pb'].upload(f'{directory_name}/saved_model.pb')
for name in glob.glob(f'{directory_name}/variables/*'):
    run[name].upload(name)

print('Step 4 ✅: If no errors showed up, you should now see the folders in your Neptune Project')

INFO:tensorflow:Assets written to: keras_model_v1/assets
Step 4 ✅: If no errors showed up, you should now see the folders in your Neptune Project
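
To see which files the upload loop touches and which folder-like paths they map to, the sketch below builds an empty directory with the SavedModel layout and lists the resulting paths. It assumes the TensorFlow 2.x SavedModel layout (`saved_model.pb` plus a `variables/` folder), and it does not call Neptune. The function `upload_plan` and the temporary directory are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile


def upload_plan(root, directory_name):
    """Map slash-separated namespace paths to the local files to upload."""
    base = os.path.join(root, directory_name)
    files = [os.path.join(base, "saved_model.pb")]
    files += sorted(glob.glob(os.path.join(base, "variables", "*")))
    # Keys are relative to root and always use forward slashes.
    return {os.path.relpath(f, root).replace(os.sep, "/"): f for f in files}


with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "keras_model_v1")
    os.makedirs(os.path.join(model_dir, "variables"))
    # Create empty placeholder files that mimic a saved model.
    for rel in [
        "saved_model.pb",
        "variables/variables.index",
        "variables/variables.data-00000-of-00001",
    ]:
        open(os.path.join(model_dir, *rel.split("/")), "w").close()

    keys = list(upload_plan(root, "keras_model_v1"))

for key in keys:
    print(key)
```

Because the directory name includes `model_version`, each new version lands under its own prefix in the run, alongside the version string already sent to Arize.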


# Step 5: Logging and versioning model in production with Arize
In production, you can use `arize.bulk_log` or `arize.log` in the Python SDK to log data from your model serving endpoint. In this example, we send our test data to simulate a production setting. In a real deployment, you would first deploy the model saved with Neptune and then log its predictions to Arize.

You can find more about `arize.bulk_log` via the button below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://arize.gitbook.io/arize/apis/python-sdk-1/arize.bulk_log)

In [None]:
# Generating Predictions
y_test_pred = pd.Series(y_test_pred)
num_preds = len(y_test_pred) # num_preds == 143

# Generating Prediction IDs
ids_df = pd.DataFrame([str(uuid.uuid4()) for _ in range(num_preds)])

# Logging the Predictions, Features, and Actuals
log_predictions_responses = arize.bulk_log(
    # Required arguments
    model_id=model_id,
    prediction_ids=ids_df,
    # Optional arguments
    model_version=model_version,
    prediction_labels=y_test_pred,
    actual_labels=y_test,
    features=X_test, # we recommend logging features with predictions
    model_type=model_type, # we recommend using model_type on first time logging to Arize
    feature_names_overwrite=None,
    )

arize_responses_helper(log_predictions_responses)
print('Step 5 ✅: If no errors appear, you just logged {} total predictions, features, and actuals to Arize!'.format(num_preds))

Step 5 ✅: If no errors appear, you just logged 143 total predictions, features, and actuals to Arize!
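
The cell above generates one UUID per prediction. A short sketch of why that matters, under the assumption that ground truth often arrives after the prediction was served: a unique ID lets a later actual be matched back to the prediction it belongs to. The scores, the `logged` dictionary, and the `delayed_actual` record are made up for illustration and do not represent Arize's storage.

```python
import uuid

predictions = [0.91, 0.08, 0.77]  # made-up model scores

# One random UUID string per prediction, as in the cell above.
prediction_ids = [str(uuid.uuid4()) for _ in predictions]
logged = dict(zip(prediction_ids, predictions))

# Later, an actual arrives tagged with the ID of the prediction it belongs to.
delayed_actual = {"prediction_id": prediction_ids[1], "actual": 0}
matched_score = logged[delayed_actual["prediction_id"]]
print(matched_score)  # 0.08
```

In this walkthrough the actuals are already known, so they are sent together with the predictions in a single `bulk_log` call.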
