# 🚀 **Arize and Neptune Walkthrough**

Let's get started on using Arize with Neptune! ✨

Arize and Neptune are MLOps tools that improve connected but distinct parts of your ML pipeline and workflow. Arize helps you visualize production model performance and understand drift and data quality issues. Neptune logs, stores, displays, and compares your MLOps metadata for experiment tracking and model registry.

With Arize and Neptune together, you can train the best model, validate it before launch, and compare the production performance of your models.


## ✔️ Steps for this Walkthrough
1. Initialize Neptune and set-up Arize client
2. Logging training callbacks to Neptune
3. Logging training and validation records to Arize
4. Storing and versioning model weights with Neptune
5. Logging and versioning model in production with Arize


# Step 1: Initialize Neptune and set-up Arize client

## Step 1.1: Set-up Neptune Project
First, create a Neptune account and follow these steps:
1. Sign up for an account and replace `NEPTUNE_USER_NAME` in the cell below with your Neptune username.
2. Copy your API token from the top right of the Neptune navigation bar and use it to replace `NEPTUNE_API_TOKEN`.
3. Create a new project and name it `ArizeIntegration`. Here is how to [create a project](https://docs.neptune.ai/administration/projects#create-project).

In [None]:
!pip install neptune-client -q
!pip install neptune-tensorflow-keras -q

import neptune.new as neptune
from neptune.new.integrations.tensorflow_keras import NeptuneCallback

NEPTUNE_USER_NAME = 'NEPTUNE_USER_NAME'
NEPTUNE_API_TOKEN = 'NEPTUNE_API_TOKEN'

if NEPTUNE_USER_NAME == 'NEPTUNE_USER_NAME' or NEPTUNE_API_TOKEN == 'NEPTUNE_API_TOKEN': 
    raise ValueError("❌ NEED TO CHANGE USERNAME AND/OR API TOKEN")

# set parameters for initializing Neptune
PROJECT_NAME = f"{NEPTUNE_USER_NAME}/ArizeIntegration"
run = neptune.init(api_token=NEPTUNE_API_TOKEN, project=PROJECT_NAME)

print('Step 1.1 ✅: Initialize Neptune run and project complete!')
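
Hardcoding tokens in a notebook makes them easy to leak when the notebook is shared. As an alternative to editing the placeholders in the cell above, here is a minimal sketch that reads credentials from environment variables. The helper `read_credential` and the `demo_env` dictionary are hypothetical and exist only for illustration; they are not part of the Neptune or Arize SDKs.

```python
import os


def read_credential(name, placeholder, environ=os.environ):
    """Return the credential stored under `name`.

    Hypothetical helper: raises ValueError when the value is missing,
    empty, or still equal to the tutorial placeholder.
    """
    value = environ.get(name, "").strip()
    if not value or value == placeholder:
        raise ValueError(f"Set {name} before running this notebook")
    return value


# Demo with a plain dict so the sketch runs without real credentials.
# In a notebook you would omit `environ` and rely on os.environ.
demo_env = {
    "NEPTUNE_USER_NAME": "example-user",
    "NEPTUNE_API_TOKEN": "example-token",
}
user_name = read_credential("NEPTUNE_USER_NAME", "NEPTUNE_USER_NAME", environ=demo_env)
project_name = f"{user_name}/ArizeIntegration"
print(project_name)  # example-user/ArizeIntegration
```

The same helper could be reused for the Arize `SPACE_KEY` and `API_KEY` in Step 1.2.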

You can find more info about the NeptuneCallback in the [TensorFlow / Keras integration](https://docs.neptune.ai/integrations-and-supported-tools/model-training/tensorflow-keras) docs.

## Step 1.2: Set-up Arize Client
To set up Arize, copy your `API_KEY` and `SPACE_KEY` from the admin page of your Arize account.



In [None]:
!pip install arize -q
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

SPACE_KEY = 'SPACE_KEY'
API_KEY = 'API_KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id = 'neptune_cancer_prediction_model'
model_version = 'v1'
model_type = ModelTypes.SCORE_CATEGORICAL  # same model type that is passed to arize_client.log in Steps 3 and 5

if SPACE_KEY == 'SPACE_KEY' or API_KEY == 'API_KEY': 
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else: 
    print("Step 1.2 ✅: Initialize Arize client complete!")

# Step 2: Logging training callbacks to Neptune

Neptune tracks your model training callbacks, so training loss curves are logged and can be visualized for each training iteration. In this example, we use a `tensorflow.keras` model to classify whether an individual has breast cancer.

## Step 2.1: Import Dataset

In [None]:
import numpy as np
import pandas as pd
import uuid
import os
import concurrent.futures as cf
from sklearn import datasets, preprocessing
from sklearn.model_selection import train_test_split
import datetime

def process_data(X, y):
    scaler = preprocessing.MinMaxScaler()
    X = np.array(X).reshape((len(X), 30))
    y = np.array(y)
    return X, y

# 1 Load data and split data
data = datasets.load_breast_cancer()

X, y = datasets.load_breast_cancer(return_X_y=True)
X, y = X.astype(np.float32), y

X, y = pd.DataFrame(X, columns=data['feature_names']), pd.Series(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, random_state=42)

print('Step 2.1 ✅: Load Data Done!')
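
The cell above calls `train_test_split` twice with the default `test_size` of 0.25, so the three subsets are not split in the proportions one might guess at first glance. The arithmetic can be sketched in plain Python, assuming the 569 rows of the breast cancer dataset and scikit-learn's rule of rounding the test share up. The helper `split_sizes` is hypothetical and only mirrors that rule.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size.

    Mirrors the rounding used by train_test_split: the test share is
    rounded up and the remainder goes to the training side.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 569  # rows in the breast cancer dataset

# First split: (train + validation) vs. test
n_trainval, n_test = split_sizes(n_total)
# Second split: train vs. validation
n_train, n_val = split_sizes(n_trainval)

for name, n_rows in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print(f"{name:<10} {n_rows:>4} rows ({n_rows / n_total:.1%} of the data)")
```

Under these assumptions the model trains on 319 rows, while 107 rows go to validation and 143 rows are held back to simulate production traffic in Step 5.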

## Step 2.2: Logging Training Callbacks
When you pass the `run` instance to `NeptuneCallback`, a live training curve should show up in Neptune under the **Charts** tab.

In [None]:
# Import everything from the tensorflow.keras namespace to avoid mixing it with standalone keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Step 1: Define and compile model
model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=((30,))))
model.add(Dropout(0.25))
model.add(Dense(20, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=keras.optimizers.Adam(), 
              loss=keras.losses.mean_squared_logarithmic_error)

# Step 2: Fit model and log callbacks

params = {'batch_size': 30,
          'epochs': 50,
          'verbose': 0,
         }

callbacked = model.fit(X_train, y_train, 
                batch_size=params['batch_size'], 
                epochs=params['epochs'], 
                verbose=params['verbose'], 
                validation_data=(X_val, y_val),  # validation split; the test split simulates production in Step 5
                # log to Neptune using NeptuneCallback
                callbacks=[NeptuneCallback(run=run)]
                )

print('Step 2.2 ✅: Training callbacks successfully logged!')
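
To get a feel for how small the network above is, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard fully connected layer formula (one weight per input-unit pair plus one bias per unit) and that the `Dropout` layers add no parameters. The helper `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 30                # columns in the breast cancer dataset
layer_units = [10, 20, 10, 1]  # units of the four Dense layers, in order

param_counts = []
n_inputs = n_features
for n_units in layer_units:
    param_counts.append(dense_param_count(n_inputs, n_units))
    n_inputs = n_units  # the output of this layer feeds the next one

total_params = sum(param_counts)
print(param_counts, total_params)  # [310, 220, 210, 11] 751
```

You can compare the total against `model.summary()` in the notebook as a sanity check.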

# Step 3: Logging training and validation records to Arize
Arize allows you to log training and validation records to an **Evaluation Store** for pre-launch model validation, such as visualizing performance across different feature slices (e.g., model accuracy for lower-income vs. higher-income individuals).

The records you send can also serve as your model baseline. Arize compares this baseline against the features your model predicts on in production and informs you when the feature distributions have shifted. This section uses `arize.pandas.log()`. You can open the Python SDK documentation by clicking the button below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

In [None]:
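# OPTIONAL: A quick helper function to validate Arize responses.
# Handles a single response object as well as an iterable of futures,
# since the return type of the logging call can differ between SDK clients.
def arize_responses_helper(responses):
    if hasattr(responses, 'status_code'):
        results = [responses]
    else:
        results = (future.result() for future in cf.as_completed(responses))
    for res in results:
        if res.status_code != 200:
            raise ValueError(f'logging failed with response code {res.status_code}, {res.text}')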

def generate_prediction_ids(X):
    return pd.Series((str(uuid.uuid4()) for _ in range(len(X))), index=X.index)
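
To see how a response check built on `concurrent.futures` behaves without contacting Arize, here is a self-contained sketch. `FakeResponse` and `check_responses` are hypothetical stand-ins: the first imitates only the `status_code` and `text` attributes of an HTTP response, and the second mirrors the future-based check in the helper above.

```python
import concurrent.futures as cf
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response."""
    status_code: int
    text: str = ""


def check_responses(futures):
    """Raise ValueError as soon as any completed future holds a non-200 response."""
    for future in cf.as_completed(futures):
        res = future.result()
        if res.status_code != 200:
            raise ValueError(f"request failed with response code {res.status_code}, {res.text}")


error_message = None
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    # Three successful "requests": the check passes silently.
    ok_futures = [pool.submit(FakeResponse, 200, "ok") for _ in range(3)]
    check_responses(ok_futures)

    # One failing "request": the check raises.
    bad_futures = [pool.submit(FakeResponse, 500, "server error")]
    try:
        check_responses(bad_futures)
    except ValueError as err:
        error_message = str(err)

print(error_message)
```

Raising on the first failure keeps the notebook from printing a success message after a rejected upload.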

## Step 3.1: Logging Training Records to Arize

In [None]:
# Use the model to generate predictions
y_train_pred = model.predict(X_train).T[0]
y_val_pred = model.predict(X_val).T[0]
y_test_pred = model.predict(X_test).T[0]

# Defining a Schema() for training environment
train_data = X_train.copy()
feature_column_names = train_data.columns
train_data['prediction_ids'] = generate_prediction_ids(train_data)
train_data['predictions'] = y_train_pred
train_data['actuals'] = y_train

train_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

# Logging to Arize platform using arize_client.log
train_response = arize_client.log(
    dataframe=train_data,
    model_id=model_id,
    model_version=model_version,
    batch_id="training",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.TRAINING,
    schema=train_schema,
)

arize_responses_helper(train_response)
print('Step 3.1 ✅: If no errors showed up, you have sent Training Inferences!')
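
The `predictions` column above holds the raw sigmoid outputs of the network, that is, scores between 0 and 1 rather than class labels. If you want to sanity-check the model locally before looking at the Arize dashboards, the sketch below turns such scores into 0/1 labels and computes accuracy. The 0.5 threshold, the helper names, and the example numbers are assumptions made for illustration; neither the walkthrough nor the Arize SDK prescribes them.

```python
def scores_to_labels(scores, threshold=0.5):
    """Map each score to 1 if it reaches the threshold, otherwise 0."""
    return [1 if score >= threshold else 0 for score in scores]


def accuracy(predicted_labels, actual_labels):
    """Fraction of positions where prediction and actual agree."""
    if not actual_labels or len(predicted_labels) != len(actual_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted_labels, actual_labels))
    return matches / len(actual_labels)


# Made-up scores and actuals, not real model output.
example_scores = [0.91, 0.12, 0.55, 0.48, 0.73]
example_actuals = [1, 0, 0, 0, 1]

example_labels = scores_to_labels(example_scores)
example_accuracy = accuracy(example_labels, example_actuals)
print(example_labels, example_accuracy)  # [1, 0, 1, 0, 1] 0.8
```

In the notebook, the same functions could be applied to `y_train_pred` and `list(y_train)`.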

## Step 3.2: Logging Validation Records to Arize

In [None]:
# Defining a Schema() for validation environment
val_data = X_val.copy()
val_data['prediction_ids'] = generate_prediction_ids(val_data)
val_data['predictions'] = y_val_pred
val_data['actuals'] = y_val

val_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

# Logging to Arize platform using arize_client.log
val_response = arize_client.log(
    dataframe=val_data,
    model_id=model_id,
    model_version=model_version,
    batch_id="baseline",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.VALIDATION,
    schema=val_schema,
)

arize_responses_helper(val_response)
print('Step 3.2 ✅: If no errors showed up, you have sent Validation Inferences!')

# Step 4: Storing and Versioning Model Weights with Neptune
Neptune allows you to organize your models in a folder-like structure through the `run` instance of each project. For each run, you can log model weights or checkpoints. To keep the two tools in sync, organize different trained iterations using the same `model_version` tag you used to log training records to Arize.

You can also log `model_id` for easier cross-referencing.

**Note: Code for storing models differs between frameworks. The following applies only to tf.keras.**

In [None]:
import glob

# Storing model version 1
directory_name = f'keras_model_{model_version}'
model.save(directory_name)

run[f'{directory_name}/saved_model.pb'].upload(f'{directory_name}/saved_model.pb')
for name in glob.glob(f'{directory_name}/variables/*'):
    run[name].upload(name)

# Log 'model_id', for better reference
run['model_id'] = model_id

print('Step 4 ✅: If no errors showed up, you should now see the folders in your Neptune Project')
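
The upload loop above uses each file path as the Neptune field name, which is what produces the folder-like structure in the project. The sketch below shows which paths would be collected, assuming the TensorFlow SavedModel layout (a `saved_model.pb` file plus a `variables/` folder). It builds an empty dummy directory in a temporary location instead of saving a real model, and the helper `collect_upload_paths` is hypothetical.

```python
import glob
import os
import tempfile


def collect_upload_paths(directory_name):
    """Return the files the upload cell would send, in sorted order."""
    paths = [os.path.join(directory_name, "saved_model.pb")]
    paths += sorted(glob.glob(os.path.join(directory_name, "variables", "*")))
    return [path for path in paths if os.path.isfile(path)]


with tempfile.TemporaryDirectory() as root:
    # Create an empty stand-in for the directory written by model.save().
    model_dir = os.path.join(root, "keras_model_v1")
    os.makedirs(os.path.join(model_dir, "variables"))
    for rel_path in (
        "saved_model.pb",
        "variables/variables.index",
        "variables/variables.data-00000-of-00001",
    ):
        open(os.path.join(model_dir, *rel_path.split("/")), "w").close()

    # Express each path relative to the working directory, as in the notebook.
    upload_keys = [
        os.path.relpath(path, root).replace(os.sep, "/")
        for path in collect_upload_paths(model_dir)
    ]

for key in upload_keys:
    print(key)
```

Because the version string is part of the directory name, each `model_version` ends up in its own folder within the run.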

# Step 5: Logging and versioning model in production with Arize
During production, you can use `arize.bulk_log` or `arize.log` in the Python SDK to log any data from your model serving endpoint. In this example, we send our test data to simulate a production setting. In a real deployment, you would first deploy the model saved in Neptune and then log its predictions to Arize.

You can find more about `arize.bulk_log` by clicking the button below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://arize.gitbook.io/arize/apis/python-sdk-1/arize.bulk_log)

In [None]:
# Defining a Schema() for production environment
prod_data = X_test.copy()
prod_data['prediction_ids'] = generate_prediction_ids(prod_data)
prod_data['predictions'] = y_test_pred
prod_data['actuals'] = y_test

prod_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

# Logging to Arize platform using arize_client.log
prod_response = arize_client.log(
    dataframe=prod_data,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=prod_schema,
)

arize_responses_helper(prod_response)
print('Step 5 ✅: If no errors appear, you just logged predictions, features, and actuals to Arize!')