In [4]:
pip install -U "whylogs[whylabs]>1.0.9"

Defaulting to user installation because normal site-packages is not writeable
Collecting whylabs-client<0.4.0,>=0.3.0
  Using cached whylabs_client-0.3.0-py3-none-any.whl (183 kB)
Installing collected packages: whylabs-client
Successfully installed whylabs-client-0.3.0
Note: you may need to restart the kernel to use updated packages.


## ✔️ Setting the Environment Variables

In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.

We will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to https://whylabs.ai/free and create a free account. You can follow the quick start examples if you wish, but they are not needed for this demonstration and can be skipped.

After that, you’ll be prompted to create an API token. Once you create it, copy it and store it locally. The second important piece of information is your org ID, so take note of it as well. Once you have your API token and org ID, go to https://hub.whylabsapp.com/models to see your projects dashboard. There you can create a new project and take note of its ID (for a model project it will look like `model-xxxx`).

We'll now set the credentials as environment variables. The WhyLabs Writer will check for the existence of these variables in order to send the profiles to your dashboard.

In [7]:
import getpass
import os

# set your org-id here - should be something like "org-xxxx"
print("Enter your WhyLabs Org ID") 
os.environ["WHYLABS_DEFAULT_ORG_ID"] = input()

# set your dataset_id (or model_id) here - should be something like "model-xxxx"
print("Enter your WhyLabs Dataset ID")
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = input()


# set your API key here
print("Enter your WhyLabs API key")
os.environ["WHYLABS_API_KEY"] = getpass.getpass()
print("Using API Key ID: ", os.environ["WHYLABS_API_KEY"][0:10])

Enter your WhyLabs Org ID
Enter your WhyLabs Dataset ID
Enter your WhyLabs API key
Using API Key ID:  pUtqnO0hhC
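
Since the WhyLabs Writer relies on these three variables, it can help to confirm they are all set before going further. The following is a minimal sketch of such a check; the helper name `missing_whylabs_vars` is hypothetical and is not part of whylogs or `whylabs_client`.

```python
import os

# The three variables set in the cell above.
REQUIRED_VARS = (
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
    "WHYLABS_API_KEY",
)


def missing_whylabs_vars(environ=os.environ):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; not a whylogs API.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


missing = missing_whylabs_vars()
if missing:
    print("Missing WhyLabs settings:", ", ".join(missing))
else:
    print("All WhyLabs settings are present.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.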


## Fetching the Data

For demonstration, let's use data for transactions from a small retail business:

In [2]:
import pandas as pd

csv_url = "https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv"
df = pd.read_csv(csv_url)

df.head()

| Unnamed: 0 | Transaction ID | Customer ID | Quantity | Item Price | Total Tax | Total Amount | Store Type | Product Category | Product Subcategory | Gender | Transaction Type | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T14259136777 | C274477 | 1 | 148.9 | 15.6345 | 164.5345 | TeleShop | Electronics | Audio and video | F | Purchase | 37.0 |
| 1 | T7313351894 | C267568 | 4 | 48.1 | 20.202 | 212.602 | Flagship store | Home and kitchen | Furnishing | M | Purchase | 25.0 |
| 2 | T37745642681 | C267098 | 1 | 10.9 | 1.1445 | 12.0445 | Flagship store | Footwear | Mens | F | Purchase | 42.0 |
| 3 | T13861409908 | C271608 | 2 | 135.2 | 28.392 | 298.792 | MBR | Footwear | Mens | F | Purchase | 43.0 |
| 4 | T58956348529 | C272484 | 4 | 144.3 | 60.606 | 637.806 | TeleShop | Clothing | Mens | F | Purchase | 39.0 |


## 📊 Profiling the Data

Let's profile the data with whylogs and use the result as a reference profile:

In [3]:
import whylogs as why
from datetime import datetime

print(why.__version__)

current_date = datetime.now()
reference_profile = why.log(df).profile()
reference_profile.set_dataset_timestamp(current_date)

No timezone set in the datetime_timestamp object. Default to local timezone


1.0.10
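
The timezone warning in the output above appears because `datetime.now()` returns a naive datetime. The sketch below shows one way to build a timezone-aware timestamp with the standard library, along with an exact conversion to epoch milliseconds, the unit the upload step uses for `dataset_timestamp`. The helper `to_epoch_millis` is hypothetical, and it is an assumption here that passing an aware datetime to `set_dataset_timestamp` avoids the warning.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch arithmetic, expressed as an aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert an aware datetime to whole milliseconds since the Unix epoch.

    Hypothetical helper for illustration. Integer timedelta division avoids
    the rounding that can occur with float multiplication.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


# An aware "now" in UTC, in place of the naive datetime.now().
current_date = datetime.now(timezone.utc)
print(current_date.isoformat(), to_epoch_millis(current_date))
```

Using UTC explicitly means the recorded timestamp does not depend on the local timezone of the machine running the notebook.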


The reference profile can be uploaded directly with `whylabs_client`. The upload reads the profile from a file on disk, so we first write it out.

In [6]:
import tempfile

# write out the profile we just created to a temporary directory
tmp_dir = tempfile.mkdtemp()
profile_path = os.path.join(tmp_dir, "reference-profile.bin")
reference_profile.view().write(profile_path)
print(f"Reference profile written to temporary file in preparation to upload to Whylabs as a reference profile: {profile_path}")

Reference profile written to temporary file in preparation to upload to Whylabs as a reference profile: /tmp/tmpzf8lvtj6/reference-profile.bin
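
Note that `tempfile.mkdtemp()` leaves the directory in place until it is deleted manually. As an alternative, the sketch below uses `tempfile.TemporaryDirectory`, which removes the directory when the `with` block exits. The payload is placeholder bytes standing in for a serialized profile, and `example_path` is a name chosen for illustration so that it does not overwrite `profile_path` from the cell above.

```python
import os
import tempfile

# Placeholder bytes standing in for a serialized profile (illustration only).
payload = b"example-profile-bytes"

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "reference-profile.bin")
    with open(example_path, "wb") as f:
        f.write(payload)
    size_on_disk = os.path.getsize(example_path)
    print(f"Wrote {size_on_disk} bytes to {example_path}")
    # Any upload would need to happen here, while the file still exists.

# The directory and its contents are removed once the with-block exits.
still_exists = os.path.exists(example_path)
print(f"File still exists after cleanup: {still_exists}")
```

The trade-off is that all work with the file has to happen inside the `with` block, which is why the notebook's `mkdtemp()` approach is convenient when the upload lives in a later cell.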


In [10]:
import requests
import whylabs_client
from whylabs_client.api.log_api import LogApi
from whylabs_client.model.log_reference_request import LogReferenceRequest

# Now set up the inputs required to request an upload to WhyLabs using the whylabs_client
whylabs_api_endpoint = "https://songbird.development.whylabsdev.com"  # alternative endpoint: "https://api.whylabsapp.com"
reference_profile_alias = "demo-reference-profile-in-v1"
api_key = os.environ["WHYLABS_API_KEY"]
print(f"Using API key ID: {api_key[:10]} and endpoint {whylabs_api_endpoint}")
config = whylabs_client.Configuration(host=whylabs_api_endpoint, api_key={"ApiKeyAuth": api_key}, discard_unknown_keys=True)
api_log_client = whylabs_client.ApiClient(config)
log_api = LogApi(api_log_client)

org_id = os.environ.get("WHYLABS_DEFAULT_ORG_ID")
dataset_id = os.environ.get("WHYLABS_DEFAULT_DATASET_ID")
dataset_timestamp = int(reference_profile.dataset_timestamp.timestamp() * 1000)
alias = reference_profile_alias

try:
    with open(profile_path, "rb") as f:
        request = LogReferenceRequest(dataset_timestamp=dataset_timestamp, alias=alias)
        print(f"Making initial call to log_reference to get upload url for {alias} and in [{org_id}] for [{dataset_id}] using request: {request}")
        async_result = log_api.log_reference(org_id=org_id, model_id=dataset_id, log_reference_request=request, async_req=True)
        result = async_result.get()
        upload_url = result["upload_url"]
        print(f"got async_result from log_reference, upload url is: {upload_url[:140]}")
        print("About to upload reference profile...")
        http_response = requests.put(upload_url, data=f.read())
        if http_response.ok:
            print(f"Done uploading reference profile with alias: {alias} to: {upload_url[:140]} with API token ID: {api_key[:10]}")
        else:
            print(
                f"Failed to upload reference profile with alias: {alias} to: {upload_url[:140]} with API token ID: {api_key[:10]} to "
                + f"{whylabs_api_endpoint}: unexpected HTTP status {http_response.status_code}"
            )
except Exception as e:
    print(f"Failed to upload reference profile: {e}.")

Using API key ID: pUtqnO0hhC and endpoint https://songbird.development.whylabsdev.com
Making initial call to log_reference to get upload url for demo-reference-profile-in-v1 and in [org-e2qTar] for [model-5] using request: {'alias': 'demo-reference-profile-in-v1', 'dataset_timestamp': 1660351684528}
got async_result from log_reference, upload url is: https://development-songbird-20201028054020481800000001.s3.us-west-2.amazonaws.com/reference-profiles/2022-08-13/org-e2qTar-model-5-NnxWQKii
About to upload reference profile...
Done uploading reference profile with alias: demo-reference-profile-in-v1 to: https://development-songbird-20201028054020481800000001.s3.us-west-2.amazonaws.com/reference-profiles/2022-08-13/org-e2qTar-model-5-NnxWQKii with API token ID: pUtqnO0hhC
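
The cell above works in two steps: `log_reference` returns an upload URL, and the profile bytes are then sent to that URL with an HTTP PUT. The second step can be isolated into a small function, sketched below. The name `upload_file` and its `put` parameter are hypothetical; `put` is expected to behave like `requests.put`, and passing it in as an argument lets the function be tried without network access.

```python
from pathlib import Path


def upload_file(path, upload_url, put):
    """PUT the bytes of `path` to `upload_url` and raise if the reply is not OK.

    Hypothetical helper for illustration. `put` is any callable shaped like
    `requests.put`: it accepts a URL and a `data` keyword, and returns an
    object with `ok` and `status_code` attributes.
    """
    payload = Path(path).read_bytes()
    response = put(upload_url, data=payload)
    if not response.ok:
        raise RuntimeError(f"Upload failed with HTTP status {response.status_code}")
    # Return the number of bytes sent so callers can log it.
    return len(payload)
```

In the notebook this could be called as `upload_file(profile_path, upload_url, put=requests.put)`. Raising on failure, rather than printing, lets the caller decide whether to retry or stop.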
