<a href="https://colab.research.google.com/github/Starignus/testing_langkit/blob/main/00_Intro_to_LLM_Monitoring_LangKit_and_WhyLabs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Monitoring Large Language Models (LLMs) with LangKit

In this example we'll show how to generate out-of-the-box text metrics for Hugging Face LLMs using LangKit and monitor them in the WhyLabs Observability Platform.

[YouTube video](https://www.youtube.com/watch?v=DLJ8m3wMJrs)

LangKit can extract relevant signals from unstructured text data, such as:

- [Text Quality](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/quality.md)
- [Text Relevance](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/relevance.md)
- [Security and Privacy](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/security.md)
- [Sentiment and Toxicity](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/sentiment.md)

For this example, we'll use the GPT2 model since it's lightweight and easy to run without a GPU, but the example can be run with any of the larger Hugging Face models.

![](https://github.com/whylabs/langkit/blob/main/static/img/LangKit_graphic.png?raw=true)


## Setup

To run this notebook, you need:
- A Google account (use `File > Save a copy in Drive` to get your own editable copy)
- [Free WhyLabs Account](https://whylabs.ai/free)

Other useful links:
- LangKit [GitHub](https://github.com/whylabs/langkit)
- whylogs [GitHub](https://github.com/whylabs/whylogs/)
- [Slack channel](https://bit.ly/r2ai-slack) (Ask questions after the workshop here)





In [None]:
# Run code cells by pressing the play button
# or hitting Shift+Enter when highlighted
print("Hello, World!")

Hello, World!


### Install Hugging Face Transformers & LangKit

In [None]:
%pip install transformers
%pip install 'langkit[all]'



## 👋 Hello, World! Take a quick look at LangKit metrics

In the code below, we log a few example prompt/response pairs and send the metrics to WhyLabs.



In [None]:
from langkit import llm_metrics # alternatively use 'light_metrics'
import whylogs as why

why.init(session_type='whylabs_anonymous')
# Note: llm_metrics.init() downloads models so this is slow first time.
schema = llm_metrics.init()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  return self.fget.__get__(instance, owner)()


Initializing session with config /root/.config/whylogs/config.ini

✅ Using session type: WHYLABS_ANONYMOUS
 ⤷ session id: session-hSpukmj7


In [None]:
!ls ~/.cache/

huggingface  matplotlib  node-gyp  pip


In [None]:
!ls /root/nltk_data

sentiment


In [None]:
from langkit.whylogs.samples import load_chats, show_first_chat

# Let's look at what's in this toy example:
chats = load_chats()
print(f"There are {len(chats)} records in this toy example data, here's the first one:")
show_first_chat(chats)

results = why.log(chats, name="langkit-sample-chats-all", schema=schema)

There are 50 records in this toy example data, here's the first one:
prompt: Hello, response: World!


✅ Aggregated 50 rows into profile langkit-sample-chats-all

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=ref-xWfyuquKXkAACUSd&sessionToken=session-hSpukmj7


You can explore the profile in the dashboard. LangKit also lets us add custom metrics. Note: see the example in the [Behavioural and Monitoring notebook](https://colab.research.google.com/drive/18JaeB0tmWrKjOD86rOzKNpsPyT8TWcBW#scrollTo=b9NZxbcjA85U&uniqifier=3).

In [None]:
chats.head()

Unnamed: 0,prompt,response
0,"Hello,",World!
1,"Hello, World!",Hello! How can I assist you today?
2,Aproximately how many atoms are in the known u...,There are approximately 10^80 atoms in the obs...
3,What is the speed of light in m/s? Can you out...,The speed of light in a vacuum is approximatel...
4,How many digits are in a Discover credit card ...,A Discover credit card number has 16 digits. T...


## 🤗 Use LangKit to monitor LLMs with any Hugging Face model

Import and initialize the Hugging Face GPT2 model and tokenizer.

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

In [None]:
# # Example of loading different models
# from transformers import AutoTokenizer, AutoModelForCausalLM

# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-chat-hf")

### Create GPT model function
This will take in a prompt and return a dictionary containing the model response and prompt.

In [None]:
def gpt_model(prompt):

  # Encode the prompt
  input_ids = tokenizer.encode(prompt, return_tensors='pt')

  # Generate a response (max_length=100 counts tokens, prompt included; temperature adds sampling variability)
  output = model.generate(input_ids, max_length=100, temperature=0.8,
                          do_sample=True, pad_token_id=tokenizer.eos_token_id)

  # Decode the output
  response = tokenizer.decode(output[0], skip_special_tokens=True)

  # Combine the prompt and the output into a dictionary
  prompt_and_response = {
      "prompt": prompt,
      "response": response
  }

  # print(response)
  return prompt_and_response
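
The `temperature=0.8` argument above rescales the model's scores (logits) before a token is sampled. As a rough illustration, here is a minimal sketch of a softmax with temperature. The logits are made up for the example and do not come from GPT2, and `softmax_with_temperature` is a hypothetical helper, not part of Transformers.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability before exponentiating
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (not taken from GPT2)
toy_logits = [2.0, 1.0, 0.1]
probs_default = softmax_with_temperature(toy_logits, temperature=1.0)
probs_sharper = softmax_with_temperature(toy_logits, temperature=0.8)
print(probs_default)
print(probs_sharper)
```

A temperature below 1 concentrates probability on the highest-scoring token, so sampling becomes less random; a temperature above 1 flattens the distribution.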

In [None]:
# Example
prompt_and_response = gpt_model("Tell me a story about a cute dog")
print(prompt_and_response)

{'prompt': 'Tell me a story about a cute dog', 'response': 'Tell me a story about a cute dog that you know you\'d love to have in your home.\n\nThe story begins with a dog named Dolly, that you\'ve heard of and that you\'ve been looking forward to having. Once you get used to the dog\'s personality and its personality is strong, you finally realize that you absolutely need a good partner.\n\n"What do you expect from your best mate, Jeez?"\n\n"I think you\'re lucky with a big heart'}
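
Notice that the response above starts with the prompt itself, because the decoded output contains the input tokens followed by the generated continuation. If you only want to profile the new text, one option is to strip the echoed prompt first. This is a minimal sketch; `strip_echoed_prompt` is a hypothetical helper, and it assumes the decoded text reproduces the prompt exactly, which may not hold for every tokenizer.

```python
def strip_echoed_prompt(prompt: str, response: str) -> str:
    """Return only the generated continuation when the response starts with the prompt."""
    if response.startswith(prompt):
        return response[len(prompt):].lstrip()
    # Leave the response untouched if the prompt is not echoed verbatim
    return response

example_prompt = "Tell me a story about a cute dog"
example_response = "Tell me a story about a cute dog that you know you'd love to have in your home."
continuation = strip_echoed_prompt(example_prompt, example_response)
print(continuation)
```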


### Create & Inspect Language Metrics with LangKit

LangKit provides a toolkit of metrics for LLM applications. Let's initialize them and create a profile of the data that can be viewed in WhyLabs for quick analysis.

In [None]:
from langkit import llm_metrics # alternatively use 'light_metrics'
import whylogs as why
import pandas as pd

# Set to show all columns in dataframe
pd.set_option("display.max_columns", None)

# Note: llm_metrics.init() downloads models so this is slow first time.
schema = llm_metrics.init()

In [None]:
# Use whylogs to create a statistical profile of the data.
# No name= argument is passed to why.log here.
profile = why.log(prompt_and_response, schema=schema).profile()


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1708473600000&sessionToken=session-hSpukmj7


We can also see all our values by viewing our LangKit profile in a pandas dataframe.

You can use this data in real time to make decisions about prompts and responses, such as setting guardrails on your model.

In [None]:
profview = profile.view()
profview.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor,ints/max,ints/min,frequent_items/frequent_strings
prompt,1.0,1.0,1.00005,0,1,0,0,,0.0,,,0.0,,,,,,,,,0.0,SummaryType.COLUMN,0,0,0,0,1,0,,,
prompt.aggregate_reading_level,1.0,1.0,1.00005,0,1,0,0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.automated_readability_index,1.0,1.0,1.00005,0,1,0,0,-2.8,-2.8,-2.8,-2.8,1.0,-2.8,-2.8,-2.8,-2.8,-2.8,-2.8,-2.8,-2.8,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.character_count,1.0,1.0,1.00005,0,1,0,0,25.0,25.0,25.0,25.0,1.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,25.0,25.0,
prompt.difficult_words,1.0,1.0,1.00005,0,1,0,0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,0.0,0.0,
prompt.flesch_reading_ease,1.0,1.0,1.00005,0,1,0,0,105.66,105.66,105.66,105.66,1.0,105.66,105.66,105.66,105.66,105.66,105.66,105.66,105.66,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.has_patterns,,,,0,1,0,1,,,,,,,,,,,,,,,SummaryType.COLUMN,0,0,0,0,0,0,,,[]
prompt.jailbreak_similarity,1.0,1.0,1.00005,0,1,0,0,0.143321,0.143321,0.143321,0.143321,1.0,0.143321,0.143321,0.143321,0.143321,0.143321,0.143321,0.143321,0.143321,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.letter_count,1.0,1.0,1.00005,0,1,0,0,25.0,25.0,25.0,25.0,1.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,25.0,25.0,
prompt.lexicon_count,1.0,1.0,1.00005,0,1,0,0,8.0,8.0,8.0,8.0,1.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,8.0,8.0,


In [None]:
profview.to_pandas().columns

Index(['cardinality/est', 'cardinality/lower_1', 'cardinality/upper_1',
       'counts/inf', 'counts/n', 'counts/nan', 'counts/null',
       'distribution/max', 'distribution/mean', 'distribution/median',
       'distribution/min', 'distribution/n', 'distribution/q_01',
       'distribution/q_05', 'distribution/q_10', 'distribution/q_25',
       'distribution/q_75', 'distribution/q_90', 'distribution/q_95',
       'distribution/q_99', 'distribution/stddev', 'type', 'types/boolean',
       'types/fractional', 'types/integral', 'types/object', 'types/string',
       'types/tensor', 'ints/max', 'ints/min',
       'frequent_items/frequent_strings'],
      dtype='object')
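
To make a few of the text-quality rows in the profile table concrete, here is a rough approximation of `prompt.character_count`, `prompt.letter_count`, and `prompt.lexicon_count` using only the standard library. LangKit computes these with its own text-quality module; the three functions below are hypothetical stand-ins written for illustration, and they are only checked against the prompts used in this notebook.

```python
import string

def character_count(text: str) -> int:
    """Count all non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())

def letter_count(text: str) -> int:
    """Count non-whitespace characters that are not ASCII punctuation."""
    return sum(1 for ch in text if not ch.isspace() and ch not in string.punctuation)

def lexicon_count(text: str) -> int:
    """Count whitespace-separated words that are not made of punctuation only."""
    return sum(1 for word in text.split() if word.strip(string.punctuation))

sample = "Tell me a story about a cute dog"
print(character_count(sample), letter_count(sample), lexicon_count(sample))
```

For the sample prompt this gives 25, 25, and 8, matching the values reported in the profile above.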

### Multiple Prompts

In [None]:
prompts = ["What is AI?",
           "Tell me a joke.",
           "Who won the world series in 2021?"]

In [None]:
# Skipping uploading profile to WhyLabs because no name was given with name=
for num, prompt in enumerate(prompts):

  prompt_and_response = gpt_model(prompt)

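  # Create the profile (and its schema) from the first record
  if num == 0:
    profile = why.log(prompt_and_response, schema=schema).profile()
  # Note: track() also runs for the first record, so that record is counted twice
  # (counts/n is 4 for 3 prompts in the output)
  profile.track(prompt_and_response)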


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1708473600000&sessionToken=session-hSpukmj7


In [None]:
profview = profile.view()
profview.to_pandas()

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor,ints/max,ints/min,frequent_items/frequent_strings
prompt,3.0,3.0,3.00015,0,4,0,0,,0.0,,,0.0,,,,,,,,,0.0,SummaryType.COLUMN,0,0,0,0,4,0,,,
prompt.aggregate_reading_level,2.0,2.0,2.0001,0,4,0,0,1.0,0.5,1.0,0.0,4.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.57735,SummaryType.COLUMN,0,4,0,0,0,0,,,
prompt.automated_readability_index,3.0,3.0,3.00015,0,4,0,0,0.3,-4.225,-5.4,-5.9,4.0,-5.9,-5.9,-5.9,-5.9,0.3,0.3,0.3,0.3,3.025861,SummaryType.COLUMN,0,4,0,0,0,0,,,
prompt.character_count,3.0,3.0,3.00015,0,4,0,0,27.0,14.25,12.0,9.0,4.0,9.0,9.0,9.0,9.0,27.0,27.0,27.0,27.0,8.616844,SummaryType.COLUMN,0,0,4,0,0,0,27.0,9.0,
prompt.difficult_words,2.0,2.0,2.0001,0,4,0,0,1.0,0.25,0.0,0.0,4.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.5,SummaryType.COLUMN,0,0,4,0,0,0,1.0,0.0,
prompt.flesch_reading_ease,3.0,3.0,3.00015,0,4,0,0,119.19,115.8075,119.19,106.67,4.0,106.67,106.67,106.67,118.18,119.19,119.19,119.19,119.19,6.110245,SummaryType.COLUMN,0,4,0,0,0,0,,,
prompt.has_patterns,,,,0,4,0,4,,,,,,,,,,,,,,,SummaryType.COLUMN,0,0,0,0,0,0,,,[]
prompt.jailbreak_similarity,3.0,3.0,3.00015,0,4,0,0,0.521414,0.339021,0.521414,0.068118,4.0,0.068118,0.068118,0.068118,0.245137,0.521414,0.521414,0.521414,0.521414,0.222664,SummaryType.COLUMN,0,4,0,0,0,0,,,
prompt.letter_count,3.0,3.0,3.00015,0,4,0,0,26.0,13.25,11.0,8.0,4.0,8.0,8.0,8.0,8.0,26.0,26.0,26.0,26.0,8.616844,SummaryType.COLUMN,0,0,4,0,0,0,26.0,8.0,
prompt.lexicon_count,3.0,3.0,3.00015,0,4,0,0,7.0,4.25,4.0,3.0,4.0,3.0,3.0,3.0,3.0,7.0,7.0,7.0,7.0,1.892969,SummaryType.COLUMN,0,0,4,0,0,0,7.0,3.0,


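Having the distribution values (mean, quantiles, standard deviation) is important for ML monitoring, because they give a baseline against which later batches can be compared.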

![](https://raw.githubusercontent.com/whylabs/langkit/dbc11994e094a3ade6425bdc0506cecfee724f7d/static/img/sentiment-monitor.png)
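
As a small worked example, the `prompt.character_count` mean and standard deviation in the multi-prompt profile can be reproduced by hand. The four values below are the character counts of the three prompts, with the first prompt counted twice (the profile reports `counts/n` of 4). The `is_outlier` rule is a hypothetical illustration of how a monitor might use these statistics; it is not how WhyLabs monitors are configured.

```python
import statistics

# Character counts behind the multi-prompt profile:
# "What is AI?" (9, counted twice), "Tell me a joke." (12),
# "Who won the world series in 2021?" (27)
character_counts = [9, 9, 12, 27]

mean = statistics.mean(character_counts)
stddev = statistics.stdev(character_counts)  # sample standard deviation
print(f"mean={mean}, stddev={stddev:.6f}")

def is_outlier(value, mean, stddev, z_threshold=3.0):
    """Hypothetical rule: flag values more than z_threshold standard deviations from the mean."""
    if stddev == 0:
        return value != mean
    return abs(value - mean) / stddev > z_threshold

print(is_outlier(400, mean, stddev))
print(is_outlier(15, mean, stddev))
```

The mean and standard deviation agree with the `distribution/mean` and `distribution/stddev` columns. The quantile columns come from an approximate sketch inside whylogs, so they need not match an exact median computed this way.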

## 👀 ML Monitoring for Hugging Face LLMs in WhyLabs

**‼️ Warning: Before starting this section, restart the session. Then reinstall the libraries, initialize the models, and run the function `gpt_model`. This prevents the anonymous session from being picked up instead of your own WhyLabs account when you monitor your LangKit profiles!**

To send LangKit profiles to WhyLabs we will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to [https://whylabs.ai/free](https://whylabs.ai/free) and grab a free account. You can follow along with the quick start examples or skip them if you'd like to follow this example immediately.

1. Create a new project and note its ID (if it's a model project, it will look like `model-xxxx`)
2. Create an API token from the "Access Tokens" tab
3. Copy your org ID from the same "Access Tokens" tab

Replace the placeholder string values below with your own WhyLabs org ID, API key, and dataset ID:

In [None]:
import os
# Set authentication and project keys (replace the placeholders with your own values)
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "ORGID"  # looks like "org-xxxxxx"
os.environ["WHYLABS_API_KEY"] = "APIKEY"  # keep real keys out of shared notebooks
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = 'model-5'
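
Before initializing the session, it can help to fail fast if any of the three variables is missing or empty. The sketch below is one way to check this; `missing_whylabs_vars` is a hypothetical helper, not part of whylogs or LangKit, and the demo uses a stand-in dictionary rather than the real environment.

```python
import os

REQUIRED_VARS = (
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_DATASET_ID",
)

def missing_whylabs_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"WHYLABS_DEFAULT_ORG_ID": "org-xxxx", "WHYLABS_API_KEY": ""}
print(missing_whylabs_vars(demo_env))
```

Calling `missing_whylabs_vars()` with no argument checks `os.environ`; an empty list means all three values are set.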

In [None]:
import whylogs as why
why.init(session_type='llm-workshop')

Initializing session with config /root/.config/whylogs/config.ini

✅ Using session type: WHYLABS
 ⤷ org id: org-tYfbuJ
 ⤷ api key: sO9o6yBJsi
 ⤷ default dataset: model-5

In production, you should pass the api key as an environment variable WHYLABS_API_KEY, the org id as WHYLABS_DEFAULT_ORG_ID, and the default dataset id as WHYLABS_DEFAULT_DATASET_ID.


<whylogs.api.whylabs.session.session.ApiKeySession at 0x7a37abf2f580>

In [None]:
#from langkit.config import check_or_prompt_for_api_keys

# check_or_prompt_for_api_keys()

WhyLabs Org ID is already set in env var to: org-tYfbuJ
WhyLabs Dataset ID is already set in env var to: model-5
Whylabs API Key already set with ID:  sO9o6yBJsi
OPENAI_API_KEY already set in env var, good job!


In [None]:
from whylogs.api.writer.whylabs import WhyLabsWriter
from langkit import llm_metrics # alternatively use 'light_metrics'

# Note: llm_metrics.init() downloads models so this is slow first time.
schema = llm_metrics.init()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  return self.fget.__get__(instance, owner)()


In [None]:
# Skipping uploading profile to WhyLabs because no name was given with name=
# Single Profile
telemetry_agent = WhyLabsWriter()
profile = why.log(prompt_and_response, schema=schema)


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000


In [None]:
telemetry_agent.write(profile.view())

(True, 'log-jk2XHGfLLpH3T7Fj')

This writes a single profile to WhyLabs.

Note: you may see the message `Skipping uploading profile to WhyLabs because no name was given with name=`. Ignore it for now. The message won't appear if you did not use the `whylabs_anonymous` session first.


### Backfilling

Create a list of prompts for each of seven days:








In [None]:
prompt_lists = [
    ["How can I create a new account?", "Great job to the team", "Fantastic product, had a good experience"],
    ["This product made me angry, can I return it", "You dumb and smell bad", "I hated the experience, and I was over charged"],
    ["This seems amazing, could you share the pricing?", "Incredible site, could we setup a call?", "Hello! Can you kindly guide me through the documentation?"],
    ["This looks impressive, could you provide some information on the cost?", "Stunning platform, can we arrange a chat?", "Hello there! Could you assist me with the documentation?"],
    ["This looks remarkable, could you tell me the price range?", "Fantastic webpage, is it possible to organize a call?", "Greetings! Can you help me with the relevant documents?"],
    ["This is great, Ilove it, could you inform me about the charges?", "love the interface, can we have a teleconference?", "Hello! Can I take a look at the user manuals?"],
    ["This seems fantastic, how much does it cost?", "Excellent website, can we setup a call?", "Hello! Could you help me find the resource documents?"]
]


In [None]:
import datetime

Simulating 7 days of monitoring data:

In [None]:
# Single profile
telemetry_agent = WhyLabsWriter()
all_prompts_and_responses = []  # This list will store all the prompts and responses.


for i, day in enumerate(prompt_lists):
  # Walk backwards one day per list. Each day's data has to map to its own date to show up as a separate batch in WhyLabs
  dt = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(days=i)
  for prompt in day:
    prompt_and_response = gpt_model(prompt)
    profile = why.log(prompt_and_response, schema=schema)

    # Save the prompt and the full gpt_model() result (itself a prompt/response dict) in the list.
    all_prompts_and_responses.append({'prompt': prompt, 'response': prompt_and_response})

    # set the dataset timestamp for the profile
    profile.set_dataset_timestamp(dt)
    telemetry_agent.write(profile.view())


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000

✅ Aggregated 1 
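
The backfill loop relies on mapping each list index to its own date, walking backwards from now. The date arithmetic on its own can be sketched as follows. `backfill_timestamps` is a hypothetical helper, and the reference date is an arbitrary value chosen so the output is reproducible.

```python
import datetime

def backfill_timestamps(num_days, now=None):
    """Return one timezone-aware UTC timestamp per day, walking backwards from `now`."""
    if now is None:
        now = datetime.datetime.now(tz=datetime.timezone.utc)
    return [now - datetime.timedelta(days=i) for i in range(num_days)]

# Fixed reference time so the output is reproducible (an arbitrary example date)
reference = datetime.datetime(2024, 2, 21, 12, 0, tzinfo=datetime.timezone.utc)
stamps = backfill_timestamps(7, now=reference)
for i, ts in enumerate(stamps):
    print(i, ts.date())
```

Index 0 is the reference day and index 6 is six days earlier, so seven prompt lists produce seven distinct daily batches.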

In [None]:
all_prompts_and_responses

[{'prompt': 'How can I create a new account?',
  'response': {'prompt': 'How can I create a new account?',
   'response': 'How can I create a new account?\n\nFirst, please login to your Google account. Go to Sign up page, sign in with Google, and go to User Account Control.\n\nOnce you are on, you can use the email address you have set up in the New Account section.\n\nIf you have enabled Outlook, you can create a new account by clicking on the New Account link at the top right of the page.\n\nYou will be able to sign in as a'}},
 {'prompt': 'Great job to the team',
  'response': {'prompt': 'Great job to the team',
   'response': 'Great job to the team and the entire team," he said.\n\n"After this season we have all the pieces that need to be installed, we have got some options, I think we put off this week. I think we will get a bit of time."\n\nHe said the team will get a little more time to get the ball upfield for the second half of the season, but the focus is still on the team\'s

### Guardrails & Validation in environment

In this example, we decide whether a message is toxic enough to be blocked.


In [None]:
import os
import whylogs as why
# from langkit import toxicity
import pandas as pd

from langkit import llm_metrics

print("downloading models and initialized metrics...")
text_metrics_schema = llm_metrics.init()

# Set to show all columns in dataframe
pd.set_option("display.max_columns", None)


downloading models and initialized metrics...


In [None]:
def getting_profile(prompt_message):
    # Create profile of prompt
    profile = why.log({"prompt": prompt_message}, schema=text_metrics_schema).profile().view()
    return profile

In [None]:
test_profile = getting_profile("Do you like fruit?")


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000


We created a profile, which is a [DatasetProfileView](https://whylogs.readthedocs.io/en/latest/api/whylogs/index.html#whylogs.DatasetProfileView) object. From it, we will create a pandas DataFrame to explore its contents and then find out how to get the toxicity column.

In [None]:
# checking data type
type(test_profile)

In [None]:
# getting the data frame for inspection
test_view = test_profile.to_pandas()
test_view

column,cardinality/est,cardinality/lower_1,cardinality/upper_1,counts/inf,counts/n,counts/nan,counts/null,distribution/max,distribution/mean,distribution/median,distribution/min,distribution/n,distribution/q_01,distribution/q_05,distribution/q_10,distribution/q_25,distribution/q_75,distribution/q_90,distribution/q_95,distribution/q_99,distribution/stddev,type,types/boolean,types/fractional,types/integral,types/object,types/string,types/tensor,ints/max,ints/min,frequent_items/frequent_strings
prompt,1.0,1.0,1.00005,0,1,0,0,,0.0,,,0.0,,,,,,,,,0.0,SummaryType.COLUMN,0,0,0,0,1,0,,,
prompt.aggregate_reading_level,1.0,1.0,1.00005,0,1,0,0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.automated_readability_index,1.0,1.0,1.00005,0,1,0,0,-1.9,-1.9,-1.9,-1.9,1.0,-1.9,-1.9,-1.9,-1.9,-1.9,-1.9,-1.9,-1.9,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.character_count,1.0,1.0,1.00005,0,1,0,0,15.0,15.0,15.0,15.0,1.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,15.0,15.0,
prompt.difficult_words,1.0,1.0,1.00005,0,1,0,0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,0.0,0.0,
prompt.flesch_reading_ease,1.0,1.0,1.00005,0,1,0,0,118.18,118.18,118.18,118.18,1.0,118.18,118.18,118.18,118.18,118.18,118.18,118.18,118.18,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.has_patterns,,,,0,1,0,1,,,,,,,,,,,,,,,SummaryType.COLUMN,0,0,0,0,0,0,,,[]
prompt.jailbreak_similarity,1.0,1.0,1.00005,0,1,0,0,0.161877,0.161877,0.161877,0.161877,1.0,0.161877,0.161877,0.161877,0.161877,0.161877,0.161877,0.161877,0.161877,0.0,SummaryType.COLUMN,0,1,0,0,0,0,,,
prompt.letter_count,1.0,1.0,1.00005,0,1,0,0,14.0,14.0,14.0,14.0,1.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,14.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,14.0,14.0,
prompt.lexicon_count,1.0,1.0,1.00005,0,1,0,0,4.0,4.0,4.0,4.0,1.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,0.0,SummaryType.COLUMN,0,0,1,0,0,0,4.0,4.0,


In [None]:
# Use the class API to get the summary of prompt.toxicity
test_profile.get_column("prompt.toxicity").to_summary_dict()

{'counts/n': 1,
 'counts/null': 0,
 'counts/nan': 0,
 'counts/inf': 0,
 'types/integral': 0,
 'types/fractional': 1,
 'types/boolean': 0,
 'types/string': 0,
 'types/object': 0,
 'types/tensor': 0,
 'distribution/mean': 0.0014600753784179688,
 'distribution/stddev': 0.0,
 'distribution/n': 1,
 'distribution/max': 0.0014600753784179688,
 'distribution/min': 0.0014600753784179688,
 'distribution/q_01': 0.0014600753784179688,
 'distribution/q_05': 0.0014600753784179688,
 'distribution/q_10': 0.0014600753784179688,
 'distribution/q_25': 0.0014600753784179688,
 'distribution/median': 0.0014600753784179688,
 'distribution/q_75': 0.0014600753784179688,
 'distribution/q_90': 0.0014600753784179688,
 'distribution/q_95': 0.0014600753784179688,
 'distribution/q_99': 0.0014600753784179688,
 'cardinality/est': 1.0,
 'cardinality/upper_1': 1.000049929250618,
 'cardinality/lower_1': 1.0}

In [None]:
test_profile.get_column("prompt.toxicity").to_summary_dict()["distribution/max"]

0.0014600753784179688

In [None]:
# Wrap the steps above in a single helper
def is_not_toxic(prompt_message):
    # Create a profile of the prompt
    profile = why.log({"prompt": prompt_message}, schema=text_metrics_schema).profile().view()
    print("proflie type object", type(profile) )

    # Read the toxicity score from the profile summary
    profile_dict = profile.get_column("prompt.toxicity").to_summary_dict()
    tox_max = profile_dict["distribution/max"]

    print(f'Toxic score: {tox_max}')

    # Treat scores above 0.5 as toxic
    if tox_max > 0.5:
        return False
    else:
        return True

In [None]:
is_not_toxic("You dumb and smell bad")


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000
proflie type object <class 'whylogs.core.view.dataset_profile_view.DatasetProfileView'>
Toxic score: 0.9606605768203735


False

Now we know how to extract the metric, so we can use it to make decisions.

In [None]:
user_prompt = "Do you like fruit?"

if is_not_toxic(user_prompt):
  prompt_and_response = gpt_model(user_prompt)
  print(prompt_and_response['response'])

else:
  print("As a large language model...")


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000
proflie type object <class 'whylogs.core.view.dataset_profile_view.DatasetProfileView'>
Toxic score: 0.0014600753784179688
Do you like fruit?

What about a chocolate cake?

What about a glass of wine?

What about a sandwich?

How about a nice beer or a glass of beer?

What about a sandwich?

What about a glass of wine?

What about a sandwich?

What about a glass of wine?

What about a glass of wine?

What about a glass of wine?

What about a glass of wine


In [None]:
user_prompt = "You dumb and smell bad"

if is_not_toxic(user_prompt):
  prompt_and_response = gpt_model(user_prompt)
  print(prompt_and_response['response'])

else:
  print("As a large language model...")


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-5/profiles?profile=1708473600000
proflie type object <class 'whylogs.core.view.dataset_profile_view.DatasetProfileView'>
Toxic score: 0.9606605768203735
As a large language model...
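
The two cells above repeat the same if/else around the model call. One way to factor that pattern out is a small wrapper that takes the scoring and generation functions as arguments. This is only a sketch: `guarded_reply` is a hypothetical helper, and the demo uses stand-in functions with the two toxicity scores reported above instead of calling LangKit or GPT2.

```python
from typing import Callable

def guarded_reply(
    prompt: str,
    score_fn: Callable[[str], float],
    generate_fn: Callable[[str], str],
    threshold: float = 0.5,
    refusal: str = "As a large language model...",
) -> str:
    """Call generate_fn only when the prompt's score does not exceed the threshold."""
    if score_fn(prompt) > threshold:
        return refusal
    return generate_fn(prompt)

# Stand-in scorer using the two toxicity scores reported above
known_scores = {
    "Do you like fruit?": 0.0014600753784179688,
    "You dumb and smell bad": 0.9606605768203735,
}

def fake_score(text: str) -> float:
    return known_scores.get(text, 0.0)

def fake_generate(text: str) -> str:
    return f"(model output for: {text})"

print(guarded_reply("Do you like fruit?", fake_score, fake_generate))
print(guarded_reply("You dumb and smell bad", fake_score, fake_generate))
```

In the notebook, `score_fn` could wrap the toxicity lookup from `is_not_toxic` and `generate_fn` could wrap `gpt_model`, which keeps the threshold and refusal message in one place.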


For another way of doing this, see [LangKit validators](https://whylabs.ai/blog/posts/safeguard-monitor-large-language-model-llm-applications) and the accompanying [notebook](https://github.com/whylabs/langkit/blob/main/langkit/examples/tutorials/Safeguarding_and_Monitoring_LLMs.ipynb).

## Use a Rolling Logger
A rolling logger can be used instead of the method above to write profiles to WhyLabs at pre-defined intervals.

In [None]:
# Roll over every 5 minutes (when="M" sets the interval unit to minutes)
telemetry_agent = why.logger(mode="rolling", interval=5, when="M", schema=schema, base_name="huggingface")
telemetry_agent.append_writer("whylabs")

In [None]:
# Log data + model outputs to WhyLabs.ai
telemetry_agent.log(prompt_and_response)

<whylogs.api.logger.result_set.ProfileResultSet at 0x7a36bb7c7610>

In [None]:
# Close the whylogs rolling logger when the service is shut down
telemetry_agent.close()
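
Conceptually, a rolling logger groups incoming records into fixed time windows and writes one profile per window. The toy example below illustrates that grouping for 5-minute windows. It is not how whylogs implements the rolling logger; `window_start` and the arrival times are hypothetical, and the flooring only works for window sizes that divide 60.

```python
import datetime
from collections import defaultdict

def window_start(ts, minutes=5):
    """Floor a timestamp to the start of its `minutes`-wide window."""
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)

# Hypothetical record arrival times
utc = datetime.timezone.utc
arrivals = [
    datetime.datetime(2024, 2, 21, 10, 1, 30, tzinfo=utc),
    datetime.datetime(2024, 2, 21, 10, 4, 59, tzinfo=utc),
    datetime.datetime(2024, 2, 21, 10, 5, 0, tzinfo=utc),
]

# Group records by the window they fall into
batches = defaultdict(list)
for ts in arrivals:
    batches[window_start(ts)].append(ts)

for start, records in sorted(batches.items()):
    print(start.time(), len(records))
```

The first two records share the 10:00 window and the third starts the 10:05 window, so two profiles would be written.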



# Resources

- [Intro to LangKit Example](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)
- [LangKit + LangChain Integration](https://github.com/whylabs/langkit/blob/main/langkit/examples/Langchain_OpenAI_LLM_Monitoring_with_WhyLabs.ipynb)
- [LangKit GitHub](https://github.com/whylabs/langkit)
- [whylogs GitHub](https://github.com/whylabs/whylogs)
- [WhyLabs](https://whylabs.ai/safeguard-large-language-models)
- [Hugging Face GPT2 Model](https://huggingface.co/gpt2)
- [LangKit onboarding](https://docs.whylabs.ai/docs/whylabs-onboarding/)
- [WhyLabs API](https://docs.whylabs.ai/docs/whylabs-api/)
- [LangKit integration to the cloud](https://docs.whylabs.ai/docs/integrations-cloud/#aws)