# Logging and Monitoring Text Metrics for LLMs with LangKit

<!-- ![](https://github.com/whylabs/langkit/blob/main/static/img/LangKit_graphic.png?raw=true) -->

| NLP | Logging and Monitoring | Text Metrics |

To run this notebook, you need:
- Google account (to keep your own copy, use `File > Save a copy in Drive`)
- [Free WhyLabs Account](https://whylabs.ai/free)
- [OpenAI Account](https://openai.com/)

Other useful links:
- LangKit [GitHub](https://github.com/whylabs/langkit)
- whylogs [GitHub](https://github.com/whylabs/whylogs/)
- [Slack channel](https://bit.ly/r2ai-slack) (Ask questions after the workshop here)




In [None]:
print('Hello, World!')

Hello, World!


# 🏗️ Setup

LangKit & whylogs can be installed in almost any Python environment with `pip`!

In [None]:
%pip install langkit[all]
%pip install langchain

Collecting langkit[all]
  Downloading langkit-0.0.31-py3-none-any.whl (1.2 MB)
Collecting textstat<0.8.0,>=0.7.3 (from langkit[all])
  Downloading textstat-0.7.3-py3-none-any.whl (105 kB)
Collecting whylogs<2.0.0,>=1.3.19 (from langkit[all])
  Downloading whylogs-1.3.28-py3-none-any.whl (1.9 MB)
Collecting datasets<3.0.0,>=2.12.0 (from langkit[all])
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
Collecting detoxify<0.6.0,>=0.5.2 (from langkit[all])
  Downloading detoxify-0

In [None]:
import getpass
import os

os.environ["WHYLABS_DEFAULT_ORG_ID"] = input("WhyLabs default organization ID: ")
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = input("WhyLabs default dataset ID: ")
os.environ["WHYLABS_API_KEY"] = getpass.getpass("WhyLabs API key: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

WhyLabs default organization ID: org-dzdtaJ

WhyLabs default dataset ID: model-25

WhyLabs API key: ··········

OpenAI API key: ··········
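
Before initializing whylogs, it can help to confirm that every credential variable is actually set. The following is a minimal sketch in plain Python, assuming only the variable names used in the cell above; `missing_settings` is a hypothetical helper and not part of LangKit or whylogs.

```python
import os

# Environment variable names taken from the setup cell above.
REQUIRED_VARS = (
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
    "WHYLABS_API_KEY",
    "OPENAI_API_KEY",
)


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical usage: inspect the current process environment.
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All credentials are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.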


# 💬 Get Started with LangKit

In [None]:
from langkit import llm_metrics # alternatively use 'light_metrics'
import whylogs as why

# Note: llm_metrics.init() downloads models, so it is slow the first time.
schema = llm_metrics.init()

In [None]:
why.init()

Initializing session with config /root/.config/whylogs/config.ini



✅ Using session type: WHYLABS

 ⤷ org id: org-dzdtaJ

 ⤷ api key: g1VY3NMpY9

 ⤷ default dataset: model-25



In production, you should pass the api key as an environment variable WHYLABS_API_KEY, the org id as WHYLABS_DEFAULT_ORG_ID, and the default dataset id as WHYLABS_DEFAULT_DATASET_ID.


<whylogs.api.whylabs.session.session.ApiKeySession at 0x7e3fbf65c0d0>

In [None]:
from langkit.whylogs.samples import load_chats, show_first_chat

# Let's look at what's in this toy example:
chats = load_chats()
print(f"There are {len(chats)} records in this toy example data, here's the first one:")
show_first_chat(chats)

results = why.log(chats, name="langkit-sample-chats-all", schema=schema)

There are 50 records in this toy example data, here's the first one:

prompt: Hello, response: World!





✅ Aggregated 50 rows into profile langkit-sample-chats-all



Visualize and explore this profile with one-click

🔍 https://hub.whylabsapp.com/resources/model-25/profiles?profile=ref-yT5doZ3WddBC3SoG


In [None]:
chats

In [None]:
import pandas as pd

# Set to show all columns in dataframe
pd.set_option("display.max_columns", None)

In [None]:
chats

In [None]:
profview = results.view()
profview.to_pandas()

# 💬 Get Started with LangKit, OpenAI, LangChain



In [None]:
# Create a system prompt with LangChain template

from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    HumanMessage,
    SystemMessage
)

template_upbeat = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "sound more upbeat and happy"
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)


# Initialize OpenAI model
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()




In [None]:
def gpt_model(prompt, template):

  response = llm(template.format_messages(text=prompt))

  # Combine the prompt and the output into a dictionary
  prompt_and_response = {
      "prompt": prompt,
      "response": response.content
  }

  # print(response)
  return prompt_and_response

In [None]:
prompt_and_response = gpt_model("I don't like Mondays... or Tuesdays, for that matter.", template_upbeat)




In [None]:
prompt_and_response

{'prompt': "I don't like Mondays... or Tuesdays, for that matter.",
 'response': "I'm not particularly fond of Mondays... or Tuesdays, for that matter. But hey, the week is full of endless possibilities, right?"}
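
To exercise the logging steps without calling the OpenAI API, one option is a stand-in that returns the same prompt/response dictionary shape as `gpt_model`. This is only a sketch: `fake_model` and its `suffix` parameter are hypothetical names introduced for illustration, and the "response" is just the prompt with fixed text appended.

```python
def fake_model(prompt, suffix=" What a great day!"):
    """Stand-in for gpt_model: returns the same prompt/response dict shape."""
    return {"prompt": prompt, "response": prompt + suffix}


# The record has the two keys that the logging cells expect.
record = fake_model("I don't like Mondays.")
print(record)
```

Because the keys match, a record like this could be passed to the same logging calls while you debug the rest of the pipeline.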

## Using LangKit with our LLM function

In [None]:
from langkit import llm_metrics # alternatively use 'light_metrics'
import whylogs as why


# Note: llm_metrics.init() downloads models, so it is slow the first time.
schema = llm_metrics.init()

In [None]:
profile = why.log(prompt_and_response, name="positive example", schema=schema).profile()



✅ Aggregated 1 rows into profile positive example



Visualize and explore this profile with one-click

🔍 https://hub.whylabsapp.com/resources/model-25/profiles?profile=ref-JzEtdcuG8IWxj2pZ


In [None]:
profview = profile.view()
profview.to_pandas()

## Add Datasets to Profiles

In [None]:
prompts = ["I don't like getting out of bed",
           "I don't want to go to the gym",
           "I'm terrible at doing something",]

In [None]:
for prompt in prompts:
  # Generate an upbeat rewrite, then add the record to the existing profile
  prompt_and_response = gpt_model(prompt, template_upbeat)
  profile.track(prompt_and_response)

In [None]:
profview = profile.view()
profview.to_pandas()

## A deeper look into the profiles

In [None]:
profile_dict = profview.get_column('response.aggregate_reading_level').to_summary_dict()
reading_lvl = profile_dict['distribution/max']

print(f'Response Reading Level: {reading_lvl}')

Response Reading Level: 28.0
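
The summary dictionary holds more than the maximum. As a rough sketch, the helper below selects several statistics at once. It assumes the dictionary also carries `distribution/min` and `distribution/mean` keys alongside `distribution/max`; `pick_stats` and the sample values are hypothetical and only illustrate the key pattern.

```python
def pick_stats(summary, names=("min", "mean", "max")):
    """Select distribution statistics from a column summary dict.

    Keys follow the 'distribution/<name>' pattern used above; a missing
    statistic is returned as None instead of raising KeyError.
    """
    return {name: summary.get(f"distribution/{name}") for name in names}


# Hypothetical summary values, for illustration only. In the notebook this
# dict would come from profview.get_column(...).to_summary_dict().
example_summary = {
    "distribution/min": 3.0,
    "distribution/mean": 12.5,
    "distribution/max": 28.0,
}

print(pick_stats(example_summary))
```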


# 📈 Monitoring and more with WhyLabs

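Now that we can extract metrics from a profile, we can send them to WhyLabs for monitoring.

Distribution values are important for ML monitoring because they let you track how a metric, such as sentiment, changes over time.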

![](https://raw.githubusercontent.com/whylabs/langkit/dbc11994e094a3ade6425bdc0506cecfee724f7d/static/img/sentiment-monitor.png)
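
To make the idea of monitoring a distribution concrete, here is a minimal sketch of a z-score style threshold check in plain Python. It is an illustration only and not how WhyLabs monitors are implemented; `is_anomalous`, the threshold, and the sample scores are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, threshold=3.0):
    """Flag a value that lies more than `threshold` sample standard
    deviations away from the mean of the baseline values."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline: any different value counts as a change.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily sentiment scores, for illustration only.
baseline_sentiment = [0.61, 0.58, 0.64, 0.60, 0.62]
print(is_anomalous(baseline_sentiment, 0.59))   # close to the baseline
print(is_anomalous(baseline_sentiment, -0.40))  # far from the baseline
```

A fixed z-score threshold is one of the simplest choices; it works best when the baseline is roughly stable and has enough samples to estimate its spread.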


In [None]:
from whylogs.api.writer.whylabs import WhyLabsWriter
from langkit import llm_metrics # alternatively use 'light_metrics'
import whylogs as why

schema = llm_metrics.init()

In [None]:
# Single Profile
pos_telemetry_agent = WhyLabsWriter()
profile = why.log(prompt_and_response, schema=schema)
pos_telemetry_agent.write(profile.view())



✅ Aggregated 1 rows into profile 



Visualize and explore this profile with one-click

🔍 https://hub.whylabsapp.com/resources/model-25/profiles?profile=1705968000000


(True, 'log-B7TUBL3QYDZMiiVJ')

Look in your WhyLabs project to see the individual profile.

## Compare another application

In [None]:
import os

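# Point later writes at a different WhyLabs dataset for the second application
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = input("WhyLabs dataset ID for the second application: ")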

KeyboardInterrupt: Interrupted by user

In [None]:
# Make another system prompt template with LangChain
template_sass = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are an unhelpful assistant that re-writes the user's text to "
                "sound more depressing and negative."
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)

In [None]:
gpt_model("I like doing difficult things.", template_sass)

{'prompt': 'I like doing difficult things.',
 'response': 'I reluctantly engage in challenging tasks.'}

In [None]:
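# Log a response generated with the negative template, not the earlier upbeat one
neg_prompt_and_response = gpt_model("I like doing difficult things.", template_sass)
neg_telemetry_agent = WhyLabsWriter()
profile = why.log(neg_prompt_and_response, schema=schema)
neg_telemetry_agent.write(profile.view())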

Profiles written to WhyLabs can be used for evaluation, observability, insights, and security.

# 🦜 ⛓️ Official LangChain Callback Integration
WhyLabs has an official LangChain integration.


In [None]:
from langchain.callbacks import WhyLabsCallbackHandler
from langchain.llms import OpenAI

In [None]:
# Initialize WhyLabs Callback & GPT LLM
whylabs = WhyLabsCallbackHandler.from_params()
llm = OpenAI(temperature=0, callbacks=[whylabs])




In [None]:
result = llm.generate(
    [
        "I love nature, it's beautiful and amazing!",
        "This product is awesome. I really enjoy it.",
        "Chatting with you has been a great experience! you're very helpful."
    ]
)
print(result)
# You don't need to call flush; it happens periodically (every 20 minutes), but for this demo let's not wait.
whylabs.flush()

generations=[[Generation(text="\n\nNature is truly a wonder to behold. From the majestic mountains to the serene oceans, there is so much beauty and diversity in the natural world. The vibrant colors of flowers, the intricate patterns of leaves, and the graceful movements of animals all contribute to the awe-inspiring experience of being in nature.\n\nOne of the things I love most about nature is its ability to bring a sense of peace and tranquility. When I am surrounded by nature, I feel a sense of calmness and connectedness to the world around me. It's a reminder that we are all a part of something much bigger than ourselves.\n\nNature also has a way of teaching us important lessons. The changing of the seasons reminds us of the cycle of life and the importance of adaptation. The resilience of plants and animals in the face of adversity shows us the power of perseverance. And the delicate balance of ecosystems teaches us the importance of coexisting with our environment.\n\nBut perha



# Resources

- [Intro to LangKit Example](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)
- [LangKit + LangChain Integration](https://github.com/whylabs/langkit/blob/main/langkit/examples/Langchain_OpenAI_LLM_Monitoring_with_WhyLabs.ipynb)
- [LangKit GitHub](https://github.com/whylabs/langkit)
- [whylogs GitHub](https://github.com/whylabs/whylogs)
- [WhyLabs](https://whylabs.ai/safeguard-large-language-models)
- [Want a promo code?](https://bit.ly/whylabs-expert-plan)








