<a href="https://colab.research.google.com/github/Starignus/testing_langkit/blob/main/01_Trying_LangKit_Sentiment_Toxicity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Investigating LangKit

[LangKit is an open-source](https://github.com/whylabs/langkit/tree/main) text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library [whylogs](https://whylogs.readthedocs.io/en/latest/).

[LangKit can monitor and safeguard](https://whylabs.ai/blog/posts/langkit-making-large-language-models-safe-and-responsible) your LLMs by quickly detecting and preventing malicious prompts, toxicity, hallucinations, and jailbreak attempts. You can check the [metrics it covers](https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md).

First, let's install the required libraries.

In [None]:
!pip install transformers[torch]

Collecting accelerate>=0.20.3 (from transformers[torch])
  Downloading accelerate-0.26.1-py3-none-any.whl (270 kB)
Installing collected packages: accelerate
Successfully installed accelerate-0.26.1


In [None]:
!pip install langkit[all]

Collecting langkit[all]
  Downloading langkit-0.0.29-py3-none-any.whl (1.2 MB)
Collecting textstat<0.8.0,>=0.7.3 (from langkit[all])
  Downloading textstat-0.7.3-py3-none-any.whl (105 kB)
Collecting whylogs<2.0.0,>=1.3.19 (from langkit[all])
  Downloading whylogs-1.3.21-py3-none-any.whl (1.9 MB)
Collecting datasets<3.0.0,>=2.12.0 (from langkit[all])
  Downloading datasets-2.16.1-py3-none-any.whl (507 kB)
Collecting evaluate<0.5.0,>=0.4.0 (from langkit[all])
  Downloading evaluate-0.4.1-py3-none-

In [None]:
!pip install huggingface-hub==0.20.2

Collecting huggingface-hub==0.20.2
  Downloading huggingface_hub-0.20.2-py3-none-any.whl (330 kB)
Installing collected packages: huggingface-hub
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.20.3
    Uninstalling huggingface-hub-0.20.3:
      Successfully uninstalled huggingface-hub-0.20.3
Successfully installed huggingface-hub-0.20.2


In [None]:
!pip install bigframes==0.18.0

Collecting bigframes==0.18.0
  Downloading bigframes-0.18.0-py2.py3-none-any.whl (411 kB)
Installing collected packages: bigframes
  Attempting uninstall: bigframes
    Found existing installation: bigframes 0.19.2
    Uninstalling bigframes-0.19.2:
      Successfully uninstalled bigframes-0.19.2
Successfully installed bigframes-0.18.0


In [None]:
!pip install torch



In [None]:
! pip install datasets==2.16.1



# Tracking Sentiment and Toxicity Scores in Text with [LangKit](https://github.com/whylabs/langkit/blob/main/langkit/examples/Sentiment_and_Toxicity.ipynb)

In this example, we'll show how you can easily track sentiment and toxicity scores in text with LangKit.

As an example, we'll use the [tweet_eval](https://huggingface.co/datasets/tweet_eval) dataset. We'll use the hateful subset of the dataset, which contains tweets labeled as hateful or not hateful.

LangKit acts as a wrapper: importing the sentiment module pulls in NLTK and downloads the VADER lexicon, while importing the toxicity module downloads a Hugging Face model.

## Some details


For the [sentiment module](https://github.com/whylabs/langkit/blob/main/langkit/sentiment.py):

* The [sentiment module](https://docs.whylabs.ai/docs/langkit-modules/#sentiment) computes sentiment scores for each value in every column of type String. It creates a new UDF submetric called sentiment_nltk.

* The sentiment_nltk submetric contains metrics related to the compound sentiment score calculated for each value in the string column. The score is calculated using NLTK's VADER sentiment analyzer and ranges from -1 to 1, where -1 is the most negative sentiment and 1 is the most positive sentiment.

* VADER is a lexicon-based approach (see the [NLTK VADER sentiment analyzer](https://www.datacamp.com/tutorial/text-analytics-beginners-nltk)). It involves using a set of predefined rules and heuristics to determine the sentiment of a piece of text. These rules are typically based on lexical and syntactic features of the text, such as the presence of positive or negative words and phrases.

* VADER has the advantage of assessing the sentiment of any given text without prior training, unlike machine learning models.

* The result is a dictionary with 4 keys: **neg, neu, pos, compound**. The first three are the negative, neutral, and positive proportions of the text; their sum should be equal to 1, or close to it given floating-point rounding.

* **Compound** is derived from the sum of the valence scores of the words found in the lexicon, normalized to lie between -1 (most extreme negative sentiment) and +1 (most extreme positive sentiment). It expresses the overall degree of sentiment rather than a proportion, as the previous three do. The compound score alone can be enough to determine the underlying sentiment of a text:
 * a positive sentiment, compound ≥ 0.05
 * a negative sentiment, compound ≤ -0.05
 * a neutral sentiment, compound strictly between -0.05 and 0.05

* While lexicon-based analysis can be relatively simple to implement and interpret, it may not be as accurate as ML-based or transformer-based approaches, especially when dealing with complex or ambiguous text data.
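
The compound-score thresholds listed above can be sketched as a small helper. This is a minimal illustration in plain Python: the function name `label_from_compound` is hypothetical (it is not part of LangKit or NLTK), and the ±0.05 cut-offs are simply the ones quoted above.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a coarse sentiment label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Sample compound values, chosen only to exercise each branch.
for score in (-0.5093, 0.0, 0.8475):
    print(score, label_from_compound(score))
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band for a particular dataset.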


For the [toxicity module](https://github.com/whylabs/langkit/blob/main/langkit/toxicity.py):

* The [toxicity module](https://docs.whylabs.ai/docs/langkit-modules/#toxicity) will compute toxicity scores for each value in every column of type String. It will create a new udf submetric called toxicity.

* The toxicity will contain metrics related to the toxicity score calculated for each value in the string column. The toxicity score is calculated using HuggingFace's [martin-ha/toxic-comment-model](https://huggingface.co/martin-ha/toxic-comment-model) toxicity analyzer. The score ranges from 0 to 1, where 0 is no toxicity and 1 is maximum toxicity.

* Under the hood, it uses [AutoModelForSequenceClassification](https://stackoverflow.com/questions/69907682/what-are-differences-between-autom), which adds a classification head on top of the base model outputs and can be easily trained together with the base model. It also uses [TextClassificationPipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextClassificationPipeline).
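
To make the 0-to-1 toxicity score concrete: a text-classification pipeline typically returns a predicted label together with a confidence for that label, so the confidence has to be flipped when the predicted label is the non-toxic one. The sketch below shows that conversion on hand-written dictionaries. The label string `"toxic"` and the helper name `toxicity_score` are assumptions for illustration, and no model is loaded here.

```python
from typing import Mapping, Union


def toxicity_score(
    prediction: Mapping[str, Union[str, float]],
    toxic_label: str = "toxic",  # assumed label name, check the model card
) -> float:
    """Turn a {"label": ..., "score": ...} prediction into a toxicity score in [0, 1]."""
    label = prediction["label"]
    score = float(prediction["score"])
    # The score is the confidence in the predicted label, so invert it
    # when the predicted label is not the toxic one.
    return score if label == toxic_label else 1.0 - score


# Hand-written predictions standing in for real pipeline output.
examples = [
    {"label": "toxic", "score": 0.96},
    {"label": "non-toxic", "score": 0.99},
]
for prediction in examples:
    print(prediction["label"], round(toxicity_score(prediction), 4))
```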

Extra links on [whylogs](https://whylogs.readthedocs.io/en/latest/index.html) for data logging UDFs:
* [Examples](https://whylogs.readthedocs.io/en/latest/examples/experimental/whylogs_UDF_examples.html?highlight=udf#Logging)
* [UDFs (User-defined functions)](https://whylabs.ai/blog/posts/announcing-user-defined-functions-in-whylogs) allow you to craft custom metrics and are the foundation for monitoring complex data.
  * [Source code for UDFs](https://github.com/whylabs/whylogs/blob/mainline/python/whylogs/experimental/core/udf_schema.py#L443)
  * [Source code for metrics](https://github.com/whylabs/whylogs/blob/mainline/python/whylogs/core/metrics/metrics.py)

In [None]:
# Hugging Face library for datasets
from datasets import load_dataset

In [None]:
# Requesting an iterable dataset (streaming mode)
heateful_comments = load_dataset("tweet_eval", "hate", split="train", streaming=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

In [None]:
heateful_comments

IterableDataset({
    features: ['text', 'label'],
    n_shards: 1
})

In [None]:
# Iterator
comments_viewer = iter(heateful_comments)

In [None]:
comments = iter(heateful_comments)
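
Two iterators are created above from the same dataset, one for previewing and one for logging. The sketch below uses a small stand-in class (hypothetical, not the Hugging Face `IterableDataset`) to show why that works for a re-iterable object: each `iter()` call starts from the beginning, so consuming one iterator does not advance the other.

```python
from itertools import islice


class ToyStream:
    """Hypothetical stand-in for a streaming dataset: re-iterable, yields dict rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # A fresh generator is created on every iter() call.
        for row in self._rows:
            yield row


stream = ToyStream({"text": f"tweet {i}", "label": i % 2} for i in range(5))

viewer = iter(stream)  # used only for previewing
logger = iter(stream)  # used for the actual logging loop

preview = list(islice(viewer, 2))  # consume two rows from the viewer
first_logged = next(logger)        # the logger still starts at the first row

print([row["text"] for row in preview])
print(first_logged["text"])
```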

In [None]:
# Visualising some messages
for _ in range(5):
  comment = next(comments_viewer)
  if comment['label'] == 0:
    print('Non-hateful: ')
    print(comment['text'])
    print()
  else:
    print('Hateful: ')
    print(comment['text'])
    print()

Hateful: 
This account was temporarily inactive due to an irrational woman reporting us to Twitter. What a lack of judgement, shocking. #YesAllMen

Non-hateful: 
RT @user @user I am flattered tbh. I got an orgasm just by thinking of it you really are good at this.

Non-hateful: 
Making them look ~anatomically correct~ just makes them... Bland. Not all women are tiny and fit, not all men are bulky.

Hateful: 
listen... i love lil b but i would not fuck with dej loaf at all. she prob got poison up in her nail polish like that bitch from holes

Hateful: 
@user @user keep calling her a bitch and a whore dude real grown up.



In [None]:
from whylogs.experimental.core.udf_schema import udf_schema

## Initializing the Metrics
To initialize the toxicity and sentiment metrics, we simply import the respective modules from langkit. This will *automatically register the metrics*, so we can start using them right away by creating a schema by calling *generate_udf_schema* (in the [code](https://github.com/whylabs/langkit/blob/main/langkit/toxicity.py#L33) I see *register_dataset_udf*, not the generate one; in this notebook the schema is built with *udf_schema()*, imported above). We will pass that schema to whylogs, so that it knows which metrics to track.
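
The idea that importing a module registers its metrics can be illustrated with a toy registry. This is a plain-Python sketch of the pattern only, not the whylogs implementation: `register_udf`, `build_schema`, `apply_schema`, and the two toy metrics are hypothetical names.

```python
from typing import Callable, Dict

# Module-level registry: decorating a function stores it here as a side effect,
# which is what would happen at import time in a real module.
_REGISTRY: Dict[str, Callable[[str], float]] = {}


def register_udf(name: str):
    """Decorator that records a text -> number function under the given name."""
    def decorator(func: Callable[[str], float]) -> Callable[[str], float]:
        _REGISTRY[name] = func
        return func
    return decorator


@register_udf("char_count")
def char_count(text: str) -> float:
    return float(len(text))


@register_udf("exclamation_ratio")
def exclamation_ratio(text: str) -> float:
    return text.count("!") / len(text) if text else 0.0


def build_schema() -> Dict[str, Callable[[str], float]]:
    """Return a snapshot of everything registered so far."""
    return dict(_REGISTRY)


def apply_schema(schema, column: str, value: str) -> Dict[str, float]:
    """Compute every registered metric, naming results '<column>.<metric>'."""
    return {f"{column}.{name}": udf(value) for name, udf in schema.items()}


print(apply_schema(build_schema(), "prompt", "I love flowers!"))
```

The `<column>.<metric>` naming mirrors the column names that appear later in this notebook, such as `prompt.toxicity`.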

In [None]:
from langkit import toxicity

tokenizer_config.json:   0%|          | 0.00/403 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/704 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/268M [00:00<?, ?B/s]

In [None]:
# Path where the models from huggingface are downloaded
!ls ~/.cache/huggingface/hub/models--martin-ha--toxic-comment-model/snapshots

9842c08b35a4687e7b211187d676986c8c96256d


In [None]:
!ls -alrt ~/.cache/huggingface/hub/models--martin-ha--toxic-comment-model/

total 24
drwxr-xr-x 3 root root 4096 Jan 30 16:10 snapshots
drwxr-xr-x 2 root root 4096 Jan 30 16:10 refs
drwxr-xr-x 3 root root 4096 Jan 30 16:10 .no_exist
drwxr-xr-x 2 root root 4096 Jan 30 16:10 blobs
drwxr-xr-x 4 root root 4096 Jan 30 16:10 ..
drwxr-xr-x 6 root root 4096 Jan 30 16:10 .


In [None]:
# The bigger one is the one missing in the VCSe and times out.
!ls -alhrt ~/.cache/huggingface/hub/models--martin-ha--toxic-comment-model/blobs

total 257M
-rw-r--r-- 1 root root  403 Jan 30 16:10 45394c87f8b707c55c41c98cdfb2027d9a372bb8
-rw-r--r-- 1 root root 227K Jan 30 16:10 fb140275c155a9c7c5a3b3e0e77a9e839594a938
-rw-r--r-- 1 root root 456K Jan 30 16:10 40c4a0f6c414c8218190234bbce9bf4cc04fa3ac
-rw-r--r-- 1 root root  112 Jan 30 16:10 e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
-rw-r--r-- 1 root root  704 Jan 30 16:10 6dec3961f7bf17826a0c431a7b52b62f74a51d9a
-rw-r--r-- 1 root root 256M Jan 30 16:10 569aed60978bec9cdc5a90e660fe860e2eccd4f72479c1aac0c9b6c64a581e94
drwxr-xr-x 6 root root 4.0K Jan 30 16:10 ..
drwxr-xr-x 2 root root 4.0K Jan 30 16:10 .


In [None]:
!ls -alrt ~/.cache/huggingface/hub/models--martin-ha--toxic-comment-model/refs

total 12
-rw-r--r-- 1 root root   40 Jan 30 16:10 main
drwxr-xr-x 2 root root 4096 Jan 30 16:10 .
drwxr-xr-x 6 root root 4096 Jan 30 16:10 ..


In [None]:
# Importing the sentiment module and getting the NLTK VADER sentiment analyzer
from langkit import sentiment

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [None]:
# Collect the UDFs registered by the imports above into a schema
# using `udf_schema()`; the schema is then passed to the logger
text_schema = udf_schema()

In [None]:
# Importing the bits to start a Whylogs session
import whylogs as why

In [None]:
why.init()

❓ What kind of session do you want to use?
 ⤷ 1. WhyLabs. Use an api key to upload to WhyLabs.
 ⤷ 2. WhyLabs Anonymous. Upload data anonymously to WhyLabs and get a viewing url.

Enter a number from the list: 2
Initializing session with config /root/.config/whylogs/config.ini

✅ Using session type: WHYLABS_ANONYMOUS
 ⤷ session id: <will be generated before upload>


<whylogs.api.whylabs.session.session.GuestSession at 0x7a701a7f34f0>

## Now we're set to log our data.

To make sure the metrics make sense, we will profile two separate groups of data:

* hateful comments: comments that are labeled as hateful
* non-hateful comments: comments that are labeled as non-hateful

We can expect hateful comments to have a higher toxicity score and a lower sentiment score than non-hateful comments.

Let's see if our metrics will reflect that.

In [None]:
# Just initializing the profiles with generic comments.
non_hateful_profile = why.log({"prompt":"I love flowers."}, schema=text_schema).profile()
hateful_profile = why.log({"prompt":"I hate biscuits."}, schema=text_schema).profile()


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1706572800000&sessionToken=session-8SSf3EeM

✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1706572800000&sessionToken=session-8SSf3EeM


In [None]:
# Loop through the Twitter examples
for _ in range(200):
  comment = next(comments)
  if comment['label'] == 0:
    non_hateful_profile.track({"prompt":comment['text']})
  else:
    hateful_profile.track({"prompt":comment['text']})

Now that we have our profiles, let's check out the metrics by pulling them programmatically. Let's compare the mean of our sentiment and toxicity scores for each group (hateful and non-hateful):

In [None]:
hateful_sentiment = hateful_profile.view().get_column("prompt.sentiment_nltk").to_summary_dict()["distribution/mean"]
non_hateful_sentiment = non_hateful_profile.view().get_column("prompt.sentiment_nltk").to_summary_dict()["distribution/mean"]

In [None]:
hateful_toxicity = hateful_profile.view().get_column("prompt.toxicity").to_summary_dict()["distribution/mean"]
non_hateful_toxicity = non_hateful_profile.view().get_column("prompt.toxicity").to_summary_dict()["distribution/mean"]

In [None]:
print("######### Sentiment #########")
print(f"The average sentiment score for the hateful comments is {hateful_sentiment}")
print(f"The average sentiment score for the non-hateful comments is {non_hateful_sentiment}")

print("######### Toxicity #########")
print(f"The average toxicity score for the hateful comments is {hateful_toxicity}")
print(f"The average toxicity score for the non-hateful comments is {non_hateful_toxicity}")

######### Sentiment #########
The average sentiment score for the hateful comments is -0.37580107526881734
The average sentiment score for the non-hateful comments is -0.062103669724770626
######### Toxicity #########
The average toxicity score for the hateful comments is 0.3786836074244593
The average toxicity score for the non-hateful comments is 0.13610612689901935
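
The repeated `view().get_column(...).to_summary_dict()` chain used above can be wrapped in a small helper. This is a sketch: `column_mean` and `mean_gap` are hypothetical names, and the helpers rely only on the profile methods already called in this notebook.

```python
def column_mean(profile, column: str) -> float:
    """Return the mean of a profiled column, e.g. "prompt.toxicity"."""
    summary = profile.view().get_column(column).to_summary_dict()
    return summary["distribution/mean"]


def mean_gap(profile_a, profile_b, column: str) -> float:
    """Difference between the column means of two profiles (a minus b)."""
    return column_mean(profile_a, column) - column_mean(profile_b, column)
```

With the profiles built above, `mean_gap(hateful_profile, non_hateful_profile, "prompt.toxicity")` is positive when hateful comments score as more toxic on average; for the means printed above the gap is roughly 0.24.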


In [None]:
hateful_profile.view().to_pandas()

| column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | distribution/q_95 | distribution/q_99 | distribution/stddev | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prompt | 93.000021 | 93.0 | 93.004665 | 0 | 93 | 0 | 0 |  | 0.0 |  | ... |  |  | 0.0 | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 93 | 0 |
| prompt.sentiment_nltk | 77.000015 | 77.0 | 77.003859 | 0 | 93 | 0 | 0 | 0.8475 | -0.375801 | -0.5093 | ... | 0.624 | 0.8475 | 0.482811 | SummaryType.COLUMN | 0 | 93 | 0 | 0 | 0 | 0 |
| prompt.toxicity | 93.000021 | 93.0 | 93.004665 | 0 | 93 | 0 | 0 | 0.967503 | 0.378684 | 0.15616 | ... | 0.96345 | 0.967503 | 0.41487 | SummaryType.COLUMN | 0 | 93 | 0 | 0 | 0 | 0 |


In [None]:
non_hateful_profile.view().to_pandas()

| column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | distribution/q_95 | distribution/q_99 | distribution/stddev | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prompt | 109.000029 | 109.0 | 109.005472 | 0 | 109 | 0 | 0 |  | 0.0 |  | ... |  |  | 0.0 | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 109 | 0 |
| prompt.sentiment_nltk | 66.000011 | 66.0 | 66.003306 | 0 | 109 | 0 | 0 | 0.9493 | -0.062104 | 0.0 | ... | 0.7003 | 0.765 | 0.468997 | SummaryType.COLUMN | 0 | 109 | 0 | 0 | 0 | 0 |
| prompt.toxicity | 109.000029 | 109.0 | 109.005472 | 0 | 109 | 0 | 0 | 0.950484 | 0.136106 | 0.003526 | ... | 0.92814 | 0.948145 | 0.29043 | SummaryType.COLUMN | 0 | 109 | 0 | 0 | 0 | 0 |
