
# Investigating LangKit

[LangKit is an open-source](https://github.com/whylabs/langkit/tree/main) text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library [whylogs](https://whylogs.readthedocs.io/en/latest/).

[LangKit can monitor and safeguard](https://whylabs.ai/blog/posts/langkit-making-large-language-models-safe-and-responsible) your LLMs by quickly detecting and preventing malicious prompts, toxicity, hallucinations, and jailbreak attempts. You can check the [metrics it covers](https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md).

First, let's install the required libraries.

In [None]:
!pip install transformers[torch]



In [None]:
!pip install langkit[all]



In [None]:
!pip install huggingface-hub==0.20.2



In [None]:
!pip install bigframes==0.18.0



In [None]:
!pip install torch



In [None]:
! pip install datasets==2.16.1


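Since several versions are pinned above, it can help to confirm what actually got installed before going further. Below is a small sketch using the standard library's `importlib.metadata`; the package list simply mirrors the install cells (plus `whylogs`, which the notebook imports later), and `installed_versions` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the install cells above, plus whylogs (imported later).
PACKAGES = ["langkit", "whylogs", "transformers", "torch",
            "huggingface-hub", "bigframes", "datasets"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:16} {ver or 'not installed'}")
```

A package reported as "not installed" usually means the corresponding `pip install` cell failed or the runtime was restarted without re-running it.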

# [Logging text](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)


In [None]:
# Path where the models from huggingface are downloaded before importing
!ls ~/.cache/

huggingface  matplotlib  node-gyp  pip


In [None]:
import whylogs as why
from langkit import light_metrics

llm_schema = light_metrics.init()
print("Done initializing metrics.")

Done initializing metrics.


In [None]:
# Path where the models from huggingface are downloaded
!ls ~/.cache/huggingface/hub/

models--martin-ha--toxic-comment-model		 version.txt
models--sentence-transformers--all-MiniLM-L6-v2


In [None]:
# Path where the NLTK data is downloaded
!ls /root/nltk_data/sentiment

vader_lexicon.zip


```light_metrics``` is composed of the following modules:

* ```textstat```: Text quality, readability, complexity, and grade level.
* ```regexes```: Regex pattern matching for sensitive information.
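To get a feel for what the `textstat` module computes, here is a deliberately simplified, stdlib-only approximation of three of its counts (character, letter and lexicon count). It is not textstat's code: it assumes whitespace is ignored and punctuation is stripped, which happens to reproduce the `character_count`, `letter_count` and `lexicon_count` values LangKit reports later in this notebook for these example strings.

```python
import string

_PUNCT = set(string.punctuation)


def character_count(text: str) -> int:
    # Count every character except whitespace.
    return sum(1 for ch in text if not ch.isspace())


def letter_count(text: str) -> int:
    # Like character_count, but punctuation (e.g. "?", "+", "-") is also dropped.
    return sum(1 for ch in text if not ch.isspace() and ch not in _PUNCT)


def lexicon_count(text: str) -> int:
    # Strip punctuation first, then count whitespace-separated tokens.
    cleaned = "".join(ch for ch in text if ch not in _PUNCT)
    return len(cleaned.split())


for text in ["What is your number?", "my phone is +1 309-404-7587"]:
    print(text, character_count(text), letter_count(text), lexicon_count(text))
```

Note that digits survive `letter_count` here: `"309-404-7587"` loses only its two hyphens.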

We can see that importing the metrics downloaded some models (see the `~/.cache/huggingface/hub/` listing above). You can find details on the models and submodules used for the text statistics and the regular-expression search patterns in the [documentation](https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md). The regexes can be customised; they are defined in the JSON file ```pattern_groups.json```.
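Here is a minimal sketch of how named pattern groups could produce the `has_patterns` value seen later (e.g. `'phone number'`). The group layout (a `name` plus a list of `expressions`) and both regexes are illustrative assumptions, not the contents of LangKit's `pattern_groups.json`; returning the first matching group is also an assumption of this sketch.

```python
import json
import re
from typing import Optional

# Illustrative pattern groups in the assumed shape [{"name": ..., "expressions": [...]}].
# These regexes are simplified examples, not LangKit's shipped patterns.
PATTERN_GROUPS = [
    {"name": "email address",
     "expressions": [r"[\w.+-]+@[\w-]+\.[\w.]+"]},
    {"name": "phone number",
     "expressions": [r"(?:\+?\d{1,2}[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}"]},
]


def load_pattern_groups(path: str) -> list:
    """Read pattern groups from a JSON file with the same shape as PATTERN_GROUPS."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def compile_groups(groups: list) -> list:
    # Pre-compile every expression once, keeping its group name.
    return [(g["name"], [re.compile(expr) for expr in g["expressions"]]) for g in groups]


def has_patterns(text: str, compiled: list) -> Optional[str]:
    # Return the name of the first group with a matching expression, else None.
    for name, patterns in compiled:
        if any(p.search(text) for p in patterns):
            return name
    return None


compiled = compile_groups(PATTERN_GROUPS)
print(has_patterns("my phone is +1 309-404-7587", compiled))  # phone number
print(has_patterns("Hello", compiled))                         # None
```

Keeping the patterns in a JSON file (read here by `load_pattern_groups`) means expressions can be added or tightened without touching code.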

# Feature Extraction

LangKit can be used to extract features from text data. The `llm_schema` created previously guides the feature extraction process: each metric is computed for the `prompt` and `response` columns and added as a new column.
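The enriched outputs follow a `<column>.<metric>` naming scheme, e.g. `prompt.flesch_reading_ease` and `response.flesch_reading_ease`. The sketch below illustrates that pattern in plain Python for a single dictionary or a list of them; `extract_sketch` and the `word_count` metric are hypothetical and only mirror LangKit's `extract` conceptually.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, object]
Metrics = Dict[str, Callable[[str], object]]


def _extract_row(row: Row, metrics: Metrics, columns=("prompt", "response")) -> Row:
    out = dict(row)  # keep the original columns untouched
    for metric_name, fn in metrics.items():
        for col in columns:
            if isinstance(row.get(col), str):
                out[f"{col}.{metric_name}"] = fn(row[col])
    return out


def extract_sketch(data: Union[Row, List[Row]], metrics: Metrics):
    """Hypothetical helper: add '<column>.<metric>' keys to one row or a list of rows."""
    if isinstance(data, dict):
        return _extract_row(data, metrics)
    return [_extract_row(row, metrics) for row in data]


metrics = {"word_count": lambda text: len(text.split())}
print(extract_sketch({"prompt": "Hello", "response": "World"}, metrics))
# {'prompt': 'Hello', 'response': 'World', 'prompt.word_count': 1, 'response.word_count': 1}
```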

In [None]:
from langkit import extract
import pandas as pd

In [None]:
df = pd.DataFrame({'prompt': ['Hello', 'What is your number?', 'What is your adress?'],
                   'response': ['World','my phone is +1 309-404-7587', 'my adress is 304 Brown streest W1 0F3']})

In [None]:
df

| | prompt | response |
|---|---|---|
| 0 | Hello | World |
| 1 | What is your number? | my phone is +1 309-404-7587 |
| 2 | What is your adress? | my adress is 304 Brown streest W1 0F3 |


In [None]:
enhanced_df = extract(df, schema=llm_schema)

enhanced_df

| | prompt | response | prompt.flesch_reading_ease | response.flesch_reading_ease | prompt.automated_readability_index | response.automated_readability_index | prompt.aggregate_reading_level | response.aggregate_reading_level | prompt.syllable_count | response.syllable_count | ... | prompt.letter_count | response.letter_count | prompt.polysyllable_count | response.polysyllable_count | prompt.monosyllable_count | response.monosyllable_count | prompt.difficult_words | response.difficult_words | prompt.has_patterns | response.has_patterns |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hello | World | 36.62 | 121.22 | 2.6 | 2.6 | 0.0 | 0.0 | 2 | 1 | ... | 5 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | | |
| 1 | What is your number? | my phone is +1 309-404-7587 | 92.8 | 117.16 | 0.6 | 2.7 | 1.0 | 2.0 | 5 | 5 | ... | 16 | 20 | 0 | 0 | 3 | 5 | 0 | 0 | | phone number |
| 2 | What is your adress? | my adress is 304 Brown streest W1 0F3 | 118.18 | 114.12 | 0.6 | 0.2 | 0.0 | 0.0 | 4 | 8 | ... | 16 | 30 | 0 | 0 | 4 | 8 | 0 | 0 | | |


You can also pass a single row as a *dictionary*.

In [None]:
enhanced_row = extract({"prompt": "What is your number?","response": "my phone is +1 309-404-7587"},
                       schema=llm_schema)
enhanced_row

{'prompt': 'What is your number?',
 'response': 'my phone is +1 309-404-7587',
 'prompt.flesch_reading_ease': 92.8,
 'response.flesch_reading_ease': 117.16,
 'prompt.automated_readability_index': 0.6,
 'response.automated_readability_index': 2.7,
 'prompt.aggregate_reading_level': 1.0,
 'response.aggregate_reading_level': 2.0,
 'prompt.syllable_count': 5,
 'response.syllable_count': 5,
 'prompt.lexicon_count': 4,
 'response.lexicon_count': 5,
 'prompt.sentence_count': 1,
 'response.sentence_count': 1,
 'prompt.character_count': 17,
 'response.character_count': 23,
 'prompt.letter_count': 16,
 'response.letter_count': 20,
 'prompt.polysyllable_count': 0,
 'response.polysyllable_count': 0,
 'prompt.monosyllable_count': 3,
 'response.monosyllable_count': 5,
 'prompt.difficult_words': 0,
 'response.difficult_words': 0,
 'prompt.has_patterns': None,
 'response.has_patterns': 'phone number'}
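Because the result is a plain dictionary, it can be post-processed directly. For example, a small helper (hypothetical, not part of LangKit) that collects the columns whose `has_patterns` value is set, applied to a subset of the `enhanced_row` output above:

```python
from typing import Dict


def detected_patterns(enhanced: Dict[str, object]) -> Dict[str, object]:
    """Return {column: pattern name} for every '*.has_patterns' key with a non-null value."""
    suffix = ".has_patterns"
    return {
        key[: -len(suffix)]: value
        for key, value in enhanced.items()
        if key.endswith(suffix) and value is not None
    }


# Subset of the enhanced_row output shown above.
enhanced_row = {
    "prompt": "What is your number?",
    "response": "my phone is +1 309-404-7587",
    "prompt.has_patterns": None,
    "response.has_patterns": "phone number",
}
print(detected_patterns(enhanced_row))  # {'response': 'phone number'}
```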

# Statistical Profiling with whylogs

LangKit modules contain UDFs (user-defined functions) that automatically wire into the collection of UDFs that whylogs applies to string features by default.
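As a toy illustration of that wiring (not how whylogs implements it), the registry below applies every function registered for strings to each string-valued column of a row, while non-string values pass through untouched. `string_udf`, `apply_string_udfs` and the `latency_ms` column are hypothetical names.

```python
from typing import Callable, Dict

# Toy registry of functions to apply to every string-valued column.
_STRING_UDFS: Dict[str, Callable[[str], object]] = {}


def string_udf(name: str):
    """Decorator that registers fn under `name` for all string columns."""
    def register(fn: Callable[[str], object]) -> Callable[[str], object]:
        _STRING_UDFS[name] = fn
        return fn
    return register


@string_udf("character_count")
def _character_count(text: str) -> int:
    # Counts non-whitespace characters.
    return sum(1 for ch in text if not ch.isspace())


def apply_string_udfs(row: Dict[str, object]) -> Dict[str, object]:
    """Copy the row and add '<column>.<udf>' entries for each string value."""
    out = dict(row)
    for col, value in row.items():
        if isinstance(value, str):
            for name, fn in _STRING_UDFS.items():
                out[f"{col}.{name}"] = fn(value)
    return out


print(apply_string_udfs({"prompt": "Hello,", "response": "World!", "latency_ms": 120}))
# {'prompt': 'Hello,', 'response': 'World!', 'latency_ms': 120,
#  'prompt.character_count': 6, 'response.character_count': 6}
```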

All we have to do is pass the schema to ```why.log()```:

In [None]:
# Initialize a whylogs session (an API key is only needed to upload to a WhyLabs account)
why.init()

❓ What kind of session do you want to use?
 ⤷ 1. WhyLabs. Use an api key to upload to WhyLabs.
 ⤷ 2. WhyLabs Anonymous. Upload data anonymously to WhyLabs and get a viewing url.

Enter a number from the list: 2
Initializing session with config /root/.config/whylogs/config.ini

✅ Using session type: WHYLABS_ANONYMOUS
 ⤷ session id: <will be generated before upload>


<whylogs.api.whylabs.session.session.GuestSession at 0x7fbb850201c0>

In [None]:
results = why.log({"prompt": "Hello,", "response": "World!"}, schema=llm_schema)
print("Done profiling! Let's look at some of the metrics:")


✅ Aggregated 1 rows into profile 

Visualize and explore this profile with one-click
🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1706832000000&sessionToken=session-PqLk2vsP
Done profiling! Let's look at some of the metrics:


In [None]:
view = results.view()
for col_name in view.get_columns():
    print(col_name)
print()
print("Here is the summary for response metrics")
view.get_column("response").to_summary_dict()

prompt
response
prompt.flesch_reading_ease
response.flesch_reading_ease
prompt.automated_readability_index
response.automated_readability_index
prompt.aggregate_reading_level
response.aggregate_reading_level
prompt.syllable_count
response.syllable_count
prompt.lexicon_count
response.lexicon_count
prompt.sentence_count
response.sentence_count
prompt.character_count
response.character_count
prompt.letter_count
response.letter_count
prompt.polysyllable_count
response.polysyllable_count
prompt.monosyllable_count
response.monosyllable_count
prompt.difficult_words
response.difficult_words
prompt.has_patterns
response.has_patterns

Here is the summary for response metrics


{'counts/n': 1,
 'counts/null': 0,
 'counts/nan': 0,
 'counts/inf': 0,
 'types/integral': 0,
 'types/fractional': 0,
 'types/boolean': 0,
 'types/string': 1,
 'types/object': 0,
 'types/tensor': 0,
 'distribution/mean': 0.0,
 'distribution/stddev': 0.0,
 'distribution/n': 0,
 'distribution/max': nan,
 'distribution/min': nan,
 'distribution/q_01': None,
 'distribution/q_05': None,
 'distribution/q_10': None,
 'distribution/q_25': None,
 'distribution/median': None,
 'distribution/q_75': None,
 'distribution/q_90': None,
 'distribution/q_95': None,
 'distribution/q_99': None,
 'cardinality/est': 1.0,
 'cardinality/upper_1': 1.000049929250618,
 'cardinality/lower_1': 1.0}

In [None]:
results.view().to_pandas()

| column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor | ints/max | ints/min | frequent_items/frequent_strings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prompt | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | | 0.0 | | ... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 1 | 0 | | | |
| prompt.aggregate_reading_level | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | SummaryType.COLUMN | 0 | 1 | 0 | 0 | 0 | 0 | | | |
| prompt.automated_readability_index | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 7.3 | 7.3 | 7.3 | ... | SummaryType.COLUMN | 0 | 1 | 0 | 0 | 0 | 0 | | | |
| prompt.character_count | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 6.0 | 6.0 | 6.0 | ... | SummaryType.COLUMN | 0 | 0 | 1 | 0 | 0 | 0 | 6.0 | 6.0 | |
| prompt.difficult_words | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | SummaryType.COLUMN | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | |
| prompt.flesch_reading_ease | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 36.62 | 36.62 | 36.62 | ... | SummaryType.COLUMN | 0 | 1 | 0 | 0 | 0 | 0 | | | |
| prompt.has_patterns | | | | 0 | 1 | 0 | 1 | | | | ... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 0 | 0 | | | [] |
| prompt.letter_count | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 5.0 | 5.0 | 5.0 | ... | SummaryType.COLUMN | 0 | 0 | 1 | 0 | 0 | 0 | 5.0 | 5.0 | |
| prompt.lexicon_count | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 1.0 | 1.0 | 1.0 | ... | SummaryType.COLUMN | 0 | 0 | 1 | 0 | 0 | 0 | 1.0 | 1.0 | |
| prompt.monosyllable_count | 1.0 | 1.0 | 1.00005 | 0 | 1 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | SummaryType.COLUMN | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | |


In [None]:
results_profile = results.profile()

In [None]:
type(results_profile)

whylogs.core.dataset_profile.DatasetProfile

The object is a `DatasetProfile`, which represents a collection of in-memory profiling statistics for a dataset. The [track method](https://whylogs.readthedocs.io/en/latest/api/whylogs/core/dataset_profile/index.html#whylogs.core.dataset_profile.DatasetProfile.track) updates the in-memory profile with new data and computes the metrics, but does not upload anything to the WhyLabs dashboard.
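To make the idea of an in-memory profile concrete, here is a toy stand-in (not whylogs' `DatasetProfile`, which keeps richer mergeable sketches) that maintains a running count, min, max and mean per numeric column. Tracking a row only mutates local state. Fed the `prompt.character_count` values from this notebook (6 for `"Hello,"` and 17 for `"What is your number?"`), it yields the same count, min, max and mean that whylogs reports for that column once the second row is tracked. `RunningStats` and `ToyProfile` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RunningStats:
    n: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        # Fold one value into the running aggregates.
        self.n += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def mean(self) -> float:
        return self.total / self.n if self.n else math.nan


@dataclass
class ToyProfile:
    """In-memory per-column stats; track() only mutates local state."""
    columns: Dict[str, RunningStats] = field(default_factory=dict)

    def track(self, row: Dict[str, object]) -> None:
        for col, value in row.items():
            # Only numeric (non-bool) values feed the running statistics in this sketch.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                self.columns.setdefault(col, RunningStats()).update(float(value))


profile = ToyProfile()
profile.track({"prompt.character_count": 6, "prompt.flesch_reading_ease": 36.62})
profile.track({"prompt.character_count": 17, "prompt.flesch_reading_ease": 92.8})
stats = profile.columns["prompt.character_count"]
print(stats.n, stats.minimum, stats.maximum, stats.mean)  # 2 6.0 17.0 11.5
```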

In [None]:
results_profile.track({"prompt": "What is your number?","response": "my phone is +1 309-404-7587"})

In [None]:
results_profile.view().to_pandas()

| column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor | ints/max | ints/min | frequent_items/frequent_strings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prompt | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | | 0.0 | | ... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 2 | 0 | | | |
| prompt.aggregate_reading_level | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 1.0 | 0.5 | 1.0 | ... | SummaryType.COLUMN | 0 | 2 | 0 | 0 | 0 | 0 | | | |
| prompt.automated_readability_index | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 7.3 | 3.95 | 7.3 | ... | SummaryType.COLUMN | 0 | 2 | 0 | 0 | 0 | 0 | | | |
| prompt.character_count | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 17.0 | 11.5 | 17.0 | ... | SummaryType.COLUMN | 0 | 0 | 2 | 0 | 0 | 0 | 17.0 | 6.0 | |
| prompt.difficult_words | 1.0 | 1.0 | 1.00005 | 0 | 2 | 0 | 0 | 0.0 | 0.0 | 0.0 | ... | SummaryType.COLUMN | 0 | 0 | 2 | 0 | 0 | 0 | 0.0 | 0.0 | |
| prompt.flesch_reading_ease | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 92.8 | 64.71 | 92.8 | ... | SummaryType.COLUMN | 0 | 2 | 0 | 0 | 0 | 0 | | | |
| prompt.has_patterns | | | | 0 | 2 | 0 | 2 | | | | ... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 0 | 0 | | | [] |
| prompt.letter_count | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 16.0 | 10.5 | 16.0 | ... | SummaryType.COLUMN | 0 | 0 | 2 | 0 | 0 | 0 | 16.0 | 5.0 | |
| prompt.lexicon_count | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 4.0 | 2.5 | 4.0 | ... | SummaryType.COLUMN | 0 | 0 | 2 | 0 | 0 | 0 | 4.0 | 1.0 | |
| prompt.monosyllable_count | 2.0 | 2.0 | 2.0001 | 0 | 2 | 0 | 0 | 3.0 | 1.5 | 3.0 | ... | SummaryType.COLUMN | 0 | 0 | 2 | 0 | 0 | 0 | 3.0 | 0.0 | |
