# Annotations for LLMs with Streamlit and W&B

With [Weights & Biases](https://wandb.ai/site), log inputs and outputs from LLM experiments, then evaluate results. Examine individual prompts and responses at the application scale.

W&B Tables stores these critical assets in a single system of record alongside other artifacts, such as input datasets and model checkpoints, with essential metadata and lineage tracked for transparency and reproducibility.

One effective strategy is to revise these assets in a table to improve model performance. [Streamlit's data editor](https://docs.streamlit.io/library/api-reference/data/st.data_editor), showcased in this application, pairs well with W&B Tables for this purpose. Through the application's UI, annotators can flag outlier model responses, select next steps for refinement, and edit results in place as needed. The annotated results can then be exported and stored as a subsequent artifact in a Weights & Biases LLM development or tuning project.

This notebook walks through one simple approach, with the following steps:
1. Run automated summary of news articles with Hugging Face [pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines)
2. Log Tables to W&B to compare two model approaches
3. Download CSV files of Tables to annotate in Streamlit
4. Annotate tables with Streamlit data editor
5. Load annotated Tables to W&B for versioning and evaluation

### 🏁 Let's get started!


First, install dependencies for W&B and Hugging Face.

In [None]:
# Dependencies
! pip install datasets transformers
! pip install wandb -qq
! pip install accelerate -U

In [None]:
import pandas as pd

from transformers import pipeline
from datasets import load_dataset, Dataset

import wandb
wandb.login()

# import weave
# # from weave import ops_arrow
# # from weave.ops_arrow import constructors as arrow_constructors
# from weave.monitoring import StreamTable
# import pyarrow as pa


## 1. Run automated summary of news articles with Hugging Face pipelines

This notebook uses a summarization example to showcase W&B and Streamlit together. Summarization serves many purposes in ML pipelines, from assisting with data quality checks to condensing long-form data into something digestible for a downstream task such as classification. There are many options for generating summaries automatically; for ease of use, we use Hugging Face [pipelines](https://huggingface.co/docs/transformers/v4.17.0/en/main_classes/pipelines#transformers.SummarizationPipeline).

In [None]:
NUM_ARTICLES = 20

We will use the tried-and-true CNN/Daily Mail [dataset](https://huggingface.co/datasets/cnn_dailymail) to test out summarization outputs from 2 different pre-trained models from the Hugging Face [model repository](https://huggingface.co/models).

In [None]:
cnn_dailymail = load_dataset('cnn_dailymail', '3.0.0')

input_df = cnn_dailymail['train'].to_pandas().sample(frac=1)[:NUM_ARTICLES]
articles = input_df['article'].values

In [None]:
# Define summarizers for 2 different models for comparison
bart_summarizer = pipeline("summarization", "facebook/bart-large-cnn")
samsum_summarizer = pipeline("summarization", "philschmid/bart-large-cnn-samsum")

In [None]:
# Generate a summary of each article with both models.
# Inputs are truncated by character count (1024 and 512 characters) to keep them short.
bart_summaries = []
bart_samsum_summaries = []

for article in articles:
    bart_summaries.append(bart_summarizer(article[:1024])[0]["summary_text"])
    bart_samsum_summaries.append(samsum_summarizer(article[:512])[0]["summary_text"])

In [None]:
bart_df = pd.DataFrame({
    "articles": articles,
    "bart_summaries": bart_summaries,
})

In [None]:
bart_df

Unnamed: 0,articles,bart_summaries
0,"By . Chris Parsons . PUBLISHED: . 02:41 EST, 2...",Valerie Trierweiler tweeted support for a poli...
1,"By . Damien Gayle . PUBLISHED: . 08:10 EST, 12...",Nasa and Florida Institute for Human and Machi...
2,A father who imported a stun gun disguised as ...,"John Liddiatt, 40, ordered the device online w..."
3,"February 10, 2015 . Economics, international p...",This page includes the show Transcript. Use th...
4,"By . James Rush . PUBLISHED: . 10:51 EST, 5 Ap...","Boy, three, two teenagers and a man in his 30s..."
5,By . Daily Mail Reporter . PUBLISHED: . 19:11 ...,"Grover J. Prewitt Jr., 60, of Bristow was arre..."
6,"By . James Rush . PUBLISHED: . 06:02 EST, 30 S...",Sheik Mohammed bin Rashid Al Maktoum has order...
7,By . Sam Webb and Amanda Williams . PUBLISHED:...,Six out of the last seven UK summers have seen...
8,(CNN) -- When CNN highlighted some excellent h...,Tampa's Columbia Restaurant is 107 years old. ...
9,(CNN) -- Japanese golf prodigy Ryo Ishikawa ha...,Japanese golf prodigy Ryo Ishikawa will donate...


In [None]:
samsum_df = pd.DataFrame({
    "articles": articles,
    "bart_samsum_summaries": bart_samsum_summaries,
})

In [None]:
samsum_df

Unnamed: 0,articles,bart_samsum_summaries
0,"By . Chris Parsons . PUBLISHED: . 02:41 EST, 2...",Segolene Royal lost the presidential election ...
1,"By . Damien Gayle . PUBLISHED: . 08:10 EST, 12...",Nasa and the Florida Institute for Human and M...
2,A father who imported a stun gun disguised as ...,"John Liddiatt, 40, imported a stun gun disguis..."
3,"February 10, 2015 . Economics, international p...",This page includes CNN Student News stories on...
4,"By . James Rush . PUBLISHED: . 10:51 EST, 5 Ap...",A three-year-old boy and two other people rema...
5,By . Daily Mail Reporter . PUBLISHED: . 19:11 ...,"Grover J. Prewitt Jr., 60, of Bristow, Oklahom..."
6,"By . James Rush . PUBLISHED: . 06:02 EST, 30 S...",Banned equine drugs were found on a Dubai gove...
7,By . Sam Webb and Amanda Williams . PUBLISHED:...,Met Office experts predict a decade of wet sum...
8,(CNN) -- When CNN highlighted some excellent h...,"Last month, CNN highlighted some excellent his..."
9,(CNN) -- Japanese golf prodigy Ryo Ishikawa ha...,Ryo Ishikawa will donate his tournament earnin...


In [None]:
# Combine dataframes for logging to W&B

joint_df = pd.DataFrame({
    "articles": articles,
    "bart_summaries": bart_summaries,
    "bart_samsum_summaries": bart_samsum_summaries,
})

In [None]:
joint_df

Unnamed: 0,articles,bart_summaries,bart_samsum_summaries
0,"By . Chris Parsons . PUBLISHED: . 02:41 EST, 2...",Valerie Trierweiler tweeted support for a poli...,Segolene Royal lost the presidential election ...
1,"By . Damien Gayle . PUBLISHED: . 08:10 EST, 12...",Nasa and Florida Institute for Human and Machi...,Nasa and the Florida Institute for Human and M...
2,A father who imported a stun gun disguised as ...,"John Liddiatt, 40, ordered the device online w...","John Liddiatt, 40, imported a stun gun disguis..."
3,"February 10, 2015 . Economics, international p...",This page includes the show Transcript. Use th...,This page includes CNN Student News stories on...
4,"By . James Rush . PUBLISHED: . 10:51 EST, 5 Ap...","Boy, three, two teenagers and a man in his 30s...",A three-year-old boy and two other people rema...
5,By . Daily Mail Reporter . PUBLISHED: . 19:11 ...,"Grover J. Prewitt Jr., 60, of Bristow was arre...","Grover J. Prewitt Jr., 60, of Bristow, Oklahom..."
6,"By . James Rush . PUBLISHED: . 06:02 EST, 30 S...",Sheik Mohammed bin Rashid Al Maktoum has order...,Banned equine drugs were found on a Dubai gove...
7,By . Sam Webb and Amanda Williams . PUBLISHED:...,Six out of the last seven UK summers have seen...,Met Office experts predict a decade of wet sum...
8,(CNN) -- When CNN highlighted some excellent h...,Tampa's Columbia Restaurant is 107 years old. ...,"Last month, CNN highlighted some excellent his..."
9,(CNN) -- Japanese golf prodigy Ryo Ishikawa ha...,Japanese golf prodigy Ryo Ishikawa will donate...,Ryo Ishikawa will donate his tournament earnin...


Here, we define a simple word count function and a [lexical diversity](https://en.wikipedia.org/wiki/Lexical_diversity) function. Both can be useful data points for examining text inputs and gauging how completely and fluently summarization outputs capture their "meaning."
<br>
<br>
There are many methods and dimensions to consider when evaluating summaries, from a quick vibes check to reference-based metrics (if you are lucky enough to have gold-standard reference summaries 🍀). This walkthrough shows a simple manual approach, where automated summaries are evaluated for further refinement.
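
As a rough illustration of the reference-based end of that spectrum, the sketch below scores a candidate summary against a reference by unigram overlap. It is a simplified, ROUGE-1-style calculation written for this walkthrough, not the official ROUGE implementation. The function name `unigram_overlap` and the example sentences are made up for illustration.

```python
import re
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1-style precision, recall, and F1 (illustrative only)."""
    # Lowercase and keep word characters so "Mat." and "mat" match.
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))

    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = unigram_overlap(
    candidate="The cat lay on the mat.",
    reference="The cat sat on the mat.",
)
print(scores)
```

Five of the six words in each sentence match, so precision, recall, and F1 all come out to 5/6 here. Scores like these only make sense when trusted reference summaries exist, which is why the rest of this notebook relies on manual review instead.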

In [None]:
# Function to calculate word count
def calculate_word_count(text):
    words = text.split()
    return len(words)

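# Function to calculate lexical diversity (unique words / total words)
def calculate_lexical_diversity(text):
    words = text.split()
    if not words:  # guard against empty text to avoid ZeroDivisionError
        return 0.0
    unique_words = set(words)
    return round(len(unique_words) / len(words), 3)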
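
Because the functions above split on whitespace only, "The" and "the", or "dog." and "dog", count as different words, which nudges lexical diversity upward. The sketch below shows one possible normalized variant for comparison. The function name `normalized_lexical_diversity` and the sample sentence are assumptions for illustration, not part of the original notebook.

```python
import re


def normalized_lexical_diversity(text: str) -> float:
    """Type-token ratio after lowercasing and dropping punctuation."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:  # avoid dividing by zero on empty text
        return 0.0
    return round(len(set(tokens)) / len(tokens), 3)


sample = "The cat saw the dog. The dog ran."

# Whitespace-only splitting, as in the notebook's function.
raw_words = sample.split()
raw_ratio = round(len(set(raw_words)) / len(raw_words), 3)

print(raw_ratio)                             # 0.875
print(normalized_lexical_diversity(sample))  # 0.625
```

Either definition works for comparing summaries, as long as the same one is applied to every table being compared.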


In [None]:
# Compute word count and append to bart dataframe
bart_df['source_word_count'] = bart_df['articles'].apply(lambda x: calculate_word_count(x))

# Compute summary word count and append to bart dataframe
bart_df['summary_word_count'] = bart_df['bart_summaries'].apply(lambda x: calculate_word_count(x))

# Compute lexical diversity and append to dataframe
bart_df['source_lexical_diversity'] = bart_df['articles'].apply(lambda x: calculate_lexical_diversity(x))

# Compute summary lexical diversity and append to dataframe
bart_df['summary_lexical_diversity'] = bart_df['bart_summaries'].apply(lambda x: calculate_lexical_diversity(x))

# Display the DataFrame
bart_df

Unnamed: 0,articles,bart_summaries,source_word_count,summary_word_count,source_lexical_diversity,summary_lexical_diversity
0,"By . Chris Parsons . PUBLISHED: . 02:41 EST, 2...",Valerie Trierweiler tweeted support for a poli...,719,40,0.517,0.925
1,"By . Damien Gayle . PUBLISHED: . 08:10 EST, 12...",Nasa and Florida Institute for Human and Machi...,853,56,0.523,0.857
2,A father who imported a stun gun disguised as ...,"John Liddiatt, 40, ordered the device online w...",538,54,0.507,0.815
3,"February 10, 2015 . Economics, international p...",This page includes the show Transcript. Use th...,239,48,0.64,0.833
4,"By . James Rush . PUBLISHED: . 10:51 EST, 5 Ap...","Boy, three, two teenagers and a man in his 30s...",581,48,0.508,0.938
5,By . Daily Mail Reporter . PUBLISHED: . 19:11 ...,"Grover J. Prewitt Jr., 60, of Bristow was arre...",789,48,0.485,0.917
6,"By . James Rush . PUBLISHED: . 06:02 EST, 30 S...",Sheik Mohammed bin Rashid Al Maktoum has order...,727,53,0.469,0.925
7,By . Sam Webb and Amanda Williams . PUBLISHED:...,Six out of the last seven UK summers have seen...,1351,59,0.432,0.814
8,(CNN) -- When CNN highlighted some excellent h...,Tampa's Columbia Restaurant is 107 years old. ...,1440,48,0.578,0.917
9,(CNN) -- Japanese golf prodigy Ryo Ishikawa ha...,Japanese golf prodigy Ryo Ishikawa will donate...,235,49,0.651,0.878


In [None]:
# Compute word count and append to samsum dataframe
samsum_df['source_word_count'] = samsum_df['articles'].apply(lambda x: calculate_word_count(x))

# Compute lexical diversity and append to dataframe
samsum_df['source_lexical_diversity'] = samsum_df['articles'].apply(lambda x: calculate_lexical_diversity(x))

# Compute summary word count and append to samsum dataframe
samsum_df['summary_word_count'] = samsum_df['bart_samsum_summaries'].apply(lambda x: calculate_word_count(x))

# Compute summary lexical diversity and append to dataframe
samsum_df['summary_lexical_diversity'] = samsum_df['bart_samsum_summaries'].apply(lambda x: calculate_lexical_diversity(x))

# Display the DataFrame
samsum_df

Unnamed: 0,articles,bart_samsum_summaries,source_word_count,source_lexical_diversity,summary_word_count,summary_lexical_diversity
0,"By . Chris Parsons . PUBLISHED: . 02:41 EST, 2...",Segolene Royal lost the presidential election ...,719,0.517,47,0.872
1,"By . Damien Gayle . PUBLISHED: . 08:10 EST, 12...",Nasa and the Florida Institute for Human and M...,853,0.523,56,0.857
2,A father who imported a stun gun disguised as ...,"John Liddiatt, 40, imported a stun gun disguis...",538,0.507,50,0.88
3,"February 10, 2015 . Economics, international p...",This page includes CNN Student News stories on...,239,0.64,54,0.889
4,"By . James Rush . PUBLISHED: . 10:51 EST, 5 Ap...",A three-year-old boy and two other people rema...,581,0.508,50,0.8
5,By . Daily Mail Reporter . PUBLISHED: . 19:11 ...,"Grover J. Prewitt Jr., 60, of Bristow, Oklahom...",789,0.485,54,0.889
6,"By . James Rush . PUBLISHED: . 06:02 EST, 30 S...",Banned equine drugs were found on a Dubai gove...,727,0.469,43,0.93
7,By . Sam Webb and Amanda Williams . PUBLISHED:...,Met Office experts predict a decade of wet sum...,1351,0.432,41,0.854
8,(CNN) -- When CNN highlighted some excellent h...,"Last month, CNN highlighted some excellent his...",1440,0.578,41,0.902
9,(CNN) -- Japanese golf prodigy Ryo Ishikawa ha...,Ryo Ishikawa will donate his tournament earnin...,235,0.651,48,0.875
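
One quick way to read the two tables side by side is to compare average summary length and how strongly each model compresses its input. The sketch below hard-codes the word counts from the ten rows displayed above purely for illustration. In the notebook, you would read them from `bart_df` and `samsum_df` instead. The helper name `summarize_lengths` is hypothetical.

```python
# Word counts copied from the ten rows displayed above (illustration only).
source_word_count = [719, 853, 538, 239, 581, 789, 727, 1351, 1440, 235]
summary_word_count = {
    "bart": [40, 56, 54, 48, 48, 48, 53, 59, 48, 49],
    "bart_samsum": [47, 56, 50, 54, 50, 54, 43, 41, 41, 48],
}


def summarize_lengths(summary_counts, source_counts):
    """Return mean summary length and the overall summary/source word ratio."""
    mean_length = sum(summary_counts) / len(summary_counts)
    compression = sum(summary_counts) / sum(source_counts)
    return {
        "mean_summary_words": mean_length,
        "compression_ratio": round(compression, 4),
    }


stats = {
    model: summarize_lengths(counts, source_word_count)
    for model, counts in summary_word_count.items()
}
for model, values in stats.items():
    print(model, values)
```

On these ten rows, both models produce summaries of roughly 50 words, so length alone does not separate them. That is one reason manual annotation of the content is useful.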


## 2. Log Tables to W&B to compare two model approaches

[W&B Tables](https://docs.wandb.ai/guides/tables) help you visualize and query tabular data, whether it be numeric, categorical, text, images, or multimodal datasets. Tables help users compare how different models perform on the same test set, identify patterns in data (especially helpful with text analysis), and query inputs and outputs effectively to find outliers or useful patterns.
<br>
<br>
Here we log our automatically-generated summaries to W&B as an initial step in the overall LLM development and evaluation process. If you do not have a W&B account yet, follow this simple [quickstart](https://docs.wandb.ai/quickstart) to get set up 🌟

In [None]:
# log bart table to W&B
run = wandb.init(project="news_summarization", name="load_bart_df")
bart_table_v1 = wandb.Table(dataframe=bart_df)
wandb.log({"BART summaries v1": bart_table_v1})

In [None]:
wandb.finish()

In [None]:
# log samsum table to W&B
run = wandb.init(project="news_summarization", name="load_samsum_df")
samsum_table_v1 = wandb.Table(dataframe=samsum_df)
wandb.log({"SAMSUM summaries v1":samsum_table_v1})

In [None]:
wandb.finish()

In [None]:
# log joint table to W&B
run = wandb.init(project="news_summarization", name="load_joint_df")
joint_table_v1 = wandb.Table(dataframe=joint_df)
wandb.log({"Combined summaries v1": joint_table_v1})

In [None]:
wandb.finish()
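
The three init/log/finish cells above follow the same pattern, so they could be folded into one helper. The sketch below is one way to do that, relying on the run returned by `wandb.init` being usable as a context manager that finishes the run on exit. The helper name `log_dataframes` and its `wb` parameter are hypothetical; `wb` exists only so the function can be exercised with a stand-in object instead of a live W&B session.

```python
def log_dataframes(frames, project, wb=None):
    """Log each dataframe as a W&B Table in its own run.

    frames maps a run name to a (table_key, dataframe) pair.
    wb is a hypothetical injection point; it defaults to the wandb module.
    """
    if wb is None:
        import wandb as wb  # imported lazily so the helper can be tested offline

    logged_keys = []
    for run_name, (table_key, dataframe) in frames.items():
        # The run is finished automatically when the with-block exits.
        with wb.init(project=project, name=run_name) as run:
            run.log({table_key: wb.Table(dataframe=dataframe)})
        logged_keys.append(table_key)
    return logged_keys


# Example call with the names used in this notebook (requires a W&B login):
# log_dataframes(
#     {
#         "load_bart_df": ("BART summaries v1", bart_df),
#         "load_samsum_df": ("SAMSUM summaries v1", samsum_df),
#         "load_joint_df": ("Combined summaries v1", joint_df),
#     },
#     project="news_summarization",
# )
```

Keeping one run per table, as the cells above do, makes each table easy to find by run name in the project.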

## 3. Download CSV files of Tables to annotate in Streamlit

W&B Tables can be exported easily, either [programmatically](https://docs.wandb.ai/guides/tables/tables-download) or from the UI. To do this in Python, we convert a table to a W&B artifact (learn more [here](https://docs.wandb.ai/guides/artifacts)) and then to a dataframe. From there, it's a simple CSV export. The example below takes a shortcut and calls `get_dataframe()` directly on the in-memory table from step 2.
<br>
<br>
These csv files can be loaded to a simple Streamlit app for labeling.

In [None]:
# Example of how to load a table from step 2 to a csv file
bart_WB_df = bart_table_v1.get_dataframe()

In [None]:
# Convert the table data to .csv
bart_WB_df.to_csv("example.csv", encoding="utf-8")
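
By default, `to_csv` writes the dataframe index as an unnamed first column. Reading the file back without `index_col=0` therefore produces a leading `Unnamed: 0` column. The sketch below shows the round trip on a tiny made-up frame, using an in-memory buffer in place of `example.csv`.

```python
import io

import pandas as pd

# Tiny stand-in for bart_WB_df (values are made up for illustration).
df = pd.DataFrame({"articles": ["a long article"], "bart_summaries": ["a summary"]})

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column
csv_text = buffer.getvalue()

# Reading the file back naively keeps the index as a data column ...
naive = pd.read_csv(io.StringIO(csv_text))
# ... while index_col=0 restores it as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is an alternative, but any later `read_csv(..., index_col=0)` call would then have to drop `index_col`, since it would otherwise turn the first data column into the index.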

## 4. Annotate tables with Streamlit data editor

This W&B repo contains a simple app that takes a user-loaded .csv file, creates a dataframe, displays that dataframe in a Streamlit app, and enables manual editing and exporting of a revised .csv file.

Once you have built your app and stored it with any dependencies it needs, you can run it wherever Streamlit is installed with `streamlit run app.py`. Streamlit then prints a URL for the app (by default http://localhost:8501/).
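
The app itself is not reproduced in this notebook. As a rough sketch of what such an `app.py` could look like (not the code from the W&B repo), the example below uploads a CSV, adds two annotation columns, opens the table in `st.data_editor`, and offers the edited result for download. The column names `flag_outlier` and `next_step`, the next-step options, and the output file name are assumptions for illustration.

```python
import sys

import pandas as pd

# Hypothetical annotation choices; adjust to your own review workflow.
NEXT_STEPS = ["keep", "rewrite", "discard"]


def add_annotation_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with annotation columns added if they are missing."""
    annotated = df.copy()
    if "flag_outlier" not in annotated.columns:
        annotated["flag_outlier"] = False
    if "next_step" not in annotated.columns:
        annotated["next_step"] = NEXT_STEPS[0]
    return annotated


def render_app() -> None:
    import streamlit as st  # imported here so the helper above works without Streamlit

    st.title("Annotate summaries")
    uploaded = st.file_uploader("Upload a CSV exported from W&B", type="csv")
    if uploaded is None:
        st.stop()  # wait until a file has been provided

    df = add_annotation_columns(pd.read_csv(uploaded, index_col=0))
    edited = st.data_editor(
        df,
        column_config={
            "flag_outlier": st.column_config.CheckboxColumn("Flag outlier"),
            "next_step": st.column_config.SelectboxColumn("Next step", options=NEXT_STEPS),
        },
    )
    st.download_button(
        "Download annotated CSV",
        data=edited.to_csv().encode("utf-8"),
        file_name="annotated.csv",
        mime="text/csv",
    )


# `streamlit run app.py` imports streamlit before executing this script,
# so the UI is rendered only when the file is launched as an app.
if "streamlit" in sys.modules:
    render_app()
```

Keeping the column logic in a plain function separates it from the UI code, so it can be checked in a notebook before the app is launched.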

## 5. Load annotated Tables to W&B for versioning and evaluation

Once you have revised any or all entries in your Streamlit tables and exported the new .csv files, you can load the annotated versions to the same W&B project. This captures the annotation step and all its metadata in a central system of record for your LLM development project.

In [None]:
# Create DataFrame
annotated_bart_df = pd.read_csv('annotated_bart.csv', index_col=0)
annotated_bart_df.head()

In [None]:
# Create DataFrame
annotated_samsum_df = pd.read_csv('annotated_samsum.csv', index_col=0)
annotated_samsum_df.head()

In [None]:
# Log the annotated BART table to the project
run = wandb.init(project="news_summarization")
bart_table = wandb.Table(dataframe=annotated_bart_df)
wandb.log({"Annotated BART summaries": bart_table})
wandb.finish()  # close this run before the next one starts

In [None]:
# Log the annotated SAMSum table to the project
run = wandb.init(project="news_summarization")
samsum_table = wandb.Table(dataframe=annotated_samsum_df)
wandb.log({"Annotated SAMSUM summaries": samsum_table})

In [None]:
wandb.finish()
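
When versioning annotated tables, it can help to quantify what annotators actually changed. The sketch below compares an original and an annotated frame row by row and counts edited cells in each shared column. The helper name `count_edits` and the toy data are made up for illustration. It assumes both frames share the same index, as they would after a round trip through the CSV export.

```python
import pandas as pd


def count_edits(original: pd.DataFrame, annotated: pd.DataFrame) -> dict:
    """Count cells that differ in each column present in both frames."""
    shared = [col for col in original.columns if col in annotated.columns]
    # Align on the index so each row is compared with its own original.
    aligned = annotated.reindex(original.index)
    return {col: int((original[col] != aligned[col]).sum()) for col in shared}


original = pd.DataFrame(
    {"articles": ["a1", "a2", "a3"], "bart_summaries": ["s1", "s2", "s3"]}
)
annotated = original.copy()
annotated.loc[1, "bart_summaries"] = "s2, revised by an annotator"
annotated["flag_outlier"] = [False, True, False]

print(count_edits(original, annotated))
```

Columns that exist only in the annotated frame, such as a flag column added in the app, are ignored by this comparison and can be inspected directly.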