
In [None]:
%pip install --upgrade cleanlab-studio

In [None]:
%pip install --upgrade datasets


# Use Case: Identify low-quality responses in the Dolly dataset

In [5]:
import pandas as pd
from cleanlab_studio import Studio

from datasets import load_dataset

# Initialize the Studio with your Cleanlab API key
studio = Studio("<API Key>")

dataset = load_dataset('databricks/databricks-dolly-15k', split='train')
df = pd.DataFrame(dataset)

print("Dataset loaded successfully. Here are the first few rows:")
print(df.head())


Dataset loaded successfully. Here are the first few rows:
                                         instruction  \
0         When did Virgin Australia start operating?   
1           Which is a species of fish? Tope or Rope   
2     Why can camels survive for long without water?   
3  Alice's parents have three daughters: Amy, Jes...   
4                    When was Tomoaki Komorida born?   

                                             context  \
0  Virgin Australia, the trading name of Virgin A...   
1                                                      
2                                                      
3                                                      
4  Komorida was born in Kumamoto Prefecture on Ju...   

                                            response        category  
0  Virgin Australia commenced services on 31 Augu...       closed_qa  
1                                               Tope  classification  
2  Camels use the fat in their humps to keep them...   
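
Some Dolly rows carry a non-empty `context` that the `instruction` depends on, as the preview above shows. The following is a minimal sketch of folding that context into the prompt text. The helper name `build_prompt` and its template are hypothetical, the example rows are made up for illustration, and the rest of this notebook prompts with `instruction` alone.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Combine an instruction with its optional context (hypothetical helper)."""
    instruction = instruction.strip()
    context = (context or "").strip()
    if not context:
        # Rows without context are passed through unchanged
        return instruction
    # Assumed template: context first, then the instruction
    return f"Context: {context}\n\nInstruction: {instruction}"


# Illustrative rows, not taken from the dataset
rows = [
    {"instruction": "Which is a species of fish? Tope or Rope", "context": ""},
    {"instruction": "When was X born?", "context": "X was born in 1981."},
]
prompts_with_context = [build_prompt(r["instruction"], r["context"]) for r in rows]
print(prompts_with_context)
```

With a DataFrame, the same helper could be applied row by row to the `instruction` and `context` columns before calling TLM.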

In [18]:
# Initialize the TLM instance
tlm = studio.TLM()

df = df.head(10)
df.to_csv('dataset.csv', index=False)
prompts = df['instruction'].tolist()
human_responses = df['response'].tolist()

results = df.copy(deep=True)

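# Send all prompts in one batched TLM call for efficiency
outputs = tlm.prompt(prompts)
# Each output is a dict; select its fields by key rather than by column position
results['tlm_response'] = [out['response'] for out in outputs]
results['trustworthiness_score'] = [out['trustworthiness_score'] for out in outputs]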
results.to_csv('tlm_results.csv', index=False)
print("TLM responses and trustworthiness scores:")
print(results[['instruction', 'response', 'tlm_response', 'trustworthiness_score']].head())


Querying TLM... 100%|██████████|

TLM responses and trustworthiness scores:
                                         instruction  \
0         When did Virgin Australia start operating?   
1           Which is a species of fish? Tope or Rope   
2     Why can camels survive for long without water?   
3  Alice's parents have three daughters: Amy, Jes...   
4                    When was Tomoaki Komorida born?   

                                            response  \
0  Virgin Australia commenced services on 31 Augu...   
1                                               Tope   
2  Camels use the fat in their humps to keep them...   
3            The name of the third daughter is Alice   
4         Tomoaki Komorida was born on July 10,1981.   

                                        tlm_response  trustworthiness_score  
0  Virgin Australia started operating on August 3...               0.866139  
1  Tope is a species of fish. It is also known as...               0.872518  
2  Camels have several adaptations that allow the.
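
The scores above rate the answers that TLM generated, not the human-written `response` column. To judge the existing Dolly responses, each prompt would need to be scored together with its existing response. The sketch below shows that pairing with a pluggable `score_fn`. Both `score_existing_responses` and `dummy_score` are hypothetical names, and the dummy scorer is only a stand-in so the example runs without an API key. The Cleanlab client appears to offer `tlm.get_trustworthiness_score(prompt, response)` for this purpose, but treat that call and its return type as assumptions to check against your installed version.

```python
from typing import Callable, Dict, List


def score_existing_responses(
    prompts: List[str],
    responses: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Dict[str, object]]:
    """Pair each prompt with its existing response and attach a score."""
    if len(prompts) != len(responses):
        raise ValueError("prompts and responses must have the same length")
    scored = []
    for prompt, response in zip(prompts, responses):
        scored.append(
            {"prompt": prompt, "response": response, "score": score_fn(prompt, response)}
        )
    return scored


def dummy_score(prompt: str, response: str) -> float:
    """Placeholder scorer for illustration only: longer answers score higher, capped at 1.0."""
    return min(len(response) / 100.0, 1.0)


scored = score_existing_responses(
    ["Which is a species of fish? Tope or Rope"],
    ["Tope"],
    dummy_score,
)
print(scored)
```

In the notebook, `score_fn` would be a thin wrapper around the real scoring call, applied to `prompts` and `human_responses`.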




In [16]:

# Sort results by trustworthiness score to identify least trustworthy responses
low_trustworthiness_responses = results.sort_values(by='trustworthiness_score').head()
print("Responses with the lowest trustworthiness scores:")
print(low_trustworthiness_responses[['instruction', 'response', 'tlm_response', 'trustworthiness_score']])
low_trustworthiness_responses.to_csv('low_trustworthiness_responses.csv', index=False)

Responses with the lowest trustworthiness scores:
                                         instruction  \
7   Who gave the UN the land in NY to build their HQ   
4                    When was Tomoaki Komorida born?   
6  Given a reference text about Lollapalooza, whe...   
8                        Why mobile is bad for human   
2     Why can camels survive for long without water?   

                                            response  \
7                                John D Rockerfeller   
4         Tomoaki Komorida was born on July 10,1981.   
6  Lollapalooze is an annual musical festival hel...   
8  We are always engaged one phone which is not g...   
2  Camels use the fat in their humps to keep them...   

                                        tlm_response  trustworthiness_score  
7  The land for the United Nations Headquarters i...               0.685942  
4  I'm sorry, but I couldn't find any information...               0.821744  
6  Lollapalooza is an annual music festiva
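
Sorting and taking the first five rows always returns five rows, however high their scores are. An alternative is to flag every row below a cutoff. The sketch below does this on plain dictionaries, using scores printed earlier in this notebook. The function name `flag_low_trust` and the cutoff of 0.85 are arbitrary assumptions for illustration, not recommended values.

```python
from typing import Dict, List


def flag_low_trust(
    records: List[Dict[str, float]], threshold: float = 0.8
) -> List[Dict[str, float]]:
    """Return the records scoring below threshold, lowest score first."""
    flagged = [r for r in records if r["trustworthiness_score"] < threshold]
    return sorted(flagged, key=lambda r: r["trustworthiness_score"])


# Row indices and scores copied from the outputs shown in this notebook
records = [
    {"row": 0, "trustworthiness_score": 0.866139},
    {"row": 1, "trustworthiness_score": 0.872518},
    {"row": 3, "trustworthiness_score": 0.946198},
    {"row": 4, "trustworthiness_score": 0.821744},
    {"row": 7, "trustworthiness_score": 0.685942},
]
flagged = flag_low_trust(records, threshold=0.85)
print([r["row"] for r in flagged])  # [7, 4]
```

On the `results` DataFrame, the equivalent is a boolean mask on the `trustworthiness_score` column followed by `sort_values`.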

In [17]:

# Sort results by trustworthiness score to identify most trustworthy responses
high_trustworthiness_responses = results.sort_values(by='trustworthiness_score', ascending=False).head()
print("Responses with the highest trustworthiness scores:")
print(high_trustworthiness_responses[['instruction', 'response', 'tlm_response', 'trustworthiness_score']])
high_trustworthiness_responses.to_csv('high_trustworthiness_responses.csv', index=False)

Responses with the highest trustworthiness scores:
                                         instruction  \
3  Alice's parents have three daughters: Amy, Jes...   
1           Which is a species of fish? Tope or Rope   
9                       Who was John Moses Browning?   
0         When did Virgin Australia start operating?   
5  If I have more pieces at the time of stalemate...   

                                            response  \
3            The name of the third daughter is Alice   
1                                               Tope   
9  John Moses Browning is one of the most well-kn...   
0  Virgin Australia commenced services on 31 Augu...   
5  No. \nStalemate is a drawn position. It doesn'...   

                                        tlm_response  trustworthiness_score  
3           The name of the third daughter is Alice.               0.946198  
1  Tope is a species of fish. It is also known as...               0.872518  
9  John Moses Browning was an American fi