In [33]:
# Install dependencies
!pip install pandas gspread oauth2client
!pip install --upgrade pip
!pip install --upgrade cleanlab-studio
import time
from IPython import display
display.clear_output()

from cleanlab_studio import Studio

In [34]:
with open('api.txt') as file:
    key = file.read().strip()  # drop any trailing newline or whitespace around the key
studio = Studio(key)  # Cleanlab Studio API key from https://app.cleanlab.ai/account?tab=General
tlm = studio.TLM()

# Let's first see the results on https://chat.openai.com

Now let's see the same thing programmatically using OpenAI's private API for ChatGPT.

In [221]:
# Runs OpenAI GPT-3.5
chatgpt = studio.TLM(quality_preset='base')
chatgpt.prompt("How many Ns are there in the word enter?")

{'response': 'There are two Ns in the word "enter".', 'confidence_score': nan}

In [36]:
# Runs the Cleanlab TLM with confidence reliability scores
tlm = studio.TLM(quality_preset='best')
tlm.prompt("How many Ns are there in the word enter?")

{'response': 'There are 1 N in the word "enter".',
 'confidence_score': 0.5591737680569699}

In [37]:
# Runs the Cleanlab TLM with the 'low' quality preset (still returns confidence reliability scores)
tlm_fast = studio.TLM(quality_preset='low')
tlm_fast.prompt("How many Ns are there in the word enter?")

{'response': 'There are two Ns in the word "enter".',
 'confidence_score': 0.3528202582001686}

# Now let's see reliable data enrichment for document tasks using Cleanlab (trustworthiness scores built in for every output)

We'll see how a built-in trustworthiness score for every output helps when running text/document workflows on arbitrary datasets.

# Read in the data

In [105]:
# Prefer the spreadsheet so the data auto-updates; if you have trouble, just use pd.read_csv
# Link to dataset: https://docs.google.com/spreadsheets/d/1V7_fOlmixgC70W2TU6s1Kz_N0Vz0lHr29ogt9j5yQMU/edit?usp=sharing

import pandas as pd
import gspread
from oauth2client.service_account import ServiceAccountCredentials
pd.set_option('display.max_colwidth', None)
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('tlm_demo_credentials.json', scope)
gc = gspread.authorize(credentials)    # use the gspread library to access the spreadsheet
wks = gc.open("Document Compliance Dataset").sheet1    # select the first worksheet
df = pd.DataFrame(wks.get_all_values(), columns=["document_text", "issue"])
print(f'Number of documents in this compliance dataset: {len(df)}')

Number of documents in this compliance dataset: 109
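
If the Google Sheets credentials are not available, the comment in the cell above suggests falling back to `pd.read_csv`. A minimal sketch of that fallback, assuming the sheet was exported as a CSV with no header row and two columns; the helper name `load_compliance_csv` and the in-memory sample are hypothetical and only stand in for a real export:

```python
import io

import pandas as pd


def load_compliance_csv(path_or_buffer):
    """Load a CSV export of the dataset (hypothetical helper).

    Assumes no header row and exactly two columns, named the same way
    as in the Google Sheets loader above.
    """
    return pd.read_csv(
        path_or_buffer,
        header=None,
        names=["document_text", "issue"],
        keep_default_na=False,  # never convert label strings to NaN
    )


# Tiny in-memory stand-in for a real exported file
sample_csv = io.StringIO(
    '"Patient data is stored on unencrypted servers.",HIPAA\n'
    '"About 80% of Ethiopians depend on agriculture.",none\n'
)
df_csv = load_compliance_csv(sample_csv)
print(f'Number of documents loaded from CSV: {len(df_csv)}')
```

With a real export you would pass the file path instead of the `StringIO` buffer.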


# Let's look at some examples of these documents.
* Some of these documents have compliance issues, many don't. This task is often outsourced entirely to experts who review 100% of documents by hand.
* Here we'll automate the entire workflow and provide trustworthiness scores that help you determine what needs human review.
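
One way to turn trustworthiness scores into a review queue is a simple threshold split. The sketch below is illustrative only: the helper name `split_for_review`, the `score` column, the toy rows, and the 0.8 cutoff are all assumptions, not part of Cleanlab's API.

```python
import pandas as pd


def split_for_review(df, score_col, threshold=0.8):
    """Split rows into auto-accepted and needs-human-review sets."""
    scores = pd.to_numeric(df[score_col], errors="coerce")
    accepted = scores >= threshold
    auto_ok = df[accepted]
    # Rows with a missing score compare as False above, so they go to review
    needs_review = df[~accepted]
    return auto_ok, needs_review


# Toy data; the scores here are made up for illustration
demo = pd.DataFrame({
    "document_text": ["doc a", "doc b", "doc c"],
    "score": [0.95, 0.42, None],
})
auto_ok, needs_review = split_for_review(demo, "score")
print(len(auto_ok), "auto-accepted;", len(needs_review), "flagged for human review")
```

The right cutoff depends on how costly a missed issue is for your workflow, so treat 0.8 as a starting point to tune.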

In [99]:
df.sample(10, random_state=0)

Unnamed: 0,document_text,issue
84,"That means there are a limited number of new opportunities for workers. India, which attends the G7 meeting of seven leading industrialised nations on Friday, is unlikely to be cowed by its newcomer status.",none
10,"Although the chip makers registration form includes marketing consent checkboxes, this method of consent is not considered freely-given because the boxes are pre-ticked by default:",GDPR
75,"Nevertheless it was enough to push down the unemployment rate to 5.2%, its lowest level since September 2001.",none
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed.",HIPAA
24,"The G7 meeting is thought unlikely to produce any meaningful movement in Chinese policy. In the meantime, the US Federal Reserve's decision on 2 February to boost interest rates by a quarter of a point - the sixth such move in as many months - has opened up a differential with European rates.",none
100,"Nevertheless, 2.2 million Ethiopians will still need emergency assistance.",none
108,About 80% of Ethiopians depend directly or indirectly on agriculture.,none
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”",FERPA
16,"AOL, had has mixed fortunes. It lost 464,000 subscribers in the fourth quarter profits were lower than in the preceding three quarters. However, the company said AOL's underlying profit before exceptional items rose 8% on the back of stronger internet advertising revenues. It hopes to increase subscribers by offering the online service free to TimeWarner internet customers and will try to sign up AOL's existing customers for high-speed broadband.",none
86,He objected to subsidies on agriculture that make it hard for developing nations like India to compete.,none


## But in practice, you don't have the "issue" column. You need to find that out automatically. All you have is this.

In [100]:
df.sample(10, random_state=0)[['document_text']]

Unnamed: 0,document_text
84,"That means there are a limited number of new opportunities for workers. India, which attends the G7 meeting of seven leading industrialised nations on Friday, is unlikely to be cowed by its newcomer status."
10,"Although the chip makers registration form includes marketing consent checkboxes, this method of consent is not considered freely-given because the boxes are pre-ticked by default:"
75,"Nevertheless it was enough to push down the unemployment rate to 5.2%, its lowest level since September 2001."
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed."
24,"The G7 meeting is thought unlikely to produce any meaningful movement in Chinese policy. In the meantime, the US Federal Reserve's decision on 2 February to boost interest rates by a quarter of a point - the sixth such move in as many months - has opened up a differential with European rates."
100,"Nevertheless, 2.2 million Ethiopians will still need emergency assistance."
108,About 80% of Ethiopians depend directly or indirectly on agriculture.
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”"
16,"AOL, had has mixed fortunes. It lost 464,000 subscribers in the fourth quarter profits were lower than in the preceding three quarters. However, the company said AOL's underlying profit before exceptional items rose 8% on the back of stronger internet advertising revenues. It hopes to increase subscribers by offering the online service free to TimeWarner internet customers and will try to sign up AOL's existing customers for high-speed broadband."
86,He objected to subsidies on agriculture that make it hard for developing nations like India to compete.


# Use the Trustworthy Language Model to solve this task with reliability scores for every output

In [120]:
prompt_template = \
'''What type of compliance issue is most likely present in the following document?
Please restrict your answer to a one word answer and nothing else.
Your answer should be selected from the following options: HIPAA, FERPA, GDPR, none.
Please be as accurate as possible, the world depends on it.\n\nDocument below here:\n\n'''
print(prompt_template + df.at[0, 'document_text'])

What type of compliance issue is most likely present in the following document?
Please restrict your answer to a one word answer and nothing else.
Your answer should be selected from the following options: HIPAA, FERPA, GDPR, none.
Please be as accurate as possible, the world depends on it.

Document below here:

All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.


# Solve this task with TLM and get Cleanlab Trustworthiness scores to build trust with every output.

In [169]:
df['Cleanlab_TLM_response (Compliance Issue)'], df['Cleanlab_Trustworthiness_Score (Compliance Issue)'] = None, None
for i, row in df.iterrows():
    if i < 20: display.display(df.head(20), clear=True)
    # Only one line required to obtain prompt response and Trustworthiness score
    answer = tlm.prompt(prompt_template + row['document_text'])
    # Add results to dataset
    df.at[i, 'Cleanlab_TLM_response (Compliance Issue)'] = answer['response']
    df.at[i, 'Cleanlab_Trustworthiness_Score (Compliance Issue)'] = answer['confidence_score']
    time.sleep(0.1)

Unnamed: 0,document_text,issue,Cleanlab_TLM_response (Compliance Issue),Cleanlab_Trustworthiness_Score (Compliance Issue)
0,All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.,HIPAA,HIPAA,0.820183
1,Patient data will be stored in secure s3 buckets with access only by the company's employees. It is possible that employees who are not authorized to access the data can still access the data.,HIPAA,none,0.595816
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed.",HIPAA,HIPAA,0.993623
3,"To ensure you receive timely feedback, our clinicians may work after-hours and use their personal computer to access PHI. We cannot guarantee that their personal compute is located in a secure location.",HIPAA,HIPAA,0.945212
4,"Fayette county public schools office may release student records in certain situations without student consent, including: Accidentally or purposefully emailing student information to unauthorized parties, Sharing a student athlete’s academic status, Sharing a student’s grades or identifying information with unauthorized parties, or including a student’s social security number in shared documents.",FERPA,FERPA,0.993724
5,"The faculty and proceeding council of Marlborough Schoolare is responsible for protecting student records, whether they are stored electronically or in paper form. It certain situations when the board deems appropriate the schools may stored student records even after they are no longer needed.",FERPA,FERPA,0.848827
6,"Schools are obligated to inform parents and students of their rights at least once a year. They are also required to announce any changes to the school’s FERPA policy. We adhere to this policy for the most part, although last year we did not announce when we changed the policy to parents.",FERPA,FERPA,0.999724
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”",FERPA,FERPA,0.999724
8,The coach of Minnesota High shared today that the star quarterback of the Minnesota High Buckaneers is not eligible to play because of academic failing. I read this in the local newspaper this morning.,FERPA,none,0.651151
9,"The McDonald's registration form does not give users an opportunity to provide their express and unambiguous consent for marketing communications; In this form, consent is assumed when a user registers for an account",GDPR,GDPR,0.74024
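
The per-document loop is the same for any task; only the prompt template and the column names change. A hedged sketch of wrapping it in a reusable helper: `run_task` and `fake_prompt` are hypothetical names, and the only assumption about the prompt function is that it returns a dict with `response` and `confidence_score` keys, as `tlm.prompt` does in the cells above.

```python
import pandas as pd


def run_task(df, prompt_fn, template, task_name, text_col="document_text"):
    """Apply prompt_fn to every document and return a copy with two new columns."""
    responses, scores = [], []
    for text in df[text_col]:
        answer = prompt_fn(template + text)
        responses.append(answer["response"])
        scores.append(answer["confidence_score"])
    out = df.copy()
    out[f"Cleanlab_TLM_response ({task_name})"] = responses
    out[f"Cleanlab_Trustworthiness_Score ({task_name})"] = scores
    return out


def fake_prompt(prompt):
    """Stand-in for tlm.prompt so the sketch runs without an API key."""
    label = "HIPAA" if "patient" in prompt.lower() else "none"
    return {"response": label, "confidence_score": 0.5}


demo_docs = pd.DataFrame({"document_text": [
    "Patient data is stored on public servers.",
    "Unemployment fell to 5.2%.",
]})
demo_out = run_task(demo_docs, fake_prompt, "Classify this document:\n\n", "Compliance Issue")
print(demo_out)
```

In the notebook you would pass `tlm.prompt` and `prompt_template` in place of the stand-ins.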


In [159]:
df['OpenAI_LLM_response'] = df['OpenAI_LLM_response'].apply(lambda x: x.lower() if x == 'None' else x)  # Normalize 'None' to 'none' so it matches the labels
print('Increase LLM Reliability using Cleanlab Trustworthiness scores:\n')
print(f"Base Accuracy: {sum(df['OpenAI_LLM_response'] == df['issue']) / len(df):.1%}\t\t\t\t\t({len(df)} examples)")
sdf = df[df['Cleanlab_Trustworthiness_Score'] > 0.8]
print(f"LLM Accuracy when (Trustworthiness > 0.8): {sum(sdf['issue'] == sdf['OpenAI_LLM_response']) / len(sdf):.1%}\t({len(sdf)} examples)")
sdf = df[df['Cleanlab_Trustworthiness_Score'] > 0.9]
print(f"LLM Accuracy when (Trustworthiness > 0.9): {sum(sdf['issue'] == sdf['OpenAI_LLM_response']) / len(sdf):.1%}\t({len(sdf)} examples)")

Increase LLM Reliability using Cleanlab Trustworthiness scores:

Base Accuracy: 96.3%					(109 examples)
LLM Accuracy when (Trustworthiness > 0.8): 98.0%	(99 examples)
LLM Accuracy when (Trustworthiness > 0.9): 100.0%	(90 examples)
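
The same accuracy-versus-threshold comparison can be wrapped in a small function so it is easy to rerun with other cutoffs. This is a sketch on toy data: the function name `accuracy_by_threshold`, the column names, and the values in `toy` are assumptions for illustration and are not the notebook's results.

```python
import pandas as pd


def accuracy_by_threshold(df, label_col, response_col, score_col,
                          thresholds=(0.0, 0.8, 0.9)):
    """Accuracy of responses among rows whose score is strictly above each threshold."""
    # Compare case-insensitively so 'None' and 'none' count as the same answer
    correct = (df[response_col].str.strip().str.lower()
               == df[label_col].str.strip().str.lower())
    scores = pd.to_numeric(df[score_col], errors="coerce")
    rows = []
    for t in thresholds:
        kept = scores > t
        n = int(kept.sum())
        acc = float(correct[kept].mean()) if n else float("nan")
        rows.append({"threshold": t, "n_examples": n, "accuracy": acc})
    return pd.DataFrame(rows)


# Toy data, made up for illustration
toy = pd.DataFrame({
    "issue": ["HIPAA", "HIPAA", "FERPA", "none"],
    "response": ["HIPAA", "none", "FERPA", "None"],
    "score": [0.82, 0.60, 0.99, 0.95],
})
summary = accuracy_by_threshold(toy, "issue", "response", "score")
print(summary)
```

Raising the threshold trades coverage (fewer examples kept) for accuracy on the examples that remain, which is the pattern the printed results above show.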


# Remember this works for arbitrary tasks. Let's try another task: Automated Stock Analysis

In [222]:
prompt_template2 = \
'''Imagine you are the best stock broker on Wall Street. You have read the last thirty
years of stock reports and you are the most accurate stock broker in the world at
determining if a publicly traded company is a buy, hold, or sell.
Based on the information in the document, name the most relevant publicly traded
company and whether you believe the company is a buy, hold, or sell.
Please restrict your answer to the name of a single company, followed by a comma,
followed by one of the following words: buy, hold, sell
Your answer should have no punctuation.
Please be as accurate as possible, the world depends on it.\n\nDocument below here:\n\n'''
print(prompt_template2 + df.at[0, 'document_text'])

Imagine you are the best stock broker on Wall Street. You have read the last thirty
years of stock reports and you are the most accurate stock broker in the world at
determining if a publicly traded company is a buy, hold, or sell.
Based on the information in the document, name the most relevant publicly traded
company and whether you believe the company is a buy, hold, or sell.
Please restrict your answer to the name of a single company, followed by a comma,
followed by one of the following words: buy, hold, sell
Your answer should have no punctuation.
Please be as accurate as possible, the world depends on it.

Document below here:

All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.


In [None]:
df['Cleanlab_TLM_response (Stock Analysis)'], df['Cleanlab_Trustworthiness_Score (Stock Analysis)'] = None, None
for i, row in df.iterrows():
    if i <= 20: display.display(df.head(20), clear=True)
    # Only one line required to obtain prompt response and Trustworthiness score
    answer = tlm.prompt(prompt_template2 + row['document_text'])
    # Add results to dataset
    df.at[i, 'Cleanlab_TLM_response (Stock Analysis)'] = answer['response']
    df.at[i, 'Cleanlab_Trustworthiness_Score (Stock Analysis)'] = answer['confidence_score']
    time.sleep(0.1)

Unnamed: 0,document_text,issue,Cleanlab_TLM_response (Compliance Issue),Cleanlab_Trustworthiness_Score (Compliance Issue),Cleanlab_TLM_response (Stock Analysis),Cleanlab_Trustworthiness_Score (Stock Analysis)
0,All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.,HIPAA,HIPAA,0.820183,"Sorry, but I can't generate a relevant answer to this question.",0.673012
1,Patient data will be stored in secure s3 buckets with access only by the company's employees. It is possible that employees who are not authorized to access the data can still access the data.,HIPAA,none,0.595816,"Sorry, but I cannot provide an answer to the question about the most relevant publicly traded company based on the information given.",0.633463
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed.",HIPAA,HIPAA,0.993623,"Sorry, but I'm unable to provide you with the information you're looking for.",0.728213
3,"To ensure you receive timely feedback, our clinicians may work after-hours and use their personal computer to access PHI. We cannot guarantee that their personal compute is located in a secure location.",HIPAA,HIPAA,0.945212,"Sorry, but I can't provide the information you're looking for.",0.598899
4,"Fayette county public schools office may release student records in certain situations without student consent, including: Accidentally or purposefully emailing student information to unauthorized parties, Sharing a student athlete’s academic status, Sharing a student’s grades or identifying information with unauthorized parties, or including a student’s social security number in shared documents.",FERPA,FERPA,0.993724,"Unfortunately, the given document does not provide any information that is relevant to determining whether a publicly traded company is a buy, hold, or sell. Therefore, I cannot provide a specific company recommendation based on this information.",0.702643
5,"The faculty and proceeding council of Marlborough Schoolare is responsible for protecting student records, whether they are stored electronically or in paper form. It certain situations when the board deems appropriate the schools may stored student records even after they are no longer needed.",FERPA,FERPA,0.848827,No relevant publicly traded company can be determined from the information provided.,0.633591
6,"Schools are obligated to inform parents and students of their rights at least once a year. They are also required to announce any changes to the school’s FERPA policy. We adhere to this policy for the most part, although last year we did not announce when we changed the policy to parents.",FERPA,FERPA,0.999724,"Sorry, but I can't provide a relevant answer to this question.",0.74219
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”",FERPA,FERPA,0.999724,"There is no relevant company mentioned in the document, therefore I cannot provide an answer.",0.806423
8,The coach of Minnesota High shared today that the star quarterback of the Minnesota High Buckaneers is not eligible to play because of academic failing. I read this in the local newspaper this morning.,FERPA,none,0.651151,"Based on the information provided, there is no relevant publicly traded company mentioned in the document. Therefore, I cannot provide a specific name of a company or a recommendation to buy, hold, or sell.",0.719084
9,"The McDonald's registration form does not give users an opportunity to provide their express and unambiguous consent for marketing communications; In this form, consent is assumed when a user registers for an account",GDPR,GDPR,0.74024,"McDonald's, sell",0.308326


In [201]:
df.sort_values(by='Cleanlab_Trustworthiness_Score (Stock Analysis)', ascending=False)[['document_text', 'Cleanlab_TLM_response (Stock Analysis)', 'Cleanlab_Trustworthiness_Score (Stock Analysis)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Stock Analysis),Cleanlab_Trustworthiness_Score (Stock Analysis)
18,"For the full-year, TimeWarner posted a profit of $3.36bn, up 27% from its 2003 performance, while revenues grew 6.4% to $42.09bn. ""Our financial performance was strong, meeting or exceeding all of our full-year objectives and greatly enhancing our flexibility,"" chairman and chief executive Richard Parsons said. For 2005, TimeWarner is projecting operating earnings growth of around 5%, and also expects higher revenue and wider profit margins. T","TimeWarner, buy",0.906766
14,"Ad sales boost Time Warner profit Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales.","Time Warner, buy",0.867871
33,British Airways has blamed high fuel prices for a 40% drop in profits.,"British Airways, sell",0.864192


In [202]:
df.sort_values(by='Cleanlab_Trustworthiness_Score (Stock Analysis)', ascending=True)[['document_text', 'Cleanlab_TLM_response (Stock Analysis)', 'Cleanlab_Trustworthiness_Score (Stock Analysis)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Stock Analysis),Cleanlab_Trustworthiness_Score (Stock Analysis)
31,Yukos had filed for bankruptcy protection in a US court in an attempt to prevent the forced sale of its main production arm. The sale went ahead in December and Yugansk was sold to a little-known shell company which in turn was bought by Rosneft.,"Rosneft, sell",0.212752
15,"TimeWarner said fourth quarter sales rose 2% to $11.1bn from $10.9bn. Its profits were buoyed by one-off gains which offset a profit dip at Warner Bros, and less users for AOL. Time Warner said on Friday that it now owns 8% of search-engine Google.","TimeWarner, hold",0.222675
60,"Allied Domecq's big names include Malibu rum, Courvoisier brandy, Stolichnaya vodka and Ballantine's whisky - as well as snack food chains such as Dunkin' Donuts and Baskin-Robbins ice cream.","Allied Domecq, buy",0.237881
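
Many stock-analysis responses are refusals rather than the requested "company, action" format, so it can help to parse the responses before using them. A minimal sketch, assuming the format requested in `prompt_template2`; the helper name `parse_stock_answer` and the regular expression are illustrative choices, not part of TLM:

```python
import re

# Matches "<company>, <buy|hold|sell>" with an optional trailing period
_ANSWER_RE = re.compile(r"\s*(.+?)\s*,\s*(buy|hold|sell)\s*\.?\s*", re.IGNORECASE)


def parse_stock_answer(response):
    """Return (company, action) if the response follows the format, else None."""
    match = _ANSWER_RE.fullmatch(response)
    if match is None:
        return None
    return match.group(1), match.group(2).lower()


for text in [
    "TimeWarner, buy",
    "McDonald's, sell",
    "Sorry, but I can't generate a relevant answer to this question.",
]:
    print(repr(text), "->", parse_stock_answer(text))
```

Rows where the parser returns `None` can be dropped or routed to review independently of their trustworthiness score.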


# Create marketing titles for each document

In [223]:
prompt_template3 = \
'''Imagine you are the best click-bait title writer in the world. You have been the CMO
for The New York Times and Verve and in all of those years, you learned the secret
to writing titles of online articles that have the highest click through rates among
all your peers. Your titles are so captivating, even people who normally don't click
on news articles can't help but click on yours when they see your headline.
Based on the information in the document, please create a short click-bait title that
is sure to make the article go viral. This title will be posted online and cannot be
longer than 8 words.
Your answer should have no punctuation.
Please be as accurate as possible, the world depends on it.\n\nDocument below here:\n\n'''
print(prompt_template3 + df.at[0, 'document_text'])

Imagine you are the best click-bait title writer in the world. You have been the CMO
for The New York Times and Verve and in all of those years, you learned the secret
to writing titles of online articles that have the highest click through rates among
all your peers. Your titles are so captivating, even people who normally don't click
on news articles can't help but click on yours when they see your headline.
Based on the information in the document, please create a short click-bait title that
is sure to make the article go viral. This title will be posted online and cannot be
longer than 8 words.
Your answer should have no punctuation.
Please be as accurate as possible, the world depends on it.

Document below here:

All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.


In [188]:
df['Cleanlab_TLM_response (Clickbait Title)'], df['Cleanlab_Trustworthiness_Score (Clickbait Title)'] = None, None
for i, row in df.iterrows():
    if i <= 20: display.display(df.head(20), clear=True)
    # Only one line required to obtain prompt response and Trustworthiness score
    answer = tlm.prompt(prompt_template3 + row['document_text'])
    # Add results to dataset
    df.at[i, 'Cleanlab_TLM_response (Clickbait Title)'] = answer['response']
    df.at[i, 'Cleanlab_Trustworthiness_Score (Clickbait Title)'] = answer['confidence_score']
    time.sleep(1)

Unnamed: 0,document_text,issue,Cleanlab_TLM_response (Compliance Issue),Cleanlab_Trustworthiness_Score (Compliance Issue),Cleanlab_TLM_response (Stock Analysis),Cleanlab_Trustworthiness_Score (Stock Analysis),Cleanlab_TLM_response (Clickbait Title),Cleanlab_Trustworthiness_Score (Clickbait Title)
0,All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.,HIPAA,HIPAA,0.820183,"Sorry, but I can't generate a relevant answer to this question.",0.673012,"""Shocking! Your Medical Records Exposed for All!""",0.751675
1,Patient data will be stored in secure s3 buckets with access only by the company's employees. It is possible that employees who are not authorized to access the data can still access the data.,HIPAA,none,0.595816,"Sorry, but I cannot provide an answer to the question about the most relevant publicly traded company based on the information given.",0.633463,Shocking privacy breach: Unauthorized employees accessing patient data!,0.498725
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed.",HIPAA,HIPAA,0.993623,"Sorry, but I'm unable to provide you with the information you're looking for.",0.728213,"""Shocking truth: Your private information is at risk!""",0.462355
3,"To ensure you receive timely feedback, our clinicians may work after-hours and use their personal computer to access PHI. We cannot guarantee that their personal compute is located in a secure location.",HIPAA,HIPAA,0.945212,"Sorry, but I can't provide the information you're looking for.",0.598899,"""Shocking Truth: Your Personal Info at Risk!""",0.395253
4,"Fayette county public schools office may release student records in certain situations without student consent, including: Accidentally or purposefully emailing student information to unauthorized parties, Sharing a student athlete’s academic status, Sharing a student’s grades or identifying information with unauthorized parties, or including a student’s social security number in shared documents.",FERPA,FERPA,0.993724,"Unfortunately, the given document does not provide any information that is relevant to determining whether a publicly traded company is a buy, hold, or sell. Therefore, I cannot provide a specific company recommendation based on this information.",0.702643,"""Shocking! School secrets exposed: What you need to know""",0.683183
5,"The faculty and proceeding council of Marlborough Schoolare is responsible for protecting student records, whether they are stored electronically or in paper form. It certain situations when the board deems appropriate the schools may stored student records even after they are no longer needed.",FERPA,FERPA,0.848827,No relevant publicly traded company can be determined from the information provided.,0.633591,"""School Secrets: Records stored forever! Shocking truth revealed!""",0.61586
6,"Schools are obligated to inform parents and students of their rights at least once a year. They are also required to announce any changes to the school’s FERPA policy. We adhere to this policy for the most part, although last year we did not announce when we changed the policy to parents.",FERPA,FERPA,0.999724,"Sorry, but I can't provide a relevant answer to this question.",0.74219,"""Schools' secret policy change shocks parents!""",0.684041
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”",FERPA,FERPA,0.999724,"There is no relevant company mentioned in the document, therefore I cannot provide an answer.",0.806423,"""Shocking Exception: Schools Can Share Letters of Recommendation!""",0.612641
8,The coach of Minnesota High shared today that the star quarterback of the Minnesota High Buckaneers is not eligible to play because of academic failing. I read this in the local newspaper this morning.,FERPA,none,0.651151,"Based on the information provided, there is no relevant publicly traded company mentioned in the document. Therefore, I cannot provide a specific name of a company or a recommendation to buy, hold, or sell.",0.719084,"""Star QB benched for shocking reason!""",0.889255
9,"The McDonald's registration form does not give users an opportunity to provide their express and unambiguous consent for marketing communications; In this form, consent is assumed when a user registers for an account",GDPR,GDPR,0.74024,"McDonald's, sell",0.308326,McDonald's Sneaky Marketing Secrets Exposed!,0.694395


In [203]:
# Three clickbait titles with the highest trustworthiness scores
df.sort_values(by='Cleanlab_Trustworthiness_Score (Clickbait Title)', ascending=False)[['document_text', 'Cleanlab_TLM_response (Clickbait Title)', 'Cleanlab_Trustworthiness_Score (Clickbait Title)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Clickbait Title),Cleanlab_Trustworthiness_Score (Clickbait Title)
100,"Nevertheless, 2.2 million Ethiopians will still need emergency assistance.","""Shocking: 2.2 million Ethiopians in desperate need!""",0.906926
103,"In eastern and southern Ethiopia, a prolonged drought has killed crops and drained wells.","""Devastating drought wreaks havoc in Ethiopia""",0.897601
8,The coach of Minnesota High shared today that the star quarterback of the Minnesota High Buckaneers is not eligible to play because of academic failing. I read this in the local newspaper this morning.,"""Star QB benched for shocking reason!""",0.889255


In [204]:
# Three clickbait titles with the lowest trustworthiness scores
df.sort_values(by='Cleanlab_Trustworthiness_Score (Clickbait Title)', ascending=True)[['document_text', 'Cleanlab_TLM_response (Clickbait Title)', 'Cleanlab_Trustworthiness_Score (Clickbait Title)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Clickbait Title),Cleanlab_Trustworthiness_Score (Clickbait Title)
43,"For the year to March 2005, the total revenue outlook is slightly better than previous guidance with a 3% to 3.5% improvement anticipated, BA chairman Martin Broughton said.","""Unbelievable Revenue Boost - Expert Reveals Game-Changing Secret!""",0.234234
53,"Allied Domecq shares in London rose 4% by 1200 GMT, while Pernod shares in Paris slipped 1.2%.","""Stock Shock: Shares in London Skyrocket while Paris Falls""",0.271785
11,"TechTarget's Cookies Policy includes the following terminology: ""By continuing to use the site, you agree to the use of cookies.""",The Secret to Getting Everyone to Click!,0.282176
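
The lowest-scoring titles are the ones most worth a human look. Below is a minimal sketch of routing such rows to review. The helper name `flag_low_trust` and the 0.3 cutoff are hypothetical, chosen only for illustration (not a Cleanlab recommendation); the sample rows reuse indices and scores from the two outputs above.

```python
import pandas as pd

SCORE_COL = 'Cleanlab_Trustworthiness_Score (Clickbait Title)'

def flag_low_trust(frame, score_col, threshold=0.3):
    """Return rows scoring below `threshold`, least trustworthy first.

    Hypothetical helper; the 0.3 default is an arbitrary illustrative cutoff.
    """
    # Score columns in this notebook start out as None, so coerce to numeric first
    scores = pd.to_numeric(frame[score_col], errors='coerce')
    order = scores[scores < threshold].sort_values().index
    return frame.loc[order]

# Tiny stand-in for `df`, using indices and scores from the outputs above
sample = pd.DataFrame(
    {SCORE_COL: [0.906926, 0.234234, 0.271785, 0.282176, 0.889255]},
    index=[100, 43, 53, 11, 8],
)
needs_review = flag_low_trust(sample, SCORE_COL)
print(needs_review)
```

A suitable cutoff depends on the task, so it is worth inspecting a few rows on either side of the threshold before relying on it.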


# Adding another column: Summarization with a focus on concerns

In [213]:
prompt_template4 = \
'''Imagine you are the chief of staff for the President of the United States.
The President has asked you to give a five word summary of the following
document. It is extremely important that your five word summary is as accurate
as possible. In your summary, include the most likely compliance, policy, security,
or legal issue that would be important to a national leader to be aware of.
Please answer in five words or less. For every word you go over five words, it will cost
the United States government ten trillion dollars and you will be fired. Five words max.
Please be as accurate as possible, the world depends on it.\n\nDocument below here:\n\n'''
print(prompt_template4 + df.at[0, 'document_text'])

Imagine you are the chief of staff for the President of the United States.
The President has asked you to give a five word summary of the following
document. It is extremely important that your five word summary is as accurate
as possible. In your summary, include the most likely compliance, policy, security,
or legal issue that would be important to a national leader to be aware of.
Please answer in five words or less. For every word you go over five words, it will cost
the United States government ten trillion dollars and you will be fired. Five words max.
Please be as accurate as possible, the world depends on it.

Document below here:

All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.


In [224]:
# Initialize the output columns
df['Cleanlab_TLM_response (Summarization)'], df['Cleanlab_Trustworthiness_Score (Summarization)'] = None, None
for i, row in df.iterrows():
    # Refresh the preview of the first 20 rows while they are being filled in
    if i <= 20:
        display.display(df.head(20), clear=True)
    # Only one line required to obtain the prompt response and trustworthiness score
    answer = tlm.prompt(prompt_template4 + row['document_text'])
    # Add results to the dataset
    df.at[i, 'Cleanlab_TLM_response (Summarization)'] = answer['response']
    df.at[i, 'Cleanlab_Trustworthiness_Score (Summarization)'] = answer['confidence_score']
    time.sleep(1)  # brief pause between requests

Unnamed: 0,document_text,issue,Cleanlab_TLM_response (Compliance Issue),Cleanlab_Trustworthiness_Score (Compliance Issue),Cleanlab_TLM_response (Stock Analysis),Cleanlab_Trustworthiness_Score (Stock Analysis),Cleanlab_TLM_response (Clickbait Title),Cleanlab_Trustworthiness_Score (Clickbait Title),Cleanlab_TLM_response (Summarization),Cleanlab_Trustworthiness_Score (Summarization)
0,All medical health records will be accessed one way only. The patient's medical data will be stored on unencrypted public servers at the discretion of the enterprise customer.,HIPAA,HIPAA,0.820183,"Sorry, but I can't generate a relevant answer to this question.",0.673012,"""Shocking! Your Medical Records Exposed for All!""",0.751675,Medical records on public servers.\nSecurity concern.,0.773085
1,Patient data will be stored in secure s3 buckets with access only by the company's employees. It is possible that employees who are not authorized to access the data can still access the data.,HIPAA,none,0.595816,"Sorry, but I cannot provide an answer to the question about the most relevant publicly traded company based on the information given.",0.633463,Shocking privacy breach: Unauthorized employees accessing patient data!,0.498725,Data breaches can jeopardize security.,0.8329
2,"If someone forgets a document on a table somewhere or leaves patient information on their desktop, it might end up getting into the wrong hands. If information is no longer used, we do not guarantee to delete/shred it if the document itself is of no more use. It is possible that the not all PHI information is disposed of after it is no longer needed.",HIPAA,HIPAA,0.993623,"Sorry, but I'm unable to provide you with the information you're looking for.",0.728213,"""Shocking truth: Your private information is at risk!""",0.462355,"Data security risks, potential PHI breaches.",0.791906
3,"To ensure you receive timely feedback, our clinicians may work after-hours and use their personal computer to access PHI. We cannot guarantee that their personal compute is located in a secure location.",HIPAA,HIPAA,0.945212,"Sorry, but I can't provide the information you're looking for.",0.598899,"""Shocking Truth: Your Personal Info at Risk!""",0.395253,After-hours PHI access raises security concerns.,0.826763
4,"Fayette county public schools office may release student records in certain situations without student consent, including: Accidentally or purposefully emailing student information to unauthorized parties, Sharing a student athlete’s academic status, Sharing a student’s grades or identifying information with unauthorized parties, or including a student’s social security number in shared documents.",FERPA,FERPA,0.993724,"Unfortunately, the given document does not provide any information that is relevant to determining whether a publicly traded company is a buy, hold, or sell. Therefore, I cannot provide a specific company recommendation based on this information.",0.702643,"""Shocking! School secrets exposed: What you need to know""",0.683183,Student records release threatens privacy.,0.810981
5,"The faculty and proceeding council of Marlborough Schoolare is responsible for protecting student records, whether they are stored electronically or in paper form. It certain situations when the board deems appropriate the schools may stored student records even after they are no longer needed.",FERPA,FERPA,0.848827,No relevant publicly traded company can be determined from the information provided.,0.633591,"""School Secrets: Records stored forever! Shocking truth revealed!""",0.61586,"Student records protection, retention, compliance.",0.857709
6,"Schools are obligated to inform parents and students of their rights at least once a year. They are also required to announce any changes to the school’s FERPA policy. We adhere to this policy for the most part, although last year we did not announce when we changed the policy to parents.",FERPA,FERPA,0.999724,"Sorry, but I can't provide a relevant answer to this question.",0.74219,"""Schools' secret policy change shocks parents!""",0.684041,Compliance: FERPA policy announcement missed.,0.818747
7,"Letters of recommendation typically qualify as student records. In order to send a letter from a teacher at one school to the registrar at another, you might expect that schools would need signed consent from parents (if students are under 18) or students themselves (if 18 or older) to comply with FERPA. But under section 34 CFR § 99.31 of the Act, there’s an exception for this sort of record sharing. During potential transfers, educational institutions don’t need consent to send letters of recommendation to the destination school. This exception, however, doesn’t apply to sharing letters of recommendation outside of the educational system. “If [a school official] were sending a letter of recommendation to a potential employer, that official would need consent,” Rooker says. “There’s not an exception that lets [school staff] provide information from the student’s record to a potential employer.”",FERPA,FERPA,0.999724,"There is no relevant company mentioned in the document, therefore I cannot provide an answer.",0.806423,"""Shocking Exception: Schools Can Share Letters of Recommendation!""",0.612641,"Letters of recommendation exempt from consent, except for sharing with potential employers.",0.739869
8,The coach of Minnesota High shared today that the star quarterback of the Minnesota High Buckaneers is not eligible to play because of academic failing. I read this in the local newspaper this morning.,FERPA,none,0.651151,"Based on the information provided, there is no relevant publicly traded company mentioned in the document. Therefore, I cannot provide a specific name of a company or a recommendation to buy, hold, or sell.",0.719084,"""Star QB benched for shocking reason!""",0.889255,QB ineligible due to academic failing.,0.876648
9,"The McDonald's registration form does not give users an opportunity to provide their express and unambiguous consent for marketing communications; In this form, consent is assumed when a user registers for an account",GDPR,GDPR,0.74024,"McDonald's, sell",0.308326,McDonald's Sneaky Marketing Secrets Exposed!,0.694395,Insufficient consent for McDonald's marketing.,0.832714
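
The prompt asks for five words at most, but nothing in the loop enforces that, and some summaries above run longer (the one for row 7 has twelve whitespace-separated words). Below is a minimal sketch of checking the limit after the fact. The helper names `word_count` and `exceeds_limit` are hypothetical, and counting whitespace-separated tokens is an assumed definition of "word".

```python
def word_count(text):
    """Count whitespace-separated tokens; newlines also act as separators."""
    return len(str(text).split())

def exceeds_limit(text, max_words=5):
    """True if `text` has more than `max_words` tokens (hypothetical helper)."""
    return word_count(text) > max_words

# Three summaries copied from the output above, keyed by row index
summaries = {
    0: "Medical records on public servers.\nSecurity concern.",
    1: "Data breaches can jeopardize security.",
    7: "Letters of recommendation exempt from consent, except for sharing with potential employers.",
}
too_long = [i for i, s in summaries.items() if exceeds_limit(s)]
print(too_long)
```

On the full dataset the same check could be applied with `df['Cleanlab_TLM_response (Summarization)'].map(exceeds_limit)`.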


In [216]:
# Three summaries with the highest trustworthiness scores
df.sort_values(by='Cleanlab_Trustworthiness_Score (Summarization)', ascending=False)[['document_text', 'Cleanlab_TLM_response (Summarization)', 'Cleanlab_Trustworthiness_Score (Summarization)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Summarization),Cleanlab_Trustworthiness_Score (Summarization)
96,"In the year to March 2004, the Indian economy grew by 8.5%.",Indian economy grew by 8.5%.,0.940415
34,"Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier.",Decreased profit raises economic concerns.,0.939176
73,"According to Labor Department figures, US firms added only 146,000 jobs in January.",Job growth slowed in January.,0.938762


In [217]:
# Three summaries with the lowest trustworthiness scores
df.sort_values(by='Cleanlab_Trustworthiness_Score (Summarization)', ascending=True)[['document_text', 'Cleanlab_TLM_response (Summarization)', 'Cleanlab_Trustworthiness_Score (Summarization)']].head(3)

Unnamed: 0,document_text,Cleanlab_TLM_response (Summarization),Cleanlab_Trustworthiness_Score (Summarization)
50,"For example, we have taken delivery of six Airbus A321 aircraft and next month we will start further improvements to our Club World flat beds. BA's shares closed up four pence at 274.5 pence.",Delivery of Airbus A321 aircraft.,0.363029
12,"This is the old, previous intro to the Privacy Policy for USA Citizen and Immigration Services. The language is unnecessarily complex and dense",Privacy policy needs simplification.,0.546113
58,"Last year Pernod tried to buy Glenmorangie, one of Scotland's premier whisky firms, but lost out to luxury goods firm LVMH.",Failed attempt to buy whisky.,0.601867
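
The summarization loop follows a pattern that could be reused for other prompts: build a prompt per document, call the model, and store the response alongside its score. Below is a sketch of that pattern as a helper. The names `enrich` and `fake_prompt` are hypothetical; `fake_prompt` is a stub standing in for `tlm.prompt` so the block runs without an API key. The only assumption about `tlm.prompt` is the return shape shown earlier in this notebook: a dict with `response` and `confidence_score` keys.

```python
def enrich(texts, prompt_template, prompt_fn):
    """Call `prompt_fn` on template + text for each text; return (responses, scores)."""
    responses, scores = [], []
    for text in texts:
        answer = prompt_fn(prompt_template + text)
        responses.append(answer['response'])
        scores.append(answer['confidence_score'])
    return responses, scores

def fake_prompt(prompt):
    """Stub for tlm.prompt: NOT the real model. Echoes the document, fixed score."""
    document = prompt.split('\n\n')[-1]
    return {'response': document[:20], 'confidence_score': 0.5}

docs = ["First document.", "Second document."]
responses, scores = enrich(docs, "Summarize:\n\n", fake_prompt)
print(responses, scores)
```

With the real client, the two returned lists could be assigned to a pair of columns, for example `df[response_col], df[score_col] = enrich(df['document_text'], prompt_template4, tlm.prompt)`, where `response_col` and `score_col` are placeholder names for the column labels.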
