In [1]:
import torch
import transformers
from transformers import AutoTokenizer
from  langchain import LLMChain, HuggingFacePipeline, PromptTemplate
import pandas as pd
import guidance

## Summary Falsification

In [5]:
df1 = pd.read_csv('../data/sample_input_for_checker1.csv')
df1.head()

Unnamed: 0.1,Unnamed: 0,summary,text
0,0,The United States Securities and Exchange Comm...,NOTICE: Attorneys MUST Indicate All Re-filed C...
1,1,"According to the Commission's complaint, the d...",The Defendants have engaged in a fraudulent Po...
2,2,"Also on December 29, 2008 Judge Donald M. Midd...",NATURE OF SUIT (Place an “x” in One Box Ont 4 ...
3,3,The Commission's complaint alleges that starti...,21. The investment clubs pool investor funds a...
4,4,"As part of the scheme, the defendants direct i...",NOTICE: Attorneys MUST Indicate All Re-filed C...


In [6]:
print(df1.iloc[1]['text'])


The Defendants have engaged in a fraudulent Ponzi scheme primarily targeting the US Haitian community since at least November 2007.This includes net transfers of at least $1.7 million to his personal bank accounts, cash withdrawals of more than $1.5 million and more than $600,000 for apparent personal expenses such as two luxury vehicles, credit card bills, a wedding payment, and a house down payment.21. The investment clubs pool investor funds and send them to Creative Capital for a 90-day period, during which Theodule purportedly trades stocks and options on behalf of the investment club members.Page 2 of 10 
$15.2 million collected from new investors in typical Ponzi scheme fashion.Page 4 of 10 
made millionaires out of a significant number of people in the time it had taken her to decide to invest, and pressured her to liquidate the equity in her home to invest with him.14. Theodule ingratiates himself with investors by claiming he recently decided to offer his investment expertise

In [7]:
print(df1.iloc[1]['summary'])

According to the Commission's complaint, the defendants raised at least $23.4 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.


In [6]:
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)

In [7]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    do_sample=True,
    top_k=10,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    device_map=0,
    temperature=0.9
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [8]:
llm = HuggingFacePipeline(pipeline = pipeline)

In [9]:
falsify_template = """

Given the input text, manipulate its content to produce a totally falsified version. 
Ensure that the falsified text is coherent, grammatically correct, and appears plausible. 
Use dependency-based manipulations such as changing subjects, objects, or inverting relationships to craft the new falsified text.
Answer the falsified text only, explaination is not required.

Input text: ```{reference_summary}```
Falsified text (No explanation required):
"""


In [10]:
falsify_template2 = """

Given the input text, manipulate the content and falsify the all facts to produce a totally falsified version. 
Ensure that the falsified text is coherent, grammatically correct, and appears plausible. 
Answer the falsified text only, explaination is not required.

Input text: ```{reference_summary}```
Falsified text (No explanation required):
"""

In [11]:
falsify_template3 = """

Generate a completely falsified version of the input text by altering the facts presented. The result should be coherent and grammatically correct, while also maintaining a semblance of plausibility. It should not be an outright absurd or impossible scenario but should represent a believable, though untrue, alternative to the actual facts.

Input text: {reference_summary}

Falsified text (No explanation required):
"""


In [8]:
print(df1['summary'].iloc[1])

According to the Commission's complaint, the defendants raised at least $23.4 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.


In [9]:
reference_summary = df1['summary'].iloc[1]

In [14]:
falsify_prompt = PromptTemplate(template=falsify_template2, input_variables=["reference_summary"])
llm_chain = LLMChain(llm=llm, prompt=falsify_prompt)

In [15]:
output = llm_chain.run(reference_summary)
print(output)

```According to the Commission's complaint, the defendants raised at least $234 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.```


In [16]:
false_summary = output

## Checking by llama2 (without guidance)

In [15]:
template = """

You are a compliance officer who works at a financial institution. You will be provided with a summary sentence and a set of source sentences. 
Check if the summary sentence is a good summary of the source sentences from Named Entity and Named Entity Relationship perspectives.
Please answer either "True" or "False" only, explaination is not needed.

Source sentences: ```{source}```
Summary sentence: ```{summary}```

Final Answer (True/False only):
           """

In [17]:
template2 = """

You are a compliance officer who works at a financial institution. You will be provided with a suspicious summary sentence and a set of broken source sentences from a financial document. 
Clean up the source sentences first and check if the summary sentence follow every standards:
1. The summary sentence can be summarized from source sentences with no factual error especially on numbers.
2. All Name Entities in summary sentence is also in source sentences.
3. All relationships between each entity in summary sentence should exist in source sentences.
4. The directions of all relationships between each name entites in summary sentence should matched up the relationships in source sentences.
5. The summary sentence should not have any factual error compare with source sentences.
6. There should not be any made-up entities in summary sentence.

Answer false if any of the above standards is violated, otherwise answer true.
Please answer either "True" or "False" only, explaination is not needed.

Summary sentence: ```{summary}```

Source sentences: ```{source}```

Final Answer (True/False only):
           """

In [30]:
template3 = """

You are a compliance officer at a financial institution evaluating a summary sentence against source sentences from a financial document. Ensure the summary adheres to these criteria:

1. It accurately represents the source, especially numerical data.
2. It contains only named entities present in the source.
3. It reflects existing relationships between entities as in the source.
4. It preserves the direction of these relationships accurately.
5. It is free of factual errors in comparison with the source.
6. It introduces no fictitious entities.
Your task is to determine if the summary meets all the above standards based solely on the given sentences.

Please respond with "True" or "False" without further explanation.

Source sentences: {source}

Summary sentence: {summary}

Final Answer (True/False):
           """

In [58]:
template4 = """

Evaluate the compliance of a summary sentence derived from a set of sentences in a financial document. Adhere to the following verification standards:
1. Entity consistency: Check that all named entities in the summary are extracted from the source.
2. Relationship verification: Confirm that relationships between entities in the summary are present and correctly depicted in the source.
3. Directionality check: Ensure that the direction of relationships between entities in the summary matches those in the source.
4. Factual integrity: Ascertain that the summary is free from factual errors when compared to the source.
5. Entity authenticity: Confirm that the summary does not create non-existent entities.

Based on these criteria, determine if the summary sentence is a faithful representation of the source sentences. Respond with "True" if the summary complies with all standards, or "False" if it does not.

Source Sentences: {source}

Summary Sentence: {summary}

Final Compliance Verification (True/False):
"""


In [62]:
template5 = """

As a compliance officer, verify the accuracy of summary sentences against the corresponding source sentences from a financial document. The summary should:

1. Reflect the source sentences, allowing for slight variations in numbers and time that do not materially change the information.
2. Contain named entities that match those in the source.
3. Accurately present relationships between entities as they exist in the source.
4. Ensure the direction of relationships between entities aligns with the source.
5. Be free of substantial factual errors in comparison to the source.
6. Not include fictional entities or events.
Determine if the summary sentence is a true representation of the source sentences. Issue a "True" for summaries that comply within a reasonable margin for minor discrepancies, or "False" for those that contain material inaccuracies or fabrications.

Source Sentences: {source}

Summary Sentence: {summary}

Compliance Verification (True/False):
"""


In [18]:
source = df1['text'].iloc[1]
summary = false_summary

In [19]:
print(source)

The Defendants have engaged in a fraudulent Ponzi scheme primarily targeting the US Haitian community since at least November 2007.This includes net transfers of at least $1.7 million to his personal bank accounts, cash withdrawals of more than $1.5 million and more than $600,000 for apparent personal expenses such as two luxury vehicles, credit card bills, a wedding payment, and a house down payment.21. The investment clubs pool investor funds and send them to Creative Capital for a 90-day period, during which Theodule purportedly trades stocks and options on behalf of the investment club members.Page 2 of 10 
$15.2 million collected from new investors in typical Ponzi scheme fashion.Page 4 of 10 
made millionaires out of a significant number of people in the time it had taken her to decide to invest, and pressured her to liquidate the equity in her home to invest with him.14. Theodule ingratiates himself with investors by claiming he recently decided to offer his investment expertise

In [20]:
print(summary)

```According to the Commission's complaint, the defendants raised at least $234 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.```


In [21]:
print(df1['summary'].iloc[1])

According to the Commission's complaint, the defendants raised at least $23.4 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.


In [22]:
true_summary = df1['summary'].iloc[1]

In [23]:
prompt = PromptTemplate(template=template2, input_variables=["source", "summary"])
llm_chain = LLMChain(prompt=prompt, 
                     llm=llm)

In [24]:
output = llm_chain.run(source=source, summary=summary)
print(output)

 False


In [25]:
output = llm_chain.run(source=source, summary=true_summary)
print(output)

 True


## Checking by llama2 (with guidance)

In [1]:
import torch
import transformers
from transformers import AutoTokenizer
from  langchain import LLMChain, HuggingFacePipeline, PromptTemplate
import pandas as pd
import guidance

In [2]:

guidance.llm = guidance.llms.transformers.LLaMA("meta-llama/Llama-2-7b-chat-hf", temperature=0.9, num_return_sequences=1)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [3]:
df1 = pd.read_csv('../data/sample_input_for_checker1.csv')
df1.head()

Unnamed: 0.1,Unnamed: 0,summary,text
0,0,The United States Securities and Exchange Comm...,NOTICE: Attorneys MUST Indicate All Re-filed C...
1,1,"According to the Commission's complaint, the d...",The Defendants have engaged in a fraudulent Po...
2,2,"Also on December 29, 2008 Judge Donald M. Midd...",NATURE OF SUIT (Place an “x” in One Box Ont 4 ...
3,3,The Commission's complaint alleges that starti...,21. The investment clubs pool investor funds a...
4,4,"As part of the scheme, the defendants direct i...",NOTICE: Attorneys MUST Indicate All Re-filed C...


In [4]:
print(df1.iloc[1]['text'])

The Defendants have engaged in a fraudulent Ponzi scheme primarily targeting the US Haitian community since at least November 2007.This includes net transfers of at least $1.7 million to his personal bank accounts, cash withdrawals of more than $1.5 million and more than $600,000 for apparent personal expenses such as two luxury vehicles, credit card bills, a wedding payment, and a house down payment.21. The investment clubs pool investor funds and send them to Creative Capital for a 90-day period, during which Theodule purportedly trades stocks and options on behalf of the investment club members.Page 2 of 10 
$15.2 million collected from new investors in typical Ponzi scheme fashion.Page 4 of 10 
made millionaires out of a significant number of people in the time it had taken her to decide to invest, and pressured her to liquidate the equity in her home to invest with him.14. Theodule ingratiates himself with investors by claiming he recently decided to offer his investment expertise

In [5]:
print(df1.iloc[1]['summary'])

According to the Commission's complaint, the defendants raised at least $23.4 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form.


In [6]:
false_summary = "According to the Commission's complaint, the defendants raised at least $234 million from thousands of investors in the Haitian-American community nationwide through a network of purported investment clubs Theodule directs investors to form."

In [8]:
program = guidance("""

Evaluate the compliance of a summary sentence derived from a set of sentences in a financial document. Adhere to the following verification standards:
1. Entity consistency: Check that all named entities in the summary are extracted from the source.
2. Relationship verification: Confirm that relationships between entities in the summary are present and correctly depicted in the source.
3. Directionality check: Ensure that the direction of relationships between entities in the summary matches those in the source.
4. Factual integrity: Ascertain that the summary is free from factual errors when compared to the source.
5. Entity authenticity: Confirm that the summary does not create non-existent entities.

Based on these criteria, determine if the summary sentence is a faithful representation of the source sentences. Respond with "True" if the summary complies with all standards, or "False" if it does not.

Summary sentence: ```{{summary}}```

Source sentences: ```{{source}}```

Final Answer: {{#select "answer" logprobs='logprobs'}} True{{or}} False{{/select}}
""")
executed_program = program(summary=false_summary, source=df1.iloc[1]['text'])


In [9]:
executed_program['logprobs']

{' True': -4.0988688468933105, ' False': -0.016730593517422714}

In [10]:
executed_program['answer']

' False'

In [11]:
guidance.llms.transformers.LLaMA.cache.clear()

In [12]:
executed_program = program(summary=df1.iloc[1]['summary'], source=df1.iloc[1]['text'])

In [13]:
executed_program['logprobs']

{' True': -4.031787395477295, ' False': -0.017901869490742735}

In [14]:
executed_program['answer']

' False'

In [7]:
options = ['True', 'False']
program = guidance("""

You are a compliance officer who works at a financial institution. You will be provided with a suspicious summary sentence and a set of broken source sentences from a financial document. 
Clean up the source sentences first and check if the summary sentence follow every standards:
1. The summary sentence can be summarized from source sentences with no factual error especially on numbers.
2. All Name Entities in summary sentence is also in source sentences.
3. All relationships between each entity in summary sentence should exist in source sentences.
4. The directions of all relationships between each name entites in summary sentence should matched up the relationships in source sentences.
5. The summary sentence should not have any factual error compare with source sentences.
6. There should not be any made-up entities in summary sentence.

Answer false if any of the above standards is violated, otherwise answer true.
Please answer either "True" or "False" only, explaination is not needed.

Summary sentence: ```{{summary}}```

Source sentences: ```{{source}}```

Final Answer (True/False only): {{select "ansewer" logprobs='logprobs' options=options}}
""")


In [8]:
executed_program = program(summary=false_summary, source=df1.iloc[1]['text'], options=options)

In [9]:
executed_program['logprobs']

{'True': -0.480266660451889, 'False': -0.9639550447463989}

In [11]:
executed_program['ansewer']

'True'

In [12]:
guidance.llms.transformers.LLaMA.cache.clear()

In [13]:
executed_program = program(summary=df1.iloc[1]['summary'], source=df1.iloc[1]['text'], options=options)

In [14]:
executed_program['logprobs']

{'True': -0.5186071991920472, 'False': -0.9047308564186096}

In [15]:
executed_program['ansewer']

'True'