# Presentation of Our Methods
The goal of our project is to design a metric that evaluates the quality of a summary given its original text. Based on our discussions and the papers we read, we think a good metric for this task should have the following properties:
1. It should distinguish a good summary from a bad one: their scores should be as far apart as possible.
2. It should discern varying degrees of factual distortion: given any two summaries of the same document, the worse one should receive the lower score.
3. It should base its evaluation on the details of the summary and explain why it produced a given result.

In this notebook, we present our methods and the results we obtained. The presentation is divided into the following parts, each of which addresses the three properties above:
1. Results of baseline metrics
2. Results of our methods
3. Comparison of our methods with baseline metrics
4. Conclusion

In [20]:
# Importing libraries
import os
import pandas as pd
import numpy as np
from tqdm import tqdm
from pipeline import SummaryGrader, NER_comparison, highlight, cos_similariy, Baseline

In [14]:
df_summary = pd.read_csv('falsified_summary.csv', index_col = 0)
df_summary['good_cos_similarity'] = df_summary['bad_cos_similarity'] = df_summary['good_llm_score'] = df_summary['bad_llm_score'] = np.nan
df_summary['good_llm_mismatch'] = df_summary['bad_llm_mismatch'] = ''
df_summary.head()

| | pdf_link | summary | text_extracted | falsified_summary | falsified_index | good_cos_similarity | bad_cos_similarity | good_llm_score | bad_llm_score | good_llm_mismatch | bad_llm_mismatch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.sec.gov//litigation/complaints/200... | CORRECTEDThe Securities and Exchange Commissio... | TRACY L. DAVIS (Cal. Bar No. 184129) Attorne... | CORRECTEDThe Securities and Exchange Commissio... | [2, 5, 6, 8, 9] | | | | | | |
| 1 | https://www.sec.gov//litigation/complaints/200... | The United States Securities and Exchange Comm... | ELECTRONIC \nDEC 29, 2008 \nSTEVEN M, LARIMORE... | The Canadian Securities and Exchange Commissio... | [0, 1, 2, 4, 5, 6, 8, 12, 13, 18] | | | | | | |
| 2 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | 2006 SEP 30 AN 8: 24 \nU.S: COURT MIBDLE GISTR... | The Securities and Exchange Commission ("Commi... | [1, 2, 5, 6, 8, 9] | | | | | | |
| 3 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | IN THE UNITED STATES DISTRICT COURT FOR THE MI... | The Securities and Exchange Commission ("Commi... | [0, 1, 4, 7, 8] | | | | | | |
| 4 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today f... | 08-61524-CIV-DIMITROULEAS/ROSENBAUM \nUNITED S... | The Securities and Exchange Commission today f... | [0, 1, 3, 4, 6, 7] | | | | | | |


## 1. Results of baseline metrics
In our analysis, we use several established metrics that measure the similarity between two texts. Our baseline consists of four metrics, each capturing a different aspect of textual similarity:
  - **Cosine similarity**: the cosine of the angle between the embedding vectors of the two texts. A smaller angle (a cosine closer to 1) indicates greater semantic similarity.
  - **METEOR**: originally designed for machine translation, METEOR aligns unigrams between the two texts using exact matches, stems, and synonyms, combines unigram precision and recall (weighted towards recall), and applies a fragmentation penalty for differences in word order.
  - **BLEU**: also from machine translation, BLEU measures modified n-gram precision, i.e. how many n-grams of the candidate text appear in the reference text, combined with a brevity penalty for overly short candidates.
  - **ROUGE-2**: a recall-oriented metric that measures how many bigrams of the reference summary also appear in the generated summary, which makes it useful for assessing content coverage.
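To make two of these definitions concrete, here is a minimal pure-Python sketch of cosine similarity and ROUGE-2 recall. It is an illustration under simplifying assumptions (lowercased whitespace tokenization, no stemming, embeddings given as plain lists), not the implementation inside our `Baseline` class; the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, e.g. text embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bigrams(tokens):
    """Multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(reference, candidate):
    """Share of reference bigrams that also occur in the candidate, with clipped counts."""
    ref = bigrams(reference.lower().split())
    cand = bigrams(candidate.lower().split())
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / max(sum(ref.values()), 1)


# Hypothetical example: one changed word flips a fact
good = "the commission charged the ceo of acme with fraud in 2008"
bad = "the commission charged the cfo of acme with fraud in 2008"
print(rouge2_recall(good, bad))  # 0.8
print(round(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]), 3))  # 0.997
```

Swapping "ceo" for "cfo" changes who was charged, yet only 2 of the 10 reference bigrams are lost, so ROUGE-2 recall stays at 0.8.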

For each metric, we score the correct summary against the deliberately falsified summary derived from the same source text. A metric that detects the falsification should give this pair a low score.


In [None]:
model = Baseline()
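# .copy() makes an independent frame, avoiding pandas' SettingWithCopyWarning when adding columns
baseline_results = df_summary[['summary', 'falsified_summary']].copy()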
baseline_results['cos_similarity'] = baseline_results.apply(lambda x: model.cal_cos_similarity(x['summary'],x['falsified_summary']), axis=1)
baseline_results['meteor'] = baseline_results.apply(lambda x: model.cal_meteor_score(x['summary'],x['falsified_summary']), axis=1)
baseline_results['bleu'] = baseline_results.apply(lambda x: model.cal_bleu_score(x['summary'],x['falsified_summary']), axis=1)
baseline_results['rouge2'] = baseline_results.apply(lambda x: model.cal_rouge2_score(x['summary'],x['falsified_summary']), axis=1)

In [23]:
baseline_results.head()

| | summary | falsified_summary | cos_similarity | meteor | bleu | rouge2 |
|---|---|---|---|---|---|---|
| 0 | CORRECTEDThe Securities and Exchange Commissio... | CORRECTEDThe Securities and Exchange Commissio... | 0.997728 | 0.958831 | 0.949553 | 0.925121 |
| 1 | The United States Securities and Exchange Comm... | The Canadian Securities and Exchange Commissio... | 0.963819 | 0.9723 | 0.963332 | 0.946882 |
| 2 | The Securities and Exchange Commission ("Commi... | The Securities and Exchange Commission ("Commi... | 0.98978 | 0.973521 | 0.965338 | 0.949947 |
| 3 | The Securities and Exchange Commission ("Commi... | The Securities and Exchange Commission ("Commi... | 0.999462 | 0.820663 | 0.855938 | 0.802348 |
| 4 | The Securities and Exchange Commission today f... | The Securities and Exchange Commission today f... | 0.995189 | 0.95179 | 0.940407 | 0.906122 |


**Result:** All four metrics give high scores, every one above 0.8 in the rows shown, meaning that each falsified summary is rated as nearly identical to its correct counterpart. This exposes a key limitation of the baseline metrics: they largely fail to detect subtle factual errors embedded in a summary.
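A short counting argument suggests why n-gram scores stay this high. Assume, for illustration only, that a falsification substitutes single tokens without changing the length of an $N$-token summary. Each substituted token occurs in at most two of the $N-1$ bigrams of the reference, so after $k$ substitutions ROUGE-2 recall is bounded below by:

$$
\text{ROUGE-2 recall} \;\ge\; \frac{(N-1) - 2k}{N-1}
$$

For example, with $N = 200$ and $k = 5$ the bound is $189/199 \approx 0.95$, even though five facts have changed.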

## 2. Results of our methods
We designed two methods to evaluate the quality of summaries:
1. **Named entity comparison**: `NER_comparison` compares the named entities in the summary with those in the original text. It computes two ratios: the share of named entities in the summary that also appear in the original text, and the share of named entities in the original text that also appear in the summary. Its `.comparison_display()` method highlights the entities in the summary that are missing from the original text, and the entities in the original text that are missing from the summary, so that users can see where the scores come from.
2. **Summary grading based on sentence-level checking**: `SummaryGrader` uses an LLM to check, sentence by sentence, whether the summary is consistent with the original text. Its `.process()` method returns the share of summary sentences judged consistent with the original text, together with the indices of the sentences judged inconsistent. The `highlight()` function then highlights those inconsistent sentences, so that users can see where the score comes from.

### 2.1 Named entity comparison
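The two ratios computed by `NER_comparison` can be sketched as set overlaps. This minimal version assumes the entities have already been extracted as strings; how `NER_comparison` extracts and normalizes them, and in which order `.process()` returns the ratios, is not shown here. `entity_overlap_ratios` and the example entities are hypothetical.

```python
def entity_overlap_ratios(text_entities, summary_entities):
    """Compare the named entities of a summary with those of its original text.

    Returns (share of summary entities found in the text,
             share of text entities found in the summary,
             summary entities missing from the text,
             text entities missing from the summary).
    """
    # Case-insensitive sets, so repeated mentions count once
    text_set = {e.lower() for e in text_entities}
    summary_set = {e.lower() for e in summary_entities}
    shared = summary_set & text_set
    summary_in_text = len(shared) / len(summary_set) if summary_set else 0.0
    text_in_summary = len(shared) / len(text_set) if text_set else 0.0
    # These two lists are what a display method would highlight
    unsupported = sorted(summary_set - text_set)
    omitted = sorted(text_set - summary_set)
    return summary_in_text, text_in_summary, unsupported, omitted


text_entities = ["Securities and Exchange Commission", "Acme Corp", "John Doe", "2008"]
summary_entities = ["Securities and Exchange Commission", "Acme Corp", "Jane Roe"]
print(entity_overlap_ratios(text_entities, summary_entities))
```

A low first ratio flags entities the summary may have invented (here "Jane Roe"); a low second ratio mainly reflects how much of the source the summary leaves out, so it tends to be small for long documents.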

In [None]:
# Example: compare the named entities of the first document with those of its summary
NER_sample = NER_comparison()
sample_original_text = df_summary['text_extracted'].iloc[0]
sample_summary = df_summary['summary'].iloc[0]
NER_sample.process(sample_original_text, sample_summary)

(0.46875, 0.12396694214876036)

### 2.2 Summary grading based on sentence-level checking
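The bookkeeping behind the sentence-level score can be sketched as follows. In `SummaryGrader` the consistency judgment comes from an LLM; here a hypothetical `is_consistent` callable stands in for it (a naive substring check in the example), so the sketch shows only how the ratio and the mismatch indices are derived, not how the LLM is prompted.

```python
from typing import Callable, List, Tuple


def grade_summary(
    summary_sentences: List[str], is_consistent: Callable[[str], bool]
) -> Tuple[float, List[int]]:
    """Return (share of consistent sentences, indices of inconsistent sentences)."""
    mismatches = [i for i, s in enumerate(summary_sentences) if not is_consistent(s)]
    n = len(summary_sentences)
    ratio = (n - len(mismatches)) / n if n else 0.0
    return ratio, mismatches


# Toy stand-in for the LLM judge: a sentence is "consistent" if it appears verbatim in the source
source = "The SEC sued Acme. Acme's CEO resigned in 2008."


def checker(sentence: str) -> bool:
    return sentence in source


summary_sentences = ["The SEC sued Acme.", "Acme's CFO resigned in 2008.", "Acme's CEO resigned in 2008."]
print(grade_summary(summary_sentences, checker))
```

With $m$ sentences of which $j$ are flagged, the score is $(m-j)/m$, and the flagged indices identify which sentences to highlight.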

In [None]:
# TO DO show the result of the class from three properties above

## 3. Comparison of our methods with baseline metrics
In this section, we compare our methods with the baseline metrics from Section 1 with respect to the three properties stated at the beginning, using the same dataset and the same summaries. The results are as follows:

In [None]:
# TO DO
sample_original_text = df_summary['text_extracted'].iloc[1]
sample_summary = df_summary['summary'].iloc[1]
sample_falsify_summary = df_summary['falsified_summary'].iloc[1]
cos_similariy(sample_original_text, sample_summary, sample_falsify_summary)

(0.6970304250717163, 0.659481406211853)

In [None]:
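# Read the OpenAI API key from the environment instead of hard-coding it in the notebook
if 'OPENAI_API_KEY' not in os.environ:
    raise RuntimeError('Set the OPENAI_API_KEY environment variable before running this cell.')
sg = SummaryGrader()
for index, row in tqdm(df_summary.iterrows(), total=len(df_summary)):
    text = row['text_extracted']
    summary = row['summary']
    falsi_summary = row['falsified_summary']
    # Cosine similarity of the original and of the falsified summary to the source text
    good_cos, bad_cos = cos_similariy(text, summary, falsi_summary)
    # LLM sentence-level check: consistency score and indices of inconsistent sentences
    good_score, good_mismatch = sg.evaluate(text, summary, 10)
    bad_score, bad_mismatch = sg.evaluate(text, falsi_summary, 10)
    # iterrows() yields copies, so write the results back into df_summary by label
    df_summary.loc[index, ['good_cos_similarity', 'bad_cos_similarity']] = [good_cos, bad_cos]
    df_summary.loc[index, ['good_llm_score', 'bad_llm_score']] = [good_score, bad_score]
    df_summary.loc[index, 'good_llm_mismatch'] = ','.join(str(e) for e in good_mismatch)
    df_summary.loc[index, 'bad_llm_mismatch'] = ','.join(str(e) for e in bad_mismatch)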

In [None]:
df_summary

| | pdf_link | summary | text_extracted | falsified_summary | falsified_index | good_cos_similarity | bad_cos_similarity | good_llm_score | bad_llm_score | good_llm_mismatch | bad_llm_mismatch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.sec.gov//litigation/complaints/200... | CORRECTEDThe Securities and Exchange Commissio... | TRACY L. DAVIS (Cal. Bar No. 184129) Attorne... | CORRECTEDThe Securities and Exchange Commissio... | [2, 5, 6, 8, 9] | 0.650034 | 0.650034 | 0.636364 | 0.454545 | 28910.0 | 258910 |
| 1 | https://www.sec.gov//litigation/complaints/200... | The United States Securities and Exchange Comm... | ELECTRONIC \nDEC 29, 2008 \nSTEVEN M, LARIMORE... | The Canadian Securities and Exchange Commissio... | [0, 1, 2, 4, 5, 6, 8, 12, 13, 18] | 0.69703 | 0.659481 | 0.842105 | 0.263158 | 161718.0 | 12456810121315161718 |
| 2 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | 2006 SEP 30 AN 8: 24 \nU.S: COURT MIBDLE GISTR... | The Securities and Exchange Commission ("Commi... | [1, 2, 5, 6, 8, 9] | 0.860241 | 0.831173 | 0.818182 | 0.454545 | 10.0 | 568910 |
| 3 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | IN THE UNITED STATES DISTRICT COURT FOR THE MI... | The Securities and Exchange Commission ("Commi... | [0, 1, 4, 7, 8] | 0.793945 | 0.757519 | 0.555556 | 0.166667 | 467.0 | 13467891011 |
| 4 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today f... | 08-61524-CIV-DIMITROULEAS/ROSENBAUM \nUNITED S... | The Securities and Exchange Commission today f... | [0, 1, 3, 4, 6, 7] | 0.859681 | 0.851646 | 0.666667 | 0.222222 | 678.0 | 134678 |
| 5 | https://www.sec.gov//litigation/complaints/200... | On September 30, the Securities and Exchange C... | IN THE UNITED STATES DISTRICT COURT FOR THE EA... | On October 30, the Securities and Exchange Com... | [0, 1, 3, 6, 7] | 0.799057 | 0.812001 | 1.0 | 0.777778 | | 17 |
| 6 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission filed a... | UNITED STATES DISTRICT COURT DISTRICT OF MASSA... | The Federal Trade Commission launched an unres... | [0, 1, 2, 3, 4, 5] | 0.818271 | 0.738413 | 0.636364 | 0.214286 | 18910.0 | 123456781213 |
| 7 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today c... | Scott L. Black (Bar Number 514792) \nAttorney ... | The Securities and Exchange Commission today c... | [6, 7, 8, 9, 10, 11] | 0.750517 | 0.750517 | 0.5 | 0.25 | 23791011.0 | 236789101112131415 |
| 8 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today c... | oOo OD DH FP WYN \n= er SO \nJOHN M. McCOY III... | The Securities and Exchange Commission today c... | [1, 2, 3, 4, 5] | 0.648084 | 0.621816 | 0.375 | 0.181818 | 13457.0 | 1234567810 |
| 9 | https://www.sec.gov//litigation/complaints/200... | On October 23, 2008, the United States Securit... | Robert Long \nAttorney for Plaintiff \nU.S. Se... | On October 23, 2008, the United States Securit... | [7, 8, 9, 10, 11, 12, 13, 14] | 0.809035 | 0.809035 | 0.866667 | 0.333333 | 2.0 | 237891011121314 |
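The next cell measures, for each document, how far a metric's score falls from the original summary to the falsified one, relative to the original summary's score $s_{\text{good}}$:

$$
\Delta_{\text{rel}} = \frac{s_{\text{good}} - s_{\text{bad}}}{s_{\text{good}}}
$$

A positive value means the falsified summary scored lower; a value of zero or below means the metric failed to separate the two. For document 1, for example, the LLM scores give $(0.842105 - 0.263158)/0.842105 \approx 0.6875$, whereas the cosine similarities give only $(0.69703 - 0.659481)/0.69703 \approx 0.0539$.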


In [None]:
df_summary['cos_similarity_percentage'] = (df_summary['good_cos_similarity'] - df_summary['bad_cos_similarity'])/df_summary['good_cos_similarity']
df_summary['llm_score_percentage'] = (df_summary['good_llm_score'] - df_summary['bad_llm_score'])/df_summary['good_llm_score']
df_summary

| | pdf_link | summary | text_extracted | falsified_summary | falsified_index | good_cos_similarity | bad_cos_similarity | good_llm_score | bad_llm_score | good_llm_mismatch | bad_llm_mismatch | cos_similarity_percentage | llm_score_percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.sec.gov//litigation/complaints/200... | CORRECTEDThe Securities and Exchange Commissio... | TRACY L. DAVIS (Cal. Bar No. 184129) Attorne... | CORRECTEDThe Securities and Exchange Commissio... | [2, 5, 6, 8, 9] | 0.650034 | 0.650034 | 0.636364 | 0.454545 | 28910.0 | 258910 | 0.0 | 0.285714 |
| 1 | https://www.sec.gov//litigation/complaints/200... | The United States Securities and Exchange Comm... | ELECTRONIC \nDEC 29, 2008 \nSTEVEN M, LARIMORE... | The Canadian Securities and Exchange Commissio... | [0, 1, 2, 4, 5, 6, 8, 12, 13, 18] | 0.69703 | 0.659481 | 0.842105 | 0.263158 | 161718.0 | 12456810121315161718 | 0.05387 | 0.6875 |
| 2 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | 2006 SEP 30 AN 8: 24 \nU.S: COURT MIBDLE GISTR... | The Securities and Exchange Commission ("Commi... | [1, 2, 5, 6, 8, 9] | 0.860241 | 0.831173 | 0.818182 | 0.454545 | 10.0 | 568910 | 0.033791 | 0.444444 |
| 3 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission ("Commi... | IN THE UNITED STATES DISTRICT COURT FOR THE MI... | The Securities and Exchange Commission ("Commi... | [0, 1, 4, 7, 8] | 0.793945 | 0.757519 | 0.555556 | 0.166667 | 467.0 | 13467891011 | 0.04588 | 0.7 |
| 4 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today f... | 08-61524-CIV-DIMITROULEAS/ROSENBAUM \nUNITED S... | The Securities and Exchange Commission today f... | [0, 1, 3, 4, 6, 7] | 0.859681 | 0.851646 | 0.666667 | 0.222222 | 678.0 | 134678 | 0.009346 | 0.666667 |
| 5 | https://www.sec.gov//litigation/complaints/200... | On September 30, the Securities and Exchange C... | IN THE UNITED STATES DISTRICT COURT FOR THE EA... | On October 30, the Securities and Exchange Com... | [0, 1, 3, 6, 7] | 0.799057 | 0.812001 | 1.0 | 0.777778 | | 17 | -0.0162 | 0.222222 |
| 6 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission filed a... | UNITED STATES DISTRICT COURT DISTRICT OF MASSA... | The Federal Trade Commission launched an unres... | [0, 1, 2, 3, 4, 5] | 0.818271 | 0.738413 | 0.636364 | 0.214286 | 18910.0 | 123456781213 | 0.097593 | 0.663265 |
| 7 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today c... | Scott L. Black (Bar Number 514792) \nAttorney ... | The Securities and Exchange Commission today c... | [6, 7, 8, 9, 10, 11] | 0.750517 | 0.750517 | 0.5 | 0.25 | 23791011.0 | 236789101112131415 | 0.0 | 0.5 |
| 8 | https://www.sec.gov//litigation/complaints/200... | The Securities and Exchange Commission today c... | oOo OD DH FP WYN \n= er SO \nJOHN M. McCOY III... | The Securities and Exchange Commission today c... | [1, 2, 3, 4, 5] | 0.648084 | 0.621816 | 0.375 | 0.181818 | 13457.0 | 1234567810 | 0.040532 | 0.515152 |
| 9 | https://www.sec.gov//litigation/complaints/200... | On October 23, 2008, the United States Securit... | Robert Long \nAttorney for Plaintiff \nU.S. Se... | On October 23, 2008, the United States Securit... | [7, 8, 9, 10, 11, 12, 13, 14] | 0.809035 | 0.809035 | 0.866667 | 0.333333 | 2.0 | 237891011121314 | 0.0 | 0.615385 |
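Property 2 asks that the worse summary of a pair always score lower. One simple way to quantify this, sketched here as a suggestion rather than as part of our pipeline, is the share of documents whose original summary scores strictly higher than its falsified version. The `ordering_accuracy` helper is hypothetical; the scores are copied from the table above.

```python
def ordering_accuracy(good_scores, bad_scores):
    """Share of document pairs where the original summary scores strictly higher than the falsified one."""
    pairs = list(zip(good_scores, bad_scores))
    return sum(g > b for g, b in pairs) / len(pairs)


# Scores copied (as displayed) from the table above, documents 0-9
good_cos = [0.650034, 0.69703, 0.860241, 0.793945, 0.859681, 0.799057, 0.818271, 0.750517, 0.648084, 0.809035]
bad_cos = [0.650034, 0.659481, 0.831173, 0.757519, 0.851646, 0.812001, 0.738413, 0.750517, 0.621816, 0.809035]
good_llm = [0.636364, 0.842105, 0.818182, 0.555556, 0.666667, 1.0, 0.636364, 0.5, 0.375, 0.866667]
bad_llm = [0.454545, 0.263158, 0.454545, 0.166667, 0.222222, 0.777778, 0.214286, 0.25, 0.181818, 0.333333]

print(ordering_accuracy(good_cos, bad_cos))  # 0.6
print(ordering_accuracy(good_llm, bad_llm))  # 1.0
```

Ties count as failures here: documents 0, 7 and 9 receive identical cosine similarities for both summaries, and document 5 receives a higher one for the falsified summary.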


In [None]:
df_summary.to_csv('10summary_with_result.csv')

## 4. Conclusion
TO DO