# Text summarization

To give the LLMs context, we need to summarize some of the text. LLMs are usually quite capable of this, but because of their built-in safeguards and biases, the summaries may not be as non-opinionated as we need.

Key requirements for Klikkikuri:
- Summary __MUST__ be as non-opinionated as possible, and close to the original text.
- Summary __SHOULD__ distill the essential points of the text.
- Summary __NEEDS__ to contain the most important points of the text.
- Summary __NEEDS__ to contain the entities and relations of the text.
- Summary __SHOULD__ contain the most unique points of the text for the RAG retrieval to be effective.
- Summary __SHOULD__ be as short as possible while still meeting the requirements above.
- Summarization tool must be able to support multiple languages.

## Common approaches

- Extractive Summarization: Selects and extracts key sentences or phrases from the original text to create a summary. It retains the original wording and structure, making it less prone to introducing biases or opinions.
- Abstractive Summarization: Generates a summary that may not directly quote the original text. It can rephrase and condense information, but it may introduce biases or opinions if not carefully controlled.
- Hybrid Summarization: Combines extractive and abstractive methods to leverage the strengths of both. It can provide a more comprehensive summary while still staying reasonably neutral.

Maintaining objectivity requires avoiding the introduction of subjective opinions, interpretations, or biases in the summary. Simultaneously, the summary should focus on the most relevant and unique points and reflect the entities and relationships present in the original text.

Abstractive methods are more prone to introducing errors or "hallucinations" because they generate new text rather than selecting existing sentences. Extractive methods, by relying on the original text, are less likely to introduce such errors but may not capture the essence of the text as effectively.
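
To make the extractive idea concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is only an illustration, not the method Klikkikuri uses: the function names (`split_sentences`, `extractive_summary`), the naive regex sentence splitter, and the average-word-frequency scoring are all assumptions made for this example. It uses no stop-word list, so it is language-agnostic but crude.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # \w+ is Unicode-aware in Python 3, so non-English words are matched too.
    return re.findall(r"\w+", sentence.lower())


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, verbatim and in original order."""
    sentences = split_sentences(text)
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence: str) -> float:
        toks = tokenize(sentence)
        # Average word frequency, so long sentences do not win automatically.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


text = "Cats chase mice. Cats sleep a lot. The weather was nice yesterday."
print(extractive_summary(text, max_sentences=2))
# prints: Cats chase mice. Cats sleep a lot.
```

Because every output sentence is copied verbatim from the input, the summary cannot contain wording that was not in the source. The trade-off is the one described above: the selected sentences may miss the overall point of the text.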


## Papers
- Abstractive Text Summarization: State of the Art, Challenges, and Improvements: https://arxiv.org/pdf/2409.02413
- A Brief Survey on Text Summarization Methods: https://ijrpr.com/uploads/V5ISSUE3/IJRPR23824.pdf
- Evaluation of Python Text Summarization Libraries: https://rjwave.org/ijedr/papers/IJEDR2101019.pdf
- Text Summarization: A Bibliometric Study and Systematic Literature Review: https://www.iieta.org/download/file/fid/145253
- A Unified Approach to Text Summarization: Classical, Machine Learning, and Deep Learning Methods: https://www.iieta.org/download/file/fid/156413
- An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics: https://arxiv.org/pdf/2207.00939

## Guides
- A-Z Guide to Text Summarization in Python for Beginners: https://www.projectpro.io/article/text-summarization-python-nlp/546
- Comparing Text Summarization Techniques: https://medium.com/@thakermadhav/comparing-text-summarization-techniques-d1e2e465584e
- LLM Summarization: Getting To Production: https://arize.com/blog/llm-summarization-getting-to-production/

## Evaluation Metrics
From [A Brief Survey on Text Summarization Methods](https://ijrpr.com/uploads/V5ISSUE3/IJRPR23824.pdf):

- ROUGE (Recall-Oriented Understudy for Gisting Evaluation): ROUGE measures how well the summary covers important information from the original text. It looks at the overlap between the summary and the reference summaries (i.e., human-generated summaries). ROUGE calculates several scores, such as ROUGE-N (measuring overlap in n-grams) and ROUGE-L (measuring the longest common subsequence) [2].
- BLEU (Bilingual Evaluation Understudy): BLEU evaluates the quality of a summary by comparing it to one or more reference summaries. It measures how many n-grams (sequences of words) in the summary match those in the reference summaries. BLEU scores range from 0 to 1, with higher scores indicating better quality summaries [3].
- METEOR (Metric for Evaluation of Translation with Explicit Ordering): METEOR assesses the overall quality of a summary by considering both content overlap and surface-level similarity. It takes into account word matching, word order, and stemming. METEOR scores range from 0 to 1, with higher scores indicating better quality summaries [4].
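
As a rough illustration of the ROUGE-N and ROUGE-L scores described above, here is a minimal pure-Python sketch that computes only the recall variants against a single reference summary. The helper names (`tokens`, `ngrams`, `rouge_n_recall`, `lcs_length`, `rouge_l_recall`) and the simple regex tokenizer are assumptions made for this example. Established ROUGE implementations also report precision and F-scores and may apply stemming, so their numbers can differ from this sketch.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lowercased, Unicode-aware word tokens (a simplifying assumption).
    return re.findall(r"\w+", text.lower())


def ngrams(toks: list[str], n: int) -> Counter:
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Share of reference n-grams that also appear in the candidate."""
    cand = ngrams(tokens(candidate), n)
    ref = ngrams(tokens(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by how often the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """LCS length divided by the reference length."""
    cand, ref = tokens(candidate), tokens(reference)
    return lcs_length(cand, ref) / len(ref) if ref else 0.0


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 unigrams match
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 bigrams match
print(rouge_l_recall(candidate, reference))       # LCS has 5 of 6 tokens
```

All three scores only measure surface overlap with a reference summary. They do not check whether the summary is neutral or factually faithful, so they cover only part of the requirements listed at the top of this document.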