# Text summarization

Summarization approaches can be roughly categorized into *abstractive* and *extractive* summarization. Extractive summarization works more closely with the given input, *extracting* the most important sentence(s) from a text and assembling a summary from them. Abstractive summarization tries to synthesize the text in a more holistic way, producing completely new sentences.

The (extractive) summarization pipeline follows three steps:

1. Sentence scoring: Which sentences are the most important?
2. Sentence selection: Which of the sentences from step 1 carry complementary information?
3. Sentence reformulation: Which material can be reformulated or compressed further?
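
The first two steps can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions that are not part of the notes: sentences are split with a naive regular expression, importance is the average document frequency of a sentence's words, and redundancy is measured by token overlap with already selected sentences. The function names and the `max_overlap` threshold are hypothetical, and step 3 (reformulation) is left out.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens; no stop-word removal or stemming."""
    return re.findall(r"\w+", sentence.lower())


def score_sentences(sentences):
    """Step 1: score a sentence by the average document frequency of its words."""
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    scores = []
    for s in sentences:
        toks = tokenize(s)
        scores.append(sum(freq[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


def select_sentences(sentences, scores, k=2, max_overlap=0.5):
    """Step 2: greedily keep high-scoring sentences that add new words."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, seen = [], set()
    for i in ranked:
        toks = set(tokenize(sentences[i]))
        # Skip sentences whose words are mostly covered by earlier picks.
        if toks and len(toks & seen) / len(toks) > max_overlap:
            continue
        chosen.append(i)
        seen |= toks
        if len(chosen) == k:
            break
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sents = split_sentences("Taxes will fall. Taxes will fall soon. The weather is nice.")
print(select_sentences(sents, score_sentences(sents), k=2))
```

In this toy input the second sentence scores well but is skipped in step 2 because most of its words already appear in the first pick.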

<div class="alert alert-block alert-info"> <b>Discussion.</b> Many approaches are frequency-based. That is, they assume that the most important information in a text appears more frequently in that text. Is this a reasonable assumption?
</div>



# ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

ROUGE-N is the n-gram recall between a candidate summary and a set of reference summaries. 

$$\text{ROUGE-N}_{\text{recall}} = \frac{\sum_{S \in \{ \text{Reference Summaries} \}}\sum_{gram_n \in S} \text{Count}_{\text{match}}(gram_n)}{\sum_{S \in \{ \text{Reference Summaries} \}}\sum_{gram_n \in S} \text{Count}(gram_n)},$$

where $n$ is the length of the n-gram, and $\text{Count}_{\text{match}}$ is the maximum number of n-grams co-occurring in a candidate summary and a reference summary $S$.
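
A minimal sketch of this formula, assuming whitespace tokenization and lowercasing (neither is prescribed by the notes). $\text{Count}_{\text{match}}$ is implemented as a clipped count: an n-gram in a reference is matched at most as many times as it occurs in the candidate. The function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """N-gram recall of one candidate against a list of reference summaries."""
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Count_match: clip each reference n-gram count by the candidate count.
        matched += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
        # Count: every n-gram occurrence in the reference.
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


reference = "Government reduces taxes next Monday"
candidate = "The government reduces income taxes starting the following week"
print(rouge_n_recall(candidate, [reference], n=1))
print(rouge_n_recall(candidate, [reference], n=2))
```

For this pair, three of the five reference unigrams are matched, but only one of the four reference bigrams ("government reduces") survives, which shows how quickly the score drops as $n$ grows.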

ROUGE-L is based on the longest common subsequence (LCS) shared between the reference and the candidate summary (N.B.: the common words are not necessarily consecutive, they only have to appear in the same order).

$$\text{ROUGE-L} = \frac{LCS(S,X)}{m},$$

where $S$ is the reference summary, $X$ is the candidate summary, $LCS(S,X)$ is the length of their longest common subsequence, and $m$ is the length of the reference summary in words. So if *Government reduces taxes next Monday* is the reference summary and our candidate is *The government reduces income taxes starting the following week*, the LCS is "government reduces taxes" (ignoring case), so 3 out of a reference length of 5 (i.e., a ROUGE-L of 0.6).
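
The worked example can be checked with a short dynamic-programming sketch. It assumes whitespace tokenization and case-insensitive matching, and it computes only the recall-style score defined above (LCS length divided by the reference length). The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ends before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """LCS length divided by the reference length in words."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return lcs_length(ref, cand) / len(ref) if ref else 0.0


reference = "Government reduces taxes next Monday"
candidate = "The government reduces income taxes starting the following week"
print(rouge_l_recall(reference, candidate))  # 3 / 5 = 0.6
```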

<div class="alert alert-block alert-info"> <b>Discussion.</b> What are the advantages and weaknesses of ROUGE-N and ROUGE-L?
</div>


# Notes

# For task 2/3

Implement an extractive summarization model that uses Lead-5 together with at least two linguistically motivated compressive capabilities (e.g., two tree-trimming rules), and compare it to either (i) a fully abstractive summarization model or (ii) another extractive model that uses regression for importance prediction. Explain the main features of your implementation. Evaluate it using ROUGE-L on both the English MLSUM subset and a non-English subset. Discuss the quality of the summaries in connection with your results, both within each subset and across subsets.


# For preparation
Read sections 1 and 2.3 of "Recent Advances in Document Summarization", then the full "MLSUM: The Multilingual Summarization Corpus" paper:

- Recent Advances in Document Summarization: https://wanxiaojun.github.io/summ_survey_draft.pdf
- MLSUM: The Multilingual Summarization Corpus: https://aclanthology.org/2020.emnlp-main.647.pdf

# For in class

Prepare the MLSUM English data and build two models: one that uses Lead-2, evaluated with ROUGE-2, and another that uses regression for importance prediction.
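
A Lead-k baseline simply returns the first k sentences of the article. The sketch below assumes a naive regex sentence splitter and takes the article as a plain string; loading and preprocessing the corpus is deliberately left out, and `lead_k` is an illustrative name.

```python
import re


def lead_k(article, k=2):
    """Return the first k sentences of an article as the summary."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    return " ".join(sentences[:k])


article = (
    "The government reduces taxes next Monday. "
    "The change affects all income groups. "
    "Critics expect a budget gap."
)
print(lead_k(article, k=2))
```

A real splitter (abbreviations, quotes, non-English punctuation) would be needed for the actual data; the regex here is only good enough to show the idea.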


Regression models for importance prediction [209, 53, 72]: can we implement this?
We could regress on the ground-truth frequencies in order "to learn a regression model to minimize the distance between the ground truth bigram frequency statistics in the reference summary and the estimated frequency" [97].
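
As a starting point for that question, here is a deliberately tiny sketch. It is not the method of the cited papers: it assumes a single feature (a bigram's relative frequency in the document), whitespace tokenization, and ordinary least squares on one (document, reference) pair. All function names are hypothetical.

```python
from collections import Counter


def bigram_freqs(text):
    """Relative frequency of each bigram in a whitespace-tokenized text."""
    toks = text.lower().split()
    counts = Counter(zip(toks, toks[1:]))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def training_pairs(document, reference):
    """Feature: bigram frequency in the document. Target: frequency in the reference."""
    doc, ref = bigram_freqs(document), bigram_freqs(reference)
    grams = sorted(doc)
    return [doc[g] for g in grams], [ref.get(g, 0.0) for g in grams]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


xs, ys = training_pairs("a b a b c", "a b")
a, b = fit_line(xs, ys)
print(a, b)  # slope and intercept of the fitted line
```

With the fitted line, the estimated reference frequency of an unseen document bigram is `a * doc_freq + b`; sentences could then be scored by the estimated frequencies of the bigrams they contain. A realistic model would pool many document/summary pairs and use more features, such as position in the document.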