# NLP 🤗

To compress the book summaries, I used the language model [`facebook/bart-large-cnn`](https://huggingface.co/facebook/bart-large-cnn?text=I%27ll+give+you+the+summary+of+a+book+and+I+want+you+to+give+me+the+name+of+that+book%3A%0D%0AThe+book+follows+the+journey+of+an+Anabaptist+radical+across+Europe+in+the+first+half+of+the+16th+century+as+he+joins+in+various+movements+and+uprisings+that+come+as+a+result+of+the+Protestant+reformation.+The+book+spans+30+years+as+he+is+pursued+by+%5C%27Q%5C%27+%28short+for+%22Qo%C3%A8let%22%29%2C+a+spy+for+the+Roman+Catholic+Church+cardinal+Giovanni+Pietro+Carafa.+The+main+character%2C+who+changes+his+name+many+times+during+the+story%2C+first+fights+in+the+German+Peasants%5C%27+War+beside+Thomas+M%C3%BCntzer%2C+then+is+in+M%C3%BCnster%5C%27s+siege%2C+during+the+M%C3%BCnster+Rebellion%2C+and+some+years+later%2C+in+Venice), available on Hugging Face 🤗. BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. It is pre-trained by (1) corrupting text with an arbitrary noising function and (2) training the model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation tasks such as summarization and translation, but it also performs well on comprehension tasks such as text classification and question answering. This specific checkpoint has been fine-tuned on CNN/Daily Mail, a large collection of text-summary pairs.
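To make the "corrupt, then reconstruct" idea concrete, here is a toy sketch of two simple noising functions: token masking and sentence permutation. It is only an illustration and not BART's actual preprocessing. The helper names `mask_tokens` and `permute_sentences`, the masking probability, and the use of the literal string `<mask>` as a placeholder are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # placeholder string standing in for a mask token


def mask_tokens(tokens, mask_prob, rng):
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def permute_sentences(sentences, rng):
    """Return the sentences in a random order, leaving the input untouched."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed so the corruption is repeatable
original = "the quick brown fox jumps over the lazy dog".split()
corrupted = mask_tokens(original, mask_prob=0.3, rng=rng)

# A denoising model would be trained to map the corrupted text back to the original.
training_pair = (" ".join(corrupted), " ".join(original))
```

The model never sees the clean input during pre-training; it only receives the corrupted side of each pair and is scored on how well it regenerates the original.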

First, I randomly select a book from the dataset and extract its summary. Because the summary may exceed the maximum number of tokens the model accepts, I split it into sentences and group them into chunks of at most 200 words each, using the word count as a rough proxy for the token limit. I then feed each chunk to the model in turn and concatenate the resulting summaries to obtain the compressed summary.
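The chunking step can be sketched as a small standalone function. This is a simplified variant rather than the exact notebook code: the name `chunk_sentences` is hypothetical, and it splits on any whitespace with `split()`, so its word counts can differ slightly from a `split(' ')` based count.

```python
def chunk_sentences(sentences, max_words=200):
    """Greedily pack sentences into chunks of at most max_words words.

    A single sentence longer than max_words is not split; it ends up
    alone in an oversized chunk.
    """
    chunks = []
    current = []  # words collected for the chunk being built
    for sentence in sentences:
        words = sentence.split()
        # Close the current chunk if this sentence would overflow it.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


example = chunk_sentences(["a b c.", "d e.", "f g h i."], max_words=5)
```

Packing whole sentences keeps each chunk readable on its own, which matters because the model summarizes every chunk without seeing its neighbours.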


In [2]:
from transformers import pipeline
from transformers import AutoModel
import pandas as pd



In [3]:
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", max_length=120, min_length=30, do_sample=False)

In [4]:
df = pd.read_excel('bookInfo.xlsx')

In [5]:
df

| Unnamed: 0 | bookName | author | publishDate | genres | summary |
|---|---|---|---|---|---|
| 0 | Animal Farm | George Orwell | 1945-08-17 | ['Roman', 'Satire', "Children's literature", '... | Old Major, the old boar on the Manor Farm, ca... |
| 1 | A Clockwork Orange | Anthony Burgess | 1962 | ['Science Fiction', 'Novella', 'Speculative fi... | Alex, a teenager living in near-future Englan... |
| 2 | The Plague | Albert Camus | 1947 | ['Existentialism', 'Fiction', 'Absurdist ficti... | The text of The Plague is divided into five p... |
| 3 | An Enquiry Concerning Human Understanding | David Hume | | ['Philosophy'] | The argument of the Enquiry proceeds by a ser... |
| 4 | A Fire Upon the Deep | Vernor Vinge | | ['Hard science fiction', 'Science Fiction', 'S... | The novel posits that space around the Milky ... |
| ... | ... | ... | ... | ... | ... |
| 16554 | Under Wildwood | Colin Meloy | 2012-09-25 | ['Fantasy'] | Prue McKeel, having rescued her brother from ... |
| 16555 | Transfer of Power | Vince Flynn | 2000-06-01 | ['Thriller', 'Fiction'] | The reader first meets Rapp while he is doing... |
| 16556 | Decoded | Jay-Z | 2010-11-16 | ['Autobiography'] | The book follows very rough chronological ord... |
| 16557 | America Again: Re-becoming The Greatness We Ne... | Stephen Colbert | 2012-10-02 | ['Non-Fiction'] | Colbert addresses topics including Wall Stree... |


In [6]:
random_book = df.sample()

In [7]:
random_book

bookName                                      A Clockwork Orange
author                                           Anthony Burgess
publishDate                                                 1962
genres         ['Science Fiction', 'Novella', 'Speculative fi...
summary         Alex, a teenager living in near-future Englan...
Name: 1, dtype: object

In [8]:
random_summary = random_book['summary'].iloc[0]

In [9]:
random_summary

' Alex, a teenager living in near-future England, leads his gang on nightly orgies of opportunistic, random "ultra-violence." Alex\'s friends ("droogs" in the novel\'s Anglo-Russian slang, Nadsat) are: Dim, a slow-witted bruiser who is the gang\'s muscle; Georgie, an ambitious second-in-command; and Pete, who mostly plays along as the droogs indulge their taste for ultra-violence. Characterized as a sociopath and a hardened juvenile delinquent, Alex is also intelligent and quick-witted, with sophisticated taste in music, being particularly fond of Beethoven, or "Lovely Ludwig Van." The novel begins with the droogs sitting in their favorite hangout (the Korova Milkbar), drinking milk-drug cocktails, called "milk-plus", to hype themselves for the night\'s mayhem. They assault a scholar walking home from the public library, rob a store leaving the owner and his wife bloodied and unconscious, stomp a panhandling derelict, then scuffle with a rival gang. Joyriding through the countryside in

In [10]:
# Tag every sentence-ending punctuation mark with an <eos> marker, then split on it.
random_summary = random_summary.replace('.', '.<eos>')
random_summary = random_summary.replace('!', '!<eos>')
random_summary = random_summary.replace('?', '?<eos>')
sentences = random_summary.split('<eos>')

In [11]:
sentences

[' Alex, a teenager living in near-future England, leads his gang on nightly orgies of opportunistic, random "ultra-violence.',
 '" Alex\'s friends ("droogs" in the novel\'s Anglo-Russian slang, Nadsat) are: Dim, a slow-witted bruiser who is the gang\'s muscle; Georgie, an ambitious second-in-command; and Pete, who mostly plays along as the droogs indulge their taste for ultra-violence.',
 ' Characterized as a sociopath and a hardened juvenile delinquent, Alex is also intelligent and quick-witted, with sophisticated taste in music, being particularly fond of Beethoven, or "Lovely Ludwig Van.',
 '" The novel begins with the droogs sitting in their favorite hangout (the Korova Milkbar), drinking milk-drug cocktails, called "milk-plus", to hype themselves for the night\'s mayhem.',
 ' They assault a scholar walking home from the public library, rob a store leaving the owner and his wife bloodied and unconscious, stomp a panhandling derelict, then scuffle with a rival gang.',
 ' Joyriding 
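The split above leaves closing quotation marks stranded at the start of the following sentence (see the second and fourth items). One possible alternative, sketched here with the standard `re` module, keeps trailing quotes and brackets attached to the sentence they close. The function name `split_sentences` and the pattern are assumptions for illustration; the approach still breaks on abbreviations and initials such as "F. Alexander".

```python
import re

# Sentence-ending punctuation, optionally followed by closing quotes or
# brackets, and then whitespace or the end of the text.
_SENTENCE_END = re.compile(r'([.!?]+["\')\]]*)(?:\s+|$)')


def split_sentences(text):
    """Split text into sentences, keeping closing quotes with their sentence."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        sentence = text[start:match.end(1)].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()  # skip the whitespace after the sentence
    tail = text[start:].strip()  # text after the last terminator, if any
    if tail:
        sentences.append(tail)
    return sentences


demo = split_sentences('Random "ultra-violence." Alex has friends. Really?')
```

Because a terminator only counts when whitespace or the end of the text follows it, decimal numbers such as `3.5` are not split.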

In [12]:
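max_chunk = 200  # maximum number of words per chunk
current_chunk = 0  # index of the chunk currently being filled
chunks = []
for sentence in sentences:
    words = sentence.split(' ')
    if len(chunks) == current_chunk + 1:
        # A chunk is open: extend it if the sentence still fits.
        if len(chunks[current_chunk]) + len(words) <= max_chunk:
            chunks[current_chunk].extend(words)
        else:
            # Otherwise start a new chunk with this sentence.
            current_chunk += 1
            chunks.append(words)
    else:
        # First sentence: open the first chunk.
        chunks.append(words)

# Join each list of words back into a single string.
for chunk_id in range(len(chunks)):
    chunks[chunk_id] = ' '.join(chunks[chunk_id])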


In [13]:
len(chunks[0].split(' '))

175

In [14]:
condensedSummary = summarizer(chunks)

In [15]:
condensedSummary

[{'summary_text': 'Alex, a teenager living in near-future England, leads his gang on nightly orgies of opportunistic, random "ultra-violence. " Alex\'s friends ("droogs" in the novel\'s Anglo-Russian slang, Nadsat) are: Dim, a slow-witted bruiser who is the gang\'s muscle.'},
 {'summary_text': 'Alex skips school the next day. Alex meets a pair of ten-year-old girls and takes them back to his parents\' flat. Alex administers hard drugs and then rapes them. Georgie challenges Alex for leadership of the gang, demanding that they pull a "man-sized" job.'},
 {'summary_text': "Alex gets a job at the Wing chapel playing religious music. The prison chaplain mistakes Alex's Bible studies for stirrings of faith. Alex agrees to undergo an experimental behaviour-modification treatment called the Ludovico Technique."},
 {'summary_text': 'Alex abases himself before a scantily-clad young woman whose presence has aroused his predatory sexual inclinations. The effectiveness of the technique is demonstr

In [16]:
' '.join([summary['summary_text'] for summary in condensedSummary])

'Alex, a teenager living in near-future England, leads his gang on nightly orgies of opportunistic, random "ultra-violence. " Alex\'s friends ("droogs" in the novel\'s Anglo-Russian slang, Nadsat) are: Dim, a slow-witted bruiser who is the gang\'s muscle. Alex skips school the next day. Alex meets a pair of ten-year-old girls and takes them back to his parents\' flat. Alex administers hard drugs and then rapes them. Georgie challenges Alex for leadership of the gang, demanding that they pull a "man-sized" job. Alex gets a job at the Wing chapel playing religious music. The prison chaplain mistakes Alex\'s Bible studies for stirrings of faith. Alex agrees to undergo an experimental behaviour-modification treatment called the Ludovico Technique. Alex abases himself before a scantily-clad young woman whose presence has aroused his predatory sexual inclinations. The effectiveness of the technique is demonstrated to a group of VIPs, who watch as Alex collapses. F.  Alexander, a critic of th
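To check how much the text was actually compressed, the joined summary can be compared with the original by word count. The sketch below is an assumption-laden illustration: `join_summaries` and `compression_ratio` are hypothetical helper names, and `fake_output` is a hand-written stand-in shaped like the list of `{'summary_text': ...}` dictionaries shown above, so the example runs without loading the model.

```python
def join_summaries(pipeline_output):
    """Concatenate the 'summary_text' fields of a summarization result list."""
    return " ".join(item["summary_text"] for item in pipeline_output)


def compression_ratio(original, condensed):
    """Return condensed word count divided by original word count."""
    original_words = len(original.split())
    if original_words == 0:
        return 0.0  # avoid division by zero for empty input
    return len(condensed.split()) / original_words


# Stand-in for the list returned by summarizer(chunks).
fake_output = [
    {"summary_text": "Alex leads a gang."},
    {"summary_text": "He is arrested."},
]
condensed = join_summaries(fake_output)
ratio = compression_ratio("a b c d e f g h i j k l m n", condensed)
```

In the notebook, the same two calls would take `condensedSummary` and the original `summary` string from the dataset, giving a quick measure of how aggressive the `max_length` setting is.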