# AI Text Summarization

## Method 1 - Pegasus Pre-Trained Model

In [13]:
from bs4 import BeautifulSoup # Pipeline Method
import requests # Pipeline Method
from transformers import PegasusForConditionalGeneration, PegasusTokenizer # Pegasus Method
from transformers import pipeline # Pipeline Method

### 1. Load the Pegasus Tokenizer and Model

In [2]:
# Load tokenizer 
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
# Load model 
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

### 2. Perform Abstractive Summarization

In [3]:
text = """
The origin of aerospace engineering can be traced back to the aviation pioneers around the late 19th to early 20th centuries, although the work of Sir George Cayley dates from the last decade of the 18th to mid-19th century. One of the most important people in the history of aeronautics[8] and a pioneer in aeronautical engineering,[9] Cayley is credited as the first person to separate the forces of lift and drag, which affect any atmospheric flight vehicle.[10]

Early knowledge of aeronautical engineering was largely empirical, with some concepts and skills imported from other branches of engineering.[11] Some key elements, like fluid dynamics, were understood by 18th-century scientists.[12]

In December 1903, the Wright Brothers performed the first sustained, controlled flight of a powered, heavier-than-air aircraft, lasting 12 seconds. The 1910s saw the development of aeronautical engineering through the design of World War I military aircraft.

Between World Wars I and II, great leaps were made in the field, accelerated by the advent of mainstream civil aviation. Notable airplanes of this era include the Curtiss JN 4, the Farman F.60 Goliath, and Fokker Trimotor. Notable military airplanes of this period include the Mitsubishi A6M Zero, the Supermarine Spitfire and the Messerschmitt Bf 109 from Japan, United Kingdom, and Germany respectively. A significant development in aerospace engineering came with the first operational Jet engine-powered airplane, the Messerschmitt Me 262 which entered service in 1944 towards the end of the second World War.[13]

The first definition of aerospace engineering appeared in February 1958,[4] considering the Earth's atmosphere and outer space as a single realm, thereby encompassing both aircraft (aero) and spacecraft (space) under the newly coined term aerospace.

In response to the USSR launching the first satellite, Sputnik, into space on October 4, 1957, U.S. aerospace engineers launched the first American satellite on January 31, 1958. The National Aeronautics and Space Administration was founded in 1958 as a response to the Cold War. In 1969, Apollo 11, the first human space mission to the moon took place. It saw three astronauts enter orbit around the Moon, with two, Neil Armstrong and Buzz Aldrin, visiting the lunar surface. The third astronaut, Michael Collins, stayed in orbit to rendezvous with Armstrong and Aldrin after their visit.[14]

An important innovation came on January 30, 1970, when the Boeing 747 made its first commercial flight from New York to London. This aircraft made history and became known as the "Jumbo Jet" or "Whale"[15] due to its ability to hold up to 480 passengers.[16]

Another significant development in aerospace engineering came in 1976, with the development of the first passenger supersonic aircraft, the Concorde. The development of this aircraft was agreed upon by the French and British on November 29, 1962.[17]

On December 21, 1988, the Antonov An-225 Mriya cargo aircraft commenced its first flight. It holds the records for the world's heaviest aircraft, heaviest airlifted cargo, and longest airlifted cargo, and has the widest wingspan of any aircraft in operational service.[18]

On October 25, 2007, the Airbus A380 made its maiden commercial flight from Singapore to Sydney, Australia. This aircraft was the first passenger plane to surpass the Boeing 747 in terms of passenger capacity, with a maximum of 853. Though development of this aircraft began in 1988 as a competitor to the 747, the A380 made its first test flight in April 2005."""

In [4]:
# Tokenize: convert the text into token IDs, truncated to the model's maximum input length
tokens = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")
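
With a single input string, `padding="longest"` has nothing to pad; it only matters when several texts are tokenized together in one batch. The sketch below mimics that behaviour in plain Python for two made-up ID lists. It assumes right-side padding and a pad ID of 0, which we believe is Pegasus's pad token; `pad_to_longest` is a hypothetical helper for illustration, not part of `transformers`.

```python
from typing import List, Tuple


def pad_to_longest(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: right-pad every sequence to the length of the longest one.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and 0 for padding.
    pad_id=0 is an assumption about the Pegasus pad token.
    """
    longest = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_to_longest([[139, 5679, 1], [25029, 1]])
print(ids)   # [[139, 5679, 1], [25029, 1, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

The attention mask is what tells the model to ignore the padded positions, so padded and unpadded inputs yield the same summary.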

In [5]:
# Input tokens
tokens

{'input_ids': tensor([[  139,  5679,   113, 16902,  2487,   137,   129, 19784,   247,   112,
           109, 10720, 21050,   279,   109,  1095,  1925,   307,   112,   616,
           599,   307,  6468,   108,  1670,   109,   201,   113,  6381,  2584,
         33087,  2858,  3060,   135,   109,   289,  3496,   113,   109,  1204,
           307,   112,  2104, 11545,   307,  1902,   107,   614,   113,   109,
           205,   356,   200,   115,   109,   689,   113, 31839, 30651, 19473,
          4101,  2000,  1100,   111,   114, 12649,   115, 77181,  2487,   108,
          4101, 76207, 33087,  2858,   117, 12833,   130,   109,   211,   465,
           112,  1910,   109,  3062,   113,  3811,   111,  6329,   108,   162,
          2384,   189, 14387,  2315,  1143,   107, 65077, 73950,  6236,   825,
           113, 77181,  2487,   140,  4318, 18976,   108,   122,   181,  3924,
           111,   766,  9156,   135,   176,  6106,   113,  2487,   107, 65077,
         49040,  1027,   662,  1811,  

In [8]:
# Summarize 
summary = model.generate(**tokens)
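
`model.generate` builds the summary one token at a time, feeding each new token back in as input for the next step. The real call follows the model's generation config, which `transformers` loads alongside the model and which may enable beam search and length limits; the toy loop below only illustrates the simplest variant, greedy decoding. It assumes a start ID of 0 and an end-of-sequence ID of 1, which match the first and last values of the `summary[0]` tensor printed in the next cell. The lookup table standing in for the model is entirely hypothetical.

```python
from typing import Callable, List


def greedy_decode(next_token: Callable[[List[int]], int],
                  start_id: int = 0, eos_id: int = 1, max_len: int = 20) -> List[int]:
    """Toy greedy decoder: repeatedly append the model's single best next token."""
    output = [start_id]  # decoding starts from the (assumed) decoder start token
    while len(output) < max_len:
        token = next_token(output)  # the "model" picks the next token from the prefix so far
        output.append(token)
        if token == eos_id:  # stop as soon as end-of-sequence is produced
            break
    return output


# Hypothetical toy "model": a fixed lookup from the last token to the next one.
toy_table = {0: 25029, 25029: 2487, 2487: 117, 117: 1}
print(greedy_decode(lambda prefix: toy_table[prefix[-1]]))  # [0, 25029, 2487, 117, 1]
```

Beam search differs by keeping several candidate prefixes at each step instead of only the single best one.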

In [9]:
# Output summary tokens
summary[0]

tensor([    0, 25029,  2487,   117,   114,  4444,   113,  2487,   120,  2194,
          122,   109,   354,   111,  6785,   113,  3992,   111, 18328,   107,
            1])

### 3. PEGASUS Summarized Output

In [10]:
# Decode summary
tokenizer.decode(summary[0])

'<pad>Aerospace engineering is a branch of engineering that deals with the design and manufacture of aircraft and spacecraft.</s>'
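
The decoded string still contains the special tokens `<pad>` (Pegasus appears to use the pad token to start decoding) and `</s>` (end of sequence). Passing `skip_special_tokens=True` to `tokenizer.decode` drops them at the token level. The plain-Python sketch below does a cruder, string-level cleanup of the same output; the helper name and its `special_tokens` default are assumptions for illustration.

```python
def strip_special_tokens(text: str, special_tokens=("<pad>", "</s>", "<unk>")) -> str:
    """Hypothetical helper: remove special-token markers from a decoded string."""
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()  # trim any whitespace left behind


raw = ("<pad>Aerospace engineering is a branch of engineering that deals with "
       "the design and manufacture of aircraft and spacecraft.</s>")
print(strip_special_tokens(raw))
```

Prefer `skip_special_tokens=True` in practice, since it removes tokens by ID rather than by matching text.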

## Method 2 - Pipeline Method

### 4. Pipeline Loading

In [14]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


### 5. Fetch the Article

In [15]:
#URL = "https://techcrunch.com/2023/04/25/hugging-face-releases-its-own-version-of-chatgpt/"
URL = "https://time.com/6273694/ai-regulation-europe/?utm_source=roundup&utm_campaign=20230202"

In [16]:
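r = requests.get(URL, timeout=30)  # fail instead of hanging if the site does not respond
r.raise_for_status()  # raise on 4xx/5xx responses instead of parsing an error page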

In [17]:
soup = BeautifulSoup(r.text, 'html.parser')
# Keep only headings and paragraphs, which carry the article text
results = soup.find_all(['h1','h2','p'])
text = [result.text for result in results]
ARTICLE = ' '.join(text)
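
`find_all(['h1','h2','p'])` keeps every heading and paragraph, including empty ones, which can leave double spaces in the joined text. As a dependency-free alternative, the sketch below swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`, collects the same three tags and drops empty blocks. It assumes well-formed HTML in which every `<h1>`, `<h2>` and `<p>` has a closing tag; real pages may need BeautifulSoup's more forgiving parsing. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextBlockParser(HTMLParser):
    """Hypothetical parser: collect the text inside <h1>, <h2> and <p>, one string per element."""

    TAGS = {"h1", "h2", "p"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._depth = 0  # greater than 0 while inside a tracked element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:  # the outermost tracked element just closed
                self.blocks.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:  # ignore text outside headings and paragraphs (menus, footers)
            self._buffer.append(data)


def extract_article(html: str) -> str:
    parser = TextBlockParser()
    parser.feed(html)
    return " ".join(block for block in parser.blocks if block)  # skip empty blocks


sample_html = "<h1>Title</h1><nav>Menu</nav><p>First <b>bold</b> para.</p><p></p><h2>Sub</h2><p>Second.</p>"
print(extract_article(sample_html))  # Title First bold para. Sub Second.
```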

In [18]:
ARTICLE

"Big Tech Is Already Lobbying to Water Down Europe's AI Rules European lawmakers are putting their finishing touches on a set of wide-ranging rules designed to govern the use of artificial intelligence that, if passed, would make the E.U. the first major jurisdiction outside of China to pass targeted AI regulation. That has made the forthcoming legislation the subject of fierce debate and lobbying, with opposing sides battling to ensure that its scope is either widened or narrowed. Lawmakers are close to agreeing on a draft version of the law, the Financial Times reported last week. After that, the law will progress to negotiations between the bloc’s member states and executive branch.  The E.U. Artificial Intelligence Act is likely to ban controversial uses of AI like social scoring and facial recognition in public, as well as force companies to declare if copyrighted material is used to train their AIs.  The rules could set a global bar for how companies build and deploy their AI sys

### 6. Chunk Text

In [19]:
max_chunk = 500  # maximum number of words per chunk

In [20]:
# Mark sentence boundaries by appending an <eos> marker after each '.', '?' and '!'
ARTICLE = ARTICLE.replace('.', '.<eos>')
ARTICLE = ARTICLE.replace('?', '?<eos>')
ARTICLE = ARTICLE.replace('!', '!<eos>')
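
Inserting `<eos>` after every `.` also splits abbreviations such as "E.U." into separate pieces, which is how "The E. U.  Artificial Intelligence Act" ends up with extra spaces in the summaries further down. One alternative, sketched below with the standard `re` module, splits only where the punctuation is followed by whitespace. It still treats "E.U. Artificial" as a sentence boundary, but the abbreviation itself survives intact when the pieces are rejoined. `split_sentences` is a hypothetical helper.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: split after '.', '?' or '!' only when whitespace follows.

    Abbreviations written without internal spaces, such as "E.U." or "U.S.",
    are therefore not broken apart.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]  # drop empty pieces


print(split_sentences("The E.U. Artificial Intelligence Act is close. Will it pass? Yes!"))
# ['The E.U.', 'Artificial Intelligence Act is close.', 'Will it pass?', 'Yes!']
```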

In [21]:
sentences = ARTICLE.split('<eos>')
current_chunk = 0
chunks = []
for sentence in sentences:
    words = sentence.split(' ')
    if len(chunks) == current_chunk + 1:
        # Add the sentence to the current chunk if it stays within max_chunk words
        if len(chunks[current_chunk]) + len(words) <= max_chunk:
            chunks[current_chunk].extend(words)
        else:
            # Otherwise start a new chunk with this sentence
            current_chunk += 1
            chunks.append(words)
    else:
        # The first sentence opens the first chunk
        chunks.append(words)

# Turn each chunk from a list of words back into a single string
for chunk_id in range(len(chunks)):
    chunks[chunk_id] = ' '.join(chunks[chunk_id])

In [22]:
len(chunks)

3
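
The loop above packs whole sentences into chunks of up to `max_chunk` words. The same greedy rule can be written as a small function; this sketch (the name `chunk_sentences` is hypothetical) uses `str.split()` without arguments so that empty strings and repeated spaces do not count as words, and it skips empty sentences. The limit is measured in words, not tokens, so 500 is only a rough proxy for the model's input limit.

```python
from typing import List


def chunk_sentences(sentences: List[str], max_words: int = 500) -> List[str]:
    """Hypothetical helper: greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own (oversized) chunk.
    """
    chunks: List[List[str]] = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty fragments, e.g. after the final delimiter
        if chunks and len(chunks[-1]) + len(words) <= max_words:
            chunks[-1].extend(words)  # the sentence still fits in the current chunk
        else:
            chunks.append(words)  # start a new chunk
    return [" ".join(chunk) for chunk in chunks]


print(chunk_sentences(["one two three.", "four five.", "six seven eight nine.", ""], max_words=5))
# ['one two three. four five.', 'six seven eight nine.']
```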

### 7. Summarize Text

In [23]:
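# One summary per chunk; max_length and min_length count tokens, and do_sample=False disables random sampling
res = summarizer(chunks, max_length=120, min_length=30, do_sample=False)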

In [24]:
res[0]

{'summary_text': " European lawmakers are putting the finishing touches on a set of wide-ranging rules designed to govern the use of artificial intelligence . The E. U.  Artificial Intelligence Act is likely to ban controversial uses of AI like social scoring and facial recognition in public . The rules could set a global bar for how companies build and deploy their AI systems . One of the Act's most contentious points is whether so-called “general purpose AI” should be considered high-risk ."}

In [26]:
text = ' '.join([summ['summary_text'] for summ in res])
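
The summaries from this model put a space before each full stop (visible in `res[0]` above, e.g. "artificial intelligence ."), and joining the chunks adds a leading space and double spaces. A light regex pass, sketched below as the hypothetical helper `tidy_summary`, removes the space before common punctuation and collapses runs of whitespace. It does not repair the "E. U." split, which happens earlier during chunking.

```python
import re


def tidy_summary(text: str) -> str:
    """Hypothetical helper: remove spaces before punctuation and collapse repeated whitespace."""
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)  # "intelligence ." -> "intelligence."
    text = re.sub(r"\s{2,}", " ", text)  # collapse runs of spaces left by joining chunks
    return text.strip()  # drop the leading space from the first summary


sample = " European lawmakers are putting the finishing touches on a set of rules . The rules could set a global bar ."
print(tidy_summary(sample))
```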

### 8. PIPELINE Summarized Output

In [27]:
text

" European lawmakers are putting the finishing touches on a set of wide-ranging rules designed to govern the use of artificial intelligence . The E. U.  Artificial Intelligence Act is likely to ban controversial uses of AI like social scoring and facial recognition in public . The rules could set a global bar for how companies build and deploy their AI systems . One of the Act's most contentious points is whether so-called “general purpose AI” should be considered high-risk .  Big Tech companies like Google and Microsoft, which have plowed billions of dollars into AI, are arguing against the proposals . Lobbyists have argued that it is only when general purpose AIs are applied to “high risk” use cases that they become dangerous . Categorizing general-purpose AI systems as ‘high risk,” Google argued, could harm consumers and hamper innovation .  Some argue that the E. U.  has placed itself in a bind by structuring the AI Act in an outdated fashion . Under the prevailing business model t