# **Objective:**
Create a system that summarizes lengthy articles, blogs, or news into concise summaries.

● **Dataset:** CNN/Daily Mail Dataset

● **Steps:**  
1. Preprocess textual data for summarization.  
2. Implement extractive summarization using libraries like spaCy.  
3. Implement abstractive summarization using pre-trained models such as BART or GPT with Hugging Face's `transformers` library.  
4. Fine-tune models to improve the quality of summaries.  
5. Test the model on real-world articles and evaluate the coherence of the summaries.

● **Outcome:** A summarization model capable of generating concise summaries from long texts.


In [None]:
# upload dataset

from google.colab import files
uploaded = files.upload()



Saving test.csv to test.csv


In [None]:
# load the dataset

import pandas as pd

# Read the CSV saved by the upload cell above ('test.csv')
try:
  df = pd.read_csv('test.csv')
  print(df.head())
except FileNotFoundError:
  print("Error: 'test.csv' not found. Please upload the correct file.")
except pd.errors.EmptyDataError:
  print("Error: The uploaded file is empty.")
except pd.errors.ParserError:
  print("Error: Could not parse the file. Please ensure it is a valid CSV file.")
except Exception as e:
  print(f"An unexpected error occurred: {e}")


                                         id  \
0  92c514c913c0bdfe25341af9fd72b29db544099b   
1  2003841c7dc0e7c5b1a248f9cd536d727f27a45a   
2  91b7d2311527f5c2b63a65ca98d21d9c92485149   
3  caabf9cbdf96eb1410295a673e953d304391bfbb   
4  3da746a7d9afcaa659088c8366ef6347fe6b53ea   

                                             article  \
0  Ever noticed how plane seats appear to be gett...   
1  A drunk teenage boy had to be rescued by secur...   
2  Dougie Freedman is on the verge of agreeing a ...   
3  Liverpool target Neto is also wanted by PSG an...   
4  Bruce Jenner will break his silence in a two-h...   

                                          highlights  
0  Experts question if  packed out planes are put...  
1  Drunk teenage boy climbed into lion enclosure ...  
2  Nottingham Forest are close to extending Dougi...  
3  Fiorentina goalkeeper Neto has been linked wit...  
4  Tell-all interview with the reality TV star, 6...  
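Step 1 (preprocessing) has no cell here, yet the articles printed further down are lowercased with all punctuation removed. That matters for the extractive step: spaCy relies largely on punctuation to find sentence boundaries, so a punctuation-free article tends to come back as one long "sentence", which is consistent with the single long block printed by the extractive cell. The sketch below is one lighter alternative, assuming the `df` loaded above: it only normalizes whitespace and drops empty rows, keeping case and punctuation intact. `clean_article` and `preprocess` are hypothetical helper names, not part of pandas or spaCy.

```python
import re

import pandas as pd


def clean_article(text):
    """Normalize whitespace only; keep case and punctuation for sentence splitting."""
    if not isinstance(text, str):  # NaN or other non-string cells become empty
        return ""
    # \s matches Unicode whitespace, including newlines and non-breaking spaces
    return re.sub(r"\s+", " ", text).strip()


def preprocess(df, column="article"):
    """Return a copy of df with a cleaned text column and empty rows dropped."""
    out = df.copy()
    out[column] = out[column].map(clean_article)
    return out[out[column].str.len() > 0].reset_index(drop=True)
```

Running `df = preprocess(df)` before the summarization cells leaves sentence boundaries visible to spaCy, while BART's own tokenizer handles casing and punctuation without further cleanup.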


In [None]:
# extractive summarization using libraries like spaCy.

import spacy

# Load a spaCy model (you might need to download it first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

IMPORTANT_POS = {"NOUN", "PROPN", "VERB", "ADJ"}    # content-word part-of-speech tags
IMPORTANT_DEPS = {"nsubj", "dobj", "pobj", "ROOT"}  # key grammatical roles

def extractive_summarization(text, num_sentences=3):
  doc = nlp(text)
  # Reuse the sentence spans from the parsed doc instead of re-running nlp() on each sentence
  sentences = list(doc.sents)
  # Score = number of content words that fill an important grammatical role
  scores = [
    sum(1 for token in sent if token.pos_ in IMPORTANT_POS and token.dep_ in IMPORTANT_DEPS)
    for sent in sentences
  ]

  # Pick the top-N sentences by score, then restore document order so the summary reads naturally
  top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
  return " ".join(sentences[i].text.strip() for i in sorted(top))


# Example usage on the first few articles (printing every article floods the output)
if 'article' in df.columns:
    for index, row in df.head(5).iterrows():
        article_text = row['article']
        summary = extractive_summarization(article_text)
        print(f"Original Article:\n{article_text[:200]}...\n")  # Print first 200 chars
        print(f"Extractive Summary:\n{summary}\n")
else:
  print("Error: 'article' column not found in DataFrame")


Streaming output truncated to the last 5000 lines.
andy murray clearly has a vision for how his wedding day will play out when he marries his longterm girlfriend kim sears in his hometown of dunblane on saturday the world no 3 posted on twitter a hilarious series of emojis to his 298 million followers displaying his various plans for the day from the tweet its clear murray expects a day featuring plenty of laughter romance alcohol and even dancing andy murray is delighted after some snooker with friends ross hutchins left and jamie delgado centre the world no 3 posted this humorous tweet in emojis on the morning of his wedding day in dunblane  murrays friend and fellow tennis player jamie delgado also tweeted how the scot has been preparing for the big day with a caption on a twitter selfie pre wedding snooker ended in another victory for guess who andy_murray roscohutchins johnnydelgado  the final preparations for the wedding in dunblanes cathedral on saturday afternoon 
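Step 5 calls for evaluation, and the `highlights` column already provides human-written reference summaries. A standard automatic check is ROUGE. The sketch below is a simplified ROUGE-1 (clipped unigram overlap reported as precision, recall and F1) with no stemming or stop-word handling, so its numbers will not match published scores exactly. `rouge1` is a hypothetical helper name. ROUGE measures word overlap, not coherence, so reading a sample of summaries is still needed to judge how well they flow.

```python
import re
from collections import Counter


def _unigrams(text):
    """Lowercase and split into alphanumeric tokens (no stemming, no stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap as precision, recall and F1."""
    cand = Counter(_unigrams(candidate))
    ref = Counter(_unigrams(reference))
    # Counter & Counter keeps each shared word min(count_in_cand, count_in_ref) times
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For one row, `rouge1(extractive_summarization(row['article']), row['highlights'])['f1']` gives a score; averaging it over a few hundred rows compares the extractive and abstractive approaches. For reportable numbers, the maintained `rouge-score` package from Google Research, which supports stemming and ROUGE-L, is preferable to this sketch.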

In [None]:
# abstractive summarization with a pre-trained BART model (facebook/bart-large-cnn)

!pip install transformers

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Example usage on the first few articles (BART is slow on CPU)
if 'article' in df.columns:
    for index, row in df.head(5).iterrows():
        article_text = row['article']
        # truncation=True cuts inputs to the model's 1,024-token limit; without it,
        # long articles fail with "IndexError: index out of range in self"
        summary = summarizer(article_text, max_length=130, min_length=30, do_sample=False, truncation=True)
        print(f"Original Article:\n{article_text[:200]}...\n")  # Print first 200 chars
        print(f"Abstractive Summary:\n{summary[0]['summary_text']}\n")
else:
  print("Error: 'article' column not found in DataFrame")




Device set to use cpu


Original Article:
ever noticed how plane seats appear to be getting smaller and smaller with increasing numbers of people taking to the skies some experts are questioning if having such packed out planes is putting pas...

Abstractive Summary:
Consumer advisory group set up by the department of transportation said at a public hearing that while the government is happy to set standards for animals flying on planes it doesnt stipulate a minimum amount of space for humans. Many economy seats on united airlines have 30 inches of room while some airlines offer as little as 28 inches. Tests conducted by the faa use planes with a 31 inch pitch a standard which on some airlines has decreased.

Original Article:
a drunk teenage boy had to be rescued by security after jumping into a lions enclosure at a zoo in western india rahul kumar 17 clambered over the enclosure fence at the kamla nehru zoological park in...

Abstractive Summary:
 rahul kumar 17 clambered over the enclosure fence at the kam

IndexError: index out of range in self
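This `IndexError` typically means an input is longer than the 1,024 token positions `facebook/bart-large-cnn` supports, so one of the articles exceeded the limit. Passing `truncation=True` to the pipeline avoids the crash but silently drops the end of the article. A common workaround, sketched below under stated assumptions, is to pack consecutive sentences into chunks that fit the limit, summarize each chunk, and join the partial summaries. `chunk_sentences` and `summarize_long` are hypothetical helpers, `count_tokens` is any function returning a token count for a string, and the 900-token default is an assumed safety margin below 1,024.

```python
def chunk_sentences(sentences, max_tokens, count_tokens):
    """Greedily pack consecutive sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own (oversized) chunk.
    """
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        n = count_tokens(sent)
        # Start a new chunk if adding this sentence would exceed the budget
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(sentences, summarizer, count_tokens, max_tokens=900):
    """Summarize each chunk separately and join the partial summaries in order."""
    parts = []
    for chunk in chunk_sentences(sentences, max_tokens, count_tokens):
        # truncation=True still guards against a single oversized sentence
        result = summarizer(chunk, max_length=130, min_length=30,
                            do_sample=False, truncation=True)
        parts.append(result[0]["summary_text"].strip())
    return " ".join(parts)
```

With the pipeline and spaCy model loaded above, a call could look like `summarize_long([s.text for s in nlp(article_text).sents], summarizer, lambda s: len(summarizer.tokenizer(s)["input_ids"]))`. Counting each sentence separately includes the special tokens every time, so the estimate errs on the safe side. The trade-off is that the joined result is longer and can read less coherently than a single-pass summary.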