<a href="https://colab.research.google.com/github/SmirthikaR/Text-summarization/blob/main/TextSummarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
#URL of the webpage to scrape
url= "https://timesofindia.indiatimes.com/business/india-business/union-budget-2024-expectations-live-updates-economic-survey-finance-minister-nirmala-sitharaman-income-tax-slab/liveblog/111897014.cms"
# Send a GET request to the webpage (the timeout avoids hanging indefinitely)
response = requests.get(url, timeout=30)
# Check if the request was successful
if response.status_code == 200:
  #Parse the webpage content
  soup = BeautifulSoup(response.content, "html.parser")
  # Extract all paragraph elements
  paragraphs = soup.find_all('p')

  # Extract text from each paragraph and save as sentences
  sentences = [para.get_text().strip() for para in paragraphs]

  # Save the sentences to a file
  with open("scraped_sentences.txt", "w", encoding="utf-8") as file:
    for sentence in sentences:
      if sentence:
        file.write(sentence + "\n")
  print("Text data has been scraped and saved as sentences.")
else:
  print("Failed to retrieve the webpage. Status code:", response.status_code)


Text data has been scraped and saved as sentences.
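
The cell above relies on BeautifulSoup to collect the text of every `<p>` element. As a rough, dependency-free illustration of what that extraction step does, the sketch below uses only the standard library's `html.parser`. The names `ParagraphExtractor` and `extract_paragraphs` are hypothetical and not part of the notebook, and the behaviour is only an approximation of `soup.find_all('p')` followed by `get_text().strip()`.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the stripped text content of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def _flush(self):
        # Store the buffered paragraph text, skipping empty paragraphs.
        if self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly closes an open one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept as well.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # handle a trailing <p> that was never closed
    return parser.paragraphs


sample_html = "<h1>Title</h1><p> First <b>bold</b> para. </p><p></p><p>Second para.</p>"
print(extract_paragraphs(sample_html))
```

Empty paragraphs are dropped here at extraction time, whereas the notebook drops them when writing the file; the resulting text file is the same either way.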


In [3]:
import pandas as pd
# Note: without header=None, the first scraped line is consumed as the column header
df = pd.read_csv('/content/scraped_sentences.txt', delimiter='\t')
df.head()

Unnamed: 0,Budget 2024 Live: Key numbers to be watched for the first full Budget of Modi 3.0
0,"Fiscal Deficit: The budgeted fiscal deficit, w..."
1,Economic Survey 2024 Live: BJP leader Gourav V...
2,Budget 2024 Live: Budget for women's welfare a...
3,The Indian government has increased its budget...
4,"In a post on social media platform X, Narendra..."


In [4]:
df.shape

(101, 1)

In [5]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

df = pd.read_csv('/content/scraped_sentences.txt', delimiter='\t', header=None, names=['text'])

def solve(df):
    # Load stopwords
    stopwords1 = set(stopwords.words("english"))

    # The 'text' column now exists
    summaries = []

    for index, row in df.iterrows():
        data = row['text']  # Access the text data for the current row

        # Tokenize the words
        words = word_tokenize(data)
        freqTable = {}
        for word in words:
            word = word.lower()
            if word in stopwords1:
                continue
            if word in freqTable:
                freqTable[word] += 1
            else:
                freqTable[word] = 1

        # Tokenize the sentences
        sentences = sent_tokenize(data)
        sentenceValue = {}
        for sentence in sentences:
            for word, freq in freqTable.items():
                if word in sentence.lower():
                    if sentence in sentenceValue:
                        sentenceValue[sentence] += freq
                    else:
                        sentenceValue[sentence] = freq

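        # Skip rows where no sentence was scored (avoids division by zero)
        if not sentenceValue:
            summaries.append('')
            continue

        # Calculate the average score for the sentences
        sumValues = sum(sentenceValue.values())
        average = int(sumValues / len(sentenceValue))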

        # Generate the summary
        summary = ''
        for sentence in sentences:
            if sentence in sentenceValue and sentenceValue[sentence] > (1.2 * average):
                summary += " " + sentence

        summaries.append(summary.strip())

    # Add summaries to the DataFrame and save to a new CSV file
    df['summary'] = summaries
    df.to_csv('summarized_output.csv', index=False)
    print("Summarization completed and saved to 'summarized_output.csv'.")

# Example usage
solve(df)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Summarization completed and saved to 'summarized_output.csv'.
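
The scoring rule in `solve` can be traced on a toy input. The sketch below is a simplified, dependency-free version rather than the notebook's exact code: it assumes a regex tokenizer and a tiny hypothetical stopword set in place of NLTK, matches whole tokens instead of substrings, and does not truncate the average with `int()`. The names `score_sentences` and `extractive_summary` are illustrative only. It also shows why a text with a single sentence always yields an empty summary: its score equals the average, so it can never exceed 1.2 times the average.

```python
import re

# Hypothetical tiny stopword list; the notebook uses NLTK's English list instead.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "at", "for"}


def tokenize(text):
    # Naive lowercase word tokenizer (an assumption, not NLTK's word_tokenize).
    return re.findall(r"[a-z0-9']+", text.lower())


def score_sentences(text):
    """Score each sentence by summing the document-level frequencies of its words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = {}
    for word in tokenize(text):
        if word not in STOPWORDS:
            freq[word] = freq.get(word, 0) + 1
    scores = {}
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        scores[sentence] = sum(f for w, f in freq.items() if w in tokens)
    return scores


def extractive_summary(text, factor=1.2):
    """Keep the sentences whose score exceeds factor * average score."""
    scores = score_sentences(text)
    if not scores:
        return ""
    average = sum(scores.values()) / len(scores)
    return " ".join(s for s, v in scores.items() if v > factor * average)


multi = ("Tax revenue rose. Tax revenue from GST rose sharply this year "
         "as tax collection improved. Markets closed.")
single = "The fiscal deficit is budgeted at 5.1 per cent."

print(score_sentences(multi))
print(repr(extractive_summary(multi)))
print(repr(extractive_summary(single)))
```

In the three-sentence example the scores are 7, 15, and 2, so only the second sentence clears the cutoff of 9.6. The single-sentence example returns an empty string, which matches the many blank entries in the `summary` column shown below.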


In [6]:
df.head()

Unnamed: 0,text,summary
0,Budget 2024 Live: Key numbers to be watched fo...,
1,"Fiscal Deficit: The budgeted fiscal deficit, w...",The government has projected fiscal deficit at...
2,Economic Survey 2024 Live: BJP leader Gourav V...,
3,Budget 2024 Live: Budget for women's welfare a...,
4,The Indian government has increased its budget...,


In [7]:
df['summary']

Unnamed: 0,summary
0,
1,The government has projected fiscal deficit at...
2,
3,
4,
...,...
97,I extend my greetings to the countrymen on the...
98,
99,"Some parties doing negative politics, get rid ..."
100,


In [8]:
column_name = 'summary'  # Replace with your column name

# Iterate through the column and print each value
for value in df[column_name]:
    print(value)


The government has projected fiscal deficit at 4.5 per cent of the GDP in FY26.Capital Expenditure: The government's planned capital expenditure for this fiscal year is budgeted at Rs 11.1 lakh crore, higher than Rs 9.5 lakh crore in the last fiscal year. The government has been pushing infrastructure creation and also incentivising states to step up capex.Tax Revenue: The Interim Budget had pegged gross tax revenue at Rs 38.31 lakh crore for 2024-25, an 11.46 per cent growth over the last fiscal. This includes Rs 21.99 lakh crore estimated to come from direct taxes (personal income tax + corporate tax), and Rs 16.22 lakh crore from indirect taxes (customs + excise duty + GST).GST: Goods and Services Tax (GST) collection in 2024-25 is estimated to rise to Rs 10.68 lakh crore, an increase of 11.6 per cent. The tax revenue figures will have to be watched out for in the final Budget for the 2024-25 fiscal year.Borrowing: The government's gross borrowing Budget was Rs 14.13 lakh crore in 

In [9]:
!pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=a70f632b5fdd47919166df3a4d514a2025f2190c4ac9a7bc1a92d8f7d4b63b38
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [10]:
# Evaluation metrics
from rouge_score import rouge_scorer

# Define the function to calculate ROUGE scores
def calculate_rouge_scores(reference, summary):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    scores = scorer.score(reference, summary)
    return scores
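# Calculate ROUGE scores for each row, using the source text as the reference
# (no human-written summaries are available, so an extractive summary scores high precision by construction)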
df['rouge_scores'] = df.apply(lambda row: calculate_rouge_scores(row['text'], row['summary']), axis=1)

# Display the scores
for idx, row in df.iterrows():
    print(f"Summary {idx + 1} ROUGE scores:")
    print(f"ROUGE-1: {row['rouge_scores']['rouge1']}")
    print(f"ROUGE-2: {row['rouge_scores']['rouge2']}")
    print(f"ROUGE-L: {row['rouge_scores']['rougeL']}")
    print("\n")

Summary 1 ROUGE scores:
ROUGE-1: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-2: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-L: Score(precision=0, recall=0, fmeasure=0)


Summary 2 ROUGE scores:
ROUGE-1: Score(precision=1.0, recall=0.6492027334851936, fmeasure=0.7872928176795579)
ROUGE-2: Score(precision=0.9964788732394366, recall=0.6461187214611872, fmeasure=0.7839335180055402)
ROUGE-L: Score(precision=1.0, recall=0.6492027334851936, fmeasure=0.7872928176795579)


Summary 3 ROUGE scores:
ROUGE-1: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-2: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-L: Score(precision=0, recall=0, fmeasure=0)


Summary 4 ROUGE scores:
ROUGE-1: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-2: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-L: Score(precision=0, recall=0, fmeasure=0)


Summary 5 ROUGE scores:
ROUGE-1: Score(precision=0.0, recall=0.0, fmeasure=0.0)
ROUGE-2: Score(precision=0.0, recall=0.0, fmeasure=0.0)
R
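
To make the numbers above easier to read, here is a minimal sketch of how a ROUGE-1 score can be computed by hand from clipped unigram overlap. It is an illustration under stated assumptions, not the `rouge_score` implementation: the function name `rouge1` is hypothetical, tokenization is a simple lowercase regex, and no stemming is applied (the notebook passes `use_stemmer=True`).

```python
import re
from collections import Counter


def rouge1(reference, summary):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref_counts = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    sum_counts = Counter(re.findall(r"[a-z0-9]+", summary.lower()))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & sum_counts).values())
    precision = overlap / sum(sum_counts.values()) if sum_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
print(rouge1(reference, "the cat sat"))  # summary copied from the reference
print(rouge1(reference, ""))             # empty summary
```

A summary made only of words copied from the reference gets precision 1.0, with recall equal to the fraction of the reference it covers. An empty summary scores 0.0 on every measure, which is the pattern seen for the rows whose extractive summary is blank.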

In [12]:
# Abstractive summarization
from transformers import pipeline

# Read one scraped paragraph per row into a 'text' column
data = pd.read_csv('/content/scraped_sentences.txt', delimiter='\t', header=None, names=['text'])

def summarize_text(text, summarizer, max_length=11, min_length=5):
    # Summarize the text; max_length and min_length are token limits on the generated summary
    summary = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)[0]['summary_text']
    return summary

def solve(data, model='t5-base'):

    # The text data is expected in a column named 'text'
    if 'text' not in data.columns:
        print("The dataset must contain a 'text' column.")
        return

    # Load the summarization model
    summarizer = pipeline("summarization", model=model)

    # Summarize each text entry in the column
    data['summary'] = data['text'].apply(lambda x: summarize_text(x, summarizer))

    # Save the summaries to a new CSV file
    output_path = 'summarized_output.csv'
    data.to_csv(output_path, index=False)
    print(f"Summarization completed and saved to '{output_path}'.")

# Example usage
solve(data)


Summarization completed and saved to 'summarized_output.csv'.


In [13]:
# Reload the saved summaries and print each one
data1 = pd.read_csv('/content/summarized_output.csv')
column_name = 'summary'  # Replace with your column name

# Iterate through the column and print each value
for value in data1[column_name]:
    print(value)

Budget 2024 Live: Key numbers to be watched
fiscal deficit is the difference between government expenditure and income
BJP leader: "the survey that has come
budget for women's welfare and empowerment schemes surge
the budget for women's welfare and empowerment has
in a post on social media platform X
he identifies areas for further growth and
the Economic Survey 2023-24 presents a cherry
GTRI urges govt not to cut
think tank urges government to cut import duty on
Union Budget 2024: stock markets close lower on
the Sensex closed at 80,502.0
tax policies to play major role in tackling income
tax policies will have a major role to play
Modi slams opposition for 
prime minister says some parties have practised "negative
india's contribution to global value chain trade has
CEA: India's contribution to global value
Economic Survey 2024 Live: Chief econ
private capex has recovered after decline seen in 20
CEA's Economic Survey 2024 live:
anantha nageswararan says
Survey highlights govt's cruc