Importing necessary libraries

In [1]:
import re
import requests
from bs4 import BeautifulSoup
from googletrans import Translator
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
import tkinter as tk
from tkinter import *

The GUI (graphical user interface) is built with Tkinter, Python's standard GUI package. The window size is set with .geometry() and the window title with .title().
The variables 'title_text' and 'summary_text' are initialised to 'None' as placeholders for the text widgets that are created later. In Tkinter, a Text widget displays multi-line text and lets users view or edit it.

In [2]:
# Initialize Tkinter window
top = tk.Tk()
top.geometry('680x680')
top.title('News Article Summary')

# Define global variables for text widgets
title_text = None
summary_text = None

The code below adds a Scale (slider) widget to the window so that users can choose the number of sentences in the summary, from 1 to 3.

In [3]:
# Add a Scale widget to select the number of sentences for the summary
num_sentences_label = Label(top, text='Select number of sentences for summary:', font=('Arial', 12))
num_sentences_label.pack()

num_sentences_scale = Scale(top, from_=1, to=3, orient=HORIZONTAL)
num_sentences_scale.pack()

The following function preprocesses the text to reduce noise. It converts the text to lowercase with '.lower()', removes punctuation (every character that is not a word character or whitespace) with the regular expression module 're', and collapses runs of whitespace into a single space. Stop-word removal (dropping common words such as 'is' and 'the') and lemmatization (reducing each token to its lemma and joining the tokens back into one whitespace-separated string) are not implemented in this function; they would need an NLP library. Note that display_results() does not call preprocess_text(), and that removing punctuation also removes the sentence boundaries the summariser relies on, so the function should not be applied to text before it is summarised.

In [4]:
# Function to preprocess text
def preprocess_text(text):
    # Convert text to lowercase
    text = text.lower()
    # Remove punctuation (anything that is not a word character or whitespace)
    text = re.sub(r'[^\w\s]', '', text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text
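
As a quick check of what the two regular expressions do, the sketch below repeats the function on a made-up sample string. The second helper, remove_stop_words, and its tiny STOP_WORDS set are hypothetical illustrations of stop-word filtering and are not part of the notebook; a real application would probably take the word list from an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop-word set used only for this illustration
STOP_WORDS = {'is', 'the', 'a', 'an', 'of', 'and'}

def preprocess_text(text):
    # Same steps as the notebook function: lowercase, strip punctuation, collapse whitespace
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text

def remove_stop_words(text, stop_words=STOP_WORDS):
    # Keep only tokens that are not in the stop-word set, then rejoin with single spaces
    return ' '.join(token for token in text.split() if token not in stop_words)

sample = "The  Senate is voting, today!\nIt's the end of an era."
cleaned = preprocess_text(sample)
print(cleaned)                     # the senate is voting today its the end of an era
print(remove_stop_words(cleaned))  # senate voting today its end era
```

The boundary between the two sample sentences disappears after cleaning, which is why this kind of preprocessing suits keyword analysis better than input to a sentence-based summariser.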

The next function generates a summary of the given text with the number of sentences selected by the user. PlaintextParser builds a document object from the text, and Tokenizer('english') tells it to split the text using English rules. The summariser uses Latent Semantic Analysis (LSA), a technique for extractive summarisation that selects the sentences that best represent the main topics of the text. Calling the summariser with the document and the sentence count returns the selected sentences. Finally, the sentences are converted to strings and joined with a single space.

In [5]:
# Function to generate summary
def generate_summary(text, num_sentences):
    # Initialize parser and tokenizer
    parser = PlaintextParser.from_string(text, Tokenizer('english'))
    summarizer = LsaSummarizer()
    # Generate summary with the specified number of sentences
    summary = summarizer(parser.document, num_sentences)
    return ' '.join(str(sentence) for sentence in summary)
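
To illustrate the general idea of extractive summarisation (pick N of the original sentences and join them with spaces) without installing sumy, here is a minimal sketch that scores each sentence by the average frequency of its words. This is not LSA and not what LsaSummarizer does internally; the function name frequency_summary, the regex-based sentence splitter, and the scoring rule are all assumptions made for this example.

```python
import re
from collections import Counter

def frequency_summary(text, num_sentences):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    # Count how often each lowercase word occurs in the whole text
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        # Average word frequency, so long sentences are not favoured automatically
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, restore original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return ' '.join(sentences[i] for i in chosen)

text = "Cats sleep. Cats eat fish. Dogs bark."
print(frequency_summary(text, 1))  # Cats sleep.
print(frequency_summary(text, 2))  # Cats sleep. Cats eat fish.
```

As in generate_summary, the result is a single string in which the chosen sentences are separated by one space.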

The next function detects the language of the given text. It creates a 'Translator()' object from the googletrans library and calls its '.detect()' method on the text. The detected language code (for example 'en') is read from the '.lang' attribute of the result and returned.

In [6]:
# Function to detect language
def detect_language(text):
    translator = Translator()
    lang = translator.detect(text).lang
    return lang

The googletrans library is also used to translate the input text into a target language, which in this application is English. The translated content is available through the 'text' attribute of the result.

In [7]:
# Function to translate text
def translate_text(text, target_language):
    translator = Translator()
    translated_text = translator.translate(text, dest=target_language).text
    return translated_text
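
The two helpers can be combined so that a translation is requested only when the detected language differs from the target. The sketch below is one possible way to do that, not part of the notebook: translate_if_needed is a hypothetical name, and the detection and translation functions are passed in as arguments so that the example runs offline with stand-ins. In the notebook, detect_language and translate_text would be passed instead.

```python
def translate_if_needed(text, detect, translate, target_language='en'):
    # Skip the translation call when the text is already in the target language
    if detect(text) == target_language:
        return text
    return translate(text, target_language)

# Stand-in functions (assumptions for this demo) so that no network access is needed
def fake_detect(text):
    return 'fr' if 'bonjour' in text.lower() else 'en'

def fake_translate(text, target_language):
    return f'[{target_language}] {text}'

print(translate_if_needed('Hello world', fake_detect, fake_translate))
print(translate_if_needed('Bonjour le monde', fake_detect, fake_translate))
```

With the stand-ins, the first call returns the text unchanged and the second returns the marked "translation", which shows that the translate function is only invoked for non-English input.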

The function fetches the text content of a given URL. It sends an HTTP GET request to the URL and uses BeautifulSoup to parse the HTML response. It then joins the text of all paragraph ('p') tags into one string, separated by spaces, and returns that string.

In [8]:
# Function to fetch content from URL
def fetch_url_content(url):
    # A timeout stops the GUI from hanging indefinitely on an unresponsive server
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'html.parser')
    paragraphs = soup.find_all('p')
    article_text = ' '.join([p.get_text() for p in paragraphs])
    return article_text
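
To show what "join the text of all paragraph tags" means without a network request, the sketch below extracts paragraph text from an inline HTML string. It uses the standard-library html.parser module instead of BeautifulSoup so that it runs without third-party packages; the class ParagraphExtractor, the function extract_paragraph_text, and the sample HTML are assumptions made for this example.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def _flush(self):
        # Store the text gathered for the current paragraph
        self.paragraphs.append(''.join(self._buf).strip())
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            if self._in_p:
                self._flush()  # a new <p> implicitly closes an unclosed one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == 'p' and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> is kept as well
        if self._in_p:
            self._buf.append(data)

def extract_paragraph_text(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(p for p in parser.paragraphs if p)

sample = ("<html><head><title>Demo</title></head><body><h1>Head</h1>"
          "<p>First <b>bold</b> paragraph.</p><div>skip</div>"
          "<p>Second paragraph.</p></body></html>")
print(extract_paragraph_text(sample))  # First bold paragraph. Second paragraph.
```

The title, heading, and div text are ignored because only 'p' elements are collected, which mirrors the behaviour of soup.find_all('p') in fetch_url_content.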

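The next function displays the news article's summary. When the user selects the "URL" option and enters the URL of a news article, the code fetches the article text, generates a summary with the selected number of sentences, translates the summary into English, and shows the result in the GUI. The function does not call detect_language() or preprocess_text(), and the title field is filled with the placeholder string 'Title from URL'.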

In [9]:
def display_results():
    global title_text, summary_text, author_text
    option = choice_var.get()
    if option == 2:
        url = url_entry.get()
        num_sentences = num_sentences_scale.get()  # Get the selected number of sentences
        if url.strip() == '':
            return  # Do nothing if the input URL is empty

        # Fetch content from the URL
        article_text = fetch_url_content(url)

        # Generate the summary, then translate it into English
        summary = generate_summary(article_text, num_sentences)  # Pass the selected number of sentences
        translated_summary = translate_text(summary, 'en')

        clear_results()

        # Display new results
        title_label.config(text='Title: ')
        title_text.config(state='normal')
        title_text.delete('1.0', END)
        title_text.insert('1.0', 'Title from URL')
        title_text.config(state='disabled')

        summary_label.config(text='Summary: ')
        summary_text.config(state='normal')
        summary_text.delete('1.0', END)
        summary_text.insert('1.0', translated_summary)
        summary_text.config(state='disabled')
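
display_results() and clear_results() both repeat the same sequence for every read-only Text widget: enable it, delete the old content, optionally insert new content, and disable it again. A small helper could hold that sequence in one place. The sketch below is an assumption about how such a helper might look; set_readonly_text is a hypothetical name, and FakeText is a stand-in for tkinter.Text so that the example runs without a display.

```python
def set_readonly_text(widget, content=''):
    # Temporarily enable the widget, replace its content, then lock it again
    widget.config(state='normal')
    widget.delete('1.0', 'end')  # 'end' is the value of tkinter.END
    if content:
        widget.insert('1.0', content)
    widget.config(state='disabled')

class FakeText:
    """Minimal stand-in for tkinter.Text, used only for this demo."""

    def __init__(self):
        self.state = 'normal'
        self.content = ''

    def config(self, state):
        self.state = state

    def delete(self, start, end):
        # Like a real Text widget, ignore edits while disabled
        if self.state == 'normal':
            self.content = ''

    def insert(self, index, text):
        if self.state == 'normal':
            self.content = text + self.content

box = FakeText()
set_readonly_text(box, 'First summary')
set_readonly_text(box, 'Second summary')
print(box.state, box.content)  # disabled Second summary
```

With a helper like this, showing the summary would reduce to a single call per widget, and clearing a widget would be the same call with no content argument.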


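The last cell implements a function that clears previous results from the GUI, then builds the remaining widgets: a radio button for the "URL" option, an entry field for the URL of a news article, a "Display" button that calls display_results(), and the text widgets that show the results. Finally, mainloop() starts the Tkinter event loop, which keeps the window open and updates the GUI with the summary each time the button is clicked.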

In [10]:
# Function to clear results
def clear_results():
    global title_text, summary_text, author_text
    title_text.config(state='normal')
    title_text.delete('1.0', END)
    title_text.config(state='disabled')

    summary_text.config(state='normal')
    summary_text.delete('1.0', END)
    summary_text.config(state='disabled')

    author_text.config(state='normal')
    author_text.delete('1.0', END)
    author_text.config(state='disabled')

choice_var = IntVar()


url_radio = Radiobutton(top, text='URL', variable=choice_var, value=2)
url_radio.pack()

url_label = Label(top, text='Enter URL:', font=('Arial', 12))
url_label.pack()

url_entry = Entry(top, width=50)
url_entry.pack()

# Display button
display_button = Button(top, text='Display', command=display_results)
display_button.pack()

# Widgets for displaying results
title_label = Label(top, text="", font=('Arial', 12, 'bold'))
title_label.pack()

title_text = Text(top, height=2, width=80, wrap='word')
title_text.pack()

summary_label = Label(top, text="", font=('Arial', 12, 'bold'))
summary_label.pack()

summary_text = Text(top, height=15, width=80, wrap='word')
summary_text.pack()

author_label = Label(top, text="", font=('Arial', 12, 'bold'))
author_label.pack()

author_text = Text(top, height=1, width=80, wrap='word')
author_text.pack()

top.mainloop()