# News Translation and Summarisation using NLP



prepared by Amitha David

A Python project that uses NLP techniques to summarize news articles and translate the summaries into a foreign language. The pipeline combines NLTK for text preprocessing, the Hugging Face Transformers library for summarization, and the Helsinki-NLP/opus-mt models for translation.

### Import necessary packages

In [None]:
import transformers
from transformers import pipeline

### Text Preprocessing

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
import re

# Download NLTK resources (if not already downloaded)
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

def preprocess_text(text):
    # Lowercasing
    text = text.lower()

    # Text normalization (example: replacing "it's" with "it is").
    # This has to run before tokenization to affect the returned text.
    text = re.sub(r"\bit['’]s\b", "it is", text)

    # Tokenization
    tokens = word_tokenize(text)

    # Removing special characters, numbers, and punctuation,
    # then dropping tokens left empty (e.g. pure numbers)
    tokens = [re.sub(r'[^a-zA-Z]', '', word) for word in tokens if word.isalnum()]
    tokens = [word for word in tokens if word]

    # Removing stop words
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word not in stop_words]

    # Stemming and lemmatization are shown for reference only: their results
    # are not part of the returned text, which keeps the original word forms.
    stemmer = PorterStemmer()
    stemmed_tokens = [stemmer.stem(word) for word in filtered_tokens]
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]

    # Limit the text to 500 words
    filtered_tokens = filtered_tokens[:500]

    preprocessed_text = ' '.join(filtered_tokens)

    return preprocessed_text
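
The order of the steps matters: a substitution applied to `text` only changes the result if it runs before tokenization. The dependency-free sketch below illustrates that ordering. The regex tokenizer and the tiny stop-word set are stand-ins (assumptions) for `word_tokenize` and the NLTK English stop-word list, and `simple_preprocess` is a hypothetical name, not part of the notebook.

```python
import re

# Hypothetical mini stop-word list; the notebook uses NLTK's English list.
STOP_WORDS = {"the", "a", "an", "is", "it", "in", "of", "and", "to"}

def simple_preprocess(text, max_words=500):
    text = text.lower()
    # Expand the contraction before tokenizing (straight or curly apostrophe)
    text = re.sub(r"\bit['’]s\b", "it is", text)
    # Regex stand-in for word_tokenize: keep runs of ASCII letters only
    tokens = re.findall(r"[a-z]+", text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Cap the output at max_words tokens
    return " ".join(kept[:max_words])

print(simple_preprocess("It's the second day of strikes in 2023."))
# second day strikes
```

Note that this kind of aggressive filtering removes casing and punctuation, which a downstream summarization model may rely on.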

### Text Summarisation

In [None]:
def summarize_text(text, max_length=150):
    summarizer = pipeline("summarization")
    summary = summarizer(text, max_length=max_length, min_length=30)
    return summary[0]['summary_text']
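
Summarization models accept only a bounded input length, so long articles are often split into chunks that are summarized separately. A minimal sketch, assuming a word count is an acceptable rough proxy for the model's token limit; `chunk_words` and its 400-word default are hypothetical and not part of the notebook.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = chunk_words("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be passed to `summarize_text` and the partial summaries joined, at the cost of losing context across chunk boundaries.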

### NLP Stages in Translation

In [None]:
def perform_translation(text, source_lang, target_lang):
    model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    translator = pipeline("translation", model=model_name)
    translated_text = translator(text)
    return translated_text[0]['translation_text']
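
`perform_translation` builds a new pipeline on every call, which reloads the model each time. One possible refinement is to cache a pipeline per language pair. In the sketch below, `model_name_for` and `get_translator` are hypothetical helpers; the `transformers` import is deferred so the name-building logic can be checked without loading a model.

```python
from functools import lru_cache

def model_name_for(source_lang, target_lang):
    """Build the opus-mt checkpoint name in the form used by the notebook."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

@lru_cache(maxsize=None)
def get_translator(source_lang, target_lang):
    # Imported lazily so this code loads even without transformers installed
    from transformers import pipeline
    return pipeline("translation", model=model_name_for(source_lang, target_lang))

print(model_name_for("en", "fr"))  # Helsinki-NLP/opus-mt-en-fr
```

Not every language pair has a published checkpoint, so a name built this way may fail to load; handling that error is left out of the sketch.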

### Summarised Translation in French

In [None]:
def summarize_and_translate(text, source_lang, target_lang):
    # Pre-process the text
    preprocessed_text = preprocess_text(text)

    # Summarize the text
    summarized_text = summarize_text(preprocessed_text)

    # Translate the summarized text into the target language
    translated_text = perform_translation(summarized_text, source_lang, target_lang)

    return translated_text

### Retrieving the news from website

In [None]:
import requests
from bs4 import BeautifulSoup

# URL of the article to scrape
url = "https://www.politico.com/news/2023/11/01/gaza-refugee-camp-israeli-strikes-00124706"

# Send an HTTP GET request to the URL (the timeout avoids hanging indefinitely)
response = requests.get(url, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the webpage using BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the HTML elements with the specified class that contain the text content
    # In this case, we're looking for elements with class "story-text__paragraph"
    paragraph_elements = soup.find_all('p', class_='story-text__paragraph')

    # Extract the text content from the selected elements
    text_content = " ".join([p.get_text() for p in paragraph_elements])

    # Print or process the scraped text content
    print(text_content)

    # Store the scraped text in a variable
    news_text = text_content
else:
    # Fail loudly so later cells do not run with `news_text` undefined
    raise RuntimeError(f"Failed to retrieve content. Status code: {response.status_code}")

# `news_text` now holds the text of the <p class="story-text__paragraph"> elements.

RAFAH, Gaza Strip — Israeli airstrikes hit apartment buildings in a Gaza refugee camp for the second day in a row Wednesday, Palestinian officials said, as the territory’s only functioning border post opened to allow foreign passport holders to leave for the first time since war broke out over three weeks ago. Al-Jazeera television, one of the few media outlets still reporting from northern Gaza, aired footage of devastation in the Jabaliya camp near Gaza City and of several wounded people, including children, being brought to a nearby hospital. The Hamas-run government said the strikes killed and wounded many people, but the exact toll was not yet known. The Al-Jazeera footage showed nearly identical scenes as the day before, with dozens of men digging through the gray rubble of demolished multistory buildings in search of survivors. The toll from Tuesday’s strikes was also unknown, though the director of a nearby hospital said hundreds were killed or wounded. Israel said those strike
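
The extraction step can also be tried offline, without a network request. The sketch below uses the standard library's `html.parser` in place of BeautifulSoup (a swapped-in technique, chosen only so the example has no dependencies). The class `ParagraphExtractor`, the function `extract_text`, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect text from <p> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.paragraphs = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.paragraphs.append("".join(self._buffer).strip())
            self._inside = False

    def handle_data(self, data):
        # Text of nested inline tags (e.g. <b>) is collected as well
        if self._inside:
            self._buffer.append(data)

def extract_text(html, css_class="story-text__paragraph"):
    parser = ParagraphExtractor(css_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)

sample = ('<p class="story-text__paragraph">First <b>bold</b> line.</p>'
          '<p class="ad">skip me</p>'
          '<p class="story-text__paragraph">Second line.</p>')
print(extract_text(sample))  # First bold line. Second line.
```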

### Main Function

In [None]:
if __name__ == "__main__":
    source_text = news_text
    source_lang = "en"  # Source language code, e.g., "en" for English
    target_lang = "fr"  # Target language code, "fr" for French
    print(news_text)
    translated_text = summarize_and_translate(source_text, source_lang, target_lang)
    print("Summarized and Translated Text in French:")
    print(translated_text)

Israeli airstrikes hit apartment buildings in a Gaza refugee camp for the second day in a row Wednesday, Palestinian officials said, as the territory’s only functioning border post opened to allow foreign passport holders to leave for the first time since war broke out over three weeks ago. Al-Jazeera television, one of the few media outlets still reporting from northern Gaza, aired footage of devastation in the Jabaliya camp near Gaza City and of several wounded people, including children, being brought to a nearby hospital. The Hamas-run government said the strikes killed and wounded many people, but the exact toll was not yet known. The Al-Jazeera footage showed nearly identical scenes as the day before, with dozens of men digging through the gray rubble of demolished multistory buildings in search of survivors. The toll from Tuesday’s strikes was also unknown, though the director of a nearby hospital said hundreds were killed or wounded. Israel said those strikes killed dozens of m

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Summarized and Translated Text in French:
Les frappes aériennes israéliennes ont frappé des immeubles d'appartements dans un camp de réfugiés de Gaza pour le deuxième jour d'affilée, disent les responsables palestiniens. La télévision Al-Jazeera a diffusé des images de dévastation dans le camp de Jabaliya près de Gaza. Le gouvernement dirigé par le Hamas a déclaré que les frappes ont tué et blessé de nombreuses personnes, mais le péage exact n'était pas encore connu.
