In [1]:
import pandas as pd
import numpy as np
import spacy
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the spaCy English language model
nlp = spacy.load("en_core_web_sm")

In [2]:
# Cell 2
df = pd.read_csv('bbc_news.csv')
print(df.head())
print("Column names:", df.columns.tolist())

                                               title  \
0  Ukraine: Angry Zelensky vows to punish Russian...   
1  War in Ukraine: Taking cover in a town under a...   
2         Ukraine war 'catastrophic for global food'   
3  Manchester Arena bombing: Saffie Roussos's par...   
4  Ukraine conflict: Oil price soars to highest l...   

                         pubDate  \
0  Mon, 07 Mar 2022 08:01:56 GMT   
1  Sun, 06 Mar 2022 22:49:58 GMT   
2  Mon, 07 Mar 2022 00:14:42 GMT   
3  Mon, 07 Mar 2022 00:05:40 GMT   
4  Mon, 07 Mar 2022 08:15:53 GMT   

                                               guid  \
0  https://www.bbc.co.uk/news/world-europe-60638042   
1  https://www.bbc.co.uk/news/world-europe-60641873   
2      https://www.bbc.co.uk/news/business-60623941   
3            https://www.bbc.co.uk/news/uk-60579079   
4      https://www.bbc.co.uk/news/business-60642786   

                                                link  \
0  https://www.bbc.co.uk/news/world-europe-606380...   
1  

In [3]:
def preprocess_text(text):
    text = re.sub(r'\s+', ' ', text)  # Remove extra whitespace
    text = text.strip()  # Strip leading/trailing whitespace
    return text

# Use the correct column name based on your DataFrame's structure
df['cleaned_text'] = df['description'].apply(preprocess_text)  # Replace <correct_column_name> accordingly


In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extractive_summary(text, num_sentences=3):
    # Tokenize sentences using spaCy
    doc = nlp(text)
    sentences = list(doc.sents)

    # If fewer sentences than requested, adjust num_sentences
    if len(sentences) < num_sentences:
        num_sentences = len(sentences)

    # Create a list of sentence texts for TF-IDF
    sentence_texts = [sentence.text.strip() for sentence in sentences if sentence.text.strip()]

    # Create TF-IDF vectorizer
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(sentence_texts)

    # Calculate the total score for each sentence
    sentence_scores = tfidf_matrix.sum(axis=1).A1  # Sum scores instead of average

    # Filter out sentences with low scores
    threshold = np.percentile(sentence_scores, 50)  # Change percentile as needed
    high_score_indices = np.where(sentence_scores >= threshold)[0]

    # If not enough high-scoring sentences, revert to the original scoring
    if len(high_score_indices) < num_sentences:
        high_score_indices = np.argsort(sentence_scores)[-num_sentences:]  # Get top-ranked sentences

    # Select top sentences based on their scores
    ranked_indices = np.argsort(sentence_scores[high_score_indices])[-num_sentences:]  # Get indices of the top sentences
    ranked_indices.sort()  # Sort indices for ordered output

    # Create the summary
    summary = ' '.join([sentence_texts[i] for i in ranked_indices])
    return summary



# Test the summarization function on an article
example_article = df['cleaned_text'].iloc[0]  # Use the correct column name
summary = extractive_summary(example_article, num_sentences=3)
print("Summary of the example article:")
print(summary)


Summary of the example article:
The Ukrainian president says the country will not forgive or forget those who murder its civilians.


In [5]:
def summarize_external_text(external_text, num_sentences=3):
    return extractive_summary(external_text, num_sentences)

# Example external text for summarization
external_text = """
Artificial intelligence (AI) refers to the simulation of human intelligence in machines programmed to think and learn like humans. These machines are designed to perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI has become an integral part of our daily lives, influencing various sectors such as healthcare, finance, transportation, and entertainment.

The history of AI can be traced back to the mid-20th century, when pioneers like Alan Turing and John McCarthy began exploring the concept of machines capable of intelligent behavior. Turing proposed the Turing Test as a measure of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. This foundational idea sparked research in computer science and cognitive psychology, leading to the development of algorithms and models that simulate human thought processes.

In recent years, advancements in machine learning—a subset of AI—have propelled the field forward. Machine learning algorithms enable computers to learn from data and improve their performance over time without being explicitly programmed. This has resulted in significant breakthroughs in various applications, including image and speech recognition, natural language processing, and autonomous systems.

One of the most notable applications of AI is in healthcare, where it aids in diagnostics, personalized medicine, and drug discovery. AI algorithms can analyze medical images to detect conditions like tumors and fractures with remarkable accuracy. Additionally, AI-driven chatbots and virtual assistants are transforming patient interactions, providing real-time information and support.

In the financial sector, AI is used for fraud detection, algorithmic trading, and risk management. Financial institutions leverage AI to analyze vast amounts of data, identify patterns, and make informed decisions quickly. This not only enhances operational efficiency but also improves customer experiences.

Transportation is another area significantly impacted by AI, particularly with the rise of autonomous vehicles. Companies like Tesla and Waymo are at the forefront of developing self-driving technology, aiming to create safer and more efficient transportation systems. AI enables these vehicles to perceive their surroundings, make decisions, and navigate complex environments.

Despite the numerous benefits of AI, there are also challenges and ethical concerns associated with its implementation. Issues related to data privacy, algorithmic bias, and job displacement due to automation have sparked debates among policymakers, researchers, and the public. Ensuring that AI technologies are developed and deployed responsibly is crucial for maximizing their potential benefits while minimizing risks.

In conclusion, artificial intelligence represents a transformative force in society, with the potential to enhance productivity, improve quality of life, and solve complex problems. As AI continues to evolve, it will be essential to address the ethical implications and ensure that its advancements benefit all segments of society."""
external_summary = summarize_external_text(external_text, num_sentences=2)
print("Summary of the external text:")
print(external_summary)

Summary of the external text:
Artificial intelligence (AI) refers to the simulation of human intelligence in machines programmed to think and learn like humans. AI has become an integral part of our daily lives, influencing various sectors such as healthcare, finance, transportation, and entertainment.
