In [7]:
# STEP 1: Load and inspect dataset
import pandas as pd

# Load the dataset
df = pd.read_csv("test.csv")

# Display first few rows
print("Dataset Preview:")
print(df.head())

# Check columns and missing values
print("\nColumn Info:")
df.info()  # info() prints its report directly and returns None

print("\nMissing Values:")
print(df.isnull().sum())

Dataset Preview:
                                         id  \
0  92c514c913c0bdfe25341af9fd72b29db544099b   
1  2003841c7dc0e7c5b1a248f9cd536d727f27a45a   
2  91b7d2311527f5c2b63a65ca98d21d9c92485149   
3  caabf9cbdf96eb1410295a673e953d304391bfbb   
4  3da746a7d9afcaa659088c8366ef6347fe6b53ea   

                                             article  \
0  Ever noticed how plane seats appear to be gett...   
1  A drunk teenage boy had to be rescued by secur...   
2  Dougie Freedman is on the verge of agreeing a ...   
3  Liverpool target Neto is also wanted by PSG an...   
4  Bruce Jenner will break his silence in a two-h...   

                                          highlights  
0  Experts question if  packed out planes are put...  
1  Drunk teenage boy climbed into lion enclosure ...  
2  Nottingham Forest are close to extending Dougi...  
3  Fiorentina goalkeeper Neto has been linked wit...  
4  Tell-all interview with the reality TV star, 6...  

Column Info:
<class 'pandas.core

In [9]:
# Example: take the first article from the dataset (by position, not by index label)
text = df['article'].iloc[0]
print("Original Article:\n", text)

Original Article:
 Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable - it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans. 'In a world where animals have more rights to space and food than humans,' said Charlie Leocha, consumer representative on the committee. 'It is time that the DOT and FAA take a stand for humane treatment of passengers.' But could crowding on planes lead to more serious issues than 

In [11]:
# STEP 2: Preprocess text
import nltk
import re

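# Download the punkt sentence tokenizer data if it is not already present
# (newer NLTK releases may ask for the 'punkt_tab' resource instead)
nltk.download('punkt')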

# Split the article into sentences
sentences = nltk.sent_tokenize(text)

# Clean sentences: replace every non-letter character (digits, punctuation)
# with a space, then lowercase
clean_sentences = [re.sub(r'[^a-zA-Z]', ' ', s).lower() for s in sentences]

print("\nTotal Sentences:", len(clean_sentences))
print("\nSample Sentence:", clean_sentences[0])


Total Sentences: 16

Sample Sentence: ever noticed how plane seats appear to be getting smaller and smaller 


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
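
To make the cleaning step concrete, here is a minimal sketch of the same regex applied to a single string. The helper name clean_sentence and the extra whitespace collapsing are assumptions added for illustration; the cell above leaves the runs of spaces in place.

```python
import re

def clean_sentence(sentence: str) -> str:
    """Keep ASCII letters only, lowercase, and single-space the result."""
    # Same pattern as the cell above: every non-letter becomes a space
    letters_only = re.sub(r'[^a-zA-Z]', ' ', sentence)
    # Extra step (not in the notebook): collapse runs of spaces
    return ' '.join(letters_only.lower().split())

print(clean_sentence("It's a 31-inch pitch."))
```

Note that digits are dropped and contractions are split ("it's" becomes "it s"), which is worth keeping in mind for an article whose key facts are seat measurements.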


In [13]:
# STEP 3: Sentence Vectors using TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
sentence_vectors = vectorizer.fit_transform(clean_sentences)

print("\nSentence Vectors Shape:", sentence_vectors.shape)


Sentence Vectors Shape: (16, 177)
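
The shape (16, 177) means one row per sentence and one column per distinct term. As a rough illustration of what each entry holds, the sketch below computes TF-IDF by hand with smoothed IDF and L2 row normalisation, which is my understanding of the TfidfVectorizer defaults. The function name tfidf_vectors, the toy sentences, and the whitespace tokenisation are assumptions; the real vectorizer uses its own token pattern.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalised rows."""
    tokenized = [doc.split() for doc in docs]  # simplification: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of sentences that contain each term
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard empty sentences
        rows.append([v / norm for v in row])
    return vocab, rows

toy = [
    "plane seats are shrinking",
    "plane tests use bigger seats",
    "animals have more space",
]
vocab, rows = tfidf_vectors(toy)
print(len(rows), "sentences x", len(vocab), "terms")
```

Terms that appear in several sentences ("plane", "seats") receive a lower weight than terms unique to one sentence ("shrinking").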


In [15]:
# STEP 4: Build the Similarity Graph
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

# Compute similarity matrix
similarity_matrix = cosine_similarity(sentence_vectors)

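# Create graph with one node per sentence, so a sentence with no
# similarity to any other still gets a PageRank score in the next step
similarity_graph = nx.Graph()
similarity_graph.add_nodes_from(range(len(sentences)))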

# Add edges between sentences based on similarity
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if similarity_matrix[i][j] > 0:
            similarity_graph.add_edge(i, j, weight=similarity_matrix[i][j])

print("\nGraph created with", len(similarity_graph.nodes()), "nodes and", len(similarity_graph.edges()), "edges.")


Graph created with 16 nodes and 114 edges.
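
With 16 sentences there are at most 16 * 15 / 2 = 120 possible pairs, so 114 edges means nearly every pair shares at least one term. The sketch below shows the edge rule in isolation, using plain Python lists instead of the sparse TF-IDF matrix. The helper names cosine and similarity_edges and the toy vectors are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_edges(vectors):
    """Return {(i, j): weight} for every pair i < j with positive similarity."""
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            weight = cosine(vectors[i], vectors[j])
            if weight > 0:
                edges[(i, j)] = weight
    return edges

# Toy vectors: 0 and 1 share the first term, 2 shares nothing with either
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
edges = similarity_edges(vectors)
print(edges)
```

In this toy case vector 2 ends up with no edge at all, which is the situation where a sentence would be missing from a graph built from edges alone.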


In [17]:
# STEP 5: Run PageRank
scores = nx.pagerank(similarity_graph)

# Rank sentences based on scores
ranked_sentences = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)

print("\nTop 3 Sentences (Preview):")
for _, sentence in ranked_sentences[:3]:  # slicing is safe for articles with fewer than 3 sentences
    print("-", sentence)


Top 3 Sentences (Preview):
- But these tests are conducted using planes with 31 inches between each row of seats, a standard which on some airlines has decreased, reported the Detroit News.
- Tests conducted by the FAA use planes with a 31 inch pitch, a standard which on some airlines has decreased .
- While United Airlines has 30 inches of space, Gulf Air economy seats have between 29 and 32 inches, Air Asia offers 29 inches and Spirit Airlines offers just 28 inches.
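
For intuition about what the PageRank call does with the edge weights, here is a minimal power-iteration sketch in plain Python. It assumes a damping factor of 0.85 (the networkx default for alpha, as far as I know) and that every node has at least one edge; dangling nodes are not handled. The function name weighted_pagerank and the toy weights are hypothetical, and the result should only approximate what the library computes.

```python
def weighted_pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """weights: {(i, j): w} for an undirected graph. Returns {node: score}."""
    nodes = sorted({node for edge in weights for node in edge})
    # Build a symmetric adjacency map from the undirected edge list
    adjacency = {node: {} for node in nodes}
    for (i, j), w in weights.items():
        adjacency[i][j] = w
        adjacency[j][i] = w
    # Total edge weight per node, used to normalise outgoing votes
    strength = {node: sum(adjacency[node].values()) for node in nodes}
    count = len(nodes)
    scores = {node: 1.0 / count for node in nodes}
    for _ in range(max_iter):
        updated = {}
        for node in nodes:
            # Each neighbour passes on a share of its score proportional to the edge weight
            incoming = sum(
                scores[nb] * w / strength[nb] for nb, w in adjacency[node].items()
            )
            updated[node] = (1 - damping) / count + damping * incoming
        change = sum(abs(updated[node] - scores[node]) for node in nodes)
        scores = updated
        if change < tol:
            break
    return scores

toy_weights = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.1, (2, 3): 0.2}
toy_scores = weighted_pagerank(toy_weights)
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
print(ranking)
```

Sentences that are strongly similar to many others accumulate the highest scores, which is why the top three sentences above all repeat the same seat-pitch vocabulary.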


In [19]:
# STEP 6: Extract Summary
num_sentences = 5  # number of sentences to include in summary

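# Pick the indices of the top-scoring sentences, then restore document order.
# Indices are used rather than sentences.index(), which returns the first
# match and would misplace a sentence that occurs twice in the article.
top_indices = sorted(sorted(scores, key=scores.get, reverse=True)[:num_sentences])

# Combine them into a summary
summary = ' '.join(sentences[i] for i in top_indices)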

print("\n--- FINAL SUMMARY ---\n")
print(summary)

# Optional: compare with the reference highlights for the same article
print("\n--- REFERENCE SUMMARY ---\n")
print(df['highlights'].iloc[0])


--- FINAL SUMMARY ---

Tests conducted by the FAA use planes with a 31 inch pitch, a standard which on some airlines has decreased . Many economy seats on United Airlines have 30 inches of room, while some airlines offer as little as 28 inches . But these tests are conducted using planes with 31 inches between each row of seats, a standard which on some airlines has decreased, reported the Detroit News. While United Airlines has 30 inches of space, Gulf Air economy seats have between 29 and 32 inches, Air Asia offers 29 inches and Spirit Airlines offers just 28 inches. British Airways has a seat pitch of 31 inches, while easyJet has 29 inches, Thomson's short haul seat pitch is 28 inches, and Virgin Atlantic's is 30-31.

--- REFERENCE SUMMARY ---

Experts question if  packed out planes are putting passengers at risk .
U.S consumer advisory group says minimum space must be stipulated .
Safety tests conducted on planes with more leg room than airlines offer .
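
The comparison above is by eye only. One rough way to put a number on it is clipped unigram overlap, in the spirit of ROUGE-1. The sketch below is not the official ROUGE implementation; the function name unigram_overlap, the tokenisation regex, and the example strings are assumptions for illustration.

```python
import re
from collections import Counter

def unigram_overlap(candidate: str, reference: str):
    """Return (precision, recall) of clipped unigram overlap between two texts."""
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall

precision, recall = unigram_overlap(
    "tests use planes with more room",
    "safety tests conducted on planes with more leg room",
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In the notebook, the call would presumably be unigram_overlap(summary, df['highlights'][0]). A summary that repeats near-identical sentences, as the one above does, spends its length on one highlight and is likely to score low recall on the others.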
