### Introduction

In this notebook, I clustered the news headlines for each town with K-means (k = 5) and, for each cluster, kept the headline closest to the cluster centroid. This gives five representative headlines per town.

In [1]:
import numpy as np
import pandas as pd
import ast
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

In [2]:
df = pd.read_csv('straitstimes_articles_processed.csv')
df['Title Processed'] = df['Title Processed'].apply(ast.literal_eval)
df['Article Processed'] = df['Article Processed'].apply(ast.literal_eval)
df

Unnamed: 0,Published Date,Published Date (Standardised),Title,Article,Links,Published Year,Category,Town Involved,Title Processed,Article Processed
0,"January 2, 2013 at 7:49 PM",2013-01-02,Teo Ho Pin: Contract with AIM benefits town co...,The PAP town councils sold the management soft...,https://www.straitstimes.com/singapore/teo-ho-...,2013,,Hougang,"[teo, ho, pin, contract, aim, benefit, town, c...","[pap, town, council, sell, management, softwar..."
1,"January 2, 2013 at 5:10 PM",2013-01-02,Man jailed for punching friend's boyfriend,A salesman was jailed on Wednesday for a month...,https://www.straitstimes.com/singapore/man-jai...,2013,,Punggol,"[man, jail, punch, friend, boyfriend]","[salesman, jail, wednesday, month, punch, 20, ..."
2,"January 2, 2013 at 11:09 AM",2013-01-02,"Man jailed, fined and banned over fatal accident",A truck driver who failed to help a severely i...,https://www.straitstimes.com/singapore/man-jai...,2013,,Ang mo kio,"[man, jail, fin, ban, fatal, accident]","[truck, driver, fail, help, severely, injured,..."
3,"January 3, 2013 at 5:39 PM",2013-01-03,Money mule jailed 4 1/2 years for laundering $...,"In April, US$2.7 million (S$3.3 million) was d...",https://www.straitstimes.com/singapore/money-m...,2013,,Punggol,"[money, mule, jail, 4, 1, 2, year, launder, 85...","[april, us, 2, 7, million, 3, 3, million, draw..."
4,"January 3, 2013 at 5:37 PM",2013-01-03,Two men found hiding in car boot arrested for ...,Two Indian nationals have been arrested for tr...,https://www.straitstimes.com/singapore/two-men...,2013,,Woodlands,"[two, men, find, hiding, car, boot, arrest, il...","[two, indian, national, arrest, try, slip, imm..."
...,...,...,...,...,...,...,...,...,...,...
5456,"October 4, 2023 at 7:16 PM",2023-10-04,Parliament debates on what constitutes basic n...,SINGAPORE - MPs debated what constitutes basic...,https://www.straitstimes.com/singapore/politic...,2023,politics,Sengkang,"[parliament, debate, constitute, basic, need, ...","[singapore, mp, debate, constitute, basic, nee..."
5457,"October 4, 2023 at 6:30 PM",2023-10-04,Workers’ Party’s Louis Chua seeks to enshrine ...,SINGAPORE - Enshrining flexible work arrangeme...,https://www.straitstimes.com/singapore/politic...,2023,politics,Sengkang,"[worker, party, louis, chua, seek, enshrine, f...","[singapore, enshrine, flexible, work, arrangem..."
5458,"October 25, 2023 at 11:05 AM",2023-10-25,$2.6b project to refresh Singapore’s oldest MR...,SINGAPORE – A $2.6 billion programme to renew ...,https://www.straitstimes.com/singapore/transpo...,2023,transport,Bishan,"[2, 6b, project, refresh, singapore, old, mrt,...","[singapore, 2, 6, billion, programme, renew, n..."
5459,"October 25, 2023 at 3:45 PM",2023-10-25,"Porsche, 2 Rolls-Royces among 4 cars seized fr...",SINGAPORE – Four luxury cars linked to Singapo...,https://www.straitstimes.com/singapore/courts-...,2023,courts-crime,Bukit timah,"[porsche, 2, rolls, royces, among, 4, car, sei...","[singapore, four, luxury, car, link, singapore..."


In [3]:
df['Title Processed'] = df['Title Processed'].apply(lambda x: ' '.join(x))
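Cells 2 and 3 are needed because a list column saved to CSV comes back as its string representation. `ast.literal_eval` turns it back into a list, and the tokens are then joined into one string per headline, which is the input `TfidfVectorizer` expects. The sketch below reproduces that round trip on two hypothetical, hand-written token lists. It is not taken from the dataset.

```python
import ast
import io

import pandas as pd

# Hypothetical toy data: two already-tokenised headlines stored as Python lists.
toy = pd.DataFrame({"Title Processed": [["man", "jail", "punch"], ["hdb", "win", "award"]]})

# Writing to CSV stores each list as its string repr, e.g. "['man', 'jail', 'punch']".
restored = pd.read_csv(io.StringIO(toy.to_csv(index=False)))
print(type(restored.loc[0, "Title Processed"]))  # <class 'str'>

# ast.literal_eval parses the string back into a list (it only evaluates Python literals).
restored["Title Processed"] = restored["Title Processed"].apply(ast.literal_eval)

# TfidfVectorizer expects one string per document, so join the tokens with spaces.
restored["Title Processed"] = restored["Title Processed"].apply(" ".join)
print(restored["Title Processed"].tolist())  # ['man jail punch', 'hdb win award']
```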

### Get Top 5 Representative Headlines from K-Means Clustering
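For reference, assuming scikit-learn's default `TfidfVectorizer` settings (raw term counts, `smooth_idf=True`, `norm='l2'`), each headline $d$ of a town with $n$ headlines is represented as follows, where $\mathrm{df}(t)$ is the number of that town's headlines containing term $t$. With `max_df=0.9`, terms with $\mathrm{df}(t) > 0.9n$ are dropped before weighting:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w_{d,t} = \mathrm{tf}(t, d)\,\mathrm{idf}(t), \qquad
\mathbf{x}_d = \frac{\mathbf{w}_d}{\lVert \mathbf{w}_d \rVert_2}
$$

K-means then finds centroids $\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_5$, and the representative headline for cluster $j$ is the one nearest its centroid in Euclidean distance. The search runs over all of the town's headlines, not only the members of cluster $j$:

$$
i^*_j = \arg\min_{i \in \{1, \dots, n\}} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert_2, \qquad j = 1, \dots, 5
$$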

In [4]:
town_representative_headlines = pd.DataFrame(columns=['Town', 'Representative Headlines'])
town_representative_headlines

Unnamed: 0,Town,Representative Headlines


In [5]:
rows = []
for town in df['Town Involved'].unique():
    # Filter df to just the town's headlines
    town_df = df[df['Town Involved'] == town][['Title', 'Title Processed']].reset_index(drop=True)

    # Get tf-idf matrix (terms appearing in more than 90% of the town's headlines are dropped)
    tfidf_vectorizer = TfidfVectorizer(max_df=0.9)
    tfidf_matrix = tfidf_vectorizer.fit_transform(town_df['Title Processed'])

    # Perform K-Means clustering
    k = 5
    kmeans = KMeans(n_clusters=k, n_init='auto', random_state=101)
    kmeans.fit(tfidf_matrix)

    # Get the headline closest (Euclidean distance) to each cluster centroid
    dense_matrix = tfidf_matrix.toarray()
    representative_headlines = []
    for center in kmeans.cluster_centers_:
        distances = np.linalg.norm(dense_matrix - center, axis=1)
        closest_idx = int(np.argmin(distances))
        representative_headlines.append(town_df.loc[closest_idx, 'Title'])

    rows.append({'Town': town, 'Representative Headlines': representative_headlines})

# Build the result once instead of calling pd.concat on a growing (initially empty) DataFrame in every iteration
town_representative_headlines = pd.DataFrame(rows, columns=['Town', 'Representative Headlines'])
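The selection step can be checked in isolation on a few hypothetical, hand-written headlines (not from the dataset). The sketch below runs the same pipeline with k = 2. It compares the loop used above with scikit-learn's `pairwise_distances_argmin`, which returns, for each row of its first argument, the index of the nearest row of its second argument. Both should pick headlines at the same distance from each centroid. Which headlines end up together depends on the clustering, so no expected output is shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances_argmin

# Hypothetical toy headlines, already lower-cased and lemmatised.
headlines = [
    "man jail punch friend",
    "man jail theft shop",
    "woman jail theft",
    "new covid case singapore",
    "covid vaccine senior singapore",
    "covid case import singapore rise",
]

k = 2
tfidf = TfidfVectorizer().fit_transform(headlines)  # sparse (n_docs, n_terms), rows L2-normalised
kmeans = KMeans(n_clusters=k, n_init=10, random_state=101).fit(tfidf)

# Loop form, as in the notebook: Euclidean distance from every headline to each centroid.
dense = tfidf.toarray()
loop_idx = [int(np.argmin(np.linalg.norm(dense - c, axis=1))) for c in kmeans.cluster_centers_]

# Vectorised equivalent: nearest headline (row of `dense`) for each centroid.
vec_idx = pairwise_distances_argmin(kmeans.cluster_centers_, dense)

for j, i in enumerate(vec_idx):
    print(f"cluster {j}: {headlines[i]}")
```

The vectorised call avoids the Python loop over centroids. With only five centroids per town, the difference is readability rather than speed.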

In [6]:
town_representative_headlines

Unnamed: 0,Town,Representative Headlines
0,Hougang,[Three get 20 months' jail for trying to sell ...
1,Punggol,"[44 new Covid-19 cases in Singapore, all impor..."
2,Ang mo kio,[Singapore rolls out Covid-19 vaccine for seni...
3,Woodlands,[Malaysian jailed for failing to stop at Woodl...
4,Yishun,[Estate agent jailed 2 years for molesting 13-...
5,Geylang,[2 men arrested for stealing from woman in Orc...
6,Toa payoh,[Three weeks' jail for 52-year-old ex-childcar...
7,Sembawang,[Police appeal for information on shop theft c...
8,Clementi,[Police arrest 55-year-old man for stealing tw...
9,Tampines,[Car flips upright in PIE accident near Tampin...


In [7]:
for _, row in town_representative_headlines.iterrows():
    print(f"{row['Town']}:")
    for n, headline in enumerate(row['Representative Headlines'], start=1):
        # Collapse runs of whitespace (e.g. scraped newlines) into single spaces
        headline = re.sub(r'\s+', ' ', headline)
        print(f"{n} - {headline}")
    print("\n")

Hougang:
1 - Three get 20 months' jail for trying to sell fake gold to Singapore businessman
2 - Man jailed seven months for committing indecent act with 11-year-old girl
3 - Top stories from The Straits Times on Monday, Feb 19
4 - Manager fined $5,000 for attempting to take upskirt photo of woman in lift
5 - PICTURES Teary farewell for young Singaporean man who died from dengue


Punggol:
1 - 44 new Covid-19 cases in Singapore, all imported, and no community case for 3rd day running
2 - HDB wins international award for public sector innovation in service
3 - Stabbing leaves woman's intestines hanging out; husband jailed 3 years
4 - VIDEO Punggol East by-election: Workers' Party wins Punggol East
5 - VIDEO, PICTURES Cat lover's moving video among top 20 entries this week for ST, Wherever You Are contest


Ang mo kio:
1 - Singapore rolls out Covid-19 vaccine for seniors: How to get your jab
2 - Hacker who called himself 'The Messiah' jailed 4 years and 8 months
3 - 5 things you need to 

Comment: While the representative headlines for Yishun do include a few crime-related incidents, so do those for the other towns. Hence, this does not conclusively show that Yishun is more chaotic than other towns.
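Five headlines per town are too few to compare towns on crime. A more direct check would be the share of each town's articles that are crime-related, reported together with how many articles that share is based on. The sketch below assumes that the `Category` value `courts-crime` (seen in the data above) marks crime articles. Because many older rows have no category, it only counts labelled rows. The rows are hypothetical toy data, not figures from this dataset.

```python
import pandas as pd

# Hypothetical toy rows mirroring the notebook's 'Town Involved' and 'Category' columns.
toy = pd.DataFrame({
    "Town Involved": ["Yishun", "Yishun", "Yishun", "Punggol", "Punggol",
                      "Hougang", "Hougang", "Hougang", "Hougang"],
    "Category": ["courts-crime", "courts-crime", "transport", "courts-crime", None,
                 "politics", "courts-crime", None, "transport"],
})

# Keep only rows with a category label; untagged (missing) rows cannot be counted either way.
labelled = toy.dropna(subset=["Category"])

# Share of each town's labelled articles tagged 'courts-crime', plus the number of labelled articles.
crime_share = (
    labelled.assign(is_crime=labelled["Category"].eq("courts-crime"))
    .groupby("Town Involved")["is_crime"]
    .agg(["mean", "size"])
    .rename(columns={"mean": "crime_share", "size": "n_labelled"})
    .sort_values("crime_share", ascending=False)
)
print(crime_share)
```

In this toy table, Punggol ranks highest on the strength of a single labelled article. This is why the count should be reported next to the share before concluding that one town is more crime-prone than another.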