# City Health Score (CHS) - A Real-Time Measure of Urban Well-Being & Outlook

**Definition:**
The City Health Score quantifies a city's well-being and outlook by combining signals derived from social media and open data: emotional sentiment, social vitality, economic optimism, environmental comfort, and public optimism.

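**Formula:**

$$\mathrm{CHS} = \sum_i w_i \, T_i$$

where $T_i$ is the 0–100 normalised score of topic $i$ and $w_i$ is its weight, with $\sum_i w_i = 1$.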

# Social Media Post Classification

In [4]:
from transformers import pipeline
classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')


Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [5]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing', 'future','optimism','hope','sadness']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['travel',
  'future',
  'hope',
  'optimism',
  'sadness',
  'cooking',
  'dancing'],
 'scores': [0.4268963634967804,
  0.2881767153739929,
  0.18121927976608276,
  0.08956682682037354,
  0.005412669852375984,
  0.00461979303508997,
  0.004108374007046223]}

# Positive / Negative Sentiment Analysis

In [1]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment-latest")
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment-latest")



In [2]:
from transformers import pipeline
sentiment_task = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
sentiment_task("Covid cases are increasing fast!")

Device set to use cuda:0


[{'label': 'negative', 'score': 0.7235766649246216}]

In [3]:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax
# Note: TFAutoModelForSequenceClassification is only needed for the commented-out TF variant below.
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
#model.save_pretrained(MODEL)
text = "Covid cases are increasing fast!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)
# text = "Covid cases are increasing fast!"
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)
# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    label = config.id2label[int(ranking[i])]
    score = scores[ranking[i]]
    print(f"{i+1}) {label} {np.round(float(score), 4)}")




1) negative 0.7236
2) neutral 0.2287
3) positive 0.0477
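
The `preprocess` helper in the cell above masks user mentions and links before tokenisation. As a quick illustration, the sketch below repeats the same logic on a made-up post; the sample text and the variable names `sample` and `masked` are assumptions added here, not part of the original notebook.

```python
def preprocess(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http' (same logic as the cell above)."""
    new_text = []
    for t in text.split(" "):
        # A bare "@" is left alone; "@name" becomes the generic "@user" token.
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        # Any token that starts with "http" collapses to the placeholder "http".
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


sample = "@alice check https://example.com now"  # hypothetical post
masked = preprocess(sample)
print(masked)  # @user check http now
```

Because the split is on single spaces only, a mention followed directly by punctuation (for example `@alice,`) is still replaced as a whole token.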


## What this code does

- Defines thematic keyword sets (OECD_CATEGORIES) for the eleven city well-being topics: access to services, civic engagement, education, jobs, community, environment, income, health, safety, housing, and life satisfaction.

- Loads two Transformer pipelines:
    - A zero-shot classifier (roberta-large-mnli) used to score how well an arbitrary text matches each keyword label.
    - A sentiment classifier (cardiffnlp/twitter-roberta-base-sentiment-latest) used to detect positive/negative/neutral sentiment.

- score_category(text, labels, topk_mean=None):
    - Runs the zero-shot pipeline with multi_label=True on a set of labels (keywords).
    - Returns the best label, a sorted list of (label, score) and an aggregated category score (either the top score or the mean of the top-k scores when topk_mean is set).

- classify_citypulse_topic(text, topk_mean=None, threshold=0.25):
    - Applies score_category to every category in OECD_CATEGORIES.
    - Chooses the category with the highest aggregated score.
    - If the best category score is below the threshold, returns "none".
    - Returns: predicted_category, category_score, top_keyword, all_category_scores (per category), and top-3 contributing keywords for explanation.

- classify_with_sentiment(text, **kwargs):
    - Calls classify_citypulse_topic to get topical classification.
    - Calls the sentiment pipeline on the same text and attaches the sentiment result to the topic output.

- Example usage:
    - Classifies the sample text "The rent prices are skyrocketing..." and returns a dict containing category prediction (e.g., "housing"), numeric confidence scores per category, the top contributing keyword(s), an explanation (top keywords and their scores), and the sentiment label+confidence.

- Notes and behavior:
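    - Uses zero-shot multi-label scoring, so each keyword is scored independently (a softmax over entailment vs. contradiction for that label alone, which behaves like a sigmoid). Scores are therefore not mutually exclusive and do not sum to 1.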
    - Device selection is automatic (device_map="auto") so models may run on available GPUs.
    - The output is designed for downstream CHS (City Health Score) aggregation where topical relevance and sentiment can be combined into sub-scores.
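
The difference between single-label and multi-label zero-shot scoring noted above can be sketched without loading a model. The logits below are invented for illustration (they are not outputs of roberta-large-mnli), and the local `softmax` helper and variable names are assumptions of this sketch.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Hypothetical NLI logits for three candidate labels: [contradiction, entailment]
logits = np.array([
    [-2.0, 3.0],   # label A: strongly entailed
    [-1.5, 2.5],   # label B: also entailed
    [2.0, -2.0],   # label C: contradicted
])

# Single-label mode: softmax over the entailment logits across labels,
# so the labels compete and the scores sum to 1.
single_label = softmax(logits[:, 1])

# Multi-label mode: softmax over (contradiction, entailment) for each label
# separately, keeping the entailment probability. Labels do not compete.
multi_label = softmax(logits, axis=1)[:, 1]

print("single-label:", np.round(single_label, 3), "sum =", round(float(single_label.sum()), 3))
print("multi-label: ", np.round(multi_label, 3), "sum =", round(float(multi_label.sum()), 3))
```

In multi-label mode two related keywords can both score above 0.9, which is why several categories can score highly for the same post.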

In [5]:
from transformers import pipeline
import numpy as np

# ==== OECD WELL-BEING TOPICS (exact names) ===================================

access_to_services_keywords = [
    "public services", "access", "availability", "waiting time", "appointment",
    "queue", "digital services", "online portal", "transport access", "bus route",
    "train frequency", "healthcare access", "school places", "childcare places",
    "broadband", "mobile signal", "coverage", "service outage", "customer service"
]

civic_engagement_keywords = [
    "vote", "voter turnout", "election", "referendum", "petition",
    "public consultation", "civic participation", "community meeting",
    "local council", "town hall", "governance", "transparency", "corruption",
    "public policy", "accountability", "trust in government", "civic duty"
]

education_keywords = [
    "education", "school", "teacher", "university", "college",
    "exam results", "grades", "literacy", "numeracy", "STEM", "curriculum",
    "training", "course", "apprenticeship", "skills", "lifelong learning",
    "tuition fees", "scholarship", "school places"
]

jobs_keywords = [
    "job", "employment", "unemployment", "hiring", "recruiting",
    "job market", "vacancy", "layoff", "redundancy", "promotion",
    "career", "job security", "workforce", "labour demand", "gig economy",
    "payroll", "underemployment"
]

community_keywords = [
    "community", "neighbourhood", "neighbors", "social support",
    "volunteering", "mutual aid", "local club", "association", "festival",
    "street party", "togetherness", "belonging", "isolation", "loneliness",
    "community centre", "food bank"
]

environment_keywords = [
    "air quality", "pollution", "smog", "PM2.5", "green space", "park",
    "recycling", "waste", "sustainability", "climate", "heatwave", "storm",
    "flood", "drought", "biodiversity", "water quality", "noise pollution",
    "emissions", "low-emission zone", "tree planting"
]

income_keywords = [
    "income", "salary", "wage", "earnings", "pay rise", "bonus",
    "purchasing power", "poverty", "low income", "cost of living",
    "affordability", "inequality", "wealth", "savings", "net worth",
    "investment", "financial security", "disposable income"
]

health_keywords = [
    "health", "healthcare", "hospital", "clinic", "GP", "doctor", "nurse",
    "waiting list", "appointment", "A&E", "emergency department",
    "mental health", "wellbeing", "life expectancy", "vaccination",
    "public health", "disease", "screening", "preventive care", "fitness"
]

safety_keywords = [
    "safety", "crime", "burglary", "assault", "robbery", "knife crime",
    "violent crime", "police", "emergency", "911", "999",
    "traffic accident", "dangerous", "safe streets", "CCTV", "fear of crime"
]

housing_keywords = [
    "housing", "rent", "mortgage", "home ownership", "housing cost",
    "affordable housing", "social housing", "council house", "flat search",
    "property price", "eviction", "housing shortage", "tenants", "landlord",
    "apartment market", "overcrowding"
]

life_satisfaction_keywords = [
    "life satisfaction", "satisfied with life", "happiness", "happy",
    "wellbeing", "quality of life", "content", "fulfilled", "optimistic",
    "hopeful", "thriving", "life is good", "overall satisfaction"
]

# Mapping from category name to keyword list; classify_citypulse_topic iterates over it
OECD_CATEGORIES = {
    "access_to_services": access_to_services_keywords,
    "civic_engagement": civic_engagement_keywords,
    "education": education_keywords,
    "jobs": jobs_keywords,
    "community": community_keywords,
    "environment": environment_keywords,
    "income": income_keywords,
    "health": health_keywords,
    "safety": safety_keywords,
    "housing": housing_keywords,
    "life_satisfaction": life_satisfaction_keywords,
}


# 2) Models -------------------------------------------------------------------
# Zero-shot for relevance 
zshot = pipeline("zero-shot-classification", model="roberta-large-mnli", device_map="auto")

# Sentiment (optional, for CHS sub-scores)
sentiment_clf = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest", device_map="auto")


In [6]:
# 3) Category scoring helpers -------------------------------------------------
def score_category(text: str, labels, topk_mean: int | None = None):
    """
    Returns: dict(score=float, best_label=str, label_scores=list[(label,score)])
    Uses multi_label=True so scores are independent sigmoids per label.
    """
    out = zshot(text, candidate_labels=labels, multi_label=True)
    label_scores = list(zip(out["labels"], out["scores"]))
    # sort high -> low
    label_scores.sort(key=lambda x: x[1], reverse=True)

    if topk_mean and topk_mean > 1:
        k = min(topk_mean, len(label_scores))
        agg = float(np.mean([s for _, s in label_scores[:k]]))
    else:
        agg = float(label_scores[0][1])  # max

    return {
        "score": agg,
        "best_label": label_scores[0][0],
        "label_scores": label_scores
    }

def classify_citypulse_topic(text: str, topk_mean: int | None = None, threshold: float = 0.25):
    """
    Scores all categories and returns the best category if it clears threshold.
    """
    cat_results = {}
    for cat, labels in OECD_CATEGORIES.items():
        r = score_category(text, labels, topk_mean=topk_mean)
        cat_results[cat] = r

    # pick best category
    best_cat = max(cat_results.items(), key=lambda kv: kv[1]["score"])
    cat_name, cat_info = best_cat

    is_confident = cat_info["score"] >= threshold
    return {
        "predicted_category": cat_name if is_confident else "none",
        "category_score": cat_info["score"],
        "top_keyword": cat_info["best_label"],
        "all_category_scores": {k: v["score"] for k, v in cat_results.items()},
        "explanation_top3_keywords": cat_info["label_scores"][:3]
    }

def classify_with_sentiment(text: str, **kwargs):
    topic = classify_citypulse_topic(text, **kwargs)
    sent = sentiment_clf(text)[0]  # {'label': 'positive'|'negative'|'neutral', 'score': p}
    topic["sentiment"] = sent
    return topic

# 4) Example ------------------------------------------------------------------
sequence_to_classify = "The rent prices are skyrocketing, making it hard for tenants to find affordable housing."
res = classify_with_sentiment(sequence_to_classify, topk_mean=None, threshold=0.25)

# Example output (values rounded from the run printed below):
# {
#   'predicted_category': 'housing',
#   'category_score': 0.998,
#   'top_keyword': 'housing cost',
#   'all_category_scores': {'housing': 0.998, 'income': 0.983, ...},
#   'explanation_top3_keywords': [('housing cost', 0.998), ('housing', 0.997), ('rent', 0.994)],
#   'sentiment': {'label': 'negative', 'score': 0.871}
# }


In [7]:
print(res)

{'predicted_category': 'housing', 'category_score': 0.9977219104766846, 'top_keyword': 'housing cost', 'all_category_scores': {'access_to_services': 0.8177061080932617, 'civic_engagement': 0.78459233045578, 'education': 0.7282626032829285, 'jobs': 0.6385324597358704, 'community': 0.7214561700820923, 'environment': 0.6564213037490845, 'income': 0.9834838509559631, 'health': 0.7077276706695557, 'safety': 0.9443448185920715, 'housing': 0.9977219104766846, 'life_satisfaction': 0.25235071778297424}, 'explanation_top3_keywords': [('housing cost', 0.9977219104766846), ('housing', 0.9969483017921448), ('rent', 0.9943724274635315)], 'sentiment': {'label': 'negative', 'score': 0.8707168698310852}}
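
To make the effect of `topk_mean` concrete, the sketch below re-implements only the aggregation step of `score_category` and applies it to the three housing keyword scores printed above. The standalone function name `aggregate` is an assumption of this sketch; the notebook performs this step inline.

```python
import numpy as np


def aggregate(label_scores, topk_mean=None):
    """Max score by default; mean of the k highest scores when topk_mean > 1."""
    ranked = sorted(label_scores, key=lambda x: x[1], reverse=True)
    if topk_mean and topk_mean > 1:
        k = min(topk_mean, len(ranked))
        return float(np.mean([s for _, s in ranked[:k]]))
    return float(ranked[0][1])


# Top-3 housing keywords copied from the printed result above
housing_top3 = [
    ("housing cost", 0.9977219104766846),
    ("housing", 0.9969483017921448),
    ("rent", 0.9943724274635315),
]

max_score = aggregate(housing_top3)                # single best keyword
top3_mean = aggregate(housing_top3, topk_mean=3)   # average of the three best

print(round(max_score, 4), round(top3_mean, 4))
```

Averaging the top k keywords requires several keywords to agree, so it is less sensitive to one spuriously high keyword than the plain maximum.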


## Testing the pipeline

In [8]:
test_posts = [
    # 1) ACCESS TO SERVICES
    {"text": "Finally booked a GP appointment in two days—much better access to services now.", "expected_category": "access_to_services", "expected_sentiment": "Positive"},
    {"text": "The online portal is down again and bus routes were cut—no access to services!", "expected_category": "access_to_services", "expected_sentiment": "Negative"},

    # 2) CIVIC ENGAGEMENT
    {"text": "Huge voter turnout and a transparent town hall meeting tonight.", "expected_category": "civic_engagement", "expected_sentiment": "Positive"},
    {"text": "Corruption allegations make people distrust the local council.", "expected_category": "civic_engagement", "expected_sentiment": "Negative"},

    # 3) EDUCATION
    {"text": "Schools improved exam results and new STEM courses launched.", "expected_category": "education", "expected_sentiment": "Positive"},
    {"text": "Teacher shortages and crowded classrooms are hurting learning.", "expected_category": "education", "expected_sentiment": "Negative"},

    # 4) JOBS
    {"text": "Startups are hiring again—lots of vacancies for junior roles.", "expected_category": "jobs", "expected_sentiment": "Positive"},
    {"text": "Another wave of layoffs hit the tech park.", "expected_category": "jobs", "expected_sentiment": "Negative"},

    # 5) COMMUNITY
    {"text": "The street festival brought the neighbourhood together—such belonging.", "expected_category": "community", "expected_sentiment": "Positive"},
    {"text": "Feeling isolated; no community centre or local clubs left.", "expected_category": "community", "expected_sentiment": "Negative"},

    # 6) ENVIRONMENT
    {"text": "Air quality improved after the low-emission zone started.", "expected_category": "environment", "expected_sentiment": "Positive"},
    {"text": "Heatwave and smog alerts again—pollution is terrible.", "expected_category": "environment", "expected_sentiment": "Negative"},

    # 7) INCOME
    {"text": "Got a pay rise, so our disposable income finally covers the bills.", "expected_category": "income", "expected_sentiment": "Positive"},
    {"text": "Cost of living is crushing low-income families; inequality is widening.", "expected_category": "income", "expected_sentiment": "Negative"},

    # 8) HEALTH
    {"text": "New community clinic cut waiting lists and boosted mental health support.", "expected_category": "health", "expected_sentiment": "Positive"},
    {"text": "Emergency departments are overwhelmed; public health services backlogged.", "expected_category": "health", "expected_sentiment": "Negative"},

    # 9) SAFETY
    {"text": "Police patrols increased and violent crime dropped downtown.", "expected_category": "safety", "expected_sentiment": "Positive"},
    {"text": "Late-night assaults near the station have people scared to walk.", "expected_category": "safety", "expected_sentiment": "Negative"},

    # 10) HOUSING
    {"text": "Affordable housing units opened—more council homes available.", "expected_category": "housing", "expected_sentiment": "Positive"},
    {"text": "Rent is skyrocketing and tenants face evictions.", "expected_category": "housing", "expected_sentiment": "Negative"},

    # 11) LIFE SATISFACTION
    {"text": "Feeling genuinely happy and satisfied with life here lately.", "expected_category": "life_satisfaction", "expected_sentiment": "Positive"},
    {"text": "My overall life satisfaction has dropped; I don’t feel fulfilled.", "expected_category": "life_satisfaction", "expected_sentiment": "Negative"},

    # AMBIGUOUS / MULTI / NONE
    {"text": "New tram line opens next week—can’t wait!", "expected_category": "access_to_services", "expected_sentiment": "Positive"},  # could also hit environment/community
    {"text": "The city feels buzzy today.", "expected_category": "life_satisfaction", "expected_sentiment": "Positive"},              # low-signal; tests thresholding
    {"text": "That latte was incredible.", "expected_category": "none", "expected_sentiment": "Positive"},                           # should map to 'none'
]


In [9]:
def run_tests(posts, classify_fn):
    rows = []
    for p in posts:
        out = classify_fn(p["text"], topk_mean=3, threshold=0.25)  # tweak as needed
        rows.append({
            "text": p["text"][:80] + ("..." if len(p["text"]) > 80 else ""),
            "pred_cat": out["predicted_category"],
            "pred_score": round(out["category_score"], 3),
            "top_kw": out["top_keyword"],
            "sent": out["sentiment"]["label"],
            "sent_p": round(out["sentiment"]["score"], 3),
            "expected_cat": p["expected_category"],
            "expected_sent": p["expected_sentiment"],
        })
    return rows

# Example:
results = run_tests(test_posts, classify_with_sentiment)
for r in results:
    print(r)


{'text': 'Finally booked a GP appointment in two days—much better access to services now.', 'pred_cat': 'access_to_services', 'pred_score': 0.983, 'top_kw': 'healthcare access', 'sent': 'positive', 'sent_p': 0.947, 'expected_cat': 'access_to_services', 'expected_sent': 'Positive'}
{'text': 'The online portal is down again and bus routes were cut—no access to services!', 'pred_cat': 'access_to_services', 'pred_score': 0.985, 'top_kw': 'service outage', 'sent': 'negative', 'sent_p': 0.917, 'expected_cat': 'access_to_services', 'expected_sent': 'Negative'}
{'text': 'Huge voter turnout and a transparent town hall meeting tonight.', 'pred_cat': 'civic_engagement', 'pred_score': 0.99, 'top_kw': 'civic participation', 'sent': 'positive', 'sent_p': 0.915, 'expected_cat': 'civic_engagement', 'expected_sent': 'Positive'}
{'text': 'Corruption allegations make people distrust the local council.', 'pred_cat': 'civic_engagement', 'pred_score': 0.909, 'top_kw': 'local council', 'sent': 'negative', ...
[output truncated]
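
The printed rows show that the model returns lowercase sentiment labels (`'positive'`) while `test_posts` uses capitalised ones (`'Positive'`), so a direct string comparison would report every sentiment as a miss. A minimal sketch of a case-insensitive summary follows; the helper name `summarize` is an assumption added here, and the two rows are copied from the first two lines of output above.

```python
def summarize(rows):
    """Return category and sentiment accuracy for rows produced by run_tests."""
    n = len(rows)
    if n == 0:
        return {"n": 0, "category_accuracy": 0.0, "sentiment_accuracy": 0.0}
    cat_hits = sum(r["pred_cat"] == r["expected_cat"] for r in rows)
    # Compare sentiment case-insensitively: model labels are lowercase.
    sent_hits = sum(r["sent"].lower() == r["expected_sent"].lower() for r in rows)
    return {
        "n": n,
        "category_accuracy": cat_hits / n,
        "sentiment_accuracy": sent_hits / n,
    }


# Two rows taken from the printed output above (only the fields needed here)
rows = [
    {"pred_cat": "access_to_services", "sent": "positive",
     "expected_cat": "access_to_services", "expected_sent": "Positive"},
    {"pred_cat": "access_to_services", "sent": "negative",
     "expected_cat": "access_to_services", "expected_sent": "Negative"},
]
summary = summarize(rows)
print(summary)
```

With the full notebook state available, `summarize(results)` would give the same statistics over all test posts.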

## Computing the City Health Score (CHS) as a weighted sum of normalised topic scores

In [10]:
CHS_WEIGHTS = {
    "access_to_services": 0.09,
    "civic_engagement":   0.08,
    "education":          0.09,
    "jobs":               0.10,
    "community":          0.09,
    "environment":        0.09,
    "income":             0.10,
    "health":             0.12,  # slightly higher
    "safety":             0.10,  # slightly higher
    "housing":            0.08,
    "life_satisfaction":  0.06
}
assert abs(sum(CHS_WEIGHTS.values()) - 1.0) < 1e-9

# CHS = sum_i w_i * T_i   where T_i are the 0–100 normalised topic scores per city-day
def compute_CHS(topic_scores_0_to_100: dict[str, float]) -> float:
    return sum(CHS_WEIGHTS[k] * topic_scores_0_to_100.get(k, 0.0) for k in CHS_WEIGHTS)


In [None]:
from collections import defaultdict

# --- weights from earlier (edit if you like) ---
CHS_WEIGHTS = {
    "access_to_services": 0.09,
    "civic_engagement":   0.08,
    "education":          0.09,
    "jobs":               0.10,
    "community":          0.09,
    "environment":        0.09,
    "income":             0.10,
    "health":             0.12,
    "safety":             0.10,
    "housing":            0.08,
    "life_satisfaction":  0.06,
}
assert abs(sum(CHS_WEIGHTS.values()) - 1.0) < 1e-9

OECD_TOPICS = set(CHS_WEIGHTS.keys())

def _sentiment_to_sign(sent_label: str) -> int:
    """Return +1 for Positive, 0 for Neutral, -1 for Negative."""
    s = (sent_label or "").strip().lower()
    if s.startswith("pos"):   return +1
    if s.startswith("neu"):   return 0
    if s.startswith("neg"):   return -1
    # default to neutral if unknown
    return 0

def compute_topic_signed_scores(texts, classify_fn, topk_mean=3, threshold=0.28):
    """
    Returns:
      topic_scores_raw: dict[topic] -> mean signed score in [-1, 1]
      details: per-post rows for debugging
    """
    per_topic_values = defaultdict(list)
    details = []

    for t in texts:
        out = classify_fn(t, topk_mean=topk_mean, threshold=threshold)
        pred_cat   = out.get("predicted_category", "none")
        pred_score = float(out.get("category_score", 0.0))
        sent_label = out.get("sentiment", {}).get("label", "Neutral")
        sign = _sentiment_to_sign(sent_label)

        # Only include valid OECD topics and confident predictions
        if pred_cat in OECD_TOPICS and pred_score > 0:
            signed = sign * pred_score  # in [-1, 1]
            per_topic_values[pred_cat].append(signed)
        else:
            # ignore 'none' or below-threshold/zero scores
            pass

        details.append({
            "text": t,
            "predicted_category": pred_cat,
            "category_score": pred_score,
            "sentiment": sent_label,
            "signed_score": sign * pred_score if pred_cat in OECD_TOPICS else 0.0
        })

    # mean signed score per topic (fallback to 0 when no posts)
    topic_scores_raw = {
        topic: (sum(vals) / len(vals)) if len(vals) > 0 else 0.0
        for topic, vals in per_topic_values.items()
    }
    # ensure all topics present
    for topic in OECD_TOPICS:
        topic_scores_raw.setdefault(topic, 0.0)

    return topic_scores_raw, details

def normalize_topic_scores_0_100(topic_scores_raw):
    """
    Map mean signed scores from [-1, 1] -> [0, 100].
      -1 -> 0, 0 -> 50, +1 -> 100
    """
    topic_scores_0_100 = {
        k: ((v + 1.0) / 2.0) * 100.0  # linear mapping
        for k, v in topic_scores_raw.items()
    }
    return topic_scores_0_100

def compute_CHS(topic_scores_0_to_100: dict[str, float]) -> float:
    """Weighted sum of topic scores (0–100) -> CHS (0–100)."""
    return sum(CHS_WEIGHTS[k] * topic_scores_0_to_100.get(k, 0.0) for k in CHS_WEIGHTS)
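
As a worked example of the signed-score and normalisation steps defined above, the sketch below traces three made-up posts for a single topic. The post scores and labels are hypothetical, and `sentiment_to_sign` and `to_0_100` are simplified stand-ins for `_sentiment_to_sign` and `normalize_topic_scores_0_100`.

```python
def sentiment_to_sign(label: str) -> int:
    """+1 for positive, -1 for negative, 0 for neutral or unknown labels."""
    s = (label or "").strip().lower()
    if s.startswith("pos"):
        return 1
    if s.startswith("neg"):
        return -1
    return 0


def to_0_100(v: float) -> float:
    """Linear map from [-1, 1] to [0, 100]: -1 -> 0, 0 -> 50, +1 -> 100."""
    return (v + 1.0) / 2.0 * 100.0


# Hypothetical posts for one topic: (category_score, sentiment_label)
posts = [(0.9, "positive"), (0.8, "negative"), (0.7, "neutral")]

# Signed scores: +0.9, -0.8, and 0.0 (the neutral post contributes zero)
signed = [sentiment_to_sign(label) * score for score, label in posts]
mean_signed = sum(signed) / len(signed)
topic_score = to_0_100(mean_signed)

print(round(mean_signed, 4), round(topic_score, 2))
```

Neutral posts still count in the denominator, so they pull the topic mean toward 0 and the normalised score toward 50. A topic with no posts at all also ends up at 50.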


In [13]:
# -------------------------
# Example usage:
texts = [p["text"] for p in test_posts]
raw, rows = compute_topic_signed_scores(texts, classify_with_sentiment, topk_mean=3, threshold=0.28)
topic_0_100 = normalize_topic_scores_0_100(raw)
chs = compute_CHS(topic_0_100)
print("Per-topic (0-100):", {k: round(v,1) for k,v in topic_0_100.items()})
print("CHS:", round(chs, 1))


Per-topic (0-100): {'access_to_services': 66.3, 'civic_engagement': 52.0, 'education': 51.4, 'jobs': 50.7, 'community': 99.8, 'income': 35.6, 'environment': 47.4, 'health': 1.0, 'safety': 25.1, 'housing': 25.1, 'life_satisfaction': 79.8}
CHS: 46.1
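
As a sanity check, the reported CHS can be reproduced by hand from the rounded per-topic scores printed above and the weights in `CHS_WEIGHTS`. The sketch below is a standalone recomputation under that assumption; because the inputs are already rounded to one decimal, only the first decimal of the result is meaningful.

```python
# Weights copied from CHS_WEIGHTS in the notebook
weights = {
    "access_to_services": 0.09, "civic_engagement": 0.08, "education": 0.09,
    "jobs": 0.10, "community": 0.09, "environment": 0.09, "income": 0.10,
    "health": 0.12, "safety": 0.10, "housing": 0.08, "life_satisfaction": 0.06,
}

# Per-topic scores copied from the printed output above (rounded to 1 decimal)
reported = {
    "access_to_services": 66.3, "civic_engagement": 52.0, "education": 51.4,
    "jobs": 50.7, "community": 99.8, "income": 35.6, "environment": 47.4,
    "health": 1.0, "safety": 25.1, "housing": 25.1, "life_satisfaction": 79.8,
}

# Weighted sum: each topic contributes weight * score
contributions = {k: weights[k] * reported[k] for k in weights}
chs = sum(contributions.values())

print("CHS:", round(chs, 1))  # CHS: 46.1
```

The largest single contribution comes from `community` (0.09 × 99.8 ≈ 8.98), while `health` contributes almost nothing (0.12 × 1.0 = 0.12) despite carrying the highest weight.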
