<a href="https://colab.research.google.com/github/TurkuNLP/DIGHT25/blob/main/03_summaries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [10]:
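# DSPy routes calls through LiteLLM, so the model is named as "provider/model".
model_name = "openai/gpt-4.1-mini"
api_key = "..."  # replace with your own API key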

import json

import dspy
import pandas as pd

INPUT_JSON = "hmd_newspapers_texts.json"
OUTPUT_XLSX = "news_topics_summary.xlsx"

# --- DSPy signature (class-based), no module class ---
class TopicAndSummary(dspy.Signature):
    """Read a historical news snippet and return:
    - topics: up to 3 concise topics (1–3 words each), comma-separated
    - summary: a brief, neutral summary (1–2 sentences)
    Example topics: crime, politics, war, economy, agriculture, culture, religion, science, health, local news.
    """
    text = dspy.InputField()
    topics = dspy.OutputField(desc="Comma-separated topics (≤3)")
    summary = dspy.OutputField(desc="Brief summary (1–2 sentences)")
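# --- optional helper (hypothetical; not part of the original notebook) ---
# A minimal sketch, assuming the model follows the signature and returns the
# topics as one comma-separated string. The model may still return more than
# three items or stray blanks, so this trims, drops empties, and caps the list.
def parse_topics(raw, limit=3):
    """Split a comma-separated topics string into a list of at most `limit` items."""
    parts = [p.strip() for p in raw.split(",")]
    return [p for p in parts if p][:limit]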

# --- configure DSPy (adjust model / API as you use elsewhere) ---
lm = dspy.LM(model_name, api_key=api_key)
dspy.configure(lm=lm)
predict = dspy.Predict(signature=TopicAndSummary)

# --- load & cut down ---
with open(INPUT_JSON, encoding="utf-8") as f:
    texts = json.load(f)

texts = texts[:50]  # small hard-coded subset for the demo/class

# --- run extraction ---
rows = []
for i, t in enumerate(texts, 1):
    out = predict(text=t)
    topics = out.topics.strip()
    summary = out.summary.strip()

    # print to console (compact)
    print(f"--- {i} ---")
    preview = t[:200].replace("\n", " ")
    print("TEXT:", preview + ("..." if len(t) > 200 else ""))
    print("TOPICS:", topics)
    print("SUMMARY:", summary)
    print()

    rows.append({"text": t, "topics": topics, "summary": summary})

# --- write Excel (text, topics, summary) ---
df = pd.DataFrame(rows, columns=["text", "topics", "summary"])
# to_excel needs an Excel writer engine (openpyxl or xlsxwriter).
# If neither is installed, run `pip install openpyxl` first.
df.to_excel(OUTPUT_XLSX, index=False)

print(f"Wrote {len(df)} rows to {OUTPUT_XLSX}")

--- 1 ---
TEXT: TUESDAY, SEPTEMBER 18.—Wind N.N.W., light  ARRIVED.— Tippoo Saib, Cornforth, from Akyab Rimae' Fearon, Islay—Phoenix, King, Antigua—Daniel Webster, Putman, Boston—L. Pastorita, Bilboa—Rboda, Baunt, Gi...
TOPICS: shipping, maritime incidents, hurricane
SUMMARY: The report details the arrival and departure of various ships at multiple ports, including incidents such as a crew mutiny and a hurricane in Barbadoes that resulted in the loss of ten vessels. It also mentions the transport of large sums of money and shipwrecked passengers from a recent wreck near Vigo.

--- 2 ---
TEXT: HEALTH OF THE TROOPS.  The troops continue remarkably healthy. Their duties are comparatively light. Excepting the guards for the divisional staff establishments, the usual regimental guards, and thos...
TOPICS: health, military duties, troop movements
SUMMARY: The troops remain in good health due to light duties primarily involving fatigue work such as road making and camp maintenance, with few n