Setup and Auth

In [1]:
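import os

from dotenv import load_dotenv
from huggingface_hub import login
from transformers import pipeline

# Load the Hugging Face token from .env and fail early if it is missing
load_dotenv()
hf_token = os.getenv("HF_API_KEY")
if not hf_token:
    raise RuntimeError("HF_API_KEY is not set; add it to your .env file")
login(token=hf_token)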

ModuleNotFoundError: No module named 'transformers'
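
The `ModuleNotFoundError` above means the kernel that ran this cell did not have `transformers` installed. A minimal sketch for checking the notebook's dependencies before importing them follows. The helper name `missing_modules` is hypothetical, and the module list is inferred from the imports used in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported in this notebook (the module `dotenv` is
# installed from the PyPI package `python-dotenv`)
required = ["transformers", "huggingface_hub", "dotenv", "pandas", "scipy"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Anything reported as missing can be installed into the kernel's environment, for example with `%pip install transformers` in a notebook cell.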

Load Analysis Data

In [2]:
import pandas as pd
import sqlite3
from scipy.stats import linregress

conn = sqlite3.connect("../climate_data.db")
df = pd.read_sql_query("SELECT * FROM avg_temperatures_cleaned", conn)
conn.close()

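# Choose one city as an example; sort by year so that iloc[0] and iloc[-1]
# are the first and last years of the series
city = "Rome"
df_city = df[df["City"] == city].sort_values("Year")

# Linear trend of the average maximum temperature over the years
slope, intercept, r, p, _ = linregress(df_city["Year"], df_city["AvgMaxTemp"])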

text = (
    f"From 2014 to 2023, {city} experienced a warming trend. "
    f"The average maximum temperature increased from {df_city['AvgMaxTemp'].iloc[0]:.1f}°C "
    f"to {df_city['AvgMaxTemp'].iloc[-1]:.1f}°C. "
    f"The trend slope was {slope:.3f} with a p-value of {p:.4f}, indicating "
    f"{'statistical significance' if p < 0.05 else 'no significant trend'}."
)
print(text)


From 2014 to 2023, Rome experienced a warming trend. The average maximum temperature increased from 20.4°C to 22.4°C. The trend slope was 0.238 with a p-value of 0.0017, indicating statistical significance.
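
The sentence above hard-codes "warming trend" regardless of the sign of the fitted slope. A sketch of deriving the wording from the regression result instead, assuming the `slope` and `p` values returned by `linregress` and that the slope is in °C per year (temperatures in °C regressed on calendar years). The helper `trend_phrase` is illustrative and not part of the original notebook.

```python
def trend_phrase(slope, p, alpha=0.05):
    """Describe a linear trend by its direction and statistical significance."""
    if slope > 0:
        direction = "a warming trend"
    elif slope < 0:
        direction = "a cooling trend"
    else:
        direction = "no change"
    significance = (
        "statistically significant" if p < alpha else "not statistically significant"
    )
    return f"{direction} ({slope:+.3f} °C per year, {significance}, p = {p:.4f})"


# Slope and p-value printed for Rome in the cell output above
print(trend_phrase(0.238, 0.0017))
# → a warming trend (+0.238 °C per year, statistically significant, p = 0.0017)
```

Keeping direction and significance separate avoids describing a non-significant or negative slope as warming.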


Summary

In [3]:
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

summary = summarizer(text, max_length=60, min_length=20, do_sample=False)

print("🔎 Summary:")
print(summary[0]["summary_text"])


Device set to use cpu
Your max_length is set to 60, but your input_length is only 54. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)


🔎 Summary:
 Rome experienced a warming trend from 2014 to 2023 . The average maximum temperature increased from 20.4°C to 22°C . The trend slope was 0.238 with a p-value of 0.0017, indicating statistical significance .
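
The warning above appears because `max_length=60` exceeds the 54-token input. One way to avoid it is to derive the length limits from the input's token count. The sketch below is only an assumption about how that could be done: `summary_lengths` and its default ratios are hypothetical and not part of `transformers`.

```python
def summary_lengths(input_length, max_ratio=0.5, min_ratio=0.25):
    """Pick (min_length, max_length) as fractions of the input token count."""
    max_length = max(2, int(input_length * max_ratio))
    min_length = max(1, min(int(input_length * min_ratio), max_length - 1))
    return min_length, max_length


# With the 54-token input reported in the warning above
min_len, max_len = summary_lengths(54)
print(min_len, max_len)  # → 13 27

# In the notebook, the token count could be taken from the pipeline's tokenizer:
#   n_tokens = len(summarizer.tokenizer(text)["input_ids"])
#   min_len, max_len = summary_lengths(n_tokens)
#   summary = summarizer(text, max_length=max_len, min_length=min_len, do_sample=False)
```

A tighter `max_length` forces a shorter summary, which may drop some of the figures from the input text.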


Summary for All Cities

In [4]:
for city in df["City"].unique():
    # Sort by year so that iloc[0] and iloc[-1] are the first and last years
    df_city = df[df["City"] == city].sort_values("Year")
    slope, intercept, r, p, _ = linregress(df_city["Year"], df_city["AvgMaxTemp"])

    text = (
        f"From 2014 to 2023, {city} experienced a warming trend. "
        f"The average max temp increased from {df_city['AvgMaxTemp'].iloc[0]:.1f}°C "
        f"to {df_city['AvgMaxTemp'].iloc[-1]:.1f}°C. "
        f"Slope = {slope:.3f}, p-value = {p:.4f}. "
        f"This trend is {'statistically significant' if p < 0.05 else 'not statistically significant'}."
    )

    summary = summarizer(text, max_length=60, min_length=20, do_sample=False)
    print(f"\n📍 {city} Summary:\n{summary[0]['summary_text']}")


Your max_length is set to 60, but your input_length is only 55. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)
Your max_length is set to 60, but your input_length is only 55. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)



📍 Zurich Summary:
 Zurich experienced a warming trend from 2014 to 2023 . The average max temp increased from 14.8°C to 15.8 °C . Slope = 0.115, p-value = 0 .2042. This trend is not statistically significant .


Your max_length is set to 60, but your input_length is only 54. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)



📍 Geneva Summary:
 From 2014 to 2023, Geneva experienced a warming trend . The average max temp increased from 15.2°C to 17.0°C . Slope = 0.180, p-value= 0.0698. This trend is not statistically significant .


Your max_length is set to 60, but your input_length is only 56. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)



📍 Bern Summary:
 Bern experienced a warming trend from 2014 to 2023 . The average max temp increased from 13.7°C to 15.6°C . This trend is statistically significant .


Your max_length is set to 60, but your input_length is only 54. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)



📍 Berlin Summary:
 From 2014 to 2023, Berlin experienced a warming trend . The average max temp increased from 14.7°C to 15.1°C . Slope = 0.064, p-value= 0.4631. This trend is not statistically significant .


Your max_length is set to 60, but your input_length is only 55. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)



📍 Rome Summary:
 From 2014 to 2023, Rome experienced a warming trend . The average max temp increased from 20.4°C to 22°C . This trend is statistically significant .

📍 London Summary:
 From 2014 to 2023, London experienced a warming trend . The average max temp increased from 14.9°C to 15.5°C . Slope = 0.120, p-value = 0 .0695. This trend is not statistically significant .
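
Several of the summaries above alter or drop figures from the input: for Rome, 22.4°C became 22°C, and the Bern and Rome summaries omit the slope and p-value. A small sketch of a check that flags numbers lost in summarization follows. The helper `missing_numbers` is hypothetical, and the example strings are copied from the single-city Rome output earlier in this notebook.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def missing_numbers(source, summary):
    """Return numbers that appear in the source text but not in the summary."""
    # Rejoin decimals that the model split apart, e.g. "0 .2042" -> "0.2042"
    normalized = re.sub(r"(\d)\s+\.(\d)", r"\1.\2", summary)
    found = set(NUMBER.findall(normalized))
    return [n for n in NUMBER.findall(source) if n not in found]


# Input text and summary for Rome, copied from the single-city cell output
source = (
    "From 2014 to 2023, Rome experienced a warming trend. The average maximum "
    "temperature increased from 20.4°C to 22.4°C. The trend slope was 0.238 with "
    "a p-value of 0.0017, indicating statistical significance."
)
summary = (
    "Rome experienced a warming trend from 2014 to 2023 . The average maximum "
    "temperature increased from 20.4°C to 22°C . The trend slope was 0.238 with "
    "a p-value of 0.0017, indicating statistical significance ."
)
print(missing_numbers(source, summary))  # → ['22.4']
```

Running such a check inside the loop would make it easy to spot summaries that should not be trusted for their numeric content.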
