# Week 6 – LLM Narrative Insights (Subte Turnstile Data)

This notebook validates the LLM-based narrative generation for the Urban Intelligence Lab,
using the cleaned Subte turnstile dataset and a local Ollama model (`llama3.2:3b`).


In [8]:
import sys
from pathlib import Path

# Add project root to Python path
PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

import pandas as pd
from llm.insights import summarize_dataset, generate_insights

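DATA_PATH = PROJECT_ROOT / "data/processed/subte_molinetes_ridership_clean.csv"
# low_memory=False parses the file in one pass, so each column gets a single
# inferred dtype instead of the mixed-type warning that chunked parsing can raise
df = pd.read_csv(DATA_PATH, low_memory=False)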

df.head()




Unnamed: 0,fecha;desde;hasta;linea;molinete;estacion;pax_pagos;pax_pases_pagos;pax_franq;pax_total,unnamed:_1,unnamed:_2,fecha,desde,hasta,linea,molinete,estacion,pax_pagos,pax_pases_pagos,pax_franq,pax_total
0,1/1/2024;07:45:00;08:00:00;LineaB;LineaB_Malab...,,,,,,,,,,,,
1,1/1/2024;07:45:00;08:00:00;LineaB;LineaB_Trona...,,,,,,,,,,,,
2,1/1/2024;07:45:00;08:00:00;LineaB;LineaB_Pelle...,,,,,,,,,,,,
3,1/1/2024;07:45:00;08:00:00;LineaA;LineaA_Flore...,,,,,,,,,,,,
4,1/1/2024;07:45:00;08:00:00;LineaB;LineaB_Dorre...,,,,,,,,,,,,


In [9]:
df.info()
df.columns.tolist()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11440437 entries, 0 to 11440436
Data columns (total 13 columns):
 #   Column                                                                                   Dtype  
---  ------                                                                                   -----  
 0   fecha;desde;hasta;linea;molinete;estacion;pax_pagos;pax_pases_pagos;pax_franq;pax_total  object 
 1   unnamed:_1                                                                               float64
 2   unnamed:_2                                                                               float64
 3   fecha                                                                                    object 
 4   desde                                                                                    object 
 5   hasta                                                                                    object 
 6   linea                                                           

['fecha;desde;hasta;linea;molinete;estacion;pax_pagos;pax_pases_pagos;pax_franq;pax_total',
 'unnamed:_1',
 'unnamed:_2',
 'fecha',
 'desde',
 'hasta',
 'linea',
 'molinete',
 'estacion',
 'pax_pagos',
 'pax_pases_pagos',
 'pax_franq',
 'pax_total']
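The `df.info()` output and the column list show that the processed CSV was not parsed uniformly: the first rows arrive as a single semicolon-joined string in the first column, with the named columns left empty, while the named columns `fecha` through `pax_total` are presumably filled for the remaining rows (the summary below does find stations and lines). A minimal sketch of one way to reconcile the two groups, assuming every semicolon-joined row carries the ten fields named in that column's header. `unpack_semicolon_rows`, `RAW_COL`, `FIELDS` and `NUMERIC` are hypothetical names, not part of the project's `llm.insights` module.

```python
import pandas as pd

# Name of the mis-parsed column, copied from the df.columns output above.
RAW_COL = "fecha;desde;hasta;linea;molinete;estacion;pax_pagos;pax_pases_pagos;pax_franq;pax_total"
FIELDS = RAW_COL.split(";")
NUMERIC = ["pax_pagos", "pax_pases_pagos", "pax_franq", "pax_total"]


def unpack_semicolon_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge semicolon-joined rows into the named columns.

    Rows whose RAW_COL is set are split on ";" into FIELDS; all other rows keep
    their existing values. Only FIELDS is kept, so RAW_COL and the empty
    spill-over columns (unnamed:_1, unnamed:_2) are dropped.
    """
    has_raw = df[RAW_COL].notna()
    pieces = [df.loc[~has_raw, FIELDS]]
    if has_raw.any():
        parts = df.loc[has_raw, RAW_COL].str.split(";", expand=True)
        # Pad (with NaN) or trim to exactly len(FIELDS) columns before renaming.
        parts = parts.reindex(columns=range(len(FIELDS)))
        parts.columns = FIELDS
        pieces.append(parts)
    # Stack both groups, then restore the original row order.
    merged = pd.concat(pieces).sort_index()
    for col in NUMERIC:
        merged[col] = pd.to_numeric(merged[col], errors="coerce")
    return merged
```

Calling `df_clean = unpack_semicolon_rows(df)` before `summarize_dataset` would let both groups contribute to the station and line figures; on 11 million rows the string split is memory-heavy, so fixing the delimiter in the upstream cleaning step is likely the better long-term option.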

In [10]:
summary = summarize_dataset(df)
summary


{'top_stations': {'Constitucion': 41266,
  'Federico Lacroze': 33537,
  'San Pedrito': 20628,
  'Rosas': 20620,
  'Retiro': 19500},
 'top_lines': {'LineaB': 224587, 'LineaA': 208074, 'LineaC': 118393}}
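`summarize_dataset` lives in the project's `llm.insights` module, whose code is not shown here, so it is not clear whether these figures count turnstile readings (rows) or sum passengers (`pax_total`). The distinction matters because the generated narrative describes 224,587 as "turnstile transactions". If they are row counts, the three lines together (224,587 + 208,074 + 118,393 = 551,054) cover only a small share of the 11,440,437 rows reported by `df.info()`. A hypothetical `top_n_summary` that returns the same dictionary shape makes the choice explicit through a `weight` parameter; it is an illustration, not the module's implementation.

```python
from typing import Optional

import pandas as pd


def top_n_summary(df: pd.DataFrame, n: int = 5, weight: Optional[str] = None) -> dict:
    """Hypothetical stand-in for summarize_dataset.

    weight=None        -> count rows per station/line (turnstile readings)
    weight="pax_total" -> sum passengers per station/line
    Rows with a missing station or line are ignored in both modes.
    """
    def top(col: str) -> dict:
        if weight is None:
            counts = df[col].value_counts()  # sorted descending, NaN excluded
        else:
            # groupby drops NaN keys by default
            counts = df.groupby(col)[weight].sum().sort_values(ascending=False)
        return counts.head(n).to_dict()

    return {"top_stations": top("estacion"), "top_lines": top("linea")}
```

Passing the unit along with the numbers (for example, naming the keys `top_stations_by_rows` or `top_stations_by_pax`) would also give the model less room to mislabel them.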

In [11]:
insights_text = generate_insights(df)
print(insights_text)


Exception in thread Thread-6 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\do_ch\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "e:\Proyectos\Proyectos GitHub\urban-intelligence-lab\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "C:\Users\do_ch\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\do_ch\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "C:\Users\do_ch\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in posit

Analyzing Buenos Aires Subte Turnstile Data: Uncovering Passenger Behavior

The turnstile data from the Buenos Aires Subte provides a wealth of information about passenger behavior, highlighting patterns and trends that can inform urban mobility strategies. Our analysis reveals that certain stations are hubs for public transportation, with Constitucion Station standing out as the busiest, followed closely by Federico Lacroze and San Pedrito Stations.

In terms of line usage, Linea B emerges as the most popular, accounting for 224,587 turnstile transactions. This is not surprising, given its extensive network and high frequency of service. In contrast, Linea C lags behind with significantly fewer turnstile events. Another notable trend is the popularity of Retiro Station, which rounds out the top five busiest stations.

A closer examination of the data reveals a seasonal pattern in passenger traffic. For example, peak hours typically occur between 7-9 am and 4-6 pm, suggesting that comm
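The traceback comes from a `subprocess` reader thread decoding a child process's output with `cp1252`, the Windows ANSI code page that Python uses for text-mode pipes when no encoding is given; byte `0x8f` is undefined in that code page, so the decode fails. This suggests that `generate_insights` shells out to the model in text mode without an explicit encoding, although its code is not shown here. A minimal sketch of a call that decodes the output as UTF-8 instead, assuming the model is reached through the Ollama CLI (`ollama run MODEL PROMPT`); `run_local_model` and `DEFAULT_CMD` are hypothetical names.

```python
import subprocess
from typing import Sequence

# Hypothetical default: assumes the Ollama CLI, which accepts the prompt
# as a positional argument after the model name.
DEFAULT_CMD = ("ollama", "run", "llama3.2:3b")


def run_local_model(prompt: str, cmd: Sequence[str] = DEFAULT_CMD, timeout: float = 300.0) -> str:
    """Run a local model CLI and return its stdout, decoded as UTF-8."""
    result = subprocess.run(
        [*cmd, prompt],
        capture_output=True,
        encoding="utf-8",  # explicit, so Windows does not fall back to cp1252
        errors="replace",  # undecodable bytes become U+FFFD instead of raising
        check=True,  # raise CalledProcessError on a non-zero exit status
        timeout=timeout,
    )
    return result.stdout.strip()
```

The `encoding` and `errors` arguments apply to both stdout and stderr. Enabling Python's UTF-8 mode (`PYTHONUTF8=1`) before starting the kernel has a similar effect globally, without touching the call site.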

In [12]:
output_path = Path("../llm/prompts/week6_sample_insights.txt")
output_path.write_text(insights_text, encoding="utf-8")
output_path


WindowsPath('../llm/prompts/week6_sample_insights.txt')
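The saved narrative goes beyond the summary computed in cell 10: that dictionary holds only station and line totals, yet the text cites peak hours (7-9 am, 4-6 pm), a seasonal pattern and service frequency. Assuming `generate_insights` prompts the model with the same summary, a hypothetical pair of helpers, `build_grounded_prompt` and `ungrounded_numbers`, sketches one way to constrain the prompt and to flag numbers in the output that do not appear in the summary. They are illustrations, not part of `llm.insights`.

```python
import json
import re

# Integers such as "7", "2024" or "224,587" (thousands separators allowed).
NUMBER_RE = re.compile(r"\d[\d,]*\d|\d")


def build_grounded_prompt(summary: dict) -> str:
    """Hypothetical prompt builder that restricts the model to the summary figures."""
    facts = json.dumps(summary, ensure_ascii=False, indent=2)
    return (
        "Write a short narrative about Buenos Aires Subte turnstile data.\n"
        "Use ONLY the figures in the JSON below. Do not mention peak hours, "
        "seasonality, service frequency or any other fact that is not in it.\n\n"
        f"{facts}\n"
    )


def ungrounded_numbers(text: str, summary: dict) -> set:
    """Return integers cited in `text` that do not appear in the summary values."""
    allowed = {value for group in summary.values() for value in group.values()}
    cited = {int(match.replace(",", "")) for match in NUMBER_RE.findall(text)}
    return cited - allowed
```

Run against this notebook's output, `ungrounded_numbers(insights_text, summary)` flags at least 4, 6, 7 and 9 from the hour ranges. The check is deliberately strict: years or ordinal figures would also be reported, so it works better as a review aid than as an automatic filter.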