# VBB Public Transport Analysis & Streamlit App Logic

This notebook outlines the steps to fetch data from the VBB API, process it, and prepare the logic for a Streamlit dashboard.

**IMPORTANT:**
* You **MUST** obtain your own API key from the VBB developer portal.
* The exact API endpoints and JSON response structures shown here are **examples** and might differ from the actual VBB API. You'll need to consult the official VBB API documentation.
* Error handling should be more robust in a production app.

## 1. Setup and Configuration
Import libraries and load the API key. The cell tries `config.py` first, then the `VBB_API_KEY` environment variable, and finally falls back to a placeholder string.

In [1]:
import requests
import pandas as pd
import json
from datetime import datetime, timedelta
import os
import warnings

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', 50)

# --- Configuration ---
# OPTION 1: Use a config.py file (Make sure config.py is in .gitignore!)
try:
    import config
    API_KEY = config.VBB_API_KEY
    print("API Key loaded from config.py")
except (ImportError, AttributeError):
    # OPTION 2: Use an Environment Variable (Recommended for deployment)
    API_KEY = os.getenv('VBB_API_KEY')
    if API_KEY:
        print("API Key loaded from environment variable VBB_API_KEY")
    else:
        # OPTION 3: Placeholder (Replace this with your actual key if testing directly)
        API_KEY = "YOUR_VBB_API_KEY_HERE" # <<<--- REPLACE THIS OR USE Option 1 or 2
        print("API Key using placeholder - replace it!")

# --- VBB API Example Endpoints (These are illustrative - CHECK VBB DOCS!) ---
BASE_URL = "https://vbb-api-endpoint.example.com/v1" # Replace with actual base URL
LOCATION_SEARCH_ENDPOINT = f"{BASE_URL}/locations"
DEPARTURES_ENDPOINT = f"{BASE_URL}/stops/{{stop_id}}/departures" # {stop_id} is filled in later via str.format()

# Headers often needed for APIs
HEADERS = {'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}

print("Setup complete.")


API Key using placeholder - replace it!
Setup complete.
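The fallback chain above ends in a placeholder string that every later function has to recognize. A minimal sketch of an alternative, assuming two hypothetical helpers (`resolve_api_key` and `is_configured`, neither of which is part of the notebook), returns `None` when no usable key is found, so callers can test for a missing key in one place:

```python
import os

PLACEHOLDER = "YOUR_VBB_API_KEY_HERE"  # same placeholder string the notebook uses


def resolve_api_key(config_value=None, env=None):
    """Return the first usable key: config value first, then environment, else None.

    Hypothetical helper for illustration. `config_value` stands in for
    config.VBB_API_KEY, and `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    for candidate in (config_value, env.get("VBB_API_KEY")):
        # Skip missing, blank, and placeholder values
        if candidate and candidate.strip() and candidate != PLACEHOLDER:
            return candidate.strip()
    return None


def is_configured(api_key):
    """True when a real (non-placeholder) key is available."""
    return bool(api_key) and api_key != PLACEHOLDER
```

Passing `env` explicitly keeps the lookup easy to test without touching the real process environment.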


## 2. API Interaction Functions

Functions to search for stations and get departures.

In [2]:
def search_station(query):
    """Searches for a station ID based on a query string."""
    if not API_KEY or API_KEY == "YOUR_VBB_API_KEY_HERE":
        print("API Key not configured.")
        # Return dummy data for structure example
        return [{'id': '900000100001', 'name': 'Example Station A (Dummy)'},
                {'id': '900000100002', 'name': 'Example Station B (Dummy)'}]

    params = {'query': query, 'results': 5} # Limit results
    try:
        response = requests.get(LOCATION_SEARCH_ENDPOINT, headers=HEADERS, params=params, timeout=10)
        response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)
        # Assuming the API returns a list of location objects directly or under a key like 'locations'
        data = response.json()
        # --- Adjust based on actual API response structure ---
        if isinstance(data, list):
            # Filter only stations (example: check if 'type' is 'stop')
            stations = [loc for loc in data if loc.get('type') == 'stop']
            return stations[:5] # Return top 5
        elif 'locations' in data and isinstance(data['locations'], list):
            stations = [loc for loc in data['locations'] if loc.get('type') == 'stop']
            return stations[:5]
        else:
            print("Unexpected API response format for locations.")
            return []
        # --- End Adjustment section ---

    except requests.exceptions.RequestException as e:
        print(f"Error searching for station: {e}")
        return [] # Return empty list on error

def get_departures(stop_id, duration_minutes=60):
    """Gets departures for a specific stop ID for the next X minutes."""
    if not API_KEY or API_KEY == "YOUR_VBB_API_KEY_HERE":
        print("API Key not configured.")
        # Return dummy data for structure example
        return [{'line': {'name': 'S1'}, 'direction': 'Destination A', 'when': '2025-10-26T10:15:00+01:00', 'plannedWhen': '2025-10-26T10:15:00+01:00', 'delay': 0},
                {'line': {'name': 'U2'}, 'direction': 'Destination B', 'when': '2025-10-26T10:18:00+01:00', 'plannedWhen': '2025-10-26T10:17:00+01:00', 'delay': 60}]

    endpoint = DEPARTURES_ENDPOINT.format(stop_id=stop_id)
    params = {'duration': duration_minutes}
    try:
        response = requests.get(endpoint, headers=HEADERS, params=params, timeout=10)
        response.raise_for_status()
        # Assuming the API returns a list of departure objects directly or under a key like 'departures'
        data = response.json()
        # --- Adjust based on actual API response structure ---
        if isinstance(data, list):
            return data
        elif 'departures' in data and isinstance(data['departures'], list):
            return data['departures']
        else:
            print("Unexpected API response format for departures.")
            return []
        # --- End Adjustment section ---

    except requests.exceptions.RequestException as e:
        print(f"Error getting departures: {e}")
        return []

# --- Example Usage ---
print("Example Station Search for 'Potsdam Hbf':")
example_stations = search_station("Potsdam Hbf")
print(example_stations)

if example_stations:
    example_stop_id = example_stations[0]['id'] # Use the first result's ID
    print(f"\nExample Departures for ID {example_stop_id}:")
    example_departures = get_departures(example_stop_id, 30) # Get next 30 mins
    # Print first few departures for inspection
    print(json.dumps(example_departures[:2], indent=2))
else:
    print("\nCould not get example station ID.")

Example Station Search for 'Potsdam Hbf':
API Key not configured.
[{'id': '900000100001', 'name': 'Example Station A (Dummy)'}, {'id': '900000100002', 'name': 'Example Station B (Dummy)'}]

Example Departures for ID 900000100001:
API Key not configured.
[
  {
    "line": {
      "name": "S1"
    },
    "direction": "Destination A",
    "when": "2025-10-26T10:15:00+01:00",
    "plannedWhen": "2025-10-26T10:15:00+01:00",
    "delay": 0
  },
  {
    "line": {
      "name": "U2"
    },
    "direction": "Destination B",
    "when": "2025-10-26T10:18:00+01:00",
    "plannedWhen": "2025-10-26T10:17:00+01:00",
    "delay": 60
  }
]
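Both functions above accept either a bare list or a dict that wraps the list under a key. A sketch of how that check could be factored out is shown below. The helper names `extract_items` and `only_stops` are hypothetical, and the `'type' == 'stop'` filter is the same assumption the notebook makes about the response format. Unlike a plain `'locations' in data` test, the `isinstance` checks also tolerate a payload that is neither a list nor a dict, such as `None`.

```python
def extract_items(data, key):
    """Return the record list from a payload that is either a bare list or a
    dict holding the list under `key`; return [] for anything else."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get(key), list):
        return data[key]
    return []


def only_stops(locations, limit=5):
    """Keep entries whose 'type' is 'stop' (an assumed field), up to `limit`."""
    stops = [
        loc for loc in locations
        if isinstance(loc, dict) and loc.get("type") == "stop"
    ]
    return stops[:limit]
```

With these in place, `search_station` could reduce its parsing to `only_stops(extract_items(data, 'locations'))`, and `get_departures` to `extract_items(data, 'departures')`.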


## 3. Data Processing Function

Turn the raw JSON departure list into a structured Pandas DataFrame.

In [3]:
def process_departures(departures_json):
    """Processes the JSON list of departures into a Pandas DataFrame."""
    processed_data = []
    if not departures_json:
        return pd.DataFrame(columns=['Line', 'Direction', 'Scheduled', 'Expected', 'Delay (Min)'])

    for dep in departures_json:
        try:
            # --- Adjust dictionary keys based on actual API response ---
            line_name = dep.get('line', {}).get('name', 'N/A')
            direction = dep.get('direction', 'N/A')
            scheduled_time_str = dep.get('plannedWhen')
            expected_time_str = dep.get('when') # 'when' usually includes delay

            # Assumes the API reports the delay in seconds (check the VBB docs)
            delay_seconds = dep.get('delay')

            scheduled_dt = pd.to_datetime(scheduled_time_str) if scheduled_time_str else None
            expected_dt = pd.to_datetime(expected_time_str) if expected_time_str else scheduled_dt # Use scheduled if expected is missing

            delay_minutes = None
            if delay_seconds is not None:
                # round() instead of // so a slightly early departure (negative delay)
                # is not floored to a full minute early; matches the fallback below
                delay_minutes = round(delay_seconds / 60)
            elif scheduled_dt and expected_dt:
                # Calculate delay if not provided directly
                time_diff = expected_dt - scheduled_dt
                delay_minutes = round(time_diff.total_seconds() / 60)
            # --- End Adjustment section ---

            processed_data.append({
                'Line': line_name,
                'Direction': direction,
                'Scheduled': scheduled_dt.strftime('%H:%M') if scheduled_dt else 'N/A',
                'Expected': expected_dt.strftime('%H:%M') if expected_dt else 'N/A',
                'Delay (Min)': int(delay_minutes) if delay_minutes is not None else 0 # Default delay to 0 if unknown
            })
        except Exception as e:
            print(f"Error processing departure record: {dep} - Error: {e}") # Log errors for debugging
            continue # Skip problematic records

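    # Pass the column names explicitly so the 'Delay (Min)' lookup below still works if every record was skipped
    df_departures = pd.DataFrame(processed_data, columns=['Line', 'Direction', 'Scheduled', 'Expected', 'Delay (Min)'])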
    # Convert Delay to numeric, coercing errors
    df_departures['Delay (Min)'] = pd.to_numeric(df_departures['Delay (Min)'], errors='coerce').fillna(0).astype(int)
    # Sorting on the 'HH:MM' strings would misorder departures that cross midnight;
    # sort on the datetime values before formatting if ordering matters.
    # df_departures.sort_values(by='Expected', inplace=True)
    return df_departures

# --- Example Usage ---
if 'example_departures' in locals():
    print("\nProcessing example departures into DataFrame:")
    df_processed = process_departures(example_departures)
    display(df_processed.head())
else:
    print("\nNo example departures fetched, skipping processing.")


Processing example departures into DataFrame:


|   | Line | Direction | Scheduled | Expected | Delay (Min) |
|---|------|-----------|-----------|----------|-------------|
| 0 | S1 | Destination A | 10:15 | 10:15 | 0 |
| 1 | U2 | Destination B | 10:17 | 10:18 | 1 |
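The delay arithmetic can be checked in isolation with the standard library. The sketch below uses two hypothetical helpers, `delay_minutes` and `on_time_share`; the one-minute on-time threshold is an assumption, not a VBB definition. It rounds to the nearest minute because floor division would report a departure 20 seconds early as a full minute early (`-20 // 60` is `-1`).

```python
from datetime import datetime


def delay_minutes(planned_iso, actual_iso, delay_seconds=None):
    """Delay in whole minutes; negative means early, None means unknown.

    Prefers an explicit delay in seconds, otherwise derives the delay from
    two ISO 8601 timestamps with numeric offsets such as '+01:00'.
    """
    if delay_seconds is not None:
        return round(delay_seconds / 60)
    if planned_iso and actual_iso:
        planned = datetime.fromisoformat(planned_iso)
        actual = datetime.fromisoformat(actual_iso)
        return round((actual - planned).total_seconds() / 60)
    return None


def on_time_share(delays, threshold_min=1):
    """Fraction of known delays at or below `threshold_min` (an assumed cutoff).

    Returns None when no delay is known.
    """
    known = [d for d in delays if d is not None]
    if not known:
        return None
    return sum(d <= threshold_min for d in known) / len(known)
```

Applied to the two dummy departures, `delay_minutes` gives 0 and 1, matching the `Delay (Min)` column above.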


## 4. Streamlit App Logic (Conceptual)

This section outlines how the functions above would be used in an `app.py` script launched with `streamlit run app.py`.

In [4]:
# This cell is just a placeholder showing the structure for app.py
# You would copy the relevant functions (search_station, get_departures, process_departures)
# and this layout logic into a separate app.py file.

# import streamlit as st
# import pandas as pd
# # Import functions or define them here

# st.title("VBB Public Transport Departures 🚆")

# # --- Station Search ---
# station_query = st.text_input("Search for a station:", "Potsdam Hbf")
# # ... (call search_station(station_query) and handle an empty result)

# # --- Station Selection ---
# # ... (selectbox over the returned stations, mapped back to their IDs)

# # --- Fetch and Display Departures ---
# # ... (button, get_departures + process_departures, dataframe display, and delay bar chart)

# st.markdown("---")
# st.caption("Data fetched from VBB API (structure based on examples). Requires API key.")
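One piece of the layout that can be written and tested without Streamlit is the mapping from search results to selectbox entries. The following is a sketch under stated assumptions: `station_options` is a hypothetical helper, and the commented wiring shows one possible arrangement of standard Streamlit widgets around the notebook's functions, not a finished app.

```python
def station_options(stations):
    """Map display labels to stop IDs for a selectbox.

    Entries without an 'id' are skipped; a repeated name gets the ID
    appended so every label stays unique.
    """
    options = {}
    for station in stations:
        stop_id = station.get("id")
        if not stop_id:
            continue
        label = station.get("name") or str(stop_id)
        if label in options:
            label = f"{label} ({stop_id})"
        options[label] = stop_id
    return options


# Possible wiring inside app.py (commented out because it needs Streamlit and
# the notebook's search_station / get_departures / process_departures):
#
# import streamlit as st
# stations = search_station(st.text_input("Search for a station:", "Potsdam Hbf"))
# options = station_options(stations)
# if options:
#     label = st.selectbox("Select a station:", list(options))
#     if st.button("Show departures"):
#         df = process_departures(get_departures(options[label]))
#         st.dataframe(df)
#         st.bar_chart(df.groupby("Line")["Delay (Min)"].mean())
# else:
#     st.warning("No stations found.")
```

Keeping the label-to-ID mapping in a plain function means the selection logic can be unit-tested while the widget calls stay thin.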

## 5. Next Steps

1.  **Refine API Calls:** Obtain a real VBB API key, find the correct endpoints, and adjust `search_station`, `get_departures`, and `process_departures` to match the real API structure (JSON keys, data types, error formats).
2.  **Create `app.py`:** Copy the logic from section 4 into an `app.py` file. Import the necessary functions or define them within the script.
3.  **Secure API Key:** Implement proper API key handling (environment variables are best).
4.  **Enhance Dashboard:** Add more visualizations (on-time percentage, delay distributions), error handling, potentially caching API results briefly to avoid hitting rate limits.
5.  **Test Thoroughly:** Test with various stations and edge cases.
6.  **(Optional) Deploy:** Use Streamlit Community Cloud or another platform.
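The brief caching mentioned in step 4 can be sketched as a small time-to-live cache. `TTLCache` below is a hypothetical class and the 30-second lifetime is an arbitrary assumption. Inside a Streamlit app, the built-in `st.cache_data(ttl=...)` decorator may be the simpler choice.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`; call `fetch()` only when the
        entry is missing or older than the TTL."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._store[key] = (now, value)
        return value
```

A call such as `cache.get_or_fetch(("departures", stop_id, 30), lambda: get_departures(stop_id, 30))` would then hit the API at most once per TTL window for a given stop and duration.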