Merged
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -1 +1 @@
-* @npaun @JMilot1 @jsteelz
+* @JMilot1 @jsteelz
52 changes: 52 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,52 @@
name: Publish to PyPI

on:
  push:
    branches: [ main ]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Check if version changed
        id: version_check
        run: |
          VERSION=$(uv run python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])")
          echo "version=$VERSION" >> $GITHUB_OUTPUT

          if git diff HEAD^ HEAD -- pyproject.toml | grep -q 'version ='; then
            echo "changed=true" >> $GITHUB_OUTPUT
          else
            echo "changed=false" >> $GITHUB_OUTPUT
          fi

      - name: Build package
        if: steps.version_check.outputs.changed == 'true'
        run: uv build

      - name: Publish to PyPI
        if: steps.version_check.outputs.changed == 'true'
        env:
          UV_PUBLISH_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
        run: uv publish

      - name: Create GitHub Release
        if: steps.version_check.outputs.changed == 'true'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          VERSION=${{ steps.version_check.outputs.version }}
          gh release create "v$VERSION" \
            --title "v$VERSION" \
            --generate-notes \
            dist/*
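
The version check in this workflow greps the whole `git diff` output for `version =`, so it also matches unchanged context lines that sit next to another edit in `pyproject.toml`. As a minimal sketch of a stricter check, assuming a hypothetical helper `version_line_changed` that is handed the unified diff text (it is not part of this PR), only added or removed lines are considered:

```python
def version_line_changed(diff_text: str) -> bool:
    """Return True if a unified diff adds or removes a `version =` line."""
    for line in diff_text.splitlines():
        # Skip the "--- a/file" and "+++ b/file" file headers.
        if line.startswith(("---", "+++")):
            continue
        # Only added ("+") or removed ("-") lines count; context lines start with a space.
        if line.startswith(("+", "-")) and line[1:].lstrip().startswith("version ="):
            return True
    return False


# Hypothetical diff: only the description changed, the version line is context.
DESCRIPTION_ONLY = """\
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
 [project]
 name = "py-gtfs-loader"
 version = "0.3.0"
-description = "Load GTFS"
+description = "Load GTFS feeds"
 readme = "README.md"
"""

# Hypothetical diff: the version line itself was bumped.
VERSION_BUMP = """\
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
 [project]
 name = "py-gtfs-loader"
-version = "0.3.0"
+version = "0.3.1"
 description = "Load GTFS"
 readme = "README.md"
"""

print(version_line_changed(DESCRIPTION_ONLY))
print(version_line_changed(VERSION_BUMP))
```

In the shell step, the comparable tightening would be to anchor the pattern to lines that begin with `+` or `-`. Separately, `tomllib` is only in the standard library from Python 3.11, which may matter given the `3.10.19` pin in `.python-version`.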
22 changes: 8 additions & 14 deletions .github/workflows/pull-request.yml
@@ -14,25 +14,19 @@ jobs:
     runs-on: [ubuntu-latest]
     strategy:
       matrix:
-        python-version: [pypy-3.8]
+        python-version: ['3.10', 'pypy3.10']

     steps:
     - uses: actions/checkout@v2
+    - name: Install uv
+      uses: astral-sh/setup-uv@v5
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
+      run: uv python install ${{ matrix.python-version }}
     - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        python -m pip install flake8 pytest
-        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+      run: uv sync --all-extras --dev
     - name: Lint with flake8
       run: |
-        # stop the build if there are Python syntax errors or undefined names
-        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+        uv run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=.venv
+        uv run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude=.venv
     - name: Test with pytest
-      run: |
-        python -m pytest .
+      run: uv run pytest .
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.10.19
80 changes: 80 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,80 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

py-gtfs-loader is a Python library for loading and manipulating GTFS (General Transit Feed Specification) data. It parses GTFS directories into Python objects with schema validation and provides utilities for reading, modifying, and writing GTFS feeds.

## Development Commands

### Using uv (package manager)

```bash
# Install dependencies
uv sync --all-extras --dev

# Run tests
uv run pytest .

# Run linting
uv run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
uv run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

# Build package
uv build

# Run a single test file
uv run pytest tests/test_runner.py

# Run a specific test case
uv run pytest tests/test_runner.py::test_default -k "test_name"
```

## Architecture

### Core Components

**`gtfs_loader/__init__.py`** - Main entry point with load/patch functions
- `load(gtfs_dir, ...)`: Parses GTFS directory into structured objects
- `patch(gtfs, gtfs_in_dir, gtfs_out_dir, ...)`: Modifies and writes GTFS data back to disk
- Supports both standard GTFS and Transit itinerary format via `itineraries=True` flag
- CSV and GeoJSON file type support

**`gtfs_loader/schema.py`** - GTFS entity definitions and schemas
- Defines all GTFS entities (Agency, Route, Trip, Stop, StopTime, etc.)
- Entity classes have `_schema` attribute describing file structure (ID, grouping, required fields)
- Two schema collections: `GTFS_SUBSET_SCHEMA` (standard) and `GTFS_SUBSET_SCHEMA_ITINERARIES` (Transit format)
- Entities reference other entities via `_gtfs` attribute (e.g., `stop_time.stop` resolves to Stop object)

**`gtfs_loader/schema_classes.py`** - Schema metadata system
- `File`: Describes GTFS file structure (primary key, grouping, file type)
- `Field`: Named tuple for field configuration (type, required, default)
- `FileCollection`: Container for file schemas
- Grouping support: entities with same ID can be grouped by secondary key (e.g., stop_times grouped by trip_id + stop_sequence)

**`gtfs_loader/types.py`** - Custom types and base classes
- `GTFSTime`: Integer-based time allowing >24h (e.g., "25:30:00" for next-day services)
- `GTFSDate`: datetime subclass parsing YYYYMMDD and YYYY-MM-DD formats
- `Entity`: Base class for all GTFS entities, dict-like with `_gtfs` reference to parent collection
- `EntityDict`: Dict subclass storing resolved field metadata
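
To illustrate the integer-based time idea, here is a minimal sketch that assumes the integer is the number of seconds since the start of the service day. The helper names `parse_gtfs_time` and `format_gtfs_time` are hypothetical and are not the library's `GTFSTime` API:

```python
def parse_gtfs_time(text: str) -> int:
    """Convert "HH:MM:SS" to seconds; hours may exceed 23 for next-day service."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid GTFS time: {text!r}")
    return hours * 3600 + minutes * 60 + seconds


def format_gtfs_time(total_seconds: int) -> str:
    """Convert seconds back to "HH:MM:SS" without wrapping at 24 hours."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


late_night = parse_gtfs_time("25:30:00")
print(late_night)
print(format_gtfs_time(late_night))
```

Keeping the value as a plain integer means times past midnight sort and subtract correctly, which a wall-clock time type would not allow.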

### Data Flow

1. **Load**: CSV/GeoJSON → parse headers → validate fields → create Entity objects → index by ID → return nested dict structure
2. **Access**: `gtfs.stops['stop_id']` or `gtfs.stop_times['trip_id'][sequence_index]`
3. **Patch**: Flatten nested structures → write CSV with correct headers → preserve unmodified files
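
The load and access steps can be sketched with the standard library alone. This is a simplified illustration that uses plain dicts instead of the library's Entity objects and skips validation; the helper names `read_rows`, `index_by_id`, and `group_rows` are hypothetical:

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for stops.txt and stop_times.txt.
STOPS_CSV = "stop_id,stop_name\nS1,Main St\nS2,Oak Ave\n"
STOP_TIMES_CSV = (
    "trip_id,stop_sequence,stop_id\n"
    "T1,2,S2\n"
    "T1,1,S1\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by_id(rows, id_field):
    """Index primary entities by their ID column."""
    return {row[id_field]: row for row in rows}


def group_rows(rows, group_field, order_field):
    """Group rows by one column and order each group by another."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row)
    for members in groups.values():
        members.sort(key=lambda r: int(r[order_field]))
    return dict(groups)


stops = index_by_id(read_rows(STOPS_CSV), "stop_id")
stop_times = group_rows(read_rows(STOP_TIMES_CSV), "trip_id", "stop_sequence")

print(stops["S1"]["stop_name"])
print([st["stop_id"] for st in stop_times["T1"]])
```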

### Key Patterns

- **Entity indexing**: Primary entities indexed by `id` field, grouped entities create nested dicts/lists
- **Cross-references**: Entities access related data via `_gtfs` backref (e.g., `trip.route`, `stop_time.stop`)
- **Computed properties**: Use `@cached_property` for derived values (e.g., `trip.first_departure`)
- **Two GTFS formats**: Standard (stop_times.txt) vs Transit itinerary format (itinerary_cells.txt + trip arrays)
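
The back-reference and cached-property patterns can be sketched as follows. The classes below are hypothetical stand-ins written for illustration, and the library's real `Entity` is described above as dict-like, which this sketch does not reproduce:

```python
from functools import cached_property
from types import SimpleNamespace


class Entity:
    """Hypothetical stand-in for the library's Entity base class."""

    def __init__(self, gtfs, **fields):
        self._gtfs = gtfs             # back-reference to the whole feed
        self.__dict__.update(fields)  # expose columns as attributes


class Trip(Entity):
    @property
    def route(self):
        # Cross-reference resolved through the feed back-reference.
        return self._gtfs.routes[self.route_id]

    @cached_property
    def first_departure(self):
        # Computed once on first access, then stored on the instance.
        return min(st.departure_time for st in self._gtfs.stop_times[self.trip_id])


gtfs = SimpleNamespace(routes={}, trips={}, stop_times={})
gtfs.routes["R1"] = Entity(gtfs, route_id="R1", route_short_name="10")
gtfs.trips["T1"] = Trip(gtfs, trip_id="T1", route_id="R1")
gtfs.stop_times["T1"] = [
    Entity(gtfs, stop_id="S1", departure_time=28800),
    Entity(gtfs, stop_id="S2", departure_time=29100),
]

trip = gtfs.trips["T1"]
print(trip.route.route_short_name)
print(trip.first_departure)
```

One consequence of caching is that a derived value is not recomputed if the underlying stop times are modified afterwards, so cached values may need to be cleared before patching.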

## Itinerary Format Support

The library supports Transit's custom itinerary format where:
- `itinerary_cells.txt` defines stop sequences (like templates)
- Trips reference itineraries and contain time arrays instead of individual stop_times
- Use `itineraries=True` flag when loading/patching to use this format
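
As a rough illustration of the template idea, the sketch below expands one trip's time array against its itinerary into stop_time-like records. The column names (`itinerary_id`, `departure_times`) and the helper `expand_trip` are assumptions made for this example and may not match the actual Transit format or the library's API:

```python
# Hypothetical itinerary template: an ordered list of stop IDs.
itinerary_cells = {"ITIN_1": ["S1", "S2", "S3"]}

# Hypothetical trip that references the itinerary and carries one time per stop.
trip = {
    "trip_id": "T1",
    "itinerary_id": "ITIN_1",
    "departure_times": [28800, 29100, 29400],
}


def expand_trip(trip, itinerary_cells):
    """Pair each stop of the itinerary with the matching time from the trip."""
    stop_ids = itinerary_cells[trip["itinerary_id"]]
    times = trip["departure_times"]
    if len(stop_ids) != len(times):
        raise ValueError("time array length must match the itinerary length")
    return [
        {
            "trip_id": trip["trip_id"],
            "stop_sequence": sequence,
            "stop_id": stop_id,
            "departure_time": time,
        }
        for sequence, (stop_id, time) in enumerate(zip(stop_ids, times), start=1)
    ]


stop_times = expand_trip(trip, itinerary_cells)
print(stop_times[0])
```

Storing the stop sequence once per itinerary, rather than once per trip, is what makes this layout more compact than a full stop_times.txt.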
114 changes: 111 additions & 3 deletions README.md
@@ -1,7 +1,115 @@
# py-gtfs-loader

-Simple python library to load GTFS folder
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/TransitApp/py-gtfs-loader/workflows/Build%20on%20pull%20request/badge.svg)](https://github.com/TransitApp/py-gtfs-loader/actions)

-To use, simply `import gtfs_loader` and load a GTFS folder with `gtfs = gtfs_loader.load(args.gtfs_dir)`
A Python library for loading and manipulating GTFS (General Transit Feed Specification) data with schema validation and type safety.

-All data is now available under `gtfs.filename`
## Features

- 📦 Load GTFS feeds from directories
- ✅ Schema validation with type checking
- 🔄 Modify and patch GTFS data
- 🚀 Support for standard GTFS and Transit's itinerary format
- 📝 CSV and GeoJSON file type support
- 🔗 Cross-referenced entities for easy data navigation

## Installation

```bash
pip install py-gtfs-loader
```

Or using uv:

```bash
uv add py-gtfs-loader
```

## Quick Start

### Loading GTFS Data

```python
import gtfs_loader

# Load a GTFS feed
gtfs = gtfs_loader.load('path/to/gtfs/directory')

# Access data by entity
stop = gtfs.stops['stop_id']
route = gtfs.routes['route_id']
trip = gtfs.trips['trip_id']

# Access grouped entities
stop_times = gtfs.stop_times['trip_id'] # Returns list of stop times for a trip
```

### Modifying and Saving GTFS Data

```python
# Modify data
gtfs.stops['stop_id'].stop_name = "New Stop Name"

# Save changes back to disk
gtfs_loader.patch(gtfs, 'path/to/input', 'path/to/output')
```

### Loading Specific Files

```python
# Load only specific files
gtfs = gtfs_loader.load('path/to/gtfs', files=['stops', 'routes', 'trips'])
```

### Transit Itinerary Format

```python
# Load Transit itinerary format (itinerary_cells.txt)
gtfs = gtfs_loader.load('path/to/gtfs', itineraries=True)
```

## Development

This project uses [uv](https://docs.astral.sh/uv/) for dependency management.

### Setup

```bash
# Install dependencies
uv sync --all-extras --dev

# Run tests
uv run pytest .

# Run linting
uv run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
```

### Requirements

- Python ≥ 3.10
- Development dependencies: pytest, flake8

## Project Structure

- `gtfs_loader/` - Main package
  - `__init__.py` - Load/patch functions
  - `schema.py` - GTFS entity definitions
  - `schema_classes.py` - Schema metadata system
  - `types.py` - Custom GTFS types (GTFSTime, GTFSDate, Entity)
  - `lat_lon.py` - Geographic utilities

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details

## Maintainers

- Jonathan Milot
- Jeremy Steele
31 changes: 31 additions & 0 deletions pyproject.toml
@@ -0,0 +1,31 @@
[project]
name = "py-gtfs-loader"
version = "0.3.0"
description = "Load GTFS"
readme = "README.md"
authors = [
{ name = "Jonathan Milot" },
{ name = "Jeremy Steele" }
]
requires-python = ">=3.10"
dependencies = []
license = { text = "MIT" }
classifiers = [
"License :: OSI Approved :: MIT License",
]

[project.urls]
Homepage = "https://github.com/TransitApp/py-gtfs-loader"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
include = ["gtfs_loader*"]

[dependency-groups]
dev = [
"flake8>=7.3.0",
"pytest>=8.4.2",
]
11 changes: 0 additions & 11 deletions setup.py

This file was deleted.
