Acquisition and processing of AIS data from NOAA Marine Cadastre, with integration into an AISdb-aligned database (SQLite or PostgreSQL/TimescaleDB). It consumes the NOAA archive at https://coast.noaa.gov/htdata/CMSP/AISDataHandler/{year}/ and supports every published format, including daily ZIP archives (2015-2024), zone archives (2009-2014), legacy geodatabases (.gdb.zip, 2011-2013), and Zstandard-compressed CSV (.csv.zst, 2025 onward).
- Downloads every archive NOAA publishes for a year range, resume-friendly by default
- Organizes archives into
{year}{month}folders across all NOAA naming schemes since 2009 - Extracts
.zip(with integrity check) and.csv.zst, and converts legacy.gdb.zipgeodatabases - Filters records to a WGS84 bounding box, skipping and counting malformed rows
- Deduplicates rows in place with atomic replacement
- Simplifies per-vessel trajectories (Visvalingam-Whyatt, Douglas-Peucker, TDTR) with optional quality metrics
- Loads month batches into SQLite or PostgreSQL/TimescaleDB through
aisdb.decode_msgs, with retries
uv sync # core pipeline (download, organize, extract, filter, dedup)
uv sync --extra load # + aisdb for database loading
uv sync --extra gdb # + geopandas/fiona for legacy .gdb.zip conversion
uv sync --extra simplify # + rdp/fastdtw/similaritymeasures for trajectory simplificationDownload one year, organize it, extract it, and load it into a SQLite database.
uv sync --extra load
uv run noaa-integrator download --start-year 2024 --end-year 2024 --dest data/downloads
uv run noaa-integrator organize --base-dir data/downloads
uv run noaa-integrator extract --base-dir data/downloads --dest data/extracted --start 202401 --end 202412
uv run noaa-integrator load --source-dir data/extracted \
--start-year 2024 --end-year 2024 --sqlite marine_cadastre.dbEach stage is a subcommand of noaa-integrator. Stages are independent; run the ones you need, in order.
uv run noaa-integrator download --start-year 2023 --end-year 2024 --dest data/downloadsScrapes the NOAA index per year and streams every archive. Existing non-empty files are skipped (resume-friendly); force with --no-skip-existing.
uv run noaa-integrator organize --base-dir data/downloadsGroups archives into {year}{month} folders (for example 202406/), matching every NOAA naming scheme from 2009 to 2025+.
uv run noaa-integrator extract --base-dir data/downloads --dest data/extracted --start 202301 --end 202312Decompresses .zip (with integrity check) and .csv.zst archives into a mirror {yyyymm} tree. Corrupted archives are reported and skipped, never fatal.
Legacy geodatabase archives (2011-2013) convert separately.
uv run noaa-integrator gdb data/downloads/201303/*.gdb.zip --dest data/extracted/201303uv run noaa-integrator filter \
--base-dir data/extracted --dest data/filtered \
--start-year 2023 --end-year 2023 --start-month 1 --end-month 2 \
--min-lon -77.36 --min-lat 36.02 --max-lon -57.62 --max-lat 48.64Keeps only records inside the WGS84 bounding box. Malformed rows and non-numeric coordinates are skipped and counted; months without matching data produce no output files.
uv run noaa-integrator dedup --directory data/mergedRemoves duplicate rows in-place (header preserved, atomic replace).
uv run noaa-integrator simplify data/merged/*.csv --dest data/simplified --algorithm vw --metricsPer-vessel trajectory simplification with Visvalingam-Whyatt (vw), Douglas-Peucker (rdp), or Time-Dependent Trajectory Reduction (tdtr). --metrics also writes SR, LLR, DTW, Frechet, and ASED per track.
# SQLite
uv run noaa-integrator load --source-dir data/filtered \
--start-year 2023 --end-year 2023 --start-month 1 --end-month 2 \
--sqlite marine_cadastre.db
# PostgreSQL / TimescaleDB
export NOAA_PG_DSN='postgresql://user:password@localhost:5432/noaa'
uv run noaa-integrator load --source-dir data/filtered \
--start-year 2023 --end-year 2023 --dsn --timescaledbDecodes month batches through aisdb.decode_msgs. Failed batches are retried (default 3 attempts). Credentials come from --dsn or the NOAA_PG_DSN environment variable, never from code.
uv sync --all-extras
uv run pytest # test suite
uv run ruff check . # lint
uv run ruff format . # format
uv run pyrefly check # typecheckContinuous integration runs linting, type checking, and the test suite on Ubuntu and macOS for every push and pull request.
Docs · Tutorials · API reference · Website
- AISdb, the core Python/Rust platform for storing, querying, and analyzing AIS data
- AISdb-lite, a lightweight AISdb variant built on PostGIS and TimescaleDB hypertables
- Tutorials, hands-on notebook companions to the GitBook tutorials
This project is licensed under the GNU Affero General Public License v3.0. See LICENSE for the full text.