This project updates the databases used by Game Store Catalog, a cross-store indexing and data standardization project that maintains accurate, normalized information about games across multiple digital storefronts.
- Steam — Updates take days to complete
- PlayStation Network (PSN) — Updates take ≤ 15 min
- Xbox Store — Updates take ≤ 15 min
- Nintendo eShop — Updates take ≤ 15 min
- Aggregate and normalize game information from various platforms.
- Export consistent, language-agnostic JSON files for external applications, web frontends, or dashboards.
- Enable community-driven maintenance and mirroring of catalog data.
The project outputs three JSON file types. See the online builder for help constructing URL schemas and for examples.
A key/value map represented as a list of `[name, { ...props }]` pairs.
Each entry corresponds to a single game and includes metadata fields.
Example:
```json
[
  [
    "A Bibelot: Tiret sur Will",
    {
      "name": "A Bibelot: Tiret sur Will",
      "type": "game",
      "price": "$2.49",
      "image": "https://assets.nintendo.com/image/upload/store/software/switch/70010000097038/7504a19b54adc8b3ac9bb09d71a4c0b9bb5d56baa8522149051c880c9616ec42",
      "href": "https://www.nintendo.com/us/store/products/a-bibelot-tiret-sur-will-switch/",
      "uuid": "70010000097038",
      "platforms": [
        "Switch"
      ],
      "rating": "everyone"
    }
  ]
]
```

Each file is an array of simple objects used for alphabetical browsing.
Example:
```json
[
  {
    "name": "A Bibelot: Tiret sur Will",
    "type": "game",
    "price": "$2.49",
    "image": "https://assets.nintendo.com/image/upload/store/software/switch/70010000097038/7504a19b54adc8b3ac9bb09d71a4c0b9bb5d56baa8522149051c880c9616ec42",
    "href": "https://www.nintendo.com/us/store/products/a-bibelot-tiret-sur-will-switch/",
    "uuid": "70010000097038",
    "platforms": [
      "Switch"
    ],
    "rating": "everyone"
  }
]
```

A file hosting minimal metadata about the update and its files.
Example:
```json
{
  "size": 12815,
  "date": "2025-11-09T03:18:32.207Z",
  "diff": -1137
}
```

Each store uses a dedicated adapter implementing a common interface. This makes the project modular and easy to extend.
```
/adapters
├─ steam.py      → Steam Store API
├─ psn.py        → PlayStation Store API
├─ xbox.py       → Microsoft Store API
└─ nintendo.py   → Nintendo eShop
```
Adapters yield normalized `GameRecord` objects that match the schema above.
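For orientation, here is a minimal sketch of what the canonical model and adapter interface could look like. The field names mirror the JSON examples above; the class names, defaults, and method signature are illustrative assumptions, not the project's actual API.

```python
# models.py (sketch) -- canonical record mirroring the JSON schema above.
from pydantic import BaseModel

class GameRecord(BaseModel):
    name: str
    type: str = "game"
    price: str | None = None      # already formatted, e.g. "$2.49"
    image: str | None = None      # store CDN artwork URL
    href: str | None = None       # product page URL
    uuid: str                     # store-specific product identifier
    platforms: list[str] = []     # e.g. ["Switch"]
    rating: str | None = None     # e.g. "everyone"

# adapters/base.py (sketch) -- the common interface each store adapter implements.
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator

class BaseAdapter(ABC):
    store: str  # e.g. "steam", "psn", "xbox", "nintendo"

    @abstractmethod
    def fetch(self, country: str, locale: str) -> AsyncIterator[GameRecord]:
        """Yield normalized GameRecord objects for one storefront region."""
```

Concrete adapters implement `fetch` as an async generator, so the runner can stream records from several stores concurrently.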
- `models.py` — Defines the canonical data model.
- `normalize.py` — Cleans and formats raw fields.
- `ingest.py` — Merges and stages databases.
- `http.py` — Handles throttled, retried HTTP requests (see the sketch after this list).
- `io_writer.py` — Writes normalized JSON output (`!.json` and per-letter files).
- `runner.py` — Orchestrates multiple adapters and output pipelines.
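To show how those pieces fit together, here is a sketch of the kind of throttled, retried GET that `http.py` could expose, built from the project's dependencies (`httpx`, `aiolimiter`, `tenacity`). The limiter rate, retry policy, and function names are assumptions.

```python
# http.py (sketch) -- per-domain rate limiting with exponential-backoff retries.
import httpx
from aiolimiter import AsyncLimiter
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# One limiter per store domain; 5 requests/second is an illustrative ceiling.
_limiters: dict[str, AsyncLimiter] = {}

def _limiter_for(host: str) -> AsyncLimiter:
    return _limiters.setdefault(host, AsyncLimiter(max_rate=5, time_period=1))

@retry(
    retry=retry_if_exception_type(httpx.HTTPError),
    wait=wait_exponential(multiplier=0.5, max=30),  # exponential backoff
    stop=stop_after_attempt(5),
)
async def get(client: httpx.AsyncClient, url: str) -> httpx.Response:
    async with _limiter_for(httpx.URL(url).host):  # per-domain throttle
        response = await client.get(url)
        response.raise_for_status()  # HTTP errors trigger a tenacity retry
        return response
```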
For persistent storage, you may stage game data into SQLite/PostgreSQL tables and merge changes via content hashes.
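A minimal sketch of that staging step, assuming SQLite and a SHA-256 hash over a canonical JSON serialization; the table layout and function names are illustrative.

```python
# ingest.py (sketch) -- stage records, rewriting a row only when its hash changes.
import hashlib
import json
import sqlite3

def content_hash(record: dict) -> str:
    # Stable hash over a canonical serialization (sorted keys, no whitespace).
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def stage(conn: sqlite3.Connection, store: str, record: dict) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        " store TEXT NOT NULL, uuid TEXT NOT NULL,"
        " hash TEXT NOT NULL, data TEXT NOT NULL,"
        " PRIMARY KEY (store, uuid))"
    )
    # Upsert that skips the write when the content hash is unchanged.
    conn.execute(
        "INSERT INTO staging (store, uuid, hash, data) VALUES (?, ?, ?, ?)"
        " ON CONFLICT (store, uuid) DO UPDATE"
        " SET hash = excluded.hash, data = excluded.data"
        " WHERE staging.hash != excluded.hash",
        (store, record["uuid"], content_hash(record), json.dumps(record)),
    )
```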
A new, resilient implementation written in Python 3.11+ adds robust networking, validation, and normalization. It exports data that exactly matches the JSON produced by the original project.
- Per-domain rate limiting, backoff, and retry logic.
- Unified price, title, and date normalization.
- Modular adapter system for multiple stores.
- Structured output matching `game-store-catalog`.
- Optional deduplication and data merge pipeline.
- Fully type-checked and tested with `pydantic`, `pytest`, and `ruff`.
- Python 3.11+
- `httpx`, `pydantic`, `aiolimiter`, `tenacity`
```bat
# Install dependencies automatically
INSTALL

# Run a single store (Steam)
crawl.bat --stores steam --out ./out --country US --locale en-US

# Run multiple stores
crawl.bat --stores steam,psn,xbox,nintendo --out ./out

# Persist progress to resume after interruptions (default path: catalog-cache.db)
crawl.bat --stores steam --cache-db ./catalog-cache.db
```

```bash
# Create a virtual environment
python -m venv .venv && source .venv/bin/activate # (Windows: .venv\Scripts\activate)
# Install dependencies
pip install -e ".[dev]"
# Run a single store (Steam)
python scripts/crawl.py --stores steam --out ./out --country US --locale en-US
# Run multiple stores
python scripts/crawl.py --stores steam,psn,xbox,nintendo --out ./out
# Use the authenticated Steam app list (requires a Steam Web API key)
STEAM_API_KEY=your_key_here python scripts/crawl.py --stores steam --out ./out
```

`scripts/crawl.py` caches normalized records to a SQLite database (`catalog-cache.db` by default) so you can restart the crawler after a network outage or reboot without losing progress. Tune the behavior with:
- `--cache-db PATH_OR_URL` — choose where the cache is stored (file path or SQLAlchemy URL).
- `--no-cache` — disable caching for the current run.
- `--no-resume-cache` — continue writing to the cache but skip preloading existing records when a crawl starts.
- `--cache-commit-interval N` — commit cached records every N entries (defaults to 50; see the sketch below).
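For illustration, a sketch of how a cache honoring `--cache-commit-interval` might look; only the `catalog-cache.db` default and the every-N-entries commit come from the flags above, while the class, table, and column names are assumptions.

```python
# db.py (sketch) -- SQLite-backed record cache with batched commits.
import json
import sqlite3

class RecordCache:
    def __init__(self, path: str = "catalog-cache.db", commit_interval: int = 50):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            " store TEXT, uuid TEXT, data TEXT, PRIMARY KEY (store, uuid))"
        )
        self.commit_interval = commit_interval
        self._pending = 0

    def put(self, store: str, record: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            (store, record["uuid"], json.dumps(record)),
        )
        self._pending += 1
        if self._pending >= self.commit_interval:  # batch commits to limit fsync cost
            self.conn.commit()
            self._pending = 0

    def close(self) -> None:
        self.conn.commit()  # flush any uncommitted tail before closing
        self.conn.close()
```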
Outputs will appear as:
```
out/
└─ steam/
   ├─ !.json
   ├─ $.json
   ├─ _.json
   ├─ a.json
   ├─ b.json
   └─ ...
```
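A sketch of one plausible bucketing rule behind this layout. The real `io_writer.py` defines the exact semantics of `!.json`, `$.json`, and `_.json`; the underscore fallback used here is an assumption for illustration.

```python
# io_writer.py (sketch) -- group records into per-letter files by first character.
import json
from collections import defaultdict
from pathlib import Path

def write_buckets(records: list[dict], out_dir: Path) -> None:
    buckets: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        first = record["name"][:1].lower()
        # ASCII letters get their own file; everything else falls back to "_".
        key = first if first.isascii() and first.isalpha() else "_"
        buckets[key].append(record)
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, items in buckets.items():
        items.sort(key=lambda r: r["name"].lower())  # alphabetical browsing order
        path = out_dir / f"{key}.json"
        path.write_text(json.dumps(items, indent=2, ensure_ascii=False), encoding="utf-8")
```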
- Adapters: One per store; each yields normalized records.
- HTTP: Centralized client with rate limiting and retries.
- Normalization: Cleans titles (™/®) and standardizes pricing and release dates (a sketch follows this list).
- Validation: Pydantic models enforce structure and type safety.
- Writing: Exports files in the same layout as the original catalog.
- Optional Merge: Stage → dedupe → merge with hash comparison.
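A minimal sketch of that normalization step; the helper names, regex, and currency handling are assumptions, with prices rendered in the "$2.49" style used by the examples above.

```python
# normalize.py (sketch) -- illustrative title and price cleanup.
import re

_TRADEMARKS = re.compile(r"[\u2122\u00AE]")  # matches the (TM) and (R) glyphs

def clean_title(raw: str) -> str:
    # Strip trademark glyphs, then collapse any leftover runs of whitespace.
    return re.sub(r"\s+", " ", _TRADEMARKS.sub("", raw)).strip()

def normalize_price(amount_cents: int, currency: str = "USD") -> str:
    symbol = {"USD": "$"}.get(currency, f"{currency} ")
    return f"{symbol}{amount_cents / 100:.2f}"

# clean_title("A Bibelot\u2122: Tiret sur Will") -> "A Bibelot: Tiret sur Will"
# normalize_price(249)                           -> "$2.49"
```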
- Retries and exponential backoff for unstable endpoints.
- Avoids monkey-patching (no modification of built-ins).
- Respects store rate limits and TOS.
- Unified schema validation prevents malformed JSON.
```
store-scraper/
├─ scripts/
│  ├─ catalog/
│  │  ├─ adapters/
│  │  │  ├─ __init__.py
│  │  │  ├─ base.py        # Adapter interface
│  │  │  ├─ nintendo.py    # Working adapter (Nintendo)
│  │  │  ├─ psn.py         # Working adapter (PlayStation)
│  │  │  ├─ steam.py       # Working adapter (Steam)
│  │  │  └─ xbox.py        # Working adapter (Xbox)
│  │  │
│  │  ├─ __init__.py
│  │  ├─ db.py             # Database (SQL) handler
│  │  ├─ dedupe.py         # Optional cross-store clustering (title/year/publisher)
│  │  ├─ http.py           # Resilient HTTP (rate limit, retries, backoff)
│  │  ├─ ingest.py         # Staging/merge helpers (SQLite/Postgres optional)
│  │  ├─ io_writer.py      # Writes !.json and a..z/_.json in the exact catalog format
│  │  ├─ models.py         # Pydantic models that mirror the JSON schema
│  │  └─ normalize.py      # Title/price/date/platform normalization
│  │  └─ runner.py         # Orchestrates adapters, validation, writing
│  │
│  └─ crawl.py             # CLI entry point
│
├─ tests/                  # Pytest tests (fixtures later)
├─ CRAWL.bat               # CLI entry point
├─ INSTALL.bat             # Dependency installer
├─ pyproject.toml          # Project dependency file
├─ README.md               # Project overview
└─ SCHEMA.md               # This file, project outline
```
- Follow 4-space indentation for Python.
- Run `ruff` and `pytest` before submitting changes.
- Each adapter should return clean `GameRecord` objects.
- PRs adding new stores should include sample output and documentation.
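As an example of the kind of check to run with `pytest`, here is a sketch exercising the hypothetical `clean_title` helper from the normalization sketch above; the import path assumes `scripts/catalog` is importable as `catalog`.

```python
# tests/test_normalize.py (sketch)
from catalog.normalize import clean_title

def test_clean_title_strips_trademark_glyphs():
    assert clean_title("A Bibelot\u2122: Tiret sur Will") == "A Bibelot: Tiret sur Will"
```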
MIT — © Ephellon and contributors.