
Ephellon/store-scraper

Overview

This project updates the databases used by Game Store Catalog, a cross-store indexing and data standardization project designed to maintain accurate, normalized information about games across multiple digital storefronts.

Supported Stores

  • Steam — Updates take days to complete
  • PlayStation Network (PSN) — Updates take ≤ 15 min
  • Xbox Store — Updates take ≤ 15 min
  • Nintendo eShop — Updates take ≤ 15 min

Purpose

  • Aggregate and normalize game information from various platforms.
  • Export consistent, language-agnostic JSON files for external applications, web frontends, or dashboards.
  • Enable community-driven maintenance and mirroring of catalog data.

JSON Schema

The project outputs three JSON file types. See the online builder for help constructing URL schemas and for examples.

1. Bang File: !.json

A key/value map represented as a list of [name, { ...props }] pairs. Each entry corresponds to a single game and includes metadata fields.

Example:

[
    [
        "A Bibelot: Tiret sur Will",
        {
            "name": "A Bibelot: Tiret sur Will",
            "type": "game",
            "price": "$2.49",
            "image": "https://assets.nintendo.com/image/upload/store/software/switch/70010000097038/7504a19b54adc8b3ac9bb09d71a4c0b9bb5d56baa8522149051c880c9616ec42",
            "href": "https://www.nintendo.com/us/store/products/a-bibelot-tiret-sur-will-switch/",
            "uuid": "70010000097038",
            "platforms": [
                "Switch"
            ],
            "rating": "everyone"
        }
    ]
]
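
Because each entry is a [name, props] pair, the bang file loads directly into a Python dict. A minimal consumer sketch (the path and store are illustrative):

import json

# Load the bang file; dict() turns the [name, props] pairs into a lookup table.
with open("out/nintendo/!.json", encoding="utf-8") as f:
    catalog = dict(json.load(f))

record = catalog.get("A Bibelot: Tiret sur Will")
if record:
    print(record["price"], record["platforms"])   # $2.49 ['Switch']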

2. Per-letter Files: _.json, a.json, b.json, … z.json

Each file is an array of simple objects used for alphabetical browsing.

Example:

[
    {
        "name": "A Bibelot: Tiret sur Will",
        "type": "game",
        "price": "$2.49",
        "image": "https://assets.nintendo.com/image/upload/store/software/switch/70010000097038/7504a19b54adc8b3ac9bb09d71a4c0b9bb5d56baa8522149051c880c9616ec42",
        "href": "https://www.nintendo.com/us/store/products/a-bibelot-tiret-sur-will-switch/",
        "uuid": "70010000097038",
        "platforms": [
            "Switch"
        ],
        "rating": "everyone"
    }
]
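
A minimal sketch of the bucketing these files imply, assuming names that do not start with an ASCII letter land in _.json; the real io_writer.py may apply different rules:

import json
from collections import defaultdict

def bucket_key(name: str) -> str:
    """First character of the name, lowercased; '_' for anything non-alphabetic."""
    first = name.lstrip()[:1].lower()
    return first if "a" <= first <= "z" else "_"

def write_per_letter(records: list[dict], out_dir: str = "out/steam") -> None:
    """Group records by their bucket key and write one JSON array per bucket."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        buckets[bucket_key(record["name"])].append(record)
    for letter, items in buckets.items():
        with open(f"{out_dir}/{letter}.json", "w", encoding="utf-8") as f:
            json.dump(sorted(items, key=lambda r: r["name"]), f, indent=4, ensure_ascii=False)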

3. Metadata Files: $.json

A file holding minimal metadata about the update and the generated files.

Example:

{
    "size": 12815,
    "date": "2025-11-09T03:18:32.207Z",
    "diff": -1137
}
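
A minimal sketch of producing such a file, assuming size is the current entry count and diff is the change since the previous run; the real field semantics may differ:

import json
from datetime import datetime, timezone
from pathlib import Path

def write_metadata(out_dir: str, current_size: int, previous_size: int) -> None:
    # Assumption: "size" is the current count and "diff" the change since the
    # last update; adjust if the actual project defines these fields differently.
    meta = {
        "size": current_size,
        "date": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "diff": current_size - previous_size,
    }
    Path(out_dir, "$.json").write_text(json.dumps(meta, indent=4), encoding="utf-8")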

Architecture

Adapter-based design

Each store uses a dedicated adapter implementing a common interface. This makes the project modular and easy to extend.

/adapters
 ├─ steam.py       → Steam Store API
 ├─ psn.py         → PlayStation Store API
 ├─ xbox.py        → Microsoft Store API
 └─ nintendo.py    → Nintendo eShop

Adapters yield normalized GameRecord objects that match the schema above.
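
A minimal sketch of what that common interface could look like (class name and import path are assumptions, not the actual base.py):

from abc import ABC, abstractmethod
from collections.abc import Iterator

from catalog.models import GameRecord   # import path assumed; see the model sketch below


class StoreAdapter(ABC):
    """One adapter per store; each yields normalized GameRecord objects."""

    store: str   # e.g. "steam", "psn", "xbox", "nintendo"

    @abstractmethod
    def fetch(self, country: str, locale: str) -> Iterator[GameRecord]:
        """Yield every record the store exposes for the given country/locale."""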

Core Modules

  • models.py — Defines the canonical data model (sketched after this list).
  • normalize.py — Cleans and formats raw fields.
  • ingest.py — Merges and formats databases.
  • http.py — Handles throttled, retried HTTP requests.
  • io_writer.py — Writes normalized JSON output (bang and per-letter files).
  • runner.py — Orchestrates multiple adapters and output pipelines.
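
A minimal sketch of the canonical model in models.py, inferred from the JSON examples above and the listed pydantic dependency; field optionality is an assumption:

from pydantic import BaseModel


class GameRecord(BaseModel):
    name: str
    type: str = "game"
    price: str | None = None        # already formatted, e.g. "$2.49"
    image: str | None = None
    href: str
    uuid: str
    platforms: list[str] = []
    rating: str | None = None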

Database (optional)

For persistent storage, you may stage game data into SQLite/PostgreSQL tables and merge changes via content hashes.
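
A minimal sketch of hash-based change detection, assuming records are staged as JSON blobs keyed by (store, uuid); the real ingest.py/db.py may differ:

import hashlib
import json
import sqlite3

# Assumed staging table:
#   CREATE TABLE staged (store TEXT, uuid TEXT, hash TEXT, data TEXT,
#                        PRIMARY KEY (store, uuid));

def content_hash(record: dict) -> str:
    """Stable hash of a normalized record, independent of key order."""
    blob = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def upsert_if_changed(conn: sqlite3.Connection, store: str, record: dict) -> bool:
    """Write a staged record only when its content hash differs from the stored one."""
    new_hash = content_hash(record)
    row = conn.execute(
        "SELECT hash FROM staged WHERE store = ? AND uuid = ?",
        (store, record["uuid"]),
    ).fetchone()
    if row and row[0] == new_hash:
        return False  # unchanged; skip the write
    conn.execute(
        "INSERT INTO staged (store, uuid, hash, data) VALUES (?, ?, ?, ?) "
        "ON CONFLICT (store, uuid) DO UPDATE SET hash = excluded.hash, data = excluded.data",
        (store, record["uuid"], new_hash, json.dumps(record, ensure_ascii=False)),
    )
    return True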


Python Implementation

A new, resilient implementation written in Python 3.11+ adds robust networking, validation, and normalization. It exports data that exactly matches the JSON produced by the original project.

Key Features

  • Per-domain rate limiting, backoff, and retry logic.
  • Unified price, title, and date normalization.
  • Modular adapter system for multiple stores.
  • Structured output matching game-store-catalog.
  • Optional deduplication and data merge pipeline.
  • Fully type-checked and tested with pydantic, pytest, and ruff.

Installation & Setup

Requirements

  • Python 3.11+
  • httpx, pydantic, aiolimiter, tenacity

Quick Start

Option 1 — Run INSTALL.bat

# Install dependencies automatically
INSTALL

# Run a single store (Steam)
crawl.bat --stores steam --out ./out --country US --locale en-US

# Run multiple stores
crawl.bat --stores steam,psn,xbox,nintendo --out ./out

# Persist progress to resume after interruptions (default path: catalog-cache.db)
crawl.bat --stores steam --cache-db ./catalog-cache.db

Option 2 — Manage a Virtual Environment

# Create a virtual environment
python -m venv .venv && source .venv/bin/activate  # (Windows: .venv\Scripts\activate)

# Install dependencies
pip install -e ".[dev]"

# Run a single store (Steam)
python scripts/crawl.py --stores steam --out ./out --country US --locale en-US

# Run multiple stores
python scripts/crawl.py --stores steam,psn,xbox,nintendo --out ./out

# Use the authenticated Steam app list (requires a Steam Web API key)
STEAM_API_KEY=your_key_here python scripts/crawl.py --stores steam --out ./out

Resuming after interruptions

scripts/crawl.py caches normalized records to a SQLite database (catalog-cache.db by default) so you can restart the crawler after a network outage or reboot without losing progress. Tune the behavior with:

  • --cache-db PATH_OR_URL — choose where the cache is stored (file path or SQLAlchemy URL).
  • --no-cache — disable caching for the current run.
  • --no-resume-cache — continue writing to the cache but skip preloading existing records when a crawl starts.
  • --cache-commit-interval N — commit cached records every N entries (defaults to 50).

Outputs will appear as:

out/
 └─ steam/
    ├─ !.json
    ├─ $.json
    ├─ _.json
    ├─ a.json
    ├─ b.json
    └─ ...

Design Notes

  • Adapters: One per store; each yields normalized records.
  • HTTP: Centralized client with rate limiting and retries.
  • Normalization: Cleans titles (™/®) and standardizes pricing and release dates (see the sketch after this list).
  • Validation: Pydantic models enforce structure and type safety.
  • Writing: Exports files in the same layout as the original catalog.
  • Optional Merge: Stage → dedupe → merge with hash comparison.
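
A minimal sketch of that title/price cleanup, assuming prices arrive as integer cents; the actual rules in normalize.py may differ:

import re

TRADEMARKS = str.maketrans("", "", "™®©")

def normalize_title(raw: str) -> str:
    """Strip trademark symbols and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw.translate(TRADEMARKS)).strip()

def normalize_price(cents: int, currency: str = "$") -> str:
    """Format an integer price in cents as the '$2.49'-style string used in the JSON."""
    return f"{currency}{cents / 100:.2f}"

print(normalize_title("A Bibelot™:  Tiret sur Will®"))   # A Bibelot: Tiret sur Will
print(normalize_price(249))                              # $2.49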

Reliability Enhancements

  • Retries and exponential backoff for unstable endpoints (see the client sketch after this list).
  • Avoids monkey-patching of built-ins (the Python analogue of prototype modification).
  • Respects store rate limits and TOS.
  • Unified schema validation prevents malformed JSON.
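
A minimal sketch of a throttled, retried client built on the listed dependencies (httpx, aiolimiter, tenacity); the limits and timeouts are illustrative, and the real http.py applies per-domain limits rather than this single shared one:

import asyncio

import httpx
from aiolimiter import AsyncLimiter
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

LIMITER = AsyncLimiter(max_rate=4, time_period=1)   # ~4 requests per second

@retry(
    retry=retry_if_exception_type(httpx.HTTPError),
    wait=wait_exponential(multiplier=1, max=30),    # exponential backoff, capped at 30 s
    stop=stop_after_attempt(5),
)
async def get_json(client: httpx.AsyncClient, url: str, **params) -> dict:
    """Rate-limited GET that retries on HTTP/transport errors and returns parsed JSON."""
    async with LIMITER:
        resp = await client.get(url, params=params)
        resp.raise_for_status()                     # 4xx/5xx raises httpx.HTTPStatusError
        return resp.json()

async def demo() -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        print(await get_json(client, "https://httpbin.org/json"))   # placeholder endpoint

asyncio.run(demo())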

Example Project Layout

store-scraper/
    ├─ scripts/
    │   ├─ catalog/
    │   │   ├─ adapters/
    │   │   │   ├─ __init__.py
    │   │   │   ├─ base.py          # Adapter interface
    │   │   │   ├─ nintendo.py      # Working Adapter (Nintendo)
    │   │   │   ├─ psn.py           # Working Adapter (PlayStation)
    │   │   │   ├─ steam.py         # Working Adapter (Steam)
    │   │   │   └─ xbox.py          # Working Adapter (Xbox)
    │   │   │
    │   │   ├─ __init__.py
    │   │   ├─ db.py                # Database (SQL) handler
    │   │   ├─ dedupe.py            # optional cross-store clustering (title/year/publisher)
    │   │   ├─ http.py              # resilient HTTP (rate limit, retries, backoff)
    │   │   ├─ ingest.py            # staging/merge helpers (SQLite/Postgres optional)
    │   │   ├─ io_writer.py         # writes !.json and a..z/_.json in your exact format
    │   │   ├─ models.py            # Pydantic models that mirror the JSON schema
    │   │   ├─ normalize.py         # title/price/date/platform normalization
    │   │   └─ runner.py            # Orchestrates adapters, validation, writing
    │   │
    │   └─ crawl.py                 # CLI entry point
    │
    ├─ tests/                       # Pytest tests (fixtures later)
    ├─ CRAWL.bat                    # CLI entry point
    ├─ INSTALL.bat                  # Dependency installer
    ├─ pyproject.toml               # Project dependency file
    ├─ README.md                    # Project overview
    └─ SCHEMA.md                    # This file, project outline

Contributing

  • Follow 3-space indentation for Python.
  • Run ruff and pytest before submitting changes.
  • Each adapter should return clean GameRecord objects.
  • PRs adding new stores should include sample output and documentation.

License

MIT — © Ephellon and contributors.
