Validation, ingestion, and serving engine for the TechAPI dataset.
Code: MIT · Data: lives in TechAPI (CC-BY-SA 4.0)
TechEngine owns everything around the data: schema validation, the FastAPI read API, the static JSON dump generator, the engine's own landing site, and (next up) automated coverage checks and a weekly ingestion crawler.
The dataset and the public-facing playground site live in TechAPI so each can be versioned, mirrored, and licensed independently. The site shipped in this repo is the engine's own landing — what TechEngine is, what it runs, link out to docs.
app/
├ validate.py # schema/range/uniqueness checks
├ seed.py # data/ → SQLModel database
├ dump.py # API → static JSON tree
├ main.py # FastAPI entrypoint
├ models/ # SQLModel tables
├ routers/ # /v1/{brands,socs,smartphones,gpus,cpus}
├ schemas/ # Pydantic response models
├ services/ # scoring (algorithm_version-tagged)
├ coverage/ # upstream-vs-curated diff + Markdown report
└ ingest/ # draft new records from upstream pages
tests/ # unit + integration
site/ # Astro engine landing (deploys to Pages)
docs/ # SPEC / DATA_PIPELINE / DEVELOPMENT
.github/workflows/
├ validate-data.yml # workflow_call: PR-time data validation for TechAPI
├ refresh-data.yml # cron: regenerate the static dump weekly
├ coverage-report.yml # cron: gap report, sticky issue
├ weekly-ingest.yml # cron: drafts new SKUs, opens PR against TechAPI
├ deploy-pages.yml # build & deploy engine site + dump
└ test.yml # lint + type-check + tests
┌────────────────────┐ ┌──────────────────────────┐
│ TechAPI (data/) │ workflow_call │ TechEngine (this repo) │
│ + bundled self- │ ─────────────▶ │ validate-data.yml │
│ check (PR) │ ◀───────────── │ (checks out TechAPI) │
└────────────────────┘ └──────────────────────────┘
Every Python entry point reads data from a sibling TechAPI checkout. The
location can be overridden via TECHAPI_DATA_DIR; the default looks for
../TechAPI/data next to this repo, which matches a local dev layout.
git clone https://github.com/Seungpyo1007/TechAPI.git ../TechAPI # data source
pip install -e ".[dev]"
python -m app.validate # check data integrity
python -m app.seed # data/ → ./techapi.db (SQLite)
uvicorn app.main:app --reload # serve; curl localhost:8000/v1/cpus/ryzen-9-9950x3d
python -m app.dump # generate ./dump/v1/... static treeTECHAPI_DATA_DIR=/path/to/TechAPI/data overrides the data location.
docker compose up --buildSpins up Postgres 16, seeds from the mounted TechAPI checkout, serves on :8000.
- Split out from TechAPI; sibling-checkout data pipeline
- Coverage gap detector — diff curated dataset vs upstream catalogs and surface missing SKUs as a sticky weekly issue (#1)
- Weekly ingestion crawler — scrape canonical sources and open PRs
against TechAPI with new SKUs (requires
TECHAPI_PR_TOKENsecret to push) (#2) - More sources (Intel ARK, AMD product pages, TechPowerUp DB)
Code is licensed under the MIT License. The dataset (in TechAPI) is licensed CC-BY-SA 4.0 separately.