feat(ingest): weekly crawler that opens PRs to TechAPI with new SKUs

## What

A scheduled crawler that ingests new SKUs from canonical upstream sources, normalizes them into TechAPI's JSON schema, and opens a PR against the TechAPI repo. Curators review before merge — fully automated end-to-end is not the goal; reducing toil is.

Builds on the coverage detector ([#1](https://github.com/GetTechAPI/TechEngine/issues/1)) — coverage tells us *what* is missing; ingest tries to *propose* the data.

## Why

Even with [#1](https://github.com/GetTechAPI/TechEngine/issues/1) producing a missing-SKU list, manually authoring each JSON record from scratch is the bottleneck. Most fields (cores, clocks, TDP, socket, process node) are available verbatim on vendor product pages and Wikipedia infoboxes; a crawler that drafts an 80%-complete record cuts curation per SKU from minutes to seconds-of-review.

## Architecture

- `app/ingest/` package with one fetcher per source family
- Shared `app/ingest/normalize.py` — maps source-specific fields → TechAPI schema (units, slug shape, segment enum, manufacturer brand slug lookup)
- `app/ingest/pipeline.py` — orchestrates: pull coverage gaps → for each gap, fetch detail page → normalize → write JSON to a worktree of TechAPI → open PR
- Authentication: PR open via a fine-grained PAT scoped to `GetTechAPI/TechAPI` content+PR write, stored as `TECHAPI_PR_TOKEN` secret on TechEngine
- New workflow `.github/workflows/weekly-ingest.yml` — Mondays after the coverage report runs

## Output contract

Each opened PR:
- Modifies only `data/<category>/<...>/<slug>.json`
- One PR per category per run (so reviews are bounded)
- PR body lists every added SKU with a one-line provenance string and links the source URL
- Marked as `draft` if any field is missing or required by the schema and could not be confidently parsed; otherwise ready-for-review

## Safety rails

- Never modify existing files in the same PR (additions only)
- Slug conflicts: skip + log
- Each record carries `verified: false` and the source URL in `source_urls`; humans flip `verified` on review
- Rate-limit per host with a polite User-Agent

## Out of scope

- Updates/corrections to existing records (separate workflow later)
- Sources without stable structured data (e.g. forum-only specs)

## Acceptance

- `python -m app.ingest --category cpu --limit 5` drafts 5 candidate JSON files and prints the diff
- Weekly workflow opens at least one PR per run against TechAPI
- All proposed records pass `python -m app.validate`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ingest): weekly crawler that opens PRs to TechAPI with new SKUs #2

What

Why

Architecture

Output contract

Safety rails

Out of scope

Acceptance

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat(ingest): weekly crawler that opens PRs to TechAPI with new SKUs #2

Description

What

Why

Architecture

Output contract

Safety rails

Out of scope

Acceptance

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions