
fix(data): address C03 review — bbox corrections, metadata, sort order #13

Merged: shaypal5 merged 1 commit into main from data/rachel-bluwstein on May 14, 2026


@shaypal5
Contributor

Follow-up to #12. Addresses all issues raised in self-review.

Bbox fixes (Issues 1–6)

| Entry | Change |
| --- | --- |
| he__v0001 | Tighten y1 to the letter body only; removes empty space below |
| ayin__v0001 | Shift x0 right to exclude adjacent ק; 2px margins |
| lamed__v0001 | Correct x0 333→306 (was clipping ~27px off the left side of ל) |
| yod__v0001 | Deleted: could not be cleanly isolated in "לי"; former v0002 (from "לאמי") renumbered to v0001 |
| het__v0001 | Adjust x0/x1 to bound ח strokes only; tighter right margin |
| resh__v0001 | Trim x0 by 4px to exclude stray dot from adjacent letter |
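
For reviewers who want to sanity-check the edge arithmetic, here is a minimal sketch of how a bbox edge shift changes the crop size. It assumes boxes are stored as `(x0, y0, x1, y1)` in pixels with exclusive right and bottom edges; `BBox` and `adjust` are hypothetical helpers, not code from this repo. Only the x0 values 333 and 306 come from this PR; every other coordinate is made up for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Pixel box (x0, y0, x1, y1); right and bottom edges assumed exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def adjust(box: BBox, *, dx0: int = 0, dy0: int = 0, dx1: int = 0, dy1: int = 0) -> BBox:
    """Return a new box with each edge moved by a signed pixel offset."""
    new = BBox(box.x0 + dx0, box.y0 + dy0, box.x1 + dx1, box.y1 + dy1)
    if new.width <= 0 or new.height <= 0:
        raise ValueError(f"degenerate bbox after adjustment: {new}")
    return new


# x0 = 333 -> 306 is from the PR; y0, x1 and y1 are illustrative placeholders.
lamed_before = BBox(x0=333, y0=100, x1=380, y1=170)
lamed_after = adjust(lamed_before, dx0=306 - 333)

# Moving the left edge 27px further left widens the crop by 27px.
widened_by = lamed_after.width - lamed_before.width
```

The same helper covers the resh fix (`dx0=4` trims the left edge inward) and the he fix (a negative `dy1` pulls the bottom edge up).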

Metadata fixes (Issues 7–9)

  • extraction.notes: per-entry source word and line context (was identical boilerplate across all 24 entries)
  • legibility: evaluated per-crop — high / medium / low as appropriate (was blanket "high")
  • usable_for_syngen: false for crops <25px or ambiguously isolated (was blanket true)
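
The `usable_for_syngen` rule above could be expressed roughly as follows. This is a sketch under stated assumptions, not the logic actually used for this PR: it assumes "<25px" refers to the smaller of the crop's two dimensions, and `cleanly_isolated` is a hypothetical flag standing in for the manual isolation judgment.

```python
MIN_CROP_PX = 25  # size threshold named in this PR


def usable_for_syngen(width: int, height: int, cleanly_isolated: bool) -> bool:
    """Return False for crops under the size threshold or with ambiguous isolation."""
    # Assumption: the threshold applies to the smaller crop dimension.
    too_small = min(width, height) < MIN_CROP_PX
    return cleanly_isolated and not too_small
```

A crop exactly 25px on its short side passes under this reading, since the rule is a strict "<25px".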

Sort order (Issue 10)

Rachel block in entries.jsonl now sorted by entry_id.
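
One way to sort a single writer's block without disturbing other writers' lines is sketched below. It assumes each JSONL record carries a `writer_id` field, which is a hypothetical name; `entry_id` is the field named in this PR. The original line text is reused rather than re-serialized, so key order and spacing inside each record stay untouched.

```python
import json


def sort_writer_block(lines: list[str], writer_id: str) -> list[str]:
    """Sort one writer's lines by entry_id; all other lines keep their positions."""
    parsed = [json.loads(line) for line in lines]
    # Positions currently occupied by the target writer's entries.
    slots = [i for i, rec in enumerate(parsed) if rec.get("writer_id") == writer_id]
    # The same positions, ordered by the entry_id found at each one.
    ordered = sorted(slots, key=lambda i: parsed[i]["entry_id"])
    result = list(lines)
    for slot, source in zip(slots, ordered):
        result[slot] = lines[source]
    return result
```

Plain string comparison is enough here as long as version suffixes stay zero-padded (`v0001`, `v0002`, ...), so lexicographic and numeric order agree.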

Writer record (Issue 11)

ingest.agent_notes now lists the 6 missing letter forms (mem_final, kaf_final, pe, pe_final, tsadi, tsadi_final) and notes that gan_naul and begani_netatikha scans were not exhaustively explored.
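
A list of missing letter forms like the one above could be derived from the entry IDs with a set difference. This is only a sketch: it assumes entry IDs have the shape `<form>__v<NNNN>` as in the table, and the romanized names in `ALL_FORMS` are guesses apart from the ones quoted in this PR (he, ayin, lamed, yod, het, resh, mem_final, kaf_final, pe, pe_final, tsadi, tsadi_final).

```python
# Assumed names for the 27 Hebrew letter forms (22 letters + 5 final forms).
# Only the names quoted in this PR are confirmed; the rest are guesses.
ALL_FORMS = [
    "alef", "bet", "gimel", "dalet", "he", "vav", "zayin", "het", "tet",
    "yod", "kaf", "kaf_final", "lamed", "mem", "mem_final", "nun",
    "nun_final", "samekh", "ayin", "pe", "pe_final", "tsadi",
    "tsadi_final", "qof", "resh", "shin", "tav",
]


def missing_forms(entry_ids: list[str]) -> list[str]:
    """Return letter forms with no entry, in alphabet order."""
    # Everything before the "__v" version suffix is taken as the letter form.
    present = {entry_id.split("__v", 1)[0] for entry_id in entry_ids}
    return [form for form in ALL_FORMS if form not in present]
```

The result could then be joined into the free-text note, for example `", ".join(missing_forms(ids))`.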

Validation

ok: 2 writers, 48 entries, 48 files verified, 48 upstream-cross-checked
62 passed, 1 skipped
git diff --check: clean

🤖 Generated with Claude Code

Five bbox fixes from self-review:
- he: tighten y1 (letter body only; legibility downgraded to low)
- ayin: shift x0 right to exclude adjacent ק; use 2px margins
- lamed v0001: correct x0 from 333→306 (was clipping left side of letter)
- het: adjust x0/x1 to isolate ח strokes; tighter right margin
- resh: trim x0 by 4px to exclude stray dot from adjacent letter

yod v0001 deleted — could not be cleanly isolated from surrounding letters
in "לי"; former v0002 (from "לאמי") renumbered to v0001.

Metadata fixes across all 23 entries:
- extraction.notes: per-entry source word and line context
- legibility: evaluated per-crop (high/medium/low) instead of blanket "high"
- usable_for_syngen: false for crops <25px or with ambiguous isolation

Rachel block in entries.jsonl now sorted by entry_id.

writers.jsonl: added note listing missing letter forms (mem_final,
kaf_final, pe, pe_final, tsadi, tsadi_final) and unexplored scans.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shaypal5 merged commit 15231b1 into main on May 14, 2026
3 checks passed
@shaypal5 deleted the data/rachel-bluwstein branch on May 14, 2026 07:53