fix(data): address C03 review — bbox corrections, metadata, sort order #13
Merged
Five bbox fixes from self-review:
- he: tighten y1 (letter body only; legibility downgraded to low)
- ayin: shift x0 right to exclude adjacent ק; use 2px margins
- lamed v0001: correct x0 333→306 (was clipping the left side of the letter)
- het: adjust x0/x1 to isolate ח strokes; tighter right margin
- resh: trim x0 by 4px to exclude a stray dot from an adjacent letter

yod v0001 deleted: it could not be cleanly isolated from the surrounding letters in "לי"; former v0002 (from "לאמי") renumbered to v0001.

Metadata fixes across all 23 entries:
- extraction.notes: per-entry source word and line context
- legibility: evaluated per-crop (high/medium/low) instead of blanket "high"
- usable_for_syngen: false for crops <25px or with ambiguous isolation

Rachel block in entries.jsonl now sorted by entry_id.

writers.jsonl: added a note listing missing letter forms (mem_final, kaf_final, pe, pe_final, tsadi, tsadi_final) and unexplored scans.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
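The crop rules in this commit (2px margins around the letter strokes, and usable_for_syngen set to false for crops under 25px or with ambiguous isolation) can be sketched in Python as below. This is a minimal illustration, not the repository's code: the BBox class, the function names, treating x1/y1 as exclusive, and applying the 25px threshold to the shorter side of the crop are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical crop box in pixel coordinates; x1/y1 are treated as exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def pad_bbox(box: BBox, margin: int, img_w: int, img_h: int) -> BBox:
    """Grow a tight letter box by `margin` px on every side, clamped to the image."""
    return BBox(
        x0=max(0, box.x0 - margin),
        y0=max(0, box.y0 - margin),
        x1=min(img_w, box.x1 + margin),
        y1=min(img_h, box.y1 + margin),
    )


# Threshold from the PR; comparing it against the shorter side is an assumption.
MIN_SIDE_PX = 25


def usable_for_syngen(box: BBox, ambiguous_isolation: bool) -> bool:
    """False for crops under 25px or whose isolation from neighbours is ambiguous."""
    if ambiguous_isolation:
        return False
    return min(box.width, box.height) >= MIN_SIDE_PX


if __name__ == "__main__":
    tight = BBox(x0=120, y0=40, x1=150, y1=78)  # made-up coordinates
    padded = pad_bbox(tight, margin=2, img_w=1000, img_h=800)
    print(padded)  # BBox(x0=118, y0=38, x1=152, y1=80)
    print(usable_for_syngen(padded, ambiguous_isolation=False))  # True
    print(usable_for_syngen(BBox(0, 0, 20, 40), ambiguous_isolation=False))  # False
```

Clamping keeps a padded box inside the scan, so a letter near the page edge gets a smaller margin on that side instead of negative coordinates.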
Follow-up to #12. Addresses all issues raised in self-review.
Bbox fixes (Issues 1–6)
- he__v0001: tighten y1 to the letter body only; removes empty space below
- ayin__v0001: shift x0 right to exclude adjacent ק; 2px margins
- lamed__v0001: correct x0 333→306 (was clipping ~27px off the left side of ל)
- yod__v0001: deleted (could not be cleanly isolated from the surrounding letters in "לי"); v0002 (from "לאמי") renumbered to v0001
- het__v0001: adjust x0/x1 to bound ח strokes only; tighter right margin
- resh__v0001: trim x0 by 4px to exclude a stray dot from an adjacent letter

Metadata fixes (Issues 7–9)
- extraction.notes: per-entry source word and line context (was identical boilerplate across all 24 entries)
- legibility: evaluated per-crop, high/medium/low as appropriate (was blanket "high")
- usable_for_syngen: false for crops <25px or ambiguously isolated (was blanket true)

Sort order (Issue 10)
Rachel block in entries.jsonl now sorted by entry_id.

Writer record (Issue 11)
ingest.agent_notes now lists the 6 missing letter forms (mem_final, kaf_final, pe, pe_final, tsadi, tsadi_final) and notes that the gan_naul and begani_netatikha scans were not exhaustively explored.

Validation
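The validation steps themselves are not shown on this page. Purely as an illustration, and not the checks actually run for this PR, re-sorting the Rachel block of entries.jsonl by entry_id while leaving other writers' lines in place could look like this in Python. It assumes one JSON object per line and that each writer's entries form one contiguous block; the "writer" field name and the sample values are hypothetical.

```python
import json
from typing import Iterable


def sort_writer_block(lines: Iterable[str], writer: str, writer_key: str = "writer") -> list[str]:
    """Sort one writer's contiguous block of JSONL lines by entry_id.

    Other writers' lines keep their positions. Blank lines are dropped and
    trailing newlines are stripped. `writer_key` is a hypothetical field name.
    """
    kept = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    records = [json.loads(ln) for ln in kept]
    idx = [i for i, rec in enumerate(records) if rec.get(writer_key) == writer]
    if not idx:
        return kept
    if idx != list(range(idx[0], idx[-1] + 1)):
        raise ValueError(f"entries for {writer!r} are not contiguous")
    # Plain string comparison: zero-padded suffixes such as v0001 < v0002 sort correctly.
    ordered = sorted(idx, key=lambda i: records[i]["entry_id"])
    return kept[: idx[0]] + [kept[i] for i in ordered] + kept[idx[-1] + 1 :]


def block_is_sorted(lines: Iterable[str], writer: str, writer_key: str = "writer") -> bool:
    """True if the writer's entry_ids already appear in ascending order."""
    ids = [
        rec["entry_id"]
        for rec in (json.loads(ln) for ln in lines if ln.strip())
        if rec.get(writer_key) == writer
    ]
    return ids == sorted(ids)


if __name__ == "__main__":
    sample = [  # hypothetical entry_id values and writer names
        '{"entry_id": "aleph__v0001", "writer": "other"}',
        '{"entry_id": "resh__v0001", "writer": "rachel"}',
        '{"entry_id": "he__v0001", "writer": "rachel"}',
    ]
    fixed = sort_writer_block(sample, "rachel")
    print([json.loads(ln)["entry_id"] for ln in fixed])  # ['aleph__v0001', 'he__v0001', 'resh__v0001']
    print(block_is_sorted(fixed, "rachel"))  # True
```

The sketch returns the original line text rather than re-serialising each record, so field order and Hebrew strings stay byte-for-byte unchanged; the result can be written back with "\n".join(lines) + "\n".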
🤖 Generated with Claude Code