@jorben jorben commented Jan 23, 2026

Update ConverterWorker design document to v1.2 with major fixes:

  • Remove FOR UPDATE SKIP LOCKED (SQLite incompatible) → use Prisma optimistic locking
  • Fix Model.ts import mismatch → use default import
  • Fix retry count timing → increment before each attempt
  • Add task cancellation check → filter CANCELLED tasks in claimPage
  • Add transaction conflict retry → handle Prisma P2034 errors
  • Improve token extraction → support multiple providers (OpenAI/Claude/Gemini/Ollama)
  • Reduce streaming update frequency → 2s throttle to minimize DB writes
  • Remove unused successProgress concept
  • Add content validation logic
  • Optimize error classification with typed exceptions
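
The optimistic-locking claim and the CANCELLED filter listed above can be illustrated with a small in-memory sketch. This is not the design document's code: the `Page` shape, the status names other than `CANCELLED`, and the `conditionalClaim` helper are assumptions made for illustration. The helper stands in for a conditional update (for example, Prisma's `updateMany`, which returns a `count`) that only succeeds if the row is still unclaimed.

```typescript
// Hypothetical types for illustration; only CANCELLED and claimPage appear in the PR text.
type PageStatus = "PENDING" | "PROCESSING";
type TaskStatus = "ACTIVE" | "CANCELLED";

interface Page {
  id: number;
  taskId: number;
  status: PageStatus;
  workerId: string | null;
}

// Stand-in for "UPDATE ... WHERE id = ? AND status = 'PENDING'".
// Returns the number of rows changed: 1 if we won the claim, 0 if someone else did.
function conditionalClaim(pages: Page[], id: number, workerId: string): number {
  const page = pages.find((p) => p.id === id && p.status === "PENDING");
  if (!page) return 0;
  page.status = "PROCESSING";
  page.workerId = workerId;
  return 1;
}

// Pick a pending page whose task is not cancelled, then try to claim it
// without holding a row lock. A failed claim just moves on to the next candidate.
function claimPage(
  pages: Page[],
  taskStatus: Map<number, TaskStatus>,
  workerId: string
): Page | null {
  const candidates = pages.filter(
    (p) => p.status === "PENDING" && taskStatus.get(p.taskId) !== "CANCELLED"
  );
  for (const candidate of candidates) {
    if (conditionalClaim(pages, candidate.id, workerId) === 1) {
      return candidate;
    }
  }
  return null;
}
```

In a real database the read and the conditional write are separate statements, so the zero-count branch is what handles two workers racing for the same page.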

Co-Authored-By: Claude <noreply@anthropic.com>
@jorben jorben merged commit 74f2eb1 into master Jan 23, 2026
@jorben jorben deleted the j-branch-1 branch January 23, 2026 17:43
jorben added a commit that referenced this pull request Jan 26, 2026
Unify page range input format across all document types by removing
name-based Sheet selection. Excel now uses the same numeric format
as PDF, Word, and PowerPoint (e.g., "1-3,5" instead of "#1-2" or
"Sheet1,数据表").

Changes:
- Remove type and names fields from SheetRange interface
- Simplify parseSheetRange to use parseNumeric directly
- Update filterSheets to index-only filtering
- Update documentation and examples
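
A minimal sketch of the index-only format described above, assuming 1-based indices and inclusive ranges. The real `parseNumeric` and `filterSheets` signatures are not shown in this commit, so `parseNumericRange` and the generic `filterSheets` below are hypothetical stand-ins.

```typescript
// Parse a string such as "1-3,5" into sorted, de-duplicated 1-based indices.
// Anything that is not a number or a number-number range is rejected,
// which includes the removed name-based forms like "#1-2" or "Sheet1".
function parseNumericRange(input: string, total: number): number[] {
  const selected = new Set<number>();
  for (const part of input.split(",")) {
    const token = part.trim();
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid range token: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > total) {
      throw new Error(`Range out of bounds: "${token}"`);
    }
    for (let i = start; i <= end; i++) {
      selected.add(i);
    }
  }
  return Array.from(selected).sort((a, b) => a - b);
}

// Index-only filtering: keep the sheets whose 1-based position was selected.
function filterSheets<T>(sheets: T[], input: string): T[] {
  const wanted = new Set(parseNumericRange(input, sheets.length));
  return sheets.filter((_, index) => wanted.has(index + 1));
}
```

With five sheets, `filterSheets(sheets, "1-3,5")` keeps the first three sheets and the fifth, the same selection a PDF page range of `1-3,5` would make.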

Co-Authored-By: Claude <noreply@anthropic.com>
jorben added a commit that referenced this pull request Jan 26, 2026
…ity (#17)

* docs: add Office splitter design for Word, PowerPoint and Excel support

* docs: ๐Ÿ“ refactor Office splitter to separate format handlers with security

Refactor the Office splitter design document with major architectural changes:

- Split single OfficeSplitter into WordSplitter, PPTSplitter, ExcelSplitter
- Remove legacy OLE format support (.doc, .ppt, .xls) - only OOXML supported
- Add PathValidator for security against path traversal attacks
- Add PageRangeParser for page/sheet range selection
- Add RenderWindowPoolFactory for shared rendering resources
- Add ChunkedRenderer for memory-optimized large document rendering
- Update architecture diagrams to reflect new modular design
- Add comprehensive test specifications for all components

This design provides better maintainability through separation of concerns
and improved security with explicit path validation.
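
To make the path-validation and OOXML-only points above concrete, here is a rough sketch of the kind of check a `PathValidator` might perform. The function names and the exact rules are assumptions, not the design document's API: it rejects absolute paths, any `..` segment that would climb above the root, and any extension other than `.docx`, `.pptx`, or `.xlsx`.

```typescript
// Only OOXML formats are accepted; legacy OLE formats (.doc, .ppt, .xls) are not.
const OOXML_EXTENSIONS = [".docx", ".pptx", ".xlsx"];

// Collapse "." and ".." segments. Returns null if the path would escape its root.
function resolveSegments(relativePath: string): string[] | null {
  const resolved: string[] = [];
  for (const segment of relativePath.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (resolved.length === 0) return null; // would climb above the root
      resolved.pop();
      continue;
    }
    resolved.push(segment);
  }
  return resolved;
}

// Hypothetical validator: true only for relative, non-escaping OOXML paths.
function isSafeOoxmlPath(relativePath: string): boolean {
  // Reject POSIX-absolute, UNC, and drive-letter paths outright.
  if (/^([\\/]|[A-Za-z]:)/.test(relativePath)) return false;
  const segments = resolveSegments(relativePath);
  if (segments === null || segments.length === 0) return false;
  const fileName = (segments[segments.length - 1] ?? "").toLowerCase();
  return OOXML_EXTENSIONS.some((ext) => fileName.endsWith(ext));
}
```

A purely lexical check like this does not follow symbolic links, so a production validator would likely also resolve the real path on disk before comparing it against the allowed directory.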

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: ๐Ÿ“ simplify Excel Sheet range to index-only format

Unify page range input format across all document types by removing
name-based Sheet selection. Excel now uses the same numeric format
as PDF, Word, and PowerPoint (e.g., "1-3,5" instead of "#1-2" or
"Sheet1,数据表").

Changes:
- Remove type and names fields from SheetRange interface
- Simplify parseSheetRange to use parseNumeric directly
- Update filterSheets to index-only filtering
- Update documentation and examples

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>