Refactor: Code organization and bounds-checked range combinators#3
Refactor: Code organization and bounds-checked range combinators#3
Conversation
Split the 440-line style.scala into 10 focused files within xl-core/src/com/tjclp/xl/style/: - alignment.scala (824B) - HAlign, VAlign, Align - border.scala (1.3K) - BorderStyle, BorderSide, Border - cellstyle.scala (1.4K) - CellStyle - color.scala (2.0K) - ThemeSlot, Color - fill.scala (601B) - PatternType, Fill - font.scala (880B) - Font - numfmt.scala (1.7K) - NumFmt - patch.scala (1.9K) - StylePatch (Monoid) - registry.scala (2.4K) - StyleRegistry - units.scala (1.6K) - Pt, Px, Emu, StyleId Benefits: - Improved navigation and discoverability - Better separation of concerns - Easier parallel development - Reduced cognitive load (each file <2.5KB) - 81% reduction in style system max file size Updated imports across: - xl-core: cell, codec, dsl, formatted, html, optics, patch, richtext, sheet - xl-macros: formatted literal macros - xl-ooxml: Styles, XlsxReader - All test files All 287 tests passing. No breaking changes to public API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Split the 450-line sheet.scala into two focused files: - sheet.scala (328 lines) - ColumnProperties, RowProperties, Sheet - workbook.scala (129 lines) - WorkbookMetadata, Workbook Benefits: - Clear separation of worksheet vs workbook concerns - Improved navigation and discoverability - Better single responsibility at file level - 27% reduction in sheet.scala size No import updates needed - types remain in com.tjclp.xl package. All 287 tests passing. No breaking changes to public API. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… plans Restructure documentation for clarity and maintainability: - Remove all arbitrary numbering (00-31) from file names - Archive 11 completed implementation plans (P0-P6, P31) organized by phase - Separate concerns: plan/ (active), design/ (architecture), reference/ (quick ref) - Reduce docs/plan/ from 31 files to 9 (71% reduction - active work only) - Update STATUS.md: 109→263 tests, 75%→85% complete, add P6/P31 features - Update roadmap.md: Comprehensive phase breakdown with completion status - Update domain-model.md: Add RichText and StyleRegistry sections - Add 10 ADRs to decisions.md: Document P4-P31 architectural decisions - Update error-model-and-safety.md: Mark XLError complete, focus P11 security - Update testing-guide.md: Full breakdown of 263 tests across modules - Create new plan/README.md: Index of active plans with clear navigation - Update CLAUDE.md: Document new structure New structure: - docs/plan/ (9 files) - Active future work (P7-P11) only - docs/design/ (6 files) - Timeless architectural decisions - docs/reference/ (5 files) - Examples, glossary, OOXML research - docs/archive/plan/ (11 files) - Completed phases organized by P0-P6, P31 - docs/CONTRIBUTING.md, docs/FAQ.md - Standard locations Result: Crystal-clear organization where docs/plan/ = active work only. Completed phases properly archived. No more arbitrary numbering. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…binators Code Organization: - Split xl-macros/macros.scala (298 lines) into 3 focused modules: * CellRangeLiterals.scala - cell/range/fx literals * BatchPutMacro.scala - batch put macro * FormattedLiterals.scala - money/percent/date/accounting literals - Split xl-core/sheet.scala: Extracted extensions and properties * SheetProperties.scala - ColumnProperties, RowProperties * SheetExtensions.scala - All Sheet extension methods Functional Improvements: - Added Column.MaxIndex0 and Row.MaxIndex0 constants - Added XLError.ValueCountMismatch for better error reporting - Enhanced range combinators with bounds checking: * putRange now validates value count matches range size * putRow enforces Excel column limits (XFD/16384) * putCol enforces Excel row limits (1,048,576) - All three methods now return XLResult[Sheet] instead of Sheet Breaking Changes: - Sheet.putRange, Sheet.putRow, Sheet.putCol now return XLResult[Sheet] Testing: - All 267 tests pass - Added RangeCombinatorsSpec.scala for new error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Code Organization and Bounds-Checked Range CombinatorsOverall Assessment: Excellent ✅This is a well-executed refactoring that significantly improves code organization and adds critical safety features. The changes align perfectly with the project's purity charter and demonstrate strong attention to detail. Strengths1. Code Organization 🎯The file splitting is thoughtful and follows clear separation of concerns:
Impact: File sizes reduced by 27-81%, making navigation and maintenance significantly easier. 2. Safety Improvements 🛡️The bounds checking additions are critical for production use:
Benefits: Prevents silent data loss when writing beyond Excel's limits (XFD column, row 1,048,576). 3. Test Coverage ✅Excellent test additions in
267 tests passing demonstrates stability. 4. Documentation 📚
Issues & RecommendationsMinor: Bounds Checking Clarity 💭Location: Current Implementation: The bounds check uses Suggestion: Consider making the validation logic clearer by computing Impact: Low - current code is functionally correct, this just improves clarity. Minor: Documentation Enhancement 📝Location: Suggestion: Add rationale for strict size matching: /**
* Put cells in a range in row-major order.
*
* The number of supplied values must exactly match the range size to prevent
* silent data loss from truncation or unintended empty cells from padding.
*/Minor: Consistent Error Message Format 📐Observation: Error messages use slightly different styles:
Suggestion: Use consistent format for both (either mention limits for both or neither). Security Considerations 🔒Good: Bounds checking prevents buffer overflows and out-of-memory issues from malicious/malformed inputs. Consider: Add fuzz testing for edge cases:
Performance Considerations ⚡Excellent:
Note: Test Coverage Assessment 🧪Excellent coverage in
Missing tests (nice-to-have):
Breaking Changes Analysis 📋Changes: Migration path: Well-documented in CHANGELOG and README with clear examples. Code Quality 💎Excellent:
Final RecommendationApprove ✅ This PR represents high-quality work that significantly improves the codebase. All suggestions above are minor improvements for clarity and consistency. Action Items (All Optional):
Key Wins:
Reviewed by: Claude Code 🤖 Generated with Claude Code |
Documentation Improvements: - Enhanced putRange documentation with rationale for exact size matching (prevents silent data loss from truncation or padding) - Added detailed ScalaDoc for parameters and return values Code Clarity: - Clarified bounds checking logic in putRow and putCol - Changed from computing lastIndex to explicit maxAllowed calculation - Added inline comments explaining what's being validated Error Message Consistency: - Standardized error format across putRow and putCol - Both now mention value count, starting position, and Excel limit - Format: "Cannot write X values starting at Y (Excel limit: Z)" Testing: - Updated test assertions to match new error message format - All 267 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
PR Review: Code Organization and Bounds-Checked Range CombinatorsThank you for this well-structured refactoring PR! Overall, this is excellent work that significantly improves code maintainability and adds critical safety features. The changes align well with the project's philosophy of purity, totality, and safety. ✅ Strengths1. Excellent Code OrganizationThe file splits are well-motivated and follow single-responsibility principle:
2. Critical Safety ImprovementsAdding bounds checking to range combinators addresses a real safety gap:
3. Appropriate Breaking ChangesThe return type changes (
4. Strong Test Coverage
🔍 Observations & Minor Issues1. Potential Edge Case in
|
Addresses 3 high-priority issues identified in PR #4 code review: **Issue #1: O(n²) Performance** - Changed style indexOf from O(n²) to O(1) using Maps - Pre-build fontMap, fillMap, borderMap for constant-time lookups - Styles.scala:185-193 **Issue #2: Non-Deterministic allFills Ordering** - Fixed .distinct using Set with non-deterministic iteration - Replaced with manual LinkedHashSet-style tracking - Preserves first-occurrence order for stable diffs - Styles.scala:166-179 **Issue #3: XXE Security Vulnerability** - Added XXE protection to XML parser - Configured SAXParserFactory to reject DOCTYPE declarations - Disabled external entity references - XlsxReader.scala:205-232 **Tests Added** (+20 tests, 645 total passing): - StylePerformanceSpec: 2 tests (1000+ styles, sub-quadratic scaling) - DeterminismSpec: 4 tests (fill ordering, XML stability) - SecuritySpec: 3 tests (XXE rejection, legitimate file parsing) All existing tests pass. Code formatted with Scalafmt 3.10.1. Fixes #1 (performance), #2 (determinism), #3 (security) from PR #4 review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Marked Issues #1, #2, #3 as completed with implementation details. Updated Definition of Done with all completed items. Status changed to Partially Complete (3 critical fixes done, 4 medium-priority deferred). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed 3 critical issues from code reviews while maintaining surgical
modification compatibility and byte-identical preservation.
## Issues Fixed
**Issue 1: remove require() - Violates Purity Charter**
- Location: VmlDrawing.scala:94-95
- Removed require() calls that throw exceptions
- ARef already enforces Excel limits at type level
**Issue 2: Unauthored Comments Get Wrong Author** (Codex P1)
- Location: XlsxWriter.scala:208-211
- Reserve authorId=0 for empty string ("") when unauthored comments exist
- Prevents unauthored comments from showing first real author's name
**Issue 3: Author Prepending Asymmetry**
- Location: XlsxReader.scala:296-310 (new stripAuthorPrefix function)
- Reader now strips "Author:\n" prefix added by writer
- Conservative pattern matching: only strips exact XL-generated format
- Preserves real Excel files with different author text patterns
- Maintains round-trip law: read(write(wb)) == wb
## Testing
- ✅ All 114 ooxml tests passing
- ✅ Surgical demo: 79 comments preserved byte-for-byte
- ✅ Round-trip test added for authored comments
- ✅ No Excel corruption
## Files Modified
- xl-ooxml/src/com/tjclp/xl/ooxml/VmlDrawing.scala (-4 LOC)
- xl-ooxml/src/com/tjclp/xl/ooxml/XlsxReader.scala (+40 LOC)
- xl-ooxml/src/com/tjclp/xl/ooxml/XlsxWriter.scala (+6 LOC)
- xl-ooxml/test/src/com/tjclp/xl/ooxml/CommentsSpec.scala (+41 LOC)
- xl-ooxml/test/src/com/tjclp/xl/ooxml/XlsxWriterRealWorldSpec.scala (+19 LOC)
## Outstanding (deferred to follow-up)
- Surgical comment regeneration (Issue #3 from Codex) - complex, needs relationship merging
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…preservation Addresses additional HIGH PRIORITY feedback: Issue #1: fromColumnNames() Bypasses Validation - Changed return type to XLResult[TableSpec] - Delegates to create() for validation - Added unsafeFromColumnNames() for tests Issue #2: Primary Constructor (Mitigated) - Kept case class for equals/copy/pattern matching - fromColumnNames now validates - Documentation warns against direct usage Issue #3: UID Preservation Tests - Added round-trip test for all UIDs - Fixed parsing: elem.attributes.asAttrMap.get("xr:uid") All 851 tests passing (56 table tests total). 🎉 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Implements 4 high-impact rendering features for closer Excel parity: 1. Merged Cells (#1) - Both renderers - HTML: colspan/rowspan attributes on anchor cells - SVG: single rect spanning merged region, text centered - Interior cells correctly skipped in both formats 2. Text Wrapping (#2) - SVG only - Word-boundary text wrapping via wrapText style property - Multi-line rendering using tspan elements with y positioning - Line height calculation (1.4x font size) - Vertical alignment respected for wrapped text blocks 3. Indentation (#3) - Both renderers - HTML: padding-left CSS (~21px per indent level) - SVG: x-position offset with alignment awareness - Center alignment shifts slightly, right alignment ignores indent 4. SVG Vertical Alignment (#4) - Top/Middle/Bottom alignment via textYPosition helper - Baseline offset calculation (80% of font size) - Default: Bottom (Excel's default behavior) Test coverage: 35 new tests (20 SVG + 15 HTML) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR improves code organization by splitting monolithic files into focused modules and adds robust bounds checking to range combinators.
Code Organization
xl-macros Module
macros.scala(298 lines monolithic file)CellRangeLiterals.scala- cell/range/fx compile-time literalsBatchPutMacro.scala- batch put macro for elegant syntaxFormattedLiterals.scala- money/percent/date/accounting literalsxl-core Module
sheet.scala(reduced from 450 → 328 lines)SheetProperties.scala(18 lines) - ColumnProperties, RowPropertiesSheetExtensions.scala(217 lines) - All Sheet extension methodsworkbook.scala(129 lines) - Workbook and WorkbookMetadata (previous commit)xl-core/style Package (Previous Commit)
style.scala(440 lines) into 10 focused files:units.scala,color.scala,numfmt.scala,font.scala,fill.scalaborder.scala,alignment.scala,cellstyle.scala,patch.scala,registry.scalaFunctional Improvements
Bounds Checking
Column.MaxIndex0(16383) andRow.MaxIndex0(1048575) constantsputRange- validates value count matches range sizeputRow- enforces Excel column limits (A-XFD)putCol- enforces Excel row limits (1-1,048,576)Error Handling
XLError.ValueCountMismatchfor clearer error messagesXLResult[Sheet]instead ofSheetBreaking Changes
XLResult[Sheet]:Sheet.putRange(range, values): XLResult[Sheet]Sheet.putRow(row, startCol, values): XLResult[Sheet]Sheet.putCol(col, startRow, values): XLResult[Sheet]Callers must handle the result explicitly with
.getOrElse(), pattern matching, or for-comprehensions.Testing
RangeCombinatorsSpec.scalawith 4 new tests for error handlingImpact
File Size Reductions:
Benefits:
🤖 Generated with Claude Code