zlatanovic-nebojsa

Fix BIFF8 DIMENSIONS record to use 0-based column indices

Problem

After migrating from PHPExcel to PhpSpreadsheet, our API consumers reported that their Excel parsers were not reading all data from generated XLS files. The files opened correctly in modern Excel, but older Excel parsers (used by legacy systems) were missing columns.

Investigation

We initially attempted a different DIMENSIONS fix, but consumers continued to report issues. To ensure 100% compatibility with legacy parsers, I analyzed old PHPExcel Excel5 output at the byte level and discovered that PhpSpreadsheet's DIMENSIONS record uses 1-based column indices instead of 0-based as required by the BIFF8 specification.

This subtle difference causes older parsers to misread the worksheet dimensions, resulting in missing data.

Root Cause

The XLS writer incorrectly uses 1-based column indices in the BIFF8 DIMENSIONS record, violating the Microsoft Excel Binary File Format specification, which requires 0-based indices.

Example: For columns A-G (7 columns):

  • Current (incorrect): colMic=1, colMac=8
  • Correct (per BIFF8 spec): colMic=0, colMac=7
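
To make the A-G example concrete, here is a minimal sketch of the arithmetic. `columnNumberFromLetters()` is a hypothetical stand-in for a 1-based column lookup (it is not the PhpSpreadsheet API), and `dimensionColumns()` is likewise an illustrative name.

```php
<?php
/**
 * Hypothetical stand-in for a 1-based column lookup:
 * 'A' => 1, 'Z' => 26, 'AA' => 27.
 */
function columnNumberFromLetters(string $letters): int
{
    $number = 0;
    foreach (str_split(strtoupper($letters)) as $char) {
        $number = $number * 26 + (ord($char) - ord('A') + 1);
    }

    return $number;
}

/**
 * Returns [colMic, colMac] for an inclusive column range.
 * colMic is the 0-based first column; colMac is the 0-based last column plus one.
 */
function dimensionColumns(string $firstColumn, string $lastColumn): array
{
    $colMic = columnNumberFromLetters($firstColumn) - 1; // 1-based -> 0-based
    $colMac = (columnNumberFromLetters($lastColumn) - 1) + 1; // 0-based last, then +1

    return [$colMic, $colMac];
}

[$colMic, $colMac] = dimensionColumns('A', 'G');
echo "colMic=$colMic, colMac=$colMac\n"; // colMic=0, colMac=7
```

Skipping the `- 1` on both bounds yields the 1 and 8 shown as the incorrect values.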

Fix

Modified the column index initialization to subtract 1 from the result of Coordinate::columnIndexFromString(), converting from 1-based to 0-based indexing.

Changes:

// Before
$this->firstColumnIndex = Coordinate::columnIndexFromString($minC);
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC);

// After (BIFF8 compliant)
$this->firstColumnIndex = Coordinate::columnIndexFromString($minC) - 1;
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC) - 1;

Also updated the COLINFO loop to use the corrected 0-based lastColumnIndex (removed the -1 adjustment that was compensating for the incorrect 1-based value).

Testing

Custom test confirms DIMENSIONS record now correctly uses 0-based indices:

✓ colMic = 0 (CORRECT - 0-based)
✓ colMac = 7 (CORRECT - 0-based, 7 columns A-G)

Impact

  • Fixes: Legacy Excel parser compatibility
  • Side effect fix: Extra empty column no longer appears when converting XLS to CSV
  • BIFF8 compliant: Matches Microsoft specification
  • Backwards compatible: Aligns with old PHPExcel Excel5 writer behavior

References

  • Microsoft Excel Binary File Format Specification (BIFF8)
  • DIMENSIONS record: colMic (first column) and colMac (column after last) must be 0-based
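
For readers unfamiliar with the record layout, the following sketch packs a DIMENSIONS record by hand. It assumes the BIFF8 layout of a 4-byte header (record id 0x0200, payload length 14) followed by rwMic and rwMac as 32-bit values and colMic, colMac and a reserved field as 16-bit values, all little-endian. `buildDimensionsRecord()` is a hypothetical helper, not the writer's actual method.

```php
<?php
/**
 * Builds a DIMENSIONS record from 0-based, inclusive first/last indices.
 * The "Mac" fields store the last index plus one.
 */
function buildDimensionsRecord(int $firstRow, int $lastRow, int $firstCol, int $lastCol): string
{
    // Record header: id 0x0200, payload length 0x000E (14 bytes).
    $header = pack('vv', 0x0200, 0x000E);

    // Payload: rwMic, rwMac (uint32), colMic, colMac, reserved (uint16).
    $payload = pack('VVvvv', $firstRow, $lastRow + 1, $firstCol, $lastCol + 1, 0x0000);

    return $header . $payload;
}

// Four rows (0-3) and seven columns A-G (0-6).
$record = buildDimensionsRecord(0, 3, 0, 6);
echo bin2hex($record), "\n";
```

Passing 1-based indices to the same function shifts all four fields up by one, which is the kind of off-by-one this PR addresses.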

Fixes #4682

The XLS writer incorrectly used 1-based column indices in the BIFF8
DIMENSIONS record, violating the Microsoft Excel Binary File Format
specification which requires 0-based indices.

This bug caused an extra empty column to appear when converting XLS
files to other formats (e.g., CSV).

Changes:
- Modified column index initialization to subtract 1 from the result
  of Coordinate::columnIndexFromString() to convert from 1-based to
  0-based indexing
- Updated COLINFO loop to use the corrected 0-based lastColumnIndex

Per BIFF8 specification:
- colMic (first column) must be 0-based
- colMac (column after last column) must be 0-based

Example: For columns A-G (7 columns):
- Before: colMic=1, colMac=8 (incorrect)
- After:  colMic=0, colMac=7 (correct)

Fixes PHPOffice#4682
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC);
// BIFF8 requires 0-based column indices, but columnIndexFromString() returns 1-based
$this->firstColumnIndex = Coordinate::columnIndexFromString($minC) - 1;
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC) - 1;
Collaborator

Your change has the harmless but unintended consequence of leaving statement 222 uncovered by our unit tests. While this is not necessarily a show-stopper, it is also easily avoided. Please change:

        if ($this->lastColumnIndex > 255) {
            $this->lastColumnIndex = 255;
        }

to

        $this->lastColumnIndex = min(255, $this->lastColumnIndex);
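
As a small illustration of why the two forms are interchangeable, the sketch below compares them over a few values. Both function names are hypothetical. The `if` form has a body that only runs for indices above 255, so it can be left unexecuted by a test suite, whereas the `min()` form is a single statement that always runs.

```php
<?php
// Branching form: the assignment only executes when the index exceeds 255.
function capWithIf(int $index): int
{
    if ($index > 255) {
        $index = 255;
    }

    return $index;
}

// Branch-free form suggested in the review.
function capWithMin(int $index): int
{
    return min(255, $index);
}

foreach ([0, 6, 255, 256, 1000] as $index) {
    printf("%4d -> %3d / %3d\n", $index, capWithIf($index), capWithMin($index));
}
```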

@oleibman
Collaborator

You will need to add a formal unit test which would fail prior to your change but which will succeed with it. Let me know if you need help with this.

This commit refines the BIFF8 DIMENSIONS record fix by:

1. **Optimized column index capping**: Replaced the if-statement with min()
   function to ensure lastColumnIndex never exceeds 255, improving code
   coverage and eliminating unreachable branches in unit tests.

2. **Added comprehensive unit tests**: Created DimensionsRecordTest.php which
   directly parses the binary DIMENSIONS record (0x0200) from XLS files to
   verify correct 0-based column indices.

The tests validate:
- colMic (first column) = 0 for column A (was incorrectly 1 before fix)
- colMac (last column + 1) uses proper 0-based indexing
- Column indices are correctly capped at 255 (BIFF8 limit)

These tests fail without the fix and pass with it, ensuring the DIMENSIONS
record is correctly written for compatibility with legacy XLS parsers that
expect 0-based column indices per the BIFF8 specification.
Add proper type assertions to handle potential false returns:
- Assert file_get_contents() returns string, not false
- Assert strpos() returns int, not false
- Assert unpack() returns array, not false

This resolves all 7 PHPStan errors reported in CI while maintaining
test functionality (12 assertions, all passing).
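
The commits above describe locating the DIMENSIONS record in the raw bytes and guarding each call that can return false. A minimal sketch of that approach might look like the following. `parseDimensionsRecord()` is a hypothetical helper rather than the code in DimensionsRecordTest.php, and searching for the 4-byte header is a simplification: the same byte sequence could in principle occur elsewhere in a file.

```php
<?php
/**
 * Finds the first DIMENSIONS header (id 0x0200, length 0x000E) in $bytes
 * and decodes the 14-byte payload. Returns null if nothing usable is found.
 */
function parseDimensionsRecord(string $bytes): ?array
{
    $signature = pack('vv', 0x0200, 0x000E);

    $offset = strpos($bytes, $signature);
    if ($offset === false) {
        return null; // no record header present
    }

    $payload = substr($bytes, $offset + 4, 14);
    if (strlen($payload) < 14) {
        return null; // header found but payload truncated
    }

    $fields = unpack('VrwMic/VrwMac/vcolMic/vcolMac/vreserved', $payload);

    return $fields === false ? null : $fields;
}

// Synthetic input: two junk bytes, one record for rows 1-5 and columns A-D, one trailing byte.
$bytes = "\xAA\xBB" . pack('vvVVvvv', 0x0200, 0x000E, 0, 5, 0, 4, 0) . "\xCC";
var_dump(parseDimensionsRecord($bytes));
```

For a real file the input would come from `file_get_contents()`, which also needs a check for a false return before it is passed in.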
@zlatanovic-nebojsa
Author

@oleibman I've added the fix and a test. I hope that's it! Have a nice day!

@oleibman
Collaborator

Thank you for the changes. I apologize for not checking this out earlier, but now I have to ask. In your original issue, you state "When converting XLS files to CSV or text format, an extra empty column appears with a trailing comma/delimiter on each row." I was going to suggest a test that confirms no trailing comma, BUT, when I run a simple test without your change, I don't see a trailing comma. Further, when I open the XLS file in Excel and save it as Csv, I still don't see a trailing comma. What are you doing with the current PhpSpreadsheet code that causes the trailing delimiter to appear? Here is my code:

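        use PhpOffice\PhpSpreadsheet\Reader\Xls as XlsReader;
        use PhpOffice\PhpSpreadsheet\Spreadsheet;
        use PhpOffice\PhpSpreadsheet\Writer\Csv as CsvWriter;
        use PhpOffice\PhpSpreadsheet\Writer\Xls as XlsWriter;

        // Build a small sheet (2 rows, 4 columns) and save it as XLS.
        $spreadsheet = new Spreadsheet();
        $sheet = $spreadsheet->getActiveSheet();
        $sheet->fromArray([
            ['a1', 'b1', 'c1', 'd1'],
            ['a2', 'b2', 'c2', 'd2'],
        ]);
        $writer = new XlsWriter($spreadsheet);
        $outfile = 'pr.4687.xls';
        $writer->save($outfile);
        echo "saved $outfile\n";

        // Read the XLS back and dump the reloaded sheet (not the original one).
        $reader = new XlsReader();
        $spreadsheet2 = $reader->load($outfile);
        $sheet2 = $spreadsheet2->getActiveSheet();
        var_dump($sheet2->toArray());

        // Convert the reloaded workbook to CSV to check for a trailing delimiter.
        $writerCsv = new CsvWriter($spreadsheet2);
        $outfil2 = 'pr.4687.csv';
        $writerCsv->save($outfil2);
        echo "saved $outfil2\n";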

No empty cells in the var_dump. No trailing comma in the Csv. I admittedly get exactly the same result when testing with your code. But now I need a better understanding of why you think your code is needed.
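
A check for the trailing delimiter mentioned above could be sketched as follows. `rowsWithTrailingDelimiter()` is a hypothetical helper, and it is deliberately naive: it splits on line breaks, so it would misread quoted fields that contain embedded newlines.

```php
<?php
/**
 * Returns the 1-based numbers of the lines that end with the delimiter.
 * An empty result means no row has a trailing empty column.
 */
function rowsWithTrailingDelimiter(string $csv, string $delimiter = ','): array
{
    $lines = preg_split('/\r\n|\n|\r/', rtrim($csv, "\r\n"));
    if ($lines === false) {
        return [];
    }

    $hits = [];
    foreach ($lines as $i => $line) {
        if ($line !== '' && substr($line, -strlen($delimiter)) === $delimiter) {
            $hits[] = $i + 1;
        }
    }

    return $hits;
}

var_dump(rowsWithTrailingDelimiter("a1,b1,c1,d1\na2,b2,c2,d2\n")); // no rows flagged
var_dump(rowsWithTrailingDelimiter("a1,b1,\na2,b2\n"));            // row 1 flagged
```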

@zlatanovic-nebojsa
Author

Thank you for reviewing! Let me clarify the issue with a concrete example.

The Problem:
When generating XLS files with 7 columns (A-G), the current code writes incorrect DIMENSIONS record indices, causing an extra empty column H to appear.

Root Cause:
The DIMENSIONS record uses 1-based indices (colMic=1, colMac=8) instead of the BIFF8-required 0-based indices (colMic=0, colMac=7).

        use PhpOffice\PhpSpreadsheet\Spreadsheet;
        use PhpOffice\PhpSpreadsheet\Writer\Xls;

        $spreadsheet = new Spreadsheet();
        $sheet = $spreadsheet->getActiveSheet();

        // Define 7 columns for a typical product export
        $columns = [
            'A' => ['title' => 'Part Number', 'width' => 15],
            'B' => ['title' => 'Description', 'width' => 30],
            'C' => ['title' => 'Manufacturer', 'width' => 20],
            'D' => ['title' => 'Model', 'width' => 20],
            'E' => ['title' => 'Price', 'width' => 15],
            'F' => ['title' => 'UOM', 'width' => 10],
            'G' => ['title' => 'Stock', 'width' => 10]
        ];

        // Set headers and column widths
        foreach ($columns as $letter => $column) {
            $sheet->setCellValue($letter . '1', $column['title']);
            $sheet->getColumnDimension($letter)->setWidth($column['width']);
        }

        // Add sample data (3 rows)
        $sampleData = [
            ['PART-001', 'Black Ink Cartridge', 'Generic Brand', 'Model X', '$29.99', 'EA', '100'],
            ['PART-002', 'Cyan Ink Cartridge', 'Generic Brand', 'Model X', '$32.99', 'EA', '85'],
            ['PART-003', 'Yellow Ink Cartridge', 'Generic Brand', 'Model X', '$31.99', 'EA', '92']
        ];

        $row = 2;
        foreach ($sampleData as $data) {
            $col = 0;
            foreach ($data as $value) {
                $sheet->setCellValue(chr(65 + $col) . $row, $value);
                $col++;
            }
            $row++;
        }

        // Save as XLS
        $writer = new Xls($spreadsheet);
        $writer->save('test_product_export.xls');

        // Clean up
        $spreadsheet->disconnectWorksheets();

Visual Evidence:
The generated file shows an extra empty column H (see screenshots below):
[two screenshots attached]

File Comparison:
fixed.xls
with comma.xls

This affects our production system where business partners use legacy XLS parsers that strictly follow BIFF8 specifications.

@zlatanovic-nebojsa
Author

As a side note, I'm not yet sure whether the DIMENSIONS fix alone will be enough for our partners. That's why I also wrote a complete wrapper that matches the XLS output 100% to that of the old PHPExcel Excel5 writer.

@oleibman
Collaborator

oleibman commented Oct 18, 2025

Your code suggests that, although colMic and colMac are zero-based, rwMic and rwMac are one-based. I do not believe this is true. Take a look at the attached xls file, with 3 rows and 5 columns, created entirely with Excel (no PhpSpreadsheet, no PhpExcel). I believe that the dimensions record indicates:

array(5) {
  'rwMic' =>
  int(0)
  'rwMac' =>
  int(3)
  'colMic' =>
  int(0)
  'colMac' =>
  int(5)
  'reserved' =>
  int(0)
}

According to your test, you expect rwMic to be 1, not 0.
pr.4687.excel.xls

The initial fix only converted column indices to 0-based, but overlooked
that row indices also need the same treatment per BIFF8 specification.

Changes:
- Convert firstRowIndex from 1-based to 0-based (subtract 1)
- Convert lastRowIndex from 1-based to 0-based (subtract 1)
- Update row capping logic for 65536 limit
- Fix test assertions to expect rwMic=0 and rwMac=5 (not 1 and 6)
- Enhanced documentation to clarify all DIMENSIONS indices are 0-based

This now matches the behavior observed in Excel-generated XLS files,
where a file with 3 rows × 5 columns shows: rwMic=0, rwMac=3, colMic=0, colMac=5

All tests pass (53 tests, 244 assertions).
@zlatanovic-nebojsa
Author

Thank you for the detailed review and for catching this oversight! You're absolutely right.

After analyzing the Excel-generated file you examined (3 rows × 5 columns showing rwMic=0, rwMac=3, colMic=0, colMac=5), I now understand that both row AND column indices must be 0-based in the BIFF8 DIMENSIONS record, not just columns.

What I Missed

In my initial fix, I correctly converted column indices to 0-based but completely overlooked that row indices also needed the same treatment. I apologize for this incomplete fix.

What's Been Corrected

I've now updated both the implementation and tests:

Worksheet.php:

  • firstRowIndex now converts from 1-based to 0-based: $minR - 1
  • lastRowIndex now converts from 1-based to 0-based: $maxR - 1
  • Updated capping logic for the 65536 row limit
  • Added clear comments explaining both rows and columns are 0-based per BIFF8

DimensionsRecordTest.php:

  • Updated test assertions to expect rwMic=0 (not 1)
  • Updated test assertions to expect rwMac=5 (not 6)
  • Enhanced documentation to clarify that all DIMENSIONS record indices are 0-based

Verification

The test now correctly expects:

  • For rows 1-5 (Excel UI): rwMic=0, rwMac=5 (0-based: rows 0-4, +1 = 5)
  • For columns A-D (Excel UI): colMic=0, colMac=4 (0-based: cols 0-3, +1 = 4)

This matches the behavior you observed in the actual Excel-generated file.
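
The row and column conversion can be summarised in one small function. This is a sketch with a hypothetical name (`dimensionsFromUiRange()`), not the code in Worksheet.php. The caps assume the BIFF8 sheet limits of 65536 rows and 256 columns, so the highest 0-based indices are 65535 and 255.

```php
<?php
/**
 * Converts 1-based, inclusive UI bounds (row 1, column A = 1) into the
 * four DIMENSIONS fields. "Mic" is the first 0-based index; "Mac" is the
 * last 0-based index plus one.
 */
function dimensionsFromUiRange(int $firstRow, int $lastRow, int $firstCol, int $lastCol): array
{
    $firstRowIndex = $firstRow - 1;
    $lastRowIndex = min(65535, $lastRow - 1); // assumed BIFF8 row limit
    $firstColIndex = $firstCol - 1;
    $lastColIndex = min(255, $lastCol - 1);   // assumed BIFF8 column limit

    return [
        'rwMic' => $firstRowIndex,
        'rwMac' => $lastRowIndex + 1,
        'colMic' => $firstColIndex,
        'colMac' => $lastColIndex + 1,
    ];
}

// Rows 1-5, columns A-D, as in the test expectations.
print_r(dimensionsFromUiRange(1, 5, 1, 4));

// 3 rows by 5 columns, as in the Excel-generated reference file.
print_r(dimensionsFromUiRange(1, 3, 1, 5));
```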

All tests pass: ✅

OK (2 tests, 12 assertions)

Thank you again for the thorough review and for helping ensure this fix properly adheres to the BIFF8 specification!


Development

Successfully merging this pull request may close these issues.

Xls Writer uses 1-based column indices in BIFF DIMENSIONS record instead of required 0-based indices
