-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Fix BIFF8 DIMENSIONS record to use 0-based column indices #4687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix BIFF8 DIMENSIONS record to use 0-based column indices #4687
Conversation
The XLS writer incorrectly used 1-based column indices in the BIFF8 DIMENSIONS record, violating the Microsoft Excel Binary File Format specification which requires 0-based indices. This bug caused an extra empty column to appear when converting XLS files to other formats (e.g., CSV). Changes: - Modified column index initialization to subtract 1 from the result of Coordinate::columnIndexFromString() to convert from 1-based to 0-based indexing - Updated COLINFO loop to use the corrected 0-based lastColumnIndex Per BIFF8 specification: - colMic (first column) must be 0-based - colMac (column after last column) must be 0-based Example: For columns A-G (7 columns): - Before: colMic=1, colMac=8 (incorrect) - After: colMic=0, colMac=7 (correct) Fixes PHPOffice#4682
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC); | ||
// BIFF8 requires 0-based column indices, but columnIndexFromString() returns 1-based | ||
$this->firstColumnIndex = Coordinate::columnIndexFromString($minC) - 1; | ||
$this->lastColumnIndex = Coordinate::columnIndexFromString($maxC) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your change has the harmless, but unintended, consequence, of making statement 222 uncovered in our unit tests. While this is not necessarily a show-stopper, it is also easily avoided. Please change:
if ($this->lastColumnIndex > 255) {
$this->lastColumnIndex = 255;
}
to
$this->lastColumnIndex = min(255, $this->lastColumnIndex);
You will need to add a formal unit test which would fail prior to your change but which will succeed with it. Let me know if you need help with this. |
This commit refines the BIFF8 DIMENSIONS record fix by: 1. **Optimized column index capping**: Replaced the if-statement with min() function to ensure lastColumnIndex never exceeds 255, improving code coverage and eliminating unreachable branches in unit tests. 2. **Added comprehensive unit tests**: Created DimensionsRecordTest.php which directly parses the binary DIMENSIONS record (0x0200) from XLS files to verify correct 0-based column indices. The tests validate: - colMic (first column) = 0 for column A (was incorrectly 1 before fix) - colMac (last column + 1) uses proper 0-based indexing - Column indices are correctly capped at 255 (BIFF8 limit) These tests fail without the fix and pass with it, ensuring the DIMENSIONS record is correctly written for compatibility with legacy XLS parsers that expect 0-based column indices per the BIFF8 specification.
Add proper type assertions to handle potential false returns: - Assert file_get_contents() returns string, not false - Assert strpos() returns int, not false - Assert unpack() returns array, not false This resolves all 7 PHPStan errors reported in CI while maintaining test functionality (12 assertions, all passing).
@oleibman added fix, and test I hope that's it! Have a nice day! |
Thank you for the changes. I apologize for not checking this out earlier, but now I have to ask. In your original issue, you state "When converting XLS files to CSV or text format, an extra empty column appears with a trailing comma/delimiter on each row." I was going to suggest a test that confirms no trailing comma, BUT, when I run a simple test without your change, I don't see a trailing comma. Further, when I open the XLS file in Excel and save it as Csv, I still don't see a trailing comma. What are you doing with the current PhpSpreadsheet code that causes the trailing delimiter to appear? Here is my code: $spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$sheet->fromArray([
['a1', 'b1', 'c1', 'd1'],
['a2', 'b2', 'c2', 'd2'],
]);
$writer = new XlsWriter($spreadsheet);
$outfile = 'pr.4687.xls';
$writer->save($outfile);
echo "saved $outfile\n";
$reader = new XlsReader();
$spreadsheet2 = $reader->load($outfile);
$sheet2 = $spreadsheet->getActiveSheet();
var_dump($sheet2->toArray());
$writerCsv = new CsvWriter($spreadsheet2);
$outfil2 = 'pr.4687.csv';
$writerCsv->save($outfil2);
echo "saved $outfil2\n"; No empty cells in the var_dump. No trailing comma in the Csv. I admittedly get exactly the same result when testing with your code. But now I need a better understanding of why you think your code is needed. |
Thank you for reviewing! Let me clarify the issue with a concrete example. The Problem: Root Cause: use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xls;
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
// Define 7 columns for a typical product export
$columns = [
'A' => ['title' => 'Part Number', 'width' => 15],
'B' => ['title' => 'Description', 'width' => 30],
'C' => ['title' => 'Manufacturer', 'width' => 20],
'D' => ['title' => 'Model', 'width' => 20],
'E' => ['title' => 'Price', 'width' => 15],
'F' => ['title' => 'UOM', 'width' => 10],
'G' => ['title' => 'Stock', 'width' => 10]
];
// Set headers and column widths
foreach ($columns as $letter => $column) {
$sheet->setCellValue($letter . '1', $column['title']);
$sheet->getColumnDimension($letter)->setWidth($column['width']);
}
// Add sample data (3 rows)
$sampleData = [
['PART-001', 'Black Ink Cartridge', 'Generic Brand', 'Model X', '$29.99', 'EA', '100'],
['PART-002', 'Cyan Ink Cartridge', 'Generic Brand', 'Model X', '$32.99', 'EA', '85'],
['PART-003', 'Yellow Ink Cartridge', 'Generic Brand', 'Model X', '$31.99', 'EA', '92']
];
$row = 2;
foreach ($sampleData as $data) {
$col = 0;
foreach ($data as $value) {
$sheet->setCellValue(chr(65 + $col) . $row, $value);
$col++;
}
$row++;
}
// Save as XLS
$writer = new Xls($spreadsheet);
$writer->save('test_product_export.xls');
// Clean up
$spreadsheet->disconnectWorksheets(); Visual Evidence: File Comparison: This affects our production system where business partners use legacy XLS parsers that strictly follow BIFF8 specifications. |
And side note I'm not sure yet if only DIMENSION fix will be enough for our partners, that's why I made a complete wrapper that will match xls output 100% to old phpexcel excel5 creation lib |
Your code suggests that, although colMic and colMac are zero-based, rwMic and rwMac are one-based. I do not believe this is true. Take a look at the attached xls file, with 3 rows and 5 columns, created entirely with Excel (no PhpSpreadsheet, no PhpExcel). I believe that the dimensions record indicates:
According to your test, you expect rwMic to be 1, not 0. |
The initial fix only converted column indices to 0-based, but overlooked that row indices also need the same treatment per BIFF8 specification. Changes: - Convert firstRowIndex from 1-based to 0-based (subtract 1) - Convert lastRowIndex from 1-based to 0-based (subtract 1) - Update row capping logic for 65536 limit - Fix test assertions to expect rwMic=0 and rwMac=5 (not 1 and 6) - Enhanced documentation to clarify all DIMENSIONS indices are 0-based This now matches the behavior observed in Excel-generated XLS files, where a file with 3 rows × 5 columns shows: rwMic=0, rwMac=3, colMic=0, colMac=5 All tests pass (53 tests, 244 assertions).
Thank you for the detailed review and for catching this oversight! You're absolutely right. After analyzing the Excel-generated file you examined (3 rows × 5 columns showing What I MissedIn my initial fix, I correctly converted column indices to 0-based but completely overlooked that row indices also needed the same treatment. I apologize for this incomplete fix. What's Been CorrectedI've now updated both the implementation and tests: Worksheet.php:
DimensionsRecordTest.php:
VerificationThe test now correctly expects:
This matches the behavior you observed in the actual Excel-generated file. All tests pass: ✅
Thank you again for the thorough review and for helping ensure this fix properly adheres to the BIFF8 specification! |
Fix BIFF8 DIMENSIONS record to use 0-based column indices
Problem
After migrating from PHPExcel to PhpSpreadsheet, our API consumers reported that their Excel parsers were not reading all data from generated XLS files. The files opened correctly in modern Excel, but older Excel parsers (used by legacy systems) were missing columns.
Investigation
We initially attempted a different DIMENSIONS fix, but consumers continued to report issues. To ensure 100% compatibility with legacy parsers, I analyzed old PHPExcel Excel5 output at the byte level and discovered that PhpSpreadsheet's DIMENSIONS record uses 1-based column indices instead of 0-based as required by the BIFF8 specification.
This subtle difference causes older parsers to misread the worksheet dimensions, resulting in missing data.
Root Cause
The XLS writer incorrectly uses 1-based column indices in the BIFF8 DIMENSIONS record, violating the Microsoft Excel Binary File Format specification which requires 0-based indices.
Example: For columns A-G (7 columns):
colMic=1, colMac=8
colMic=0, colMac=7
Fix
Modified column index initialization to subtract 1 from
Coordinate::columnIndexFromString()
to convert from 1-based to 0-based indexing.Changes:
Also updated the COLINFO loop to use the corrected 0-based
lastColumnIndex
(removed the-1
adjustment that was compensating for the incorrect 1-based value).Testing
Custom test confirms DIMENSIONS record now correctly uses 0-based indices:
Impact
References
colMic
(first column) andcolMac
(column after last) must be 0-basedFixes #4682