-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Apply chars limit in single cell in xlsx writer #2913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| */ | ||
| public static function applyCharsLimit(string $text): string | ||
| { | ||
| if (strlen($text) > self::CELL_CHARS_LIMIT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the limit characters, or is it bytes? strlen() measures the string in bytes; the PhpSpreadsheet StringHelper::countCharacters() method returns the character length
| public static function applyCharsLimit(string $text): string | ||
| { | ||
| if (strlen($text) > self::CELL_CHARS_LIMIT) { | ||
| $text = substr($text, 0, self::CELL_CHARS_LIMIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise with substr() which bases its count on bytes; StringHelper::substring() works with characters
| $objWriter->writeAttribute('xml:space', 'preserve'); | ||
| } | ||
| $objWriter->writeRawData($textToWrite); | ||
| $objWriter->writeRawData(StringHelper::applyCharsLimit($textToWrite)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be applied on write? Or when the cell value is set? e.g. perhaps through the value binder
| /** | ||
| * Maximum limit of chars in single cell in excel. | ||
| */ | ||
| private const CELL_CHARS_LIMIT = 32767; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason why you've made this constant private rather than public?
|
Aside from Mark's comments, there is a bigger issue here - your test case doesn't actually exercise the code in question. That is because the string is already truncated when the value is set in the cell - the code that does this is in method checkString in Cell/DataType, which is called by method setValueExplicit in Cell/Cell, which is called by both DefaultValueBinder and StringValueBinder. Are you using a custom binder? We would need code that shows how you manged to create this spreadsheet avoiding that check in the first place to figure out the proper solution. |
|
I'm using pure library (v 1.23.0) without any additions. This project is based on Symfony with Doctrine ORM. Code is simple: <?php
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
/* Add columns names on top of excel sheet */
$rowArray = [
// array of strings as columns headers
];
$sheet->fromArray($rowArray);
//then assing values using:
$sheet->setCellValueByColumnAndRow(1, $row, $entity->getDescription()); // function returns string|null
$writer = new XlsxWriter($spreadsheet); // alias of ...\Writer\Xlsx
try {
$writer->save($this->targetDirectory . '/' . $exportFilePath);
} catch (Exception $e) {
$this->logger->error('Cannot save excel to path: ' . $exportFilePath);
$this->logger->error('Error message: ' . $e);
}
$spreadsheet->disconnectWorksheets();
unset($spreadsheet); |
|
I am not able to duplicate your result: require __DIR__ . '/PhpSpreadsheet' . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$text = str_repeat('Ω', 35000);
$sheet->setCellValueByColumnAndRow(1, 1, $text);
$sheet->setCellValueByColumnAndRow(1, 2, '=LEN(A1)');
echo 'Length of data in cell A1 is ',
$sheet->getCell('A2')->getCalculatedValue(),
"\n";
$writer = new Xlsx($spreadsheet);
$filename = 'issue.2884b.xlsx';
$writer->save($filename);
echo "saved $filename\n";This should be approximately equivalent to your code. Note that I am using a string composed entirely of multiple byte UTF8 characters, just to make sure this isn't a bytes-vs-characters issue. When I run it, I see what I expected, which is not what you are seeing: And the output file opens with no problem. Can you try running this to see your results. |
|
I believe I may have found a way to duplicate your problem. I will keep you posted. |
|
I hate to say it, but this looks like an Excel bug to me. I have read in the file where you originally reported a problem, and confirmed using mb_strlen that the length of the data in J2 is 32,767 characters. It is truncated, with the last characters being "a fryzury to dzieło W". Now, when I open a new spreadsheet and paste that text into a cell, and use LEN to see the length of that cell, it reports 32,751! I think this difference might be because there are 4 ampersand characters in the text, which have been translated from |
|
I think I can confirm a bug in Excel. Run the following program: $longstring1 = str_repeat('a', 32767);
$longstring2 = str_repeat('b', 32767);
echo mb_strlen($longstring1, 'UTF-8'), "\n";
echo mb_strlen($longstring2, 'UTF-8'), "\n";
$filename = 'issue.2884.toolong1d.csv';
$stream = fopen($filename, 'wb');
fputs($stream, "$longstring1\n$longstring2\n");
fclose($stream);
echo "write complete to $filename\n";You can easily confirm that, as expected, this file shows two 32,767-character lines when viewed in a text editor. Not so when you open it in Excel - cell A1 contains the first 32,759 a's, and cell A2 contains the last 7. One appears to have been lost altogether. Likewise, cell A3 contains the first 32,759 b's, A4 contains the last 7, and one is missing. LibreOffice (which does not have a 32K restriction) shows the data as 32,767 characters in each of A1 and A2. Note that this demo does not depend on PhpSpreadsheet, uses only ASCII characters, and does not use XML replacement for the ampersands. |
This is the result of an investigation into issue PHPOffice#2884 (see also PR PHPOffice#2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time. Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet. However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting. I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.
* Keep Calculated String Results Below 32K This is the result of an investigation into issue #2884 (see also PR #2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time. Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet. However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting. I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions. * Implement Some Suggestions ... from @MarkBaker.
|
Closing this PR. The problem appears to be with Excel, not with PhpSpreadsheet, and the code in this PR won't fix it. I will leave the original issue open so that the problem remains documented. |
This is:
Checklist:
Why this change is needed?
#2884