Skip to content

Conversation

@ppodgorskicrido
Copy link

This is:

- [x] a bugfix
- [ ] a new feature
- [ ] refactoring
- [ ] additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change
  • Documentation is updated as necessary

Why this change is needed?

#2884

*/
public static function applyCharsLimit(string $text): string
{
if (strlen($text) > self::CELL_CHARS_LIMIT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the limit characters, or is it bytes? strlen() measures the string in bytes; the PhpSpreadsheet StringHelper::countCharacters() method returns the character length

public static function applyCharsLimit(string $text): string
{
if (strlen($text) > self::CELL_CHARS_LIMIT) {
$text = substr($text, 0, self::CELL_CHARS_LIMIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise with substr() which bases its count on bytes; StringHelper::substring() works with characters

$objWriter->writeAttribute('xml:space', 'preserve');
}
$objWriter->writeRawData($textToWrite);
$objWriter->writeRawData(StringHelper::applyCharsLimit($textToWrite));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be applied on write? Or when the cell value is set? e.g. perhaps through the value binder

/**
* Maximum limit of chars in single cell in excel.
*/
private const CELL_CHARS_LIMIT = 32767;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason why you've made this constant private rather than public?

@oleibman
Copy link
Collaborator

Aside from Mark's comments, there is a bigger issue here - your test case doesn't actually exercise the code in question. That is because the string is already truncated when the value is set in the cell - the code that does this is in method checkString in Cell/DataType, which is called by method setValueExplicit in Cell/Cell, which is called by both DefaultValueBinder and StringValueBinder. Are you using a custom binder? We would need code that shows how you manged to create this spreadsheet avoiding that check in the first place to figure out the proper solution.

@ppodgorskicrido
Copy link
Author

I'm using pure library (v 1.23.0) without any additions. This project is based on Symfony with Doctrine ORM. Code is simple:

<?php
$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
/* Add columns names on top of excel sheet */
$rowArray = [
    // array of strings as columns headers
];
$sheet->fromArray($rowArray);

//then assing values using:
$sheet->setCellValueByColumnAndRow(1, $row, $entity->getDescription()); // function returns string|null

$writer = new XlsxWriter($spreadsheet); // alias of ...\Writer\Xlsx
try {
    $writer->save($this->targetDirectory . '/' . $exportFilePath);
} catch (Exception $e) {
    $this->logger->error('Cannot save excel to path: ' . $exportFilePath);
    $this->logger->error('Error message: ' . $e);
}
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);

@oleibman
Copy link
Collaborator

oleibman commented Jul 1, 2022

I am not able to duplicate your result:

require __DIR__ . '/PhpSpreadsheet' . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
$text = str_repeat('Ω', 35000);
$sheet->setCellValueByColumnAndRow(1, 1, $text);
$sheet->setCellValueByColumnAndRow(1, 2, '=LEN(A1)');
echo 'Length of data in cell A1 is ',
    $sheet->getCell('A2')->getCalculatedValue(),
    "\n";
$writer = new Xlsx($spreadsheet);
$filename = 'issue.2884b.xlsx';
$writer->save($filename);
echo "saved $filename\n";

This should be approximately equivalent to your code. Note that I am using a string composed entirely of multiple byte UTF8 characters, just to make sure this isn't a bytes-vs-characters issue. When I run it, I see what I expected, which is not what you are seeing:

Length of data in cell A1 is 32767
saved issue.2884b.xlsx

And the output file opens with no problem. Can you try running this to see your results.

@oleibman
Copy link
Collaborator

oleibman commented Jul 1, 2022

I believe I may have found a way to duplicate your problem. I will keep you posted.

@oleibman
Copy link
Collaborator

oleibman commented Jul 1, 2022

I hate to say it, but this looks like an Excel bug to me. I have read in the file where you originally reported a problem, and confirmed using mb_strlen that the length of the data in J2 is 32,767 characters. It is truncated, with the last characters being "a fryzury to dzieło W". Now, when I open a new spreadsheet and paste that text into a cell, and use LEN to see the length of that cell, it reports 32,751! I think this difference might be because there are 4 ampersand characters in the text, which have been translated from & to &amp;, resulting in an extra 4*4=16 characters. I don't understand why this should be so, but it is consistent with the facts. Then, maybe there is some merit in your original suggestion that &quot; is pushing it over the limit. However, if it is, that's an Excel bug. And, in any case, there's no sensible action we can take. If we arbitrarily truncate at 32,767 after htmlspecialchars, we may wind up cutting off part of &amp; or &quot;, winding up with our xml corrupted in a different way.
BTW, I tried adding ENT_NOQUOTES to the htmlspecialchars call in XMLWriter. This did, in fact, leave the quotes unchanged,but it was not sufficient to allow opening the resulting file without error.

@oleibman
Copy link
Collaborator

oleibman commented Jul 2, 2022

I think I can confirm a bug in Excel. Run the following program:

$longstring1 = str_repeat('a', 32767);
$longstring2 = str_repeat('b', 32767);
echo mb_strlen($longstring1, 'UTF-8'), "\n";
echo mb_strlen($longstring2, 'UTF-8'), "\n";
$filename = 'issue.2884.toolong1d.csv';
$stream = fopen($filename, 'wb');
fputs($stream, "$longstring1\n$longstring2\n");
fclose($stream);
echo "write complete to $filename\n";

You can easily confirm that, as expected, this file shows two 32,767-character lines when viewed in a text editor. Not so when you open it in Excel - cell A1 contains the first 32,759 a's, and cell A2 contains the last 7. One appears to have been lost altogether. Likewise, cell A3 contains the first 32,759 b's, A4 contains the last 7, and one is missing. LibreOffice (which does not have a 32K restriction) shows the data as 32,767 characters in each of A1 and A2. Note that this demo does not depend on PhpSpreadsheet, uses only ASCII characters, and does not use XML replacement for the ampersands.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Jul 2, 2022
This is the result of an investigation into issue PHPOffice#2884 (see also PR PHPOffice#2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time.

Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet.

However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting.

I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.
oleibman added a commit that referenced this pull request Jul 4, 2022
* Keep Calculated String Results Below 32K

This is the result of an investigation into issue #2884 (see also PR #2913). It is, unfortunately, not a fix for the original problem; see the discussion in that PR for why I don't think there is a practical fix for that specific problem at this time.

Excel limits strings to 32,767 characters. We already truncate strings to that length when added to the spreadsheet. However, we have been able to exceed that length as a result of the concatenation operator (Excel truncates); as a result of the CONCATENATE or TEXTJOIN functions (Excel returns #CALC!); or as a result of the REPLACE, REPT, SUBSTITUTE functions (Excel returns #VALUE!). This PR changes PhpSpreadsheet to return the same value as Excel in these cases. Note that Excel2003 truncates in all those cases; I don't think there is a way to differentiate that behavior in PhpSpreadsheet.

However, LibreOffice and Gnumeric do not have that limit; if they have a limit at all, it is much higher. It would be fairly easy to use existing settings to differentiate between Excel and LibreOffice/Gnumeric in this respect. I have not done so in this PR because I am not sure how useful that is, and I can easily see it leading to problems (read in a LibreOffice spreadsheet with a 33K cell and then output to an Excel spreadsheet). Perhaps it should be handled with an additional opt-in setting.

I changed the maximum size from a literal to a constant in the one place where it was already being enforced (Cell/DataType). I am not sure that is the best place for it to be defined; I am open to suggestions.

* Implement Some Suggestions

... from @MarkBaker.
@oleibman
Copy link
Collaborator

oleibman commented Sep 2, 2022

Closing this PR. The problem appears to be with Excel, not with PhpSpreadsheet, and the code in this PR won't fix it. I will leave the original issue open so that the problem remains documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants