Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rounding problems #321

Closed
wants to merge 1 commit into from
Closed

Fix rounding problems #321

wants to merge 1 commit into from

Conversation

colinwjd
Copy link

@colinwjd colinwjd commented Apr 7, 2022

Use BigDecimal in some cases in DataFormatter.formatCellValue, to avoid rounding problems.

Use BigDecimal in some cases in DataFormatter.formatCellValue, to avoid rounding problems.
@pjfanning
Copy link
Contributor

@colinwjd could you provide test cases or examples of this issue in practice? Excel is a bit of a mine field when it comes to when to round and when not to round. We have no access to the Excel code or the mindsets of the Microsoft developers. We can only build code based on observation and interpretation of publicly available specs.

@@ -950,7 +950,7 @@ private String getFormattedNumberString(Cell cell, ConditionalFormattingEvaluato
if (numberFormat == null) {
return String.valueOf(d);
}
String formatted = numberFormat.format(d);
String formatted = numberFormat.format(new BigDecimal(String.valueOf(d)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting a double to a string and then to a bigdecimal and then to a formatted string - this seems dangerous to me - each conversion introducing more risk of rounding issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

image

As shown above, the value 2.05 is correctly rounded to 2.1 in excel, but in the parsed result the value is 2.0

The reason is that the Double type has a loss of precision, so we need to use the BigDecimal type.

We need to use the String type to initialize the BigDecimal object, otherwise there will also be precision problems.
It is described in the Java api documentation as follows

This is generally the preferred way to convert a float or double into a BigDecimal, as it doesn't suffer from the unpredictability of the BigDecimal(double) constructor.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you provide the xlsx file? the fact that you are using a formula evaluator suggests that the cells may not have numbers but have formulas instead - this could affect the result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

05e8a16 - this test case I just added seems to behave as expected without the change you are suggesting

POI is 20 year old code and if DataFormatter has a serious bug like the one you suggest it has then I'm surprised noone else has seen the problem (including me).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the code in getFormattedNumberString does not use BigDecimal at the moment, so your API note is off topic - prove there is an issue and we will fix it

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a terrible workaround and not a fix. 2.05f is less than 2.05, so 2.0 would be the correct result when converting to a BigDecimal. The formatter should be fixed instead. Just check in JShell:

axel@xiaolong ~ % jshell
|  Welcome to JShell -- Version 18
|  For an introduction type: /help intro

jshell> 2.05
$1 ==> 2.05

jshell> new BigDecimal(2.05)
$2 ==> 2.04999999999999982236431605997495353221893310546875

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xzel23 this is not an argument about 2.05f and how it rounds in Java - it is about having Apache POI match the behaviour of Microsoft Excel. I'm happy to have the solution changed as long the test cases still pass - including the new test case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can confirm that Excel displays the value as 2.1, this is obviously done in the Excel formatter as the value in the XML is stored as 2.0499999999999998:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{7487740C-7451-9943-A628-9BEEC0E38BD2}">
    <dimension ref="A1:A6" />
    <sheetViews>
        <sheetView tabSelected="1" workbookViewId="0">
            <selection activeCell="A6" sqref="A6" />
        </sheetView>
    </sheetViews>
    <sheetFormatPr baseColWidth="10" defaultRowHeight="16" x14ac:dyDescent="0.2" />
    <sheetData>
        <row r="1" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A1">
                <v>2.04</v>
            </c>
        </row>
        <row r="2" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A2" s="1">
                <v>2.04</v>
            </c>
        </row>
        <row r="3" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A3">
                <v>2.0499999999999998</v>
            </c>
        </row>
        <row r="4" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A4" s="1">
                <v>2.0499999999999998</v>
            </c>
        </row>
        <row r="5" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A5">
                <v>2.06</v>
            </c>
        </row>
        <row r="6" spans="1:1" x14ac:dyDescent="0.2">
            <c r="A6" s="1">
                <v>2.06</v>
            </c>
        </row>
    </sheetData>
    <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />
    <pageSetup paperSize="9" orientation="portrait" horizontalDpi="0" verticalDpi="0" />
</worksheet>

What should be done is check why the BigDecimal formatter displays this wrong (or in other words: what kind of workaround is implemented there). It looks like they map everything within one ULP to the rounded value.

Copy link
Contributor

@pjfanning pjfanning Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String formatted = numberFormat.format(new BigDecimal(NumberToTextConverter.toText(d)));

This was the first attempt I made - it fixes this github321 case but breaks some existing tests.

That NumberToTextConverter seems to have some logic that mimics how Excel takes the 2.0499999999999998 and converts it to 2.05 . If we can find out why the other tests break that this solution might be better.

Double.toString also seems to correct for some of these issues to because it too converts 2.0499999999999998 to 2.05.

In the end the madness is why do Microsoft store 2.05 as 2.0499999999999998 in their sheet XML in the first place.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had time... Let's just keep it in like this until there is time to look into the formatter.

BTW just last week I found some hand crafted number formatting code in an arbitrary precision library I once wrote and while I don't remember why I would have implemented such a conversion myself I bet it must have been an issue like this one.

@asfgit asfgit closed this in a8f1e7a Apr 9, 2022
@pjfanning
Copy link
Contributor

Thanks - I've applied your change based on the reproducible test case.

@pjfanning
Copy link
Contributor

I'm still tweaking this code because this change is breaking other tests. The Microsoft developers have come up with such terrible Excel code that it is gut-wrenching trying to hack something that matches their code.

@pjfanning
Copy link
Contributor

@colinwjd it could be a while until the next POI release. You can build your own version of POI, of course. One workaround that you could try is using the ROUND function, eg ROUND(2.05, 1). The current POI 5.2.2 code seems to correctly round this to 2.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants