Imprecise facet statistics in records mode #2588

Closed
wetneb opened this issue Apr 24, 2020 · 5 comments · Fixed by #2607
Labels: records, Type: Bug

@wetneb
Sponsor Member

wetneb commented Apr 24, 2020

Describe the bug

The blank values in a column are counted incorrectly in records mode.

To Reproduce
Import the following CSV dataset:

a,b,c
1,2,3
,,4
,,5
,6,7
3,4,8
,2,9

Create a text facet on the second column, in records mode.
Observe that it counts a single blank value, whereas there are two of them.

Current Results
The count for (blank) is 1.

Expected behavior
The count for (blank) should be 2.
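For reference, a minimal Python sketch of the two counts on the dataset above. This is not OpenRefine code: the `split_into_records` helper is hypothetical and assumes a record starts at every row whose first column is non-blank.

```python
import csv
import io

# The dataset from the reproduction steps.
CSV_TEXT = """a,b,c
1,2,3
,,4
,,5
,6,7
3,4,8
,2,9
"""


def split_into_records(rows, key_index=0):
    """Group rows into records (assumption: a non-blank key cell starts a new record)."""
    records = []
    for row in rows:
        if row[key_index] != "" or not records:
            records.append([])
        records[-1].append(row)
    return records


reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)
rows = list(reader)
records = split_into_records(rows)

b = header.index("b")
# Number of blank cells in column b, counted row by row.
blank_rows = sum(1 for row in rows if row[b] == "")
# Number of records that contain at least one blank cell in column b.
blank_records = sum(1 for rec in records if any(row[b] == "" for row in rec))

print(len(records), blank_rows, blank_records)
```

Under these assumptions the script finds 2 records, 2 blank cells in column b, and 1 record containing a blank, which matches the reported count of 1 versus the expected 2.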

OpenRefine (please complete the following information):

  • master branch (but applicable to all versions since at least 2.6 probably)
wetneb added the Type: Bug and records labels Apr 24, 2020
@thadguidry
Member

thadguidry commented Apr 24, 2020

Wait, but those are not treated as blank, those are null?
We have the import option to "Store blank cells as null" to treat them as null.
Did you verify with the new All -> show/hide null ?

Customized facets -> Facet by blank (null or empty string)

Anyway, in rows mode it looks fine to me?
image

Ah, but you're explicitly saying this is a records mode bug. OK, yeah, I guess it is.

@joanneong
Contributor

Can I clarify what the bug is here?

If I understand records mode correctly, a record links multiple rows together. In the example given by @wetneb, there are a total of 6 rows and 2 records (if we treat 'a', 'b', and 'c' as column headers). The 2 records look like this:

image

When we apply a text facet on the second column in records mode, it shows up as (blank) 1. I think this means that there is a single record containing blank values in the dataset. Indeed, if you click on the (blank) 1, you see 1 record being filtered in the display:

image

As such, I am not sure this is really a bug, since the count makes sense in records mode?

@wetneb
Sponsor Member Author

wetneb commented May 1, 2020

Yes, I should have been more precise. The problem is that blank values are counted only once per record, but non-blank values are counted multiple times per record. (Except for error values, which are also counted once per record.) In my opinion there is no reason for this discrepancy.
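A rough sketch of the discrepancy, in Python rather than OpenRefine's own code. The sample values and both function names are hypothetical; the first function only mimics the behavior described in this comment (non-blank values counted per row, blanks counted once per record), and the second counts every cell per row for comparison.

```python
from collections import Counter

# Hypothetical column values, grouped by record (not the dataset from the issue).
records = [["x", "x", "", ""], ["y", ""]]


def facet_counts_as_described(records):
    """Non-blank values are counted once per row, blanks once per record."""
    counts = Counter()
    for rec in records:
        for value in rec:
            if value != "":
                counts[value] += 1
        if any(value == "" for value in rec):
            counts["(blank)"] += 1
    return counts


def facet_counts_per_row(records):
    """Every cell, blank or not, is counted once per row."""
    counts = Counter()
    for rec in records:
        for value in rec:
            counts[value if value != "" else "(blank)"] += 1
    return counts


described = facet_counts_as_described(records)
per_row = facet_counts_per_row(records)
print(described)
print(per_row)
```

With this sample, "x" is counted twice in both functions even though it sits in a single record, while (blank) is counted 2 times under the described behavior and 3 times per row.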

@tfmorris
Member

tfmorris commented May 5, 2020

The counts for nulls should definitely be consistent with the counts for other values, but should the count represent the number of rows or the number of records with that value? I think historically it was the former, but it's not clear that that's the most intuitive (at least to me).
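To make the two interpretations concrete, here is a small Python sketch applied to the second column of the dataset in the issue. It is only an illustration, not OpenRefine's implementation: the function names are made up, and the grouping into records assumes the first column is the record key.

```python
from collections import Counter

# Column b of the issue's dataset, grouped by record
# (assumption: a non-blank cell in column a starts a new record).
COLUMN_B_BY_RECORD = [["2", "", "", "6"], ["4", "2"]]


def label(value):
    """Facet label for a cell value; empty cells are shown as (blank)."""
    return value if value != "" else "(blank)"


def count_rows(records):
    """Number of rows carrying each value."""
    return Counter(label(v) for rec in records for v in rec)


def count_records(records):
    """Number of records containing each value at least once."""
    return Counter(lbl for rec in records for lbl in {label(v) for v in rec})


rows_count = count_rows(COLUMN_B_BY_RECORD)
records_count = count_records(COLUMN_B_BY_RECORD)
print(rows_count)
print(records_count)
```

On this column the two policies differ only for (blank): 2 when counting rows, 1 when counting records. Either policy treats blank and non-blank values the same way, which is the consistency asked for above.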

@wetneb
Sponsor Member Author

wetneb commented May 5, 2020

I agree that makes intuitive sense, but there are complications to this; see the discussion in the PR.

wetneb added this to the 3.5 milestone Jun 16, 2020
4 participants