Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to sort as number a column that contains empty string throws an error #1689

Closed
ettorerizza opened this issue Jul 30, 2018 · 7 comments
Assignees
Labels
Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. Type: Bug Issues related to software defects or unexpected behavior, which require resolution.
Milestone

Comments

@ettorerizza
Copy link
Member

ettorerizza commented Jul 30, 2018

Describe the bug

A column that contains numbers and null values can be sorted as a number column, but not if it contains empty strings.

Current Results

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
        at com.google.refine.sorting.NumberCriterion$1.compareKeys(NumberCriterion.java:74)
        at com.google.refine.sorting.BaseSorter$ComparatorWrapper.compare(BaseSorter.java:101)
        at com.google.refine.sorting.BaseSorter.compare(BaseSorter.java:169)
        at com.google.refine.sorting.SortingRowVisitor$1.compare(SortingRowVisitor.java:85)
        at com.google.refine.sorting.SortingRowVisitor$1.compare(SortingRowVisitor.java:75)
        at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.Arrays.sort(Unknown Source)
        at java.util.ArrayList.sort(Unknown Source)
        at java.util.Collections.sort(Unknown Source)
        at com.google.refine.sorting.SortingRowVisitor.end(SortingRowVisitor.java:75)
        at com.google.refine.browsing.util.ConjunctiveFilteredRows.accept(ConjunctiveFilteredRows.java:71)
        at com.google.refine.commands.row.GetRowsCommand.internalRespond(GetRowsCommand.java:146)
        at com.google.refine.commands.row.GetRowsCommand.doPost(GetRowsCommand.java:70)
        at com.google.refine.RefineServlet.service(RefineServlet.java:178)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
        at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
        at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Expected behavior

Empty strings should behave like null.

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser : Chrome

OpenRefine (please complete the following information):
OpenRefine 3.0 RC1

Datasets

clipboard.openrefine.tar.gz

@wetneb wetneb added the Type: Bug Issues related to software defects or unexpected behavior, which require resolution. label Jul 30, 2018
@thadguidry thadguidry added Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. labels Jul 30, 2018
@thadguidry
Copy link
Member

thadguidry commented Jul 30, 2018

@ettorerizza No, we don't want empty cells to behave as null cells in the general case. We changed that behavior with Owen's work.

There is now an expectation by OpenRefine after Owen's change that when you "Sort by Number", that the column is full of numbers only. How do you wish to highlight to the user that "Hey dude, actually, your column that you asked me to sort is not only numbers...you need to fix that first" ?

My vote would be to have a popup warning and suggest to the user to FIRST run Facet by Null and Facet by Blank to inspect those rows in the column.

Recall that we also agreed that later we will add an import option (perhaps preference) to "treat blank cells as null cells" instead of ramming it down a users throat as we did prior to 3.0 all the way back to 1.0. But before we add that option, we are waiting for Antonin's work and a nicer UI. For now, we are kinda just getting by as much as we can...knowing that larger, cleaner, smarter ways of working in OpenRefine are coming.

@wetneb
Copy link
Sponsor Member

wetneb commented Aug 4, 2018

It's a genuine bug, we should not be failing like this…

@wetneb wetneb removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Aug 4, 2018
@wetneb wetneb self-assigned this Aug 4, 2018
wetneb added a commit that referenced this issue Aug 4, 2018
@thadguidry
Copy link
Member

@wetneb It is useful to have the reasons when classifying a bug or not. Can you further state your reasoning why you think this is a bug? Can you refer to my reasons why I classify this as NOT A BUG and state your counter positions ? This will help us have a fuller understanding of expectations from different points of view. Jacky, Owen, and Martin might bring further expectations, so let's see where we all stand on this. Thanks.

@ettorerizza
Copy link
Member Author

ettorerizza commented Aug 4, 2018

@thadguidry @wetneb I will not meddle with the question of whether it is a bug or not. Let's say I understand both points of view.

My 2cts to feed the reflexion: Martin's latest survey showed that the majority of users came from the worlds of library and journalism, not computer science. I've been both in my career, and I'm not sure I'll be able to explain, if anyone wonders in the Google group, why he/she manages to sort some columns and not others simply because some cells that he/she thinks are empty are not empty, but contains an empty string (and why any null cell on which you click on "edit" automatically becomes an empty string).

@thadguidry
Copy link
Member

thadguidry commented Aug 4, 2018

I also don't see any changes from Jacky or Owen on isNonBlankData , which the process uses
https://github.com/OpenRefine/OpenRefine/blame/master/main/src/com/google/refine/expr/ExpressionUtils.java#L111a

@wetneb @ettorerizza I'm also wondering Where this bug was introduced ?
Since I have no java errors when sorting by number with your given test case in OpenRefine 2.0-r1836

The general argument as I see it is this:

  • Do we make a key for sort comparisons or not...when ExpressionUtils.isNonBlankData(value) is false ?
    From my eyes, that key was being made just fine in OpenRefine 2.0-r1836 when value was an empty string.

@wetneb
Copy link
Sponsor Member

wetneb commented Aug 5, 2018

@thadguidry it's clearly a bug because running an operation should not fail with an exception. Period.

Concerning the desired behaviour, the UI is designed to treat blank values uniformly, by letting the user decide where they should appear in the results, so the backend should agree with that.

@wetneb
Copy link
Sponsor Member

wetneb commented Aug 5, 2018

Since I have no java errors when sorting by number with your given test case in OpenRefine 2.0-r1836

Importing my example as a CSV with the default settings will not trigger the bug, you need to make sure the empty cell is treated as an empty string and not a null (which is the default behaviour).

wetneb added a commit that referenced this issue Aug 7, 2018
@wetneb wetneb added this to the 3.0 milestone Aug 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. Type: Bug Issues related to software defects or unexpected behavior, which require resolution.
Projects
None yet
Development

No branches or pull requests

3 participants