Option to render non-text Excel cells as text when importing #4838
Comments
Agree, and we have said this often: #1908 (comment).
One or more examples of how the current behavior should be modified would be extremely helpful. Excel has different data types internally, so it's simply not possible to "parse all cells as text." If the desire is to take an internal number or date, format it using the Excel format string, and then import the resulting string, that might be possible, but it's actually more conversion, not less. Or it's possible that there's just some simple bug lurking, but without examples, it's hard to tell.
Here's the core of the import code: OpenRefine/main/src/com/google/refine/importers/ExcelImporter.java Lines 229 to 260 in cb55cdf
@tfmorris can we not retain the shape of the data as it is stored in Excel, at least for XLSX, and simply output it as a text string, letting users clean up and reformat as they wish? In other words, don't think for our users (as that code seems to want to do); give them the power to decide later how to transform.
I guess one way to specify this issue further would be as follows. When opening an Excel file in Excel, all non-empty cells have a certain string representation in the UI (possibly formatted or aligned differently to mark the non-string datatype). As an OpenRefine user, I would like to have the option of importing that Excel file into OpenRefine such that the cells of the resulting OpenRefine project contain the same strings as the ones that Excel displays in its UI. All those cells would therefore have string datatype in OpenRefine.
Perfect faithfulness to Excel is probably not achievable (typically because of date rendering specifics, locale differences, rounding of numbers, and things like that) but probably not needed in most cases.
From an implementation perspective, this task would consist of adapting the code @tfmorris quoted above to convert non-string datatypes to strings on the fly, at import time, if the option to do so is used.
Actually, the Apache POI project already has a utility class that knows how to do Excel-compatible formatting, so this is straightforward to add.
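To make the proposal in the comments above concrete, here is a minimal sketch of the "convert non-string datatypes to strings at import time, if the option is used" idea. It is not OpenRefine's `ExcelImporter` code and it does not use Apache POI: the class `CellTextSketch`, the methods `importValue` and `renderNumber`, the `renderAsText` flag, and the ISO date pattern are all hypothetical names and choices made for illustration, using only the JDK. A real implementation would presumably hand the formatting step to the POI utility class mentioned above so that each cell's own Excel format string is honoured.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch only: not OpenRefine's importer and not Apache POI.
final class CellTextSketch {

    // Assumption: dates are shown in ISO form. Excel would use the cell's format string.
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

    /**
     * Decides what to store for one already-extracted cell value.
     * With renderAsText == false the typed value is passed through unchanged;
     * with renderAsText == true every non-null value becomes a String.
     */
    static Object importValue(Object cellValue, boolean renderAsText) {
        if (cellValue == null || !renderAsText) {
            return cellValue;
        }
        if (cellValue instanceof String) {
            return cellValue;
        }
        if (cellValue instanceof Double) {
            return renderNumber((Double) cellValue);
        }
        if (cellValue instanceof LocalDate) {
            return ((LocalDate) cellValue).format(DATE_FORMAT);
        }
        // Booleans and anything else fall back to their default string form.
        return String.valueOf(cellValue);
    }

    /** Renders a number without a trailing ".0" and without scientific notation. */
    static String renderNumber(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return String.valueOf(d);
        }
        return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
    }
}
```

With the flag set, a publication year stored by Excel as the number 1998.0 would come through as the string "1998" rather than as a number, which is the case the original request is concerned with. With the flag unset, the value is left as a typed number.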
It would be easier if OpenRefine offered an option to parse all cells as text when importing Excel files, as it does with csv files. When refining e.g. publication dates, the default behaviour of parsing digit characters in Excel files as numbers can lead to unwanted results and is cumbersome to fix.
Proposed solution
Offer the option to leave a checkbox "Parse cell text into numbers, dates, ..." unchecked at import, as with csv files.
Alternatives considered
All Excel files could be converted to csv files before importing them into OpenRefine.
Additional context