Excel importer doesn't work from URL #6418
Comments
Hey @tfmorris, can I work on this issue?
I only took a quick look at this, but I think it may be a little complicated because the necessary information isn't available when it's needed, as outlined above. If you want to give it a go, go ahead! Let us know if you have any questions.
Thanks. It actually works well with Excel files that are contained in archives or are compressed, without any modification to the program.
This method might be what's causing the problem with Excel importing from a URL. When the fileRecord contains the URL, the method ultimately returns the URL rather than the fileName, which is why the if statement highlighted above ends up comparing against the file's download URL. (For some reason I don't yet know, the fileRecord of the zipped file I was testing with did not contain a URL, which is why it worked in that case.) I don't really understand yet why it was written this way, but modifying that method does solve the problem. @tfmorris or @wetneb, please tell me if there was a reason for writing the method like that so I can dig deeper; otherwise I'll submit the pull request.
Pro tip: you can use GitHub's permalink feature to create clickable code previews that take people directly to the code in question: OpenRefine/main/src/com/google/refine/importing/ImportingUtilities.java Lines 601 to 606 in f487880
The original data source is stored as part of the provenance for the file creation (you can see it in the project metadata on the Open Project screen with all the projects listed). There are (potentially) multiple layers to the onion:
My memory (without having had the chance to refresh it, which I'll try to do tomorrow) is that we have one fewer variable than we need to hold all the necessary data in this particular case. The key thing to double-check as you look for a solution is that you've preserved the provenance information in the project metadata.
I didn't know about that tip, thanks. As for preserving the provenance information in the project metadata: I just checked, and after the modifications to that method the provenance information is preserved exactly as it was before, and the Excel import works correctly.
Add a getFileName method to ImportingUtilities so that the name of the file can be read from the fileRecord.
Add a new variable (fileName) in ImportingParserBase that holds the return value of the new ImportingUtilities getFileName method, so that it can be passed to the parseOneFile method instead of the fileSource. This keeps the provenance information from being lost.
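To illustrate the split described above, here is a minimal sketch rather than the actual OpenRefine code. It assumes the fileRecord can be modelled as a plain map with hypothetical "url" and "fileName" keys; the class name, the key names, and the "unknown" fallback are all assumptions made for this example.

```java
import java.util.Map;

// Hypothetical stand-in for the helper methods discussed in this thread.
public class FileRecordSketch {

    // Provenance value: prefers the download URL when the record has one,
    // which matches the behaviour described in the comments above.
    static String getFileSource(Map<String, String> fileRecord) {
        String url = fileRecord.get("url"); // assumed key name
        if (url != null) {
            return url;
        }
        return fileRecord.getOrDefault("fileName", "unknown");
    }

    // Local file name: never the URL, so it is safe to use for sheet matching.
    static String getFileName(Map<String, String> fileRecord) {
        return fileRecord.getOrDefault("fileName", "unknown"); // assumed key name
    }
}
```

Keeping both accessors means the caller can hand the local name to the parser for matching while the URL remains available for the project metadata.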
When the Excel importer is invoked after downloading from a URL, the sheet selection logic doesn't work because it's comparing against the original URL rather than the local filename.
The relevant piece of code is:
OpenRefine/main/src/com/google/refine/importers/ExcelImporter.java
Lines 200 to 201 in f487880
which causes every sheet to be skipped. Unfortunately, we need the original URL for provenance if the user has requested that it be saved AND we need the local filename for the sheet matching, but we don't have both available at that point.
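The failure mode can be reproduced in isolation with a small sketch. This is not the code from ExcelImporter.java; it assumes, purely for illustration, that each selected sheet is identified by a string of the form "<local file name>#<sheet index>", and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduction of the sheet selection check described above.
public class SheetMatchSketch {

    // Returns the indexes of the selected sheets that belong to fileSource.
    static List<Integer> selectedSheets(List<String> selections, String fileSource) {
        List<Integer> result = new ArrayList<>();
        for (String selection : selections) {
            int hash = selection.lastIndexOf('#'); // assumed "<name>#<index>" format
            if (hash < 0) {
                continue; // malformed entry, ignore it
            }
            String name = selection.substring(0, hash);
            if (!name.equals(fileSource)) {
                continue; // name mismatch: this sheet is skipped
            }
            result.add(Integer.parseInt(selection.substring(hash + 1)));
        }
        return result;
    }
}
```

Under these assumptions, passing the download URL as fileSource never equals the local name stored with the selection, so the result is empty; passing the local file name returns the selected indexes.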
To Reproduce
Steps to reproduce the behavior:
Current Results
Both the preview and the actual project import are empty
Expected Behavior
File contents are correctly previewed and imported.
Versions
Additional context
This may also be a problem with Excel files which are contained in archives or are compressed, but I haven't checked.