Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Importer APIs to support filename and archive name cleanly #2963

Closed
tfmorris opened this issue Jul 18, 2020 · 5 comments · Fixed by #2978
Closed

Refactor Importer APIs to support filename and archive name cleanly #2963

tfmorris opened this issue Jul 18, 2020 · 5 comments · Fixed by #2978
Assignees
Labels
import About importers in general - add a label for the data format if available Type: Bug Issues related to software defects or unexpected behavior, which require resolution.
Milestone

Comments

@tfmorris
Copy link
Member

In 818e139 the NotImplementedException throwing behavior of the methods in the base class got removed so that they're now silent no-ops if someone calls them for a subclass where they are not implemented. This is unsafe and misleading behavior that just cost me a good chunk of time to track down.

It looks like TreeImportingParserBase suffers from the same issue: 818e139#diff-fc818390bcd00535a45c7ef1f5dfc8d2L177-R178

@tfmorris tfmorris added Type: Bug Issues related to software defects or unexpected behavior, which require resolution. Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators import About importers in general - add a label for the data format if available labels Jul 18, 2020
@tfmorris
Copy link
Member Author

OMG, this is such a mess! Not only did the NotImplementedExceptions get removed, but the subclasses are now explicitly calling these methods for the new side effect code that got place there. This has polluted the entire Importer class hierarchy and when I restore some of the previous behavior, 95 tests fail. Ugh!! What a nightmare...

@thadguidry
Copy link
Member

thadguidry commented Jul 19, 2020

I guess we need a miracle...I think he goes by the name of Tom? :-)
In seriousness Tom, if you think this can be scoped out and you have a concrete plan to fix it all correctly, then the advisory committee could agree to perhaps have you invoice for rectifying this situation, or would it be only sort of a hassle and you could volunteer to resolve the nightmare?

@wetneb
Copy link
Sponsor Member

wetneb commented Jul 19, 2020

Yes the architecture of our importers is not ideal at the moment. Also we rely on the CSV importer in many tests to create a test project, which we should not do (that is my fault). I am working on this architecture in the Spark migration and I have removed the dependency to the CSV importer in tests, so perhaps it is not worth duplicating this effort in 3.x?

@tfmorris
Copy link
Member Author

These issues are orthogonal to use of the CSV importer to create test cases.

After ripping out all this code, all the tests pass again, so I guess the metadata functionality has zero test coverage. :(

As part of the investigation, I also realized that the recent addition of optionally saving archive filename broke the public importer API with an incompatible change which is uncool, so that will need to be redone.

@tfmorris tfmorris changed the title ImportingParserBase must throw NotImplementedException Refactor Importer APIs to support filename and archive name cleanly Jul 19, 2020
@tfmorris
Copy link
Member Author

It looks like this mess started with #1055, which was merged without review, and has snowballed from there. It's too late to restore compatibility for pre-2015 era importers, but we can at least a) not break current importers for 3.5 and b) try to create something which can be stabilized going forward.

tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jul 22, 2020
Fixes OpenRefine#2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.
tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jul 22, 2020
Fixes OpenRefine#2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.
tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jul 22, 2020
Fixes OpenRefine#2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.
wetneb pushed a commit that referenced this issue Jul 23, 2020
* Make sure data directory is directory, not a file

* Add a test for zip archive import

Also tests the saving of the archive file name and source filename

* Add TODOs - no functional changes

* Cosmetic cleanups

* Revert importer API changes for archive file name parameter

Fixes #2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.

* Revert weird flow of control for import options metadata

This reverts the very convoluted control flow that was introduced
when adding the input options to the project metadata. Instead
the metadata is all handled in the importer framework rather than
having to change APIs are have individual importers worry about
it.

The feature never had test coverage, so that is still to be added.

* Add test for import options in project metadata & fix bug

Fixes bug where same options object was being reused and overwritten,
so all copies in the list ended up the same.
tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jul 24, 2020
Fixes OpenRefine#2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.
@tfmorris tfmorris self-assigned this Jul 31, 2020
@tfmorris tfmorris removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Jul 31, 2020
@tfmorris tfmorris added this to the 3.5 milestone Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
import About importers in general - add a label for the data format if available Type: Bug Issues related to software defects or unexpected behavior, which require resolution.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants