CachedInstanceLoader defaults to empty when import_id is missing #1225
Problem
CachedInstanceLoader tries to clean the import_id field even when that column is not present in the inbound dataset, which raises a KeyError in fields.py. ModelInstanceLoader, by contrast, allows an absent import_id column and treats every entry as new, which is (to my mind) the most desirable behavior.
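The failure mode can be illustrated without Django. The sketch below is a simplified stand-in, not the library's code: `clean_import_id`, `build_cache_eagerly`, and the `id` column name are hypothetical and only mimic a loader that cleans the import id of every row up front.

```python
# Simplified stand-in for an eager cache build. All names here are
# hypothetical; this is not the django-import-export API.

def clean_import_id(row, column="id"):
    # Mimics a field lookup that indexes the row directly.
    return row[column]  # raises KeyError when the column is absent


def build_cache_eagerly(rows, stored, column="id"):
    # Cleans the id of every row before any per-row presence check can run.
    ids = [clean_import_id(row, column) for row in rows]
    return {pk: stored[pk] for pk in ids if pk in stored}


rows_without_id = [{"name": "Some book"}, {"name": "Another book"}]
stored = {1: "existing instance"}

try:
    build_cache_eagerly(rows_without_id, stored)
    outcome = "no error"
except KeyError as exc:
    outcome = f"KeyError: {exc}"
```

Because the ids are collected while the loader is being set up, the error surfaces before any row is processed, so a later "is the id column present?" check never gets a chance to run.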
Solution
Refactor CachedInstanceLoader to initialize with an empty cache when it gets a dataset without the import_id column. The Resource then never calls get_instance, because it checks whether the import_id_fields are present in the inbound dataset before calling it.
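A minimal sketch of the guard, under the same assumptions as a plain-Python stand-in: `SketchCachedLoader`, `import_row`, and the `id` column name are hypothetical and are not the classes touched by this PR. The cache defaults to empty and is only filled when the column is present, and the caller skips the lookup when the row has no id.

```python
class SketchCachedLoader:
    """Hypothetical stand-in for a cached loader; not the library class."""

    def __init__(self, rows, stored, column="id"):
        self.column = column
        self.cache = {}  # default: empty cache
        # Only build the cache when the id column is present in the dataset.
        if rows and column in rows[0]:
            ids = {row[column] for row in rows}
            self.cache = {pk: obj for pk, obj in stored.items() if pk in ids}

    def get_instance(self, row):
        return self.cache.get(row[self.column])


def import_row(loader, row):
    # Mirrors the caller-side check: look up an existing instance only
    # when the id column is present in the row.
    if loader.column in row:
        if loader.get_instance(row) is not None:
            return "update"
    return "new"


stored = {1: "existing instance"}

rows_without_id = [{"name": "A"}, {"name": "B"}]
loader = SketchCachedLoader(rows_without_id, stored)
results_without_id = [import_row(loader, row) for row in rows_without_id]

rows_with_id = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
loader_with_id = SketchCachedLoader(rows_with_id, stored)
results_with_id = [import_row(loader_with_id, row) for row in rows_with_id]
```

With no id column every row is treated as new, matching the ModelInstanceLoader behavior described in the Problem section. With an id column, rows that match a stored instance are treated as updates.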
Acceptance Criteria
Added a unit test that reproduces the KeyError in the previous code and shows that CachedInstanceLoader initializes correctly in the new code.
Documentation is not necessary since this makes no change to a public API. Code that uses CachedInstanceLoader with a workaround for this problem should still work, since most workarounds will probably have implemented some hook to guarantee the presence of the import_id field. In the unlikely event that someone explicitly catches the KeyError and bases their behavior on it, they will have to update their code.