Use filename instead of name for id in HarvestList #3082
Merged
We are using the name element from file_scan_directory to create source_ids that are stored in the migrate map tables for harvest. The name element attempts to remove the file extension (specifically, it is built with PHP's pathinfo() and PATHINFO_FILENAME, which drops everything after the last period), so asdf1234.json in the cache would simply be stored as source ID "asdf1234". However, files are not cached with a file extension. In most cases this doesn't matter, but if there is a period (.) in the ID, what comes after it can be mistaken for a file extension.

So, source ID knb-lter-jrn.210001001.61 gets cached with the filename "public://dkan-harvest-cache/[harvest-machine-name]/knb-lter-jrn.210001001.61", but when gathered by the getIdList() function it is interpreted as simply the ID knb-lter-jrn.210001001. Harvest will then think this is an orphaned dataset and unpublish it.

Since we don't cache either JSON or XML files with extensions, using the filename element should remove this problem with no ill effects.

QA Steps
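As a quick check outside of DKAN, the sketch below reproduces the truncation with plain PHP. It assumes the name element behaves like pathinfo() with PATHINFO_FILENAME and the filename element like PATHINFO_BASENAME. The two helper functions are hypothetical names used only for illustration; they are not part of DKAN or Drupal.

```php
<?php
// Hypothetical helper: mimics an ID taken from the "name" element,
// i.e. the basename with everything after the last period removed.
function id_from_name(string $cachedFile): string {
  return pathinfo($cachedFile, PATHINFO_FILENAME);
}

// Hypothetical helper: mimics an ID taken from the "filename" element,
// i.e. the full basename with nothing stripped.
function id_from_filename(string $cachedFile): string {
  return pathinfo($cachedFile, PATHINFO_BASENAME);
}

// Cached file from the example above; the cache stores it without an extension.
$cached = 'knb-lter-jrn.210001001.61';

echo id_from_name($cached), "\n";     // knb-lter-jrn.210001001
echo id_from_filename($cached), "\n"; // knb-lter-jrn.210001001.61
```

With the first helper the trailing ".61" is treated as an extension and lost, which is the mismatch that leads to the dataset being flagged as orphaned. With the second helper the ID survives intact.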
Tests
I did not add a new test for this; it felt like an edge case. Reviewers can let me know if they think a unit test is needed.