Use filename instead of name for id in HarvestList #3082

dafeder · 2020-05-08T13:50:18Z

We are using the name element from file_scan_directory() to create the source IDs that are stored in the migrate map tables for harvest. name strips the file extension (specifically, it is set from PHP's pathinfo() with PATHINFO_FILENAME, while file_scan_directory()'s filename element holds the full file name), so asdf1234.json in the cache would be stored as source ID "asdf1234". However, files are not cached with a file extension. In most cases this doesn't matter, but if there is a period (.) in an ID, whatever follows the last period can be mistaken for a file extension.
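The extension-stripping behavior can be illustrated with a small sketch. This is not DKAN or Drupal code: it uses Python's os.path.splitext as a rough analog of PHP's pathinfo($f, PATHINFO_FILENAME), and the helper name pathinfo_filename is hypothetical. Edge cases (for example names that begin with a period) may differ between the two languages.

```python
import os


def pathinfo_filename(basename: str) -> str:
    """Hypothetical helper: rough Python analog of PHP's
    pathinfo($basename, PATHINFO_FILENAME), which drops everything
    from the last '.' onward."""
    stem, _ext = os.path.splitext(basename)
    return stem


# A file that really has a .json extension: stripping it is harmless.
print(pathinfo_filename("asdf1234.json"))              # asdf1234
# An extensionless cache file whose ID contains periods: part of the ID is lost.
print(pathinfo_filename("knb-lter-jrn.210001001.61"))  # knb-lter-jrn.210001001
```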

So, source ID knb-lter-jrn.210001001.61 is cached with the filename "public://dkan-harvest-cache/[harvest-machine-name]/knb-lter-jrn.210001001.61", but when it is gathered by the getIdList() function it is interpreted as just the ID knb-lter-jrn.210001001. Harvest then treats knb-lter-jrn.210001001.61 as an orphaned dataset and unpublishes it.

Since we don't cache either JSON or XML files with extensions, using filename should remove this problem with no ill effects.
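A minimal sketch of why the truncated ID leads to an unpublished dataset, and why the full file name avoids it. This is a simplified model, not the actual HarvestList or getIdList() code: it assumes orphans are the IDs present in the migrate map that have no matching ID derived from the cache directory. The function names id_from_cache_file and find_orphans are hypothetical.

```python
import os


def id_from_cache_file(basename: str, use_filename: bool) -> str:
    """Hypothetical helper: derive a harvest source ID from a cache file name.

    use_filename=False models the old behavior (file_scan_directory()'s name,
    which strips an apparent extension); use_filename=True models the fix
    (the full filename)."""
    if use_filename:
        return basename
    return os.path.splitext(basename)[0]


def find_orphans(mapped_ids, cache_files, use_filename):
    """Simplified model: mapped IDs with no matching cached ID count as orphans."""
    cached_ids = {id_from_cache_file(f, use_filename) for f in cache_files}
    return sorted(set(mapped_ids) - cached_ids)


mapped = ["knb-lter-jrn.210001001.61", "abc123"]
cache = ["knb-lter-jrn.210001001.61", "abc123"]

print(find_orphans(mapped, cache, use_filename=False))  # ['knb-lter-jrn.210001001.61']
print(find_orphans(mapped, cache, use_filename=True))   # []
```

With the old derivation the dataset is wrongly reported as orphaned even though its cache file exists; deriving the ID from the full file name makes the sets match.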

QA Steps

1. Create a harvest from https://api.jsonbin.io/b/5eb59eb9a47fdd6af15fd015. It should bring in 4 datasets.
2. Delete one of the datasets.
3. Re-run the harvest. The deleted dataset should come back.

Tests

I did not add a new test for this; it felt like an edge case. The reviewer can let me know if they think a unit test is needed.

Use filename instead of name for id in HarvestList

f87eadc

dafeder added the 1.x DKAN classic label May 8, 2020

dafeder assigned dharizza May 8, 2020

Remove repeat getIdList() calls (better debugging)

09f5595

dharizza approved these changes May 11, 2020


dharizza merged commit 8a12c74 into 7.x-1.x May 11, 2020

dharizza deleted the harvest-id-name branch May 11, 2020 15:48
