Handle missing harvest source without mass un-publishing all content from the source. #2829

Merged 7 commits into 7.x-1.x from 2824-missing-source on Mar 12, 2019

janette (Member) commented Feb 19, 2019

connects #2824

When a harvest source is temporarily unavailable, all content from that harvest is unpublished and marked as 'orphaned'. On large harvests this triggers a large amount of unnecessary processing, which may not finish before the next harvest runs and the source is available again.

Let's instead just leave the content as-is. If the source is truly gone, the catalog maintainer can delete the content via the Harvest UI.

QA Steps

Missing source

  1. Create a harvest source from https://s3.amazonaws.com/dkan-default-content-files/files/data_harvest_orphan_test.json, and run the harvest.
  2. Log in to S3 and make the file private.
  3. Re-run the harvest. Note that rather than unpublishing all of the datasets, the harvest only reports the error message:

Items to import is 0. Looks like source is missing. No updates will be made at this time.

Actual change in the source

  1. Create a harvest source on a flavor build (http://janette.dkandemo.nuamsdev.com/data.json) and run the harvest.
  2. Go to the source site and delete a dataset.
  3. Run the harvest again.
  4. Confirm that the dataset deleted upstream was unpublished and marked as an 'orphan'.

Reminders

  • There is a test for the issue.
  • Coding standards checked.
  • Review docs.getdkan.com (or in /docs) to see if it still covers the scope of the PR and update if needed.

dafeder (Member) commented Feb 21, 2019

This PR will prevent all orphaning from happening, so it isn't quite there yet. What we need is for the orphan function to be skipped only if the source is missing. This means:

Ideally, if caching fails, the cache for this source should somehow be flagged as being in a failed state, which would cause HarvestMigrate::processImport() to return FALSE and therefore skip all postMigrate functionality.

Another option would be to simply have processImport() return FALSE whenever the cache is empty, which would include instances where a filter had removed all datasets. This would be less than ideal because presumably if a filter were to filter out all datasets from a source, orphaning everything would be the desired behavior.
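
The difference between the two options can be sketched as follows. This is an illustrative Python model, not DKAN's PHP code: `SourceCache`, `fetch_failed`, `strict_empty`, and `run_harvest` are hypothetical names, and only `processImport()` (modeled here as `process_import`) comes from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class SourceCache:
    """Result of caching one harvest source (hypothetical structure)."""
    items: dict = field(default_factory=dict)  # dataset id -> dataset record
    fetch_failed: bool = False                 # True when the source could not be read


def process_import(cache: SourceCache, strict_empty: bool = False) -> bool:
    """Return False to signal that post-import steps (orphaning) must be skipped."""
    if cache.fetch_failed:
        # Option 1: the cache is explicitly flagged as failed.
        return False
    if strict_empty and not cache.items:
        # Option 2: any empty cache is treated as a missing source.
        return False
    return True


def run_harvest(published: set, cache: SourceCache, strict_empty: bool = False) -> set:
    """Return the ids that would be orphaned after this harvest run."""
    if not process_import(cache, strict_empty):
        return set()  # nothing is unpublished when the import is skipped
    # Orphans are previously published datasets absent from the current cache.
    return published - set(cache.items)
```

With the explicit flag, a filter that removes every dataset still orphans everything, while an unreachable source orphans nothing. With `strict_empty=True` both cases are skipped, which is the drawback described above.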

janette (Member, Author) commented Mar 5, 2019

@dafeder Fixed! This version will now allow normal orphaning when needed.

janette changed the title from "Skip the missing source operations that unpublishes harvested content" to "Handle missing harvest source without mass un-publishing all content from the source." on Mar 5, 2019
dafeder merged commit 59c11dc into 7.x-1.x on Mar 12, 2019
dafeder deleted the 2824-missing-source branch on March 12, 2019 21:49
dafeder pushed a commit that referenced this pull request Apr 24, 2020
…from the source. (#2829)

* Avoid postImport steps if the harvest source uri is unavailable

* Adjust warning message

* Replace file_get_contents with cURL in dkan_harvest_datajson_cache

* clean up

* Fixed failing phpunit tests.

* Fixed test harvest sources to use http.

* Fixed coding standards.