Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FileFetcher a separate queue item #4004

Merged
merged 129 commits into from Jan 9, 2024
Merged

Make FileFetcher a separate queue item #4004

merged 129 commits into from Jan 9, 2024

Conversation

paul-m
Copy link
Contributor

@paul-m paul-m commented Aug 22, 2023

  • Modifies ResourceLocalizer so it can perform its function or defer to a queue item.
  • Adds LocalizeQueueWorker, which can perform localization for a dataset within the queue.
  • Adds drush dkan:datastore:localize which is analogous to dkan:datastore:import. It can queue items or perform the localization directly.
  • ResourceLocalizer now sends an event when localization is complete: ResourceLocalizer::EVENT_RESOURCE_LOCALIZED
  • DatastoreSubscriber subscribes to this new event and creates corresponding datastore import queue items in response.
  • Modifies tests to pass where appropriate. Mostly this involves running the new localize_import queue in addition to the datastore_import queue.
  • Some docblock and other CS improvements on files touched by the PR changes.
  • Modifies the docs for readthedocs: https://dkan--4004.org.readthedocs.build/en/4004/index.html

QA:

  • Add a dataset. Suggested source URL: https://demo.getdkan.org/sites/default/files/distribution/cedcd327-4e5d-43f9-8eb1-c11850fa7c55/Bike_Lane.csv
  • Visit the dataset import status page: /admin/dkan/datastore/status You should see that the dataset is not fetched and not stored.
  • Use Drush to list the queue items: drush queue:list
  • There should be one item for the localize_import queue.
  • Run this queue:drush queue:run localize_import
  • Now running drush queue:list should show you one queue item for datastore_import.
  • Visit the import status page again, and it should show you that the fetch has occurred, and that the store is ‘stopped.’
  • Run the datastore_import queue, either with cron or drush queue:run.
  • You should be able to visit the dataset normally and see it. (Ensure you have built and installed the frontend.)

Copy link
Member

@dafeder dafeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue that is I think an actual bug - if the localize_import job encounters a 404, it throws an exception and re-queues. So if there is a typo in one of your URLs, it will be re-queued forever (or at least until you delete the queue manually). Suggest that this also work more like the import job, where exceptions from curl are caught the way mysql errors are, stored in the error field in the jobstore result so that they can be seen in the dashboard, and then the item is considered done with statuis "error".

Copy link
Member

@dafeder dafeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add docs on dkan:datastore:localize

@paul-m
Copy link
Contributor Author

paul-m commented Dec 22, 2023

RE: 404 leading to an endless loop of queued localizations:

Added a case where if localizeTask() returns a Result::ERROR object, we don't throw an exception, so the queue item won't be re-queued.

This also changes the way logging happens, so it improves the output when you localize through Drush.

This is tested in Drupal\Tests\datastore\Kernel\Plugin\QueueWorker\LocalizeQueueWorkerTest::testNotRequeuedOnError()

@paul-m
Copy link
Contributor Author

paul-m commented Dec 22, 2023

Updated the docs.

@paul-m paul-m marked this pull request as ready for review December 22, 2023 18:29
Copy link
Contributor

@jastraat jastraat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall love these changes and tested with PDC successfully.

I also copied the PDC DB from production and ran some harvests to see what would happen and I did run into some errors because it looked like localizer was still looking for the files on the production server versus in the local environment.

@dafeder dafeder merged commit 27dcf6d into 2.x Jan 9, 2024
12 checks passed
@dafeder dafeder deleted the 15780-filefetcher-step branch January 9, 2024 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants