
Adjust edit workflow to be compatible with a default moderation state of draft. #4002

Merged: 28 commits merged into 2.x on Dec 14, 2023

Conversation


@jastraat jastraat commented Aug 17, 2023

DatastoreSubscriber::onPreReference compares the old version of the metadata to the new version during dataset preSave, and queues datastore_import and orphan_reference_processor items if there is a change.

However, it uses the node->original property, which is incorrectly set to the last PUBLISHED revision of the node rather than the LATEST revision. When using a publication workflow with draft datasets that are later published, this causes datastore_import to be queued incorrectly when the dataset draft is published.
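The effect of that wrong baseline can be modeled in a short Python sketch. Everything here is illustrative: the function and variable names are hypothetical, and the real logic lives in DatastoreSubscriber::onPreReference against Drupal node revisions, not dictionaries.

```python
# Simplified model of the revision-comparison bug (hypothetical names;
# the real code compares Drupal node revisions, not plain dicts).

def needs_datastore_import(baseline_metadata: dict, new_metadata: dict,
                           triggering_properties=("modified",)) -> bool:
    """Queue an import only if a triggering property actually changed."""
    return any(baseline_metadata.get(p) != new_metadata.get(p)
               for p in triggering_properties)

# Revisions of one dataset: a published one, then a draft edit.
published = {"modified": "2023-01-01", "title": "Gold Prices"}
draft     = {"modified": "2023-06-01", "title": "Gold Prices 2"}

# Publishing the draft changes nothing in the metadata itself.
publishing = dict(draft)

# Buggy baseline: node->original resolves to the last PUBLISHED revision,
# so publishing an unchanged draft looks like a change -> spurious import.
assert needs_datastore_import(published, publishing) is True

# Fixed baseline: compare against the LATEST revision (the draft itself),
# so no new import is queued when the draft is simply published.
assert needs_datastore_import(draft, publishing) is False
```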

In addition, this PR removes a bunch of logic related to triggering datastore imports and orphan reference processing from the presave hook on data dictionaries.

Lastly, this PR checks for orphans from the last published revision when publishing a new revision of a dataset if the latest revision was not published. (e.g. if this was a draft revision being published)
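The orphan check on publish amounts to a set difference: anything the last published revision referenced that the newly published revision no longer does is a candidate for orphan processing. A minimal sketch, with hypothetical names (DKAN actually works with resource identifiers and queue workers):

```python
def orphaned_references(previous_published: set, newly_published: set) -> set:
    """References held by the old published revision but not by the new
    one; these are the candidates queued for orphan processing."""
    return previous_published - newly_published

# A draft that swapped one distribution is published over the old revision.
old_refs = {"dist-1", "dist-2"}
new_refs = {"dist-2", "dist-3"}
assert orphaned_references(old_refs, new_refs) == {"dist-1"}
```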

This adds 5 new functional tests to cover the workflows that were not previously covered.

  • Test coverage exists
  • Documentation exists

QA Steps

  • Create a vanilla DKAN site with the sample content.
  • Check the resource settings ( admin/dkan/resources ). Purge tables and purge files should both be checked, and delete local resource should be unchecked.
  • Disable the default data dictionary if present. Create a new data dictionary entry setting "price" to numeric and make sure the dictionary node is published. Set it as the new sitewide data dictionary. ( admin/dkan/data-dictionary/settings )
  • Enable last modified as the only triggering property ( admin/dkan/datastore )
  • Set the default moderation state to draft. ( admin/config/workflow/workflows/manage/dkan_publishing )
  • Use the dataset "Gold Prices in London 1950-2008 (Monthly)" with identifier 5dc1cfcf-8028-476c-a020-f58ec6dd621c.
  • Make sure the original resource file is present in sites/default/files/distribution/5dc1cfcf-8028-476c-a020-f58ec6dd621c/data.csv (source path) and also in the original resource directory e.g. sites/default/files/resources/0821c20012819a030c6fc33f6b329643_1687525521/data.csv
  • Make sure the queue table is empty using ddev drush queue:list (all zeros)
  • ddev drush dkan:dataset-info 5dc1cfcf-8028-476c-a020-f58ec6dd621c and make a note of the existing datastore table name and file_path. Should only have "latest revision".
  • Edit the dataset through the node form UI. Append a "2" to the dataset title and the distribution title for easy tracking. Change the last modified date. Save the dataset as a draft revision.
  • Confirm that a single draft distribution node was created: admin/dkan/datasets?title=&data-type=distribution&status=All&moderation_state=All
  • Confirm that a single draft revision was added on the dataset node.
  • ddev drush queue:list should return a single item in each of the following queues: datastore_import, orphan_reference_processor, and resource_purger
  • ddev drush cron to run all the queues
  • ddev drush queue:list to confirm all queues are complete (all zeros). If necessary run cron again for post_import queue.
  • ddev drush dkan:dataset-info 5dc1cfcf-8028-476c-a020-f58ec6dd621c and make a note of the datastore table name and file_path for both latest_revision (should be draft) and published_revision.
  • Confirm that both datastore tables and both file_path directories are still present.
  • Confirm that the new datastore table has a decimal type on the price column.
  • Publish the latest draft revision of the dataset: in the content admin form ( admin/content ), check the box next to the dataset and choose "publish latest revision".
  • ddev drush queue:list to confirm queue items for only orphan_reference_processor and resource_purger.
  • ddev drush dkan:dataset-info 5dc1cfcf-8028-476c-a020-f58ec6dd621c and confirm latest_revision is published with a datastore table and still importer_status of done, 100%
  • ddev drush cron to run all the queues
  • Confirm that there are only 2 distribution nodes associated with the dataset and that the one with the "2" appended to the title is now published and the old one is orphaned.
  • Visit the dataset page and confirm that the updates were applied to the live version with data still present.
  • Confirm that the original datastore table and the original file in the original file_path were dropped/removed.

@jastraat jastraat changed the title Compare with latest revision for changes for datastore_import Compare changes from latest revision to decide to do another datastore_import Aug 17, 2023
@jastraat jastraat changed the title Compare changes from latest revision to decide to do another datastore_import Compare with latest revision when deciding to do another datastore_import Aug 17, 2023
@jastraat jastraat changed the title Compare with latest revision when deciding to do another datastore_import Adjust edit workflow to be compatible with a default moderation state of draft. Sep 7, 2023
@jastraat jastraat marked this pull request as draft September 18, 2023 16:12

jastraat commented Sep 18, 2023

Unfortunately when testing this with PDC, PDC's custom publication approach is creating draft distributions on publish. ARGH. Possibly related to the fact that PDC is using the local url resource download url display, so this may be a unique problem when the last published version of a dataset had a different local url (e.g. an Acquia server).


paul-m commented Nov 1, 2023

I think we can mark the too-many-methods codeclimate error in NodeWrapper\Data as won't fix. Let's strategize about this and related workflow issues.

@jastraat jastraat force-pushed the getoriginal_latestrevision branch 2 times, most recently from e071842 to bfedeb9 on November 20, 2023
@jastraat jastraat force-pushed the getoriginal_latestrevision branch 2 times, most recently from b54f9c8 to 3a4d6d2 on November 29, 2023
@jastraat jastraat marked this pull request as ready for review December 4, 2023 20:25
@jastraat jastraat requested a review from paul-m December 5, 2023 00:44
@jastraat jastraat mentioned this pull request Dec 5, 2023
@janette janette (Member) left a comment

This looks good to me. I'm not sure why the resource directories don't get removed but that is out of scope of the ticket.

@dafeder dafeder (Member) left a comment

Looks good. I think there's just a lot going on in the metastore that shouldn't be necessary, but for now this appears to fix the pressing issues and improve test coverage. The cost is more complexity, but I think it's a good trade-off to fix the draft workflow, and we can think about a higher-level refactor later down the line.

modules/metastore/src/LifeCycle/LifeCycle.php
modules/metastore/src/NodeWrapper/Data.php
modules/metastore/src/NodeWrapper/Data.php (outdated)
modules/metastore/src/NodeWrapper/Data.php (outdated)
tests/src/Functional/DatasetBTBTest.php

paul-m commented Dec 13, 2023

Ran through a basic test run with OP and this branch seems OK.

@dafeder dafeder merged commit 554f2c4 into 2.x Dec 14, 2023
12 checks passed
@dafeder dafeder deleted the getoriginal_latestrevision branch December 14, 2023 15:55
@stefan-korn (Contributor) commented

Question about this change:

It seems to me that since this change you cannot edit the download URL of a distribution directly, but only via the dataset.

Am I right about that? Is that intended?


janette commented Jan 26, 2024

@stefan-korn correct, all user interaction is handled via the dataset form (or metastore API). The separate distribution node is a referenced artifact for use in the internal workings. Changing the downloadURL will generate a NEW distribution.
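The "changing downloadURL generates a NEW distribution" behavior can be sketched as content-derived identity. This is a speculative model, not DKAN's actual code: the hash-plus-timestamp directory names in the QA steps (e.g. 0821c20012819a030c6fc33f6b329643_1687525521) merely suggest that the resource identifier is derived from the download URL, so any URL change produces a new identifier and hence a new referenced node.

```python
import hashlib

def resource_identifier(download_url: str, mimetype: str = "text/csv") -> str:
    # Hypothetical derivation: identity as a checksum of identifying fields.
    # Under this model, editing downloadURL cannot update a distribution in
    # place; it necessarily mints a new identifier (a new distribution node).
    return hashlib.md5(f"{download_url} {mimetype}".encode()).hexdigest()

a = resource_identifier("https://example.com/data.csv")
b = resource_identifier("https://example.com/data-v2.csv")
assert a != b  # a changed URL yields a brand-new identity
```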
