🔌 plugin_data & IDataDictionaryForm for datastore data dictionary #7971

Closed
wardi opened this issue Dec 8, 2023 · 0 comments · Fixed by #8014
wardi commented Dec 8, 2023

Since the introduction of the datastore data dictionary feature, the uses for storing different kinds of per-column data have been multiplying. The original feature included:

  • label and notes text fields
  • type override drop-down for the next datapusher/xloader run

preface

Plugins can add their own fields to the data dictionary by overriding the template, and users can pass any JSON values they like using the datastore_create API. For example, open.canada.ca adds French versions of the label and notes fields (label_fr, notes_fr).

The upcoming #7882 table designer feature hides the type override field and adds:

  • hidden tdtype value
  • minimum, maximum constraints
  • regex pattern
  • choices
  • etc.

https://github.com/dathere/datapusher-plus loads data into the datastore and needs a place to add qsv column statistics like:

  • detected type
  • data range
  • frequencies and values
  • etc.

We're also working on a feature to allow column reordering which will need a value stored like:

  • column weight/order value

problems

  1. All of these values would have to share the same info dict in each field and there's no way to prevent values from being overwritten or deleted when one plugin updates its values and is unaware of the others.

  2. Any values stored are always returned by datastore_search and datastore_info. This bloats the responses and rules out storing larger data (e.g. frequency tables) this way.

  3. No validation is possible for fields sent to info. Users can create any key they like, with any type, regardless of the intended use.
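
To make problem 1 concrete, here is a minimal sketch of how a shared info dict loses data. The function name and the sample keys are made up for illustration and are not taken from any real plugin.

```python
# Hypothetical illustration: one plugin rewrites the shared "info" dict
# using only the keys it knows about, so other plugins' values vanish.
def naive_table_designer_update(field):
    """Rebuild info from only the keys this (made-up) plugin manages."""
    return {"id": field["id"], "info": {"tdtype": "integer", "minimum": 0}}


# A field whose info already holds values written by other plugins.
field = {"id": "age", "info": {"label": "Age", "frequency": {"30": 12}}}

updated = naive_table_designer_update(field)

# Keys that were present before the update and are gone afterwards.
lost_keys = sorted(set(field["info"]) - set(updated["info"]))
print(lost_keys)
```

Nothing in the flat structure tells the updating plugin which keys belong to someone else, which is what the proposal below addresses.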

proposal

Let's divide the column description JSON into per-plugin values, like we have for plugin_data fields, e.g.:

| top level key | second level keys | description |
|---|---|---|
| ckan | label, notes | standard human-friendly name+description of column |
| ckan | order | override for default ordering of columns |
| datapusher | type_override | import type for next datapusher/xloader run |
| canada | label_fr, notes_fr | French versions of standard fields our site needs* |
| tabledesigner | tdtype, choices, minimum, ... | table designer config |
| statistics | type, minimum, maximum, frequency, ... | datapusher+/qsv statistics |

* this could be generalized for other sites/languages

Now each plugin has its own namespace for field information.
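
As a rough sketch of what the namespaced layout in the table could look like for one column, assuming it is stored as a plain JSON object: the sample values and the `update_namespace` helper are hypothetical and only illustrate that an update can be limited to one plugin's key.

```python
import json

# Illustrative per-plugin column description; all values are invented.
column_description = {
    "ckan": {"label": "Age", "notes": "Age in years", "order": 2},
    "datapusher": {"type_override": "numeric"},
    "tabledesigner": {"tdtype": "integer", "minimum": 0},
    "statistics": {"type": "Integer", "minimum": 0, "maximum": 104},
}


def update_namespace(description, plugin, values):
    """Return a copy of description with only `plugin`'s namespace replaced."""
    updated = dict(description)
    updated[plugin] = dict(values)
    return updated


new_description = update_namespace(
    column_description, "tabledesigner", {"tdtype": "integer", "minimum": 18}
)
print(json.dumps(new_description, indent=2))
```

The other namespaces (here `statistics`, `ckan` and `datapusher`) pass through unchanged because the update never looks at them.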

IDataDictionaryForm

Next we can use custom validation rules to parse the field info values passed to datastore_create, clean them and possibly return validation errors, then store the actual values in the per-plugin column description JSON as shown above. These validators can choose whether to clear out values that were not passed, so values like statistics can be kept when e.g. only the label or notes values are being changed.

Each plugin can extend the info schema with its own validators and all validators would apply to all field info values passed. The resource and column description objects are available in the validators' context so that a validator can e.g. only run if the url_type applies (table designer config, datapusher type override) and store/update/remove values in the per-plugin description object.
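
The validation flow described above might be sketched in plain Python as follows. This is not the CKAN validator or schema API: the function names, the `(info, plugin_data, resource)` signature, and the `"tabledesigner"` url_type value are assumptions made for the example.

```python
import copy


def validate_ckan_fields(info, plugin_data, resource):
    """Store label/notes if passed; keys that were not passed are kept."""
    ns = plugin_data.setdefault("ckan", {})
    for key in ("label", "notes"):
        if key in info:
            if not isinstance(info[key], str):
                raise ValueError(f"{key} must be text")
            ns[key] = info[key]


def validate_table_designer(info, plugin_data, resource):
    """Only applies when the resource's url_type matches (assumed value)."""
    if resource.get("url_type") != "tabledesigner":
        return
    if "minimum" in info:
        try:
            minimum = int(info["minimum"])
        except (TypeError, ValueError):
            raise ValueError("minimum must be an integer") from None
        plugin_data.setdefault("tabledesigner", {})["minimum"] = minimum


# Every registered validator sees every info dict that is passed in.
VALIDATORS = [validate_ckan_fields, validate_table_designer]


def apply_validators(info, plugin_data, resource):
    """Return (plugin_data, errors); stored data is unchanged on error."""
    candidate = copy.deepcopy(plugin_data)
    errors = []
    for validator in VALIDATORS:
        try:
            validator(info, candidate, resource)
        except ValueError as e:
            errors.append(str(e))
    return (plugin_data if errors else candidate), errors


stored = {"ckan": {"label": "Age", "notes": "old"}, "statistics": {"maximum": 104}}

# Changing only the label keeps notes and statistics.
updated, errors = apply_validators(
    {"label": "Age (years)"}, stored, {"url_type": "upload"}
)

# An invalid table designer value is reported and nothing is stored.
rejected, bad_errors = apply_validators(
    {"minimum": "abc"}, stored, {"url_type": "tabledesigner"}
)
```

Working on a deep copy means a failed validation leaves the stored per-plugin values untouched, which matches the goal of not losing data another plugin wrote.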

When returning field info values from datastore_search or datastore_info, plugins provide a method that converts the per-plugin column description JSON values back to a flat info dict, to maintain backwards compatibility. We can add an option to datastore_info so users can request specific data that isn't output by default (e.g. statistics data or column order values), and the plugin method can populate info with those additional values.
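
One possible shape for that conversion, again as a plain-Python sketch rather than the actual interface: the builder functions, the `include` option, and the choice to nest statistics under a `statistics` key are all assumptions for illustration.

```python
def ckan_info(plugin_data, include):
    """Default output: label and notes; order only when requested."""
    ns = plugin_data.get("ckan", {})
    info = {key: ns[key] for key in ("label", "notes") if key in ns}
    if "order" in include and "order" in ns:
        info["order"] = ns["order"]
    return info


def statistics_info(plugin_data, include):
    """Statistics can be large, so they are opt-in in this sketch."""
    if "statistics" not in include:
        return {}
    return {"statistics": dict(plugin_data.get("statistics", {}))}


# Hypothetical registry of per-plugin conversion methods.
INFO_BUILDERS = [ckan_info, statistics_info]


def build_info(plugin_data, include=()):
    """Merge each plugin's contribution into one flat info dict."""
    info = {}
    for builder in INFO_BUILDERS:
        info.update(builder(plugin_data, include))
    return info


plugin_data = {
    "ckan": {"label": "Age", "notes": "Age in years", "order": 2},
    "statistics": {"minimum": 0, "maximum": 104},
}

default_info = build_info(plugin_data)
full_info = build_info(plugin_data, include=("order", "statistics"))
```

With no options the result has the same flat label/notes shape existing clients expect, and the extra values appear only when asked for.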

implementation details

Performance of datastore_search should not be negatively affected, so we'll need to cache the info values returned and not loop through plugins parsing and generating info json on every call. A simple way to do this would be to cache values as a new top level key in the same column description json and have PostgreSQL return only that value for the columns returned.
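
A minimal sketch of the caching idea, assuming the cache lives under a reserved top level key in the same JSON document. The `_info` key name and both helper functions are hypothetical; on the database side, a JSON operator such as PostgreSQL's `->` could presumably be used to extract just that key.

```python
import json

# Hypothetical reserved key; the underscore keeps it apart from plugin names.
CACHE_KEY = "_info"


def with_cached_info(plugin_data, build_info):
    """Serialize plugin_data with the flat info dict precomputed once."""
    stored = {k: v for k, v in plugin_data.items() if k != CACHE_KEY}
    stored[CACHE_KEY] = build_info(stored)
    return json.dumps(stored)


def read_cached_info(column_description_json):
    """Read path: return the cached value without calling any plugin."""
    return json.loads(column_description_json).get(CACHE_KEY, {})


def example_build_info(plugin_data):
    """Stand-in for the plugin loop; only label and notes by default."""
    ns = plugin_data.get("ckan", {})
    return {key: ns[key] for key in ("label", "notes") if key in ns}


description_json = with_cached_info(
    {"ckan": {"label": "Age", "order": 2}, "statistics": {"maximum": 104}},
    example_build_info,
)
cached = read_cached_info(description_json)
```

The plugin loop runs only on write, so each search just reads back a small precomputed value per column.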

@wardi wardi changed the title :plug: plugin_data for data dictionary 🔌 plugin_data for data dictionary Dec 8, 2023
@wardi wardi changed the title 🔌 plugin_data for data dictionary 🔌 plugin_data for datastore data dictionary Dec 8, 2023
@wardi wardi changed the title 🔌 plugin_data for datastore data dictionary 🔌 plugin_data & IDataDictionaryForm for datastore data dictionary Dec 8, 2023
@wardi wardi self-assigned this Dec 12, 2023
wardi added a commit that referenced this issue Jan 16, 2024
wardi added a commit that referenced this issue Jan 16, 2024
wardi added a commit that referenced this issue Jan 16, 2024
wardi added a commit that referenced this issue Jan 22, 2024
wardi added a commit that referenced this issue Jan 22, 2024
wardi added a commit that referenced this issue Jan 26, 2024
wardi added a commit that referenced this issue Feb 5, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
wardi added a commit to open-data/ckan that referenced this issue Feb 8, 2024
JVickery-TBS pushed a commit to open-data/ckan that referenced this issue Jun 18, 2024
# Conflicts:
#	ckanext/datastore/blueprint.py
#	ckanext/datastore/logic/action.py
#	ckanext/datastore/templates/datastore/dictionary.html
#	ckanext/example_idatadictionaryform/plugin.py
#	ckanext/example_idatadictionaryform/templates/datastore/snippets/dictionary_form.html
### RESOLVED.
JVickery-TBS pushed a commit to open-data/ckan that referenced this issue Jun 18, 2024
# Conflicts:
#	ckanext/datastore/backend/postgres.py
#	ckanext/datastore/interfaces.py
#	ckanext/datastore/logic/action.py
### RESOLVED.
JVickery-TBS pushed a commit to open-data/ckan that referenced this issue Jun 19, 2024
# Conflicts:
#	ckanext/datastore/blueprint.py
#	ckanext/datastore/interfaces.py
#	ckanext/datastore/logic/action.py
#	ckanext/datastore/templates/datastore/snippets/dictionary_form.html
#	ckanext/example_idatadictionaryform/plugin.py
#	ckanext/example_idatadictionaryform/templates/datastore/snippets/dictionary_form.html
### RESOLVED.