New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource Queries (CREATE TABLE AS in datastore) #3816

Open
wants to merge 49 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@wardi
Contributor

wardi commented Sep 15, 2017

from ckan/ideas-and-roadmap#200

Add the option to build a resource from an SQL query instead of an uploaded or linked file, internally implemented as PostgreSQL materialized views

q1

Queries are the same SQL that users can send to datastore_search_sql

q2

This will solve several problems:

  • make existing visualization extensions more useful. The problem with the low usability of the existing viz extensions is that vizzes in general are meant to work against aggregated data, not raw observation-level data
  • allow us to do joins across related packages/resources
  • allow us to publish public views of private datasets with private columns removed
  • and make it all performant as the queries are "materialized"

This feature can be thought of as a performance/usability improvement over what users can do today with datastore_search_sql and datastore_create. A query in a resource may only be modified if the user requesting the change has access to make the same query with datastore_search_sql. The same is true for refreshing the underlying data: the user requesting the refresh must have permission to run the same query.

This way we're not introducing any new ways for less privileged users to expose data that they shouldn't have access to.

Setting up triggers to automatically refresh the underlying views and/or adding jobs for this purpose are sensible next steps but not included in this initial PR.

@wardi wardi added the WIP label Sep 15, 2017

@wardi wardi force-pushed the 3816-query-based-views branch from c699c29 to bfe02d2 Oct 11, 2017

@wardi

This comment has been minimized.

Contributor

wardi commented Oct 11, 2017

This PR includes two other PRs not yet merged into master, so they should be reviewed first: #3865 and #2562

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

jguo144 added a commit to OpenGov-OpenData/ckan that referenced this pull request Jul 5, 2018

@amercader

@wardi I found some time to revisit this, and left a few comments

@@ -55,17 +55,20 @@ CREATE OR REPLACE VIEW "_table_metadata" AS
substr(md5(dependee.relname || COALESCE(dependent.relname, '')), 0, 17) AS "_id",

This comment has been minimized.

@amercader

amercader Aug 24, 2018

Member

Can you summarize what the changes in the permissions mean here? It's a bit hard to follow.

Will all existing DataStore users need to re-run this against their existing database?

This comment has been minimized.

@wardi

wardi Aug 25, 2018

Contributor

I should actually remove this change, this just makes materialized views visible in _table_metadata but the PR is no longer using materialized views

ckan.datastore.resource_query.enabled = False
Default value: ``True``

This comment has been minimized.

@amercader

amercader Aug 24, 2018

Member

I'm not sure how I feel about having this feature enabled by default. It is definitely an advance feature and I think most users will be confused by it. My vote is for defaulting it to False.

This comment has been minimized.

@wardi

wardi Aug 25, 2018

Contributor

I'm fine with that.

query at any time from the resource preview screen.<br/>
<b>Note:</b> This query will run
with your permissions but the data will be published and visible
along with other data in this dataset.{% endtrans %}

This comment has been minimized.

@amercader

amercader Aug 24, 2018

Member

Does this mean that considering

  • Dataset A, public
  • Dataset B, private
  • User X, with admin permissions on Dataset B

If user X is creating a resource query in dataset A that includes data from a resource in dataset B, the data from that new resource query will be publicly accessible under dataset A?

If so there's a potential for users exposing private data without realizing. The message should reflect this better, and there should be a warning in the documentation (datastore.rst)

This comment has been minimized.

@wardi

wardi Aug 25, 2018

Contributor

Agreed, can you suggest some better text?

A query could be used to publish from a private table while removing all PII information, so this isn't always a drawback. Also there's nothing new being allowed here that couldn't already be done by the same user with datastore_search_sql and datastore_upsert.

This comment has been minimized.

@amercader

amercader Aug 27, 2018

Member

Also there's nothing new being allowed here that couldn't already be done by the same user with datastore_search_sql and datastore_upsert

True, but this feature makes it much easier to do, and so more prone to accidents.

What about something like:

The query will be run using your permissions but the data will be published using this dataset visibility settings, regardless of the permissions on other resources used. Keep this in mind when querying private datasets.

This comment has been minimized.

@wardi

wardi Aug 27, 2018

Contributor

Sounds much better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment