System Catalog exposed as a CKAN dataset #59

Open
jqnatividad opened this issue May 28, 2014 · 2 comments

@jqnatividad
Contributor

There are several mechanisms, like DCAT and data.json, for exposing the catalog metadata. Why not also expose some system-generated datasets in the CKAN catalog?

This echoes the system catalog tables common in relational databases (pg_class in PostgreSQL, information_schema in MySQL, etc.), so that you can use the same CKAN API commands to interrogate the CKAN system catalog.

This also helps with common developer issues like looking up the table_id of a dataset in the datastore, etc.

@davidread

I think it is great to expose metadata about the catalogue - where it lives, who's responsible, who to email, a bit of text about it, link to the API, link to the latest dump (see #48). Anything else? Some of this is available in /api/util/status e.g. http://data.gov.uk/api/util/status but I like the idea of having a better named API call that returns this info as JSON.

However I'm not sure about putting this catalogue metadata into an automatically created package. Different catalogues have different package schemas and conventions with how you fill out fields like resource format. I think it would be better to let catalogue owners publish their catalogue metadata in their catalog how they wish. We do it here for example: http://data.gov.uk/dataset/data_gov_uk-datasets

BTW I believe doing a VoID description of the catalogue is another way to write the catalogue metadata that linked data types use, although that's probably another kettle of fish to worry about another time.

@jqnatividad
Contributor Author

Thanks for your feedback @davidread! I agree that data publishers should have control over how they publish their metadata, but it would be nice to have an out-of-the-box way to query the catalog metadata and have it returned not only as JSON but also through the same familiar CKAN UI that even non-technical data consumers can grok.

And it's not just the high-level catalog metadata, but also detailed structural metadata about the datasets that the underlying PostgreSQL ALREADY maintains, like the columns in a dataset, the number of rows, the size of the dataset, etc.

And if these views are nothing more than filtered versions of PostgreSQL's system catalog, the beauty of this approach is that this structural metadata is automagically maintained.
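To make the idea concrete, here is a rough sketch of the kind of queries such a filtered view could wrap. It assumes the datastore tables live in the `public` schema, that `cursor` is a DB-API cursor using the `%s` parameter style (as psycopg2 does), and that `structural_metadata` is a hypothetical helper name, not anything that exists in CKAN. Note that `reltuples` is only the planner's row estimate, not an exact count.

```python
# Columns of one table, read from the standard information_schema.
COLUMNS_SQL = """
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = %s
ORDER BY ordinal_position
"""

# Estimated row count and on-disk size, read from PostgreSQL's pg_class.
SIZE_SQL = """
SELECT c.reltuples::bigint AS approx_rows,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relname = %s AND c.relkind = 'r'
"""


def structural_metadata(cursor, table_name):
    """Collect columns, estimated rows and size for one datastore table.

    Hypothetical helper: `cursor` is any DB-API cursor connected to the
    datastore database; `table_name` is the table (resource) id.
    """
    cursor.execute(COLUMNS_SQL, (table_name,))
    columns = [{"name": name, "type": dtype} for name, dtype in cursor.fetchall()]

    cursor.execute(SIZE_SQL, (table_name,))
    row = cursor.fetchone()
    approx_rows, total_bytes = row if row else (None, None)

    return {
        "table": table_name,
        "columns": columns,
        "approx_rows": approx_rows,
        "total_bytes": total_bytes,
    }
```

Because both queries read only from the system catalog, nothing has to be kept in sync by CKAN itself; the numbers are whatever PostgreSQL currently reports.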

Already, there is a _table_metadata view in the datastore and you can create user-friendly aliases (http://docs.ckan.org/en/latest/maintaining/datastore.html?highlight=datastore#resource-aliases).
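As a minimal sketch of what is already possible, the `_table_metadata` view can be read through the regular `datastore_search` action by passing it as the `resource_id`. The site URL below is a placeholder, the helper names are made up for illustration, and the `name` and `alias_of` record fields are what I understand the view to return, so treat them as assumptions and check them against your CKAN version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def table_metadata_url(site_url, limit=100):
    """Build the datastore_search URL that reads the _table_metadata view."""
    query = urlencode({"resource_id": "_table_metadata", "limit": limit})
    return f"{site_url.rstrip('/')}/api/3/action/datastore_search?{query}"


def split_tables_and_aliases(records):
    """Separate real tables from aliases.

    Assumes each record has a `name` and an `alias_of` field, where
    `alias_of` is empty for real tables.
    """
    tables, aliases = [], {}
    for rec in records:
        if rec.get("alias_of"):
            aliases[rec["name"]] = rec["alias_of"]
        else:
            tables.append(rec["name"])
    return tables, aliases


def list_datastore_tables(site_url):
    """Fetch the view over HTTP and return (tables, aliases)."""
    with urlopen(table_metadata_url(site_url)) as resp:
        payload = json.load(resp)
    return split_tables_and_aliases(payload["result"]["records"])
```

Calling `list_datastore_tables("https://ckan.example.org")` (placeholder host) would then give the table ids plus a mapping from each alias to the table it points at, which is the table_id lookup mentioned at the top of this issue.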
