Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CKAN Harvester option to include/exclude organizations #169

Merged
merged 3 commits into from
Oct 27, 2015

Conversation

davidread
Copy link
Contributor

This PR replaces #168 because that one merged to the tester branch.

@davidread
Copy link
Contributor Author

@amercader any comment on this one?

@amercader
Copy link
Member

AFAICT this uses the old REST API. I wonder if once #158 gets merged we can instead add a generic "filter" option which essentially allows you to pass any package_search filter that gets added to the request made as fq, eg:

# Only include this org:
{"filter": "organization:arkansas-gov"} 
{"filter": "+organization:arkansas-gov"}

# Exclude this org:
{"filter": "-organization:arkansas-gov"}

But it also will support any arbitrary search, eg:

{"filter": "organization:arkansas-gov OR organization:alabama-gov"} 
{"filter": "tags:to-be-harvested"} 
{"filter": "res_format:CSV"} 
...

It seems not worth it having the special case of handling orgs if we can implement the generic filter (and then document specific examples of the most common use cases, including importing and excluding orgs).

What do you think?

@davidread
Copy link
Contributor Author

I'm not sure about embedding SOLR syntax for this purpose. It's harder to write the config string, isn't validated, can vary between CKAN instances, difficult to replace the implementation etc, whilst clearly it is more flexible. I guess you're going to have to look up the docs anyway when you add a filter, so it might as well be in SOLR syntax than some long json key. I'm not convinced either way, but if you think this has merit then am happy to go with it.

@amercader
Copy link
Member

I'm definitely happy to keep the organizations_filter_include and organizations_filter_exclude options to make it easier to configure. Internally I think they should use search filters once #158 gets merged. I think it will be more efficient directly getting the relevant dataset dicts than getting all the datasets ids and exclude or include the ones on the org datasets list.

@davidread
Copy link
Contributor Author

Ok sounds good. Do you want to merge this then and then we can update #158 ?

@amercader amercader merged commit bc49149 into master Oct 27, 2015
@amercader
Copy link
Member

Merged!

@davidread davidread deleted the include-exclude-org branch October 27, 2015 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants