Remove extra / in feed ids. #3026

Closed
wants to merge 2 commits into from
Contributor

k-nut commented May 17, 2016

Fixes #3017

Proposed fixes:

The resource_path passed to the _create_atom_id function seems to always start with a /. I did not want to strip it manually, though, because the value passed here might be different at some point. Fortunately, urlparse.urljoin does not produce a duplicate slash when joining the site URL with a path that starts with /, so this should work.
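A minimal sketch of the difference between the two return expressions, for illustration only. The site URL and feed path are made-up values, and the example uses Python 3's urllib.parse.urljoin, which is where Python 2's urlparse.urljoin now lives. Note that urljoin does not collapse slashes in general; it resolves the path as a reference against the base URL.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Hypothetical values, not taken from any real CKAN configuration.
site_url = 'http://example.com'
resource_path = '/feeds/group/foo.atom'  # leading slash, as described above

# Old behaviour: plain string join keeps both slashes.
old_id = '/'.join([site_url, resource_path])

# Proposed behaviour: resolve the path against the site URL.
new_id = urljoin(site_url, resource_path)

print(old_id)  # http://example.com//feeds/group/foo.atom
print(new_id)  # http://example.com/feeds/group/foo.atom
```

The two strings differ, which is also why existing feed ids would change if this were merged.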

The test I added is almost certainly not in the correct file (the other three all exercise the API, while this one tests a private function), but I did not know where to put it.

Feedback on this is most welcome.

PS: I also marked this as touching the API. Judging from the docstring, the specification seems to be very strict about identifiers never changing. If we merge this change, though, the ids in the feed will change (because they no longer contain the duplicate slash). I am not sure how to handle this.

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

@@ -145,7 +145,7 @@ def _create_atom_id(resource_path, authority_name=None, date_string=None):
         # This is best we can do, and if the site_url is not set, then
         # this still results in an invalid feed.
         site_url = config.get('ckan.site_url', '')
-        return '/'.join([site_url, resource_path])
+        return urlparse.urljoin(site_url, resource_path)
Contributor

Still not quite right. We need to use url_for here so that the URL is built correctly, including the site_url and root_path settings. This fix will only work for sites without a root_path setting.
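To illustrate why a path prefix does not survive urljoin, here is a small sketch. It assumes, purely for demonstration, a site mounted under a hypothetical /data/ prefix that has been placed in the base URL; how CKAN actually applies root_path is not shown here. A reference that starts with / replaces the whole path of the base, so the prefix is dropped.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Hypothetical base URL with a mount prefix; illustrative only.
base = 'http://example.com/data/'

# A reference with a leading slash replaces the base path entirely.
absolute_ref = urljoin(base, '/feeds/group/foo.atom')

# A reference without a leading slash is resolved below the prefix.
relative_ref = urljoin(base, 'feeds/group/foo.atom')

print(absolute_ref)  # http://example.com/feeds/group/foo.atom
print(relative_ref)  # http://example.com/data/feeds/group/foo.atom
```

Since resource_path is reported to always start with /, joining it this way cannot carry a prefix, which is consistent with the suggestion to let url_for build the URL.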

Contributor Author

So then we do not pass resource_path to this function anymore but pass some more data and create the actual path (or rather full url) in this function?

Contributor

That sounds right. IIUC, the intent of this code is to generate a URL back to a feed controller method, so we should be able to take the parameters passed to that controller method, plus the name of the controller, and call url_for to generate the correct URL.

Contributor Author

k-nut commented May 18, 2016

@wardi it makes me wonder if the _create_atom_id function is even necessary in its current form. It takes the authority_name=None and date_string=None parameters, but they are never passed (so they are always None). Could we just replace them with a single **kwargs and pass those keyword arguments directly to the url_for method?
So the function would be:

def _create_atom_id(**kwargs):
    authority_name = config.get('ckan.feeds.authority_name', '').strip()
    if not authority_name:
        site_url = config.get('ckan.site_url', '').strip()
        authority_name = urlparse.urlparse(site_url).netloc

    if not authority_name:
        log.warning('No authority_name available for feed generation.  '
                    'Generated feed will be invalid.')

    date_string = config.get('ckan.feeds.date', '')

    if not date_string:
        log.warning('No date_string available for feed generation.  '
                    'Please set the "ckan.feeds.date" config value.')
        return h.url_for(**kwargs)

    tagging_entity = ','.join([authority_name, date_string])
    resource_path = h.url_for(**kwargs)
    return ':'.join(['tag', tagging_entity, resource_path])

And we would for example call it like this:

 guid = _create_atom_id(controller='feed',
                        action='group',
                        id=obj_dict['name'])

That would still make sure that the date string and authority are taken from the config if specified.
Or is there a reason for keeping the two parameters that seem to never be passed?
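For reference, the id-building part of the proposal can be sketched on its own, without CKAN's config or url_for. The helper names authority_from_site_url and make_tag_uri and all values below are hypothetical; only the join logic mirrors the function above.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def authority_from_site_url(site_url):
    """Fallback authority: the host part of the site URL (hypothetical helper)."""
    return urlparse(site_url.strip()).netloc


def make_tag_uri(authority_name, date_string, resource_path):
    """Build 'tag:<authority>,<date>:<path>' the same way the proposal does."""
    tagging_entity = ','.join([authority_name, date_string])
    return ':'.join(['tag', tagging_entity, resource_path])


# Made-up inputs for illustration.
authority = authority_from_site_url('http://example.com')
tag_id = make_tag_uri(authority, '2016-05-18', '/feeds/group/foo.atom')

print(tag_id)  # tag:example.com,2016-05-18:/feeds/group/foo.atom
```

Any change to the path portion, such as removing a slash, yields a different id string for the same feed.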

Contributor Author

k-nut commented May 18, 2016

An implementation of the behavior described above can be found here: https://github.com/ckan/ckan/compare/master...k-nut:3017-remove-double-slash-in-feed-ids?expand=1

wardi self-assigned this May 19, 2016
Contributor Author

k-nut commented May 23, 2016

As agreed in #3017, this should not be changed, as it would create a new id for all datasets, which is not desirable.

k-nut closed this May 23, 2016
Development

Successfully merging this pull request may close these issues.

Feeds has bug (extra '/' added to <id>)
2 participants