Fix #272 (wrong slugs for non-latin chars) #273

Closed
wants to merge 2 commits into
from

Projects

None yet

4 participants

@spapas
Contributor
spapas commented Oct 11, 2014

This PR resolves #272.

@spapas
Contributor
spapas commented Oct 19, 2014

I'd forgoten to include Unidecode to install_requires of setup.py - now the tests pass.

@fizista
fizista commented Dec 12, 2014

License Package "Unidecode" would require changing the license taggit package under the GPL. And besides, I have a problem after applying this patch.

@spapas
Contributor
spapas commented Dec 12, 2014

Using a GPL program (without including it anywhere) will require to change the license of django-taggit ? Are you sure about that ?

Also, what problem do you have ?

@fizista
fizista commented Dec 12, 2014

Read the second point of the license.
http://www.gnu.org/licenses/gpl-2.0.html

If the library was under the LGPL license you do not need to change the current library license.
http://fosswire.com/post/2007/04/the-differences-between-the-gpl-lgpl-and-the-bsd/

@fizista
fizista commented Dec 14, 2014

A good solution would be to implement this library here:
https://github.com/un33k/django-uuslug

@spapas
Contributor
spapas commented Dec 15, 2014

@fizista django-uuslug seems like a very nice solution, however if you check its requirements, it uses python-slugify (https://github.com/un33k/python-slugify/) which ... (guess what)

also uses Unidecode: https://github.com/un33k/python-slugify/blob/master/requirements.txt

Also, for reference, django uses a simple javascript search-and-replace algorithm in the admin prepopulate_fields which contains all the replacements it will do to create unicode slugs (https://github.com/django/django/blob/master/django/contrib/admin/static/admin/js/urlify.js) , for example

var GREEK_MAP = {
    'α':'a', 'β':'b', 'γ':'g', 'δ':'d', 'ε':'e', 'ζ':'z', 'η':'h', 'θ':'8',

However, this is in javascript!

I think that Unidecode is the only way to create unicode-compliant slugs in python because it contains a complete list of transliterations of unicode characters to latin ones (much more complete than the one django-admin uses) - for example, here are the greek letters: https://github.com/iki/unidecode/blob/master/unidecode/x003.py (from 0x91 = Α, Β, Γ etc)

What could we do to enable unicode slugs ?

Here's a thought: If I removed unidecode from the dependencies of setup.py (so unidecode wouldn't be a requirement of djangoi-taggit) and just do the following in models.py:

slug = default_slugify(tag)
try:
    from unidecode import unidecode
    slug = default_slugify(unidecode(tag))
except:
    slug = default_slugify(tag)

(meaning if the user has installed unidecode then use it to transliterate the tag, if not fail silently and use tag as is). This way, django-taggit would just use the public API of unidecode if the user has installed it himself and wouldn't require it at all. Would this also violate the GPL ?

@fizista
fizista commented Dec 15, 2014

You're right. Licenses sometimes make me crazy.

I found some other solution:
https://github.com/mozilla/unicode-slugify/

The only question is whether we can in URLs, place a non-ASCII characters, which, however, are letters.

If the Mozilla Foundation use this library in your own projects, so I guess there is no problem?

@spapas
Contributor
spapas commented Dec 15, 2014

The unicode-slugify library seems to be ok concerning the license. However, I don't really like the way it works:

From what I've seen, it just converts the unicode strings to unicode slugs by removing spaces, dots etc. It won't do any transliteration of unicode characters to latin (for instance α το a or γ to g etc). Django-admin does not work like this (it works as I've explained in my previous answer) since it transliterates the characters to their latin counterparts.

Also, another reason against the use of unicode characters in slugs is that when you copy the slug from the browser, it will copy the unicode entities representation of the slug. What I mean is that if I go to the url: http://example.com/δοκιμη and then try to copy that URL, I will receive http://example.com/%CE%B4%CE%BF%CE%BA%CE%B9%CE%BC%CE%B7 when I paste it (!)

Of course, some sites do use unicode characters in URLs (for instance, the greek version of wikipedia http://el.wikipedia.org/wiki/%CE%86%CE%BB%CF%86%CE%B1) however I still think that we need to be consistent with django. So I recommend finding out a different solution...

@spapas spapas referenced this pull request in django-crispy-forms/django-crispy-forms Jan 15, 2015
Closed

Tab layout not working when tab name is unicode #396

@yohanboniface

Is there any official status of this PR from the maintainers?

I'm interested too to have it merged. And I also think that using unidecode only if installed (as suggested in #273 (comment)) could be a nice way to keep the default installation light and without deps, and to prevent any licence change issue.

@yohanboniface yohanboniface referenced this pull request in ideascube/ideascube Mar 30, 2015
Open

arabic tags have no slug #118

@yohanboniface yohanboniface added a commit to ideascube/ideascube that referenced this pull request Mar 30, 2015
@yohanboniface yohanboniface Monkeypatch TagBase.slugify to allow non ascii chars (cf #118)
It's possible to override the Tag model used by taggit, but this
means customizing two models, and so having new table in database
(or hardcoding their name to match the current taggit ones). And I
prefer not to go such a complex way for only overriding a slugify
method.
Let's wait for alex/django-taggit#273 or
any other option without impact on DB.
e67d282
@frewsxcv
Collaborator
frewsxcv commented Apr 1, 2015

It'd be great if this could be a feature one could opt-in to instead of having a library that everyone has to install. Maybe some way for the user to specify a function they want to use to generate a slug? Also, I'm not really a fan of adding a GPL dependency to a BSD licensed library.

@spapas
Contributor
spapas commented Apr 1, 2015

@frewsxcv I proposed on my 15 December 2014 comment (#273 (comment)) a solution that doesn't require install of unidecode (use a try/catch block to check if unidecode is installed and fall back to normal slugify if it's not been installed) however I don't want to update my PR until I know it'll be merged ☺️

@frewsxcv
Collaborator

Anyone have opinions on #315 before I merge it?

@spapas
Contributor
spapas commented Jun 23, 2015

@frewsxcv 👍

@frewsxcv
Collaborator

#315 has been merged

0.15.0 has been released which includes it https://github.com/alex/django-taggit/blob/develop/CHANGELOG.txt#L4

@frewsxcv frewsxcv closed this Jun 24, 2015
@frewsxcv
Collaborator

Also, if anyone wants to go above and beyond, we'd probably eventually want a regression test and also a mentioning of the optional dependency (unidecode) in the documentation. If no one else gets around to it, I can take care of it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment