slugify can only handle ASCII chars #9
Since you regularly deal with modified Latin vowels yourself, which one do you prefer? I was thinking of doing no. 4), but with an extra option in … Alternatively, we could let the user set their own desired character replacement scheme in …
Personally, I would prefer no. 4, as it would not lose information and slugs would still be sensible words. A slug like "Khe" instead of "Kuehe" (from "Kühe", cows) is basically nonsensical. The only argument against it is the additional regex rules, i.e. the extra processing time.
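A minimal sketch of option 4, assuming a hypothetical German-only mapping table and hypothetical helper names (`GERMAN_MAP`, `transliterate`, `slugify`); this is not Volt's code. All mapped characters are handled in a single regex pass, so the extra cost is one substitution per title rather than one per character:

```python
import re

# Hypothetical German transliteration table: each umlaut or sharp s maps to
# its conventional multi-letter ASCII spelling instead of being dropped.
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

# One alternation that matches any single character present in GERMAN_MAP.
_TRANSLIT_RE = re.compile("|".join(map(re.escape, GERMAN_MAP)))


def transliterate(text: str) -> str:
    """Replace each mapped character with its ASCII spelling in one pass."""
    return _TRANSLIT_RE.sub(lambda m: GERMAN_MAP[m.group(0)], text)


def slugify(title: str) -> str:
    """Sketch of option 4: transliterate first, then lowercase and hyphenate."""
    text = transliterate(title).lower()
    # Drop anything that is not an ASCII letter, digit, whitespace or hyphen.
    text = re.sub(r"[^a-z0-9\s-]", "", text)
    # Collapse runs of whitespace/hyphens into a single hyphen.
    return re.sub(r"[\s-]+", "-", text).strip("-")


print(slugify("Kühe auf der Weide"))  # kuehe-auf-der-weide
```

The slug stays readable ("kuehe") instead of collapsing to "khe".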
I agree with you on no. 4). However, I do think we need an extra configuration setting, for the following reason. The number of non-ASCII vowels and consonants that can be replaced with one or more ASCII characters is large. Doing a replacement for each of them would waste a lot of CPU cycles, and I suspect each user will only care about a subset of these characters, which makes the problem worse. With an extra configuration setting, we can just let Volt do a simple …
There is the drawback that anyone who wants to use non-ASCII vowels and consonants will have to set their own rules, which could be repetitive. However, this may be solved by including language-specific defaults, so the user can either specify their desired replacements explicitly or just pick a language-dependent scheme.
In addition, I was thinking that we might not need regex at all. I don't know the exact speed comparison (I should look it up), but we can also do a simple …
EDIT: I made a new branch to play around with the idea, and it seems to work OK for now (all tests passed).
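A rough sketch of the configurable scheme described above, using `str.translate` instead of regex. `LANGUAGE_DEFAULTS`, `build_table` and `replace_chars` are hypothetical names, and the tables are illustrative rather than complete; only the characters the user actually configures end up in the table:

```python
from typing import Dict, Optional

# Hypothetical language-specific defaults (illustrative, not exhaustive,
# and not Volt's actual configuration).
LANGUAGE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
    "da": {"æ": "ae", "ø": "oe", "å": "aa"},
}


def build_table(language: Optional[str] = None,
                custom: Optional[Dict[str, str]] = None) -> dict:
    """Merge a language default with user overrides into a translate table."""
    mapping: Dict[str, str] = {}
    if language is not None:
        mapping.update(LANGUAGE_DEFAULTS[language])
    if custom:
        # User-supplied rules take precedence over the language defaults.
        mapping.update(custom)
    # str.maketrans accepts a dict of single characters -> replacement strings.
    return str.maketrans(mapping)


def replace_chars(text: str, table: dict) -> str:
    """Apply every configured replacement in a single pass, without regex."""
    return text.translate(table)


table = build_table(language="de", custom={"é": "e"})
print(replace_chars("Kühe im Café", table))  # Kuehe im Cafe
```

The table would be built once from the configuration and reused for every title.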
I have now added some code comments to dac134d#commitcomment-1243896. Looks good in general. Thanks!
Replied there as well. I'll push my fixes soon :).
So far, slugify only works when the title consists entirely of ASCII characters (it does remove some special characters, though). As a German, I tend to have a few umlauts in my titles, and Volt bombs out. I can see a few ways to handle this better from the Volt side of things:
Personally, I think #2 might be the easiest solution. This also seems in line with what e.g. Django does in slugify:
"Converts to lowercase, removes non-word characters (alphanumerics and underscores) and converts spaces to hyphens. Also strips leading and trailing whitespace."
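For comparison, a sketch of a slugify modelled loosely on Django's approach (the name `slugify_ascii` is hypothetical). Django normalizes to NFKD and then encodes to ASCII with errors ignored, so "ü" keeps its base letter "u", while characters with no decomposition, such as "ß", are dropped entirely:

```python
import re
import unicodedata


def slugify_ascii(value: str) -> str:
    """Django-style slugify that falls back to plain ASCII.

    NFKD splits accented letters into a base letter plus combining marks
    (e.g. "ü" -> "u" + U+0308); encoding to ASCII with errors="ignore"
    then drops the marks and any other non-ASCII character.
    """
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove non-word characters (keeping whitespace and hyphens), lowercase.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Convert runs of whitespace/hyphens to a single hyphen.
    return re.sub(r"[-\s]+", "-", value)


print(slugify_ascii("Kühe auf der Weide"))  # kuhe-auf-der-weide
```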
If RE_PRUNE stripped all non-ASCII characters, this would also make the whole non-ASCII check in lines 459-469 redundant.
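A sketch of what a non-ASCII-stripping `RE_PRUNE` could look like; the pattern, `RE_HYPHENATE` and the `slugify_pruned` helper are assumptions, not Volt's actual definitions. It also reproduces the information loss raised earlier in the thread ("Kühe" becomes "khe"):

```python
import re

# Hypothetical RE_PRUNE: with re.ASCII, \w only matches [A-Za-z0-9_], so every
# non-ASCII character (umlauts included) falls into the negated class and is removed.
RE_PRUNE = re.compile(r"[^\w\s-]", re.ASCII)
# Hypothetical helper pattern: runs of ASCII whitespace and hyphens.
RE_HYPHENATE = re.compile(r"[\s-]+")


def slugify_pruned(title: str) -> str:
    """Strip everything except ASCII word chars, whitespace and hyphens."""
    pruned = RE_PRUNE.sub("", title).strip().lower()
    # The result is pure ASCII by construction, so no separate check is needed.
    return RE_HYPHENATE.sub("-", pruned)


print(slugify_pruned("Kühe auf der Weide"))  # khe-auf-der-weide
```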