Character Limit With Issue with Urdu Slugs #3514

Closed
andrewfairlie opened this issue Dec 3, 2018 · 7 comments

@andrewfairlie

commented Dec 3, 2018

Description

In English, we're limited to 255 characters in a slug.

With Urdu, the limit seems to be significantly lower, and the error message isn't useful:

Could not generate a unique URI based on the URI format.

I suspect that this is because Urdu uses more bytes per character than ASCII characters.
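A minimal Python sketch of that suspicion, assuming the text is encoded as UTF-8: letters in the Arabic script block take two bytes each, while ASCII letters take one, so an Urdu string has more bytes than characters. The helper name `char_and_byte_length` is made up for illustration.

```python
def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


# An ASCII title and a short Urdu title, measured the same way.
english = char_and_byte_length("a very long title")
urdu = char_and_byte_length("یہ ایک بہت طویل عنوان ہے")
```

For the ASCII string the two numbers match; for the Urdu string the byte count is larger than the character count.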

Steps to reproduce

  1. Make a new entry with a very long Urdu title, for example "یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ایک بہت طویل عنوان یہ ایک بہت طویل عنوان ہے. یہ واقعی ایک طویل عنوان ہے"
  2. Try to save; you'll see the error.

More detail...

A long title

یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ایک بہت طویل عنوان یہ ایک بہت طویل عنوان ہے. یہ واقعی ایک طویل عنوان ہے

is 202 characters, so it should be sluggable, but because it's 354 bytes (per https://mothereff.in/byte-counter) it seems to be disallowed.

Shorten that to 230 bytes (to allow a few bytes for dashes)...

یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ا

And it'll work
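Shortening by hand to a byte budget is fiddly, because a cut can land in the middle of a multi-byte character. A rough Python sketch of one way to do it, assuming UTF-8; `truncate_to_bytes` is a hypothetical helper, not anything Craft provides:

```python
def truncate_to_bytes(text: str, max_bytes: int) -> str:
    """Cut text so that its UTF-8 encoding is at most max_bytes long."""
    clipped = text.encode("utf-8")[:max_bytes]
    # A cut in the middle of a multi-byte character leaves an incomplete
    # sequence at the end; errors="ignore" drops those stray bytes.
    return clipped.decode("utf-8", errors="ignore")
```

Calling `truncate_to_bytes(title, 230)` would give a title that fits the 230-byte budget mentioned above without ending on a broken character.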

Additional info

  • Craft version: Craft Pro 3.0.31
  • PHP version: 7.2.12
  • Database driver & version: MySQL 5.7.24
@brandonkelly

Member

commented Dec 3, 2018

There is a validation rule ensuring that slugs and URIs are <= 255 characters so they can fit into their varchar database columns. However, the rule should be looking at the byte length rather than the character count, so it gets triggered in cases like this where some characters are made up of more than one byte (your sample string is 354 bytes). I’ve fixed that for the next release.

The validation error message still mentions “characters” because most people who run into this won’t be using multi-byte characters and wouldn’t know how to limit to “255 bytes”, but even if it’s not technically accurate, at least it will point you in the right direction.

[Screenshot: validation error on a Slug field stating that the slug must be at most 255 characters]

@brandonkelly

Member

commented Dec 4, 2018

Take that back. Looking into this further, I was wrong and a varchar(255) column does actually just care that it’s 255 characters and not 255 bytes.

So not totally sure if there’s a good course of action here for us to take. Reopening…

@brandonkelly reopened this Dec 4, 2018

@brandonkelly

Member

commented Dec 4, 2018

One thing you can do is limit the length of the slug in your section’s URI Format. For example:

news/{slug[0:250]}
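
For anyone unfamiliar with the slice syntax: `slug[0:250]` keeps at most the first 250 characters of the slug. A rough Python analogue, assuming the slice counts characters rather than bytes; `build_uri` and its parameters are illustrative only and not part of Craft:

```python
def build_uri(slug: str, prefix: str = "news/", max_slug_chars: int = 250) -> str:
    """Mimic a URI format like news/{slug[0:250]} using Python slicing."""
    # Python slices strings by code point, so multi-byte characters stay intact.
    return prefix + slug[:max_slug_chars]
```

With the `news/` prefix at 5 characters and the slug capped at 250, the resulting URI is at most 255 characters.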
@brandonkelly

Member

commented Dec 4, 2018

There was already code in place to auto-shrink slugs in order to get the URI down to <= 255 characters, but there were some logic bugs in the code. Fixed now for the next release.
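
To illustrate the idea of auto-shrinking (this is a sketch under stated assumptions, not Craft's actual code): trim the slug by however many characters the rendered URI overshoots the limit. It assumes the URI format contains `{slug}` exactly once; `shrink_slug_to_fit` is a hypothetical name.

```python
def shrink_slug_to_fit(uri_format: str, slug: str, max_chars: int = 255) -> str:
    """Trim slug so uri_format, with {slug} filled in, fits in max_chars."""
    overflow = len(uri_format.replace("{slug}", slug)) - max_chars
    if overflow <= 0:
        return slug  # already fits
    if overflow >= len(slug):
        raise ValueError("URI format leaves no room for a slug")
    # Drop the overflow from the end, then any dash left dangling by the cut.
    shortened = slug[: len(slug) - overflow].rstrip("-")
    if not shortened:
        raise ValueError("slug would be empty after shrinking")
    return shortened
```

The lengths here are counted in characters, in line with the observation above that a varchar(255) column cares about characters, not bytes.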

@narration-sd

Contributor

commented Dec 4, 2018

Wow, that is a hot one, Brandon. I don't find any doc for this; should there be??

Was watching this one due to experience with languages needing longer sizes...not jumping in though...

@brandonkelly

Member

commented Dec 5, 2018

Seems a little too in the weeds to warrant documentation, especially since generally Craft will just do the right thing going forward.

@narration-sd

Contributor

commented Dec 5, 2018

Yeah, I understand -- but also understand from close up for years how much such things bite a multilingual environment. German itself is like 150% of English, still using Latin-1 characters.

The actual saving grace is probably the size of the field vs. generally reasonable slugs -- divided by 2 or 4...

Anyway, thanks for thinking, as ever, Brandon. And I'm having to think very carefully myself what complexity to present, in doc after reducing it as much as possible in the application, you know where. It's going to be the focus-group beta that determines in what form or whether this thing sensibly flies.

You and Brad weren't wrong to think of the alternative of consulting it in, though I can't see how that path with open source would get me out of the hot seat of intense support, exactly -- though, just this moment am thinking again...maybe I missed a point, while falling into the gravitation of long perfecting ;)
