Commit

Flesh out the parameterize method to support non-ascii text and underscores.
NZKoz committed Sep 11, 2008
1 parent 46bac29 commit 1ddde91
Showing 2 changed files with 5 additions and 2 deletions.
2 changes: 1 addition & 1 deletion activesupport/lib/active_support/inflector.rb
@@ -257,7 +257,7 @@ def demodulize(class_name_in_module)
   # <%= link_to(@person.name, person_path %>
   # # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
   def parameterize(string, sep = '-')
-    string.gsub(/[^a-z0-9]+/i, sep).downcase
+    string.chars.normalize(:kd).to_s.gsub(/[^\x00-\x7F]+/, '').gsub(/[^a-z0-9_\-]+/i, sep).downcase
   end

   # Create the name of a table like Rails does for models to table names. This method
5 changes: 4 additions & 1 deletion activesupport/test/inflector_test_cases.rb
@@ -144,7 +144,10 @@ module InflectorTestCases

   StringToParameterized = {
     "Donald E. Knuth" => "donald-e-knuth",
-    "Random text with *(bad)* characters" => "random-text-with-bad-characters"
+    "Random text with *(bad)* characters" => "random-text-with-bad-characters",
+    "Malmö" => "malmo",
+    "Garçons" => "garcons",
+    "Allow_Under_Scores" => "allow_under_scores"
   }

   UnderscoreToHuman = {
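
The new one-liner chains four steps: decompose the string (compatibility form KD), drop whatever is still non-ASCII, replace each run of disallowed characters with the separator, and downcase. A minimal sketch of the same steps on a modern Ruby follows. It assumes `String#unicode_normalize(:nfkd)` as a stand-in for the `chars.normalize(:kd).to_s` call in the diff, and `parameterize_sketch` is a hypothetical name, not the Rails method.

```ruby
# Hypothetical stand-alone version of the pipeline in the diff.
# Assumption: String#unicode_normalize(:nfkd) replaces chars.normalize(:kd).to_s.
def parameterize_sketch(string, sep = '-')
  # Split accented letters into base letter + combining mark ("ö" -> "o" + U+0308).
  decomposed = string.unicode_normalize(:nfkd)
  # Drop the combining marks and any other non-ASCII characters.
  ascii_only = decomposed.gsub(/[^\x00-\x7F]+/, '')
  # Replace each run of characters outside a-z, 0-9, "_" and "-" with the separator.
  ascii_only.gsub(/[^a-z0-9_\-]+/i, sep).downcase
end

puts parameterize_sketch("Donald E. Knuth")     # donald-e-knuth
puts parameterize_sketch("Malmö")               # malmo
puts parameterize_sketch("Garçons")             # garcons
puts parameterize_sketch("Allow_Under_Scores")  # allow_under_scores
```

The inputs and expected outputs are the ones listed in `StringToParameterized` above.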

8 comments on commit 1ddde91

@henrik
Contributor

@henrik henrik commented on 1ddde91 Sep 11, 2008

Nice. Shouldn’t the to_s go right after “string”, though?

@tarmo
Contributor

@tarmo tarmo commented on 1ddde91 Sep 11, 2008

to_s is to convert the Multibyte::Chars back to a string after normalization.

@henrik
Contributor

@henrik henrik commented on 1ddde91 Sep 11, 2008

tarmo: Ah, right. A to_s after “string” would make it more robust for input like nil or numbers, but that might not be desired.

@NZKoz
Member Author

@NZKoz NZKoz commented on 1ddde91 Sep 12, 2008

I’m not sure the nil safety is warranted. 99.999% of people will call this with String#parameterize, not Inflector.parameterize…

@tomstuart
Contributor

This method should also collapse multiple occurrences of the separator ('foo---bar' => 'foo-bar') and strip leading/trailing occurrences ('-foo-bar-' => 'foo-bar').
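
One way to sketch the two clean-up steps suggested here. `tidy_separators` is a hypothetical helper, not part of the commit, and it assumes the separator is a plain string that may contain regex metacharacters (hence `Regexp.escape`).

```ruby
# Hypothetical helper: collapse runs of the separator, then trim it from both ends.
def tidy_separators(slug, sep = '-')
  escaped = Regexp.escape(sep)
  # Two or more consecutive separators become a single one.
  collapsed = slug.gsub(/(?:#{escaped}){2,}/, sep)
  # Remove any separator left at the start or end of the string.
  collapsed.gsub(/\A(?:#{escaped})+|(?:#{escaped})+\z/, '')
end

puts tidy_separators("foo---bar")  # foo-bar
puts tidy_separators("-foo-bar-")  # foo-bar
```

Applied after the substitution in `parameterize`, this would keep input such as " ! foo ! " from producing slugs with leading or trailing separators.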

@Manfred
Contributor

A couple of considerations. When $KCODE isn’t set to UTF-8 in Ruby <= 1.8.6 this will break because normalize isn’t defined on String. Parameterizing non-ASCII strings results in a blank string: 'おはよ'.parameterize => ''. I know that none of the other inflector methods support non-ASCII characters; what’s the verdict on this?
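
The blank result follows from the decomposition step. A character such as "ö" decomposes into an ASCII base letter plus a combining mark, while kana have no ASCII base letter, so nothing survives the non-ASCII filter. A small illustration, assuming `String#unicode_normalize(:nfkd)` on a modern Ruby in place of `chars.normalize(:kd)`; `ascii_remainder` is a hypothetical name.

```ruby
# Hypothetical helper: what is left after decomposition and the non-ASCII filter.
# Assumption: String#unicode_normalize(:nfkd) stands in for chars.normalize(:kd).
def ascii_remainder(string)
  string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]+/, '')
end

p ascii_remainder("Malmö")   # "Malmo"
p ascii_remainder("おはよ")   # ""
```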

@henrik
Contributor

@henrik henrik commented on 1ddde91 Sep 12, 2008

I updated Slugalizer based on some of the code traded in the parameterize comments. The biggest change is that it now turns e.g. "foo@bar.com" into "foo-bar-com" instead of "foobarcom", but it still squeezes multiple separators and removes leading/trailing separators, so " ! foo--dash@bar.com ! " becomes "foo-dash-bar-com".

I think the current version of Slugalizer has no downsides compared to the current version of parameterize, and it also handles the stuff tomstuart mentioned. As far as I can tell, it also works with $KCODEs other than 'u'.

While I do think it’s good to keep it lean, if this method should be present at all, it might as well be as good as it can be – at least as long as it’s just a matter of another short line or two of code.

Regarding the blank string, I think that’s perfectly reasonable. It would certainly be more useful if Japanese etc were transcribed, but I think then we’re firmly in plugin country (see Stringex).

@karmi
Contributor

@karmi karmi commented on 1ddde91 Sep 12, 2008

Thanks, NZKoz!

Also check this ticket