Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove rules from Slugify and move them into files #81

Merged
merged 5 commits into from
Nov 2, 2015

Conversation

florianeckerstorfer
Copy link
Member

There are two reasons for removing the rules from the Slugify class. First of all, as more and more rules are added (which is great, because it makes the library better) the class gets longer and confusing and harder to maintain and extend. The second big reason is that the current system was not very flexible. We have a lot of problems with conflicting rules in different rules (e.g. #56 or #80) and we want to provide an easy way to change the default rules (or language).

There is a generate-rules.php script that reads the JSON files and updates a DefaultRuleProvider which allows Slugify to still be very fast.

In addition this PR cleans up a lot of stuff, this is required to keep the code clean, maintainable, readable and testable when we add features in the future.

The regular expression has been moved into the options array and a new parameter was added to the constructor called provider. A provider is a class that implements RuleProviderInterface, which defines a getRules() method. The method expects a ruleset parameter and the provider should return the rules for the given ruleset. If no provider is given than an instance of DefaultRuleProvider is created. This class is generated based on the content of the JSON files in Resources/rules, however, the provider just provides the rules, they are not activated by default.

Rulesets (or languages) are activated based on the new rulesets option. Every ruleset mentioned in this array is activated. The order of the array is important, rules in sets mentioned later can overwrite earlier rulesets. By default the following rulesets are activated:

  • default
  • burmese
  • georgian
  • vietnamese
  • ukrainian
  • latvian
  • finnish
  • greek
  • swedish
  • czech
  • arabic
  • turkish
  • polish
  • german
  • russian

If you need Turkish (ÄA) transliterations instead of German (ÄAE) you have two possibilities:

A) Pass the rulesets option to the constructor

$slugify = new Slugify(['rulesets' => [
      'default',
      'burmese',
      'georgian',
      'vietnamese',
      'ukrainian',
      'latvian',
      'finnish',
      'greek',
      'swedish',
      'czech',
      'arabic',
      'polish',
      'german',
      'russian',
      'turkish', // Turkish after German
]);

Or B) you can activate the Turkish ruleset after initialisation.

$slugify = new Slugify();
$slugify->activateRuleset('turkish');

This would solve the problems described in #56 and #80 and allows for additional, language specific rules. For example, I added the ruleset austrian which replaces ß with sz instead of ss (in German).

I would appreciate comments and feedback for this PR very much, since a lot of stuff has been changed. We also need to go through the rules currently in default and move them into rulesets for the specific language (e.g., french, or spanish are missing). I am aware that some rules would be duplicated through this, but I think this situation is better than it was in the past.

Florian Eckerstorfer added 2 commits October 15, 2015 16:00
There is a generate-rules.php script that reads the
JSON files and updates a DefaultRuleProvider which allows
Slugify to still be very fast.

In addition this PR cleans up a lot of stuff, this is required to
keep the code clean, maintainable, readable and testable when we
add features in the future.
@florianeckerstorfer florianeckerstorfer added this to the 2.0 milestone Oct 15, 2015
@florianeckerstorfer
Copy link
Member Author

ping @mac2000

What do you think about these changes??

@mac2000
Copy link
Contributor

mac2000 commented Oct 25, 2015

To me it is totally OK, just because we are not breaking interface we can made such changes and if we will not like them we always may move back to flat file.

But on the other hand comments like "Do not touch - this is generated code" is something new to PHP and to me looks more like a T4 from .Net :)

To be fair this PR not solving problem with slugify configuration - no matter what we will do it will not work out of the box for absolute everyone and some configuration should be applied to match concrete locale.

As for looooooong rules array I do agree with you, just wondering why not just move rules to rulesets then they may be collapsed in any editor and there will be no need to add one more step with "compilation" of rules. But on the other hand having JSON is a cool thing just because we easily can provide same slugifier for front-end.

BTW arrayToString you can make such things in one line var_export - if we are speaking about generated code it does not matter how it is formatted - what is matter is that it will work always. (but that is just my thoughts)

So applying such changes is a matter of preferences, to me, until I can just get library and use it with all defaults without worrying about internal - its all ok - we should not break that probably in first place, all internal is just all about what we like and do not like in code

Regards

PS: One more thing with separate rules we have one problem, for example, imagine that I going to add Ukrainian, should I just add all letters or only that is not already there? How should I know that? You may say just to add letters that is not already there - but now I do need look at many files instead of one. On the other hand I can just add whole alphabet - but then we will make unwanted work with duplicating rules. What I mean, for example cyrillic letters in all languages are transliterated same way almost always, troubles are with some specific letters but it will be bad if to contribute some new rule people must do much more work than they should.

BTW rules are not changing in time, there is just additions off missing letters, and conflicting ones is a problem that is not solved still

Sorry for so long message

@florianeckerstorfer
Copy link
Member Author

Yeah, we're probably not going to find a one size fits all solution, because depending on your preferred language you want to have different transliterations. I dislike the current "solution" because users have to specifically overwrite the rules, that is, they have to know these rules. With this PR you could just say "prefer Turkish rules" and you do not need know what these actually mean.

Regarding the code generation part: do you think it would be better if we generate the full file, not only the rules array?

Thanks for pointing var_export out to me. I totally forgot that this function exists.

@mac2000
Copy link
Contributor

mac2000 commented Oct 26, 2015

I am totally agree with your point about benefits for end users of slugify it will be much simpler, and even more somewhere in the future slugify may look at current locale and choose right rules which may make it work out of the box for some people.

About code generation - personally I am ok with that just because I do see it in other languages, but can not say anything about overall PHP folks, its something new and may be confusing for someone. If the only reason we are moving rules out of main file - to make it more readable (but more complex on the other hand) then probably it is a way.

On the other hand there is other way to split rules and still make it all readable and easy to maintain with rulesets (there was commit for Esperanto if I remember correct) we can just move all rules into multidimensional array so all will work like now, and your idea with language will be implemented and there will no be need to extra compilation steps, but it is up to you to decide such thing

@OskarStark
Copy link
Contributor

why don't use yaml files instead of json

you have syntax highlighting and linting and you can use comments in the files

@mac2000
Copy link
Contributor

mac2000 commented Oct 26, 2015

@OskarStark I can say same about ini files :) its just a matter of preference, someone like yaml, someone json, probably there is cases when someone prefer xml. But there is no faster/easier way than plain php array (that is why they were used from beginning) but again it is too simple so many of us prefer to overengineer things. E.g. in case of yaml we will depend on its library, but for slugify which consists of two files one of which is interface - it is probably too much :)

@florianeckerstorfer
Copy link
Member Author

Yeah, I choose JSON because it is probably the most used one and it is built into PHP.

The reason why I split it into JSON files instead of keeping them in the class is that the list grows longer and longer and since we probably need to duplicate rules it would get out of hand. True, you can collapse in an editor, but there are also Diffs and it's getting hard to find the right place to put them.

@OskarStark
Copy link
Contributor

fine for me 👍

@florianeckerstorfer
Copy link
Member Author

Just started reviewing :)

:octocat: Sent from GH.

florianeckerstorfer pushed a commit that referenced this pull request Nov 2, 2015
Remove rules from Slugify and move them into files
@florianeckerstorfer florianeckerstorfer merged commit 64af0ea into cocur:2.0 Nov 2, 2015
This was referenced Nov 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants