Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation::_pattern['hostname'] matches dotted "domain" names, not "hostnames". #2477

Closed
beporter opened this issue Dec 12, 2013 · 24 comments
Closed

Comments

@beporter
Copy link
Contributor

Ref: Utility/Validation.php#L42.

There is no requirement that a hostname or an email address's domain part use a fully qualified, internet-resolvable DNS name. It's therefore incorrect behavior to reject a hostname like mylocalserver that does not contain a dot. This also has implications (at least) for Validation::email(), which incorrectly rejects addresses like beporter@localhost-- an address that is valid per RFC 2822.

The bottom line is that it's not appropriate to validate "hostnames" solely as "domain" names. I put together some samples using the current regex (link corrected) to illustrate. I feel like the argument though is really whether Cake is better off following common (but incorrect) validation practices, or following the letter of the RFC(s) at the expense of causing incompatibility with other apps and systems that aren't validating correctly.

PS: This post came about because I put 'from' => 'beporter@localhost' in my Config/email.php and got an error even though mail sent to beporter@localhost would have been properly routed and delivered by my local MTA.

@jameswatts
Copy link
Contributor

Unless it changed, this is the regexp used for FILTER_SANITIZE_EMAIL with the filter_var() function in PHP.

/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD

Surely building on these existing checks in this extension would be more coherent with the language itself.

As per the docs:

The filter extension is enabled by default as of PHP 5.2.0. Before this time an experimental PECL extension was used, however, the PECL version is no longer recommended or updated.

So it fits within the CakePHP 2.x requirements constraints. Wheels don't need reinventing.

@beporter
Copy link
Contributor Author

The Cake Validation class doesn't seem to be using that for email checks (at least not in the master branch.) It's using the ::_pattern['hostname'] regex for the domain part, which is inadequate at least in the context of email addresses. This pattern is also used in ::url(), which may legitimately be depending on validating domain names instead of hostnames.

In any case, are you suggesting that the solution is to switch to filter_var()? Or to the regex underlying it? I wasn't looking forward to modifying the existing regex, but I bet I can do a PR based on this.

@jameswatts
Copy link
Contributor

Go for it, that's exactly what I'm suggesting, build upon the tools already exposed via the implementation language (PHP). There are also FILTER_VALIDATE_URL and FILTER_VALIDATE_IP filters.

@beporter
Copy link
Contributor Author

Targeting 2.5 I assume then?

@markstory
Copy link
Member

This has come up in the past. Last time this came around we decided to go with the more pragmatic validation vs. more theoretically correct validation. For example on a user registration system root@localhost is almost never what you want to accept.

I think the majority of email addresses people want to validate are publicly accessible ones. While internal mailboxes are very useful, I don't think they are the dominant use case for CakePHP.

@beporter
Copy link
Contributor Author

I don't disagree at all, but then core components like CakeEmail need to somehow allow for special-casing their developer-facing validation. My Config/email.php with 'from' => 'beporter@localhost' is correct and valid, but Cake won't let me do it or override it (that I'm aware of), and I'm not inclined to adjust my entire local network and MTA's configuration just so I can append .local or something to the email address in order to satisfy Cake.

@jameswatts
Copy link
Contributor

@beporter in light of the feedback from @markstory I would think a strategy around the validation (like you suggest) would be in order, to switch out the underlying checks performed. That, or a way to extend the validation system, which currently doesn't exist beyond extending the Validation utility class and using your modified version - http://book.cakephp.org/2.0/en/core-utility-libraries/app.html#overriding-classes-in-cakephp

@jippi
Copy link
Contributor

jippi commented Dec 12, 2013

@beporter
Copy link
Contributor Author

@markstory: The implementation of HTML5 form validation for <input type="email"> in Chrome, Firefox, etc. disagrees with you. The browser vendors all seem to think allowing non-dotted domain-parts is acceptable. In fact I find it contradictory that Cake's FormHelper emits code that is intended to trigger this (more proper) validation at the browser level, thereby allowing a user to enter "user@host" into a <form>, but that Cake would then reject that same input on POST when it hit Validation::email() (the same thing that caused FormHelper to emit type=email in the first place).

I recognize the case you are making, and I even agree with it in an end-user context, but I continue to maintain that the same artificially-limited validation should not be forced upon Cake developers. I am at a loss to suggest a reasonable compromise when it comes to CakeEmail though, beyond adding an additional option to Validation::email() to control the strictness of the check. ("Tight" and incorrect by default for end user cases, "Loose" and correct by request for internal usage such as that in CakeEmail.)

@jameswatts: I appreciate the suggestion, but as it stands I can either:

  • Maintain an entire (slightly modified) override copy of Validation.php in my Lib/Utility/ folder,
  • Or I can subclass both Validation and CakeEmail, minimally update the latter to use the former, and then update my app to use the result.
  • Or it looks like as of 2.4, I can use CakeEmail::emailPattern() in each individual case to override the pattern. (It's unfortunate I can't do that from the email.php config file.)

None of these fill me with joy, I'm afraid.

Thanks for the time, gentlemen. I appreciate it.

@markstory
Copy link
Member

I am at a loss to suggest a reasonable compromise when it comes to CakeEmail though, beyond adding an additional option to Validation::email() to control the strictness of the check.

The method you mentioned in CakeEmail was added exactly for this kind of situation. Exposing it to the config file might be nice.

We can't really force the browsers to change, as the RFC does cite user@host as valid, I worry about relaxing the rules in CakePHP and getting more bug reports about how the email validation allows broadly invalid data through.

@jameswatts
Copy link
Contributor

@beporter maybe a PR to allow it via configuration here -> https://github.com/cakephp/cakephp/blob/master/lib/Cake/Network/Email/CakeEmail.php#L1185

@markstory
Copy link
Member

What about adding a new validation method that does more relaxed/rfc compliant validation? I don't really like the idea of adding more boolean parameters to a method that already has one, it makes reading code harder. A separate method would allow people to opt into the behaviour they want.

@beporter
Copy link
Contributor Author

Either way is fine with me. My main concern is that CakeEmail is essentially an MUA and if you would (rightly) object to Mail.app stopping you from sending an email to someone@localserver then it's right to object to CakeEmail doing it. CakeEmail should be updated to validate against the RFCs (using this proposed additional method) and not the "user-friendly" subset provided by the existing Validation::email().

This will mean that the default user-facing validation will remain as-is, safe-guarding the majority of addresses passed into CakeEmail, but at the same time would resolve the issue both when using local domain-parts in Config/email.php and on demand via CakeEmail::to() or ::from() (for example).

If we're all in agreement, I'll work on a PR. I am also open to appropriate names for the new Validation method, as adjectives such as "strict" imply more restrictions even though in this case, "strict matching" represents fewer restrictions. There are similar issues with, "_permissive", "_relaxed", "_accurate", etc. Maybe something like email_per_rfc()?

@markstory
Copy link
Member

I think having CakeEmail accept addresses like someone@localserver is quite reasonable. For cron jobs and internal tools, those types of addresses are often valid and deliverable.

As for a method name, relaxedEmail or rfcEmail works for me.

@lorenzo
Copy link
Member

lorenzo commented Dec 17, 2013

I would actually remove the use of the Validation class from CakeEmail and just use filter_var() for validating the addresses in CakeEmail. Personally, I don't see the need for having another method in the Validation class

@ADmad
Copy link
Member

ADmad commented Dec 17, 2013

@lorenzo That sound's pretty reasonable 👍.

@markstory
Copy link
Member

That works for me, we should continue to allow custom patterns as well though. They are important for the Japanese community, as there are a few isps in japan that have non-compliant addresses that work.

@ADmad
Copy link
Member

ADmad commented Dec 17, 2013

Looking at the CakeEmail code I see you can already specify custom regex for email validation using emailPattern key in config file or using CakeEmail::emailPattern() method. So @beporter you can just use either of them and your problem will be solved right away.

@beporter
Copy link
Contributor Author

@ADmad That would apply if the project was on 2.4, yes. It is not (yet), although this thread is starting to become more of an investment than the upgrade process. (I kid!)

It also sidesteps the fact that the validation is simply incorrect. I happily concede @markstory's point about the benefits of being more restrictive with user-facing email validation, but CakeEmail shouldn't be using that same (incorrect) subset of allowable addresses internally. A custom regex is only a workaround for an incorrect core implementation.

@josegonzalez (corrected) recently emailed us folks at @loadsys soliciting feedback for the future of Cake. This I feel is one small suggestion I can offer so far. And since neither the custom regex, nor any PR I might submit here will help me with my current project directly, I hope that my argument can be taken as reasonably unbiased.

@ADmad
Copy link
Member

ADmad commented Dec 17, 2013

@ADmad That would apply if the project was on 2.4, yes. It is not (yet), although this thread is starting to become more of an investment than the upgrade process. (I kid!)

So upgrade to 2.4 :)

It also sidesteps the fact that the validation is simply incorrect. I happily concede @markstory's point about the benefits of being more restrictive with user-facing email validation, but CakeEmail shouldn't be using that same (incorrect) subset of allowable addresses internally. A custom regex is only a workaround for an incorrect core implementation.

It's not incorrect, it's intentionally restrictive by default. The custom regex is not a workaround but rather a feature to customize it to suit your needs. As @markstory pointed out few japanese ISPs actually use non-compliant email ids. So you can see even following RFC is not always enough to please everybody.

@lorenzo recently emailed us folks at @loadsys soliciting feedback for the future of Cake. This I feel is one small suggestion I can offer so far. And since neither the custom regex, nor any PR I might submit here will help me with my current project directly, I hope that my argument can be taken as reasonably unbiased.

Your feedback is very welcome and based of that we might probably change the default validation inside CakeEmail in 2.5. As for your current problem we can't make retrospective changes to whatever cake version you are using can we? If you update to 2.4 its just a matter of adding a single line in your email config file with the custom regex to solve your problem.

@beporter
Copy link
Contributor Author

I did investigate upgrading to 2.4 early on but in this case it seems for some reason to be non-trivial and will require appropriate planning. That is in no way your problem though and is orthogonal to this conversation. As I said; I'm here because I think Cake needs repair, not because I'm going to get a solution to my immediate problem out of it.

I'm happy to do the PR for the CakeEmail changes to filter_var for 2.5 if that still seems like a reasonable forward-looking change. I want to make sure there is consensus on that before I waste my time, or yours, though. Thanks guys.

@markstory
Copy link
Member

I think filter_var + an override so the japanese people can get their non-compliant addresses to work would be great.

@ADmad
Copy link
Member

ADmad commented Dec 17, 2013

@beporter Guess what filter_var('beporter@localhost', FILTER_VALIDATE_EMAIL); returns.

@beporter
Copy link
Contributor Author

I wasn't the one that suggested filter_var as a fix, just for reference. Seeing as @ADmad's already done the PR, I think we're done here. Thanks again guys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants