Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate false positives #38

Merged
merged 2 commits into from
Oct 18, 2015
Merged

Eliminate false positives #38

merged 2 commits into from
Oct 18, 2015

Conversation

R-J
Copy link
Contributor

@R-J R-J commented Oct 1, 2015

The regex matched a lot of false positives since the word boundary is not set correctly and the dot is not escaped. For example this one /\bnus.edu.sg/ does not only match evil@nus.edu.sg but also
good@nus.edu.sg.com
nus-edu@sg.com
nus@edu.sgood.com
nusmedursg@good.net

I rearranged the code and eliminated the regex in favor of a simple comparison.

The regex matched a lot of false positives since the word boundary is not set correctly and the dot is not escaped. For example this one `/\bnus.edu.sg/` does not only match evil@nus.edu.sg but also 
good@nus.edu.sg.com
nus-edu@sg.com
nus@edu.sgood.com
nusmedursg@good.net

I rearranged the code and eliminated the regex in favor of a simple comparison.
@FGRibreau
Copy link
Owner

Maybe a fix of the regex in the first place would be a better tradeoff?
What do you think?

@R-J
Copy link
Contributor Author

R-J commented Oct 2, 2015

I'm only a hobbyist and chances are high that there might be other aspects that I simply don't know about. But my motivation for replacing the regex completely, is that I understand the purpose of a regex is to search for a pattern in a text while the task here is to match a string against a list.
I would say that preg_match() is the wrong tool for the job. A regex match is more complicated and more time consuming (at least that's what I would expect) than a simple array operation.
I would only favor the regex way if it was much more elegant than its alternatives, but to my opinion that's not the case here.

If you think of fixing the regex, I think you would have to put it like that $pattern = '/@'.str_replace('.', '\.', implode('$|@', array("0-mail.com",...))).'$/'; but I haven't spent much time in testing that.

@FGRibreau
Copy link
Owner

Here is what a friend of mine proposed (credits https://github.com/julienbreux ):

$mail = explode('@', $email);
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false || count($mail) != 2) {
    return false;
} else {
    return !in_array(strtolower($mail[1]), array(...));
}

what do you think @R-J ?

@R-J
Copy link
Contributor Author

R-J commented Oct 5, 2015

I do not understand the purpose of the second condition count($mail). Is there a problem with filter_var giving false positives for mail addresses that do not have the form $string1@$string2?

  1. If that's the case, then that additional test would make sense.
  2. If not,
    2.1 a false email would already fail at the filter condition and the count() would not be evaluated
    2.2 valid mail address would always have count = 2 = true

So in both cases of 2. the count() part would be superfluous.

Only if there is such a problem with filter_var, this check might be useful.
I've looked at the php reference and found this: http://php.net/manual/de/function.filter-var.php#112492
The user states that "this is v@lid!"@example.com could be a valid mail address, but the filter function doesn't accept it. So the second part of the first if clause (count($mail)) would cause a return false although the mail address is valid, if the filter function would not have already stopped that string...

Please excuse me: I tend to talk too much :-)
In short my conclusion is that the count($mail) is not needed.

And if it is really not needed, I would put the explode() after the filter function since that call would be unneeded if the filter function already eliminates a malformed mail address. You only have to explode $email if you compare the last part with the list.


So now that I read that "this is v@lid!"@example.com would be a possible mail address, the code should better read

  if(filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    return false;
  }
  $mail = explode("@", $email);
  return !in_array(strtolower(end($mail)), array(...));
}

Otherwise it might be possible to trick this function with "evil@msn.com"@somespamhoster.com

Since the $mail variable is only needed once in the code above you can consider using directly return !in_array(strtolower(end(explode("@", $email))), array(...));. You have to weight readability against performance here.

Thinking about it some time longer I think it is a good habit to keep small libraries as fast as they can be, if they still are readable. Since we are talking about less than half a dozen lines of code I really would drop the $mail variable.

@FGRibreau
Copy link
Owner

👍 @R-J could you update your PR with return !in_array(strtolower(end(explode("@", $email))), array(...)); ?

And I will merge it then :)

FGRibreau added a commit that referenced this pull request Oct 18, 2015
Eliminate false positives
@FGRibreau FGRibreau merged commit ada8ab0 into FGRibreau:master Oct 18, 2015
@FGRibreau
Copy link
Owner

@R-J merged! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants