Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check top-level domain suggestion against domains #89

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jdpopkin
Copy link
Contributor

@jdpopkin jdpopkin commented Nov 7, 2014

After correcting the top-level domain, the suggestion is sometimes
close enough to a full domain that accepting the suggestion results in
another bad email address. A previous commit tried to solve this
problem by changing the thresholds at which we suggest top-level
domains and full domains, but some edge cases remained. This
commit cuts out the middleman by directly checking for a full domain
suggestion before returning a top-level domain suggestion.

After correcting the top-level domain, our suggestion is sometimes
close enough to a full domain that accepting the suggestion results in
another bad email address that prompts a suggestion. A previous commit
tried to solve this problem by changing the thresholds at which we
suggest top-level domains and full domains, but some edge cases
remained. This commit cuts out the middleman by directly checking for
a full domain suggestion before returning a top-level domain
suggestion.
@jdpopkin
Copy link
Contributor Author

jdpopkin commented Nov 7, 2014

sdfg@yahou.co currently gets corrected to sdf@yahou.com, which in turn gets corrected to sdf@yahoo.com. This should fix that.

@weilu
Copy link
Contributor

weilu commented Feb 18, 2015

@derrickko currently we have:

domainThreshold: 3,
secondLevelThreshold: 2,
topLevelThreshold: 2,

I think it's confusing and not scalable. I would like to merge this solution and use a single threshold instead. Thoughts? cc @kevinhughes27. Potentially related: #96, #97

@kevinhughes27
Copy link
Contributor

I like the idea proposed in this PR - I don't think removing the individual thresholds is a good idea. Users of this lib don't have to change them if they don't need to but they are quite useful for tweaking.

Btw I didn't submit this in my PR from the Shopify changes but we use much tighter thresholds (2, 1.5, 1,5) IIRC

@weilu
Copy link
Contributor

weilu commented Feb 18, 2015

Users of this lib don't have to change them if they don't need to but they are quite useful for tweaking

What's the basis for tweaking? For example, how did you decide 2, 1.5, 1.5 are the best thresholds? Trail and error? I have trouble even trying to come up with useful default values for these thresholds to avoid future issues like #96 and #97.

Without digging into the code, what do the threshold mean? Do they mean anything by themselves or are they only meaningful when compared among themselves?

@kevinhughes27
Copy link
Contributor

the thresholds are how close a string needs to be to be considered a match and thus be corrected to the matching string (in this case domains or parts of domains). They are pretty important and being able to tweak them is also important let me give an example:

I ran a pretty detailed experiment with mailcheck on our checkout page and collected well over a million data points. I measured 3 things - checkout successes, mailcheck suggestions shown and mailcheck suggestions clicked. Right away we noticed that unclicked mailcheck suggestions were related to decreased checkout successes - so in our environment its important to be conservative with suggestions. After noticing this I played with the thresholds and the default domain lists to bring down the number of unclicked suggestions (I collected the original email address if it was unclicked so I could iterate). That is how I arrived at 2, 1.5, 1.5 for thresholds. Our users going through checkout are fairly international so our domain list is large but if for example you are adding mailcheck to a signup page and most of your users will be other tech enthusiasts then your setup will be very different.

In order to avoid the issues you mentioned we should reduce the default thresholds - but you should still be experimenting with the setup in your environment. Maybe @derrickko can elaborate but my take away from looking through previous PRs (most PRs that alter the default config are closed unless there is a really good reason) is that Mailcheck is intended as more of a framework and less of a plug and play solution. That being said lets tighten the thresholds a bit so the default config is more useful, I should have included this in my original PR here.

@derrickko derrickko mentioned this pull request Mar 18, 2015
@ferreiro
Copy link

Hey @ kevinhughes27 your explanation was amazing. are u still using this library nowadays?

It's no longer maintained, so I rewrote it entirely and update it for 2022 and onward: https://github.com/ZooTools/email-spell-checker.

Would love to get your thoughts and see if there is something we can improve.

  • 💙 Written in TypeScript and removed jQuery
  • ✅ Fixed a couple of bugs like this or this
  • 🚀 Reduced bundle size to <2KB.

  • ✨ Update TLDs (69+) and added modern startup domains (like .io, .so, .xyz or .dev)
  • 🙏 Implemented suggestions that people made in this repo that were ignored.

Link: https://github.com/ZooTools/email-spell-checker

Come check it out and give it a ⭐️ for the effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants