Check top-level domain suggestion against domains #89

jdpopkin · 2014-11-07T04:41:19Z

After correcting the top-level domain, the suggestion is sometimes
close enough to a full domain that accepting the suggestion results in
another bad email address. A previous commit tried to solve this
problem by changing the thresholds at which we suggest top-level
domains and full domains, but some edge cases remained. This
commit cuts out the middleman by directly checking for a full domain
suggestion before returning a top-level domain suggestion.

After correcting the top-level domain, our suggestion is sometimes close enough to a full domain that accepting the suggestion results in another bad email address that prompts a suggestion. A previous commit tried to solve this problem by changing the thresholds at which we suggest top-level domains and full domains, but some edge cases remained. This commit cuts out the middleman by directly checking for a full domain suggestion before returning a top-level domain suggestion.

jdpopkin · 2014-11-07T04:42:14Z

sdfg@yahou.co currently gets corrected to sdf@yahou.com, which in turn gets corrected to sdf@yahoo.com. This should fix that.

weilu · 2015-02-18T01:25:08Z

@derrickko currently we have:

domainThreshold: 3,
secondLevelThreshold: 2,
topLevelThreshold: 2,

I think it's confusing and not scalable. I would like to merge this solution and use a single threshold instead. Thoughts? cc @kevinhughes27. Potentially related: #96, #97

kevinhughes27 · 2015-02-18T03:33:19Z

I like the idea proposed in this PR - I don't think removing the individual thresholds is a good idea. Users of this lib don't have to change them if they don't need to but they are quite useful for tweaking.

Btw I didn't submit this in my PR from the Shopify changes but we use much tighter thresholds (2, 1.5, 1,5) IIRC

weilu · 2015-02-18T13:24:09Z

Users of this lib don't have to change them if they don't need to but they are quite useful for tweaking

What's the basis for tweaking? For example, how did you decide 2, 1.5, 1.5 are the best thresholds? Trail and error? I have trouble even trying to come up with useful default values for these thresholds to avoid future issues like #96 and #97.

Without digging into the code, what do the threshold mean? Do they mean anything by themselves or are they only meaningful when compared among themselves?

kevinhughes27 · 2015-02-18T19:08:45Z

the thresholds are how close a string needs to be to be considered a match and thus be corrected to the matching string (in this case domains or parts of domains). They are pretty important and being able to tweak them is also important let me give an example:

I ran a pretty detailed experiment with mailcheck on our checkout page and collected well over a million data points. I measured 3 things - checkout successes, mailcheck suggestions shown and mailcheck suggestions clicked. Right away we noticed that unclicked mailcheck suggestions were related to decreased checkout successes - so in our environment its important to be conservative with suggestions. After noticing this I played with the thresholds and the default domain lists to bring down the number of unclicked suggestions (I collected the original email address if it was unclicked so I could iterate). That is how I arrived at 2, 1.5, 1.5 for thresholds. Our users going through checkout are fairly international so our domain list is large but if for example you are adding mailcheck to a signup page and most of your users will be other tech enthusiasts then your setup will be very different.

In order to avoid the issues you mentioned we should reduce the default thresholds - but you should still be experimenting with the setup in your environment. Maybe @derrickko can elaborate but my take away from looking through previous PRs (most PRs that alter the default config are closed unless there is a really good reason) is that Mailcheck is intended as more of a framework and less of a plug and play solution. That being said lets tighten the thresholds a bit so the default config is more useful, I should have included this in my original PR here.

ferreiro · 2022-10-23T00:50:06Z

Hey @ kevinhughes27 your explanation was amazing. are u still using this library nowadays?

It's no longer maintained, so I rewrote it entirely and update it for 2022 and onward: https://github.com/ZooTools/email-spell-checker.

Would love to get your thoughts and see if there is something we can improve.

💙 Written in TypeScript and removed jQuery
✅ Fixed a couple of bugs like this or this
🚀 Reduced bundle size to <2KB. 
✨ Update TLDs (69+) and added modern startup domains (like .io, .so, .xyz or .dev)
🙏 Implemented suggestions that people made in this repo that were ignored.

Link: https://github.com/ZooTools/email-spell-checker

Come check it out and give it a ⭐️ for the effort.

derrickko mentioned this pull request Mar 18, 2015

Publish to NPM #94

Closed

kevinhughes27 mentioned this pull request Mar 20, 2015

update default settings #102

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check top-level domain suggestion against domains #89

Check top-level domain suggestion against domains #89

jdpopkin commented Nov 7, 2014

jdpopkin commented Nov 7, 2014

weilu commented Feb 18, 2015

kevinhughes27 commented Feb 18, 2015

weilu commented Feb 18, 2015

kevinhughes27 commented Feb 18, 2015

ferreiro commented Oct 23, 2022

Check top-level domain suggestion against domains #89

Are you sure you want to change the base?

Check top-level domain suggestion against domains #89

Conversation

jdpopkin commented Nov 7, 2014

jdpopkin commented Nov 7, 2014

weilu commented Feb 18, 2015

kevinhughes27 commented Feb 18, 2015

weilu commented Feb 18, 2015

kevinhughes27 commented Feb 18, 2015

ferreiro commented Oct 23, 2022