@wojcikstefan wojcikstefan commented Apr 10, 2017

Fixes #1228

Previously, email addresses containing Unicode characters didn't work at all. Now, Unicode domains are supported out of the box, and UTF-8 validation of the user part can be explicitly enabled. This PR also adds validation of IP-based domain parts (e.g. "user@[127.0.0.1]") and introduces support for whitelisting otherwise invalid domains (e.g. "root@localhost").

Support for Unicode usernames is still somewhat limited in many packages, so that option should be enabled only after checking that the other parts of the system handle such addresses. The domain whitelist is empty and IP validation is disabled by default, primarily for consistency with past behavior, and because many applications may choose to reject such addresses.
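To make the rules above concrete, here is a minimal standalone sketch of the validation order they imply. It is not the code from this PR: the function name `is_valid_email`, the regexes, and the parameter names `domain_whitelist`, `allow_utf8_user` and `allow_ip_domain` are assumptions chosen to mirror the description, and the local-part pattern is deliberately simplified.

```python
import ipaddress
import re

# Assumed, simplified patterns (not taken from the PR).
# ASCII-only local part: dot-separated atoms, no quoted strings.
USER_RE = re.compile(
    r"^[-!#$%&'*+/=?^_`{}|~0-9A-Za-z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Za-z]+)*$"
)
# Same shape, but \w also admits non-ASCII letters and digits.
UTF8_USER_RE = re.compile(
    r"^[-!#$%&'*+/=?^_`{}|~\w]+(\.[-!#$%&'*+/=?^_`{}|~\w]+)*$"
)
# ASCII hostname: dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def is_valid_email(value, domain_whitelist=(), allow_utf8_user=False,
                   allow_ip_domain=False):
    """Hypothetical validator mirroring the options described above."""
    if "@" not in value:
        return False
    user, domain = value.rsplit("@", 1)

    # Non-ASCII user parts are accepted only when explicitly enabled.
    user_re = UTF8_USER_RE if allow_utf8_user else USER_RE
    if not user_re.match(user):
        return False

    # Whitelisted domains (e.g. "localhost") skip the domain checks.
    if domain in domain_whitelist:
        return True

    # IP literals such as "[127.0.0.1]" are opt-in.
    if allow_ip_domain and domain.startswith("[") and domain.endswith("]"):
        try:
            ipaddress.ip_address(domain[1:-1])
        except ValueError:
            return False
        return True

    if DOMAIN_RE.match(domain):
        return True

    # Non-ASCII domain: convert to its IDNA form and check again.
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return bool(DOMAIN_RE.match(ascii_domain))
```

With the defaults, `is_valid_email("root@localhost")` and `is_valid_email("user@[127.0.0.1]")` are rejected, which matches the "disabled by default" behavior described above; passing `domain_whitelist=("localhost",)` or `allow_ip_domain=True` accepts them.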

Performance

TL;DR: validation of regular ASCII-only email addresses is somewhat slower than before (about 3.3-3.7 µs vs. 2.1 µs on master), but still fast enough for the vast majority of use cases (< 5 µs). Validation of Unicode email addresses is more than an order of magnitude slower (~75-85 µs), but it works as expected and the cost applies only to these less common addresses.

On master:

In [2]: ef = EmailField()

In [3]: %timeit ef.validate('wojcikstefan@gmail.com')
The slowest run took 7.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.13 µs per loop

On this branch:

In [14]: %timeit ef.validate('wojcikstefan@gmail.com')
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.57 µs per loop

In [15]: %timeit ef.validate(u'wojcikstefan@gmail.com')
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.41 µs per loop

In [16]: %timeit unicode_ef.validate('wojcikstefan@gmail.com')
The slowest run took 5.69 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.69 µs per loop

In [17]: %timeit ip_ef.validate('wojcikstefan@gmail.com')
The slowest run took 6.07 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.3 µs per loop

In [18]: %timeit ip_ef.validate(u'user@[192.168.0.1]')
The slowest run took 10.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.68 µs per loop

In [19]: %timeit unicode_ef.validate(u'Dörte@Sörensen.example.com')
The slowest run took 51.38 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 77.4 µs per loop

In [20]: %timeit ef.validate(u'ascii@Sörensen.example.com')
10000 loops, best of 3: 82.3 µs per loop
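
The numbers above come from IPython's `%timeit`. Outside IPython, a similar best-of-3 measurement could be sketched with the standard `timeit` module. The helper `best_us_per_call` is hypothetical, and the two workloads are stand-ins: the PR does not say where the extra time for Unicode addresses goes, so timing IDNA encoding of the domain is only an assumption about one candidate cost, not a finding from this PR.

```python
import timeit


def best_us_per_call(func, number=10000, repeat=3):
    """Return the best-of-`repeat` time per call to `func`, in microseconds."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


# Stand-in workloads (assumptions, not the validator from this PR):
# plain ASCII encoding vs. IDNA encoding of a non-ASCII domain.
ascii_us = best_us_per_call(lambda: "gmail.com".encode("ascii"))
idna_us = best_us_per_call(lambda: "Sörensen.example.com".encode("idna"))

print("ascii encode: %.2f us per call" % ascii_us)
print("idna encode:  %.2f us per call" % idna_us)
```

Absolute values depend on the machine and Python version, so they are not directly comparable with the figures quoted above; replacing the lambdas with calls to the field's `validate` method would reproduce the original comparison.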

@wojcikstefan wojcikstefan merged commit 466935e into master Apr 16, 2017
Kami added a commit to StackStorm/mongoengine that referenced this pull request Aug 3, 2018
usage.

See MongoEngine#1832 for details.

Revert "Unicode support in EmailField (MongoEngine#1527)"

This reverts commit 466935e.