Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support an EMAIL data type which implements correct address validation #24437

Open
rmloveland opened this issue Apr 2, 2018 · 5 comments
Labels
A-sql-datatypes SQL column types usable in table descriptors. C-wishlist A wishlist feature. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@rmloveland
Copy link
Collaborator

rmloveland commented Apr 2, 2018

Based on a quick google search it seems that this is not a data type that is implemented in other databases.

There is an apparently not-much-used (28 stars) Postgres extension: pgemailaddr.

However it seems like a feature that is de facto in use by many applications that they hand-roll and almost always get wrong since it is very difficult to implement a correct email validator (up to and including checking MX records depending on how thorough one wants to be).

For example, see this StackOverflow question about the best way to store an email in Postgres.

The accepted response has two parts: (1) uses a pretty crazy regex, and the "more correct" (2) uses plperlu/links to a Perl library, Email::Valid, which is maintained by the CTO at Fastmail and presumably is in use there. My assumption is that it's pretty battle-tested if Fastmail is using it. The library not only determines whether an email address is valid, it also (optionally) checks whether a mail host exists for the domain.

Are you currently using any workarounds to address this issue?

I am not. However I would wager that many applications that use databases are doing workarounds due to the lack of this data type.

Jira issue: CRDB-5767

@justinj
Copy link
Contributor

justinj commented Apr 3, 2018

This is a cool idea! What do you think is the appropriate extent for validation? One possibility for implementing this without supporting an entirely new data type could be to create a new builtin valid_email or similar, and then users can include a CHECK constraint on their table:

CREATE TABLE users (
  ...
  email STRING CHECK (valid_email(email)),
  ...
);

though I might have to defer to the wisdom I've heard before that I'm not sure there's much benefit gained here over just doing

CREATE TABLE users (
  ...
  email STRING CHECK (email LIKE '%@%'),
  ...
);

given that foo@bar.com is a perfectly valid email that I give to sites all the time :)

@rmloveland
Copy link
Collaborator Author

What do you think is the appropriate extent for validation?

In the linked SO answer, they do something like:

CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
  CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );

which seems pretty good without going crazy checking MX records and such (though I am not an email expert so IDK).

If you want to lose your mind you can look at the $RFC822PAT validating regex in the Perl library here, which supposedly is even better (though perhaps for users, not implementors :-) ).

Based on personal experience a lot of marketing email software does not seem to do anywhere near this level of validation, and as a result I get a lot of e-commerce email (including purchase receipts with PII, even!) addressed to other Richard Lovelands who have similar (but not the same) email addresses. I suspect they are doing something like the (email LIKE '%@%') validation you mentioned and possibly even stripping out "special" characters using their own special-sauce validation, hence the incorrect delivery to me.

A data type or function would not necessarily prevent such sloppiness on the application side, but could make doing something closer to the right thing easier.

In any case I definitely don't want to take away your foo@bar.com :-)

@jordanlewis jordanlewis added this to the Later milestone Apr 3, 2018
@knz knz added A-sql-datatypes SQL column types usable in table descriptors. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels May 12, 2018
@knz knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation May 12, 2018
@knz knz moved this from Triage to Feature requests / pie-in-the-skie in (DEPRECATED) SQL Front-end, Lang & Semantics May 14, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@jordanlewis jordanlewis moved this from Triage to Lower priority backlog in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 25, 2019
@asubiotto asubiotto moved this from Lower priority backlog to [TENT] SQL Features in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 3, 2020
@github-actions
Copy link

github-actions bot commented Jun 7, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz knz added C-wishlist A wishlist feature. and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity labels Jun 7, 2021
@knz knz added this to Triage in SQL Sessions - Deprecated via automation Jun 7, 2021
@rmloveland
Copy link
Collaborator Author

I still think this is a valid feature request. lots of apps are out there still getting this wrong in 2021

@rafiss
Copy link
Collaborator

rafiss commented Jun 16, 2021

I'm fine to leave this open, but I am not entirely convinced we'd want this.

Speaking personally, I've always felt that the question "is this a valid email address?" is almost always asking one of two things:

  • "Is this email address tied to a real person?" -- then the best way to answer the question is to send an email verification link.
  • "Did the email address have any typos?" -- then this should be solved in form validation; that is, application logic much before the database is involved.

In other words, email addresses are a business logic concern, so the business logic should make sure the addresses satisfy whatever conditions are needed for the app to work.

That's just my own 2c though. In order for us to prioritize this, we'd need to know what problems are "invalid email addresses" causing for users? Then we can decide if those problems are best solved with this database feature.


I get a lot of e-commerce email (including purchase receipts with PII, even!) addressed to other Richard Lovelands who have similar (but not the same) email addresses.

I relate to this problem! My feeling has always been that I am getting these emails because my equally-named counterparts make typos in their email address when they sign up for things or enter their address anywhere.

@rafiss rafiss moved this from Triage to Longer term backlog in SQL Sessions - Deprecated Jun 16, 2021
@jlinder jlinder added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-datatypes SQL column types usable in table descriptors. C-wishlist A wishlist feature. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
(DEPRECATED) SQL Front-end, Lang & Se...
  
Feature requests / pie-in-the-skie
SQL Sessions - Deprecated
Longer term backlog
Development

No branches or pull requests

7 participants