smtp: Support internationalised characters in envelope addresses #4892
This patch set fixes support for internationalised characters in mailbox addresses, which affects the
As such, this PR not only fixes issue #4828 but also allows UTF-8 to be used "legally" in other parts of an SMTP message.
The log from issue #4828 shows that the server supports the
I have utilised the existing IDN conversion code from
When I first opened this PR I had the following questions. I have kept them here for historical reasons but I believe we have resolved them now.
I don't think so. This is about IDN in the domain names, not what the SMTP server supports.
Update: did some more thinking and I'm no longer really sure what to think. We either need to test it out or find this documented to learn what the answer is.
Yes I think so. IDN is only for the host name and I would be a bit scared of what could happen by accident if we encode more than necessary.
I'd say should rather than need, but yes probably?
Again, should rather than need, but yes: if the host name is in the protocol, there will probably be users with IDN names who would like to use that name rather than the punycoded version.
Yep - that's this evening's job. I've dug out the HTTP IDN test cases (165, 1034, 1035, 1448, 2046 and 2047) which will hopefully help me out with the non-ASCII stuff ;-)
Just waiting for my munchies to be delivered by Ocado (other supermarket delivery services are available) before I get stuck in ;-)
…ings This avoids the duplication of strings when the optional AUTH and SIZE parameters are required. It also assists with modifications in curl#4892.
This avoids the duplication of strings when the optional AUTH and SIZE parameters are required. It also assists with the modifications that are part of curl#4892.
I think this is about there and ready for public scrutiny:
My thoughts were similar when I started this journey and I must admit I am still in two minds even now, having spent a fair amount of time with RFC-6531.
My understanding is that if the client wants to transmit UTF-8 in either a) the local part of the mailbox in an envelope command, or b) any part of a mailbox address used in one or more headers then the SMTPUTF8 extension is required.
However if the client wants to transmit UTF-8 in only the host part of mailboxes in an envelope command (but not in any header) then the client may choose to convert these, at its discretion, using IDN. This is detailed in  and I think is where I started out.
You could argue that with client side SMTPUTF8 support there is no need to convert the host part to an A-label as it could be sent as a U-label - See .
Note: The server side extension is present in the logs from #4828, so in summary I think our only issue there is that we didn't advertise SMTPUTF8 support in the MAIL command. If we had done so, the server should have accepted that address.
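To make the distinction above concrete, here is a minimal sketch of how a client might classify an envelope address. It is not the code in this PR: `has_non_ascii`, `classify_envelope_address` and the `addr_action` enum are hypothetical names for illustration, and the sketch assumes the address is already UTF-8 and that the last `@` separates the local part from the host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if any byte in [s, s + len) is outside
   7-bit ASCII, i.e. part of a multi-byte UTF-8 sequence. */
static bool has_non_ascii(const char *s, size_t len)
{
  size_t i;
  for(i = 0; i < len; i++) {
    if((unsigned char)s[i] & 0x80)
      return true;
  }
  return false;
}

/* Hypothetical result type for this sketch */
enum addr_action {
  ADDR_PLAIN,          /* pure ASCII, send as-is */
  ADDR_IDN_HOST,       /* only the host is non-ASCII, IDN conversion is an option */
  ADDR_NEEDS_SMTPUTF8  /* local part is non-ASCII, SMTPUTF8 is required */
};

/* Assumption: the last '@' splits local part and host */
static enum addr_action classify_envelope_address(const char *addr)
{
  const char *at = strrchr(addr, '@');
  size_t local_len = at ? (size_t)(at - addr) : strlen(addr);

  if(has_non_ascii(addr, local_len))
    return ADDR_NEEDS_SMTPUTF8;
  if(at && has_non_ascii(at + 1, strlen(at + 1)))
    return ADDR_IDN_HOST;
  return ADDR_PLAIN;
}
```

With SMTPUTF8 negotiated, the `ADDR_IDN_HOST` case could arguably also be sent as a U-label, which is the open question discussed above.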
Yep - agreed, I have made changes to implement this.
I've ended up implementing this for the MAIL command (both FROM and AUTH parameters), the RCPT TO command and the VRFY command. I've also added the advertisement of SMTPUTF8 to the EXPN command so the server may respond with UTF-8 based addresses.
I think I'll leave IMAP for another rainy day.
 RFC-6531 Section 3.1 Paragraph 3 states:
However, the paragraph then goes on to state:
So I was tempted to implement support for ASCII based recipients (i.e. the local part of a mailbox) on UTF-8 hosts using the IDN conversion to A-label.
 RFC-6531 Section 3.2 Paragraph 2 states:
…eter Non-ASCII host names will be ACE encoded if IDN is supported.
…meter Support the SMTPUTF8 extension when sending mailbox information in the FROM parameter. Non-ASCII domain names will be ACE encoded if IDN is supported. Reported-by: ygthien on github Fixes #4828
Simply notify the server we support the SMTPUTF8 extension if it does.
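As a rough illustration of what "notifying the server" means on the wire, the sketch below appends the `SMTPUTF8` parameter (the MAIL parameter keyword defined by RFC 6531) after the optional AUTH and SIZE parameters. `build_mail_command` is a hypothetical function, not curl's actual implementation; it assumes the caller has already decided that the server advertised SMTPUTF8 and passes pre-formatted `auth` and `size` values (or NULL to omit them).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: format a MAIL command into buf.
   auth and size may be NULL when the parameter is not wanted.
   utf8 should only be true if the server advertised SMTPUTF8.
   Returns what snprintf() returns: the length the full command
   needs, excluding the terminating null byte. */
static int build_mail_command(char *buf, size_t buflen,
                              const char *from,
                              const char *auth,
                              const char *size,
                              bool utf8)
{
  return snprintf(buf, buflen, "MAIL FROM:<%s>%s%s%s%s%s",
                  from,
                  auth ? " AUTH=" : "", auth ? auth : "",
                  size ? " SIZE=" : "", size ? size : "",
                  utf8 ? " SMTPUTF8" : "");
}
```

Building the command from a single format string with optional pieces is one way to avoid duplicating the string for each combination of parameters.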
Are you able to take a look at my Windows builds and suggest a way forward please?
I'm trying to update/add some tests that output UTF-8 characters, which then breaks those tests on Windows :(