From is null or has character set issues #62

Open
537mfb opened this issue Mar 30, 2012 · 25 comments

537mfb commented Mar 30, 2012

Hi

First of all, thanks for sharing this.

I have found two issues with the From object as follows:

If the From header has no display name (i.e. it is in the form some@address.com), the message's From object is null. I get around this with the following piece of code:

name = "";
addr = "";
if (msg.Value.From != null)
{
    name = msg.Value.From.DisplayName;
    addr = msg.Value.From.Address;
}
else
{
    // "Name <addr>" splits into ["Name ", "addr"]; a bare address gives ["addr"]
    string[] tok = msg.Value.Headers["From"].RawValue.Split(new string[] { "<", ">" }, StringSplitOptions.RemoveEmptyEntries);
    if (tok.Length >= 2)
    {
        name = tok[0].Trim();
        addr = tok[1].Trim();
    }
    else
        addr = tok[0].Trim();
}
if (string.IsNullOrEmpty(name)) // No display name: derive one from the local part
{
    string[] tokens = addr.Split('@');
    name = tokens[0].Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');
}

This works well so far.
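A less fragile fallback, sketched here under the assumption that the raw header holds a single mailbox, is to hand Headers["From"].RawValue to .NET's own System.Net.Mail.MailAddress parser instead of splitting on '<' and '>'. FromHeaderFallback and TryParseFrom are hypothetical names; the name-from-local-part heuristic is copied from the workaround above. This does not address the character set problem.

```csharp
using System;
using System.Net.Mail;

public static class FromHeaderFallback
{
    // Hypothetical helper: parses "some@address.com" or "Name <addr>" with the
    // BCL's MailAddress parser. Returns false if the value is not a valid address.
    public static bool TryParseFrom(string rawValue, out string name, out string address)
    {
        name = string.Empty;
        address = string.Empty;
        if (string.IsNullOrWhiteSpace(rawValue))
            return false;

        MailAddress parsed;
        try
        {
            parsed = new MailAddress(rawValue.Trim());
        }
        catch (FormatException)
        {
            return false; // not in the form required for an e-mail address
        }

        address = parsed.Address;
        name = parsed.DisplayName;

        // Same heuristic as the workaround: build a name from the local part.
        if (string.IsNullOrEmpty(name))
            name = parsed.User.Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');

        return true;
    }
}
```

For example, TryParseFrom("john.smith@example.com", out var name, out var addr) leaves the address unchanged and sets the name to "john smith".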

Another issue I have found is the character set. I didn't even know that was possible, but apparently some mailboxes do allow accented characters in mail addresses, which causes issues in your library: the address Traduções@sutherland.theukhost.net comes back as Tradu??es@sutherland.theukhost.net.

Your library doesn't handle special characters in the address well.

Regards,
Luís Rodrigues

537mfb (Author) commented Mar 30, 2012

Just noticed that the mail address coming back with strange characters also has no display name (From is null because the display name is missing, so I am reading it from the header), so I'm not sure whether that is an issue or not.

I mean, maybe you are already handling the character set issue, just not the null From issue, and I'm getting weird characters because I go directly to Headers["From"].

piher (Contributor) commented Mar 31, 2012

Hi,
You should take a look at issues #61, #54 and #48

537mfb (Author) commented Apr 2, 2012

@piher - None of those accounts for the ?? characters in the address (they only mention the subject and body), and I seem to be the only one pointing out that From is null because there is no display name in Headers["From"], so the string is not in the expected format (name, address).

It's actually in the form (address).

piher (Contributor) commented Apr 2, 2012

Have you tried using the code from #54 (comment)?
There are still bugs, but the code is supposed to handle international headers.

537mfb (Author) commented Apr 2, 2012

I must be missing something, because if I replace my GetMessages with that one in ImapClient.cs, I get a lot of errors:

1 - On the line StringBuilder body = new StringBuilder(); I get "A local variable named 'body' is already defined in this scope". That's an easy fix, though: rename body to something else.

2 - There are 4 different lines using Utilities.LastIndexOfArray and Utilities.IndexOfArray, and on those lines I get "AE.Net.Mail.Utilities does not contain a definition for 'MethodNameHere'". I get this for both methods, and I downloaded AE.Net.Mail last Wednesday, so I'm pretty sure it's the latest version.

piher (Contributor) commented Apr 2, 2012

You must replace the whole GetMessages method in ImapClient.
IndexOfArray and LastIndexOfArray are just methods I created for this specific purpose; you'll find them if you read a few messages up in that thread.

537mfb (Author) commented Apr 2, 2012

I did replace the whole GetMessages method.

I will look more closely at that thread for the other methods, thanks.

piher (Contributor) commented Apr 2, 2012

Okay, I may have left some old variables in then.
Anyway, the method needs some cleanup; there are unused variables from my previous tests...

537mfb (Author) commented Apr 2, 2012

I redid the GetMessages replacement and added those 2 methods to the Utilities class.
My first replacement of GetMessages must have had some oddities, because now the 'body' error is gone.

Now, instead of a ? inside a black lozenge, I get a plain ? in the address I mentioned.

Some change, but not there yet.
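One plausible reading of this symptom, assuming the PHP form wrote the header as raw ISO-8859-1 bytes (the HTML part declares charset="iso-8859-1"; the headers declare nothing): decoding those bytes as UTF-8 turns each invalid byte into U+FFFD, the replacement character that fonts draw as a ? inside a black diamond, while decoding them as ASCII turns each byte above 0x7F into a plain ?. The sketch below (DecodingDemo is a hypothetical name) shows the three decodings of the same bytes.

```csharp
using System;
using System.Text;

public static class DecodingDemo
{
    // ISO-8859-1 is built into both .NET Framework and .NET Core.
    public static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Invalid UTF-8 sequences are replaced with U+FFFD (the "? in a lozenge").
    public static string AsUtf8(byte[] raw) => Encoding.UTF8.GetString(raw);

    // Every byte above 0x7F is replaced with a plain '?'.
    public static string AsAscii(byte[] raw) => Encoding.ASCII.GetString(raw);

    // Every byte maps to exactly one character, so nothing is lost.
    public static string AsLatin1(byte[] raw) => Latin1.GetString(raw);
}
```

With raw = DecodingDemo.Latin1.GetBytes("Traduções") (ç is 0xE7, õ is 0xF5), AsAscii returns "Tradu??es", AsLatin1 returns "Traduções", and AsUtf8 returns the word with U+FFFD in place of the accented letters.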

piher (Contributor) commented Apr 2, 2012

Could you show me the raw "From: ..." header in the email and some of the headers that surround it?
We need to know if the regex matches it.

537mfb (Author) commented Apr 2, 2012

Here is the raw value. Notice the ?? making it tradu??es instead of traduções and tradu??o instead of tradução, including in the subject and body:

Delivered-To: tt.tradutores@gmail.com
Received: by 10.182.236.42 with SMTP id ur10csp209607obc;
Fri, 30 Mar 2012 02:55:19 -0700 (PDT)
Received: by 10.180.95.74 with SMTP id di10mr8521215wib.1.1333101317951;
Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Return-Path: fheleno1@sutherland.theukhost.net
Received: from master.multilingues.eu ([213.175.194.88])
by mx.google.com with ESMTPS id s1si1947410wiy.19.2012.03.30.02.55.17
(version=TLSv1/SSLv3 cipher=OTHER);
Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Received-SPF: neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) client-ip=213.175.194.88;
Authentication-Results: mx.google.com; spf=neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) smtp.mail=fheleno1@sutherland.theukhost.net
Received: from 91.186.0.106
by master.multilingues.eu with esmtps (TLSv1:AES256-SHA:256)
(Exim 4.69)
(envelope-from fheleno1@sutherland.theukhost.net)
id 1SDYY6-0002i1-W2
for ttm@multilingues.eu; Fri, 30 Mar 2012 10:55:14 +0100
Received: from fheleno1 by sutherland.theukhost.net with local (Exim 4.69)
(envelope-from fheleno1@sutherland.theukhost.net)
id 1SDYY4-0001oP-Qk; Fri, 30 Mar 2012 10:55:12 +0100
To: ttm@multilingues.eu ,tt.tradutores@gmail.com, ttm@netcabo.pt
Subject: Formula Tradu??o
X-PHP-Script: www.tra-tec.com/EN/enviar_n.php for 66.249.72.16
From: Tradu??es@sutherland.theukhost.net
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx"
Message-Id: E1SDYY4-0001oP-Qk@sutherland.theukhost.net
Sender: fheleno1@sutherland.theukhost.net
Date: Fri, 30 Mar 2012 10:55:12 +0100
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - sutherland.theukhost.net
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [33579 33580] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - master.multilingues.eu
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net

Formula : tradu??o
Origem: 
Destino:
Dias : 
Tipo de Tradu??o:
Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:
This is a multi-part message in MIME format.

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type:text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Formula : tradu??o
Origem: 
Destino:
Dias : 
Tipo de Tradu??o:
Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type: application/octet-stream;
name=""
Content-Transfer-Encoding: base64

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx--

piher (Contributor) commented Apr 2, 2012

Hmm...
First of all, do you know what was used to send the email? Because there is absolutely no charset specified in the headers, my code won't change anything, and I don't see how any code could.
Second of all, where did you copy this text from? Could you copy it directly from Gmail (there should be something like "show original message")?
Thanks

537mfb (Author) commented Apr 2, 2012

I got that from the raw value in the MailMessage that AE.Net.Mail uses.
From my understanding, the message comes from a form on a website that people fill in, and then it gets bounced around a few e-mail addresses that keep forwarding it until it finally lands in the mailbox I need to read from (I have no control over this process).
The original in Gmail is the following (as you can see, it gets the character set right):

Delivered-To: tt.tradutores@gmail.com
Received: by 10.182.236.42 with SMTP id ur10csp209607obc;
Fri, 30 Mar 2012 02:55:19 -0700 (PDT)
Received: by 10.180.95.74 with SMTP id di10mr8521215wib.1.1333101317951;
Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Return-Path: fheleno1@sutherland.theukhost.net
Received: from master.multilingues.eu ([213.175.194.88])
by mx.google.com with ESMTPS id s1si1947410wiy.19.2012.03.30.02.55.17
(version=TLSv1/SSLv3 cipher=OTHER);
Fri, 30 Mar 2012 02:55:17 -0700 (PDT)
Received-SPF: neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) client-ip=213.175.194.88;
Authentication-Results: mx.google.com; spf=neutral (google.com: 213.175.194.88 is neither permitted nor denied by best guess record for domain of fheleno1@sutherland.theukhost.net) smtp.mail=fheleno1@sutherland.theukhost.net
Received: from 91.186.0.106
by master.multilingues.eu with esmtps (TLSv1:AES256-SHA:256)
(Exim 4.69)
(envelope-from fheleno1@sutherland.theukhost.net)
id 1SDYY6-0002i1-W2
for ttm@multilingues.eu; Fri, 30 Mar 2012 10:55:14 +0100
Received: from fheleno1 by sutherland.theukhost.net with local (Exim 4.69)
(envelope-from fheleno1@sutherland.theukhost.net)
id 1SDYY4-0001oP-Qk; Fri, 30 Mar 2012 10:55:12 +0100
To: ttm@multilingues.eu ,tt.tradutores@gmail.com, ttm@netcabo.pt
Subject: Formula Tradução
X-PHP-Script: www.tra-tec.com/EN/enviar_n.php for 66.249.72.16
From: Traduções@sutherland.theukhost.net
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx"
Message-Id: E1SDYY4-0001oP-Qk@sutherland.theukhost.net
Sender: fheleno1@sutherland.theukhost.net
Date: Fri, 30 Mar 2012 10:55:12 +0100
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - sutherland.theukhost.net
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [33579 33580] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - master.multilingues.eu
X-AntiAbuse: Original Domain - multilingues.eu
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - sutherland.theukhost.net

Formula : tradução
Origem: 
Destino:
Dias : 
Tipo de Tradução:
Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:
This is a multi-part message in MIME format.

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type:text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Formula : tradução
Origem: 
Destino:
Dias : 
Tipo de Tradução:
Prazo:
Nome:
Empresa:
Morada:
Email:
Telefone:

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx
Content-Type: application/octet-stream;
name=""
Content-Transfer-Encoding: base64

--==Multipart_Boundary_x9b9cd318be197de0875d871ebf3e046fx--

537mfb (Author) commented Apr 2, 2012

> First of all, do you know what was used to send the email?

As you can see in the headers, it's a PHP script (see the X-PHP-Script header).

> Because there is absolutely no charset specified in the headers, my code won't change anything, and I don't see how any code could.

Actually, as I said, your code does change things: a ? inside a lozenge became a plain ?. Not much of a change, but still a change. And as you can see from the Gmail raw data, Gmail gets it right even without a charset, so it IS possible for code to get it right; the question is how.

> Second of all, where did you copy this text from? Could you copy it directly from Gmail (there should be something like "show original message")?

The first one was the raw value of the MailMessage object in AE.Net.Mail (wrong characters); the second one is from Gmail (correct characters).

piher (Contributor) commented Apr 2, 2012

Well, from what I've read in the RFC, I would say that this mail does not conform to it, because the headers contain non-US-ASCII characters, which are supposed to be signaled by a special syntax.
So I'm sorry, but you'll have to wait until someone more competent comes around, because I don't see how it is possible to parse this email correctly without doing some byte analysis...
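A minimal sketch of such byte analysis, assuming the library still has the header's raw bytes (not an already-decoded string) and that unlabeled 8-bit headers are either UTF-8 or ISO-8859-1: try strict UTF-8 first and fall back to ISO-8859-1. This is a common heuristic, not something the RFCs prescribe, and not necessarily what Gmail does; HeaderBytes and DecodeUnlabeled are hypothetical names.

```csharp
using System;
using System.Text;

public static class HeaderBytes
{
    // Strict UTF-8: throws DecoderFallbackException instead of inserting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // ISO-8859-1 maps every byte to a character, so it never fails.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Heuristic for raw 8-bit header bytes that carry no charset label.
    public static string DecodeUnlabeled(byte[] raw)
    {
        try
        {
            return StrictUtf8.GetString(raw);
        }
        catch (DecoderFallbackException)
        {
            return Latin1.GetString(raw);
        }
    }
}
```

Trying UTF-8 first is the usual order because accented ISO-8859-1 text rarely happens to form valid UTF-8; the reverse order would never fall through, since ISO-8859-1 accepts every byte.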

537mfb (Author) commented Apr 2, 2012

Yes, that was my initial assessment.

I thought it weird to have non-ASCII characters in an e-mail address; I didn't even think it was allowed (maybe I'm getting old). It stumped me.

537mfb (Author) commented Apr 2, 2012

According to RFC 3501, international mailbox names are encoded with & and a modified UTF-7 (which uses a modified base64).

Is & the special signal you mentioned? I don't see it in either the MailMessage raw value or Gmail's output, though.
This is so weird.

Don't know if that helps any.

piher (Contributor) commented Apr 2, 2012

No, that's not what I was talking about, and as you said, the email doesn't even have that.
I was talking about the encoded-word syntax, which looks like this:
From: =?someCharset?Q?aMailAddressWithAccentuation?=
You can either read RFC 2047 or see the very clear explanation at: http://en.wikipedia.org/wiki/MIME#Encoded-Word
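For reference, a minimal decoder for that syntax could look like the sketch below (EncodedWords is a hypothetical name). It handles the B (base64) and Q encodings of RFC 2047 but not the rule that whitespace between adjacent encoded-words is dropped. Note also that RFC 2047 (section 5) only allows encoded-words in places such as the display name and forbids them inside the addr-spec, so under RFC 2047 an accented address like this one cannot be expressed with encoded-words at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWords
{
    // =?charset?encoding?encoded-text?=
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=", RegexOptions.Compiled);

    public static string Decode(string header)
    {
        return EncodedWord.Replace(header, m =>
        {
            Encoding charset;
            try { charset = Encoding.GetEncoding(m.Groups[1].Value); }
            catch (ArgumentException) { return m.Value; } // unknown charset: leave as-is

            string text = m.Groups[3].Value;
            byte[] bytes = char.ToUpperInvariant(m.Groups[2].Value[0]) == 'B'
                ? Convert.FromBase64String(text)
                : DecodeQ(text);
            return charset.GetString(bytes);
        });
    }

    // "Q" encoding: '_' is a space, "=XX" is a hex byte, anything else is literal ASCII.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '_')
                bytes.Add(0x20);
            else if (text[i] == '=' && i + 2 < text.Length)
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
                bytes.Add((byte)text[i]);
        }
        return bytes.ToArray();
    }
}
```

For example, EncodedWords.Decode("=?ISO-8859-1?Q?Formula_Tradu=E7=E3o?=") returns "Formula Tradução".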

andyedinborough (Owner) commented

Could you forward your sample message to andy.edinborough@gmail.com? Thanks!

537mfb (Author) commented Apr 3, 2012

@andyedinborough: mail sent.

@piher: thanks, I'll look into that too.

537mfb (Author) commented Apr 17, 2012

Just found another null From, and this time the sender does have a display name.

Headers["From"].RawValue contains:

\"PT, IBM-AP\" <IBM-AP.PT@unilever.com>

Could it have something to do with the " characters enclosing the name? They are required by the RFC since the name contains a comma; as far as I can tell, this name/address pair conforms to the RFC.

To work around this issue, I use the following code (an updated version of the code in my original post above):

name = "";
addr = "";
if (msgs[i].Value.From != null) // Get name and address from the From object
{
    name = msgs[i].Value.From.DisplayName ?? "";
    addr = msgs[i].Value.From.Address;
}
else // Parse name and address from the header's RawValue
{
    string[] tok = msgs[i].Value.Headers["From"].RawValue.Split(new[] { '<', '>' }, StringSplitOptions.RemoveEmptyEntries);
    if (tok.Length == 1) // Only an address was found
        addr = tok[0].Trim();
    else // Display name and address were found
    {
        name = tok[0].Trim().Trim('"'); // Drop the quotes around e.g. "PT, IBM-AP"
        addr = tok[1].Trim();
    }
}
if (string.IsNullOrEmpty(name)) // If no name was found, derive one from the address
{
    string[] tokens = addr.Split('@');
    name = tokens[0].Replace('.', ' ').Replace('-', ' ').Replace('_', ' ');
}

This happens with every name that contains " characters, not just this one.

537mfb (Author) commented Apr 17, 2012

Commit 804a9f6 fixes the issue with the character set in addresses.

shawncarr commented

The code was committed only to master-net35, not to master, so this is still reproducible with the latest version.

jstedfast commented

The only way you'll ever get your address parser to work reliably is if you switch to using a tokenizer. String.Split() and IndexOf() approaches will only become completely unmaintainable, and it's unlikely you'll ever reach a point where they work reliably for everyone.

I highly recommend taking a look at the email address parser in MimeKit, specifically InternetAddressList.cs and InternetAddress.cs. They handle everything you can throw at them, including comments in the middle of the address. They also handle old-style addresses like this:

From: nsb@host (Neil Bornstein)
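To illustrate the tokenizer idea (this is not MimeKit's code or API), here is a minimal sketch that walks a single mailbox once and tracks whether it is inside a quoted string, a (comment) or an <angle-addr>, so the comma in "PT, IBM-AP" or a parenthesized name is never mistaken for a delimiter. FromTokenizer and Parse are hypothetical names, and the sketch assumes one mailbox per header; a full parser such as MimeKit's also handles address lists, groups and encoded-words.

```csharp
using System;
using System.Text;

public static class FromTokenizer
{
    // Splits one mailbox such as  "PT, IBM-AP" <IBM-AP.PT@unilever.com>,
    // some@address.com  or  nsb@host (Neil Bornstein)  into name and address.
    public static (string Name, string Address) Parse(string raw)
    {
        var phrase = new StringBuilder();   // text outside <...> and (...), quotes removed
        var angle = new StringBuilder();    // text inside <...>
        var comment = new StringBuilder();  // text inside (...)
        bool inQuotes = false, inAngle = false;
        int commentDepth = 0;               // comments may nest

        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            bool escaped = c == '\\' && i + 1 < raw.Length;

            if (commentDepth > 0)
            {
                if (escaped) { comment.Append(raw[++i]); continue; } // quoted-pair
                if (c == '(') commentDepth++;
                else if (c == ')') commentDepth--;
                if (commentDepth > 0) comment.Append(c);          // skip the closing ')'
            }
            else if (inQuotes)
            {
                if (escaped) phrase.Append(raw[++i]);              // quoted-pair
                else if (c == '"') inQuotes = false;
                else phrase.Append(c);                            // ',' '<' '>' are literal here
            }
            else if (inAngle)
            {
                if (c == '>') inAngle = false;
                else angle.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == '(') commentDepth = 1;
            else if (c == '<') inAngle = true;
            else phrase.Append(c);
        }

        string name = phrase.ToString().Trim();
        if (angle.Length > 0)
            return (name, angle.ToString().Trim());   // name-addr form
        // addr-spec form: the phrase is the address; use a comment as the name
        return (comment.ToString().Trim(), name);
    }
}
```

Because the quoted string is consumed as a unit, Parse("\"PT, IBM-AP\" <IBM-AP.PT@unilever.com>") returns the name PT, IBM-AP and the address IBM-AP.PT@unilever.com, and Parse("nsb@host (Neil Bornstein)") returns the name Neil Bornstein and the address nsb@host.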
