Fix bug in header parsing #63

Closed

jkamdjou

This PR fixes a bug where a CRLF ("folding" in RFC terms) inside any of the ADDRESSES_HEADERS causes incorrect value extraction. email.utils.getaddresses() does not parse such a value correctly, so the corresponding mailparser field ends up with the wrong value.

This call reproduces the failure:

import email.utils

foo = '"test \r\n" <foo@bar.com>'
print(email.utils.getaddresses([foo]))
# Output observed: [('', 'test ')]

According to RFC 822:

Unfolding is accomplished by regarding CRLF immediately followed by a LWSP-char as equivalent to the LWSP-char.

LWSP-char as defined by the RFC:

LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
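
The unfolding rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from this PR: the helper name `unfold` and the sample header value are hypothetical, and the regular expression handles only the CRLF plus SPACE/HTAB case described in the RFC text.

```python
import re
from email.utils import getaddresses

# A CRLF that is immediately followed by SPACE or HTAB (a LWSP-char in RFC 822).
# The lookahead keeps the LWSP-char itself in the output.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(value):
    """Remove each CRLF that is immediately followed by a LWSP-char.

    Hypothetical helper for illustration; it is not part of mailparser.
    """
    return _FOLD.sub("", value)


if __name__ == "__main__":
    # Hypothetical folded value: the CRLF is followed by a space.
    folded = '"test\r\n name" <foo@bar.com>'
    print(getaddresses([unfold(folded)]))
```

Note that in the example string from this PR the CRLF is followed by a double quote rather than a space or tab, so a helper like this one would leave that string unchanged.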

References

See RFC on "folding": https://tools.ietf.org/html/rfc822#section-3.1.1
See "linesep": https://docs.python.org/3/library/email.header.html#email.header.Header.encode

@fedelemantuano
Contributor

Hi @jkamdjou,
could you attach an example email?

@jkamdjou
Author

Sure (note the incorrect from field):

test.eml.txt

mailparser -f test.eml.txt --json
{
  "body": "Test",
  "to_domains": [
    "foo.com"
  ],
  "to": [
    [
      "",
      "bar@foo.com"
    ]
  ],
  "from": [
    [
      "",
      "test"
    ]
  ],
  "subject": "Test",
  "timezone": "-4.0",
  "date": "2019-10-21T18:23:24",
  "has_defects": false
}

@fedelemantuano
Contributor

Hi @jkamdjou,
I can't merge this PR because the error is in the email library.
By the time the header reaches your fix, its value is already wrong:

h = decode_header_part(self.message.get(name_header, six.text_type()))

The problem is in self.message.get(name_header). I tested your solution, but it doesn't work.
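
One way to check this is to print the header value exactly as the standard library parser returns it, before any mailparser code touches it. The sketch below is a diagnostic aid under stated assumptions: the message text in `RAW` and the helper `raw_header` are hypothetical (the attached test.eml.txt is not reproduced here), and the message object is assumed to behave like one produced by `email.message_from_string`.

```python
import email

# Hypothetical message for illustration only; it is NOT the attached test.eml.txt.
# The CRLF inside the From header is followed by a space, so it is a valid fold.
RAW = (
    'From: "test\r\n'
    ' name" <foo@bar.com>\r\n'
    'To: bar@foo.com\r\n'
    'Subject: Test\r\n'
    '\r\n'
    'Test\r\n'
)


def raw_header(raw_message, name):
    """Return a header value as the stdlib parser hands it back.

    Hypothetical helper: it mirrors a message.get(name, default) lookup
    on a freshly parsed message, with an empty string as the default.
    """
    message = email.message_from_string(raw_message)
    return message.get(name, "")


if __name__ == "__main__":
    # repr() makes any line break left inside the value visible.
    print(repr(raw_header(RAW, "From")))
```

If the printed value still contains the line break, any correction has to be applied to that value before it is handed to the address parser.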
