what(): Couldn't find Content-Type phrase (boundary) #332

Closed
malaterre opened this issue Nov 26, 2013 · 10 comments

@malaterre

I cannot parse the following MIME file. Steps:

$ mkdir /tmp/d
$ cd /tmp/d
$ apt-get source cpp-netlib
$ cd cpp-netlib-0.10.1
$ g++ -o demo ./libs/mime/example/basic_parsing.cpp
$ ./demo.sh
$ ./libs/mime/test/mimeParse.py ./demo.mime


Data from: ./demo.mime
Content-Type: multipart/related
There are 1 headers
There are 1 sub parts
Content-Type: application/pdf
There are 1 headers
The body is 512 bytes long
0 0 0 0 0 ... 0 0 0 0 0

while:

$ ./demo ./demo.mime


terminate called after throwing an instance of 'std::runtime_error'
what(): Couldn't find Content-Type phrase (boundary)
[1] 1328 abort ./demo ./demo.mime

Where:

$ cat demo.sh

#!/bin/sh

out=demo.mime

boundary="demo.bug.cpp-netlib"
content_type="Content-Type: multipart/related; type=application/pdf; boundary=${boundary}"

echo -n "${content_type}\r\n\r\n" > $out
echo -n "--$boundary\r\n" >> $out
echo -n "Content-Type: application/pdf\r\n" >> $out
echo -n "\r\n" >> $out
head -c 1b /dev/zero >> $out
echo -n "\r\n" >> $out
echo -n "--$boundary--" >> $out
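demo.sh relies on "echo -n" expanding "\r\n". dash (Debian's /bin/sh) does this, but bash's builtin echo does not by default, so the same script can produce different bytes under a different /bin/sh. As a hedged alternative, here is a minimal Python sketch that writes the same bytes directly. It assumes the Content-Type header is meant to be a single line and that "head -c 1b" yields 512 zero bytes (GNU head treats the "b" suffix as 512). The function build_demo_mime and its final_crlf flag are hypothetical names, not part of cpp-netlib or of the original script.

```python
def build_demo_mime(boundary: str = "demo.bug.cpp-netlib",
                    body: bytes = b"\x00" * 512,
                    final_crlf: bool = False) -> bytes:
    """Return the bytes demo.sh writes (hypothetical helper)."""
    crlf = b"\r\n"
    header = ("Content-Type: multipart/related; type=application/pdf; "
              f"boundary={boundary}").encode("ascii")
    parts = [
        header + crlf + crlf,                     # top-level header + blank line
        f"--{boundary}".encode("ascii") + crlf,   # opening delimiter line
        b"Content-Type: application/pdf" + crlf,  # part header
        crlf,                                     # end of part headers
        body,                                     # 512 zero bytes, like head -c 1b
        crlf,                                     # CRLF before the close delimiter
        f"--{boundary}--".encode("ascii"),        # close delimiter, no CRLF (as in demo.sh)
    ]
    if final_crlf:
        parts.append(crlf)  # optional CRLF after the close delimiter
    return b"".join(parts)
```

To write the file, open "demo.mime" in binary mode and write build_demo_mime() to it; passing final_crlf=True appends the trailing "\r\n" after the close delimiter.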

@malaterre
Author

See also: http://bugs.debian.org/730542

@deanberris
Member

I'm not sure about this -- maybe @mclow can help?

@malaterre
Author

I have a reduced test case. The following does not work:

Content-Type: multipart/related; type=multipart/alternative; boundary="_NextPart_001_0037_D092C96B.3CE29AF1"

--_NextPart_001_0037_D092C96B.3CE29AF1
Content-Type: image/gif; name="ani.gif"

R0lGODlh/ABLAMQAAP///wAAgAAA/wAAADMzmRERiEREoVVVqoiIw+7u9szM5SIikLu73d3d
6ExTaKYZ5BZsvZA17nclNtbzLoq7VTgCe+PJth3kgqqbBUICPJUKSh2CUsthamZ04ZhJAcCq

--_NextPart_001_0037_D092C96B.3CE29AF1--

However, if I add quotes (") around the type value, it works:

Content-Type: multipart/related; type="multipart/alternative"; boundary="_NextPart_001_0037_D092C96B.3CE29AF1"

--_NextPart_001_0037_D092C96B.3CE29AF1
Content-Type: image/gif; name="ani.gif"

R0lGODlh/ABLAMQAAP///wAAgAAA/wAAADMzmRERiEREoVVVqoiIw+7u9szM5SIikLu73d3d
6ExTaKYZ5BZsvZA17nclNtbzLoq7VTgCe+PJth3kgqqbBUICPJUKSh2CUsthamZ04ZhJAcCq

--_NextPart_001_0037_D092C96B.3CE29AF1--
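For comparison, Python's standard email package accepts both header variants above. A small check, assuming Python 3 and LF line endings for brevity; BOUNDARY, BODY, UNQUOTED, QUOTED and parse are illustrative names, not part of mimeParse.py or cpp-netlib.

```python
from email import message_from_string
from email.message import Message

BOUNDARY = "_NextPart_001_0037_D092C96B.3CE29AF1"

# Body of the reduced test case; it begins with the blank line that
# separates the top-level header from the body.
BODY = (
    "\n"
    f"--{BOUNDARY}\n"
    'Content-Type: image/gif; name="ani.gif"\n'
    "\n"
    "R0lGODlh/ABLAMQAAP///wAAgAAA/wAAADMzmRERiEREoVVVqoiIw+7u9szM5SIikLu73d3d\n"
    "6ExTaKYZ5BZsvZA17nclNtbzLoq7VTgCe+PJth3kgqqbBUICPJUKSh2CUsthamZ04ZhJAcCq\n"
    "\n"
    f"--{BOUNDARY}--\n"
)

# The two header variants: type unquoted (fails in the MIME code) and quoted.
UNQUOTED = ("Content-Type: multipart/related; type=multipart/alternative; "
            f'boundary="{BOUNDARY}"\n')
QUOTED = ('Content-Type: multipart/related; type="multipart/alternative"; '
          f'boundary="{BOUNDARY}"\n')


def parse(header: str) -> Message:
    """Parse one header variant followed by the shared body."""
    return message_from_string(header + BODY)


for variant in (UNQUOTED, QUOTED):
    msg = parse(variant)
    print(msg.get_param("type"), msg.get_boundary(), len(msg.get_payload()))
```

Both variants report the same type parameter, the same boundary and one sub part, which is the leniency discussed below.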

@malaterre
Author

So the parser requires two things:

1. The type parameter must be quoted. It should read as:
Content-Type: multipart/related; type="application/dicom"; boundary=4ebf00fbcf09
instead of
Content-Type: multipart/related; type=application/dicom; boundary=4ebf00fbcf09

2. The last line of a MIME message must end with "\r\n", so
echo -n "--$boundary--" >> $out
should read
echo -n "--$boundary--\r\n" >> $out

I am not sure why the Python parser is relaxed about these.

@deanberris
Member

So, is this a problem with the Python parser, or the MIME code?

@malaterre
Author

The MIME code. Quotes should not be required, and Python handles both forms. Reference:

http://tools.ietf.org/html/rfc2045#section-5.1

@malaterre
Author

I am not sure about the trailing \r\n, though. Some guru would need to double-check why the MIME code needs it.

@mclow

mclow commented Nov 26, 2013

On Nov 26, 2013, at 4:15 AM, Mathieu Malaterre <notifications@github.com> wrote:

> So the parser requires two things:
>
> 1. It should read as:
> Content-Type: multipart/related; type="application/dicom"; boundary=4ebf00fbcf09
> instead of
> Content-Type: multipart/related; type=application/dicom; boundary=4ebf00fbcf09

It’s because of the ‘/’ in “application/dicom”.

A slash is not allowed in a token, so an unquoted parameter value in the Content-Type header cannot contain one.

See the RFC Mathieu referenced:
http://tools.ietf.org/html/rfc2045#section-5.1

 token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
              or tspecials>

 tspecials :=  "(" / ")" / "<" / ">" / "@" /
               "," / ";" / ":" / "\" / <">
               "/" / "[" / "]" / "?" / "="
               ; Must be in quoted-string,
               ; to use within parameter values
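The grammar above can be checked mechanically. A minimal sketch, assuming ASCII parameter values; is_token and format_param are hypothetical helpers for illustration, not cpp-netlib API. A value that is not a valid token (for example one containing "/") is emitted as a quoted-string, with backslash and double quote escaped as quoted-pairs.

```python
# tspecials from RFC 2045, section 5.1: ( ) < > @ , ; : \ " / [ ] ? =
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """True if value matches RFC 2045 token:
    1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>."""
    if not value:
        return False  # a token needs at least one character
    for ch in value:
        code = ord(ch)
        if code > 127:                  # not US-ASCII
            return False
        if code <= 32 or code == 127:   # CTLs (0-31, 127) and SPACE (32)
            return False
        if ch in TSPECIALS:
            return False
    return True


def format_param(name: str, value: str) -> str:
    """Render name=value, quoting the value when it is not a token."""
    if is_token(value):
        return f"{name}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'
```

With this rule, format_param("type", "application/dicom") produces type="application/dicom", the quoted form that the MIME code accepted in this thread, while a boundary such as 4ebf00fbcf09 stays unquoted.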

@infinity0

The terminating CRLF is mandatory too: http://tools.ietf.org/html/rfc2046#section-5.1.1

   The Content-Type field for multipart entities requires one parameter,
   "boundary". The boundary delimiter line is then defined as a line
   consisting entirely of two hyphen characters ("-", decimal value 45)
   followed by the boundary parameter value from the Content-Type header
   field, optional linear whitespace, and a terminating CRLF.
   [..]
   The boundary delimiter line following the last body part is a
   distinguished delimiter that indicates that no further body parts
   will follow.  Such a delimiter line is identical to the previous
   delimiter lines, with the addition of two more hyphens after the
   boundary parameter value.
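The two delimiter forms quoted above can be written out directly. A minimal sketch, following the strict reading taken in this comment (every delimiter line, including the final one, ends in CRLF) and omitting the optional linear whitespace; delimiter_line, close_delimiter_line and ends_with_close_delimiter are hypothetical names.

```python
CRLF = b"\r\n"


def delimiter_line(boundary: str) -> bytes:
    # Two hyphens, the boundary parameter value, then a terminating CRLF.
    return b"--" + boundary.encode("ascii") + CRLF


def close_delimiter_line(boundary: str) -> bytes:
    # Same as a delimiter line, with two more hyphens after the boundary.
    return b"--" + boundary.encode("ascii") + b"--" + CRLF


def ends_with_close_delimiter(data: bytes, boundary: str) -> bool:
    """Strict check: does data end with a CRLF-terminated close delimiter?"""
    return data.endswith(close_delimiter_line(boundary))
```

Under this check, the file written by demo.sh fails (it ends with "--demo.bug.cpp-netlib--" and no CRLF), while the variant using echo -n "--$boundary--\r\n" passes.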

So this is not a bug and the Python parser is too relaxed.

@malaterre
Author

Thanks, all, for the comments! Closing then.

4 participants