Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new line issues when parsing multipart/form-data? #3768

Closed
crookse opened this issue Jan 24, 2020 · 8 comments
Closed

new line issues when parsing multipart/form-data? #3768

crookse opened this issue Jan 24, 2020 · 8 comments
Labels
bug Something isn't working correctly

Comments

@crookse
Copy link

crookse commented Jan 24, 2020

i noticed something strange while trying to parse a multipart server request body. i kept getting the UnexpectedEOFError: Unexpected EOF error when using MultipartReader inmultipart.ts. example code that fails is below:

const reqBody = decoder.decode(await Deno.readAll(request.body));
const boundary = reqBody.split("\n").shift();
const mr = new MultipartReader(
  request.body,
  boundary
);
const form = await mr.readForm(20);
console.log(form);

having encountered this and not being able to debug it, i decided to clone deno and run the multipart_test.ts file. it worked. i then checked the sample.txt file it uses and saw the ^M characters. i deleted the characters, ran the tests again and they failed. below is a screenshot of the files i tested. the top file fails and the bottom file passes.

Screen Shot 2020-01-23 at 18 50 51

so, i think there is an issue around the way the MultipartReader reads new line characters in a multipart request body. i've traced the issue to this line ...

const line = await this.bufReader.readSlice("\n".charCodeAt(0));

... and could not find a solution.

@bartlomieju bartlomieju added the bug Something isn't working correctly label Jan 24, 2020
@artisonian
Copy link

Not sure if this is a bug:

https://tools.ietf.org/html/rfc7231#section-3.1.1.4

Looks like CRLF is required for multipart messages.

@crookse
Copy link
Author

crookse commented Jan 29, 2020

@artisonian,

So my text file (the one that fails the test) is invalid because it only contains LF? When I encountered this bug, I passed in a .txt file I created from TextEdit. The parser couldn’t read it. If CRLF is required, how can I get my text files to include CRLF on my Mac? Seems a bit annoying to have to rework the files you pass to the parser just to get it to read your files. In that case, I’m screwed because I’m on a Mac and it uses LF.

@rafaelvargas
Copy link
Contributor

rafaelvargas commented Jan 29, 2020

I agree with @artisonian. I don't really think it's a bug. The file named sample.txt, used to test the method readForm from MultipartReader, seems to follow the syntax rules that are defined by IETF for multipart HTTP bodies.

The common syntax for multipart HTTP bodies can be found in the RFC 2046, and a complex example of a multipart HTTP body can be seen in the RFC 2049.

@crookse
Copy link
Author

crookse commented Jan 30, 2020

deno authors,

I took another look at the way I was parsing my file. Below are my findings and the steps I took. I am running macOS Catalina Version 10.15.2.

  1. I created a file named test.txt using VIM - Vi IMproved 8.1 (2018 May 18, compiled Oct 30 2019 23:05:13) (my file below doesn't actually have CRLF... that's just what I wrote so don't take it literally):
hello
i have CRLF
and
this
should
work

  1. I created a deno file named app.ts:
import { BufReader } from "https://deno.land/std@v0.31.0/io/bufio.ts";
import { MultipartReader } from "https://deno.land/std@v0.31.0/mime/multipart.ts";
import { serve } from "https://deno.land/std@v0.31.0/http/server.ts";

const s = serve({ port: 1447 });
console.log("http://localhost:1447/");

for await (const req of s) {
  const readBody = await Deno.readAll(req.body);
  const br = new BufReader(new Deno.Buffer(readBody));
  const line: any = await br.readLine();
  const boundary = new TextDecoder().decode(line.line) + "\r\n";
  const mr = new MultipartReader(
    br,
    boundary
  );

  const encoder = new TextEncoder();

  try {
    const form = await mr.readForm(20);
    console.log(form);
  } catch (error) {
    console.log(error.stack);
  }
  req.respond({ body: "done" });
}
  1. I run the app and make a curl request:
deno --allow-all app.ts
Compile file:///private/var/src/deno-examples/uploading_multipart/app.ts
http://localhost:1447/

curl -F ‘file_1=@test.txt’ localhost:1447

Below is the error I receive after running the curl command

UnexpectedEOFError: Unexpected EOF
    at MultipartReader.nextPart (multipart.ts:350:15)
    at async MultipartReader.readForm (multipart.ts:276:17)
    at async app.ts:21:18

Now what's strange about this finding is that I've cloned the deno repo and changed my import statements to reference my local deno copy. After adding some logging, tweaking my local multipart.ts file, and tweaking my app.ts file, the multipart.ts file was able to read my test.txt file, but I still didn't get what I expected.

Below is what my tweaked multipart.ts file parsed. As you can see, there are 13, 10 byte encodings in some of the lines--the boundaries, the headers, and the empty lines that separate the body from said lines. The body does not include 13, 10 byte encodings. They only have 10s. If you decode these lines, you'll see that they match my test.txt file.

Uint8Array [ 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 48, 99, 51, 98, 54, 51, 51, 50, 99, 102, 102, 101, 101, 51, 98, 49, 13, 10 ]

Uint8Array [ 67, 111, 110, 116, 101, 110, 116, 45, 68, 105, 115, 112, 111, 115, 105, 116, 105, 111, 110, 58, 32, 102, 111, 114, 109, 45, 100, 97, 116, 97, 59, 32, 110, 97, 109, 101, 61, 34, 102, 105, 108, 101, 95, 50, 34, 59, 32, 102, 105, 108, 101, 110, 97, 109, 101, 61, 34, 116, 101, 115, 116, 46, 116, 120, 116, 34, 13, 10 ]

Uint8Array [ 67, 111, 110, 116, 101, 110, 116, 45, 84, 121, 112, 101, 58, 32, 116, 101, 120, 116, 47, 112, 108, 97, 105, 110, 13, 10 ]

Uint8Array [ 13, 10 ]

Uint8Array [ 104, 101, 108, 108, 111, 10 ]

Uint8Array [ 105, 32, 104, 97, 118, 101, 32, 67, 82, 76, 70, 10 ]

Uint8Array [ 97, 110, 100, 10 ]

Uint8Array [ 116, 104, 105, 115, 10 ]

Uint8Array [ 115, 104, 111, 117, 108, 100, 10 ]

Uint8Array [ 119, 111, 114, 107, 10 ]

Uint8Array [ 13, 10 ]

Uint8Array [ 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 48, 99, 51, 98, 54, 51, 51, 50, 99, 102, 102, 101, 101, 51, 98, 49, 45, 45, 13, 10 ]

Symbol(EOF)

Below are my files that I'm currently using to debug the "issue". Sorry btw, GitHub doesn't like .ts files being uploaded here, so I've changed the extensions to .txt.

test.txt
app.txt
multipart.txt

@artisonian / @rafaelvargas , like I said. I'm screwed... as well as others whose machines and editors don't add CRLF. Unless one of you have a proposed solution to handle this, I'm proposing that multipart.ts do some extra checking on line endings for the bodies of multipart messages. It is my opinion that myself and others shouldn't have to make sure our files that we plan to send have CRLF. My "issue" can be solved by taking my file and adding ^M characters to the ends of all my lines, but c'mon... that's jank.

@artisonian
Copy link

Sounds like there are two things going on:

  1. I created a Gist of your example to demonstrate how to use MultipartReader. The uploaded file does not require CRLF line endings.
  2. If you find the need to convert to CRLF, most text editors should have a way to do it. For example, the status bar at the bottom of VSCode allows you to convert between LF and CRLF.

@crookse
Copy link
Author

crookse commented Jan 30, 2020

@artisonian,

Thanks for the response. I tried your Gist. It works. I took a look at the way I'm parsing the boundary and the way you're parsing the boundary. Here is what I found:

yours: ------------------------b625443000c6a438
mine:  --------------------------b625443000c6a438

The issue is in the way I'm parsing the boundary. If it's ok with you, I'd like to take your Gist and create documentation on how to use the MultipartReader. That way people like myself don't run into this issue and use it incorrectly as I did. Please let me know. Thanks again!

@artisonian
Copy link

Feel free to use the gist. Also, I commented on #3763 which asks for documentation too.

@crookse
Copy link
Author

crookse commented Jan 31, 2020

not an issue. was using the MultipartReader incorrectly. documentation would be great though as mentioned above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly
Projects
None yet
Development

No branches or pull requests

4 participants