Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pastes get truncated #328

Closed
hasufell opened this issue Jun 19, 2018 · 17 comments · Fixed by #431
Closed

Pastes get truncated #328

hasufell opened this issue Jun 19, 2018 · 17 comments · Fixed by #431

Comments

@hasufell
Copy link

Steps to reproduce

  1. Create a paste of ~1.2mb

What happens

When copy pasting the link into a new browser window, the paste is truncated to a length of 604K.

What should happen

The paste should not be truncated.

Additional information

Basic information

Server OS: Gentoo, but PrivateBin runs in docker

Webserver: jwilder/nginx-proxy as front proxy with the docker container of PrivateBin

Browser: tried with firefox 60.0.2 and chrome 67.0.3396.87

PrivateBin version: privatebin/nginx-fpm-alpine:1.1.1

The size limit in the config is set to a high value (8mb) and the nginx front proxy is configured with client_max_body_size 0; to allow arbitrary big body size.

@elrido
Copy link
Contributor

elrido commented Jun 19, 2018

I could not reproduce this one. With the latest container, I could submit a paste of just a bit under 2 MiB and it was stored completely and could fully be read. I am using big.txt as a source for large paste texts.

@hasufell
Copy link
Author

Still the same with the lastest images.

@elrido
Copy link
Contributor

elrido commented Jun 19, 2018

Great find! Its something in that particular input. The data is already truncated when it gets sent to the server, so it must be some issue in the JS logic. I'll try to find a smaller part that triggers the issue and turn it into a unit test to track it down more easily.

@elrido
Copy link
Contributor

elrido commented Jun 19, 2018

Haven't been able to nail it down, but it looks like the content gets truncated by the DOMPurify library. Something in there makes it look like risky HTML to that tool.

@rugk any idea what it might not like? Is it the ANSI escape sequences?

@hasufell
Copy link
Author

That's pretty bad. Why is it not treated as bytes?

@elrido
Copy link
Contributor

elrido commented Jun 19, 2018

Because we don't want to just display JS exploits to someone visiting a paste? You always have to sanitize user input and since the server can't do it, we have to at least try to do it on the JS end.

PS: Also in JS there are only strings and no "raw byte" type. It's not a strongly typed language. JS treats its strings as UTF-16, while we use UTF-8 in the user input and content encoding from and to the server.

@hasufell
Copy link
Author

hasufell commented Jun 19, 2018

You always have to sanitize user input

Uhm. I don't follow. How can it be an exploit if the data pasted into the input field is not executed and just treated as bytes?

And even if you sanitize, why is it not rejected if it's malicious?

A paste service that cannot guarantee that input = output is a problem.

@rugk
Copy link
Member

rugk commented Jun 19, 2018

No it would be executed when attached to the DOM. Well… but that ("why") does not matter anyway, let's rather tackle the actual issue.

I could reproduce it and it get's truncated here (the selected text is the last one, which is included):
grafik

So it is not really a particular symbol. And when I do not use the whole file, it truncates the input at a different position, so uggh… 😣

In any case, I'd also guess it is a bug in DOMPurify:

And, BTW, a new version has been released some days ago, so we might upgrade and also test it with this one.

But here something similar seems to have happened, where it recognized a tag it should not… mhh… (cannot reproduce this example anymore, however)


However, maybe it's actually not DOMPurify's problem, because on their test page I cannot reproduce it with that input.

@rugk
Copy link
Member

rugk commented Jun 19, 2018

As for @hasufell's argument, I guess #330 should cover it.

@elrido
Copy link
Contributor

elrido commented Jun 25, 2018

Another update on this: I finally got the content into a unit test for the CryptTool. This shows me that the en- & decryption & compression doesn't truncate it, the results are identical.

The issues I had with this string is that it contains all types of quotes, necessitating that I add escaping for one of them. There are also Unicode U+200B Zero-width space characters in there, not ANSI escape sequences as I had assumed on first glance. These cause "Unexpected token ILLEGAL" syntax errors in node if not used in template string quoting (backticks).

I'll now change this test to use DOMPurify on it, something we haven't tested so far, since they run their own unit tests.

@elrido
Copy link
Contributor

elrido commented Jun 25, 2018

Nope, it's not DOMPurify either. Will have to test the full stack then.

@rugk
Copy link
Member

rugk commented Jun 26, 2018

So what else would it be? Some simple display error? Did you look into the source code that is actually added to the DOM?

@elrido elrido self-assigned this Jun 26, 2018
@elrido
Copy link
Contributor

elrido commented Jun 26, 2018

That is my next step. So far I focused on the obvious targets.

I found out that we have this issue since at least ZeroBin 0.18. Doesn't excuse it, but is further evidence that it is something in the core of the application. Something that hasn't changed (much) since.

What I hadn't considered but found out during testing this morning was that this sample compresses really well, from 1.2 MiB down to 30 KiB when deflated, encryption and base64 then inflates this to about 90 KiB. This was the POST size that made me initially think that it is already truncated before storing it. Not so sure about that now, will probably have to dump the contents at various stages and unpack them to be sure where exactly it happens.

@rugk
Copy link
Member

rugk commented Jun 26, 2018

Or just using the debugger somehow and get into different stages? I mean at some point it has to be truncated?

@elrido
Copy link
Contributor

elrido commented Jun 26, 2018

That was what I was talking about.

Now, bad news: The truncation happens during the rawdeflate/rawinflate compression. I am not sure if it happens during the compression or the decompression. Probably the latter, since the compression gives me about the same sized output, as when I use gzip on the sample on the command line.

I hadn't found this yesterday, because, as described, I had to escape some characters to get that string into a unit test. This apparently made it safe for rawdeflate/inflate. I'll rewrite the test to instead read the string out of a file, hope I can that way finally create a test case.

With 1.3 we anyway planned to switch to the pako library, since rawdeflate isn't 100% standard compliant and hence can't be deflated by standard libraries. If we can't find a workaround for this (maybe we can escape whatever causes this?) we will have to leave this in the 1.2 release and fix it with the switch to pako in 1.3.

@elrido
Copy link
Contributor

elrido commented Jun 30, 2018

Lets revisit this issue once we switched to pako.

elrido added a commit that referenced this issue May 15, 2019
Release 1.3 - Review & refactor paste format automation moved this from In progress to Done May 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants