
Issue on load bigger files #858

Open
PeopleInside opened this issue Nov 11, 2021 · 24 comments
Comments

@PeopleInside

PeopleInside commented Nov 11, 2021

Steps to reproduce

I edited the default sizelimit = 10485760 in config.php to allow a file of 59 MB (see the snippet below)
I uploaded and sent an MP4 file of 59 MB
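For reference, a minimal sketch of the setting being described, assuming the stock PrivateBin configuration layout (the file is normally cfg/conf.php; the exact value here is only an illustration):

    [main]
    ; default is 10485760 (10 MiB); raised to ~80 MiB as an example, leaving
    ; headroom for the base64 and encryption overhead of a 59 MB file
    sizelimit = 83886080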

What happens

After a short upload time I see an error:

[Screenshot: error message shown after the upload]

What should happen

The file should be uploaded successfully and the corresponding link should be shown

Additional information

Server error:
Got error 'PHP message: PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 80355920 bytes) in Unknown on line 0'

Basic information

I prefer not to share the URL in public

Centos 7

I'm using Apache 2.4 and PHP 7.4. During the upload I checked the CPU and RAM usage; neither was ever at 100 %, both stayed much lower.
Note also that on the same server I attached the same file in another PHP application (a help desk app) and it was uploaded without errors.

Browser:

Firefox 94.0.1

PrivateBin version:

1.3.5

I can reproduce this issue on https://privatebin.net: Yes / No

No, because attachments are not allowed there.

Another possible issue is that no alert warns the user when the attachment exceeds the maximum size set in the config file.
Before uploading, a check should compare the attachment size against the allowed limit and show an error if it is exceeded.

@r4sas
Member

r4sas commented Nov 11, 2021

@PeopleInside
Author

@r4sas that is already done; as I wrote in my post, I edited the config file, and my server supports big files, since on the same server a help desk app lets me upload this file. I also tried to force the limits with a php.ini file, but the issue persists only in PrivateBin.

@r4sas
Member

r4sas commented Nov 11, 2021

Please give your current values of upload_max_filesize, post_max_size and memory_limit.
I just added a note to the wiki about the need to increase the memory limit, because more RAM is required to process bigger pastes.

@r4sas
Member

r4sas commented Nov 11, 2021

You can also take a look at #601.

add: and, by the way, if you want to send a 59 MB file, you need to set the size limit about 1.3 times higher (due to encryption and base64 encoding), so it must be set to ~77 MB.
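As a quick sanity check of that factor (base64 alone already expands data by roughly 4/3 ≈ 1.33), a one-liner sketch:

    // Rough estimate of the sizelimit needed for a given attachment (sketch).
    $file_mb  = 59;
    $overhead = 1.3;                 // base64 + encryption envelope
    echo ceil($file_mb * $overhead); // 77 (MB)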

@PeopleInside
Author

memory_limit = 512M
max_execution_time = 180 ; Maximum execution time of each script, in seconds
max_input_time = 60 ; Maximum amount of time each script may spend parsing request data
max_input_nesting_level = 64 ; Maximum input variable nesting level
post_max_size = 10000M
upload_max_filesize = 9000M

@r4sas
Member

r4sas commented Nov 11, 2021

Can you check phpinfo()? Maybe your configuration is not being applied, given that the error still reports Allowed memory size of 268435456 bytes exhausted (i.e. 256 MB)?
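A minimal way to verify the effective values without digging through the whole phpinfo() output would be a throw-away script in the web root (a sketch, to be removed after use):

    <?php
    // Print the limits as the web server's PHP actually sees them.
    header('Content-Type: text/plain');
    echo 'memory_limit        = ', ini_get('memory_limit'), "\n";
    echo 'post_max_size       = ', ini_get('post_max_size'), "\n";
    echo 'upload_max_filesize = ', ini_get('upload_max_filesize'), "\n";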

@PeopleInside
Author

[two screenshots attached]

@r4sas
Member

r4sas commented Nov 11, 2021

In that case, let's wait for someone who uses Apache or knows how it handles data of this size.

@PeopleInside
Author

OK, thank you.

The strange thing is that on the same server I run a help desk system; I attached the same file there and it was uploaded without errors, so it seems strange to me that this would be a server configuration issue.

Thank you anyway.
I will keep this open and will monitor it.

Also, maybe you should consider adding a check when a file is uploaded: if the config file sets the maximum upload size to 10 MB and the user tries to upload a 20 MB file, the check should show an error to the user and never start the upload. A short sentence showing the maximum allowed file size in the attachment menu would also be useful (see the sketch below).

[Screenshot: attachment menu]
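A minimal sketch of such an early server-side rejection, based on the declared request size before the body is read (the variable names and the exact response shape are illustrative, not the actual PrivateBin code):

    <?php
    // Reject oversized uploads before buffering the request body (sketch).
    $sizelimit = 10485760; // would come from the instance configuration
    $declared  = isset($_SERVER['CONTENT_LENGTH']) ? (int) $_SERVER['CONTENT_LENGTH'] : 0;
    if ($declared > $sizelimit) {
        http_response_code(413); // Payload Too Large
        header('Content-Type: application/json');
        echo json_encode(array('status' => 1, 'message' => 'Paste is too large.'));
        exit;
    }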

@r4sas
Member

r4sas commented Nov 11, 2021

Yes, I already did that on my instance: https://paste.i2pd.xyz/

@PeopleInside
Author

PeopleInside commented Nov 11, 2021

@r4sas on your instance too I cannot upload a 58 MB file; I get the error.
The limit indicated seems to be 64 MB, so I should not get the error for 59 MB.

I see the same issue on your instance.

@r4sas
Member

r4sas commented Nov 11, 2021

64 MB is the overall size for the file plus encryption overhead, so you can only send about 44 MB (note: the maximum file size is roughly 70% of the overall limit, which is 64 MB here).
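Spelled out (a rough check of the 70% rule above):

    // Approximate usable attachment size for a 64 MB overall limit (sketch).
    $overall_mb = 64;
    echo floor($overall_mb * 0.7); // 44 (MB) left for the file itself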

@PeopleInside
Author

Interesting. The maximum upload size should then be shown as 44 MB; that would be easier to understand :)

@PeopleInside
Author

I also get the issue on your website with 40 MB

[Screenshot: error on paste.i2pd.xyz]

Nice that dark mode activates automatically; will this be included in some new version of PrivateBin?

@rugk
Member

rugk commented Nov 20, 2021

Also, maybe you should consider adding a check when a file is uploaded: if the config file sets the maximum upload size to 10 MB and the user tries to upload a 20 MB file, the check should show an error to the user and never start the upload. A short sentence showing the maximum allowed file size in the attachment menu would also be useful.

Yes, that is a good idea and tracked in #95

Oh and @r4sas if you have a patch for that, feel free to submit a PR.

Nice that dark mode activates automatically; will this be included in some new version of PrivateBin?

Hmm, interesting. @r4sas, if so, could you comment in #433? That is the proper issue for that feature. As for this issue here, it is unfortunately off-topic.

@r4sas
Member

r4sas commented Nov 22, 2021

As far as I can see, uploading a 42 MiB file needs a memory_limit of ~300 MB. That's too much...

Here is the memory usage at some steps inside Controller.php, Request.php and FormatV2.php:

# at the start the file is held in php://input
[22-Nov-2021 02:20:18 UTC] Memory usage before json::decode = 113113936
# decoding the request adds the whole paste size to the used memory
[22-Nov-2021 02:20:19 UTC] Memory usage before preparing operation = 169461512
[22-Nov-2021 02:20:19 UTC] Memory usage before getData = 169469232
[22-Nov-2021 02:20:19 UTC] Memory usage before isValid = 169469608
[22-Nov-2021 02:20:19 UTC] Memory usage before base64_decode = 169470592
# after base64-decoding the ciphertext, usage grows by the file size again
[22-Nov-2021 02:20:19 UTC] Memory usage before gzdeflate = 225815192
# here we get stuck when gzdeflate is called
[22-Nov-2021 02:20:19 UTC] PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 42891232 bytes) in /home/www/paste.i2pd.xyz/public_html/lib/FormatV2.php on line 115

I don't know if we can somehow free php://input after decoding the received data.
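For anyone who wants to reproduce these numbers, the log above can be produced with instrumentation along these lines (the label strings are illustrative):

    // Log current memory usage at interesting points (sketch).
    error_log('Memory usage before json::decode = ' . memory_get_usage());
    // ... existing decode / validation / gzdeflate calls ...
    error_log('Memory usage before gzdeflate = ' . memory_get_usage());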

@rugk
Member

rugk commented Nov 22, 2021

As far as I can see, uploading a 42 MiB file needs a memory_limit of ~300 MB. That's too much...

So thanks for your investigation, but the question indeed is: What causes this huge memory footprint and what can we do to make it better?

Uff, yeah, I agree.
Actually this could be security relevant, i.e. availability relevant, as you might be able to turn this into a DoS attack by exhausting the memory of the server. Though, as we see, PHP already limits that. 🙃

Of course the best solution would be some streaming implementation that never loads the whole file into memory at once, which currently seems to happen, doesn't it?
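A rough sketch of what the receiving side of such a streaming approach could look like, copying the request body to a temporary file in small chunks instead of buffering it whole (this is not how PrivateBin currently works; names and sizes are illustrative):

    <?php
    // Stream php://input to a temp file in 64 KiB chunks (sketch).
    $in  = fopen('php://input', 'rb');
    $tmp = tempnam(sys_get_temp_dir(), 'paste');
    $out = fopen($tmp, 'wb');
    while (!feof($in)) {
        fwrite($out, fread($in, 65536)); // only 64 KiB resident at a time
    }
    fclose($in);
    fclose($out);
    // further validation would then also have to operate on the file, not on a string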

@PeopleInside
Author

Are we sure the file is not uploaded twice?
I read about some bug that loads files or pastes twice.

@rugk
Member

rugk commented Nov 22, 2021

I read about some bug that loads files or pastes twice.

Hmm? Are you able to find that again?

Generally, I'd say no, it is not, because you would likely see that in the resulting paste: either two attached files or one file with double the size.

@PeopleInside
Author

Generally, I'd say no, it is not, because you would likely see that in the resulting paste: either two attached files or one file with double the size.

That never happens, but on my install I configured the server to ask for a password when a user pastes something.
My instance is read-only: when a user tries to paste a text string or a file, they are asked to log in; this is a server (HTTP auth) configuration.

What I see is that if I am not authenticated and I upload a large file, say 50 MB, the system starts the upload, then after it is uploaded asks me for credentials, then uploads it again. At the end the memory is exhausted.

Maybe it uploads, then asks for the password, then starts to process?
It looks strange that 50 MB can become 300 MB or more in server memory, or can it?
I don't know; in any case, file attachments seem to be limited to small files.

[screenshot attached]

@rugk
Member

rugk commented Nov 22, 2021

What I see is that if I am not authenticated and I upload a large file, say 50 MB, the system starts the upload, then after it is uploaded asks me for credentials, then uploads it again. At the end the memory is exhausted.

Could you have a look at the dev tools, specifically the network tab?
AFAIK, what happens is roughly:

  1. First, everything is compressed and encrypted locally. This takes some time, and AFAIK the status message at the top may be a different one at this stage; I'm not sure off the top of my head.
  2. Everything is uploaded; AFAIK nothing is chunked (also potentially not a good thing), or is chunking done, @elrido?
  3. Then the request is processed, and the server also does some non-trivial work on it, such as attempting to compress it (zlib) to check whether it is actually encrypted.

All of these steps take time, and that's why you may have that impression. AFAIK the password should be asked for before step 2; or actually, how does the browser know it needs to ask for a password? Likely because of a webserver reply. So maybe it is indeed sending it twice, but only because there is no other way?
I'd need to try that out or read up again on how exactly the default HTTP auth method works and when it does what… 🙃

@r4sas
Member

r4sas commented Nov 22, 2021

Of course the best solution would be some streaming implementation that never loads the whole file into memory at once, which currently seems to happen, doesn't it?

Sure it is, but the current checks perform operations every time that require extra memory, as in the usage records I posted above. So how to do this in a streamed way, I don't know.

Again, where memory is allocated:

  1. the php://input stream
  2. when the raw data is fetched from the php://input stream (this copy is cleaned up again, because it is piped directly into the JSON decoder, see below)
  3. when the fetched data is decoded as JSON and its output is saved to a variable:

     PrivateBin/lib/Request.php, lines 113 to 114 in e36a94c

         $this->_params = Json::decode(
             file_get_contents(self::$_inputStream)
         );

  4. when the data is passed to the validator:

         if (!FormatV2::isValid($data, $isComment)) {
             $this->_return_message(1, I18n::_('Invalid data.'));
             return;
         }

  5. when the ciphertext (ct) is decoded from base64:

         if (!($ct = base64_decode($message['ct'], true))) {
             return false;
         }

  6. when the decoded data is deflated:

     PrivateBin/lib/FormatV2.php, lines 117 to 119 in e36a94c

         if (strlen($ct) > strlen(gzdeflate($ct))) {
             return false;
         }
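One obvious (if only partial) mitigation for steps 2 and 3 would be to drop the raw copy as soon as the decoded structure exists; a sketch, with names simplified compared to the actual PrivateBin code:

    // Free the raw request body once it has been decoded (sketch).
    $raw    = file_get_contents('php://input');
    $params = json_decode($raw, true);
    unset($raw); // releases one full-size copy before base64_decode and gzdeflate run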

Are we sure the file is not uploaded twice?

I have the PHP upload limit set to 64 MB, so that is not possible. I think we already have double memory usage before Json::decode, because the data gets copied somewhere (maybe inside PHP itself) before Request.php is even called.

@elrido
Contributor

elrido commented Nov 25, 2021

Thank you for the analysis, r4sas; that should also clarify that the API currently doesn't chunk.

PHP does indeed duplicate memory when passing variables between functions. This could be improved by passing references instead. References are not pointers, as the doc explains, so you have to be careful to only change the passed reference in one place or not at all, or it will still duplicate the content. We'd have to test memory consumption before and after, when trying to change that.
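A tiny illustration of the reference-based pattern described above (the function and variable names are purely illustrative, not the actual PrivateBin code):

    // Overwrite the large string through a reference instead of returning a new copy (sketch).
    function decodeCiphertextInPlace(&$ct)
    {
        $ct = base64_decode($ct, true); // the old encoded copy is released on assignment
    }

    $message['ct'] = base64_encode(str_repeat('A', 1024)); // stand-in for a large payload
    decodeCiphertextInPlace($message['ct']);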

Introducing chunking is more complex. I mainly avoided it when changing the API in the past, because the main purpose of this project is to share snippets of code, text, small images or simple documents. KiB of data, not MiB. As such, they can easily be contained in a single HTTP request. And we can also let the client do all the heavy lifting of creating the paste in the right format and only validate it is correct on the server, but don't need to change it, before writing to disk (for the database we do take them apart, though).

In order to chunk the message, we have to solve the problem of ordering & completeness and how to deal with the JSON. We would need to change the API to store chunked data sent in individual JSON encoded messages. The client passes an initial request with a JSON structure containing i.e. the meta data and an announcement of how many chunks to expect or how large the payload is going to be (if we go with dynamic chunk size). All subsequent chunks are JSON with a serial number and the partial message, so the server can reconstruct the message in the correct order (TCP sequencing doesn't cover this, as we are doing independent HTTP calls in this case and the webserver and proxies in between might change the order, process these in parallel, etc.). Finally the server collects and validates all the chunks (on disk, in a temporary directory) and reconstructs the message when all have arrived and are valid.
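To make that concrete, a purely hypothetical sketch of the two message types just described (this is not part of the current PrivateBin API; all field names are invented for illustration):

    // Hypothetical chunked-upload messages (illustrative only).
    $announcement = array(
        'v'      => 2,
        'adata'  => array(/* paste meta data as today */),
        'chunks' => 12,                  // number of parts the client will send
    );
    $chunk = array(
        'token' => 'upload-session-id',  // ties the part to the announcement
        'seq'   => 3,                    // position, so the server can reassemble in order
        'part'  => '...base64 fragment of the ciphertext...',
    );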

All of this sounds very error prone and complex to me, so I'd just recommend either accepting the limitations of this tool or accepting that you need ludicrous amounts of memory (both server- and client-side) to transfer large files. I'd personally recommend setting up dedicated file management software like Nextcloud for transmitting large files. It also comes with user management and quotas, and still lets you create anonymous links to share data.

@rugk
Member

rugk commented Aug 24, 2023

We have reports of PrivateBin working with files of up to 100 MB… 😲 https://github.com/orgs/PrivateBin/discussions/1153
