Uploads Stopping for Projects with Large Files #4572
Comments
What happens when you do "Tools / Retry pending transfers"?
I have one computer (Rig-18) waiting for ARP to checkpoint before a reboot. The machines are headless, and often when I remote into them BOINCmgr says "Disconnected"; the only way I know to get back to "Connected" is to restart the client or reboot. From BoincTasks 1.85 it does nothing. This example is unusual in that no ARP1 WUs are pending upload, just OPN1 and OPNG, but 3 ARP WUs are running.
client: fix overly aggressive project-wide file transfer backoff policy. #4575
It sure would be nice if someone who appreciates the physics of file transfer would take an interest in this issue. E.g., completed ARP WUs are returned as 7 files with multiplexing. I have no idea why they're in 7 files instead of just one. It seems that one file without multiplexing would be 14 times more efficient. When transferring files there's some handshaking between the client and server, and instead of doing that once it's done 2 x 7 = 14 times, assuming multiplexing only divides the transfer between two destination servers. If multiplexed, the servers then have to recombine the files, with additional transactions wasting time, energy, and bandwidth. If those 7 files need to be kept separate, couldn't they be zipped together on the client and transferred as a single file?
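A rough way to put numbers on the 2 x 7 = 14 handshake count above. This is only a sketch: the result size, upload bandwidth, and per-exchange cost below are made-up illustrative values, not measurements from ARP or from the BOINC client, and `estimate_upload_seconds` is a hypothetical helper.

```python
def estimate_upload_seconds(total_bytes, n_files, exchanges_per_file,
                            bandwidth_bps, seconds_per_exchange):
    """Estimate wall time to upload one result split into n_files.

    Models the time as payload time plus a fixed cost per client/server
    exchange. All inputs are assumptions supplied by the caller.
    """
    payload_s = total_bytes * 8 / bandwidth_bps
    exchanges = n_files * exchanges_per_file
    return payload_s + exchanges * seconds_per_exchange


# Hypothetical values: 100 MB result, 20 Mbit/s uplink, 0.5 s per exchange.
TOTAL_BYTES = 100_000_000
BANDWIDTH_BPS = 20_000_000
SECONDS_PER_EXCHANGE = 0.5

# 7 files, each split across two destination servers: 2 x 7 = 14 exchanges.
seven_files = estimate_upload_seconds(
    TOTAL_BYTES, 7, 2, BANDWIDTH_BPS, SECONDS_PER_EXCHANGE)

# One file, no multiplexing: a single exchange.
one_file = estimate_upload_seconds(
    TOTAL_BYTES, 1, 1, BANDWIDTH_BPS, SECONDS_PER_EXCHANGE)

if __name__ == "__main__":
    print(f"7 files, multiplexed: {seven_files:.1f} s")
    print(f"1 file, no multiplexing: {one_file:.1f} s")
```

Under this model the number of exchanges drops by a factor of 14, but how much the total upload time changes depends on how large the per-exchange cost is relative to the payload time.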
@Aurum420, it was decided by the Project to have separate files. The BOINC client can't zip these files because they would then be rejected by the Project. BOINC acts as instructed by the Project and takes no additional actions.
It has been suggested that I batch completed WUs and try to send too many at once. I do not batch and have even doubled my ISP speed. |
The Problem is that projects that upload large files tend to trigger errors such as "transient http error" that halt uploading. When this happens, the only way to restart uploads for that computer is to reboot or restart the BOINC client. A consequence of this halt is that once a certain number of work units are in the upload queue, it triggers "Not requesting tasks: too many uploads in progress" and downloads halt. After all work units complete, the computer sits idle.
To reproduce, use a 12c/24t or greater CPU to run 16 or more ARP1 work units to completion.
BOINC 7.6.16, Linux Mint 20.2, x86_64-pc-linux-gnu
The Goal is to allow any user with a bank of computers operating from a single IP address to run as many work units of any size for any BOINC project.
My goal is to turn in over 2,000 ARP work units per day which is 10% of that project's current daily progress. It doesn't seem in the spirit of BOINC to let a single simulation project run for over a year.
Anecdotal reports are that this does not happen when running a small number of large work units. My experience with ARP work units is that trying to run 16 ARP work units per 18c/36t computer is 100% guaranteed to fail and running 8 fails too often to endure. I'm currently trying 4 and 3 ARP work units but have already seen an upload seizure. Since ARP might not checkpoint for up to nine hours it wastes a lot of time either by dumping work or waiting for all work units to checkpoint.
It's been suggested that it's caused by too many large files uploading at once. The use of <max_file_xfers_per_project> to restrict the number of concurrent uploads has offered no benefit in keeping uploads from seizing up.
It's been suggested that having too many computers running on the same IP address is part of the problem. I thought the objective of BOINC was to get as much work done as fast as possible.
It's been suggested that perhaps <max_nbytes> is being specified too low. The following is an example of a seized WU where ARP returns 7 files. Note that <max_nbytes> takes on 3 different values:
Activating the http_debug flag produces a complicated output since the problem isn't triggered unless multiple files are affected.
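For anyone reproducing this on several machines, the two client settings mentioned in this report, `<max_file_xfers_per_project>` and the `http_debug` log flag, both live in the client's `cc_config.xml`. The following is a minimal sketch of generating that file from a script; the helper name `build_cc_config` and the chosen limit of 2 are illustrative assumptions, not a recommended setting.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers_per_project: int, http_debug: bool) -> str:
    """Return cc_config.xml text with one option and one log flag set.

    build_cc_config is a hypothetical helper for this sketch; only the
    element names come from the BOINC client configuration.
    """
    root = ET.Element("cc_config")

    # Log flags control which debug messages the client writes.
    log_flags = ET.SubElement(root, "log_flags")
    ET.SubElement(log_flags, "http_debug").text = "1" if http_debug else "0"

    # Options control client behavior such as concurrent transfers.
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_xfers_per_project)

    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only: 2 concurrent transfers, http_debug enabled.
    print(build_cc_config(2, True))
```

The output would be saved as `cc_config.xml` in the BOINC data directory, whose location varies by install. The client picks up changes after "Options / Read config files" in the Manager, `boinccmd --read_cc_config`, or a client restart.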