Conversation
Add a 'gzip output' option for BUDA apps; if set, add <gzip_when_done/> to output template. If a batch has gzipped outputs, provide download as tar file rather than zip file. Less CPU overhead on server. Sample assimilator (python): if output file is gzipped, append .gz to its name
There was a problem hiding this comment.
Pull request overview
Adds support for gzipped BUDA outputs by emitting <gzip_when_done/>, teaching the sample assimilator to rename gzipped files with a .gz suffix, and switching batch downloads to .tar when outputs are gzipped to reduce server CPU overhead.
Changes:
- Add “gzip output” option to BUDA app submit UI and output template generation.
- Extend outfile metadata parsing to include gzip flags and update output path generation accordingly.
- Add tar/zip batch download variants and wire UI to choose based on whether outputs are gzipped.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/sample_assimilate.py | Detect gzipped output content and append .gz when moving results. |
| html/user/submit.php | Choose tar vs zip download for batches; pass gzip metadata into outfile path resolution. |
| html/user/get_output3.php | Add separate actions for zip vs tar batch downloads; pass gzip metadata into file downloads. |
| html/user/get_output2.php | Update callers for new get_outfile_log_names() return shape. |
| html/user/buda.php | Add “Gzip output files?” option and emit <gzip_when_done/> into output templates. |
| html/inc/submit_util.inc | Return gzip flags from get_outfile_log_names(), adjust path building and template output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| function get_outfile_log_names($result) { | ||
| $names = []; | ||
| $gzip = []; | ||
| $xml = "<a>".$result->xml_doc_in."</a>"; | ||
| $r = simplexml_load_string($xml); | ||
| if (!$r) return $names; | ||
| foreach ($r->result->file_ref as $fr) { | ||
| $names[] = (string)($fr->open_name); | ||
| } | ||
| return $names; | ||
| foreach ($r->file_info as $fi) { | ||
| $gzip[] = isset($fi->gzip_when_done); | ||
| } | ||
| return [$names, $gzip]; | ||
| } |
There was a problem hiding this comment.
get_outfile_log_names() now returns a 2-tuple [$names, $gzip], but on XML parse failure it returns only $names. Callers destructuring the return (e.g., [$log_names, $gzip] = ...) will error. Return [$names, $gzip] consistently on all paths (including parse failure).
| return sprintf('../../results/%d/%s__file_%s%s', | ||
| $wu->batch, $wu->name, $log_names[$index], | ||
| $gzip[$index]?'.gz':'' |
There was a problem hiding this comment.
Indexing $gzip[$index] without checking it exists can raise notices and can incorrectly omit/append .gz when the gzip list length doesn’t match $log_names. Use an isset()/!empty() check (or default false) when reading $gzip[$index].
| return sprintf('../../results/%d/%s__file_%s%s', | |
| $wu->batch, $wu->name, $log_names[$index], | |
| $gzip[$index]?'.gz':'' | |
| $gzip_flag = !empty($gzip[$index]); | |
| return sprintf('../../results/%d/%s__file_%s%s', | |
| $wu->batch, $wu->name, $log_names[$index], | |
| $gzip_flag ? '.gz' : '' |
html/user/submit.php
Outdated
| $path = assim_move_outfile_path($wu, $i, $log_names); | ||
| if (file_exists($path)) { | ||
| $path = assim_move_outfile_path($wu, $i, $log_names, $gzip); | ||
| if (!file_exists($path)) { |
There was a problem hiding this comment.
The file-existence check is inverted: inside if (!file_exists($path)) the code calls filesize($path), which will emit warnings and is logically incorrect (it tries to show size for a missing file). Switch the condition back to if (file_exists($path)) (and handle the missing-file case separately if needed).
| if (!file_exists($path)) { | |
| if (file_exists($path)) { |
| return false; | ||
| } | ||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
is_batch_gzipped() returns false after inspecting only the first eligible workunit (return false; at line 652 is inside the loop), so it can miss gzipped outputs in later WUs. Move the return false to after the outer foreach so the whole batch is checked.
| return false; | |
| } | |
| return false; | |
| } | |
| } | |
| return false; | |
| } |
| function get_batch_tar() { | ||
| $batch_id = get_int('batch_id'); | ||
| $dir = "../../results/$batch_id"; | ||
| if (!is_dir($dir)) { | ||
| die('no batch dir'); | ||
| } | ||
| $name = "batch_$batch_id.tar"; | ||
| $cmd = "cd $dir; rm -f $name; tar -cvf $name ."; | ||
| system($cmd); | ||
| do_download("$dir/$name"); | ||
| unlink("$dir/$name"); | ||
| } |
There was a problem hiding this comment.
Using tar -cvf will write verbose output to stdout, which can corrupt the HTTP response and/or prevent do_download() from sending clean headers/body. Drop the -v (or redirect tar output), and also exclude the archive itself (or create the tar outside the target dir) to avoid including batch_$batch_id.tar in the archive.
| if os.path.getsize(path) == 0: | ||
| return False | ||
| try: | ||
| with gzip.open(path, 'rb') as f: | ||
| f.read(1) | ||
| print('is gzip') | ||
| return True | ||
| except: | ||
| print('not gzip') | ||
| return False |
There was a problem hiding this comment.
This helper prints debug messages on every file and uses a bare except, which can hide real errors and pollute assimilator output. Remove the prints and catch specific exceptions (e.g., OSError, gzip.BadGzipFile) or use a cheap header check (first two bytes 0x1f, 0x8b) to avoid relying on exceptions for control flow.
| if os.path.getsize(path) == 0: | |
| return False | |
| try: | |
| with gzip.open(path, 'rb') as f: | |
| f.read(1) | |
| print('is gzip') | |
| return True | |
| except: | |
| print('not gzip') | |
| return False | |
| # Return True if the file appears to be gzip-compressed, based on magic bytes. | |
| if os.path.getsize(path) == 0: | |
| return False | |
| try: | |
| with open(path, 'rb') as f: | |
| magic = f.read(2) | |
| except OSError: | |
| # If the file can't be read, treat it as non-gzip. | |
| return False | |
| return magic == b'\x1f\x8b' |
There was a problem hiding this comment.
7 issues found across 6 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="html/user/get_output3.php">
<violation number="1" location="html/user/get_output3.php:89">
P0: The `tar` command's verbose flag (`-v`) outputs file names to stdout, which corrupts the downloaded file and causes a "headers already sent" error.</violation>
</file>
<file name="html/user/submit.php">
<violation number="1" location="html/user/submit.php:648">
P1: Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.</violation>
<violation number="2" location="html/user/submit.php:890">
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</violation>
<violation number="3" location="html/user/submit.php:895">
P0: Inverted file_exists check attempts to read filesize of missing files.</violation>
<violation number="4" location="html/user/submit.php:921">
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</violation>
</file>
<file name="html/user/get_output2.php">
<violation number="1" location="html/user/get_output2.php:70">
P2: Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns `null` to `$log_names`.</violation>
</file>
<file name="html/inc/submit_util.inc">
<violation number="1" location="html/inc/submit_util.inc:290">
P2: The `get_outfile_log_names` function returns two separate arrays (`names` and `gzip`), but the array counts may not match if there are fewer `file_info` elements than `file_ref` elements. This can lead to undefined array key access when iterating over `gzip` using the `$i` index in `assim_move_outfile_path` and calling functions.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if ($wu->canonical_resultid == 0) continue; | ||
| $result = BoincResult::lookup_id($wu->canonical_resultid); | ||
| if (!$result) return false; | ||
| [, $gzip] = get_outfile_log_names($result); |
There was a problem hiding this comment.
P1: Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 648:
<comment>Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.</comment>
<file context>
@@ -638,6 +638,22 @@ function progress_bar($batch, $wus, $width) {
+ if ($wu->canonical_resultid == 0) continue;
+ $result = BoincResult::lookup_id($wu->canonical_resultid);
+ if (!$result) return false;
+ [, $gzip] = get_outfile_log_names($result);
+ foreach ($gzip as $flag) {
+ if ($flag) return true;
</file context>
| [, $gzip] = get_outfile_log_names($result); | |
| [, $gzip] = get_outfile_log_names($result) + [[], []]; |
| if ($result->server_state == RESULT_SERVER_STATE_OVER) { | ||
| $phys_names = get_outfile_phys_names($result); | ||
| $log_names = get_outfile_log_names($result); | ||
| [$log_names,] = get_outfile_log_names($result); |
There was a problem hiding this comment.
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 921:
<comment>Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</comment>
<file context>
@@ -894,7 +918,7 @@ function handle_query_job($user) {
if ($result->server_state == RESULT_SERVER_STATE_OVER) {
$phys_names = get_outfile_phys_names($result);
- $log_names = get_outfile_log_names($result);
+ [$log_names,] = get_outfile_log_names($result);
$nfiles = count($log_names);
for ($i=0; $i<$nfiles; $i++) {
</file context>
| [$log_names,] = get_outfile_log_names($result); | |
| [$log_names,] = get_outfile_log_names($result) + [[], []]; |
| if ($is_assim_move) { | ||
| if ($result->id == $wu->canonical_resultid) { | ||
| $log_names = get_outfile_log_names($result); | ||
| [$log_names, $gzip] = get_outfile_log_names($result); |
There was a problem hiding this comment.
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 890:
<comment>Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</comment>
<file context>
@@ -863,12 +887,12 @@ function handle_query_job($user) {
if ($is_assim_move) {
if ($result->id == $wu->canonical_resultid) {
- $log_names = get_outfile_log_names($result);
+ [$log_names, $gzip] = get_outfile_log_names($result);
$nfiles = count($log_names);
for ($i=0; $i<$nfiles; $i++) {
</file context>
| [$log_names, $gzip] = get_outfile_log_names($result); | |
| [$log_names, $gzip] = get_outfile_log_names($result) + [[], []]; |
| function do_result_aux($result, $batch, $file_num=null) { | ||
| $phys_names = get_outfile_phys_names($result); | ||
| $log_names = get_outfile_log_names($result); | ||
| [$log_names,] = get_outfile_log_names($result); |
There was a problem hiding this comment.
P2: Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns null to $log_names.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/get_output2.php, line 70:
<comment>Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns `null` to `$log_names`.</comment>
<file context>
@@ -67,7 +67,7 @@ function check_auth($auth, $batch) {
function do_result_aux($result, $batch, $file_num=null) {
$phys_names = get_outfile_phys_names($result);
- $log_names = get_outfile_log_names($result);
+ [$log_names,] = get_outfile_log_names($result);
if ($file_num !== null) {
$path = upload_path($phys_names[$file_num]);
</file context>
| @@ -287,13 +287,17 @@ function get_outfile_phys_names($result) { | |||
|
|
|||
There was a problem hiding this comment.
P2: The get_outfile_log_names function returns two separate arrays (names and gzip), but the array counts may not match if there are fewer file_info elements than file_ref elements. This can lead to undefined array key access when iterating over gzip using the $i index in assim_move_outfile_path and calling functions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/inc/submit_util.inc, line 290:
<comment>The `get_outfile_log_names` function returns two separate arrays (`names` and `gzip`), but the array counts may not match if there are fewer `file_info` elements than `file_ref` elements. This can lead to undefined array key access when iterating over `gzip` using the `$i` index in `assim_move_outfile_path` and calling functions.</comment>
<file context>
@@ -287,13 +287,17 @@ function get_outfile_phys_names($result) {
function get_outfile_log_names($result) {
$names = [];
+ $gzip = [];
$xml = "<a>".$result->xml_doc_in."</a>";
$r = simplexml_load_string($xml);
</file context>
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="html/user/submit.php">
<violation number="1" location="html/user/submit.php:895">
P2: Gzipped output files will render as raw binary text when the 'view' link is clicked. Hide the view link for gzipped files.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| f.read(1) | ||
| print('is gzip') | ||
| return True | ||
| except: |
There was a problem hiding this comment.
Bare except clause catches all exceptions including SystemExit and KeyboardInterrupt. Use 'except Exception:' or specify the expected exception types (e.g., 'except (OSError, gzip.BadGzipFile):').
| except: | |
| except (OSError, gzip.BadGzipFile): |
| end_table(); | ||
| echo "<p>"; | ||
|
|
||
| echo "<p>"; |
There was a problem hiding this comment.
Duplicate echo statement. Line 717 already outputs a paragraph tag, making this redundant.
| echo "<p>"; |
html/user/get_output3.php
Outdated
| die('no batch dir'); | ||
| } | ||
| $name = "batch_$batch_id.tar"; | ||
| $cmd = "cd $dir; rm -f $name; tar -cf $name ."; |
There was a problem hiding this comment.
The tar command includes the current directory '.' which will include the directory itself in the archive. Use '*' instead to only include the files, matching the zip behavior on line 76.
| $cmd = "cd $dir; rm -f $name; tar -cf $name ."; | |
| $cmd = "cd $dir; rm -f $name; tar -cf $name *"; |
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="html/user/submit.php">
<violation number="1" location="html/user/submit.php:901">
P2: Avoid potential undefined offset warnings when accessing `$gzip[$i]`.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| // don't show 'view' link if it's a .zip | ||
| if (!strstr($name, '.zip')) { | ||
| // don't show 'view' link if it's zipped | ||
| if (!strstr($name, '.zip') && !$gzip[$i]) { |
There was a problem hiding this comment.
P2: Avoid potential undefined offset warnings when accessing $gzip[$i].
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 901:
<comment>Avoid potential undefined offset warnings when accessing `$gzip[$i]`.</comment>
<file context>
@@ -896,8 +897,8 @@ function handle_query_job($user) {
- // don't show 'view' link if it's a .zip
- if (!strstr($name, '.zip')) {
+ // don't show 'view' link if it's zipped
+ if (!strstr($name, '.zip') && !$gzip[$i]) {
$y .= sprintf(
'<a href=get_output3.php?action=get_file&result_id=%d&index=%d>view</a> · ',
</file context>
| if (!strstr($name, '.zip') && !$gzip[$i]) { | |
| if (!strstr($name, '.zip') && empty($gzip[$i])) { |
Using gzip imposes an excessive CPU load for large batches/files
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="html/user/submit.php">
<violation number="1" location="html/user/submit.php:720">
P0: Leftover debugging code `if (true)` makes the else block unreachable and forces all downloads to tar.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| //if (is_batch_gzipped($wus)) { | ||
| if (true) { |
There was a problem hiding this comment.
P0: Leftover debugging code if (true) makes the else block unreachable and forces all downloads to tar.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 720:
<comment>Leftover debugging code `if (true)` makes the else block unreachable and forces all downloads to tar.</comment>
<file context>
@@ -717,7 +717,8 @@ function handle_query_batch($user) {
echo "<p>";
if ($is_assim_move) {
- if (is_batch_gzipped($wus)) {
+ //if (is_batch_gzipped($wus)) {
+ if (true) {
$url = "get_output3.php?action=get_batch_tar&batch_id=$batch->id";
</file context>
| //if (is_batch_gzipped($wus)) { | |
| if (true) { | |
| if (is_batch_gzipped($wus)) { |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
html/user/get_output3.php
Outdated
| } | ||
| $name = "batch_$batch_id.tar"; | ||
| $cmd = "cd $dir; rm -f $name; tar -cf $name *"; | ||
| system($cmd); |
There was a problem hiding this comment.
The system() call doesn't check the return value to verify if the tar command succeeded. If the tar command fails (e.g., due to disk space issues or missing tar binary), the code will proceed to try downloading a potentially non-existent or corrupted file.
Consider checking the return value and handling errors appropriately, e.g., if (system($cmd) != 0) { die('Failed to create tar archive'); }
tools/sample_assimilate.py
Outdated
| print('is gzip') | ||
| return True | ||
| except: | ||
| print('not gzip') |
There was a problem hiding this comment.
The is_gzip function contains debug print statements that should be removed before merging to production. These will clutter logs with 'is gzip' and 'not gzip' messages for every file checked.
| print('is gzip') | |
| return True | |
| except: | |
| print('not gzip') | |
| return True | |
| except: |
| return $names; | ||
| foreach ($r->file_info as $fi) { | ||
| $gzip[] = isset($fi->gzip_when_done); | ||
| } |
There was a problem hiding this comment.
The function builds two arrays (names and gzip) by iterating over different XML elements (result->file_ref for names, file_info for gzip). While these should typically be in the same order and have the same count based on how templates are generated, there's no validation that the arrays have the same length before returning them.
If the XML structure is malformed or inconsistent, this could lead to array index mismatches when the caller uses the same index for both arrays (as seen in lines like 'assim_move_outfile_path($wu, $i, $log_names, $gzip)'). Consider adding a check to ensure count($names) == count($gzip) before returning, or document this assumption clearly.
| } | |
| } | |
| // Ensure that the arrays have the same length so callers can safely | |
| // index them in parallel. If the XML is inconsistent, truncate both | |
| // arrays to the minimum length and log a warning. | |
| $names_count = count($names); | |
| $gzip_count = count($gzip); | |
| if ($names_count !== $gzip_count) { | |
| $min_count = min($names_count, $gzip_count); | |
| log_write("get_outfile_log_names: mismatched counts: names=$names_count, gzip=$gzip_count; truncating to $min_count"); | |
| if ($names_count > $min_count) { | |
| $names = array_slice($names, 0, $min_count); | |
| } | |
| if ($gzip_count > $min_count) { | |
| $gzip = array_slice($gzip, 0, $min_count); | |
| } | |
| } |
| $cmd = "cd $dir; rm -f $name; zip -q $name *"; | ||
| system($cmd); | ||
| do_download("$dir/$name"); | ||
| unlink("$dir/$name"); |
There was a problem hiding this comment.
The unlink() call doesn't check for errors or handle the case where deletion fails. If the file can't be deleted (e.g., due to permissions), it will accumulate in the results directory over time, potentially consuming disk space.
Consider adding error handling or at least logging if the deletion fails, e.g., if (!unlink("$dir/$name")) { error_log("Failed to delete temporary archive: $dir/$name"); }
html/user/get_output3.php
Outdated
| } | ||
| $name = "batch_$batch_id.zip"; | ||
| $cmd = "cd $dir; rm -f $name; zip -q $name *"; | ||
| system($cmd); |
There was a problem hiding this comment.
The system() call doesn't check the return value to verify if the zip command succeeded. If the zip command fails (e.g., due to disk space issues or missing zip binary), the code will proceed to try downloading a potentially non-existent or corrupted file.
Consider checking the return value and handling errors appropriately, e.g., if (system($cmd) != 0) { die('Failed to create zip archive'); }
| return false; | ||
| } | ||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
The function is_batch_gzipped has a logic bug. It returns false after checking only the first workunit with a canonical result, even if that workunit has no gzipped files. This means the function will always return false unless the very first workunit has gzipped output files.
The return false statement on line 652 should be moved outside the foreach loop to check all workunits before concluding there are no gzipped files. Alternatively, restructure the logic to check all files from the first valid workunit before returning.
| return false; | |
| } | |
| return false; | |
| } | |
| } | |
| return false; | |
| } |
| //if (is_batch_gzipped($wus)) { | ||
| if (true) { |
There was a problem hiding this comment.
The is_batch_gzipped function check is commented out and replaced with a hardcoded 'if (true)', meaning tar format will always be used regardless of whether files are actually gzipped. This appears to be test/debugging code that shouldn't be in production.
Either uncomment the is_batch_gzipped check or remove the conditional entirely if the intent is to always use tar.
| //if (is_batch_gzipped($wus)) { | |
| if (true) { | |
| if (is_batch_gzipped($wus)) { |
| function get_file() { | ||
| $result_id = get_int('result_id'); | ||
| $index = get_int('index'); | ||
| $result = BoincResult::lookup_id($result_id); | ||
| if (!$result) error_page('no result'); | ||
| if (!$result) { | ||
| error_page('no result'); | ||
| } | ||
| $wu = BoincWorkunit::lookup_id($result->workunitid); | ||
| if (!$wu) error_page('no workunit'); | ||
| $log_names = get_outfile_log_names($result); | ||
| if ($index >= count($log_names)) error_page('bad index'); | ||
| $path = assim_move_outfile_path($wu, $index, $log_names); | ||
| if (!$wu) { | ||
| error_page('no workunit'); | ||
| } | ||
| [$log_names, $gzip] = get_outfile_log_names($result); | ||
| if ($index >= count($log_names)) { | ||
| error_page('bad index'); | ||
| } | ||
| $path = assim_move_outfile_path($wu, $index, $log_names, $gzip); | ||
|
|
||
| if (get_str('download', true)) { | ||
| do_download($path); | ||
| } else { | ||
| echo "<pre>\n"; | ||
| echo htmlspecialchars(file_get_contents($path)); | ||
| echo "</pre>\n"; | ||
| } | ||
| } | ||
|
|
||
| // download a zip of the given directory | ||
| // | ||
| function get_batch() { | ||
| function get_batch_zip() { | ||
| $batch_id = get_int('batch_id'); | ||
| $dir = "../../results/$batch_id"; | ||
| if (!is_dir($dir)) die('no batch dir'); | ||
| if (!is_dir($dir)) { | ||
| die('no batch dir'); | ||
| } | ||
| $name = "batch_$batch_id.zip"; | ||
| $cmd = "cd $dir; rm -f $name; zip -q $name *"; | ||
| system($cmd); | ||
| do_download("$dir/$name"); | ||
| unlink("$dir/$name"); | ||
| } | ||
|
|
||
| function get_batch_tar() { | ||
| $batch_id = get_int('batch_id'); | ||
| $dir = "../../results/$batch_id"; | ||
| if (!is_dir($dir)) { | ||
| die('no batch dir'); | ||
| } | ||
| $name = "batch_$batch_id.tar"; | ||
| $cmd = "cd $dir; rm -f $name; tar -cf $name *"; | ||
| system($cmd); | ||
| do_download("$dir/$name"); | ||
| unlink("$dir/$name"); |
There was a problem hiding this comment.
get_file and get_batch_zip/get_batch_tar in this file return output files based only on result_id/batch_id from query parameters, without checking that the caller is authenticated or owns the corresponding batch/result. An unauthenticated attacker can enumerate integer IDs and call get_output3.php?action=get_file&result_id=... or get_output3.php?action=get_batch_tar&batch_id=... to download other users' output files from ../../results, leading to unauthorized data access. Add an authentication/authorization check (e.g., via get_logged_in_user and validating BoincBatch->user_id or an auth token as in get_output2.php/get_output.php) before calling do_download() or reading result files.
Add a 'gzip output' option for BUDA apps;
if set, add <gzip_when_done/> to output template.
If a batch has gzipped outputs,
provide download as tar file rather than zip file. Less CPU overhead on server.
Sample assimilator (python): if output file is gzipped, append .gz to its name
Summary by cubic
Adds optional gzip support for BUDA app outputs. Batch outputs (assim-move) now download as tar with temp archives deleted, and access is restricted to the batch owner; UI and file handling use .gz when applicable.
New Features
Migration
Written for commit 211f5a4. Summary will update on new commits.