Skip to content

BUDA: allow output files to be gzipped.#6876

Merged
AenBleidd merged 6 commits intomasterfrom
dpa_submit56
Feb 21, 2026
Merged

BUDA: allow output files to be gzipped.#6876
AenBleidd merged 6 commits intomasterfrom
dpa_submit56

Conversation

@davidpanderson
Copy link
Contributor

@davidpanderson davidpanderson commented Feb 20, 2026

Add a 'gzip output' option for BUDA apps;
if set, add <gzip_when_done/> to output template.

If a batch has gzipped outputs,
provide download as tar file rather than zip file. Less CPU overhead on server.

Sample assimilator (python): if output file is gzipped, append .gz to its name


Summary by cubic

Adds optional gzip support for BUDA app outputs. Batch outputs (assim-move) now download as tar with temp archives deleted, and access is restricted to the batch owner; UI and file handling use .gz when applicable.

  • New Features

    • App setting “Gzip output files?” adds <gzip_when_done/> via file_info_out.
    • Batch downloads (assim-move): tar archive (get_batch_tar), temp tar removed after download, and ownership is checked before serving files/archives.
    • Assim-move paths include .gz when outputs are gzipped; UI shows a tar button and hides “view” for gzipped files.
    • Sample assimilator detects gzip and appends .gz to output filenames.
  • Migration

    • get_output3.php action get_batch is replaced by get_batch_zip and get_batch_tar (UI uses get_batch_tar).
    • get_outfile_log_names now returns [names, gzip_flags]; update callers if used elsewhere.

Written for commit 211f5a4. Summary will update on new commits.

Add a 'gzip output' option for BUDA apps;
if set, add <gzip_when_done/> to output template.

If a batch has gzipped outputs,
provide download as tar file rather than zip file.
Less CPU overhead on server.

Sample assimilator (python): if output file is gzipped, append .gz to its name
Copilot AI review requested due to automatic review settings February 20, 2026 10:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for gzipped BUDA outputs by emitting <gzip_when_done/>, teaching the sample assimilator to rename gzipped files with a .gz suffix, and switching batch downloads to .tar when outputs are gzipped to reduce server CPU overhead.

Changes:

  • Add “gzip output” option to BUDA app submit UI and output template generation.
  • Extend outfile metadata parsing to include gzip flags and update output path generation accordingly.
  • Add tar/zip batch download variants and wire UI to choose based on whether outputs are gzipped.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tools/sample_assimilate.py Detect gzipped output content and append .gz when moving results.
html/user/submit.php Choose tar vs zip download for batches; pass gzip metadata into outfile path resolution.
html/user/get_output3.php Add separate actions for zip vs tar batch downloads; pass gzip metadata into file downloads.
html/user/get_output2.php Update callers for new get_outfile_log_names() return shape.
html/user/buda.php Add “Gzip output files?” option and emit <gzip_when_done/> into output templates.
html/inc/submit_util.inc Return gzip flags from get_outfile_log_names(), adjust path building and template output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 288 to 301
function get_outfile_log_names($result) {
$names = [];
$gzip = [];
$xml = "<a>".$result->xml_doc_in."</a>";
$r = simplexml_load_string($xml);
if (!$r) return $names;
foreach ($r->result->file_ref as $fr) {
$names[] = (string)($fr->open_name);
}
return $names;
foreach ($r->file_info as $fi) {
$gzip[] = isset($fi->gzip_when_done);
}
return [$names, $gzip];
}
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_outfile_log_names() now returns a 2-tuple [$names, $gzip], but on XML parse failure it returns only $names. Callers destructuring the return (e.g., [$log_names, $gzip] = ...) will error. Return [$names, $gzip] consistently on all paths (including parse failure).

Copilot uses AI. Check for mistakes.
Comment on lines +329 to +331
return sprintf('../../results/%d/%s__file_%s%s',
$wu->batch, $wu->name, $log_names[$index],
$gzip[$index]?'.gz':''
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing $gzip[$index] without checking it exists can raise notices and can incorrectly omit/append .gz when the gzip list length doesn’t match $log_names. Use an isset()/!empty() check (or default false) when reading $gzip[$index].

Suggested change
return sprintf('../../results/%d/%s__file_%s%s',
$wu->batch, $wu->name, $log_names[$index],
$gzip[$index]?'.gz':''
$gzip_flag = !empty($gzip[$index]);
return sprintf('../../results/%d/%s__file_%s%s',
$wu->batch, $wu->name, $log_names[$index],
$gzip_flag ? '.gz' : ''

Copilot uses AI. Check for mistakes.
$path = assim_move_outfile_path($wu, $i, $log_names);
if (file_exists($path)) {
$path = assim_move_outfile_path($wu, $i, $log_names, $gzip);
if (!file_exists($path)) {
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file-existence check is inverted: inside if (!file_exists($path)) the code calls filesize($path), which will emit warnings and is logically incorrect (it tries to show size for a missing file). Switch the condition back to if (file_exists($path)) (and handle the missing-file case separately if needed).

Suggested change
if (!file_exists($path)) {
if (file_exists($path)) {

Copilot uses AI. Check for mistakes.
Comment on lines +652 to +656
return false;
}
return false;
}

Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_batch_gzipped() returns false after inspecting only the first eligible workunit (return false; at line 652 is inside the loop), so it can miss gzipped outputs in later WUs. Move the return false to after the outer foreach so the whole batch is checked.

Suggested change
return false;
}
return false;
}
}
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +82 to 93
function get_batch_tar() {
$batch_id = get_int('batch_id');
$dir = "../../results/$batch_id";
if (!is_dir($dir)) {
die('no batch dir');
}
$name = "batch_$batch_id.tar";
$cmd = "cd $dir; rm -f $name; tar -cvf $name .";
system($cmd);
do_download("$dir/$name");
unlink("$dir/$name");
}
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tar -cvf will write verbose output to stdout, which can corrupt the HTTP response and/or prevent do_download() from sending clean headers/body. Drop the -v (or redirect tar output), and also exclude the archive itself (or create the tar outside the target dir) to avoid including batch_$batch_id.tar in the archive.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +33
if os.path.getsize(path) == 0:
return False
try:
with gzip.open(path, 'rb') as f:
f.read(1)
print('is gzip')
return True
except:
print('not gzip')
return False
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper prints debug messages on every file and uses a bare except, which can hide real errors and pollute assimilator output. Remove the prints and catch specific exceptions (e.g., OSError, gzip.BadGzipFile) or use a cheap header check (first two bytes 0x1f, 0x8b) to avoid relying on exceptions for control flow.

Suggested change
if os.path.getsize(path) == 0:
return False
try:
with gzip.open(path, 'rb') as f:
f.read(1)
print('is gzip')
return True
except:
print('not gzip')
return False
# Return True if the file appears to be gzip-compressed, based on magic bytes.
if os.path.getsize(path) == 0:
return False
try:
with open(path, 'rb') as f:
magic = f.read(2)
except OSError:
# If the file can't be read, treat it as non-gzip.
return False
return magic == b'\x1f\x8b'

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 issues found across 6 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="html/user/get_output3.php">

<violation number="1" location="html/user/get_output3.php:89">
P0: The `tar` command's verbose flag (`-v`) outputs file names to stdout, which corrupts the downloaded file and causes a "headers already sent" error.</violation>
</file>

<file name="html/user/submit.php">

<violation number="1" location="html/user/submit.php:648">
P1: Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.</violation>

<violation number="2" location="html/user/submit.php:890">
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</violation>

<violation number="3" location="html/user/submit.php:895">
P0: Inverted file_exists check attempts to read filesize of missing files.</violation>

<violation number="4" location="html/user/submit.php:921">
P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</violation>
</file>

<file name="html/user/get_output2.php">

<violation number="1" location="html/user/get_output2.php:70">
P2: Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns `null` to `$log_names`.</violation>
</file>

<file name="html/inc/submit_util.inc">

<violation number="1" location="html/inc/submit_util.inc:290">
P2: The `get_outfile_log_names` function returns two separate arrays (`names` and `gzip`), but the array counts may not match if there are fewer `file_info` elements than `file_ref` elements. This can lead to undefined array key access when iterating over `gzip` using the `$i` index in `assim_move_outfile_path` and calling functions.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

if ($wu->canonical_resultid == 0) continue;
$result = BoincResult::lookup_id($wu->canonical_resultid);
if (!$result) return false;
[, $gzip] = get_outfile_log_names($result);
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 648:

<comment>Destructuring an empty array assigns null to $gzip, causing a TypeError in foreach.</comment>

<file context>
@@ -638,6 +638,22 @@ function progress_bar($batch, $wus, $width) {
+        if ($wu->canonical_resultid == 0) continue;
+        $result = BoincResult::lookup_id($wu->canonical_resultid);
+        if (!$result) return false;
+        [, $gzip] = get_outfile_log_names($result);
+        foreach ($gzip as $flag) {
+            if ($flag) return true;
</file context>
Suggested change
[, $gzip] = get_outfile_log_names($result);
[, $gzip] = get_outfile_log_names($result) + [[], []];
Fix with Cubic

if ($result->server_state == RESULT_SERVER_STATE_OVER) {
$phys_names = get_outfile_phys_names($result);
$log_names = get_outfile_log_names($result);
[$log_names,] = get_outfile_log_names($result);
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 921:

<comment>Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</comment>

<file context>
@@ -894,7 +918,7 @@ function handle_query_job($user) {
             if ($result->server_state == RESULT_SERVER_STATE_OVER) {
                 $phys_names = get_outfile_phys_names($result);
-                $log_names = get_outfile_log_names($result);
+                [$log_names,] = get_outfile_log_names($result);
                 $nfiles = count($log_names);
                 for ($i=0; $i<$nfiles; $i++) {
</file context>
Suggested change
[$log_names,] = get_outfile_log_names($result);
[$log_names,] = get_outfile_log_names($result) + [[], []];
Fix with Cubic

if ($is_assim_move) {
if ($result->id == $wu->canonical_resultid) {
$log_names = get_outfile_log_names($result);
[$log_names, $gzip] = get_outfile_log_names($result);
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 890:

<comment>Destructuring an empty array assigns null to $log_names, causing a Fatal TypeError in count().</comment>

<file context>
@@ -863,12 +887,12 @@ function handle_query_job($user) {
         if ($is_assim_move) {
             if ($result->id == $wu->canonical_resultid) {
-                $log_names = get_outfile_log_names($result);
+                [$log_names, $gzip] = get_outfile_log_names($result);
                 $nfiles = count($log_names);
                 for ($i=0; $i<$nfiles; $i++) {
</file context>
Suggested change
[$log_names, $gzip] = get_outfile_log_names($result);
[$log_names, $gzip] = get_outfile_log_names($result) + [[], []];
Fix with Cubic

function do_result_aux($result, $batch, $file_num=null) {
$phys_names = get_outfile_phys_names($result);
$log_names = get_outfile_log_names($result);
[$log_names,] = get_outfile_log_names($result);
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns null to $log_names.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/get_output2.php, line 70:

<comment>Destructuring a potentially empty array triggers an 'Undefined array key 0' warning and assigns `null` to `$log_names`.</comment>

<file context>
@@ -67,7 +67,7 @@ function check_auth($auth, $batch) {
 function do_result_aux($result, $batch, $file_num=null) {
     $phys_names = get_outfile_phys_names($result);
-    $log_names = get_outfile_log_names($result);
+    [$log_names,] = get_outfile_log_names($result);
     if ($file_num !== null) {
         $path = upload_path($phys_names[$file_num]);
</file context>
Fix with Cubic

@@ -287,13 +287,17 @@ function get_outfile_phys_names($result) {

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The get_outfile_log_names function returns two separate arrays (names and gzip), but the array counts may not match if there are fewer file_info elements than file_ref elements. This can lead to undefined array key access when iterating over gzip using the $i index in assim_move_outfile_path and calling functions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/inc/submit_util.inc, line 290:

<comment>The `get_outfile_log_names` function returns two separate arrays (`names` and `gzip`), but the array counts may not match if there are fewer `file_info` elements than `file_ref` elements. This can lead to undefined array key access when iterating over `gzip` using the `$i` index in `assim_move_outfile_path` and calling functions.</comment>

<file context>
@@ -287,13 +287,17 @@ function get_outfile_phys_names($result) {
 
 function get_outfile_log_names($result) {
     $names = [];
+    $gzip = [];
     $xml = "<a>".$result->xml_doc_in."</a>";
     $r = simplexml_load_string($xml);
</file context>
Fix with Cubic

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="html/user/submit.php">

<violation number="1" location="html/user/submit.php:895">
P2: Gzipped output files will render as raw binary text when the 'view' link is clicked. Hide the view link for gzipped files.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copilot AI review requested due to automatic review settings February 20, 2026 20:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

f.read(1)
print('is gzip')
return True
except:
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except clause catches all exceptions including SystemExit and KeyboardInterrupt. Use 'except Exception:' or specify the expected exception types (e.g., 'except (OSError, gzip.BadGzipFile):').

Suggested change
except:
except (OSError, gzip.BadGzipFile):

Copilot uses AI. Check for mistakes.
end_table();
echo "<p>";

echo "<p>";
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate echo statement. Line 717 already outputs a paragraph tag, making this redundant.

Suggested change
echo "<p>";

Copilot uses AI. Check for mistakes.
die('no batch dir');
}
$name = "batch_$batch_id.tar";
$cmd = "cd $dir; rm -f $name; tar -cf $name .";
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tar command includes the current directory '.' which will include the directory itself in the archive. Use '*' instead to only include the files, matching the zip behavior on line 76.

Suggested change
$cmd = "cd $dir; rm -f $name; tar -cf $name .";
$cmd = "cd $dir; rm -f $name; tar -cf $name *";

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="html/user/submit.php">

<violation number="1" location="html/user/submit.php:901">
P2: Avoid potential undefined offset warnings when accessing `$gzip[$i]`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

// don't show 'view' link if it's a .zip
if (!strstr($name, '.zip')) {
// don't show 'view' link if it's zipped
if (!strstr($name, '.zip') && !$gzip[$i]) {
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid potential undefined offset warnings when accessing $gzip[$i].

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 901:

<comment>Avoid potential undefined offset warnings when accessing `$gzip[$i]`.</comment>

<file context>
@@ -896,8 +897,8 @@ function handle_query_job($user) {
-                        // don't show 'view' link if it's a .zip
-                        if (!strstr($name, '.zip')) {
+                        // don't show 'view' link if it's zipped
+                        if (!strstr($name, '.zip') && !$gzip[$i]) {
                             $y .= sprintf(
                                 '<a href=get_output3.php?action=get_file&result_id=%d&index=%d>view</a> &middot; ',
</file context>
Suggested change
if (!strstr($name, '.zip') && !$gzip[$i]) {
if (!strstr($name, '.zip') && empty($gzip[$i])) {
Fix with Cubic

Using gzip imposes an excessive CPU load for large batches/files
Copilot AI review requested due to automatic review settings February 20, 2026 21:42
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="html/user/submit.php">

<violation number="1" location="html/user/submit.php:720">
P0: Leftover debugging code `if (true)` makes the else block unreachable and forces all downloads to tar.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +720 to +721
//if (is_batch_gzipped($wus)) {
if (true) {
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0: Leftover debugging code if (true) makes the else block unreachable and forces all downloads to tar.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At html/user/submit.php, line 720:

<comment>Leftover debugging code `if (true)` makes the else block unreachable and forces all downloads to tar.</comment>

<file context>
@@ -717,7 +717,8 @@ function handle_query_batch($user) {
     echo "<p>";
     if ($is_assim_move) {
-        if (is_batch_gzipped($wus)) {
+        //if (is_batch_gzipped($wus)) {
+        if (true) {
             $url = "get_output3.php?action=get_batch_tar&batch_id=$batch->id";
</file context>
Suggested change
//if (is_batch_gzipped($wus)) {
if (true) {
if (is_batch_gzipped($wus)) {
Fix with Cubic

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
$name = "batch_$batch_id.tar";
$cmd = "cd $dir; rm -f $name; tar -cf $name *";
system($cmd);
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system() call doesn't check the return value to verify if the tar command succeeded. If the tar command fails (e.g., due to disk space issues or missing tar binary), the code will proceed to try downloading a potentially non-existent or corrupted file.

Consider checking the return value and handling errors appropriately, e.g., if (system($cmd) != 0) { die('Failed to create tar archive'); }

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +32
print('is gzip')
return True
except:
print('not gzip')
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_gzip function contains debug print statements that should be removed before merging to production. These will clutter logs with 'is gzip' and 'not gzip' messages for every file checked.

Suggested change
print('is gzip')
return True
except:
print('not gzip')
return True
except:

Copilot uses AI. Check for mistakes.
return $names;
foreach ($r->file_info as $fi) {
$gzip[] = isset($fi->gzip_when_done);
}
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function builds two arrays (names and gzip) by iterating over different XML elements (result->file_ref for names, file_info for gzip). While these should typically be in the same order and have the same count based on how templates are generated, there's no validation that the arrays have the same length before returning them.

If the XML structure is malformed or inconsistent, this could lead to array index mismatches when the caller uses the same index for both arrays (as seen in lines like 'assim_move_outfile_path($wu, $i, $log_names, $gzip)'). Consider adding a check to ensure count($names) == count($gzip) before returning, or document this assumption clearly.

Suggested change
}
}
// Ensure that the arrays have the same length so callers can safely
// index them in parallel. If the XML is inconsistent, truncate both
// arrays to the minimum length and log a warning.
$names_count = count($names);
$gzip_count = count($gzip);
if ($names_count !== $gzip_count) {
$min_count = min($names_count, $gzip_count);
log_write("get_outfile_log_names: mismatched counts: names=$names_count, gzip=$gzip_count; truncating to $min_count");
if ($names_count > $min_count) {
$names = array_slice($names, 0, $min_count);
}
if ($gzip_count > $min_count) {
$gzip = array_slice($gzip, 0, $min_count);
}
}

Copilot uses AI. Check for mistakes.
$cmd = "cd $dir; rm -f $name; zip -q $name *";
system($cmd);
do_download("$dir/$name");
unlink("$dir/$name");
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unlink() call doesn't check for errors or handle the case where deletion fails. If the file can't be deleted (e.g., due to permissions), it will accumulate in the results directory over time, potentially consuming disk space.

Consider adding error handling or at least logging if the deletion fails, e.g., if (!unlink("$dir/$name")) { error_log("Failed to delete temporary archive: $dir/$name"); }

Copilot uses AI. Check for mistakes.
}
$name = "batch_$batch_id.zip";
$cmd = "cd $dir; rm -f $name; zip -q $name *";
system($cmd);
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system() call doesn't check the return value to verify if the zip command succeeded. If the zip command fails (e.g., due to disk space issues or missing zip binary), the code will proceed to try downloading a potentially non-existent or corrupted file.

Consider checking the return value and handling errors appropriately, e.g., if (system($cmd) != 0) { die('Failed to create zip archive'); }

Copilot uses AI. Check for mistakes.
Comment on lines +652 to +656
return false;
}
return false;
}

Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is_batch_gzipped has a logic bug. It returns false after checking only the first workunit with a canonical result, even if that workunit has no gzipped files. This means the function will always return false unless the very first workunit has gzipped output files.

The return false statement on line 652 should be moved outside the foreach loop to check all workunits before concluding there are no gzipped files. Alternatively, restructure the logic to check all files from the first valid workunit before returning.

Suggested change
return false;
}
return false;
}
}
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +720 to +721
//if (is_batch_gzipped($wus)) {
if (true) {
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_batch_gzipped function check is commented out and replaced with a hardcoded 'if (true)', meaning tar format will always be used regardless of whether files are actually gzipped. This appears to be test/debugging code that shouldn't be in production.

Either uncomment the is_batch_gzipped check or remove the conditional entirely if the intent is to always use tar.

Suggested change
//if (is_batch_gzipped($wus)) {
if (true) {
if (is_batch_gzipped($wus)) {

Copilot uses AI. Check for mistakes.
Comment on lines 41 to +92
function get_file() {
$result_id = get_int('result_id');
$index = get_int('index');
$result = BoincResult::lookup_id($result_id);
if (!$result) error_page('no result');
if (!$result) {
error_page('no result');
}
$wu = BoincWorkunit::lookup_id($result->workunitid);
if (!$wu) error_page('no workunit');
$log_names = get_outfile_log_names($result);
if ($index >= count($log_names)) error_page('bad index');
$path = assim_move_outfile_path($wu, $index, $log_names);
if (!$wu) {
error_page('no workunit');
}
[$log_names, $gzip] = get_outfile_log_names($result);
if ($index >= count($log_names)) {
error_page('bad index');
}
$path = assim_move_outfile_path($wu, $index, $log_names, $gzip);

if (get_str('download', true)) {
do_download($path);
} else {
echo "<pre>\n";
echo htmlspecialchars(file_get_contents($path));
echo "</pre>\n";
}
}

// download a zip of the given directory
//
function get_batch() {
function get_batch_zip() {
$batch_id = get_int('batch_id');
$dir = "../../results/$batch_id";
if (!is_dir($dir)) die('no batch dir');
if (!is_dir($dir)) {
die('no batch dir');
}
$name = "batch_$batch_id.zip";
$cmd = "cd $dir; rm -f $name; zip -q $name *";
system($cmd);
do_download("$dir/$name");
unlink("$dir/$name");
}

function get_batch_tar() {
$batch_id = get_int('batch_id');
$dir = "../../results/$batch_id";
if (!is_dir($dir)) {
die('no batch dir');
}
$name = "batch_$batch_id.tar";
$cmd = "cd $dir; rm -f $name; tar -cf $name *";
system($cmd);
do_download("$dir/$name");
unlink("$dir/$name");
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_file and get_batch_zip/get_batch_tar in this file return output files based only on result_id/batch_id from query parameters, without checking that the caller is authenticated or owns the corresponding batch/result. An unauthenticated attacker can enumerate integer IDs and call get_output3.php?action=get_file&result_id=... or get_output3.php?action=get_batch_tar&batch_id=... to download other users' output files from ../../results, leading to unauthorized data access. Add an authentication/authorization check (e.g., via get_logged_in_user and validating BoincBatch->user_id or an auth token as in get_output2.php/get_output.php) before calling do_download() or reading result files.

Copilot uses AI. Check for mistakes.
@AenBleidd AenBleidd removed this from Planning Feb 21, 2026
@AenBleidd AenBleidd added this to Server Feb 21, 2026
@github-project-automation github-project-automation bot moved this to Backlog in Server Feb 21, 2026
@AenBleidd AenBleidd added this to the Server Release 1.6.2 milestone Feb 21, 2026
@AenBleidd AenBleidd merged commit a225d9f into master Feb 21, 2026
429 checks passed
@AenBleidd AenBleidd deleted the dpa_submit56 branch February 21, 2026 00:59
@github-project-automation github-project-automation bot moved this from Backlog to Done in Server Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants