NASA_VIIRSActiveFireFirms|changed logic to stream subprocesses stdout and stderr #2034

Merged
balit-raibot merged 25 commits into datacommonsorg:master from balit-raibot:fix_huge_error_streaming_cloud
May 25, 2026

@balit-raibot
Contributor

@balit-raibot balit-raibot commented May 24, 2026

This PR fixes the error below, which occurs during cloud batch execution and can halt the pipeline before the import succeeds (for example, in NASA_VIIRSActiveFireFirms):

textPayload: "Failed to flush logs after task task/nasa-viirsactivefi-bb54bcba-8c8d-42e80-group0-0/0/0: saw 1 errors; last: rpc error: code = InvalidArgument desc = Log entry with size 828.6K exceeds maximum size of 256.0K

The change streams stdout and stderr to the logs in chunks instead of writing each stream as a single entry.

Changes:

  • Logs are streamed in chunks instead of being sent to Cloud in one entry, which caused the failure and pipeline halt.
  • Test case updated to match the new logic so that the build succeeds.
  • Config change added to ensure the correct folder structure "fires/firms/events/".
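
For readers who want to see the idea in code, here is a minimal sketch of chunked logging. It is not the PR's implementation: `split_into_chunks`, `log_in_chunks`, and `CHUNK_SIZE` are hypothetical names, and the chunk size is an assumption taken from the value discussed later in this thread.

```python
import logging

# Hypothetical chunk size; the review thread later settles on 50,000 characters.
CHUNK_SIZE = 50_000


def split_into_chunks(text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def log_in_chunks(label: str, text: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Emit one log record per chunk and return the number of records."""
    chunks = split_into_chunks(text, chunk_size)
    for index, chunk in enumerate(chunks, start=1):
        logging.info("%s (part %d/%d):\n%s", label, index, len(chunks), chunk)
    return len(chunks)
```

Each record stays bounded by the chunk size, so one very large stdout or stderr capture becomes several small entries rather than one oversized entry.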

@balit-raibot balit-raibot requested a review from ajaits May 24, 2026 12:35
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors subprocess logging by removing stdout and stderr from the primary process message and introducing a chunked logging helper to handle large outputs safely. Feedback suggests reducing the chunk size to 50,000 to ensure compatibility with Cloud Logging limits for multi-byte characters and incorporating the import name into log labels for better traceability. Additionally, it was noted that similar chunking should be applied to the venv creation logging in the _run_with_timeout function.
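
The reasoning behind the 50,000 figure can be checked with a little arithmetic. The sketch below assumes that the "256.0K" in the error message means 256 KiB and that entry size is measured on the UTF-8 encoding of the text; neither assumption is confirmed in this thread. All names are illustrative.

```python
# Assumption: the "256.0K" limit in the error message means 256 KiB.
MAX_ENTRY_BYTES = 256 * 1024
# UTF-8 encodes any Unicode code point in at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4
CHUNK_CHARS = 50_000

# Largest possible encoded size of one chunk, and the room left under the limit.
worst_case_bytes = CHUNK_CHARS * MAX_UTF8_BYTES_PER_CHAR
headroom = MAX_ENTRY_BYTES - worst_case_bytes


def utf8_size(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_chunk = "a" * CHUNK_CHARS           # 1 byte per character
emoji_chunk = "\U0001F525" * CHUNK_CHARS  # 4 bytes per character
```

Under these assumptions a chunk of 50,000 four-byte characters encodes to 200,000 bytes, which stays below 262,144 bytes and leaves some room for the other fields of the entry.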

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py Outdated
balit-raibot and others added 2 commits May 24, 2026 18:07
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies import_executor.py to log subprocess output in 50,000-character chunks to prevent log size errors. Review feedback points out that the current implementation causes redundant logging for some scripts and misses other subprocess execution paths, suggesting a centralized utility instead. There is also a recommendation to avoid adding chunking headers for small outputs to reduce log noise.
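
One way to act on the recommendation about small outputs is to add a part header only when the output really spans more than one chunk. This is a sketch under that reading; `format_chunks` and its message format are hypothetical and not taken from the PR.

```python
def format_chunks(label: str, text: str, chunk_size: int = 50_000) -> list[str]:
    """Return log messages for text, adding part headers only when needed."""
    if not text:
        return []  # nothing to log for an empty stream
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if len(chunks) == 1:
        # Small output: a plain label, no "part 1/1" noise.
        return [f"{label}:\n{chunks[0]}"]
    total = len(chunks)
    return [
        f"{label} (part {number}/{total}):\n{chunk}"
        for number, chunk in enumerate(chunks, start=1)
    ]
```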

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a _stream_payload_in_chunks helper function to split large log entries into manageable chunks, preventing errors when logging subprocess output. The logging logic in _run_with_timeout and _log_process was updated to use this new helper. Review feedback identifies an unused variable and redundant logging calls in _run_with_timeout that should be removed to simplify the code and avoid duplicate logs.

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a _stream_payload_in_chunks helper function to split large log entries into 50,000-character segments, preventing "Log entry too large" errors. The logging logic in _run_with_timeout, _construct_process_message, and _log_process has been updated to utilize this function for subprocess stdout and stderr. Feedback indicates that this may cause redundant logging for asynchronous processes that already perform line-by-line logging, and suggests evaluating if this duplication is necessary to minimize log noise.

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a _stream_payload_in_chunks helper function to split large stdout and stderr payloads into smaller chunks, preventing 'Log entry too large' errors. The logic is integrated into _run_with_timeout and _log_process, with a new skip_stream_logging flag added to the latter. Review feedback highlights potential log redundancy where both functions are invoked in the same execution path. It is suggested to centralize the logging logic and verify that all callers of _log_process are updated to prevent duplicate log entries.

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a _stream_payload_in_chunks helper function to split large subprocess output into smaller log entries, preventing 'Log entry too large' errors. Related logging functions were updated to support this chunking and provide an option to skip stream logging. Feedback identifies that _construct_process_message still contains logic to include full streams, which could re-introduce the log size issue if misused, and suggests its removal.

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to handle large subprocess logs by chunking stdout and stderr into smaller segments, preventing "Log entry too large" errors. It adds a _stream_payload_in_chunks helper function and a skip_stream_logging parameter to the logging process. Review feedback suggests removing a redundant default argument for cleaner code and refactoring the hardcoded chunk size into a constant or class constructor to improve maintainability.

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces log chunking to prevent "Log entry too large" errors by adding a _stream_payload_in_chunks helper and updating subprocess logging to handle large outputs in parts. It also refactors _construct_process_message to exclude stdout and stderr from the primary message. Feedback was provided regarding the use of a mutable default argument for the metrics parameter in _log_process, which could lead to shared state across function calls.
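
The mutable default argument pitfall mentioned here is a general Python behavior: a default value is created once, when the function is defined, and then shared by every call that omits the argument. The functions below are illustrative only and do not reproduce the signature of `_log_process`.

```python
def record_bad(name, metrics={}):
    """Buggy: the same dict object is reused across calls."""
    metrics[name] = metrics.get(name, 0) + 1
    return metrics


def record_good(name, metrics=None):
    """Fixed: a fresh dict is created on each call that omits metrics."""
    if metrics is None:
        metrics = {}
    metrics[name] = metrics.get(name, 0) + 1
    return metrics


first = record_bad("a")
second = record_bad("b")  # unexpectedly also contains the "a" entry
```

Using `None` as the default and creating the dict inside the function is the usual fix, and it keeps state from leaking between unrelated calls.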

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements log chunking to prevent 'Log entry too large' errors by introducing a _stream_payload_in_chunks helper function. Subprocess stdout and stderr are now logged in chunks rather than being included in the main process message. Additionally, the PR fixes a mutable default argument issue in _log_process and updates the corresponding tests. I have no feedback to provide.

@balit-raibot
Contributor Author

@gemini-code-assist review

@balit-raibot balit-raibot changed the title from "changed logic to stream subprocesses stdout and stderr" to "NASA_VIIRSActiveFireFirms|changed logic to stream subprocesses stdout and stderr" May 24, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a log chunking mechanism in the import executor to handle large subprocess outputs and avoid 'Log entry too large' errors. It also refactors subprocess logging to use this chunking and updates the corresponding tests. Regarding the changes in the event processing scripts, feedback was provided concerning the removal of the automatic 'events/' subdirectory in process_events.py. This modification potentially conflicts with repository practices that favor automatic directory management to prevent redundant nesting and ensure consistency across different pipelines.

Comment thread scripts/earthengine/process_events.py Outdated
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to chunk large subprocess logs to prevent "Log entry too large" errors and refactors logging in import_executor.py. It also updates the Earth Engine event processing to allow omitting the 'events' subdirectory and adjusts the fire events pipeline configuration. Feedback was provided to update the docstring for _log_process to include new parameters and to warn about potential redundant logging for asynchronous processes.

Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to split large subprocess logs into chunks, preventing 'Log entry too large' errors during import execution. It also updates the Earth Engine event processing script to allow omitting the 'events' subdirectory and adjusts the fire events pipeline configuration accordingly. Feedback was provided regarding redundant logging in asynchronous subprocesses, where the new chunked logging overlaps with existing line-by-line logs, suggesting the use of the skip_stream_logging flag to minimize log volume.

Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces chunked logging for subprocess outputs to prevent "Log entry too large" errors in Cloud Logging and updates the output directory structure for fire event pipelines. Review feedback highlights that removing real-time logging in _run_with_timeout_async creates a deadlock risk when reading from stdout and stderr pipes and reduces observability for long-running tasks. It is recommended to implement concurrent stream reading and to ensure that chunked logging does not become redundant if real-time logging is restored.
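
Concurrent stream reading, as recommended here, could look roughly like the sketch below: one thread drains each pipe, so neither pipe can fill up and block the child, and each line can be emitted as soon as it arrives. It assumes a plain `subprocess.Popen`; `run_and_stream`, `_drain`, and the `emit` callback are hypothetical names, and the PR's `_run_with_timeout_async` may be structured differently.

```python
import subprocess
import sys
import threading
from typing import Callable, List, Optional, Tuple


def _drain(stream, sink: List[str], emit: Callable[[str], None]) -> None:
    """Read a pipe line by line until EOF, emitting and storing each line."""
    for line in stream:
        emit(line.rstrip("\n"))
        sink.append(line)
    stream.close()


def run_and_stream(cmd: List[str], emit: Callable[[str], None] = print,
                   timeout: Optional[float] = None) -> Tuple[int, str, str]:
    """Run cmd, draining stdout and stderr concurrently; return (code, out, err)."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, text=True)
    out_lines: List[str] = []
    err_lines: List[str] = []
    readers = [
        threading.Thread(target=_drain, args=(process.stdout, out_lines, emit)),
        threading.Thread(target=_drain, args=(process.stderr, err_lines, emit)),
    ]
    for reader in readers:
        reader.start()
    try:
        returncode = process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()  # enforce the timeout, then reap the child
        process.wait()
        raise
    finally:
        for reader in readers:
            reader.join()
    return returncode, "".join(out_lines), "".join(err_lines)


# Usage: the child writes more than a typical pipe buffer to each stream.
code, out, err = run_and_stream(
    [sys.executable, "-c",
     "import sys; print('x' * 200000); sys.stderr.write('y' * 200000)"],
    emit=lambda line: None)  # swap in logging.info to log lines as they arrive
```

If lines are already logged as they arrive, logging the collected output again in chunks afterwards would duplicate them, which is the redundancy the review warns about.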

Comment thread import-automation/executor/app/executor/import_executor.py
Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces chunked logging for subprocess output to prevent "Log entry too large" errors and adds configuration options to omit event subdirectories in the Earth Engine and Fire events pipelines. However, the reviewer identified significant issues in the asynchronous process execution logic, including a potential deadlock when reading stdout and stderr sequentially, the failure to enforce timeouts, and the loss of real-time logging. A suggestion was provided to use process.communicate() and raise exceptions on failure to ensure robust execution and proper error reporting.
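
The `process.communicate()` suggestion can be illustrated with the standard library alone. The sketch below assumes a synchronous `subprocess.Popen`; `run_checked` is a hypothetical helper, not the function changed in the PR. `communicate()` reads both pipes together, which avoids the deadlock that sequential reads can hit, and its `timeout` argument enforces the time limit.

```python
import subprocess
import sys


def run_checked(cmd, timeout_seconds):
    """Run cmd; return (stdout, stderr); raise on timeout or non-zero exit."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, text=True)
    try:
        stdout, stderr = process.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # reap the child and release the pipes
        raise
    if process.returncode != 0:
        # Surface the failure to the caller together with the captured output.
        raise subprocess.CalledProcessError(process.returncode, cmd,
                                            output=stdout, stderr=stderr)
    return stdout, stderr


# Usage: a command that succeeds quickly.
stdout_text, stderr_text = run_checked(
    [sys.executable, "-c", "print('done')"], timeout_seconds=30)
```

The trade-off is that `communicate()` buffers all output in memory and returns it only after the process exits, so nothing is visible while the task is still running.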

Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a log chunking mechanism to prevent 'Log entry too large' errors and updates the Earth Engine event processing and fire event pipeline configurations. Feedback highlights significant concerns regarding the switch to process.communicate(), which introduces memory risks and eliminates real-time log visibility. Additionally, it is recommended to optimize the log chunking helper to avoid high memory usage when decoding large byte payloads.
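
One possible way to avoid decoding a large byte payload in one step is to slice the bytes first and decode each slice with an incremental decoder, which carries a multi-byte character split across two slices over to the next call. This is a sketch of that general technique; `iter_decoded_chunks` is a hypothetical name, and the PR's helper may solve the problem differently.

```python
import codecs
from typing import Iterator


def iter_decoded_chunks(payload: bytes, chunk_bytes: int = 50_000,
                        encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded text pieces without decoding the whole payload at once."""
    # The incremental decoder buffers an incomplete trailing character
    # and completes it when the next slice arrives.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for start in range(0, len(payload), chunk_bytes):
        text = decoder.decode(payload[start:start + chunk_bytes])
        if text:
            yield text
    # Flush anything still buffered (for example, a truncated final character).
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Only one slice is decoded at a time, so the extra memory is proportional to the chunk size rather than to the full payload.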

Comment thread import-automation/executor/app/executor/import_executor.py Outdated
Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot
Contributor Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements chunked logging for subprocess stdout and stderr to prevent 'Log entry too large' errors and introduces a configuration option to omit the 'events' subdirectory in output paths. Feedback suggests improving the log labeling logic to handle cases where the import name is empty, preventing confusing empty brackets in the logs.
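
The labeling concern can be handled by adding the bracketed prefix only when an import name is present. The sketch below assumes a label of the form `[import_name] stream`; both that format and the function name `build_log_label` are hypothetical.

```python
def build_log_label(stream_name: str, import_name: str = "") -> str:
    """Prefix the stream name with [import_name] only when a name is given."""
    name = (import_name or "").strip()
    if name:
        return f"[{name}] {stream_name}"
    # No import name: return the bare stream name instead of "[] stream".
    return stream_name
```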

Comment thread import-automation/executor/app/executor/import_executor.py
@balit-raibot balit-raibot requested a review from vish-cs May 25, 2026 05:18
@balit-raibot balit-raibot merged commit 0e12551 into datacommonsorg:master May 25, 2026
13 checks passed