Skip to content

Conversation

@titusfortner
Copy link
Member

@titusfortner titusfortner commented Jan 25, 2026

User description

free-disk-space runs on every Bazel run, and can take 1 -2 minutes.
Quick look shows we don't need it.

Looks like GitHub is transitioning.
If you get the Azure Region: eastus2, you end up with Remaining disk space: 88GB
If you get Azure Region: westus, you end up with: Remaining disk space: 16GB
We're only using 5-10GB, and everything I'm seeing shows that the lowest remaining space on any of our CI jobs is 10GB.

💥 What does this PR do?

Remove the free space step from workflow
Temporarily adding a slack alert to get a feel for whether anything is a legit problem.

Update: Added a disk space check at end of each job that fails if < 5GB remaining.
I know that's not ideal because it's not that commit's fault, but seems like the easiest way to alert us to a problem.

  • The cleanup script was added in Feb 2023 when runners had ~85GB total / ~17GB free
    All Some Runners now have ~145GB total / ~92GB free (According to the gh command that Claude ran)
  • Windows never had cleanup and hasn't had issues

🔧 Implementation Notes

Instead of proactive cleanup, we now report remaining disk space to #ci-disk-alerts after every bazel workflow job.
Running all tests to exercise it.

Update: Honestly, I just didn't want to deal with passing Slack permissions to everything.

💡 Additional Considerations

  • #ci-disk-alerts Slack channel created
  • If disk issues occur, we can investigate and re-add cleanup if needed
  • Update: Oh, and if we do add cleanup, just stick to major directory deletions best bang for the buck
  • I'll remove the slack alert if we're not remotely close to an issue.

🔄 Types of changes

  • Cleanup (formatting, renaming)

PR Type

Enhancement, Other


Description

  • Remove disk cleanup script that was taking 1-2 minutes per run

  • Add disk space monitoring with Slack alerts to #ci-disk-alerts channel

  • Implement color-coded thresholds (red <10GB, yellow <30GB, green ≥30GB)

  • Leverage improved runner disk availability (92GB free vs 17GB previously)


Diagram Walkthrough

flowchart LR
  A["Bazel Workflow"] -->|Previously| B["Free Disk Space Script<br/>1-2 minutes"]
  A -->|Now| C["Check Disk Space<br/>Report to Slack"]
  C --> D["Color-coded Alert<br/>Red/Yellow/Green"]
  D --> E["#ci-disk-alerts Channel"]
Loading

File Walkthrough

Relevant files
Cleanup
free-disk-space.sh
Remove disk cleanup script entirely                                           

scripts/github-actions/free-disk-space.sh

  • Entire script deleted (55 lines removed)
  • Script removed packages (dotnet, llvm, php, mongodb, mysql) and
    directories
  • No longer needed due to improved runner disk capacity (92GB free vs
    17GB)
+0/-55   
Configuration changes
bazel.yml
Replace cleanup with disk space monitoring                             

.github/workflows/bazel.yml

  • Removed Free space step that called free-disk-space.sh script
  • Added Check disk space step to calculate available disk space in GB
  • Added Report disk space step using Slack webhook for monitoring
  • Implemented color-coded thresholds: danger (<10GB), warning (<30GB),
    good (≥30GB)
  • Reports disk status after every bazel workflow job
+19/-3   

@selenium-ci selenium-ci added the B-build Includes scripting, bazel and CI integrations label Jan 25, 2026
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 25, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Third-party action secret

Description: The workflow introduces a third-party GitHub Action (rtCamp/action-slack-notify@v2) that
receives the sensitive secret secrets.SLACK_WEBHOOK_URL via environment variable, creating
a realistic supply-chain/secret-exfiltration risk if that action (or its tag) is
compromised or if the workflow can be triggered in a context where secrets are exposed
(e.g., certain PR scenarios).
bazel.yml [277-287]

Referred Code
- name: Report disk space
  if: always()
  uses: rtCamp/action-slack-notify@v2
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
    SLACK_CHANNEL: ci-disk-alerts
    SLACK_USERNAME: Disk Monitor
    SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
    SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
    SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
    SLACK_ICON_EMOJI: ":floppy_disk:"
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing failure handling: The new disk space check assumes df -BG / works and returns a parseable value without
handling errors or empty/non-numeric output, which may break reporting on non-Linux
runners or unexpected environments.

Referred Code
- name: Check disk space
  if: always()
  shell: bash
  id: disk
  run: |
    avail=$(df -BG / | awk 'NR==2 {gsub(/G/,"",$4); print $4}')
    echo "Remaining disk space: ${avail}GB"
    echo "avail=$avail" >> "$GITHUB_OUTPUT"

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 25, 2026

PR Code Suggestions ✨

Latest suggestions up to 6dd40a4

CategorySuggestion                                                                                                                                    Impact
Security
Limit secret exposure in CI

Instead of using secrets: inherit, explicitly pass only the SLACK_WEBHOOK_URL
secret to the reusable workflow to minimize secret exposure and improve
security.

.github/workflows/ci-python.yml [10-18]

 jobs:
   build:
     name: Build
     uses: ./.github/workflows/bazel.yml
-    secrets: inherit
+    secrets:
+      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
     with:
       name: Build
       run: |
         bazel build //py:selenium-wheel //py:selenium-sdist

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a security risk by using secrets: inherit and proposes adhering to the principle of least privilege by explicitly passing only the required SLACK_WEBHOOK_URL secret, which is a significant security improvement.

Medium
Possible issue
Check correct filesystem space
Suggestion Impact:The workflow’s disk space check was updated from `df -k /` to `df -k "$GITHUB_WORKSPACE"`, aligning the measurement with the workspace filesystem. The commit did not include the suggested fallback default (`${GITHUB_WORKSPACE:-.}`), so it’s a partial (non-exact) implementation.

code diff:

-          avail=$(df -k / | awk 'NR==2 {printf "%.0f", $4/1024/1024}')
+          avail=$(df -k "$GITHUB_WORKSPACE" | awk 'NR==2 {printf "%.0f", $4/1024/1024}')

Modify the disk space check to use df on $GITHUB_WORKSPACE instead of / to
ensure the measurement is for the correct filesystem where the job workspace
resides.

.github/workflows/bazel.yml [269-276]

 - name: Check disk space
   if: always()
   shell: bash
   id: disk
   run: |
-    avail=$(df -k / | awk 'NR==2 {printf "%.0f", $4/1024/1024}')
+    target="${GITHUB_WORKSPACE:-.}"
+    avail=$(df -k "$target" | awk 'NR==2 {printf "%.0f", $4/1024/1024}')
     echo "Remaining disk space: ${avail}GB"
     echo "avail=$avail" >> "$GITHUB_OUTPUT"

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that hardcoding / for the disk space check might be inaccurate on some runners; using $GITHUB_WORKSPACE makes the check more robust and reliable.

Medium
  • Update

Previous suggestions

✅ Suggestions up to commit ab1bc64
CategorySuggestion                                                                                                                                    Impact
General
Skip disk check on Windows

Add a condition to the Check disk space step to prevent it from running on
Windows, as the df command is not available on that OS.

.github/workflows/bazel.yml [269-270]

 - name: Check disk space
-  if: always()
+  if: inputs.os != 'windows'
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the df command will fail on Windows runners and provides a simple, correct fix to prevent this workflow failure.

Medium
Move color logic into bash
Suggestion Impact:The commit removed the complex YAML SLACK_COLOR conditional expression (simplifying the workflow), but it did not implement the suggested approach of computing/outputting `color` in the bash step. Instead, it hardcoded `SLACK_COLOR: danger` and gated the Slack step with an `if` condition on available disk space.

code diff:

-      - name: Report disk space
-        if: always()
+      - name: Warn low disk space
+        if: always() && fromJSON(steps.disk.outputs.avail) < 5
+        continue-on-error: true
         uses: rtCamp/action-slack-notify@v2
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
           SLACK_CHANNEL: ci-disk-alerts
           SLACK_USERNAME: Disk Monitor
-          SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
-          SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
-          SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
+          SLACK_TITLE: "Low disk: ${{ steps.disk.outputs.avail }}GB remaining"
+          SLACK_MESSAGE: "${{ inputs.name }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+          SLACK_COLOR: danger
           SLACK_ICON_EMOJI: ":floppy_disk:"

Refactor the workflow by moving the SLACK_COLOR conditional logic from the YAML
into the Check disk space bash script and outputting the color value.

.github/workflows/bazel.yml [276-286]

 echo "avail=$avail" >> "$GITHUB_OUTPUT"
+if [ "$avail" -lt 10 ]; then
+  color="danger"
+elif [ "$avail" -lt 30 ]; then
+  color="warning"
+else
+  color="good"
+fi
+echo "color=$color" >> "$GITHUB_OUTPUT"
 ...
-SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
+SLACK_COLOR: ${{ steps.disk.outputs.color }}
Suggestion importance[1-10]: 5

__

Why: This suggestion improves code readability and maintainability by moving complex conditional logic from the YAML file into the bash script, which is a better place for it.

Low
High-level
Only send alerts on low disk
Suggestion Impact:The workflow Slack notification step was changed from always running to running only when available disk is below a low threshold, reducing alert noise. The step was also renamed and simplified to a single "danger" alert.

code diff:

-      - name: Report disk space
-        if: always()
+      - name: Warn low disk space
+        if: always() && fromJSON(steps.disk.outputs.avail) < 5
+        continue-on-error: true
         uses: rtCamp/action-slack-notify@v2
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
           SLACK_CHANNEL: ci-disk-alerts
           SLACK_USERNAME: Disk Monitor
-          SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
-          SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
-          SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
+          SLACK_TITLE: "Low disk: ${{ steps.disk.outputs.avail }}GB remaining"
+          SLACK_MESSAGE: "${{ inputs.name }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+          SLACK_COLOR: danger

The current implementation sends a Slack notification on every CI run. To reduce
noise, it should be modified to only send alerts when disk space is low (e.g.,
below the 'warning' threshold).

Examples:

.github/workflows/bazel.yml [277-287]
      - name: Report disk space
        if: always()
        uses: rtCamp/action-slack-notify@v2
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_CHANNEL: ci-disk-alerts
          SLACK_USERNAME: Disk Monitor
          SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
          SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
          SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}

 ... (clipped 1 lines)

Solution Walkthrough:

Before:

- name: Report disk space
  if: always()
  uses: rtCamp/action-slack-notify@v2
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
    SLACK_CHANNEL: ci-disk-alerts
    SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
    SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
    SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
    ...

After:

- name: Report disk space
  if: always() && steps.disk.outputs.avail < 30
  uses: rtCamp/action-slack-notify@v2
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
    SLACK_CHANNEL: ci-disk-alerts
    SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
    SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
    SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || 'warning' }}
    ...
Suggestion importance[1-10]: 7

__

Why: This is a valid design improvement that addresses the high potential for noise in the new alerting system, suggesting a shift from constant reporting to conditional alerting, which is a best practice.

Medium
Learned
best practice
Gate Slack notify on inputs
Suggestion Impact:The commit did not add the suggested `if: always() && ...` guards, but it addressed the same failure concern by (1) declaring `SLACK_WEBHOOK_URL` as an optional secret for the workflow and (2) making the Slack reporting step `continue-on-error: true` so missing/invalid Slack configuration won't fail the job.

code diff:

+    secrets:
+      SLACK_WEBHOOK_URL:
+        required: false
 
 jobs:
   bazel:
@@ -271,11 +274,16 @@
         shell: bash
         id: disk
         run: |
-          avail=$(df -BG / | awk 'NR==2 {gsub(/G/,"",$4); print $4}')
+          case "$RUNNER_OS" in
+            Windows) avail=$(powershell -Command "(Get-PSDrive C).Free / 1GB" | xargs printf '%.0f') ;;
+            macOS)   avail=$(df -g / | awk 'NR==2 {print $4}') ;;
+            Linux)   avail=$(df -BG / | awk 'NR==2 {gsub(/G/,"",$4); print $4}') ;;
+          esac
           echo "Remaining disk space: ${avail}GB"
           echo "avail=$avail" >> "$GITHUB_OUTPUT"
       - name: Report disk space
         if: always()
+        continue-on-error: true
         uses: rtCamp/action-slack-notify@v2
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}

Only run Slack reporting when the webhook secret exists and the disk step
produced a valid output, otherwise this can fail the workflow or evaluate
invalid comparisons.

.github/workflows/bazel.yml [277-287]

 - name: Report disk space
-  if: always()
+  if: always() && secrets.SLACK_WEBHOOK_URL != '' && steps.disk.outputs.avail != ''
   uses: rtCamp/action-slack-notify@v2
   env:
     SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
     SLACK_CHANNEL: ci-disk-alerts
     SLACK_USERNAME: Disk Monitor
     SLACK_TITLE: "${{ steps.disk.outputs.avail }}GB remaining"
     SLACK_MESSAGE: "${{ inputs.name }} on ${{ inputs.os }}"
     SLACK_COLOR: ${{ steps.disk.outputs.avail < 10 && 'danger' || (steps.disk.outputs.avail < 30 && 'warning' || 'good') }}
     SLACK_ICON_EMOJI: ":floppy_disk:"

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Add explicit validation/availability guards at integration boundaries (secrets/webhooks and dependent step outputs) before using them.

Low

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the disk cleanup script (free-disk-space.sh) that was taking 1-2 minutes per workflow run and replaces it with monitoring that reports disk space usage to a Slack channel. The change is motivated by GitHub-hosted runners now having significantly more disk space available (92GB free vs 17GB previously).

Changes:

  • Deleted the free-disk-space.sh script that cleaned up packages and directories
  • Replaced the cleanup step with disk space monitoring that checks available space after each job
  • Added Slack notifications to the #ci-disk-alerts channel with color-coded alerts based on thresholds

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
scripts/github-actions/free-disk-space.sh Complete removal of the disk cleanup script (55 lines deleted)
.github/workflows/bazel.yml Removed "Free space" step and added "Check disk space" and "Report disk space" steps with Slack integration

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

@titusfortner titusfortner merged commit 2220770 into trunk Jan 25, 2026
64 of 65 checks passed
@titusfortner titusfortner deleted the remove_disk_cleanup branch January 25, 2026 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations Review effort 2/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants