90 changes: 68 additions & 22 deletions .github/workflows/apply_peribolos.yml
@@ -1,14 +1,35 @@
# Workflow to apply the Peribolos org configuration.
# Runs on push to main, daily schedule, or manual dispatch.
# Uses the complytime-bot GitHub App for authentication.
name: Apply Peribolos

on:
push:
branches:
- main
pull_request:
schedule:
# Daily at 05:30 UTC — results visible by EU morning (07:30 CEST / 06:30 CET)
- cron: '30 5 * * *'
workflow_dispatch:
inputs:
dry-run:
description: 'Run Peribolos without --confirm (dry-run mode)'
required: false
type: boolean
default: false

concurrency:
group: peribolos-apply
cancel-in-progress: false

jobs:
Apply-peribolos:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
permissions:
contents: read
steps:
- name: Checkout complytime/.github repo
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

@@ -19,36 +40,61 @@ jobs:

- name: Copy peribolos.yaml
run: cp peribolos.yaml /tmp

- name: Checkout ghproxy and peribolos code
if: ${{ github.repository_owner == 'complytime' && github.event_name == 'push' && github.ref == 'refs/heads/main' }}

- name: Checkout peribolos code
if: >-
github.repository_owner == 'complytime' &&
github.event_name != 'pull_request'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
repository: kubernetes-sigs/prow

- name: Build ghproxy
if: ${{ github.repository_owner == 'complytime' && github.event_name == 'push' && github.ref == 'refs/heads/main' }}
run: |
cd cmd/ghproxy
go mod tidy
go build -o ghproxy .
cp ghproxy /tmp

- name: Build peribolos
if: ${{ github.repository_owner == 'complytime' && github.event_name == 'push' && github.ref == 'refs/heads/main' }}
if: >-
github.repository_owner == 'complytime' &&
github.event_name != 'pull_request'
run: |
cd cmd/peribolos
go mod tidy
go build -o .
go build -o .
cp peribolos /tmp

- name: Generate GitHub App token
if: >-
github.repository_owner == 'complytime' &&
github.event_name != 'pull_request'
id: app-token
uses: actions/create-github-app-token@1b10c78c7865c340bc4f6099eb2f838309f1e8c3 # v3.1.1
with:
client-id: ${{ secrets.COMPLYTIME_BOT_CLIENT_ID }}
private-key: ${{ secrets.COMPLYTIME_BOT_PRIVATE_KEY }}
owner: complytime

- name: Apply peribolos.yaml
if: ${{ github.repository_owner == 'complytime' && github.event_name == 'push' && github.ref == 'refs/heads/main' }}
if: >-
github.repository_owner == 'complytime' &&
github.event_name != 'pull_request'
env:
APP_TOKEN: ${{ steps.app-token.outputs.token }}
DRY_RUN: ${{ github.event_name == 'workflow_dispatch' && inputs.dry-run }}
run: |
echo ${{ secrets.APP_ACCESS_TOKEN }} > auth.txt
/tmp/ghproxy --legacy-disable-disk-cache-partitions-by-auth-header=false --get-throttling-time-ms=300 --throttling-time-ms=900 --throttling-time-v4-ms=850 --throttling-max-delay-duration-seconds=45 --throttling-max-delay-duration-v4-seconds=110 --request-timeout=120 1>/dev/null 2>&1 &
pid=$!
jobs
/tmp/peribolos --config-path /tmp/peribolos.yaml --fix-org --fix-org-members --fix-teams --fix-team-members --fix-repos --fix-team-repos --min-admins 2 --github-token-path auth.txt --confirm 2>&1 | jq -r '[.severity, .time, .msg] | join(" | ")'
kill $pid
rm auth.txt
set -o pipefail

PERIBOLOS_ARGS=(
--config-path /tmp/peribolos.yaml
--fix-org
--fix-org-members
--fix-teams
--fix-team-members
--fix-repos
--fix-team-repos
--min-admins 2
--require-self=false
--github-token-path <(printf '%s' "$APP_TOKEN")
)

if [ "$DRY_RUN" != "true" ]; then
PERIBOLOS_ARGS+=(--confirm)
fi

/tmp/peribolos "${PERIBOLOS_ARGS[@]}" 2>&1 | jq -r '[.severity, .time, .msg] | join(" | ")'
127 changes: 127 additions & 0 deletions .github/workflows/drift_detection.yml
@@ -0,0 +1,127 @@
# Workflow to detect drift between peribolos.yaml and actual GitHub org state.
# Runs weekly on Monday mornings. Opens or updates a GitHub issue when drift is detected.
name: Drift Detection

on:
  schedule:
    # Monday at 04:30 UTC — drift issues visible by EU morning (06:30 CEST / 05:30 CET),
    # before daily reconciliation at 05:30 UTC
    - cron: '30 4 * * 1'
  workflow_dispatch:

jobs:
  detect-drift:
    if: github.repository_owner == 'complytime'
    runs-on: ubuntu-latest
    timeout-minutes: 20
    permissions:
      contents: read
      issues: write
    steps:
      - name: Checkout complytime/.github repo
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Go
        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
        with:
          go-version-file: './go.mod'

      - name: Checkout and build peribolos
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          repository: kubernetes-sigs/prow

      - name: Build peribolos
        run: |
          cd cmd/peribolos
          go mod tidy
          go build -o .
          cp peribolos /tmp

      - name: Generate GitHub App token
        id: app-token
        uses: actions/create-github-app-token@1b10c78c7865c340bc4f6099eb2f838309f1e8c3 # v3.1.1
        with:
          client-id: ${{ secrets.COMPLYTIME_BOT_CLIENT_ID }}
          private-key: ${{ secrets.COMPLYTIME_BOT_PRIVATE_KEY }}
          owner: complytime

      - name: Dump current org state
        env:
          APP_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          set -o pipefail
          /tmp/peribolos \
            --config-path peribolos.yaml \
            --require-self=false \
            --github-token-path <(printf '%s' "$APP_TOKEN") \
            --dump complytime \
            --dump-full 2>/tmp/peribolos-dump-stderr.log | yq -P 'sort_keys(..)' > /tmp/org-actual.yaml
          yq -P 'sort_keys(..)' peribolos.yaml > /tmp/org-expected.yaml

      - name: Compare org state
        id: diff
        run: |
          if diff -u /tmp/org-expected.yaml /tmp/org-actual.yaml > /tmp/drift-diff.txt 2>&1; then
            echo "drift=false" >> "$GITHUB_OUTPUT"
            echo "No drift detected."
          else
            echo "drift=true" >> "$GITHUB_OUTPUT"
            echo "Drift detected between peribolos.yaml and actual org state."
          fi

      - name: Check for existing drift issue
        if: steps.diff.outputs.drift == 'true'
        id: existing-issue
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          ISSUE_NUMBER=$(gh issue list \
            --label peribolos-drift \
            --state open \
            --limit 1 \
            --json number \
            --jq '.[0].number // empty')
          echo "issue_number=${ISSUE_NUMBER}" >> "$GITHUB_OUTPUT"

      - name: Create or update drift issue
        if: steps.diff.outputs.drift == 'true'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_NUMBER: ${{ steps.existing-issue.outputs.issue_number }}
          WORKFLOW_URL: "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
        run: |
          TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

          {
            echo "## Peribolos Drift Detected"
            echo ""
            echo "**Date**: ${TIMESTAMP}"
            echo "**Workflow run**: ${WORKFLOW_URL}"
            echo ""
            echo "The actual GitHub org state differs from what is declared in \`peribolos.yaml\`."
            echo "This may indicate manual changes were made via the GitHub UI."
            echo ""
            echo "### Diff"
            echo ""
            echo '```diff'
            cat /tmp/drift-diff.txt
            echo '```'
            echo ""
            echo "### Recommended Action"
            echo ""
            echo "- Review the diff to determine if the changes are intentional"
            echo "- If unintentional: trigger a manual Peribolos apply via \`workflow_dispatch\`"
            echo "- If intentional: update \`peribolos.yaml\` to match the desired state"
          } > /tmp/issue-body.md

          if [ -n "$ISSUE_NUMBER" ]; then
            gh issue edit "$ISSUE_NUMBER" --body-file /tmp/issue-body.md
            echo "Updated existing issue #${ISSUE_NUMBER}"
          else
            gh issue create \
              --title "Peribolos Drift Detected - $(date -u +%Y-%m-%d)" \
              --body-file /tmp/issue-body.md \
              --label peribolos-drift
            echo "Created new drift issue"
          fi
23 changes: 23 additions & 0 deletions .gitignore
@@ -30,3 +30,26 @@ go.work.sum
# Editor/IDE
.idea/
.vscode/

# Unbound Force — managed by uf init
# Runtime data under .uf/ (databases, caches, locks, logs)
.uf/workflows/
.uf/artifacts/
.uf/dewey/graph.db
.uf/dewey/graph.db-shm
.uf/dewey/graph.db-wal
.uf/dewey/*.lock
.uf/dewey/cache/
.uf/dewey/dewey.log
.uf/replicator/*.db
.uf/replicator/*.db-shm
.uf/replicator/*.db-wal
.uf/replicator/*.lock
.uf/muti-mind/artifacts/
.uf/mx-f/data/
# Legacy tool directories (renamed to .uf/ in Spec 025)
.dewey/
.hive/
.unbound-force/
.muti-mind/
.mx-f/
6 changes: 6 additions & 0 deletions config/config_test.go
@@ -225,5 +225,11 @@ func TestOrgs(t *testing.T) {
		if !isSorted(org.Members) {
			t.Errorf("members for %s org are unsorted", *org.Name)
		}

		if org.Teams != nil {
			for _, err := range testTeamMembers(org.Teams, admins, allOrgMembers, *org.Name) {
				t.Errorf("%v", err)
			}
		}
	}
}
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-07
95 changes: 95 additions & 0 deletions openspec/changes/fix-peribolos-implementation/design.md
@@ -0,0 +1,95 @@
## Context

The `complytime` GitHub organization uses Peribolos (a Prow CLI tool) to manage org settings, teams, memberships, and repository permissions as code via `peribolos.yaml`. The implementation lives in the `complytime/.github` repository.

Current state:
- Peribolos has been silently failing since April 16, 2026 due to an expired GitHub App user access token
- The workflow pipeline masks failures: `peribolos ... 2>&1 | jq ...` swallows the exit code
- Every run since April 16 reports `success` while Peribolos exits with `fatal: Configuration failed: status code 404`
- The `complytime-bot` GitHub App is installed with correct permissions (`organization_administration: write`, `members: write`, `administration: write`)
- `pme-bot` is a regular user account (not an app bot) whose expired token was stored in `secrets.APP_ACCESS_TOKEN`
- The `testTeamMembers()` function in `config/config_test.go` is defined but never called from any test
- Org admins (`jpower432`, `marcusburghardt`) are listed as `members:` instead of `maintainers:` in several teams

## Goals / Non-Goals

### Goals

- Restore Peribolos to a working state with reliable, self-renewing authentication
- Make Peribolos failures visible (fail the workflow when Peribolos fails)
- Enable on-demand manual reapplication of org settings
- Enable daily automatic reconciliation
- Detect and alert when org state drifts from the declared config
- Fix config and test issues that allow invalid configurations

### Non-Goals

- Managing branch protection rules (requires `branchprotector`, a separate Prow tool)
- Managing GitHub Actions permissions, webhooks, or repository rulesets
- Adding new repository settings beyond what is currently declared (keep it minimalist per user preference)
- Migrating away from Peribolos to a different tool

## Decisions

### D1: Use `actions/create-github-app-token` for authentication

**Decision**: Replace the static `APP_ACCESS_TOKEN` secret with per-run installation tokens generated by `actions/create-github-app-token@v3` (SHA-pinned per existing workflow conventions).

**Rationale**: The existing `complytime-bot` GitHub App already has all required permissions. Installation tokens are generated fresh on every run (1-hour TTL, auto-revoked after the job), which eliminates token expiry as a failure mode. This is the GitHub-recommended approach (official action, 800+ stars). User access tokens, by contrast, expire after 8 hours and must be regenerated manually, and the device flow requires human interaction, so neither is suitable for CI.

**Alternatives considered**: A fine-grained PAT is simpler but still requires manual rotation. Rotating refresh tokens in CI is fragile and creates security risks, because the workflow would have to modify its own secrets.

**Configuration**:
- `secrets.COMPLYTIME_BOT_CLIENT_ID` and `secrets.COMPLYTIME_BOT_PRIVATE_KEY` (already created)
- Token scoped to `owner: complytime` for org-wide access
- `skip-token-revoke: false` (default, auto-revoke after job)
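
The workflow steps in this change pass the generated token to Peribolos as `--github-token-path <(printf '%s' "$APP_TOKEN")` instead of writing an `auth.txt` file. The following is a minimal bash sketch of that pattern. `read_token_from_path` is a hypothetical stand-in for any CLI that reads a token from a file path, and the token value is a placeholder.

```shell
# Hypothetical stand-in for a CLI that reads a token from a file path,
# as peribolos does with --github-token-path.
read_token_from_path() {
  cat "$1"
}

# Placeholder value for illustration only, not a real credential.
APP_TOKEN="example-token"

# Process substitution (a bash feature) exposes the command's output as a
# readable path, so no token file is created in the workspace.
read_token_from_path <(printf '%s' "$APP_TOKEN")
```

Because the path is backed by a pipe, it can typically be read only once, which is worth keeping in mind for tools that reopen the token file.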

**Migration plan**:
1. Deploy the updated workflow while `APP_ACCESS_TOKEN` still exists as a fallback reference
2. Run a manual `workflow_dispatch` dry-run to validate token generation works
3. After one successful push-triggered run on `main`, mark `APP_ACCESS_TOKEN` as deprecated
4. Remove `APP_ACCESS_TOKEN` after 3 successful runs

### D2: Add `--require-self=false` to Peribolos

**Decision**: Disable the `--require-self` check (which defaults to `true`).

**Rationale**: With `--require-self` enabled, Peribolos calls `GET /user` to verify that the authenticated user is an org admin. Installation tokens cannot call `GET /user`, which is a user-only endpoint, and this is the only call Peribolos makes that is incompatible with installation tokens. The safety check is replaced by `--min-admins 2` and by the App's own permission constraints: the App can act only within its granted permission scopes.

### D3: Fix pipeline exit code propagation

**Decision**: Add `set -o pipefail` to the shell step that runs Peribolos.

**Rationale**: Without `pipefail`, the exit status of `peribolos ... | jq ...` is that of `jq`, the last command in the pipeline, which is 0 whenever `jq` parses its input successfully. Peribolos' own exit code is discarded. This is why the failures since April 16 went unnoticed.
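
The effect can be reproduced in a few lines of bash. This is a minimal sketch, not the workflow itself: `fake_peribolos` is a hypothetical stand-in that prints one log line and exits 1, and `cat` stands in for `jq` so the example has no external dependencies.

```shell
# Hypothetical stand-in for a failing Peribolos run:
# prints one JSON log line, then exits with status 1.
fake_peribolos() {
  printf '%s\n' '{"severity":"fatal","msg":"Configuration failed"}'
  return 1
}

# Without pipefail, the pipeline status is that of the last command (cat).
pipeline_status_default() (
  set +o pipefail
  if fake_peribolos | cat >/dev/null; then echo 0; else echo "$?"; fi
)

# With pipefail, a failing command earlier in the pipeline decides the status.
pipeline_status_pipefail() (
  set -o pipefail
  if fake_peribolos | cat >/dev/null; then echo 0; else echo "$?"; fi
)

echo "default:  $(pipeline_status_default)"
echo "pipefail: $(pipeline_status_pipefail)"
```

The first line reports status 0 even though the stand-in failed, and the second reports 1, which is the behavior the workflow needs in order to fail the job.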

### D4: Remove ghproxy from the apply workflow

**Decision**: Remove the ghproxy sidecar process from the workflow.

**Rationale**: Peribolos is not configured to route through ghproxy (no `--github-endpoint=http://localhost:8888` flag), so ghproxy runs but is never used. The warning `"It doesn't look like you are using ghproxy"` confirms this. For a small org (~12 repos, ~20 members, 5 teams), API rate limiting is not a concern. Removing it simplifies the workflow.

### D5: Drift detection via `peribolos --dump`

**Decision**: Create a separate weekly scheduled workflow that runs `peribolos --dump complytime` to capture actual org state, then diffs against `peribolos.yaml`. Opens a GitHub issue when drift is detected.

**Rationale**: Even with daily reconciliation, there is value in explicitly detecting drift. Weekly frequency is chosen because daily reconciliation handles remediation; drift detection catches persistent or reconciliation-resistant drift. A separate workflow keeps concerns isolated from the apply workflow.
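
The comparison step can be sketched as a small shell function. Note one deliberate difference: the Compare step in this change treats any non-zero exit from `diff` as drift, whereas this sketch also separates exit status 1 (files differ) from higher values (for example, a missing input file). `check_drift` and its `drift=error` value are hypothetical names introduced for illustration.

```shell
# check_drift EXPECTED ACTUAL
# Prints drift=false when the files match, drift=true when they differ,
# and drift=error when diff itself fails (exit status greater than 1).
check_drift() {
  if diff -u "$1" "$2" >/dev/null 2>&1; then
    echo "drift=false"
  else
    drift_status=$?
    if [ "$drift_status" -eq 1 ]; then
      echo "drift=true"
    else
      echo "drift=error"
    fi
  fi
}

# Usage with throwaway files standing in for the expected and actual org state.
workdir=$(mktemp -d)
printf 'members:\n- alice\n' > "$workdir/expected.yaml"
printf 'members:\n- alice\n- bob\n' > "$workdir/actual.yaml"
check_drift "$workdir/expected.yaml" "$workdir/actual.yaml"
```

Separating the error case would avoid opening a drift issue when the dump step produced no usable output, at the cost of one more branch to handle.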

### D6: Trigger behavior matrix

The apply workflow supports multiple trigger types with different behaviors:

| Trigger | Token Gen | Peribolos Build | Apply (`--confirm`) | Dry-run only |
|---|---|---|---|---|
| `pull_request` | No | No | No | No (skip entirely) |
| `push` to `main` | Yes | Yes | Yes | No |
| `workflow_dispatch` (dry-run=false) | Yes | Yes | Yes | No |
| `workflow_dispatch` (dry-run=true) | Yes | Yes | No | Yes |
| `schedule` (daily cron) | Yes | Yes | Yes | No |
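
The matrix above reduces to a small decision function. The sketch below mirrors the workflow's conditions (`github.event_name != 'pull_request'` for running at all, and `workflow_dispatch` with `dry-run` set for omitting `--confirm`). `resolve_mode` is a hypothetical helper name and is not part of the workflow.

```shell
# resolve_mode EVENT_NAME [DRY_RUN]
# Prints one of: skip, dry-run, apply.
resolve_mode() {
  mode_event="$1"
  mode_dry_run="${2:-false}"
  case "$mode_event" in
    pull_request)
      # Pull requests never generate a token, build, or run Peribolos.
      echo "skip"
      ;;
    workflow_dispatch)
      # Only a manual dispatch can request a dry run.
      if [ "$mode_dry_run" = "true" ]; then echo "dry-run"; else echo "apply"; fi
      ;;
    *)
      # push to main and the daily schedule always apply with --confirm.
      echo "apply"
      ;;
  esac
}

for row in "pull_request false" "push false" "workflow_dispatch false" \
           "workflow_dispatch true" "schedule false"; do
  set -- $row
  printf '%-18s dry-run=%-5s -> %s\n' "$1" "$2" "$(resolve_mode "$1" "$2")"
done
```

In the workflow itself the same decision is split between the step-level `if:` conditions and the `DRY_RUN` check that appends `--confirm` to the argument array.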

## Risks / Trade-offs

- **[`--require-self=false` removes a safety check]** → Mitigated by `--min-admins 2` flag, which prevents accidental admin removal. The app's installation permissions constrain what can be changed.
- **[Installation token 1-hour TTL]** → Peribolos runs complete in ~2 minutes for this org size. No risk of timeout.
- **[Drift detection may be noisy]** → The detection workflow only opens an issue, it does not auto-remediate. Org admins can triage and decide whether to reapply or update config.
- **[Removing ghproxy]** → If the org grows significantly, rate limiting could become relevant. ghproxy can be re-added with proper `--github-endpoint` configuration if needed.
- **[Upstream Prow dependency]** → The workflow builds Peribolos from source via `kubernetes-sigs/prow`. If the repo is unavailable or the build breaks, all runs fail. Go module caching mitigates transient outages.