Automated gh-pages History Cleanup - Rolling 1-Week Window (or 1 Backup) #658

@mmcky

📋 Overview

This issue documents how to implement an aggressive rolling 1-week cleanup of the gh-pages branch to minimize repository size while maintaining recent deployment history.

Current State (as of October 27, 2025)

  • Total commits in gh-pages: 194
  • Repository size: ~4.2 GB
  • Date range: December 8, 2020 → October 22, 2025

Proposed State

  • Keep: Last 7 days of deployments (~1-5 commits typically)
  • OR: One backup commit (if no deployments in the last 7 days)
  • Total to keep: Usually 1-5 commits
  • Remove: All other commits (~189-193 commits)
  • Expected size: ~200-300 MB (93-95% reduction)
  • Automation: Weekly GitHub Action to maintain rolling window
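
The quoted reduction can be checked with a little integer arithmetic. The following is a minimal sketch; the helper name `percent_reduction` is hypothetical, and 4.2 GB is approximated as 4200 MB for illustration.

```shell
#!/usr/bin/env bash
# percent_reduction BEFORE_MB AFTER_MB
# Prints the size reduction as a whole percentage, rounded to nearest.
percent_reduction() {
  local before_mb=$1 after_mb=$2
  echo $(( ((before_mb - after_mb) * 100 + before_mb / 2) / before_mb ))
}

# 4.2 GB is taken as 4200 MB here (an approximation).
echo "Shrinking to 200 MB: $(percent_reduction 4200 200)%"
echo "Shrinking to 300 MB: $(percent_reduction 4200 300)%"
```

The two results bracket the 93-95% range stated above.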

🎯 Benefits

  1. Minimal repository size: 93-95% reduction (~3.9 GB savings)
  2. Ultra-fast clones: ~30 seconds instead of 5-10 minutes
  3. Essential history only: Last week's commits OR 1 backup (never both)
  4. Automated maintenance: Set-and-forget weekly cleanup
  5. No manual intervention: GitHub Actions handles everything
  6. Sustainable: Repository stays small indefinitely

🛠️ Part 1: Initial Manual Cleanup

Current Analysis (1-week window)

Last 7 days of deployments (as of Oct 27, 2025):
- Commits in last week: 1
  - d86db35bb (2025-10-22 05:01:54)

Result: Keep 1 commit from last week
No backup commit needed (we have recent activity)

Total commits to keep: 1
Total commits to remove: 193 (99.5%)

Strategy:

  • IF commits exist in the last 7 days → Keep ALL those commits (no backup needed)
  • IF NO commits in the last 7 days → Keep ONE backup commit (the most recent one)

This ensures you always have recent deployments when there's activity, but maintain at least one commit for rollback when there's no recent activity.
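
The decision rule can be sketched independently of git. This is an illustrative helper only: the function name `select_keep_count` and the epoch-second inputs are assumptions, not part of the proposed scripts.

```shell
#!/usr/bin/env bash
# select_keep_count NOW WINDOW_DAYS TIMESTAMP...
# Given commit timestamps in epoch seconds, prints how many commits
# the strategy keeps: every commit inside the window, or exactly one
# backup commit when the window is empty.
select_keep_count() {
  local now=$1 window_days=$2
  shift 2
  local cutoff=$(( now - window_days * 86400 ))
  local recent=0 ts
  for ts in "$@"; do
    if [ "$ts" -ge "$cutoff" ]; then
      recent=$(( recent + 1 ))
    fi
  done
  if [ "$recent" -gt 0 ]; then
    echo "$recent"   # recent activity: keep all commits in the window
  else
    echo 1           # no recent activity: keep one backup commit
  fi
}
```

For example, `select_keep_count "$(date +%s)" 7 ...` with the commit times of the branch would report the number of commits the cleanup leaves behind.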

Step-by-Step Manual Cleanup

# ============================================
# STEP 1: BACKUP (CRITICAL!)
# ============================================

cd /path/to/lecture-python.myst

# Fetch first so origin/gh-pages reflects the current remote state
git fetch origin gh-pages

# Create local backup branch
git branch gh-pages-backup origin/gh-pages

# Optional: Full repository backup
cd ..
cp -r lecture-python.myst lecture-python.myst-backup-$(date +%Y%m%d)
cd lecture-python.myst

# ============================================
# STEP 2: Identify commits to keep
# ============================================

# Find commits from the last 7 days
# (GNU date syntax; on macOS/BSD use: date -v-7d +%Y-%m-%d)
WEEK_AGO=$(date -d "7 days ago" +%Y-%m-%d)
echo "Keeping commits since: ${WEEK_AGO}"

# Get the first commit in the last week
FIRST_RECENT=$(git log origin/gh-pages --format="%H" --since="7 days ago" --reverse | head -1)

if [ -z "${FIRST_RECENT}" ]; then
  echo "No commits in the last 7 days! Keeping one backup commit."
  BACKUP_COMMIT=$(git log origin/gh-pages --format="%H" -1)
  TOTAL_KEEP=1
  RECENT_COUNT=0
else
  # We have commits in the past week, no backup needed
  BACKUP_COMMIT=""
  RECENT_COUNT=$(git rev-list --count ${FIRST_RECENT}..origin/gh-pages)
  TOTAL_KEEP=$((RECENT_COUNT + 1))  # +1 for first_recent itself
  
  echo "Commits from last week: ${TOTAL_KEEP}"
  echo "First commit in range: ${FIRST_RECENT}"
  git log --oneline -1 ${FIRST_RECENT}
fi

if [ -n "${BACKUP_COMMIT}" ]; then
  echo "Backup commit (no recent activity):"
  git log --oneline -1 ${BACKUP_COMMIT}
fi

echo "Total commits to keep: ${TOTAL_KEEP}"

# ============================================
# STEP 3: Create new gh-pages with recent history
# ============================================

# Fetch latest
git fetch origin gh-pages

# Check out gh-pages at the fetched remote state
git checkout -B gh-pages origin/gh-pages

# Oldest commit to keep: first recent commit, else the backup commit
if [ -z "${FIRST_RECENT}" ]; then
  # No commits in last week, keep one backup
  KEEP_FROM=${BACKUP_COMMIT}
else
  KEEP_FROM=${FIRST_RECENT}
fi

# Create a parentless commit holding the full snapshot of KEEP_FROM.
# (Cherry-picking onto an empty orphan branch would replay only that
# commit's diff, not its complete tree.)
NEW_ROOT=$(git log -1 --format=%B ${KEEP_FROM} | git commit-tree "${KEEP_FROM}^{tree}")
git checkout -b gh-pages-temp ${NEW_ROOT}

# Replay the commits that follow FIRST_RECENT, if any
# (assumes gh-pages history is linear, with no merge commits)
if [ -n "${FIRST_RECENT}" ] && [ "${RECENT_COUNT}" -gt 0 ]; then
  git cherry-pick --allow-empty ${FIRST_RECENT}..origin/gh-pages
fi

# ============================================
# STEP 4: Verify the new branch
# ============================================

# Check commit count
NEW_COUNT=$(git rev-list --count gh-pages-temp)
echo "New commit count: ${NEW_COUNT}"
echo "Expected: ${TOTAL_KEEP}"

# Check date range
echo "Commits kept:"
git log gh-pages-temp --format="%h %ai %s"

# Verify file contents match current gh-pages
git diff gh-pages gh-pages-temp
# Should show no differences in files (only history changes)

# ============================================
# STEP 5: Replace old gh-pages
# ============================================

# Delete old gh-pages
git branch -D gh-pages

# Rename new branch
git branch -m gh-pages-temp gh-pages

# ============================================
# STEP 6: Force push (REWRITES HISTORY!)
# ============================================

# IMPORTANT: This requires coordination with team
# Push to GitHub
git push origin gh-pages --force

# ============================================
# STEP 7: Return to main and cleanup
# ============================================

git checkout main

# Aggressive garbage collection
git reflog expire --expire=now --all
git gc --aggressive --prune=now

# ============================================
# STEP 8: Verify results
# ============================================

# Check repository size
du -sh .git
echo "Expected: ~200-300 MB (after removing backup branch)"

# Check gh-pages commit count
git rev-list --count origin/gh-pages
echo "Expected: ${TOTAL_KEEP} commits"

# Verify website still works
echo "Visit: https://python.quantecon.org"

# ============================================
# STEP 9: Delete backup after verification (1-2 weeks)
# ============================================

# After confirming everything works:
git branch -D gh-pages-backup
git reflog expire --expire=now --all
git gc --aggressive --prune=now

# Final size check
du -sh .git
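
Before deleting the backup, it can be reassuring to confirm that the rewritten branch points at exactly the same file snapshot as the old one. The following is a small sketch, assuming the branch names used above; the helper name `same_tree` is hypothetical.

```shell
#!/usr/bin/env bash
# same_tree REV_A REV_B
# Succeeds when both revisions reference an identical tree (file
# snapshot), regardless of how much history sits behind each one.
same_tree() {
  [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}

# Example usage inside the repository (run before deleting the backup):
#   same_tree gh-pages-backup gh-pages && echo "Snapshots match"
```

Comparing tree IDs is cheaper than a full `git diff` and gives a yes/no answer that is easy to script.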

🤖 Part 2: Automated Weekly Cleanup

Create GitHub Actions Workflow

Create .github/workflows/cleanup-gh-pages-history.yml:

name: Cleanup gh-pages History (Keep 1 Week or 1 Backup)

on:
  schedule:
    # Run weekly on Sunday at 00:00 UTC
    - cron: '0 0 * * 0'
  
  workflow_dispatch:
    # Allow manual triggering
    inputs:
      days_to_keep:
        description: 'Number of days of history to keep'
        required: false
        default: '7'

jobs:
  cleanup-gh-pages:
    runs-on: ubuntu-latest
    
    # GITHUB_TOKEN needs write access to push the rewritten branch
    permissions:
      contents: write
    
    # Only run on the main repository, not forks
    if: github.repository == 'QuantEcon/lecture-python.myst'
    
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history needed
          token: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Configure Git
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
      
      - name: Fetch gh-pages branch
        run: |
          git fetch origin gh-pages:gh-pages
      
      - name: Find commits to keep
        id: commits
        run: |
          git checkout gh-pages
          
          DAYS="${{ github.event.inputs.days_to_keep || '7' }}"
          echo "Retention period: ${DAYS} days"
          
          # Find first commit within retention period
          FIRST_RECENT=$(git log --format="%H" --since="${DAYS} days ago" --reverse | head -1)
          
          if [ -z "${FIRST_RECENT}" ]; then
            echo "No commits in last ${DAYS} days. Keeping one backup commit."
            BACKUP_COMMIT=$(git log --format="%H" -1)
            TOTAL_KEEP=1
            RECENT_COUNT=0
            
            echo "backup_commit=${BACKUP_COMMIT}" >> $GITHUB_OUTPUT
            echo "first_recent=" >> $GITHUB_OUTPUT
            echo "recent_count=0" >> $GITHUB_OUTPUT
          else
            # We have commits in the past week, no backup needed
            RECENT_COUNT=$(git rev-list --count ${FIRST_RECENT}..gh-pages)
            TOTAL_KEEP=$((RECENT_COUNT + 1))  # +1 for first_recent itself
            
            echo "Commits from last ${DAYS} days: ${TOTAL_KEEP}"
            echo "No backup commit needed (have recent activity)"
            
            echo "backup_commit=" >> $GITHUB_OUTPUT
            echo "first_recent=${FIRST_RECENT}" >> $GITHUB_OUTPUT
            echo "recent_count=${RECENT_COUNT}" >> $GITHUB_OUTPUT
          fi
          
          echo "total_keep=${TOTAL_KEEP}" >> $GITHUB_OUTPUT
          echo "Total commits to keep: ${TOTAL_KEEP}"
          
          # Show what we're keeping
          echo "=== Commits to keep ==="
          if [ -z "${FIRST_RECENT}" ]; then
            echo "Backup commit (no recent activity):"
            git log --oneline -1 ${BACKUP_COMMIT}
          else
            echo "Recent commits (last ${DAYS} days):"
            git log --oneline ${FIRST_RECENT}..gh-pages
            git log --oneline -1 ${FIRST_RECENT}
          fi
          
          # Only proceed if we're removing at least 5 commits
          CURRENT_COUNT=$(git rev-list --count gh-pages)
          REMOVE_COUNT=$((CURRENT_COUNT - TOTAL_KEEP))
          
          echo "Total commits: ${CURRENT_COUNT}"
          echo "Removing: ${REMOVE_COUNT}"
          
          if [ ${REMOVE_COUNT} -lt 5 ]; then
            echo "Not enough commits to warrant cleanup (would only remove ${REMOVE_COUNT})"
            echo "skip=true" >> $GITHUB_OUTPUT
          else
            echo "skip=false" >> $GITHUB_OUTPUT
          fi
      
      - name: Create cleaned gh-pages branch
        if: steps.commits.outputs.skip != 'true'
        run: |
          FIRST_RECENT="${{ steps.commits.outputs.first_recent }}"
          BACKUP_COMMIT="${{ steps.commits.outputs.backup_commit }}"
          RECENT_COUNT="${{ steps.commits.outputs.recent_count }}"
          
          # Oldest commit to keep: first recent commit, else the backup
          KEEP_FROM="${FIRST_RECENT:-${BACKUP_COMMIT}}"
          echo "Oldest commit kept: ${KEEP_FROM}"
          
          # Create a parentless commit holding the full snapshot of KEEP_FROM.
          # (Cherry-picking onto an empty orphan branch would replay only
          # that commit's diff, not its complete tree.)
          NEW_ROOT=$(git log -1 --format=%B "${KEEP_FROM}" | git commit-tree "${KEEP_FROM}^{tree}")
          git checkout -b gh-pages-clean "${NEW_ROOT}"
          
          # Replay the later commits from the retention window, if any
          if [ -n "${FIRST_RECENT}" ] && [ "${RECENT_COUNT}" -gt 0 ]; then
            git cherry-pick --allow-empty "${FIRST_RECENT}..gh-pages" || {
              echo "Cherry-pick failed, falling back to a single snapshot commit"
              git cherry-pick --abort
              SNAPSHOT=$(git commit-tree "gh-pages^{tree}" -m "Recent deployments as of $(date +%Y-%m-%d)")
              git reset --hard "${SNAPSHOT}"
            }
          fi
      
      - name: Verify and replace gh-pages
        if: steps.commits.outputs.skip != 'true'
        run: |
          # Verify file contents match; stop before anything is pushed if not
          if ! git diff --quiet gh-pages gh-pages-clean; then
            echo "Error: file differences detected!"
            git diff --stat gh-pages gh-pages-clean
            exit 1
          fi
          
          # Check commit count
          NEW_COUNT=$(git rev-list --count gh-pages-clean)
          echo "New commit count: ${NEW_COUNT}"
          echo "Expected: ${{ steps.commits.outputs.total_keep }}"
          
          # Replace old branch
          git branch -D gh-pages
          git branch -m gh-pages-clean gh-pages
      
      - name: Force push to GitHub
        if: steps.commits.outputs.skip != 'true'
        run: |
          # Force push the cleaned branch
          git push origin gh-pages --force
          
          echo "✅ gh-pages history cleanup complete"
          echo "Kept commits: ${{ steps.commits.outputs.total_keep }}"
      
      - name: Create summary
        if: steps.commits.outputs.skip != 'true'
        run: |
          BACKUP="${{ steps.commits.outputs.backup_commit }}"
          
          cat >> $GITHUB_STEP_SUMMARY << EOF
          ## gh-pages History Cleanup Complete
          
          **Retention Period**: Last 7 days OR 1 backup commit (if no recent activity)
          **Commits Kept**: ${{ steps.commits.outputs.total_keep }}
          
          ### Retained Commits
          EOF
          
          if [ -n "${BACKUP}" ]; then
            echo "**Backup commit** (no recent activity):" >> $GITHUB_STEP_SUMMARY
            echo "\`${BACKUP:0:9}\`" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY
          else
            echo "**Recent commits** (last 7 days):" >> $GITHUB_STEP_SUMMARY
            git log --oneline gh-pages | sed 's/^/- /' >> $GITHUB_STEP_SUMMARY
          fi
          
          cat >> $GITHUB_STEP_SUMMARY << EOF
          
          ### Next Steps
          
          Contributors should update their local repositories:
          
          \`\`\`bash
          git fetch origin
          git checkout gh-pages
          git reset --hard origin/gh-pages
          git gc --aggressive --prune=now
          \`\`\`
          
          Or simply re-clone the repository for a fresh start.
          EOF
      
      - name: Skip notification
        if: steps.commits.outputs.skip == 'true'
        run: |
          echo "Cleanup skipped - not enough history to remove"
          cat >> $GITHUB_STEP_SUMMARY << EOF
          ## gh-pages History Cleanup Skipped
          
          Not enough commit history to warrant cleanup.
          Current commits: $(git rev-list --count gh-pages)
          EOF
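
The skip rule in the "Find commits to keep" step (only rewrite history when at least 5 commits would be removed) can be isolated as a small function. This is a sketch for local experimentation; the function name `should_skip_cleanup` is hypothetical, and the default threshold of 5 mirrors the workflow above.

```shell
#!/usr/bin/env bash
# should_skip_cleanup TOTAL KEEP [MIN_REMOVE]
# Succeeds (exit status 0) when fewer than MIN_REMOVE commits would be
# removed, meaning the cleanup is not worth a force push.
should_skip_cleanup() {
  local total=$1 keep=$2 min_remove=${3:-5}
  local remove=$(( total - keep ))
  [ "$remove" -lt "$min_remove" ]
}

# With the numbers from this issue: 194 commits in total, 1 kept.
if should_skip_cleanup 194 1; then
  echo "skip=true"
else
  echo "skip=false"
fi
```

Keeping the threshold as a parameter makes it easy to try stricter or looser values before changing the workflow.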

Alternative: Simpler Version (Single Commit Fallback)

If the cherry-pick approach is too complex, use this simpler version. When there has been activity in the last 7 days, it collapses the branch to a single commit holding the latest state; otherwise it leaves the branch untouched:

name: Cleanup gh-pages History (Simple)

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly on Sunday
  workflow_dispatch:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    
    # GITHUB_TOKEN needs write access to push the rewritten branch
    permissions:
      contents: write
    if: github.repository == 'QuantEcon/lecture-python.myst'
    
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: gh-pages
      
      - name: Configure Git
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
      
      - name: Check for recent activity
        id: check
        run: |
          # Find commits from last 7 days
          RECENT=$(git log --format="%H" --since="7 days ago" | wc -l)
          
          if [ ${RECENT} -eq 0 ]; then
            # No recent activity, keep current state as backup
            echo "No recent activity - keeping current state as backup"
            echo "skip=true" >> $GITHUB_OUTPUT
          else
            # Has recent activity, proceed with cleanup
            echo "Recent activity found: ${RECENT} commits"
            echo "skip=false" >> $GITHUB_OUTPUT
          fi
      
      - name: Simplify to single commit
        if: steps.check.outputs.skip != 'true'
        run: |
          # Create orphan branch with current state
          git checkout --orphan temp
          git add -A
          git commit -m "Weekly deployment snapshot: $(date +%Y-%m-%d)"
          
          # Replace gh-pages
          git branch -D gh-pages
          git branch -m gh-pages
          
          # Force push
          git push origin gh-pages --force
          
          echo "✅ gh-pages collapsed to a single snapshot commit"

📅 Workflow Configuration Options

Schedule Options

# Weekly on Sunday at midnight UTC (RECOMMENDED)
- cron: '0 0 * * 0'

# Daily at midnight UTC
- cron: '0 0 * * *'

# Twice weekly (Sunday and Wednesday)
- cron: '0 0 * * 0,3'

# Monthly on the 1st at midnight UTC
- cron: '0 0 1 * *'

Retention Period Options

Modify the days calculation:

# 3 days
FIRST_RECENT=$(git log --format="%H" --since="3 days ago" --reverse | head -1)

# 7 days (default - 1 week)
FIRST_RECENT=$(git log --format="%H" --since="7 days ago" --reverse | head -1)

# 14 days (2 weeks)
FIRST_RECENT=$(git log --format="%H" --since="14 days ago" --reverse | head -1)

# 30 days (1 month)
FIRST_RECENT=$(git log --format="%H" --since="30 days ago" --reverse | head -1)
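
Rather than editing the literal in several places, the retention period could be passed through one validated helper. This is a minimal sketch; the function name `since_expr` is an assumption, not part of the workflow above.

```shell
#!/usr/bin/env bash
# since_expr DAYS
# Prints a value suitable for `git log --since=...`, rejecting anything
# that is not a positive whole number of days.
since_expr() {
  local days=$1
  case "$days" in
    ''|*[!0-9]*)
      echo "invalid retention period: '$days'" >&2
      return 1
      ;;
  esac
  if [ "$days" -eq 0 ]; then
    echo "retention period must be at least 1 day" >&2
    return 1
  fi
  echo "${days} days ago"
}

# Example usage:
#   FIRST_RECENT=$(git log --format="%H" --since="$(since_expr 14)" --reverse | head -1)
```

Validating the value matters most for the `workflow_dispatch` input, where a typo would otherwise be passed straight to `git log`.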

Backup Commit Options

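You can modify how many backup commits to keep. Note that these variants are defined relative to FIRST_RECENT, so they apply only when recent commits exist, and they keep commits older than the retention window in addition to the recent ones (a departure from the default "last 7 days OR 1 backup" strategy):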

# Keep 1 backup commit (default)
BACKUP=$(git log --format="%H" ${FIRST_RECENT}^ -1)

# Keep 2 backup commits
BACKUP=$(git log --format="%H" ${FIRST_RECENT}~2 -1)

# Keep 3 backup commits
BACKUP=$(git log --format="%H" ${FIRST_RECENT}~3 -1)

# No backup (keep only recent commits)
BACKUP=""

🔔 Notification Setup (Optional)

Add Slack/Discord Notification

Add this step after the force push:

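- name: Notify team
  if: steps.commits.outputs.skip != 'true'
  uses: slackapi/slack-github-action@v1
  env:
    # v1 of the action reads the incoming webhook from the environment
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
    SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
  with:
    payload: |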
      {
        "text": "gh-pages history cleanup completed",
        "blocks": [
          {
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*gh-pages History Cleanup*\n\nKept commits from the last 7 days (or 1 backup if no recent activity).\n\nContributors: Update your local repos with `git reset --hard origin/gh-pages`"
            }
          }
        ]
      }

Create GitHub Issue After Cleanup

# Requires `issues: write` in the job's permissions
- name: Create notification issue
  if: steps.commits.outputs.skip != 'true'
  uses: actions/github-script@v7
  with:
    script: |
      // Workflow expressions are substituted into the script before it runs
      const kept = '${{ steps.commits.outputs.total_keep }}';
      const backup = '${{ steps.commits.outputs.backup_commit }}';
      const retained = backup
        ? '- Backup commit: `' + backup + '` (no recent activity)'
        : '- Recent commits from last 7 days';

      await github.rest.issues.create({
        owner: context.repo.owner,
        repo: context.repo.repo,
        title: `gh-pages history cleanup completed - ${new Date().toISOString().split('T')[0]}`,
        body: [
          '## Automated gh-pages History Cleanup',
          '',
          'The weekly gh-pages history cleanup has completed successfully.',
          '',
          '**Details:**',
          '- Retention: Last 7 days OR 1 backup commit (if no recent activity)',
          `- Commits kept: ${kept}`,
          retained,
          '',
          '**For contributors:**',
          'Update your local repository:',
          '```bash',
          'git fetch origin',
          'git checkout gh-pages',
          'git reset --hard origin/gh-pages',
          'git gc --aggressive --prune=now',
          '```',
          '',
          'Close this issue once contributors have updated their local repositories.'
        ].join('\n'),
        labels: ['maintenance', 'automated']
      });

🧪 Testing the Automation

Test on a Fork First

  1. Fork the repository
  2. Add the workflow file
  3. Trigger manually via workflow_dispatch
  4. Verify the cleanup works as expected
  5. Check repository size reduction

Manual Trigger

# Using GitHub CLI
gh workflow run cleanup-gh-pages-history.yml

# Or via GitHub UI
# Go to Actions → Cleanup gh-pages History → Run workflow

Dry Run Script

Test locally before automation:

#!/bin/bash
# dry-run-cleanup.sh

set -e

DAYS=7

echo "=== DRY RUN: gh-pages cleanup ==="
echo "Retention: last ${DAYS} days, or 1 backup commit"

git fetch origin gh-pages

# Oldest commit inside the retention window
CUTOFF=$(git log origin/gh-pages --format="%H" --since="${DAYS} days ago" --reverse | head -1)

if [ -z "${CUTOFF}" ]; then
  echo "No commits in the last ${DAYS} days; only the latest commit would be kept"
  CUTOFF=$(git rev-parse origin/gh-pages)
fi

echo "Oldest commit to keep:"
git log --oneline -1 ${CUTOFF}

# +1 because the CUTOFF..tip range excludes CUTOFF itself
KEEP_COUNT=$(( $(git rev-list --count ${CUTOFF}..origin/gh-pages) + 1 ))
TOTAL_COUNT=$(git rev-list --count origin/gh-pages)
REMOVE_COUNT=$((TOTAL_COUNT - KEEP_COUNT))

echo ""
echo "Total commits: ${TOTAL_COUNT}"
echo "Commits to keep: ${KEEP_COUNT}"
echo "Commits to remove: ${REMOVE_COUNT}"
echo "Percentage removed: $((REMOVE_COUNT * 100 / TOTAL_COUNT))%"

echo ""
echo "Commits to be removed:"
git log origin/gh-pages --oneline --reverse | head -n "${REMOVE_COUNT}"

📊 Monitoring and Metrics

Track Repository Size Over Time

Add to workflow:

- name: Record repository size
  run: |
    SIZE=$(git count-objects -v | grep size-pack | awk '{print $2}')
    echo "Repository size (KB): ${SIZE}"
    
    # Optional: Send to monitoring service
    curl -X POST https://your-monitoring-service.com/metrics \
      -d "repo_size=${SIZE}&timestamp=$(date +%s)"
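
`git count-objects -v` reports `size-pack` in KiB, which is awkward to compare with the MB figures used elsewhere in this issue. A small conversion helper might look like the sketch below; the function name `pack_size_mib` is hypothetical, and the sample input is made up for illustration.

```shell
#!/usr/bin/env bash
# pack_size_mib
# Reads `git count-objects -v` output on stdin and prints the packed
# size in whole MiB (the size-pack field is reported in KiB).
pack_size_mib() {
  awk '$1 == "size-pack:" { print int($2 / 1024) }'
}

# Sample input for illustration; in the workflow, pipe in the real output:
#   git count-objects -v | pack_size_mib
printf 'count: 0\nsize: 0\nin-pack: 1200\nsize-pack: 262144\n' | pack_size_mib
```

Matching on the exact field name avoids picking up other lines that merely contain the text "size".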

Create Weekly Cleanup Report

- name: Generate cleanup report
  run: |
    cat > report.md << EOF
    # gh-pages Cleanup Report - $(date +%Y-%m-%d)
    
    ## Statistics
    - Date: $(date)
    - Commits kept: ${{ steps.commits.outputs.total_keep }}
    - Retention: Last 7 days OR 1 backup
    - Oldest commit: $(git log origin/gh-pages --format="%ai" --reverse | head -1)
    - Repository size: $(du -sh .git | cut -f1)
    
    ## Commit History
    $(git log origin/gh-pages --oneline)
    EOF
    
- name: Upload cleanup report
  uses: actions/upload-artifact@v4
  with:
    name: gh-pages-cleanup-report
    path: report.md

⚠️ Important Considerations

Before Enabling Automation

  • Test the workflow on a fork first
  • Verify the manual cleanup process works
  • Set up notifications for the team
  • Document the process in CONTRIBUTING.md
  • Ensure CI/CD won't break with force pushes
  • Consider impact on pull request previews

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Accidental data loss | Backup branch created before cleanup |
| Failed deployments | Test deployment after cleanup |
| Broken CI | Use fetch-depth: 0 in deployment workflows |
| Team confusion | Automated notification when cleanup runs |
| Force push conflicts | Only automation touches gh-pages |

Repository Protection Rules

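Consider these settings for gh-pages. The YAML below is illustrative only: GitHub does not read a branch-protection.yml file natively, so apply the equivalent settings through the repository settings UI, the REST API, or a settings-sync tool: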

# .github/branch-protection.yml (if using branch protection)
gh-pages:
  # Allow force pushes only from Actions
  enforce_admins: false
  required_status_checks: null
  restrictions:
    users: []
    teams: []
  # Allow Actions bot to force push
  allow_force_pushes: true

📝 Documentation Updates Needed

Update CONTRIBUTING.md

Add section:

## gh-pages Branch

The `gh-pages` branch is automatically managed:

- **Automated cleanup**: Runs weekly (every Sunday)
- **Retention**: Last 7 days of deployments OR 1 backup commit (if no recent activity)
- **History**: Older deployments are automatically removed

### If You Work with gh-pages

After weekly cleanup, update your local branch:

```bash
git fetch origin
git checkout gh-pages
git reset --hard origin/gh-pages
git gc --aggressive --prune=now
```

**Note**: Never manually commit to `gh-pages` - it's auto-generated by CI/CD.

Update README.md

Add a badge showing the status of the most recent cleanup run:

![gh-pages cleanup](https://github.com/QuantEcon/lecture-python.myst/actions/workflows/cleanup-gh-pages-history.yml/badge.svg)

🎯 Expected Outcomes

Immediate (After First Cleanup)

  • Repository size: 4.2 GB → ~200-300 MB (93-95% reduction)
  • Clone time: 5-10 min → ~30 sec
  • gh-pages commits: 194 → 1 (currently)

Long-term (With Weekly Automation)

  • Minimal repository size: ~200-300 MB
  • Predictable history: Last 7 days OR 1 backup
  • Zero maintenance: Fully automated
  • Better performance: Ultra-fast clones for new contributors

🔗 Related Issues


✅ Implementation Checklist

Phase 1: Initial Cleanup (Manual)

  • Create backup: git branch gh-pages-backup origin/gh-pages
  • Run manual cleanup script (see Part 1)
  • Force push cleaned gh-pages
  • Verify website works
  • Verify repository size reduction
  • Notify team to update local repos
  • Monitor for 1-2 weeks
  • Delete backup branch after verification

Phase 2: Automation Setup

  • Create workflow file .github/workflows/cleanup-gh-pages-history.yml
  • Test workflow on a fork
  • Create dry-run script for testing
  • Set up notifications (Slack/Discord/Issues)
  • Enable workflow on main repository
  • Test manual trigger
  • Wait for first scheduled run

Phase 3: Documentation

  • Update CONTRIBUTING.md
  • Update README.md
  • Add workflow badge
  • Document rollback procedure
  • Create runbook for troubleshooting

Phase 4: Monitoring

  • Track repository size monthly
  • Monitor workflow success rate
  • Review team feedback
  • Adjust retention period if needed

🆘 Troubleshooting

Workflow Fails During Cherry-Pick

Use the simpler single-commit approach:

git checkout gh-pages -- .
git add -A
git commit -m "gh-pages cleanup: keeping last week or 1 backup"

Repository Size Not Decreasing

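Local clones keep the old objects until they are garbage-collected. After resetting to the new gh-pages (and deleting any local backup branch), contributors need to expire reflogs and run garbage collection:

git reflog expire --expire=now --all
git gc --aggressive --prune=now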

Website Stops Working

Rollback to backup:

git push origin gh-pages-backup:gh-pages --force

Need to Recover Old Deployment

Check backup branch or GitHub releases/tags.


📞 Next Steps

  1. Review this proposal with the team
  2. Test manual cleanup on a fork or backup
  3. Run first manual cleanup to establish baseline
  4. Enable automation after successful manual cleanup
  5. Monitor and adjust retention period as needed

Estimated Effort:

  • Manual cleanup: 30 minutes
  • Automation setup: 1-2 hours
  • Testing and verification: 1-2 hours
  • Documentation: 30 minutes

Total: ~3-4 hours for complete implementation
