Blackout Windows

DarthNorse edited this page Oct 21, 2025 · 1 revision

Blackout Windows allow you to schedule maintenance periods during which alerts are suppressed, preventing notification storms during planned downtime.

Overview

Blackout Windows provide:

  • Scheduled maintenance periods - Define when alerts should be suppressed
  • Day-of-week selection - Set recurring windows for specific days
  • Overnight window support - Windows can span across midnight
  • Post-blackout checks - Automatic container health verification after windows end
  • Real-time status - Dashboard banner shows active blackout windows
  • Multiple windows - Create as many blackout windows as needed

How It Works

During Blackout Window

When a blackout window is active:

  1. Alert notifications are suppressed:

    • State change alerts (container stopped, died)
    • Metric alerts (CPU, memory thresholds)
    • Health check alerts
    • Update notifications
  2. Events continue logging:

    • All container events are recorded
    • Event Viewer shows all activity
    • Audit trail remains complete
  3. Auto-restart is deferred:

    • Containers that stop during blackout are not immediately restarted
    • Restart attempts are queued until the blackout ends
  4. Dashboard shows banner:

    • Yellow banner at top of page
    • "Blackout Window Active: [window name]"
    • Tooltip shows window details
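
The behavior above can be sketched as follows. This is a minimal illustration, not DockMon's actual code: the `EventRouter` class, its fields, and the event names `"stopped"` and `"died"` are hypothetical. Every event is logged, while notifications are dropped and restarts are queued during a blackout.

```python
from dataclasses import dataclass, field


@dataclass
class EventRouter:
    """Hypothetical router: always logs, notifies only outside a blackout."""
    event_log: list = field(default_factory=list)
    sent_notifications: list = field(default_factory=list)
    deferred_restarts: list = field(default_factory=list)

    def handle(self, container: str, event: str, blackout_active: bool) -> None:
        # Events are always recorded so the audit trail stays complete.
        self.event_log.append((container, event))

        if blackout_active:
            # Notification is suppressed; a stopped container is queued
            # for restart once the blackout ends (simplified assumption).
            if event in ("stopped", "died") and container not in self.deferred_restarts:
                self.deferred_restarts.append(container)
            return

        self.sent_notifications.append((container, event))


router = EventRouter()
router.handle("web", "died", blackout_active=True)   # logged, not notified
router.handle("db", "died", blackout_active=False)   # logged and notified
```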

After Blackout Window Ends

When a blackout window ends:

  1. Post-blackout health check runs:

    • All containers across all hosts are checked
    • Problematic states are identified (exited, dead, paused, removing)
  2. Deferred alerts are sent:

    • Alerts for containers found in failed state
    • Note added: "Container found in [state] after maintenance window ended"
    • Sent through configured alert channels
  3. Auto-restart resumes:

    • Containers with desired_state: should_run are restarted
    • Normal auto-restart logic applies
    • Retry counters reset
  4. Dashboard banner removed:

    • Yellow banner disappears
    • WebSocket broadcasts status change to all clients

Configuration

Creating a Blackout Window

Access: Settings → Alerts → Blackout Windows

Steps:

  1. Click "Add Blackout Window" button
  2. Configure window parameters:
    • Name: Descriptive name (e.g., "Weekend Maintenance")
    • Days: Select days of week (Monday=0, Sunday=6)
    • Start Time: When blackout begins (24-hour format)
    • End Time: When blackout ends (24-hour format)
    • Enabled: Toggle to activate/deactivate
  3. Click "Save" to create window

Window Parameters

| Parameter | Format | Example | Description |
|-----------|--------|---------|-------------|
| Name | Text (1-100 chars) | "Nightly Backups" | Human-readable identifier |
| Days | Array of integers 0-6 | [0,1,2,3,4] | Mon-Fri = 0-4, Sat=5, Sun=6 |
| Start Time | HH:MM (24-hour) | 22:00 | When blackout begins |
| End Time | HH:MM (24-hour) | 06:00 | When blackout ends |
| Enabled | Boolean | true | Whether window is active |
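
The constraints in this table can be expressed as a small validation sketch. The `BlackoutWindow` class and its field names are hypothetical and only mirror the documented formats; DockMon's own validation may differ.

```python
import re
from dataclasses import dataclass

# HH:MM in 24-hour format, 00:00 through 23:59.
_TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


@dataclass(frozen=True)
class BlackoutWindow:
    """Hypothetical model of the documented window parameters."""
    name: str
    days: list[int]
    start_time: str
    end_time: str
    enabled: bool = True

    def __post_init__(self) -> None:
        if not 1 <= len(self.name) <= 100:
            raise ValueError("name must be 1-100 characters")
        if any(d not in range(7) for d in self.days):
            raise ValueError("days must be integers 0 (Mon) to 6 (Sun)")
        for label, value in (("start_time", self.start_time),
                             ("end_time", self.end_time)):
            if not _TIME_RE.match(value):
                raise ValueError(f"{label} must be HH:MM in 24-hour format")


window = BlackoutWindow("Nightly Backups", [0, 1, 2, 3, 4], "22:00", "06:00")
```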

Editing a Blackout Window

  1. Navigate to Settings → Alerts → Blackout Windows
  2. Click "Edit" button next to window
  3. Modify parameters as needed
  4. Click "Save" to update

Note: Changes to blackout windows take effect immediately. If you edit a window that is currently active, the new schedule applies right away.

Deleting a Blackout Window

  1. Navigate to Settings → Alerts → Blackout Windows
  2. Click "Delete" button next to window
  3. Confirm deletion in modal dialog

Warning: Deletion is immediate and cannot be undone. If a window is currently active when deleted, alerts resume immediately.

Window Types

Recurring Windows

Windows that repeat on scheduled days each week.

Use cases:

  • Nightly backup windows (every night 02:00-04:00)
  • Weekend maintenance (Saturday-Sunday all day)
  • Weekly deployment windows (Thursday 20:00-23:00)

Example: Nightly database maintenance

Name: Nightly DB Maintenance
Days: [0,1,2,3,4,5,6]  (Every day)
Start Time: 02:00
End Time: 04:00
Enabled: true

Single-Day Windows

Windows that occur on one specific day of the week.

Use cases:

  • Monday morning deployments
  • Friday afternoon team updates
  • Sunday infrastructure upgrades

Example: Monday morning deployment window

Name: Monday Deployments
Days: [0]  (Monday only)
Start Time: 08:00
End Time: 10:00
Enabled: true

Overnight Windows

Windows that span midnight (start time > end time).

Use cases:

  • Late night maintenance (23:00 - 02:00)
  • Overnight backups (22:00 - 06:00)
  • Extended deployments (20:00 - 08:00)

Example: Overnight maintenance

Name: Overnight Maintenance
Days: [0,1,2,3,4]  (Mon-Fri)
Start Time: 23:00
End Time: 02:00
Enabled: true

How overnight windows work:

  • If current time >= start time: the window is active if today is in the window's days
  • If current time < end time: the window is active if yesterday is in the window's days
  • Otherwise the window is inactive, so the day-of-week transition at midnight is handled correctly

Example scenario:

  • Window: Monday 23:00 - 02:00
  • Current time: Tuesday 01:00
  • Result: Blackout is active (started Monday night, ends Tuesday morning)
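
The overnight rules above can be sketched as a single check. The function name `is_blackout_active` and the choice to treat the end time as exclusive are assumptions made for illustration, not DockMon's confirmed implementation. Python's `datetime.weekday()` uses the same Monday=0 numbering as the days array.

```python
from datetime import datetime, time


def is_blackout_active(days: list[int], start: time, end: time, now: datetime) -> bool:
    """Return True if `now` falls inside the window (Monday=0 ... Sunday=6)."""
    today = now.weekday()
    yesterday = (today - 1) % 7
    current = now.time()

    if start <= end:
        # Same-day window, e.g. 02:00-04:00 (end treated as exclusive here).
        return today in days and start <= current < end

    # Overnight window, e.g. 23:00-02:00.
    if current >= start:
        return today in days        # evening part belongs to today
    if current < end:
        return yesterday in days    # morning part belongs to yesterday's window
    return False


# The example scenario: window Monday 23:00-02:00, current time Tuesday 01:00.
tuesday_1am = datetime(2024, 1, 2, 1, 0)  # 2024-01-02 is a Tuesday
print(is_blackout_active([0], time(23, 0), time(2, 0), tuesday_1am))  # True
```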

Best Practices

When to Use Blackout Windows

DO use blackout windows for:

  • Scheduled infrastructure maintenance
  • Planned container updates/restarts
  • Database migration periods
  • Network maintenance windows
  • Testing disaster recovery procedures
  • Large-scale configuration changes

DON'T use blackout windows for:

  • Regular container restarts (use auto-restart instead)
  • Unplanned outages (alerts should fire)
  • Individual container maintenance (stop alerts per container instead)
  • Permanent alert suppression (fix the root cause or disable alert rule)

Window Duration

Short windows (1-2 hours):

  • Specific deployment tasks
  • Quick database migrations
  • Container image updates

Medium windows (2-4 hours):

  • Infrastructure upgrades
  • Multi-step deployments
  • Batch processing jobs

Long windows (4+ hours):

  • Overnight backups
  • Data warehouse updates
  • Extended maintenance periods

Avoid:

  • Windows longer than 8 hours (hard to justify maintenance that long)
  • Overlapping windows (simplify to single window if possible)
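
To check a planned window against the 8-hour guideline, its duration can be computed from the start and end times, including the overnight case. This is a small sketch; `window_minutes` and `exceeds_recommended` are hypothetical helper names.

```python
def window_minutes(start: str, end: str) -> int:
    """Length of a window in minutes; start > end means it spans midnight."""
    start_h, start_m = map(int, start.split(":"))
    end_h, end_m = map(int, end.split(":"))
    minutes = (end_h * 60 + end_m) - (start_h * 60 + start_m)
    if minutes < 0:
        minutes += 24 * 60  # overnight window wraps past midnight
    return minutes


def exceeds_recommended(start: str, end: str, limit_hours: int = 8) -> bool:
    """True if the window is longer than the recommended maximum."""
    return window_minutes(start, end) > limit_hours * 60


print(window_minutes("23:00", "02:00"))       # 180
print(exceeds_recommended("20:00", "08:00"))  # True (12 hours)
```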

Naming Conventions

Use descriptive, specific names:

Good names:

  • "Nightly Database Backups (02:00-04:00)"
  • "Weekend Infrastructure Upgrades"
  • "Thursday Production Deployments"

Poor names:

  • "Maintenance"
  • "Window 1"
  • "Test"

Benefits:

  • Clear purpose when reviewing active blackouts
  • Easier to communicate with team
  • Better audit trail in logs

Day Selection

Full week (all 7 days):

  • Daily backup windows
  • Continuous batch processing
  • Regular health check maintenance

Weekdays only (Mon-Fri):

  • Business hours deployments
  • Office-hours maintenance
  • Development/testing cycles

Weekends only (Sat-Sun):

  • Non-business-critical upgrades
  • Extended testing periods
  • Low-traffic maintenance

Specific day:

  • Weekly deployment day
  • Monthly patch Tuesday
  • Scheduled vendor maintenance

Monitoring and Status

Active Blackout Banner

When a blackout window is active:

  • Yellow banner appears at top of all pages
  • Text: "Blackout Window Active: [Window Name]"
  • Clock icon indicates scheduled maintenance
  • Tooltip on hover shows window details (start/end time, days)

WebSocket Updates

Blackout status changes broadcast in real-time:

  • All connected clients receive updates
  • Banner appears/disappears automatically
  • No page refresh required

Event Logging

Blackout window transitions are logged:

  • "Blackout window started: [name]"
  • "Blackout window ended: [name]"
  • "Post-blackout check: X containers checked, Y in failed state"

View logs: Event Viewer → Filter by event type "system"

Post-Blackout Behavior

Container Health Check

After each blackout window ends, DockMon:

  1. Scans all hosts for container states

  2. Identifies problematic containers:

    • State: exited
    • State: dead
    • State: paused
    • State: removing
  3. Records findings:

    • Container name and ID
    • Host name
    • State
    • Exit code (if applicable)
    • Image name
  4. Sends alerts for matched alert rules:

    • Only sends if container matches alert rule selectors
    • Includes note: "Container found in [state] after maintenance window ended"
    • Respects alert cooldown periods
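
The scan and record steps above might look like the following sketch. The container dictionaries, `find_problem_containers`, and `summary_line` are hypothetical; only the state names and message texts come from this page.

```python
PROBLEM_STATES = {"exited", "dead", "paused", "removing"}


def find_problem_containers(containers: list[dict]) -> list[dict]:
    """Return a finding, with the post-blackout note, for each problem container."""
    findings = []
    for c in containers:
        if c["state"] in PROBLEM_STATES:
            note = f"Container found in {c['state']} after maintenance window ended"
            findings.append({**c, "note": note})
    return findings


def summary_line(containers: list[dict], findings: list[dict]) -> str:
    """Build the event-log summary for the post-blackout check."""
    return (f"Post-blackout check: {len(containers)} containers checked, "
            f"{len(findings)} in failed state")


containers = [
    {"name": "web", "host": "host-a", "state": "running"},
    {"name": "db", "host": "host-a", "state": "exited"},
]
findings = find_problem_containers(containers)
print(summary_line(containers, findings))
```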

Alert Rule Matching

Post-blackout alerts trigger only if:

  • Alert rule exists for state change events
  • Container matches rule's container selector
  • State matches rule's trigger states
  • Cooldown period has elapsed since last alert

Example alert rule for post-blackout checks:

Name: Post-Maintenance Container Failures
Scope: Container
Kind: State Change
Trigger States: [exited, dead]
Severity: Error
Cooldown: 300 seconds
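
The matching conditions listed above can be read as one predicate. In this sketch the `StateRule` class and its name-based selector are simplifying assumptions; real rule selectors are likely richer than a set of container names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateRule:
    """Hypothetical state-change rule with a simple name-based selector."""
    container_names: set[str]
    trigger_states: set[str]
    cooldown_seconds: int
    enabled: bool = True


def should_alert(rule: StateRule, container: str, state: str,
                 now_ts: float, last_alert_ts: Optional[float]) -> bool:
    """Return True only if every matching condition holds."""
    if not rule.enabled:
        return False
    if container not in rule.container_names:
        return False
    if state not in rule.trigger_states:
        return False
    # Cooldown must have elapsed since the last alert, if there was one.
    if last_alert_ts is not None and now_ts - last_alert_ts < rule.cooldown_seconds:
        return False
    return True


rule = StateRule({"db"}, {"exited", "dead"}, cooldown_seconds=300)
print(should_alert(rule, "db", "exited", now_ts=1000.0, last_alert_ts=None))   # True
print(should_alert(rule, "db", "exited", now_ts=1000.0, last_alert_ts=900.0))  # False
```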

Auto-Restart Behavior

After blackout ends:

  • Containers with desired_state: should_run are restarted
  • Auto-restart logic applies as normal
  • Retry counters reset (blackout doesn't count as failed attempt)
  • Alerts sent if restart fails
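
A rough sketch of how auto-restart might resume is shown below. The function `resume_auto_restart`, the container dictionaries, and the `retry_counts` mapping are hypothetical; it also assumes that only the counters of the containers being restarted are reset.

```python
def resume_auto_restart(containers: list[dict], retry_counts: dict[str, int]) -> list[str]:
    """Return the names of containers to restart and reset their retry counters."""
    to_restart = []
    for c in containers:
        if c["desired_state"] == "should_run" and c["state"] != "running":
            # The blackout does not count as a failed attempt.
            retry_counts[c["name"]] = 0
            to_restart.append(c["name"])
    return to_restart


containers = [
    {"name": "db", "state": "exited", "desired_state": "should_run"},
    {"name": "job", "state": "exited", "desired_state": "unspecified"},
    {"name": "web", "state": "running", "desired_state": "should_run"},
]
retry_counts = {"db": 2, "job": 1}
print(resume_auto_restart(containers, retry_counts))  # ['db']
```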

Troubleshooting

Blackout Not Activating

Symptoms:

  • Alerts still being sent during scheduled window
  • No blackout banner showing

Diagnosis:

  1. Verify window is enabled (Settings → Alerts → Blackout Windows)
  2. Check current day is in window's days array
  3. Verify current time is within window (times are in server local time)

Solutions:

  • Enable window if disabled
  • Add current day to days array
  • Adjust start/end times to match server time
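
Because window times are evaluated in server local time, it can help to print what the server considers the current day index and time. The sketch below is meant to be run on the machine (or inside the container) that hosts DockMon, where the clock is often set to UTC unless a time zone has been configured. `describe_server_time` is a hypothetical helper.

```python
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def describe_server_time(now: datetime) -> str:
    """Format local time with the day index used by the days array (Monday=0)."""
    day = now.weekday()
    return f"{DAY_NAMES[day]} (day {day}) {now:%H:%M}"


# Compare this output with the window's days and start/end times.
print(describe_server_time(datetime.now()))
```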

Alerts Not Resuming After Blackout

Symptoms:

  • Blackout banner gone but no alerts received
  • Containers in failed state but no notifications

Diagnosis:

  1. Check alert rules are enabled (Settings → Alerts)
  2. Verify notification channels are configured (Settings → Notifications)
  3. Review alert cooldown periods (may still be in cooldown)
  4. Check container matches alert rule selectors

Solutions:

  • Enable alert rules if disabled
  • Configure notification channels
  • Wait for cooldown to expire
  • Adjust alert rule selectors to match containers

Overnight Window Not Working

Symptoms:

  • Window activates at wrong time
  • Window doesn't span midnight correctly

Diagnosis:

  1. Verify start time > end time (e.g., 23:00 > 02:00)
  2. Check that the day the window starts on is in the days array (the hours after midnight are matched against the previous day)
  3. Review event logs for window start/end times

Solutions:

  • Ensure start time is later than end time
  • List only the start days in the days array (e.g., Monday night to Tuesday morning = [0]); adding Tuesday as well would also start a window on Tuesday night

Common Use Cases

Nightly Backup Window

Scenario: Database backups run 02:00-04:00 every night, and containers restart during the backup

Configuration:

Name: Nightly Database Backups
Days: [0,1,2,3,4,5,6]  (Every day)
Start Time: 02:00
End Time: 04:00
Enabled: true

Weekly Deployment Window

Scenario: Production deployments occur Thursday evenings 20:00-23:00

Configuration:

Name: Thursday Production Deployment
Days: [3]  (Thursday)
Start Time: 20:00
End Time: 23:00
Enabled: true

Weekend Infrastructure Upgrade

Scenario: Major infrastructure upgrades happen Saturday-Sunday all day

Configuration:

Name: Weekend Infrastructure Upgrades
Days: [5,6]  (Saturday, Sunday)
Start Time: 00:00
End Time: 23:59
Enabled: true

Monthly Patch Window

Scenario: OS patches applied first Tuesday of each month, 03:00-05:00

Configuration:

Name: Monthly Patch Tuesday
Days: [1]  (Tuesday only)
Start Time: 03:00
End Time: 05:00
Enabled: true

Note: This blackout applies to EVERY Tuesday. To limit it to the first Tuesday only, enable and disable the window manually each month, or use an external scheduler to update the blackout window via the API.
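
For the external-scheduling option, the scheduler needs to decide whether a given date is the first Tuesday of its month. A minimal sketch of that check follows; `is_first_tuesday` is a hypothetical helper, and the actual call that enables or disables the window is left out because the API is not described on this page.

```python
from datetime import date


def is_first_tuesday(d: date) -> bool:
    """True if `d` is a Tuesday within the first seven days of its month."""
    return d.weekday() == 1 and d.day <= 7


# A daily job could enable the window when this is True and disable it otherwise.
print(is_first_tuesday(date(2025, 10, 7)))   # True
print(is_first_tuesday(date(2025, 10, 14)))  # False
```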
