
Bug Report: Issues with AL2023 on EC2 instances while testing Envbox after the upgrade to Sysbox 0.6.5 #147

@bjornrobertsson


Summary

The Sysbox systemd unit files (sysbox-fs.service and sysbox-mgr.service) shipped with Sysbox 0.6.5 contain deprecated systemd configuration that causes issues on Amazon Linux 2023 (AL2023), preventing reliable Envbox deployments on EC2 instances.

Before this upgrade, the same scenario worked with Envbox built on a Sysbox 0.6.4 base.

Environment

  • OS: Amazon Linux 2023 (EC2 instances)
  • Sysbox Version: 0.6.5
  • Use Case: Supporting Envbox container runtime on AL2023

Issues Identified

After reproducing and troubleshooting the problem, the logs show that in the failing starts sysbox-fs starts before sysbox-mgr and no sysbox-fs.sock file is created. In roughly half of the starts, however, the sequence shows sysbox-mgr first, following a successful sysbox-fs startup and the creation of the sysbox-fs.sock file.

Investigating further points to problems in the systemd unit files for the two services.

Deprecated StartLimitInterval Parameter

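Both units set StartLimitInterval=0 in their [Service] section. In systemd 230, StartLimitInterval= was renamed to StartLimitIntervalSec= and, together with StartLimitBurst=, moved to the [Unit] section; the old spelling in [Service] is accepted only for backward compatibility. AL2023 ships systemd 252, yet the Sysbox unit files have not been updated in roughly two years.

Beyond the deprecation itself, the systemd version gap matters: AL2 ships systemd 219 while AL2023 ships 252, and moving past version 230 can change behavior. If the upstream unit files are not fixed, this may become a time bomb on other operating systems as well; ubuntu:latest also uses systemd 252.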

The issue occurs when Envbox runs directly on VMs, bare metal, or EC2 instances, rather than on Kubernetes, which is Envbox's primary target.
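Spotting the deprecated placement in existing unit files does not require a running systemd; a small awk check is enough. The sketch below is illustrative only: `check_startlimit_placement` is a hypothetical helper, it assumes plain unit files without drop-ins, and it flags any StartLimitInterval=, StartLimitIntervalSec= or StartLimitBurst= line that sits in a [Service] section.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sysbox or systemd): report StartLimit*
# settings placed in [Service]. Since systemd 230 they belong in [Unit].
# Exit status is 1 if anything was flagged, 0 otherwise.
check_startlimit_placement() {
  awk '
    FNR == 1 { section = "" }                       # reset per input file
    /^[ \t]*\[/ {                                   # a [Section] header line
      section = $0; gsub(/[ \t]/, "", section); next
    }
    section == "[Service]" && /^StartLimit(Interval(Sec)?|Burst)=/ {
      printf "%s:%d: %s is in [Service]; move it to [Unit]\n", FILENAME, FNR, $0
      found = 1
    }
    END { exit (found ? 1 : 0) }
  ' "$@"
}
```

Run against the 0.6.5 files, for example `check_startlimit_placement /usr/lib/systemd/system/sysbox-fs.service /usr/lib/systemd/system/sysbox-mgr.service`, it should flag the StartLimitInterval=0 line in each unit.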

Duplicate Type= Declarations

Both unit files contain:

Type=simple
Type=notify

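This declaration is redundant: for a single-valued setting like Type=, systemd uses the last assignment, so Type=notify is already in effect and the Type=simple line is dead configuration that should be removed.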
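Duplicate single-valued keys like this are easy to miss by eye. As a rough sketch, assuming plain unit files without drop-ins, the two hypothetical helpers below (`effective_value` and `duplicate_keys`, neither of which is a systemd tool) print the last assignment of a key in a section and list keys assigned more than once per section. List-valued keys such as After= or ExecStart= can legitimately repeat, so their hits need judging by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the value systemd would use for a single-valued
# key, i.e. the LAST assignment inside the given section.
# Usage: effective_value FILE SECTION KEY   (prints nothing if KEY is unset)
effective_value() {
  awk -v want="[$2]" -v key="$3" '
    /^[ \t]*\[/ { cur = $0; gsub(/[ \t]/, "", cur); next }
    cur == want && index($0, key "=") == 1 {
      val = substr($0, length(key) + 2)             # text after "KEY="
      seen = 1
    }
    END { if (seen) print val }
  ' "$1"
}

# Hypothetical helper: list "[Section] Key" pairs assigned more than once.
duplicate_keys() {
  awk '
    /^[ \t]*\[/ { cur = $0; gsub(/[ \t]/, "", cur); next }
    /^[A-Za-z][A-Za-z0-9]*=/ {
      k = cur " " substr($0, 1, index($0, "=") - 1)
      if (++count[k] == 2) print k                  # report each key once
    }
  ' "$1"
}
```

Against the 0.6.5 sysbox-fs.service, `duplicate_keys` should report `[Service] Type` and `effective_value sysbox-fs.service Service Type` should print `notify`. An empty result for `effective_value ... Service Restart` means the key is unset, so systemd's default of Restart=no applies.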

Missing Service Dependencies and Ordering

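sysbox-fs.service declares only After=sysbox-mgr.service, which orders the two units but adds no requirement dependency, and sysbox-mgr.service declares no relationship to sysbox-fs at all. Based on testing, this lack of proper service dependencies leads to race conditions where sysbox-fs may start before sysbox-mgr is ready.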

Missing Restart Policies

Neither unit sets Restart=, so systemd's default of Restart=no applies and a service that crashes stays down, making the runtime less resilient to failures.

Current Unit Files (Sysbox 0.6.5)

sysbox-fs.service

[Unit]
Description=sysbox-fs (part of the Sysbox container runtime)
PartOf=sysbox.service
After=sysbox-mgr.service

[Service]
Type=simple
Type=notify
ExecStart=/usr/bin/sysbox-fs
TimeoutStartSec=10
TimeoutStopSec=10
StartLimitInterval=0
NotifyAccess=main
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitNPROC=infinity

[Install]
WantedBy=sysbox.service

sysbox-mgr.service

[Unit]
Description=sysbox-mgr (part of the Sysbox container runtime)
PartOf=sysbox.service

[Service]
Type=simple
Type=notify
ExecStart=/usr/bin/sysbox-mgr
TimeoutStartSec=45
TimeoutStopSec=90
StartLimitInterval=0
NotifyAccess=main
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitNPROC=infinity

[Install]
WantedBy=sysbox.service

Proposed Fix

This fix can be applied directly in the Envbox image, unblocking users immediately.

Simplified Patch Command

RUN sed -i \
-e '/^Type=simple$/d' \
-e '/^StartLimitInterval=0$/d' \
-e '/^\[Unit\]/a After=sysbox-fs.service\nRequires=sysbox-fs.service\nStartLimitBurst=5\nStartLimitIntervalSec=30' \
-e '/^\[Service\]/a Restart=on-failure\nRestartSec=2s' \
/usr/lib/systemd/system/sysbox-mgr.service && \
sed -i \
-e '/^Type=simple$/d' \
-e '/^StartLimitInterval=0$/d' \
-e '/^After=sysbox-mgr\.service$/d' \
-e '/^\[Unit\]/a Before=sysbox-mgr.service\nStartLimitBurst=5\nStartLimitIntervalSec=30' \
-e '/^\[Service\]/a Restart=on-failure\nRestartSec=2s' \
/usr/lib/systemd/system/sysbox-fs.service

Expected Result After Patch

sysbox-fs.service

[Unit]
Before=sysbox-mgr.service
StartLimitBurst=5
StartLimitIntervalSec=30
Description=sysbox-fs (part of the Sysbox container runtime)
PartOf=sysbox.service

[Service]
Restart=on-failure
RestartSec=2s
Type=notify
ExecStart=/usr/bin/sysbox-fs
TimeoutStartSec=10
TimeoutStopSec=10
NotifyAccess=main
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitNPROC=infinity

[Install]
WantedBy=sysbox.service

sysbox-mgr.service

[Unit]
After=sysbox-fs.service
Requires=sysbox-fs.service
StartLimitBurst=5
StartLimitIntervalSec=30
Description=sysbox-mgr (part of the Sysbox container runtime)
PartOf=sysbox.service

[Service]
Restart=on-failure
RestartSec=2s
Type=notify
ExecStart=/usr/bin/sysbox-mgr
TimeoutStartSec=45
TimeoutStopSec=90
NotifyAccess=main
OOMScoreAdjust=-500
LimitNOFILE=infinity
LimitNPROC=infinity

[Install]
WantedBy=sysbox.service
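Because After=/Before= edges can be declared in either unit, it is easy to end up ordering the two services both ways at once, for example if the original After=sysbox-mgr.service line in sysbox-fs.service survives a patch that adds Before=sysbox-mgr.service. systemd treats that as an ordering cycle and deletes one of the queued jobs to break it, which would bring back nondeterministic startup. The sketch below, built around a hypothetical helper `check_pair_ordering`, derives "X before Y" edges from the [Unit] sections of two unit files and reports a conflict; it only looks at the two files given and ignores drop-ins and every other unit.

```shell
#!/bin/sh
# Hypothetical helper: derive ordering edges from After=/Before= in the
# [Unit] sections of two unit files and report if they are ordered both ways.
# Exit status is 1 on a conflict, 0 otherwise.
check_pair_ordering() {
  awk '
    FNR == 1 { unit = FILENAME; sub(/.*\//, "", unit); cur = "" }  # unit name = basename
    /^[ \t]*\[/ { cur = $0; gsub(/[ \t]/, "", cur); next }
    cur == "[Unit]" && /^(After|Before)=/ {
      dir = substr($0, 1, index($0, "=") - 1)
      n = split(substr($0, index($0, "=") + 1), deps, " ")
      for (i = 1; i <= n; i++) {
        if (dir == "After") edge[deps[i] " -> " unit] = 1   # dep starts first
        else                edge[unit " -> " deps[i]] = 1   # this unit starts first
      }
    }
    END {
      bad = 0
      for (e in edge) {
        split(e, p, " -> ")
        if (((p[2] " -> " p[1]) in edge) && (p[1] < p[2])) {  # report each pair once
          printf "ordering cycle: %s and %s are ordered both ways\n", p[1], p[2]
          bad = 1
        }
      }
      exit bad
    }
  ' "$1" "$2"
}
```

Pointed at the patched files in the image, `check_pair_ordering /usr/lib/systemd/system/sysbox-fs.service /usr/lib/systemd/system/sysbox-mgr.service` should print nothing and exit 0 for the expected result above.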

Alternative Timeout/StartLimit Values for Testing

For environments experiencing timing issues, consider these alternative configurations:

Conservative (Slower but More Reliable)

# sysbox-fs.service [Unit] section
StartLimitBurst=3
StartLimitIntervalSec=60
# sysbox-fs.service [Service] section
RestartSec=5s
TimeoutStartSec=20

# sysbox-mgr.service [Unit] section
StartLimitBurst=3
StartLimitIntervalSec=60
# sysbox-mgr.service [Service] section
RestartSec=5s
TimeoutStartSec=60

Aggressive (Faster but Less Tolerant)

# sysbox-fs.service [Unit] section
StartLimitBurst=10
StartLimitIntervalSec=20
# sysbox-fs.service [Service] section
RestartSec=1s
TimeoutStartSec=5

# sysbox-mgr.service [Unit] section
StartLimitBurst=10
StartLimitIntervalSec=20
# sysbox-mgr.service [Service] section
RestartSec=1s
TimeoutStartSec=30
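With Restart=on-failure, the StartLimit values decide how long a crash loop keeps retrying before systemd gives up and leaves the unit failed: a unit started more than StartLimitBurst= times within StartLimitIntervalSec= is refused further starts, and an interval of 0 disables rate limiting entirely. The back-of-the-envelope sketch below uses a hypothetical helper `rate_limit_trips` and assumes the service fails immediately, so consecutive starts are spaced roughly RestartSec= apart; a start that hangs until TimeoutStartSec= stretches the spacing, so treat the result as an approximation, not systemd's exact accounting.

```shell
#!/bin/sh
# Hypothetical helper: rough estimate of whether a fast crash loop exhausts
# the start rate limit. Arguments are whole seconds.
# Usage: rate_limit_trips RESTART_SEC BURST INTERVAL_SEC
rate_limit_trips() {
  restart_sec=$1 burst=$2 interval=$3
  if [ "$interval" -eq 0 ]; then
    echo "disabled: StartLimitIntervalSec=0 turns off start rate limiting"
    return
  fi
  # BURST+1 back-to-back starts, spaced RestartSec apart, span BURST*RestartSec.
  span=$((burst * restart_sec))
  if [ "$span" -lt "$interval" ]; then
    echo "trips: $((burst + 1)) starts in ~${span}s < ${interval}s window"
  else
    echo "unlikely to trip: $((burst + 1)) starts need ~${span}s >= ${interval}s window"
  fi
}
```

For the values above, `rate_limit_trips 2 5 30` (the proposed patch), `rate_limit_trips 5 3 60` (conservative) and `rate_limit_trips 1 10 20` (aggressive) all report that a fast crash loop will hit the limit, after which systemd stops restarting the unit.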

Impact

  • Without fix: Unreliable Sysbox startup on AL2023, preventing consistent Envbox deployments
  • With fix: Proper service ordering, restart policies, and AL2023 compatibility
