Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dinitctl: dinitctl enable can get stuck in service activation status report because of missing SERVICEEVENT #363

Closed
mobin-2008 opened this issue Aug 8, 2024 · 6 comments · Fixed by #364
Assignees
Milestone

Comments

@mobin-2008
Copy link
Collaborator

Describe the bug
This is a really weird one. Sometimes the dinitctl enable get stuck after the "enabled" message.

To Reproduce

  1. Have a minimal service set (including boot and boot.d)
  2. Add a fail-fast service (for example a service with non-exist executable path).
  3. Create a loop with disable and enable:
#!/bin/sh

SERVICE="srv"
SOCKET="/tmp/dinitctl"

i=0
while :; do
    i=$(( i + 1 ))
    echo $i
    dinitctl disable "$SERVICE" --socket-path "$SOCKET"
    dinitctl enable "$SERVICE" --socket-path "$SOCKET"
done
  1. It will stuck after a random number of tries.

Expected behavior
It should not stuck.

Additional context
The dinitctl enables a service with this process:

  1. dinitctl loads a service
  2. dinitctl finds a waits-for.d and add the service in that directory
  3. dinitctl starts the service
  4. dinitctl checks the current state of service with SERVICESTATUS command
  5. dinit responds.
  6. if the status of service wasn't something that dinitctl expects, run wait_service_state()
  7. Report the success or failure.

The problem is dinitctl expects a SERVICEEVENT but sometimes it's missing and dinitctl will stuck on read() syscall (int r = rbuffer.fill_to(socknum, 2);).

@mobin-2008 mobin-2008 added Bug/Bugfix A-Importance: Normal C-dinitctl Things about dinitctl C-dinit Things about the main parts of dinit C-control Things about custom control protocol of Dinit labels Aug 8, 2024
@mobin-2008 mobin-2008 self-assigned this Aug 8, 2024
@mobin-2008 mobin-2008 added this to the 1.0 milestone Aug 8, 2024
@davmac314
Copy link
Owner

I can't reproduce after over 30000 iterations:

Service 'sshd' has been disabled.
Service 'sshd' has been enabled.
Service 'sshd' started.
37225
Service 'sshd' has been disabled.
Service 'sshd' has been enabled.
Service 'sshd' started.
37226

@mobin-2008
Copy link
Collaborator Author

I can't reproduce after over 30000 iterations:

Service 'sshd' has been disabled.
Service 'sshd' has been enabled.
Service 'sshd' started.
37225
Service 'sshd' has been disabled.
Service 'sshd' has been enabled.
Service 'sshd' started.
37226

The problem is that your sshd service is not fail-fast, Look at my example:

# "srv" service

type = process
command = not-exist

In this case it get stuck for me and it's really random, one time it will get stuck after ~700 tries, next time it will stuck after second try :/

@davmac314
Copy link
Owner

Yes, I was able to reproduce with a service that fails to execute.

@mobin-2008
Copy link
Collaborator Author

mobin-2008 commented Aug 9, 2024

I found out why it happens. Let's take a look at the dinitctl enable process:

  1. Enable the service
  2. Start the service
  3. Send a SERVICESTATUS
  4. wait_for_reply() (Which does ignore and skip any information packet (such as SERVICEEVENT)
  5. Process the status
  6. If the current status is not STARTED, wait_service_state().

This system looks good but there is a race condition: If the SERVICEEVENT is sent before the SERVICESTATUS, We will lost it in the step 4 and in the step 6, We will wait for our lost SERVICEEVENT. strace can confirm this problem:

write(3, "\22\1\0\0\0", 5)              = 5                          <-- SERVICESTATUS Request 
read(3, "d\21\1\0\0\0\2\3\0\0\4\22\0\2\0\0\0", 847) = 17             <-- Interesting SERVICEEVENT (EXECFAILED)
read(3, "d\21\1\0\0\0\1\0\0\0\4\22\0\2\0\0\0", 830) = 17             <-- Another interesting SERVICEEVENT (STOPPED)
read(3, "F\0\0\0\0\4\22\0\2\0\0\0", 813) = 12                        <-- SERVICESTATUS Response
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
write(1, "Service 'srv' has been enabled.\n", 32Service 'srv' has been enabled.
) = 32
read(3,                                                              <-- dinitctl get stuck in here

@mobin-2008
Copy link
Collaborator Author

The quick fix for this is to change this:

dinit/src/dinitctl.cc

Lines 1930 to 1934 in 3867cf1

if (enable) {
if (current != service_state_t::STARTED) {
wait_service_state(socknum, rbuffer, to_handle, to, false /* start */, verbose);
}
}

To this:

        if (enable) {
            if (current != service_state_t::STARTED && target != service_state_t::STOPPED) {
                wait_service_state(socknum, rbuffer, to_handle, to, false /* start */, verbose);
            }
            else {
                std::cerr << "Service Failed to start." << std::endl; // TODO: Show more info about the failure
                return 1;
            }
        }

@davmac314 Is it a good fix? Or we need to re-implement the SERVICEEVENT processing to be race-free?

@davmac314
Copy link
Owner

I don't think there is a race in dinit itself, it's just the events can happen in an order that's not what dinitctl is expecting. The fix you proposed might be ok, I need to have a proper look. Maybe it can be refactored a little.

@mobin-2008 mobin-2008 removed C-dinit Things about the main parts of dinit C-control Things about custom control protocol of Dinit labels Aug 10, 2024
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 11, 2024
With this change we can eliminate the "SERVICESTATUS vs SERVICEEVENT" race.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 11, 2024
With this change we can eliminate the "SERVICESTATUS vs SERVICEEVENT" race.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 11, 2024
With this change we can eliminate the "SERVICESTATUS vs SERVICEEVENT" race.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 13, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for SERVICEVENT first; If it wasn't the case,
It will be processed normally.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 15, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for SERVICEVENT first; If it wasn't the case,
It will be processed normally.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 18, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for SERVICEVENT first; If it wasn't the case,
It will be processed normally.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 18, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for SERVICEVENT first; If it wasn't the case,
It will be processed normally.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 24, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 24, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 26, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 28, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Aug 29, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Sep 5, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Sep 7, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
mobin-2008 added a commit to mobin-2008/dinit that referenced this issue Sep 7, 2024
Sometimes it's possible to get SERVICEEVENT before the SERVICESTATUS report.
This commit fixes it with waiting for whichever one actually comes first.

Fixes davmac314#363

Signed-off-by: Mobin Aydinfar <mobin@mobintestserver.ir>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants