windows ec2 doesn`t reach the stop state in ec2 userdata script #570

Open
pharindoko opened this issue May 15, 2024 · 4 comments

Comments

@pharindoko
Contributor

pharindoko commented May 15, 2024

Hey @kichik,

I have one special case that I can reproduce.
Although the job has completed successfully in GitHub, the EC2 instance and the Step Functions execution are still running.

runner.log

Current runner version: '2.316.1'
2024-05-15 09:41:16Z: Listening for Jobs
2024-05-15 09:41:19Z: Running job: test_config
2024-05-15 10:05:49Z: Job test_config completed with result: Canceled
./run.cmd : An error occurred: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View
permissions to perform the action.
At C:\Windows\system32\config\systemprofile\AppData\Local\Temp\EC2Launch988827203\UserScript.ps1:48 char:3
+   ./run.cmd 2>&1 | Out-File -Encoding ASCII -Append /actions/runner.l ...
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (An error occurr...orm the action.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError



"Runner listener exit with retryable error, re-launch runner in 5 seconds."
"Restarting runner..."
        1 file(s) copied.



? Connected to GitHub



Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner
registrations are automatically deleted for runners that have not connected to the service recently.
"Runner listener exit with terminated error, stop the service, no retry needed."
"Exiting runner..."

What's the problem:

The machine keeps running, and we waste money until we notice it. (Yes, additional alerting would make sense here too, but I don't have it in place yet.)

Proposal:

It would be great to have a try/catch block around the action statement in the PowerShell script to ensure the machine gets terminated:

Stop-Computer -ComputerName localhost -Force
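
A minimal sketch of what this could look like, assuming the user data script invokes the runner as a single action, as in the log above. `Invoke-WithShutdown` and its parameters are hypothetical names, not part of the project. The `finally` block runs whether the action succeeds or throws, so the shutdown is always attempted.

```powershell
# Hypothetical helper: runs an action and always attempts the shutdown step afterwards.
function Invoke-WithShutdown {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        # Defaults to the command proposed above; injectable so it can be dry-run.
        [scriptblock] $Shutdown = { Stop-Computer -ComputerName localhost -Force }
    )
    try {
        & $Action
    }
    catch {
        # Record the failure; the finally block below still runs.
        Write-Warning "Runner action failed: $_"
    }
    finally {
        & $Shutdown
    }
}

# Dry run: the action throws, and a harmless stand-in for the shutdown still executes.
Invoke-WithShutdown -Action { throw 'simulated runner failure' } -Shutdown { Write-Output 'shutdown requested' }
```

Passing the shutdown step as a parameter is only there so the behaviour can be tried without actually stopping a machine.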

pharindoko changed the title from "windows ec2 doesn`t r" to "windows ec2 doesn`t reach the stop state in ec2 userdata script" on May 15, 2024
@kichik
Member

kichik commented May 15, 2024

I'm not a PowerShell expert, but I do believe we are already doing that. Are you sure these are the logs of the right instance? It looks like the log of a runner that the idle reaper terminated. In that case, the step function execution should have also been aborted.

@pharindoko
Contributor Author

Yes, I'm very sure that it's the right instance. We were able to reproduce the issue by running the same job again.
It's clear that we should fix this job anyway, but it would still be nice to have the machine stopped even when an error occurs in the action function.

Try catch in powershell: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_try_catch_finally?view=powershell-7.4

@kichik
Member

kichik commented May 15, 2024

Would you be able to pull up the user data log from that machine so I can better understand what exactly failed there? It should be in C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log. As far as I understand PowerShell, executing a script (like run.cmd executed by action()) doesn't raise exceptions. Either way, I'd like to both fix the error and possibly add try/catch.
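
To illustrate the point about native commands: by default, a non-zero exit code from an executable or batch file does not throw by itself, so a try/catch around the call sees nothing unless the script checks `$LASTEXITCODE` and throws on its own. A minimal sketch, assuming a Windows host where `cmd.exe` is available; `cmd /c "exit 3"` is only a stand-in for run.cmd, not the project's actual code.

```powershell
try {
    # Stand-in for ./run.cmd: a native command that exits with a non-zero code.
    cmd /c "exit 3"

    # Execution reaches this point; the failure is only visible in $LASTEXITCODE.
    if ($LASTEXITCODE -ne 0) {
        throw "native command exited with code $LASTEXITCODE"
    }
}
catch {
    Write-Output "caught: $_"
}
```

Whether the real script needs such a check depends on how action() handles the exit code of run.cmd, which this thread does not show.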

@pharindoko
Contributor Author

> Would you be able to pull up the user data log from that machine so I can better understand what exactly failed there? It should be in C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log. As far as I understand PowerShell, executing a script (like run.cmd executed by action()) doesn't raise exceptions. Either way, I'd like to both fix the error and possibly add try/catch.

I couldn't find UserdataExecution.log.

AWS lists the log locations here:

You can't find the user data logs

The log files for EC2Launch, EC2Launch v2, and EC2Config contain the output from the standard output and standard error streams. You can access the log files at the following locations:

    EC2Launch v2: C:\ProgramData\Amazon\EC2Launch\log\agent.log
    EC2Launch: C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log
    EC2Config: C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt

I guess we use EC2Launch v2, and I found agent.log.
I will provide it to you.
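
A quick way to check which launch agent wrote logs on a given instance is to test the three paths from the AWS documentation quoted above. This is only a sketch; the variable names are illustrative.

```powershell
# Log locations quoted from the AWS documentation above, keyed by launch agent.
$logPaths = [ordered]@{
    'EC2Launch v2' = 'C:\ProgramData\Amazon\EC2Launch\log\agent.log'
    'EC2Launch'    = 'C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log'
    'EC2Config'    = 'C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt'
}

# Report each candidate path and whether it exists on this machine.
foreach ($agent in $logPaths.Keys) {
    $found = Test-Path -LiteralPath $logPaths[$agent]
    Write-Output ("{0}: {1} (exists: {2})" -f $agent, $logPaths[$agent], $found)
}
```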
