installer hangs at step1 finished #307

Closed
bottkars opened this issue Jul 5, 2017 · 14 comments

Comments

@bottkars
Contributor

bottkars commented Jul 5, 2017

Hi guys, I'm hitting an issue right now where step1 in a single deployment hangs when it finishes:
https://gist.github.com/bottkars/9f986e96903a4f0dbe275e651f0dfad5#file-step1-issue
Even though it reports success, it never returns.

@ksteinfeldt
Contributor

ksteinfeldt commented Jul 5, 2017

Yes, I ran into this issue yesterday. If you run sudo docker ps --all, you'll see that there is a dead container in the list. I was able to continue by deleting that container before the installer timed out. The installation then creates a new container and continues.

An issue has been found in the last update and :latest has been rolled back. This bug may be included in that.

@adrianmo
Contributor

adrianmo commented Jul 5, 2017

Can you try it again? This might be related to #305

@bottkars
Contributor Author

bottkars commented Jul 5, 2017

ok, killing the container seems to work.

@bottkars bottkars closed this as completed Jul 5, 2017
@bottkars bottkars reopened this Jul 5, 2017
@bottkars
Contributor Author

bottkars commented Jul 5, 2017

Well, at least it starts step1 again, but it stops at the same point. It never starts. After rmi and rm, it finally worked.

@adrianmo
Contributor

adrianmo commented Jul 5, 2017

@bottkars great - the problem was the Docker image. You did well by removing the image and pulling it again.
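For anyone landing here later, the remove-and-pull-again step could look roughly like the sketch below. The names DOCKER and repull_image are made up for illustration and are not part of the installer; the image name is the one from this thread.

```shell
#!/bin/sh
# Sketch only: DOCKER and repull_image are illustrative names (assumptions),
# not something the ECS installer provides.
DOCKER="${DOCKER:-sudo docker}"

# Remove the local copy of an image, then pull it again from the registry.
# docker rmi refuses to remove an image that containers still reference,
# so remove those containers (docker rm) before calling this.
repull_image() {
    image="$1"
    $DOCKER rmi "$image" || return 1
    $DOCKER pull "$image"
}

# Example, commented out so that loading this file runs nothing:
# repull_image emccorp/ecs-install:latest
```

The function stops if rmi fails rather than forcing the removal, so a container that still uses the image (for example a data container) is not silently orphaned.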

@adrianmo adrianmo closed this as completed Jul 5, 2017
@bottkars
Contributor Author

bottkars commented Jul 5, 2017

LOL, still had my headaches on the next run, as mine is the old one :-)

emccorp/ecs-install latest 8dd6810f4905 8 days ago

When will a new image be made available?
BTW, it looks like the Docker Hub repo for 3.0.0 is messed up. The manifest for latest is not available (I guess it points to 3.0.0.2, which was removed today?!).

@bottkars
Contributor Author

bottkars commented Jul 5, 2017

Looks fine now, but we need to fix the installer image.

@adrianmo
Contributor

adrianmo commented Jul 5, 2017

@bottkars What problem are you experiencing with the installer image?

Also, adding @padthaitofuhot as the owner of that.

@adrianmo adrianmo reopened this Jul 5, 2017
@adrianmo adrianmo closed this as completed Jul 5, 2017
@bottkars
Contributor Author

bottkars commented Jul 5, 2017

@adrianmo @padthaitofuhot, the installer image still dies (it says remove and then dead) in step 1 without finishing. It still takes multiple kills to finally get it to work. Reproducible.

@padthaitofuhot padthaitofuhot reopened this Jul 5, 2017
@padthaitofuhot
Contributor

Don't close this until we're confident the issue has been dealt with.

@padthaitofuhot padthaitofuhot self-assigned this Jul 5, 2017
@padthaitofuhot
Contributor

@bottkars Which branch (master or develop) are you seeing this issue in?

@bottkars
Contributor Author

bottkars commented Jul 5, 2017

I am using master currently.

@bottkars bottkars closed this as completed Jul 5, 2017
@bottkars bottkars reopened this Jul 5, 2017
@padthaitofuhot
Contributor

Okay, I think I've dramatically replicated this issue:

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS                     PORTS               NAMES
836bb0aa2657        emccorp/ecs-install:latest   "/usr/local/bin/en..."   7 minutes ago       Dead                                           confident_banach
6431d815db81        emccorp/ecs-install:latest   "/bin/echo echo 'D..."   7 minutes ago       Exited (0) 7 minutes ago                       ecs-install-data
416dd787e510        emccorp/ecs-install:latest   "/usr/local/bin/en..."   16 minutes ago      Dead                                           festive_wing
87942ee17b64        9c11e187a9e9                 "/usr/local/bin/en..."   22 minutes ago      Dead                                           pensive_goldwasser

It appears the container namespace is dying after the container ENTRYPOINT script, entrypoint.sh, exits, but before control is passed back to the Docker daemon for clean-up.
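One way to check this on a stuck container is to ask Docker what it recorded for the container's state. This is only a sketch: DOCKER and container_state are illustrative names, and the container ID in the example is taken from the listing above.

```shell
#!/bin/sh
# Sketch only: DOCKER and container_state are illustrative names (assumptions).
DOCKER="${DOCKER:-sudo docker}"

# Print the recorded status, exit code, and error string for one container.
# An exit code of 0 combined with a "dead" status would fit the picture of
# the entrypoint finishing cleanly while the clean-up afterwards fails.
container_state() {
    $DOCKER inspect \
        --format 'status={{.State.Status}} exit={{.State.ExitCode}} error={{.State.Error}}' \
        "$1"
}

# Example, commented out so that loading this file runs nothing:
# container_state 836bb0aa2657
```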

TASK [common_baseline_check : Common | Make sure block device(s) are at least 100GB] ***

TASK [common_baseline_check : Common | Make sure block device(s) are unpartitioned] ***
ok: [192.168.2.221] => (item=/dev/vda)

TASK [common_baseline_check : fail] ********************************************

TASK [common_baseline_check : Common | Check for listening layer 4 ports] ******
changed: [192.168.2.221]

TASK [common_baseline_check : Common | Report any conflicts with published ECS ports] ***

TASK [common_baseline_check : Common | Report any conflicts with internal ECS ports] ***

PLAY RECAP *********************************************************************
192.168.2.221              : ok=11   changed=1    unreachable=0    failed=0

Playbook run took 0 days, 0 hours, 0 minutes, 15 seconds
+ cond_incr_rc 0
+ [ 0 -lt 0 ]
+ simple_shutdown aria2c opentracker
+ processlist=aria2c opentracker
+ wait
+ retry condkill aria2c -INT+
retry condkill opentracker -INT+
local tries=0
+ local+  tries=0condkill
 aria2c -INT
+ condkill opentracker+  -INT
process=aria2c
+ signal=-INT+
process=opentracker
+ signal=-INT
+ + pgreppgrep opentracker aria2c

+ [ -z  ]
+ return 0
+ [ -z  ]
+ return 0
+ cond_incr_rc 0
+ [ 0 -lt 0 ]
+ wait
+ retry condkill aria2c -TERM
+ local tries=0
+ condkill aria2c -TERM
+ process=aria2c
+ signal=-TERM
+ retry condkill opentracker -TERM
+ local tries=0
+ condkill opentracker -TERM
+ process=opentracker
+ signal=-TERM
+ pgrep aria2c
+ pgrep opentracker
+ [ -z  ]
+ [ -z  ]+
return 0
+ return 0
+ cond_incr_rc 0
+ [ 0 -lt 0 ]
+ wait
+ retry condkill aria2c -KILL
+ local tries=0
+ condkill aria2c -KILL
+ process=aria2c
+ + retrysignal=-KILL
 condkill opentracker -KILL
+ local tries=0
+ condkill opentracker -KILL
+ process=opentracker
+ signal=-KILL
+ pgrep aria2c
+ pgrep opentracker
+ [ -z  ]
+ [ -z +  ]return
 0
+ return 0
+ cond_incr_rc 0
+ [ 0 -lt 0 ]
+ exit 0
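
Reading the trace, the shutdown escalates through INT, TERM, and KILL, and the overlapping lines suggest the per-process kills run as background jobs that the script then waits for. A rough sketch of that pattern follows. The names condkill and simple_shutdown come from the trace, but the bodies here are guesses, not the actual entrypoint.sh, and the retry wrapper seen in the trace is left out.

```shell
#!/bin/sh
# Sketch only: the function bodies are assumptions reconstructed from the
# trace above, not the real entrypoint.sh.

# Send a signal to a named process, but only if it is currently running.
condkill() {
    process="$1"
    signal="$2"
    # pgrep exits non-zero when nothing matches; treat that as "nothing to do"
    pids="$(pgrep -x "$process" || true)"
    if [ -z "$pids" ]; then
        return 0
    fi
    # Word splitting is intentional: pgrep may print several PIDs
    kill "$signal" $pids
}

# Escalate through INT, TERM, and KILL for every named process.
# Each round starts one background job per process, then waits for all of them.
simple_shutdown() {
    for signal in -INT -TERM -KILL; do
        for process in "$@"; do
            condkill "$process" "$signal" &
        done
        wait
    done
}

# Example, commented out so that loading this file runs nothing:
# simple_shutdown aria2c opentracker
```

Because each condkill runs in its own background subshell, their xtrace lines can interleave, which would explain the jumbled lines in the output above.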

So now I'm going to look at what's going on with Docker that's leading to this outcome.

@padthaitofuhot
Contributor

Until we have a long-term solution, the work-around to unstick Docker is to run the following command in another shell session when Docker gets stuck:

sudo docker rm -f  $(sudo docker ps -a -f 'status=dead' --format '{{.ID}}')

If that doesn't unstick the process, please let me know.
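
If no container is in the dead state, the one-liner above calls docker rm with no arguments, which fails with a usage error. A slightly more defensive variant is sketched below; DOCKER and remove_dead_containers are illustrative names, not part of the installer.

```shell
#!/bin/sh
# Sketch only: DOCKER and remove_dead_containers are illustrative names
# (assumptions), wrapping the same docker commands as the one-liner above.
DOCKER="${DOCKER:-sudo docker}"

# Force-remove every container whose status is "dead".
# Does nothing (and succeeds) when there are none.
remove_dead_containers() {
    ids="$($DOCKER ps -a -f 'status=dead' --format '{{.ID}}')"
    if [ -z "$ids" ]; then
        echo "no dead containers found"
        return 0
    fi
    # Word splitting is intentional: one argument per container ID
    $DOCKER rm -f $ids
}

# Example, commented out so that loading this file runs nothing:
# remove_dead_containers
```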

@padthaitofuhot padthaitofuhot added this to the Install Node 2.3.0 milestone Jul 12, 2017
@padthaitofuhot padthaitofuhot mentioned this issue Jul 13, 2017
3 tasks
@padthaitofuhot padthaitofuhot added this to Current Patch In-Progress in Installer 2.x Jul 13, 2017
@padthaitofuhot padthaitofuhot moved this from Current Patch In-Progress to Needs Review in Installer 2.x Jul 13, 2017
@padthaitofuhot padthaitofuhot moved this from Needs Review to Done in Installer 2.x Jul 17, 2017
@padthaitofuhot padthaitofuhot added this to Triage in Upstream Jul 26, 2017
@padthaitofuhot padthaitofuhot removed this from Done in Installer 2.x Aug 7, 2017