There are a few nuisances with our current worker strategy that I think would be helped by moving most of what we've done to an orchestration system like ECS.
1. Log streams are currently generated per instance, so the logs from all N workers get interleaved, which makes it hard to find errors.
2. When a worker crashes, it never gets replaced.
3. Updating the container images is a right pain: `docker kill`, `docker rm`, `docker pull`, `find / -name part-001`, then `/path/to/part-001` (the cloud-init script), versus pushing a new launch template and having fresh images in a couple of minutes.
4. The ASGs have ugly names like `tf-asg-tf-serratus-dl-20200304125312000001`, which means we have to "discover" the names before we can adjust desired sizes. This is currently necessary so that all instances get replaced when we change the `user_data` in the launch configuration. ECS would deal with sending the correct arguments to our scripts.
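The ASG name "discovery" described above might look roughly like the following. This is a minimal sketch, not our actual tooling: the helper `find_asg_names`, the prefix `tf-asg-tf-serratus-dl-`, and the second sample name are assumptions for illustration, and the input is assumed to be shaped like the `AutoScalingGroups` list that boto3's `describe_auto_scaling_groups` returns.

```python
def find_asg_names(groups, prefix):
    """Return the sorted names of Auto Scaling groups that start with `prefix`.

    `groups` is assumed to be a list of dicts that each carry an
    "AutoScalingGroupName" key (the shape boto3 returns).
    """
    return sorted(
        g["AutoScalingGroupName"]
        for g in groups
        if g["AutoScalingGroupName"].startswith(prefix)
    )


# Hypothetical sample data; the second name is made up for the example.
sample_groups = [
    {"AutoScalingGroupName": "tf-asg-tf-serratus-dl-20200304125312000001"},
    {"AutoScalingGroupName": "tf-asg-tf-serratus-align-20200304125312000002"},
]

# Only the group whose generated name starts with the "dl" prefix is kept.
dl_names = find_asg_names(sample_groups, "tf-asg-tf-serratus-dl-")
```

With real AWS access, `groups` would come from `boto3.client("autoscaling").describe_auto_scaling_groups()["AutoScalingGroups"]`, and each discovered name could then be passed to `set_desired_capacity(AutoScalingGroupName=..., DesiredCapacity=...)`.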
There are a couple of things to work out first, though:
- Will we use Daemon or Replica jobs? Daemon doesn't solve 1, but Replica doesn't solve 4. We need a way to force all images to be replaced if we change them.
- ECS + CloudWatch Logs
- ...and more, maybe?
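One possible answer to the "force all images to be replaced" question, if we end up running the workers as an ECS service, is to ask ECS for a new deployment. The sketch below is only an assumption about how that could be wired up: `force_redeploy` is a hypothetical helper, and the client is passed in so it can be a real `boto3.client("ecs")` or a stand-in. It relies on the `forceNewDeployment` parameter of the ECS `update_service` call. Whether a new deployment alone picks up a changed image for our setup would still need checking.

```python
def force_redeploy(ecs_client, cluster, service):
    """Ask ECS to start a new deployment so the service's tasks are replaced.

    `ecs_client` is assumed to expose `update_service` the way the boto3 ECS
    client does; `cluster` and `service` are whatever names we end up using.
    """
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )
```

Usage would be something like `force_redeploy(boto3.client("ecs"), "<cluster>", "<service>")`, with the cluster and service names filled in once they exist.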