Fargate task with multiple containers is not working properly #425

Closed
alihalabyah opened this Issue Feb 12, 2018 · 9 comments

alihalabyah commented Feb 12, 2018

Hello AWS Team,

I was exploring Fargate using the ECS CLI. I have both the docker-compose.yml and ecs-params.yml files in my path, with correct network and logging configurations, and multiple services/containers described with existing ECR images. The task starts fine after a few minutes; however, no logs are generated and the actual service is not reachable on the exposed port. On the other hand, when I keep one container per task, it works fine and I can see the logs.

Can you please check this since it's a blocker for us moving forward?

Contributor

SoManyHs commented Feb 12, 2018

Hi @alihalabyah,

Sorry to hear you're having issues. Could you provide the docker-compose.yml and ecs-params.yml files you are using to describe the multiple containers?

Thanks!

alihalabyah commented Feb 13, 2018

Hi @SoManyHs

Sure, they are below:

docker-compose.yml

version: '2'

services:

  elasticsearch:
    image: ecr-image-url
    ports:
      - "9200:9200"
    logging:
      driver: awslogs
      options: 
        awslogs-group: company
        awslogs-region: us-east-1
        awslogs-stream-prefix: elasticsearch
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"

  logstash:
    image: ecr-image-url
    ports:
      - "5000:5000"
    logging:
      driver: awslogs
      options: 
        awslogs-group: company
        awslogs-region: us-east-1
        awslogs-stream-prefix: logstash
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"

  kibana:
    image: ecr-image-url
    ports:
      - "5601:5601"
    logging:
      driver: awslogs
      options: 
        awslogs-group: company
        awslogs-region: us-east-1
        awslogs-stream-prefix: kibana

ecs-params.yml

version: 1
task_definition:
  task_execution_role: ecsExecutionRole
  ecs_network_mode: awsvpc
  task_size:
    mem_limit: 0.5GB
    cpu_limit: 256
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-0000000"
        - "subnet-0000000"
      security_groups:
        - "sg-00000000"
      assign_public_ip: ENABLED

alihalabyah commented Feb 14, 2018

@SoManyHs Any updates?

Contributor

SoManyHs commented Feb 15, 2018

Hi @alihalabyah,

I'm still investigating this, but one high-level observation is that your memory limits seem a bit low, especially if you're trying to run multiple containers. Have you tried tweaking that value?

Also, could you paste the ECS CLI command you're running?
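
For reference, a minimal sketch of what a larger task_size could look like in ecs-params.yml. The values here are illustrative, and Fargate only accepts specific CPU/memory pairings (for example, 1024 CPU units can be paired with 2 GB to 8 GB of memory); the other fields from your file above stay the same:

version: 1
task_definition:
  task_size:
    mem_limit: 4GB
    cpu_limit: 1024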

alihalabyah commented Feb 15, 2018

@SoManyHs

Looks like you are right! I have updated the CPU and memory limits, and at least I can see logs in CloudWatch now. However, it seems I have another problem: max_map_count needs to be increased on the host side, and I'm not sure how that is supposed to be done on Fargate.

The error I see right now for Elasticsearch is:

[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

The suggested solutions are either to run a sysctl command to update the max map count or to modify it permanently on the host, but I could not find how to do that.

Take a look at:

spujadas/elk-docker#92 (comment)
spujadas/elk-docker#92 (comment)

And

http://elk-docker.readthedocs.io/#overriding-variables

I have two extra questions related to the above:

  1. Is it normal behavior for the whole task to be stopped, with all of its containers, even if only one of them has a problem?

  2. Is there a way to override system settings such as max_map_count on Fargate?

alihalabyah commented Feb 16, 2018

@SoManyHs Any updates?

Contributor

SoManyHs commented Feb 16, 2018

Hi @alihalabyah,

Thanks for your patience. As you mentioned, the workarounds for the max_map_count error appear to be setting vm.max_map_count directly on the host (which may result in undesirable side effects) or using the --sysctl flag on the docker run command. Unfortunately, neither of these options is supported in Fargate, since they involve interacting with the container instance itself.

An alternative might be to use the Amazon Elasticsearch Service, or to try running your containers using the EC2 launch type.

As for your question about whether it is normal behavior for the whole task to be stopped when only one of its containers has a problem:

This will happen if the failing container is marked essential in your task definition. If an essential container stops for any reason, all other containers in that task will also be stopped. (See the docs on ECS task definitions for more information.)
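
If you want the other containers to keep running when a non-critical one fails, ecs-params.yml also accepts a per-service essential field (assuming your ECS CLI version supports it; at least one container in the task must remain essential). A minimal sketch, using the service names from the compose file above:

version: 1
task_definition:
  services:
    logstash:
      essential: false
    kibana:
      essential: false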

SoManyHs self-assigned this Feb 16, 2018

Contributor

SoManyHs commented Feb 16, 2018

@alihalabyah Here's another link that might help you debug why your container stopped: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/stopped-task-errors.html

Hope that helps!

Contributor

SoManyHs commented Feb 27, 2018

Hi @alihalabyah, did any of the most recent suggestions work for you? Since adjusting the memory limit seems to have solved the original issue, I'm going to go ahead and close this one. If you're still having problems with `max_map_count`, please feel free to open a new issue. Thank you!
