firecamp-service-cli ignores memory parameters #64

Closed
ddt7 opened this issue Jun 19, 2018 · 14 comments

Comments

@ddt7

ddt7 commented Jun 19, 2018

I deployed the stack with firecamp.template, logged in to the bastion host, and ran firecamp-service-cli with -reserve-memory=768.
The service task definition still has: "memoryReservation": 1024.
When testing on a t2.micro, there is not 1024MB of memory left.

@JuniusLuo
Contributor

Which service are you creating? Could you please post all parameters?

The reserve-memory value can be overridden for some services that use a JVM, such as Cassandra/Kafka/ZooKeeper/ElasticSearch. For example, the Cassandra JVM heap Xms and Xmx are set to the same value. If the Cassandra JVM heap size is set to 1024MB, the reserved memory is set to 1024MB as well. This prevents the JVM memory from being swapped out to disk, which would have a big impact on JVM performance.
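
One way to read the rule above is that the reservation is raised to at least the JVM heap size. The sketch below only illustrates that reading; the function name and the max-based rule are assumptions, not FireCamp's actual code.

```python
def effective_reservation(reserve_memory_mb: int, heap_size_mb: int) -> int:
    """Assumed rule: never reserve less memory than the JVM heap needs."""
    return max(reserve_memory_mb, heap_size_mb)


# A 1024MB heap pushes a smaller requested reservation up to 1024MB.
print(effective_reservation(768, 1024))  # 1024
# A heap smaller than the requested reservation leaves it unchanged.
print(effective_reservation(512, 384))   # 512
```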

@ddt7
Author

ddt7 commented Jun 20, 2018

I am running cassandra as follows:
./firecamp-service-cli -region=us-east-1 -cluster=casdb -op=create-service -service-type=cassandra -service-name=t1 -replicas=3 -volume-size=5 -journal-volume-size=1 -max-memory=770 -reserve-memory=512 -cas-heap-size=256 -jmx-user=jmxuser -jmx-passwd=changeme

@JuniusLuo
Contributor

Thanks for posting the detailed CLI command. I could not reproduce it. Could you please post a screenshot of the container definition?

256MB is too small for a 3-node Cassandra. The Cassandra container image includes the Jolokia agent for monitoring. The JVM for a 3-node Cassandra needs at least 768MB. You will need to use a t2.small instance for testing.

@ddt7
Author

ddt7 commented Jun 20, 2018

For testing I can give it 768MB.
Here is the container definition:
{
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "casdb-t1-9a7d8bff89984c4d543654344b58b284",
          "awslogs-region": "us-east-1"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 7000,
          "protocol": "tcp",
          "containerPort": 7000
        },
        {
          "hostPort": 7001,
          "protocol": "tcp",
          "containerPort": 7001
        },
        {
          "hostPort": 7199,
          "protocol": "tcp",
          "containerPort": 7199
        },
        {
          "hostPort": 9042,
          "protocol": "tcp",
          "containerPort": 9042
        },
        {
          "hostPort": 9160,
          "protocol": "tcp",
          "containerPort": 9160
        },
        {
          "hostPort": 8778,
          "protocol": "tcp",
          "containerPort": 8778
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 256,
      "environment": [
        {
          "name": "VERSION",
          "value": "latest"
        }
      ],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": null,
          "containerPath": "/data",
          "sourceVolume": "9a7d8bff89984c4d543654344b58b284"
        },
        {
          "readOnly": null,
          "containerPath": "/journal",
          "sourceVolume": "journal_9a7d8bff89984c4d543654344b58b284"
        }
      ],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 1024,
      "volumesFrom": [],
      "image": "cloudstax/firecamp-cassandra:3.11",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": false,
      "name": "casdb-t1-container"
    }
  ],
  "placementConstraints": [],
  "memory": null,
  "taskRoleArn": null,
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:709846101695:task-definition/casdb-t1:1",
  "family": "casdb-t1",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [],
  "networkMode": "host",
  "cpu": null,
  "revision": 1,
  "status": "ACTIVE",
  "volumes": [
    {
      "name": "9a7d8bff89984c4d543654344b58b284",
      "host": {
        "sourcePath": "9a7d8bff89984c4d543654344b58b284"
      }
    },
    {
      "name": "journal_9a7d8bff89984c4d543654344b58b284",
      "host": {
        "sourcePath": "journal_9a7d8bff89984c4d543654344b58b284"
      }
    }
  ]
}

@JuniusLuo
Contributor

It looks like you are using the latest release. I tried the following on my testbed:

./firecamp-service-cli -region=us-east-1 -cluster=t1 -op=create-service -service-type=cassandra -service-name=t1 -replicas=1 -volume-size=1 -journal-volume-size=1 -max-memory=770 -reserve-memory=512 -cas-heap-size=384 -jmx-user=jmxuser -jmx-passwd=changeme

In my output, both memory and memoryReservation are set correctly, whereas in your output memory is null and memoryReservation is 1024.

{
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "t1-t1-2c4c4536f9b042fd771e2d7788e3ad67",
          "awslogs-region": "us-east-1"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 7000,
          "protocol": "tcp",
          "containerPort": 7000
        },
        {
          "hostPort": 7001,
          "protocol": "tcp",
          "containerPort": 7001
        },
        {
          "hostPort": 7199,
          "protocol": "tcp",
          "containerPort": 7199
        },
        {
          "hostPort": 9042,
          "protocol": "tcp",
          "containerPort": 9042
        },
        {
          "hostPort": 9160,
          "protocol": "tcp",
          "containerPort": 9160
        },
        {
          "hostPort": 8778,
          "protocol": "tcp",
          "containerPort": 8778
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 256,
      "environment": [
        {
          "name": "VERSION",
          "value": "latest"
        }
      ],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": null,
          "containerPath": "/data",
          "sourceVolume": "2c4c4536f9b042fd771e2d7788e3ad67"
        },
        {
          "readOnly": null,
          "containerPath": "/journal",
          "sourceVolume": "journal_2c4c4536f9b042fd771e2d7788e3ad67"
        }
      ],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": 770,
      "memoryReservation": 512,
      "volumesFrom": [],
      "image": "cloudstax/firecamp-cassandra:3.11",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": false,
      "name": "t1-t1-container"
    }
  ],
  "placementConstraints": [],
  "memory": null,
  "taskRoleArn": null,
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:497621646529:task-definition/t1-t1:1",
  "family": "t1-t1",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [],
  "networkMode": "host",
  "cpu": null,
  "revision": 1,
  "status": "ACTIVE",
  "volumes": [
    {
      "name": "2c4c4536f9b042fd771e2d7788e3ad67",
      "host": {
        "sourcePath": "2c4c4536f9b042fd771e2d7788e3ad67"
      }
    },
    {
      "name": "journal_2c4c4536f9b042fd771e2d7788e3ad67",
      "host": {
        "sourcePath": "journal_2c4c4536f9b042fd771e2d7788e3ad67"
      }
    }
  ]
}
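
To compare the two outputs without reading the whole JSON, a small sketch like the following could pull out just the memory fields. It assumes the task definition JSON is available as text (for example, copied from the ECS console); the function name is illustrative only.

```python
import json


def container_memory_settings(task_definition_json: str) -> dict:
    """Map each container name to its (memory, memoryReservation) pair."""
    task_def = json.loads(task_definition_json)
    return {
        c["name"]: (c.get("memory"), c.get("memoryReservation"))
        for c in task_def.get("containerDefinitions", [])
    }


# Trimmed-down versions of the two task definitions posted in this thread.
reported = (
    '{"containerDefinitions": [{"name": "casdb-t1-container", '
    '"memory": null, "memoryReservation": 1024}]}'
)
testbed = (
    '{"containerDefinitions": [{"name": "t1-t1-container", '
    '"memory": 770, "memoryReservation": 512}]}'
)

print(container_memory_settings(reported))  # {'casdb-t1-container': (None, 1024)}
print(container_memory_settings(testbed))   # {'t1-t1-container': (770, 512)}
```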

@JuniusLuo
Contributor

I'm not sure why your testbed behaves this way. Could you please check the system?

  1. On the worker node, run sudo docker plugin ls.
  2. On the bastion node, delete the previous CLI and download the latest CLI again: https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/packages/firecamp-service-cli.tgz
  3. Try again using the latest CLI.

@ddt7
Author

ddt7 commented Jun 21, 2018

I ran it again (by the way, my script downloads the CLI from the same S3 latest link). The memory settings are now correct, but task creation fails:

Status reason CannotCreateContainerError: API error (500): create 66b84f2254c64e2b7430388fe1201c2c: VolumeDriver.Create: Create, GetServiceAttr error DB RecordNotFound req {66b84f2254c64e2b7430388fe1201c2c map[]}
false

task definition
{
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "casdb-t1-66b84f2254c64e2b7430388fe1201c2c",
          "awslogs-region": "us-east-1"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 7000,
          "protocol": "tcp",
          "containerPort": 7000
        },
        {
          "hostPort": 7001,
          "protocol": "tcp",
          "containerPort": 7001
        },
        {
          "hostPort": 7199,
          "protocol": "tcp",
          "containerPort": 7199
        },
        {
          "hostPort": 9042,
          "protocol": "tcp",
          "containerPort": 9042
        },
        {
          "hostPort": 9160,
          "protocol": "tcp",
          "containerPort": 9160
        },
        {
          "hostPort": 8778,
          "protocol": "tcp",
          "containerPort": 8778
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 256,
      "environment": [
        {
          "name": "VERSION",
          "value": "latest"
        }
      ],
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": null,
          "containerPath": "/data",
          "sourceVolume": "66b84f2254c64e2b7430388fe1201c2c"
        },
        {
          "readOnly": null,
          "containerPath": "/journal",
          "sourceVolume": "journal_66b84f2254c64e2b7430388fe1201c2c"
        }
      ],
      "workingDirectory": null,
      "dockerSecurityOptions": null,
      "memory": 770,
      "memoryReservation": 512,
      "volumesFrom": [],
      "image": "cloudstax/firecamp-cassandra:3.11",
      "disableNetworking": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "privileged": false,
      "name": "casdb-t1-container"
    }
  ],
  "placementConstraints": [],
  "memory": null,
  "taskRoleArn": null,
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:709846101695:task-definition/casdb-t1:2",
  "family": "casdb-t1",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    }
  ],
  "requiresCompatibilities": [],
  "networkMode": "host",
  "cpu": null,
  "revision": 2,
  "status": "ACTIVE",
  "volumes": [
    {
      "name": "66b84f2254c64e2b7430388fe1201c2c",
      "host": {
        "sourcePath": "66b84f2254c64e2b7430388fe1201c2c"
      }
    },
    {
      "name": "journal_66b84f2254c64e2b7430388fe1201c2c",
      "host": {
        "sourcePath": "journal_66b84f2254c64e2b7430388fe1201c2c"
      }
    }
  ]
}

@JuniusLuo
Contributor

I'm not sure how you created the service. It looks like the service does not exist in the system.

@ddt7
Author

ddt7 commented Jun 25, 2018

Hey, I tried again with less memory and it worked. You run two more services, manager and catalog, which each take 128MB, so 770 was probably on the edge. I tried 600 and it runs :)
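
The arithmetic behind "on the edge" can be sketched as follows. This is a rough budget only: it assumes the nominal 1 GiB of a t2.micro and that both 128MB services land on the same instance as the Cassandra container. The memory ECS can actually schedule on the instance is somewhat lower than the nominal figure.

```python
T2_MICRO_MEMORY_MB = 1024        # nominal t2.micro memory (assumption: all of it is usable)
SYSTEM_SERVICES_MB = [128, 128]  # manager and catalog services


def remaining_memory(instance_mb: int = T2_MICRO_MEMORY_MB) -> int:
    """Memory left on the instance after the two system services."""
    return instance_mb - sum(SYSTEM_SERVICES_MB)


def fits(container_mb: int, instance_mb: int = T2_MICRO_MEMORY_MB) -> bool:
    """True if a container of this size fits next to the system services."""
    return container_mb <= remaining_memory(instance_mb)


print(remaining_memory())  # 768
print(fits(770))           # False: 770 is just over the 768MB that is left
print(fits(600))           # True
```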

@JuniusLuo
Contributor

Good to know that it worked :) Yes, we split the catalog service out, so there are now 2 services that each take 128MB, rather than 1 service with 256MB as before.

@ddt7
Author

ddt7 commented Jun 27, 2018

Where can I find more documentation on adding a service like RabbitMQ to FireCamp, unless you plan to add it soon?
And where can I find more documentation on how scaling the number of nodes up or down works for Cassandra/ElasticSearch/Redis?

@JuniusLuo
Contributor

To add a new service, you can use the Cassandra service as a reference. You will need to add a few things:

  1. Generate the create service request, plus the service initialization request if the service requires additional initialization after all replica containers are running. See cascatalog.go.
    The detailed service configuration parameters are stored in the service configuration file or the member configuration file. See the genServiceConfigs() and GenReplicaConfigs() functions in cascatalog.go.

  2. Add the service Dockerfile and entrypoint.sh. See https://github.com/cloudstax/firecamp/tree/master/catalog/cassandra/3.11/dockerfile. The Dockerfile can build on the image in Docker Hub, which usually declares a volume, such as "VOLUME /var/lib/cassandra" in the official cassandra image. However, Docker does not allow overriding a volume declared in the parent image. To make sure data is not written to a temporary volume, it is better to remove the default volume, explicitly specify the volume directory, and check in entrypoint.sh whether the volume is mounted.

  3. Add the service creation function to the catalog service. See the catalog cassandra functions. The function simply checks the request, creates the service in the FireCamp management service, and runs the initialization task if necessary.
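
The mount check mentioned in step 2 can be illustrated with a short sketch. The real entrypoint.sh is a shell script, so this Python version only shows the idea; the function names are hypothetical, and a path such as /data (the containerPath in the task definitions in this thread) is what a real check would be given.

```python
import os
import tempfile


def volume_is_mounted(path: str) -> bool:
    """True if path is a mount point rather than a plain directory."""
    return os.path.ismount(path)


def require_mounted(path: str) -> None:
    """Refuse to continue when the data directory is not a mounted volume."""
    if not volume_is_mounted(path):
        raise RuntimeError(f"{path} is not a mounted volume; refusing to start")


# A freshly created scratch directory stands in for an unmounted data directory.
with tempfile.TemporaryDirectory() as scratch:
    try:
        require_mounted(scratch)
    except RuntimeError as err:
        print(err)
```

Failing fast here keeps the service from silently writing its data into the container's temporary filesystem.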

@JuniusLuo
Contributor

For scaling up the service, see the documentation on scaling up Cassandra. Currently you need to use nodetool to check that the scaling has fully completed. Scaling down is currently not supported. Scaling down requires Cassandra to recover the removed replica's data from other nodes, which could be a heavy operation if the system has lots of data. Scaling for ElasticSearch and Redis is not supported yet.
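
One plausible way to do the nodetool check is to look at the two-letter state column of nodetool status, where UN means Up/Normal and UJ means Up/Joining. The sketch below parses that output; the sample text is illustrative rather than captured from a real cluster, and treating "every expected node is UN" as "scaling is done" is an assumption.

```python
import re

# First letter: Up/Down. Second letter: Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output: str) -> dict:
    """Map each node address to its two-letter state from `nodetool status`."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def scaling_done(status_output: str, expected_nodes: int) -> bool:
    """True when all expected nodes are listed and every one is Up/Normal."""
    states = node_states(status_output)
    return len(states) == expected_nodes and all(s == "UN" for s in states.values())


SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID  Rack
UN  10.0.0.1  1.2 MiB    256     ?     id-1     rack1
UN  10.0.0.2  1.1 MiB    256     ?     id-2     rack1
UJ  10.0.0.3  80.5 KiB   256     ?     id-3     rack1
"""

print(node_states(SAMPLE))      # {'10.0.0.1': 'UN', '10.0.0.2': 'UN', '10.0.0.3': 'UJ'}
print(scaling_done(SAMPLE, 3))  # False: one node is still joining
```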

@jazzl0ver
Collaborator

For future reference: I was able to fix the "create ...: VolumeDriver.Create: Create, GetServiceAttr error DB RecordNotFound req {... map[]}" error by deleting the task definition for that service and re-creating the service from scratch.
