Cassandra Tutorial 6: Setting up Cassandra Cluster in EC2 Part 2 Multi AZs with Ec2Snitch


Status: extremely rough draft; mostly just notes at this point.

Scope

Multi AZs, Ec2Snitch

Modifying CloudFormation

vpc.json - renamed subnetPrivate to subnetPrivate1, and added subnetPrivate2.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Setup VPC for Cassandra Cluster for Cassandra Database",

  "Outputs": {
...
    "subnetPrivate2Out": {
      "Description": "Subnet Private Id",
      "Value": {
        "Ref": "subnetPrivate2"
      },
      "Export": {
        "Name": {
          "Fn::Sub": "${AWS::StackName}-subnetPrivate2"
        }
      }
    },
...
  },
  "Resources": {
...
    "subnetPrivate2": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "CidrBlock": "10.0.2.0/24",
        "AvailabilityZone": "us-west-2b",
        "VpcId": {
          "Ref": "vpcMain"
        },
        "Tags": [
          {
            "Key": "cloudgen",
            "Value": "cassandra-test"
          },
          {
            "Key": "Name",
            "Value": "Private subnet2"
          }
        ]
      }
    },
...
    "subnetPrivate2AclAssociation": {
      "Type": "AWS::EC2::SubnetNetworkAclAssociation",
      "Properties": {
        "NetworkAclId": {
          "Ref": "networkACL"
        },
        "SubnetId": {
          "Ref": "subnetPrivate2"
        }
      }
    },
...
   "subnetPrivate2RouteTableAssociation": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "RouteTableId": {
          "Ref": "routeTablePrivate"
        },
        "SubnetId": {
          "Ref": "subnetPrivate"
          "Ref": "subnetPrivate2"
        }
      }
    },
...

Notice that this new private subnet is in AZ us-west-2b (subnet 1 is in AZ us-west-2a), and it has the CIDR block 10.0.2.0/24 (subnet 1 has the CIDR block 10.0.1.0/24).

vpc.json - subnetPrivate2 in different AvailabilityZone

    "subnetPrivate2": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "CidrBlock": "10.0.2.0/24",
        "AvailabilityZone": "us-west-2b",
        "VpcId": {
          "Ref": "vpcMain"
        },

Update the stack

update-vpc-cloudformation.sh

#!/usr/bin/env bash
set -e

source bin/ec2-env.sh

aws --region ${REGION} s3 cp cloud-formation/vpc.json s3://$CLOUD_FORMER_S3_BUCKET
aws --region ${REGION} cloudformation update-stack --stack-name ${ENV}-vpc-cassandra \
--template-url "https://s3-us-west-2.amazonaws.com/$CLOUD_FORMER_S3_BUCKET/vpc.json"

Running update-vpc-cloudformation.sh

bin/update-vpc-cloudformation.sh 
upload: cloud-formation/vpc.json to s3://cloudurable-cloudformer-templates/vpc.json
{
    "StackId": "arn:aws:cloudformation:us-west-2:821683928919:stack/dev-vpc-cassandra/39987f10-0303-11e7-b054-50a686be7356"
}

Side Note: Update failed

The update failed because we renamed subnetPrivate to subnetPrivate1, and then there was a conflict with the CIDR addresses. We went into the AWS CloudFormation console and deleted the stack manually. The moral of the story: if you plan to have multiple subnets (future subnets), get the naming right up front so you can update the stack (fast) instead of deleting and re-creating it (much slower).
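When a stack update fails like this, the stack events usually tell you why without clicking through the console. A minimal sketch (the jq filter is just illustrative):

aws --region us-west-2 cloudformation describe-stack-events --stack-name dev-vpc-cassandra \
  | jq -r '.StackEvents[] | select(.ResourceStatus | test("FAILED")) |
           .LogicalResourceId + ": " + (.ResourceStatusReason // "no reason given")'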

Side Note: Delete failed too

Deleting a CloudFormation stack that has a NAT gateway takes a long time. It is like watching a complete baseball game with no beer, and six extra innings. Also, I stop my instances at the end of a workday, and those stopped instances were preventing a security group or two from being deleted, which in turn prevented the VPC and subnets from being deleted. I had to go terminate the instances. Note that manually deleting a VPC through the AWS console sometimes gives more detailed error messages than deleting through the CloudFormation console. After attempting the VPC delete manually and fixing the problem of instances being stopped instead of terminated, I went back, ran delete from the CloudFormation console, and it worked.
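If a VPC delete hangs on security groups, it is usually instances that are still attached to them. Rather than hunting through the console, you could find and terminate them from the CLI; a rough sketch that leans on the Cluster=Cassandra tag set by our create script (the instance id below is a placeholder):

aws --region us-west-2 ec2 describe-instances \
  --filters "Name=tag:Cluster,Values=Cassandra" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[].Instances[].[InstanceId,State.Name,PrivateIpAddress]' --output text

# Terminate the offenders so the security groups, subnets, and VPC can be deleted.
aws --region us-west-2 ec2 terminate-instances --instance-ids i-0123456789abcdef0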

Creating the VPC with CloudFormation (recreate)

run-vpc-cloudformation.sh - create our cloudformation stack with an extra AZ

#!/usr/bin/env bash
set -e

source bin/ec2-env.sh

aws --region ${REGION} s3 cp cloud-formation/vpc.json s3://$CLOUD_FORMER_S3_BUCKET
aws --region ${REGION} cloudformation create-stack --stack-name ${ENV}-vpc-cassandra \
--template-url "https://s3-us-west-2.amazonaws.com/$CLOUD_FORMER_S3_BUCKET/vpc.json" 

Run run-vpc-cloudformation.sh to recreate the VPC

bin/run-vpc-cloudformation.sh 
upload: cloud-formation/vpc.json to s3://cloudurable-cloudformer-templates/vpc.json
{
    "StackId": "arn:aws:cloudformation:us-west-2:821683928919:stack/dev-vpc-cassandra/f40d2dc0-0379-11e7-8ebb-503aca41a035"
}

Get the CloudFormation output variables and edit bin/ec2-env.sh to match.

$ aws cloudformation describe-stacks --stack-name dev-vpc-cassandra | jq .Stacks[].Outputs[]
{
  "Description": "Subnet Private Id",
  "OutputKey": "subnetPrivate1Out",
  "OutputValue": "subnet-7bf75abc"
}
{
  "Description": "Subnet Private Id",
  "OutputKey": "subnetPrivate2Out",
  "OutputValue": "subnet-e2e56abc"
}
{
  "Description": "Subnet Public Id",
  "OutputKey": "subnetPublicOut",
  "OutputValue": "subnet-7af75abc"
}
{
  "Description": "Cassandra Database Node security group for Cassandra Cluster",
  "OutputKey": "securityGroupCassandraNodesOut",
  "OutputValue": "sg-48ffdabc"
}
{
  "Description": "Security Group Bastion for managing Cassandra Cluster Nodes with Ansible",
  "OutputKey": "securityGroupBastionOut",
  "OutputValue": "sg-4affdabc"
}
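
Rather than eyeballing the whole output block, you can pull out a single value with a jq select. For example, to grab just the new private subnet id (filter shown as a sketch):

$ aws cloudformation describe-stacks --stack-name dev-vpc-cassandra \
  | jq -r '.Stacks[].Outputs[] | select(.OutputKey == "subnetPrivate2Out") | .OutputValue'
subnet-e2e56abc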

Modify env to match.

bin/ec2-env.sh - Modify subnets and security groups to match

#!/bin/bash
set -e


export REGION=us-west-2
export ENV=dev
export KEY_PAIR_NAME="cloudurable-$REGION"
export PEM_FILE="${HOME}/.ssh/${KEY_PAIR_NAME}.pem"
export SUBNET_PUBLIC=subnet-7af75abc
export SUBNET_PRIVATE1=subnet-7bf75abc
export SUBNET_PRIVATE2=subnet-e2e56abc
export CLOUD_FORMER_S3_BUCKET=cloudurable-cloudformer-templates

export BASTION_NODE_SIZE=t2.small
export BASTION_SECURITY_GROUP=sg-4affdabc
export BASTION_AMI=ami-6db33abc
export BASTION_EC2_INSTANCE_NAME="bastion.${ENV}.${REGION}"
export BASTION_DNS_NAME="bastion.${ENV}.${REGION}.cloudurable.com."

export CASSANDRA_NODE_SIZE=m4.large
export CASSANDRA_AMI=ami-6db33abc
export CASSANDRA_SECURITY_GROUP=sg-48ffdabc
...

Notice that SUBNET_PRIVATE was renamed to SUBNET_PRIVATE1 and that we added SUBNET_PRIVATE2. Now it is just a matter of plugging in our CloudFormation output variables into the right locations.

Our bin/create-ec2-instance-cassandra.sh only handled one private subnet in one AZ. Now we need to change it to switch on AZ so we can deploy to subnetPrivate1 in AZ a (us-west-2a) or subnetPrivate2 in AZ b (us-west-2b). To do this AZ switch, we add an extra argument to our bash script as follows.

bin/create-ec2-instance-cassandra.sh - pass AZ

#!/bin/bash
set -e

source bin/ec2-env.sh


# Set the private IP to 10.0.1.10 (the seed node), if first arg empty
if [ -z "$1" ]
    then
        PRIVATE_IP_ADDRESS=10.0.1.10
    else
        PRIVATE_IP_ADDRESS=$1
fi

# Set the AZ to a if empty
if [ -z "$2" ]
    then
        AZ="a"
    else
        AZ=$2
fi

if [ "$AZ" == "a" ]
    then
        SUBNET_PRIVATE="$SUBNET_PRIVATE1"
    else
        SUBNET_PRIVATE="$SUBNET_PRIVATE2"
fi

instance_id=$(aws ec2 run-instances --image-id "$CASSANDRA_AMI" --subnet-id  "$SUBNET_PRIVATE" \
 --instance-type ${CASSANDRA_NODE_SIZE} --private-ip-address ${PRIVATE_IP_ADDRESS}  \
 --iam-instance-profile "Name=$CASSANDRA_IAM_PROFILE" \
 --security-group-ids "$CASSANDRA_SECURITY_GROUP" \
 --user-data file://resources/user-data/cassandra \
 --key-name "$KEY_PAIR_NAME" | jq --raw-output .Instances[].InstanceId)

## For debugging only...
#  --associate-public-ip-address ADD this param to run-instances if you add Cassandra to pub subnet

echo "Cassandra Database: Cassandra Cluster Node ${instance_id} is being created"

aws ec2 wait instance-exists --instance-ids "$instance_id"

aws ec2 create-tags --resources "${instance_id}" --tags Key=Name,Value="${CASSANDRA_EC2_INSTANCE_NAME}" \
Key=Cluster,Value="Cassandra" Key=Role,Value="Cassandra_Database_Cluster_Node" Key=Env,Value="DEV"

echo "Cassandra Node ${instance_id} was tagged waiting for status ready"

aws ec2 wait instance-status-ok --instance-ids "$instance_id"

Create the bastion and three nodes as follows.

Create bastion to DevOps/DBA Cassandra Cluster

 bin/create-ec2-instance-bastion.sh 
bastion i-067f664ebccf23e03 is being created
i-067f664ebccf23e03 was tagged waiting to login
IP ADDRESS 55.222.233.79 bastion.dev.us-west-2.cloudurable.com.

{
"Changes":[
    {
        "Action": "UPSERT",
        "ResourceRecordSet": {
                "Type": "A",
                "Name": "bastion.dev.us-west-2.cloudurable.com.",
                "TTL": 300,
                "ResourceRecords": [{
                    "Value": "55.222.233.79"
                }]
        }
    }
]
}

IP ADDRESS 55.222.233.79
The authenticity of host '55.222.233.79' can't be established.
ECDSA key fingerprint is SHA256:ABC123.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '55.222.233.79' (ECDSA) to the list of known hosts.

Create Cassandra Node in AZ A which is a Cassandra seed

bin/create-ec2-instance-cassandra.sh 
Cassandra Database: Cassandra Cluster Node i-0b351a03c58245abc is being created
Cassandra Node i-0b351a03c58245abc was tagged waiting for status ready

Notice we pass no arguments, which means the node goes into the default AZ a and uses the default IP address 10.0.1.10.

Create Cassandra Node in AWS AZ A which joins via the Cassandra Seed

bin/create-ec2-instance-cassandra.sh 10.0.1.11
Cassandra Database: Cassandra Cluster Node i-0d293603230169abc is being created
Cassandra Node i-0d293603230169abc was tagged waiting for status ready

Notice we pass one argument, which means the node goes into the default AZ a and uses IP address 10.0.1.11.

Lastly we launch the third server.

Create Cassandra Node in AWS AZ B which joins via the Cassandra Seed

$ bin/create-ec2-instance-cassandra.sh 10.0.2.10 b 
Cassandra Database: Cassandra Cluster Node i-0d00c98e0a2c2babc is being created
Cassandra Node i-0d00c98e0a2c2babc was tagged waiting for status ready

Notice we pass two arguments, which means the node goes into AZ b and uses IP address 10.0.2.10.
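
If you want to double-check where that instance actually landed, you can ask EC2 for its placement by private IP. A quick sketch with the AWS CLI; the output shown is what we expect given the env settings above:

$ aws ec2 describe-instances --filters "Name=private-ip-address,Values=10.0.2.10" \
  --query 'Reservations[].Instances[].[PrivateIpAddress,Placement.AvailabilityZone,SubnetId]' \
  --output text
10.0.2.10    us-west-2b    subnet-e2e56abc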

Let's verify that this is the case.

Verify that Cassandra nodes see each other and form a Cassandra Cluster

$ ssh cassandra.node0
[ansible@ip-10-0-1-10 ~]$ /opt/cassandra/bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.2.10  163.11 KiB  32           73.5%             a6a07305-7099-4693-ac36-65afcfaa13d6  rack1
UN  10.0.1.10  104.41 KiB  32           69.1%             c1ba5f75-3cf2-4ee2-a580-2d311a4b0275  rack1
UN  10.0.1.11  75.87 KiB  32           57.4%             60af9ad7-27b0-4cd7-a846-d0958a0a2dc4  rack1

Do you see a problem? They are all on rack1, even though the nodes are spread across two AZs. AZs should correspond to Cassandra racks.
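
The culprit is the snitch. You can confirm what a node is running with by checking its generated cassandra.yaml; the config path here is an assumption based on the /opt/cassandra install used throughout these tutorials:

[ansible@ip-10-0-1-10 ~]$ grep endpoint_snitch /opt/cassandra/conf/cassandra.yaml
endpoint_snitch: SimpleSnitch

SimpleSnitch ignores topology entirely, which is why every node reports datacenter1/rack1. We fix this with Ec2Snitch below.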

Checking with ansible

inventory.ini - add 10.0.2.10 to inventory list

[cassandra-nodes]
cassandra.node0
10.0.1.11
10.0.2.10

running ansible playbook

$ ansible-playbook playbooks/describe-cluster.yml --verbose
Using /Users/jean/github/cassandra-image/ansible.cfg as config file

PLAY [cassandra-nodes] *********************************************************

TASK [Run NodeTool Describe Cluster command] ***********************************
changed: [10.0.1.11] => {"changed": true, "cmd": ["/opt/cassandra/bin/nodetool", "describecluster"],
...
 [10.0.1.10, 10.0.2.10, 10.0.1.11]", 

...{"changed": true, "cmd": ["/opt/cassandra/bin/nodetool", "describecluster"], ...
 [10.0.2.10, 10.0.1.10, 10.0.1.11]", ...
fatal: [10.0.2.10]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
        to retry, use: --limit @playbooks/describe-cluster.retry

PLAY RECAP *********************************************************************
10.0.1.11                  : ok=1    changed=1    unreachable=0    failed=0   
10.0.2.10                  : ok=0    changed=0    unreachable=1    failed=0   
cassandra.node0            : ok=1    changed=1    unreachable=0    failed=0

OK. The cluster looks set up, but we can't seem to ssh into 10.0.2.10.

testing ssh

$ ssh 10.0.1.11 
Last login: Tue Mar  7 22:13:15 2017 from ip-10-0-0-62.us-west-2.compute.internal
[ansible@ip-10-0-1-11 ~]$ exit
logout
Shared connection to 10.0.1.11 closed.

$ ssh 10.0.2.10 
ssh: connect to host 10.0.2.10 port 22: Operation timed out

OK, so we can log into 10.0.1.11 but not 10.0.2.10.

To fix this we have to modify our ssh config as follows.

~/.ssh/config and/or ${projectDir}/ssh/ssh.config - entry per subnet

Host 10.0.2.*
    ForwardAgent yes
    IdentityFile ~/.ssh/test_rsa
    ProxyCommand ssh bastion  -W  %h:%p
    User ansible
    ControlMaster auto
    ControlPath ~/.ssh/ansible-%r@%h:%p
    ControlPersist 5m

Host 10.0.1.*
    ForwardAgent yes
    IdentityFile ~/.ssh/test_rsa
    ProxyCommand ssh bastion  -W  %h:%p
    User ansible
    ControlMaster auto
    ControlPath ~/.ssh/ansible-%r@%h:%p
    ControlPersist 5m

Or you can do this

~/.ssh/config and/or ${projectDir}/ssh/ssh.config - entry for all subnets

Host 10.0.*
    ForwardAgent yes
    IdentityFile ~/.ssh/test_rsa
    ProxyCommand ssh bastion  -W  %h:%p
    User ansible
    ControlMaster auto
    ControlPath ~/.ssh/ansible-%r@%h:%p
    ControlPersist 5m

Both work, but with the first approach you have to remember to add another entry whenever you add an AZ, while the second covers any future subnets automatically. I still prefer the first approach.
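
Either way, you can check which Host block (and therefore which ProxyCommand) a given address resolves to without actually connecting, using ssh's config dump (available in reasonably recent versions of OpenSSH):

$ ssh -G 10.0.2.10 | grep -i proxycommand

It should print the bastion ProxyCommand from whichever Host entry matched.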

Now let's retest.

run ansible playbook - playbooks/describe-cluster.yml

$ ansible-playbook playbooks/describe-cluster.yml 

PLAY [cassandra-nodes] *********************************************************

TASK [Run NodeTool Describe Cluster command] ***********************************
changed: [10.0.1.11]
changed: [cassandra.node0]
changed: [10.0.2.10]

PLAY RECAP *********************************************************************
10.0.1.11                  : ok=1    changed=1    unreachable=0    failed=0   
10.0.2.10                  : ok=1    changed=1    unreachable=0    failed=0   
cassandra.node0            : ok=1    changed=1    unreachable=0    failed=0   

Ansible can now reach all of the nodes.

Using Ec2Snitch

The cassandra-cloud utility that we used in the last tutorial has an option called -snitch. The -snitch option lets you specify the Cassandra snitch type, for example GossipingPropertyFileSnitch, PropertyFileSnitch, Ec2Snitch, etc. The cassandra-cloud utility defaults to SimpleSnitch. If we instead specify Ec2Snitch, Cassandra reads the node's placement from the EC2 instance metadata and treats the AWS region as the Cassandra datacenter and the AWS AZ as the Cassandra rack.
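
You can see the raw placement value that Ec2Snitch works from by hitting the EC2 metadata service from any node; for a node in subnetPrivate2 this should return us-west-2b:

# On any Cassandra node: the AZ that Ec2Snitch derives the rack and datacenter from.
curl http://169.254.169.254/latest/meta-data/placement/availability-zone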

resources/user-data/cassandra - modify user-data for EC2 instances

#!/bin/bash
set -e

export BIND_IP=`curl http://169.254.169.254/latest/meta-data/local-ipv4`

/opt/cloudurable/bin/cassandra-cloud -cluster-name test \
                -client-address  ${BIND_IP} \
                -cluster-address  ${BIND_IP} \
                -cluster-seeds 10.0.1.10,10.0.2.10 \
                -snitch Ec2Snitch


/bin/systemctl restart  cassandra

Notice we are now passing Ec2Snitch as the snitch, and we also added 10.0.2.10 as a seed server.

Now let's start four servers.

bin/create-ec2-entire-cassandra-cluster-instancs.sh

# Create Cassandra seed nodes in AWS AZ a and AWS AZ b.
bin/create-ec2-instance-cassandra.sh 10.0.1.10 a
bin/create-ec2-instance-cassandra.sh 10.0.2.10 b


# Create two more Cassandra database servers in AZ a and b.
bin/create-ec2-instance-cassandra.sh 10.0.1.11 a
bin/create-ec2-instance-cassandra.sh 10.0.2.11 b

We are creating two seed servers (10.0.1.10 and 10.0.2.10) in two different AZs. For real production applications, try to use at least three AZs with at least three servers per AZ, and have at least one seed node per AZ so an AZ outage will not prevent you from adding nodes to your Cassandra cluster.
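
Once the new nodes come up, the quick config check from earlier should now show the Ec2Snitch and both seed addresses; the output below is approximate, and the config path remains an assumption:

[ansible@ip-10-0-1-10 ~]$ grep -E 'endpoint_snitch|- seeds' /opt/cassandra/conf/cassandra.yaml
          - seeds: "10.0.1.10,10.0.2.10"
endpoint_snitch: Ec2Snitch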

Verify that the Cassandra cluster is up with an Ansible ad hoc command

$ ansible cassandra.node0 -a "/opt/cassandra/bin/nodetool status"
cassandra.node0 | SUCCESS | rc=0 >>
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.10  88.08 KiB  32           45.9%             d4a0d593-9dac-40c8-8d0d-9146a923e9fe  2a
UN  10.0.2.10  107.21 KiB 32           50.1%             c8f0d0e8-b4de-44ce-b9a2-b8ecc1781484  2b
UN  10.0.2.11  95.09 KiB  32           43.7%             c3dabbfd-b53d-48cc-a408-6cb931935535  2b
UN  10.0.1.11  81.01 KiB  32           60.3%             2c4638f6-4d1c-4c1c-82d1-df1efce116e0  2a

The above runs /opt/cassandra/bin/nodetool status on the Cassandra seed node for AZ a (us-west-2a). Notice that the AWS region (us-west-2) becomes a Cassandra datacenter. Also notice that there are two racks corresponding to the two AZs we launched our Cassandra nodes into, namely 2a and 2b, which correspond to the AWS AZs us-west-2a and us-west-2b. The Ec2Snitch maps Cassandra datacenters to AWS regions and Cassandra racks to AWS AZs.
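
The practical payoff is that keyspaces can now replicate per datacenter, and with Ec2Snitch in place Cassandra spreads those replicas across racks, i.e. across AZs, where it can. A minimal sketch, assuming cqlsh lives in the same /opt/cassandra install (the keyspace name is just an example):

# Replicate 3 copies in the us-west-2 datacenter; replicas get spread across the 2a/2b racks.
/opt/cassandra/bin/cqlsh 10.0.1.10 -e "CREATE KEYSPACE IF NOT EXISTS demo \
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us-west-2': 3};"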

Just to be sure, let's add 10.0.2.11 to the inventory.ini under cassandra-nodes and run the status check on all of the servers using ansible.

Use ansible to run "nodetool status" on all Cassandra Database nodes

10.0.1.11 | SUCCESS | rc=0 >>
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.10  88.08 KiB  32           45.9%             d4a0d593-9dac-40c8-8d0d-9146a923e9fe  2a
UN  10.0.2.10  107.21 KiB 32           50.1%             c8f0d0e8-b4de-44ce-b9a2-b8ecc1781484  2b
UN  10.0.2.11  95.09 KiB  32           43.7%             c3dabbfd-b53d-48cc-a408-6cb931935535  2b
UN  10.0.1.11  81.01 KiB  32           60.3%             2c4638f6-4d1c-4c1c-82d1-df1efce116e0  2a

10.0.2.10 | SUCCESS | rc=0 >>
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.10  88.08 KiB  32           45.9%             d4a0d593-9dac-40c8-8d0d-9146a923e9fe  2a
UN  10.0.2.10  107.21 KiB 32           50.1%             c8f0d0e8-b4de-44ce-b9a2-b8ecc1781484  2b
UN  10.0.2.11  95.09 KiB  32           43.7%             c3dabbfd-b53d-48cc-a408-6cb931935535  2b
UN  10.0.1.11  81.01 KiB  32           60.3%             2c4638f6-4d1c-4c1c-82d1-df1efce116e0  2a

cassandra.node0 | SUCCESS | rc=0 >>
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.10  88.08 KiB  32           45.9%             d4a0d593-9dac-40c8-8d0d-9146a923e9fe  2a
UN  10.0.2.10  107.21 KiB 32           50.1%             c8f0d0e8-b4de-44ce-b9a2-b8ecc1781484  2b
UN  10.0.2.11  95.09 KiB  32           43.7%             c3dabbfd-b53d-48cc-a408-6cb931935535  2b
UN  10.0.1.11  81.01 KiB  32           60.3%             2c4638f6-4d1c-4c1c-82d1-df1efce116e0  2a

10.0.2.11 | SUCCESS | rc=0 >>
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.10  88.08 KiB  32           45.9%             d4a0d593-9dac-40c8-8d0d-9146a923e9fe  2a
UN  10.0.2.10  107.21 KiB 32           50.1%             c8f0d0e8-b4de-44ce-b9a2-b8ecc1781484  2b
UN  10.0.2.11  95.09 KiB  32           43.7%             c3dabbfd-b53d-48cc-a408-6cb931935535  2b
UN  10.0.1.11  81.01 KiB  32           60.3%             2c4638f6-4d1c-4c1c-82d1-df1efce116e0  2a

Looks like they all agree on the status of the Cassandra cluster. We can see that each Cassandra database node is up and in the normal state.

About us

Cloudurable™ streamlines DevOps for Cassandra running on AWS. We provide AMIs, CloudWatch monitoring, CloudFormation templates, and monitoring tools to support Cassandra running in production on EC2. We also teach advanced Cassandra courses covering how to develop, support, and deploy Cassandra to production in AWS EC2, aimed at developers and DevOps.

More info

Please take some time to read the Advantage of using Cloudurable™.

Cloudurable provides:

AWS Cassandra Deployment Guides

Cassandra Cluster/DevOps/DBA tutorial

Kafka training, Kafka consulting, Cassandra training, Cassandra consulting, Spark training, Spark consulting
