Apache Cassandra is a database management system that replicates large amounts of data across many servers, avoiding a single point of failure and reducing latency.
A Kubernetes StatefulSet manages all Cassandra Pods in this application. Each Pod runs a single instance of Cassandra.
All Pods are behind a Service object. This Cassandra application is meant to be an internal database, and access to the Cassandra instances is not authenticated by default. The Cassandra service is also not exposed to external traffic. If you want to expose Cassandra outside your cluster, you must configure authentication and other layers of protection, such as firewalls.
You can use the Cassandra Service to discover the current number of Pods and their addresses.
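For example, once the app is deployed and the environment variables used later in this guide are set, a minimal sketch of service discovery looks like this (assuming the headless Service name `$APP_INSTANCE_NAME-cassandra-svc` used throughout this guide):

```shell
# List the Pod addresses currently registered behind the headless Service.
kubectl get endpoints "${APP_INSTANCE_NAME}-cassandra-svc" \
  --namespace "${NAMESPACE}"
```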
Get up and running with a few clicks! Install this Cassandra app to a Google Kubernetes Engine cluster using Google Cloud Marketplace. Follow the on-screen instructions.
You can use Google Cloud Shell or a local workstation to complete the steps below.
You'll need the following tools in your development environment. If you are using Cloud Shell, `gcloud`, `kubectl`, Docker, and Git are installed in your environment by default.
Configure `gcloud` as a Docker credential helper:

```shell
gcloud auth configure-docker
```
Create a new cluster from the command line:

```shell
export CLUSTER=cassandra-cluster
export ZONE=us-west1-a

gcloud container clusters create "$CLUSTER" --zone "$ZONE"
```
Configure `kubectl` to connect to the new cluster:

```shell
gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE"
```
Clone this repo and the associated tools repo:

```shell
git clone --recursive https://github.com/GoogleCloudPlatform/click-to-deploy.git
```
An Application resource is a collection of individual Kubernetes components, such as Services, Deployments, and so on, that you can manage as a group.
To set up your cluster to understand Application resources, run the following command:

```shell
kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"
```

You only need to run this command once per cluster.
The Application resource is defined by the Kubernetes SIG-apps community. The source code can be found on github.com/kubernetes-sigs/application.
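To confirm that the CRD was installed (a quick optional check, not part of the official steps):

```shell
# The Application CRD is cluster-scoped; this should list it once installed.
kubectl get crd applications.app.k8s.io
```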
Navigate to the `cassandra` directory:

```shell
cd click-to-deploy/k8s/cassandra
```
Choose an instance name and namespace for the app. In most cases, you can use the `default` namespace.

```shell
export APP_INSTANCE_NAME=cassandra-1
export NAMESPACE=default
```
Set the number of replicas for Cassandra:

```shell
# A single-node Cassandra cluster is a single point of failure.
# For production environments, use at least 3 replicas.
export REPLICAS=3
```
For the persistent disk provisioning of the Cassandra StatefulSet, you will need to:

- Set the StorageClass name. Check your available options using the command below:

  ```shell
  kubectl get storageclass
  ```

  Or check how to create a new StorageClass in the Kubernetes documentation.

- Set the persistent disk's size. The default disk size is "5Gi".

```shell
export DEFAULT_STORAGE_CLASS="standard" # provide your StorageClass name if not "standard"
export PERSISTENT_DISK_SIZE="5Gi"
```
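Optionally, you can verify that the chosen StorageClass actually exists before deploying (a simple sanity check, not part of the official steps):

```shell
# Fails with NotFound if the StorageClass is missing from the cluster.
kubectl get storageclass "${DEFAULT_STORAGE_CLASS}"
```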
Enable Stackdriver Metrics Exporter:

NOTE: Your GCP project must have Stackdriver enabled. If you are using a non-GCP cluster, you cannot export metrics to Stackdriver.

By default, the application does not export metrics to Stackdriver. To enable this option, change the value to `true`.

```shell
export METRICS_EXPORTER_ENABLED=false
```
Set up the image tag:

It is advised to use a stable image reference, which you can find on the Marketplace Container Registry. Example:

```shell
export TAG="4.1"
```

Configure the container images:

```shell
export IMAGE_CASSANDRA="marketplace.gcr.io/google/cassandra4"
export IMAGE_METRICS_EXPORTER="marketplace.gcr.io/google/cassandra/prometheus-to-sd:${TAG}"
```
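If you want to confirm that the tag resolves before deploying (optional; this assumes the Docker credential helper configured earlier):

```shell
# Pull the image locally to verify that the reference and tag exist.
docker pull "${IMAGE_CASSANDRA}:${TAG}"
```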
If you use a different namespace than `default`, run the command below to create a new namespace:

```shell
kubectl create namespace "$NAMESPACE"
```
Use `helm template` to expand the template. We recommend that you save the expanded manifest file for future updates to the application.

```shell
helm template chart/cassandra \
  --name "${APP_INSTANCE_NAME}" \
  --namespace "${NAMESPACE}" \
  --set cassandra.image.repo="${IMAGE_CASSANDRA}" \
  --set cassandra.image.tag="${TAG}" \
  --set cassandra.replicas="${REPLICAS}" \
  --set cassandra.persistence.storageClass="${DEFAULT_STORAGE_CLASS}" \
  --set cassandra.persistence.size="${PERSISTENT_DISK_SIZE}" \
  --set metrics.image="${IMAGE_METRICS_EXPORTER}" \
  --set metrics.exporter.enabled="${METRICS_EXPORTER_ENABLED}" \
  > "${APP_INSTANCE_NAME}_manifest.yaml"
```
Use `kubectl` to apply the manifest to your Kubernetes cluster:

```shell
kubectl apply -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
```
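The StatefulSet brings the Cassandra Pods up one at a time. You can watch the rollout with a command like the following (an optional check, not part of the official steps):

```shell
# Watch until all $REPLICAS Pods report STATUS=Running and READY=1/1.
kubectl get pods --namespace "${NAMESPACE}" \
  --selector app.kubernetes.io/name="${APP_INSTANCE_NAME}" --watch
```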
To get the GCP Console URL for your app, run the following command:

```shell
echo "https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"
```

To view your app, open the URL in your browser.
If your deployment is successful, you can check the status of your Cassandra cluster. On one of the Cassandra containers, run the `nodetool status` command. `nodetool` is a Cassandra utility for managing a cluster; it is part of the Cassandra container image.

```shell
kubectl exec "${APP_INSTANCE_NAME}-cassandra-0" --namespace "${NAMESPACE}" -c cassandra -- nodetool status
```
You can connect to the Cassandra service without exposing your cluster for external access, using the following options:

- From a container in your Kubernetes cluster, connect using the hostname
  `$APP_INSTANCE_NAME-cassandra-0.$APP_INSTANCE_NAME-cassandra-svc.$NAMESPACE.svc.cluster.local`

- Use port forwarding to access the service. In a separate terminal, run the following command:

  ```shell
  kubectl port-forward "${APP_INSTANCE_NAME}-cassandra-0" 9042:9042 --namespace "${NAMESPACE}"
  ```

  Then, in your main terminal, start `cqlsh`:

  ```shell
  cqlsh --cqlversion=3.4.5
  ```

  In the response, you see the Cassandra welcome message:

  ```shell
  Use HELP for help.
  cqlsh>
  ```
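As a quick smoke test over the port-forwarded connection, you can create a keyspace. This is only a sketch: the `demo` keyspace name is an example, chosen to match the backup examples later in this guide, and the replication factor of 3 assumes the 3-replica cluster configured above.

```shell
# Create an example keyspace and confirm that it is listed.
cqlsh --cqlversion=3.4.5 -e "
  CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
  DESCRIBE KEYSPACES;"
```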
By default, the application does not have an external IP address.
If you want to expose your Cassandra cluster using an external IP address, first configure access control.
To configure Cassandra as an external service, run the following commands. Note that `envsubst` cannot read and write the same file (the shell truncates the output file before it is read), so the expanded manifest is written to a separate file:

```shell
envsubst '${APP_INSTANCE_NAME}' < scripts/external.yaml > "${APP_INSTANCE_NAME}_external.yaml"

kubectl apply -f "${APP_INSTANCE_NAME}_external.yaml" --namespace "${NAMESPACE}"
```
An external IP address is provisioned for the Service. It might take a few minutes for the IP address to be available.
Get the external IP address of the Cassandra service using the following command:

```shell
CASSANDRA_IP=$(kubectl get svc "${APP_INSTANCE_NAME}-cassandra-external-svc" \
  --namespace "${NAMESPACE}" \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo "$CASSANDRA_IP"
```
Connect `cqlsh` to the external IP address, using the following command:

```shell
CQLSH_HOST="$CASSANDRA_IP" cqlsh --cqlversion=3.4.5
```
The application is configured to expose its metrics through JMX Exporter in the Prometheus format. For more detailed information on setting up the plugin, see the JMX Exporter documentation.
You can access the metrics at `[POD_IP]:9404/metrics`, where `[POD_IP]` is the IP address of one of the Pods behind the Kubernetes headless Service `$APP_INSTANCE_NAME-cassandra-svc`.
Prometheus can be configured to automatically collect the application's metrics. Follow the steps in Configuring Prometheus. You configure the metrics in the `scrape_configs` section.
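For illustration, a minimal `scrape_configs` entry might look like the following sketch. The job name is an assumption, and `[POD_IP]` is the same placeholder used above; substitute the actual Pod IPs resolved through the headless Service:

```shell
# Append an illustrative scrape job to a Prometheus config file.
# Replace [POD_IP] with the real Pod IPs behind the headless Service.
cat >> prometheus.yml <<'EOF'
scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets: ['[POD_IP]:9404']
EOF
```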
The deployment includes a Prometheus to Stackdriver (`prometheus-to-sd`) container. If you enabled the option to export metrics to Stackdriver, the metrics are automatically exported to Stackdriver and visible in Stackdriver Metrics Explorer. The name of each metric starts with the application's name, which you define in the `APP_INSTANCE_NAME` environment variable.
The exporting option might not be available for GKE on-prem clusters.
Note: Stackdriver has quotas for the number of custom metrics created in a single GCP project. If the quota is met, additional metrics might not show up in the Stackdriver Metrics Explorer.
You can remove existing metric descriptors using Stackdriver's REST API.
By default, the Cassandra app is deployed using 3 replicas. To change the number of replicas, use the following command:

```shell
kubectl scale statefulsets "${APP_INSTANCE_NAME}-cassandra" \
  --namespace "${NAMESPACE}" --replicas=[NEW_REPLICAS]
```

where `[NEW_REPLICAS]` is the new number of replicas.
To scale down the number of replicas, use `scripts/scale_down.sh`, or manually scale down the cluster.
To manually remove Cassandra nodes from your cluster, and then remove Pods from Kubernetes, start from the highest-numbered Pod. For each node, do the following (see the sketch after this list):

- On the Cassandra container, run the `nodetool decommission` command.
- Scale down the StatefulSet by one, using the `kubectl scale sts` command.
- Wait until the Pod is removed from the cluster.
- Remove any PersistentVolumes and PersistentVolumeClaims that belong to that replica.

Repeat these steps until the Cassandra cluster has the number of Pods that you want.
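A minimal sketch of one iteration, assuming the StatefulSet currently runs 4 replicas (so Pod index 3 is the highest-numbered) and that the PersistentVolumeClaim follows the usual StatefulSet naming pattern; check `kubectl get pvc` for the actual claim names:

```shell
# 1. Decommission the highest-numbered node so it streams its data to the
#    remaining nodes before leaving the ring.
kubectl exec "${APP_INSTANCE_NAME}-cassandra-3" --namespace "${NAMESPACE}" \
  -c cassandra -- nodetool decommission

# 2. Scale the StatefulSet down by one; Kubernetes terminates that Pod.
kubectl scale sts "${APP_INSTANCE_NAME}-cassandra" \
  --namespace "${NAMESPACE}" --replicas=3

# 3. Remove the claim that belonged to the removed replica. The claim name
#    here is an assumption based on the usual <template>-<pod> pattern.
kubectl delete pvc "cassandra-data-${APP_INSTANCE_NAME}-cassandra-3" \
  --namespace "${NAMESPACE}"
```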
To scale down using the script, run the following command:

```shell
scripts/scale_down.sh --desired_number 3 \
  --namespace "${NAMESPACE}" \
  --app_instance_name "${APP_INSTANCE_NAME}"
```
For more information about scaling StatefulSets, see the Kubernetes documentation.
These steps back up your Cassandra data, database schema, and token information.
Set your installation name and Kubernetes namespace:

```shell
export APP_INSTANCE_NAME=cassandra-1
export NAMESPACE=default
```
The script `scripts/backup.sh` does the following:

- Uploads the `make_backup.sh` script to each container.
- Runs the script to create a backup package, using the `nodetool snapshot` command.
- Downloads the backup to your machine.

After you run the script, the `backup-$NODENUMBER.tar.gz` file contains the backup for each node.
Run the script using the following options:

```shell
scripts/backup.sh --keyspace demo \
  --namespace "${NAMESPACE}" \
  --app_instance_name "${APP_INSTANCE_NAME}"
```

This script generates one backup file for each Cassandra node. For the whole cluster, one schema and one token ring are backed up.
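After the script finishes, you can verify the archives (one per node). This sketch assumes they are written to the current working directory:

```shell
# List the per-node backup archives produced by the script.
ls -lh backup-*.tar.gz
```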
Set your installation name and Kubernetes namespace:

```shell
export APP_INSTANCE_NAME=cassandra-1
export NAMESPACE=default
```
To restore Cassandra, you use the `sstableloader` tool. The restore process is automated in `scripts/restore.sh`. Your source and destination clusters can have a different number of nodes.
In the directory that contains your backup files, run the restore script:

```shell
scripts/restore.sh --keyspace demo \
  --namespace "${NAMESPACE}" \
  --app_instance_name "${APP_INSTANCE_NAME}"
```
The script recreates the schema and uploads data to your cluster.
For background information on rolling updates for Cassandra, see the Upgrade Guide.
Before updating, we recommend backing up your data.
Set your installation name and Kubernetes namespace:

```shell
export APP_INSTANCE_NAME=cassandra-1
export NAMESPACE=default
```
Assign the new image to your StatefulSet definition:

```shell
IMAGE_CASSANDRA=[NEW_IMAGE_REFERENCE]

kubectl set image statefulset "${APP_INSTANCE_NAME}-cassandra" \
  --namespace "${NAMESPACE}" "cassandra=${IMAGE_CASSANDRA}"
```
After this operation, the StatefulSet has a new image configured for its containers. However, because of the StatefulSet's OnDelete update strategy, the Pods are not restarted automatically.
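For example, with OnDelete, a Pod only picks up the new image once it is deleted and recreated. The sketch below assumes a 3-replica cluster, so index 2 is the highest-numbered Pod:

```shell
# Deleting the Pod makes the StatefulSet recreate it with the new image.
kubectl delete pod "${APP_INSTANCE_NAME}-cassandra-2" --namespace "${NAMESPACE}"

# Wait for the recreated Pod to become Ready before updating the next one.
kubectl wait --for=condition=Ready "pod/${APP_INSTANCE_NAME}-cassandra-2" \
  --namespace "${NAMESPACE}" --timeout=300s
```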
To start the rolling update, run the `scripts/upgrade.sh` script. The script takes down and updates one replica at a time.

```shell
scripts/upgrade.sh --namespace "${NAMESPACE}" \
  --app_instance_name "${APP_INSTANCE_NAME}"
```
- In the GCP Console, open Kubernetes Applications.
- From the list of applications, click Cassandra.
- On the Application Details page, click Delete.
Set your installation name and Kubernetes namespace:

```shell
export APP_INSTANCE_NAME=cassandra-1
export NAMESPACE=default
```
NOTE: We recommend using a `kubectl` version that is the same as the version of your cluster. Using the same versions of `kubectl` and the cluster helps avoid unforeseen issues.
To delete the resources, use the expanded manifest file used for the installation.

Run `kubectl` on the expanded manifest file:

```shell
kubectl delete -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
```
If you don't have the expanded manifest, delete the resources using types and a label:

```shell
kubectl delete application,statefulset,service \
  --namespace "${NAMESPACE}" \
  --selector app.kubernetes.io/name="${APP_INSTANCE_NAME}"
```
By design, the removal of StatefulSets in Kubernetes does not remove PersistentVolumeClaims that were attached to their Pods. This prevents your installations from accidentally deleting stateful data.
To remove the PersistentVolumeClaims with their attached persistent disks, run the following `kubectl` commands:

```shell
for pv in $(kubectl get pvc --namespace "${NAMESPACE}" \
  --selector app.kubernetes.io/name="${APP_INSTANCE_NAME}" \
  --output jsonpath='{.items[*].spec.volumeName}');
do
  kubectl delete pv/$pv --namespace "${NAMESPACE}"
done

kubectl delete persistentvolumeclaims \
  --namespace "${NAMESPACE}" \
  --selector app.kubernetes.io/name="${APP_INSTANCE_NAME}"
```
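To confirm that no claims remain for this installation (an optional check):

```shell
# Should return "No resources found" once cleanup is complete.
kubectl get pvc --namespace "${NAMESPACE}" \
  --selector app.kubernetes.io/name="${APP_INSTANCE_NAME}"
```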
Optionally, if you don't need the deployed application or the GKE cluster, delete the cluster using this command:

```shell
gcloud container clusters delete "$CLUSTER" --zone "$ZONE"
```