Cassandra backup/restore #29

Open
jazzl0ver opened this issue Feb 8, 2018 · 9 comments
Comments

@jazzl0ver
Collaborator

jazzl0ver commented Feb 8, 2018

I'd just like to discuss ideas on the best way to get this implemented. I thought about integrating Netflix's Priam, but it doesn't seem to work as a backup/restore-only solution.
Another cool tool is https://github.com/pearsontechnology/cassandra_snap. However, it needs SSH access to each instance and requires listing all the nodes to back up, rather than figuring them out automatically.
What are your thoughts?

@JuniusLuo
Contributor

Both Priam and cassandra_snap leverage Cassandra snapshots. See Priam Backup.
We may be able to run Priam as a sidecar container alongside the service container. The sidecar container could access the same volume as the service container. AWS ECS supports multiple containers in one TaskDefinition, and a Kubernetes Pod can also run multiple containers, so the sidecar approach would work for both AWS ECS and Kubernetes. Docker Swarm, however, does not currently have this ability. Some prototyping will be required to see whether Priam works as a sidecar container. Of course, developing our own sidecar container is another option.
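For ECS, a rough sketch of what registering such a TaskDefinition could look like, with the service container and a backup sidecar sharing one data volume (image names, paths, and sizes below are just placeholders, not the actual FireCamp definitions):

```python
# Sketch only: an ECS TaskDefinition with the Cassandra container plus a backup
# sidecar mounting the same data volume. All names/images/paths are placeholders.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="firecamp-cassandra",
    volumes=[{"name": "cassandra-data", "host": {"sourcePath": "/mnt/cassandra"}}],
    containerDefinitions=[
        {
            "name": "cassandra",
            "image": "cloudstax/firecamp-cassandra:latest",  # placeholder image
            "memory": 4096,
            "essential": True,
            "mountPoints": [
                {"sourceVolume": "cassandra-data", "containerPath": "/var/lib/cassandra"}
            ],
        },
        {
            "name": "backup-sidecar",
            "image": "example/cassandra-backup-sidecar:latest",  # hypothetical image
            "memory": 512,
            "essential": False,
            "mountPoints": [
                {"sourceVolume": "cassandra-data",
                 "containerPath": "/var/lib/cassandra",
                 "readOnly": True}
            ],
        },
    ],
)
```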

We could also consider leveraging EBS snapshots. FireCamp Cassandra enables remote JMX, so we could run one or more job containers that use nodetool to connect to the Cassandra containers and flush the memtables to disk. After the flush finishes for all replicas, another job container (or containers) could take snapshots of the EBS volumes. Like snapshotting every replica directly, this is only eventually consistent across replicas, and relies on Cassandra's built-in consistency mechanisms to restore consistency from the restored snapshots.
AWS EBS Snapshots are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved.
The restore would take only 3 steps: 1) stop all containers, 2) restore the EBS volumes from the snapshots, 3) start the containers.
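
A minimal sketch of such a flush-then-snapshot job, assuming nodetool and boto3 are available in the job container (the hostnames, JMX port, and volume IDs below are placeholders):

```python
# Sketch of the flush + EBS-snapshot job. Hosts and volume IDs are placeholders.
import subprocess
import boto3

CASSANDRA_HOSTS = ["cass-0.example.internal", "cass-1.example.internal"]  # placeholders
JMX_PORT = "7199"
VOLUME_IDS = ["vol-0123456789abcdef0", "vol-0fedcba9876543210"]  # placeholders

ec2 = boto3.client("ec2")

# 1) Flush memtables to SSTables on every replica via remote JMX.
for host in CASSANDRA_HOSTS:
    subprocess.run(["nodetool", "-h", host, "-p", JMX_PORT, "flush"], check=True)

# 2) Take an (incremental) EBS snapshot of each data volume, tagged for later lookup.
snapshot_ids = []
for vol in VOLUME_IDS:
    resp = ec2.create_snapshot(
        VolumeId=vol,
        Description="firecamp cassandra backup",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "firecamp-service", "Value": "cassandra"}],
        }],
    )
    snapshot_ids.append(resp["SnapshotId"])

print("created snapshots:", snapshot_ids)
```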

@jazzl0ver
Collaborator Author

Thanks for sharing your thoughts!

I might be wrong, but according to Netflix/Priam#649, Priam can't be used just as a backup solution.

What do you think of having a cronjob on each C* node which would launch nodetool snapshot followed by aws ec2 create-snapshot and aws ec2 delete-snapshot (for old snapshots) on that node's volumes? The job could be created/altered/deleted by a command to firecamp-manageserver. Besides the backup time, we might set snapshot tags (with, for example, the node information), retention time, an email or SNS topic for alerts in case of issues, etc.
It would be great to automate the restore via a command to the manage server as well, given the time of an available backup. A list command could display the available recovery points (based on snapshot tags and datetime).
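
Something along these lines could cover the retention and recovery-point listing parts, assuming the snapshots carry a service tag as above (the tag key and retention value are placeholders):

```python
# Sketch of retention pruning and listing recovery points by snapshot tag.
import datetime
import boto3

RETENTION_DAYS = 7  # placeholder retention window
ec2 = boto3.client("ec2")

snaps = ec2.describe_snapshots(
    OwnerIds=["self"],
    Filters=[{"Name": "tag:firecamp-service", "Values": ["cassandra"]}],
)["Snapshots"]

# List the available recovery points, newest first.
for s in sorted(snaps, key=lambda s: s["StartTime"], reverse=True):
    print(s["SnapshotId"], s["VolumeId"], s["StartTime"].isoformat())

# Delete snapshots older than the retention window.
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=RETENTION_DAYS)
for s in snaps:
    if s["StartTime"] < cutoff:
        ec2.delete_snapshot(SnapshotId=s["SnapshotId"])
```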

Having this implemented would also simplify launching a new C* cluster from an existing backup.

@JuniusLuo
Contributor

Yes, Priam is more than backup/recovery. I didn't check the detailed design/implementation, but as you posted, it might not be possible to use only the backup function.

The cronjob may not be the best option. The nodes in one cluster may run multiple services, and different services will have different backup requirements, so the cronjob would end up handling all services. It would be better to use a job container, which could be triggered on demand. The job container could launch nodetool flush and then call the AWS API to create the EBS snapshots. Every service could have its own job container.
We should use nodetool flush instead of nodetool snapshot. nodetool snapshot creates hardlinks to the SSTables; if you don't delete the hardlinks yourself, the SSTables will never be deleted and the disk will fill up.
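For reference, if nodetool snapshot were used anyway, the hardlinks would have to be cleared explicitly after copying the files off the node, roughly like this (the host and snapshot tag are placeholders):

```python
# If nodetool snapshot is used, the snapshot hardlinks need explicit cleanup
# afterwards, or the disk fills up. Host and tag below are placeholders.
import subprocess

host, tag = "cass-0.example.internal", "backup-20180208"

subprocess.run(["nodetool", "-h", host, "snapshot", "-t", tag], check=True)
# ... copy the snapshot files off the node here ...
subprocess.run(["nodetool", "-h", host, "clearsnapshot", "-t", tag], check=True)
```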

Yes, we could automate the restore. A list command will be part of the general data management framework.

We will evaluate other services as well for the general data management framework design.

@jazzl0ver
Collaborator Author

By C* node I meant a container where the C* daemon is running, not an EC2 instance. Sorry for the confusion.
And, yes, of course flush, not snapshot. My bad.

Looks like a separate container (within the same task) is indeed better than a cronjob from the standpoint of managing backups for different services: each service might have a backup job container with its own logic.

@jazzl0ver
Collaborator Author

@nagaraj07

nagaraj07 commented May 31, 2019 via email

@jazzl0ver
Collaborator Author

I don't use MongoDB, so no luck here. Regarding Kafka: why do you need to back it up?

@nagaraj07

nagaraj07 commented Jun 3, 2019 via email

@jazzl0ver
Collaborator Author

I'm not part of the CloudStax team, so I can't spend time on services we don't use. Sorry about that.
