A curated list of awesome Apache Cassandra packages and resources. Maintained by Rahul Singh of Anant. Feel free to contact me if you'd like to collaborate on this and other awesome lists: Awesome Cassandra, Awesome Solr, Awesome Lucene.
- Apache Cassandra - Manage massive amounts of data, fast, without losing sleep.
Cassandra Use Cases
- Datastax Academy: Brief Introduction to Apache Cassandra
- Kaa application based on Raspberry Pi and DHT11 sensor - Cassandra IoT use case with a Raspberry Pi and a DHT11 sensor.
- Simple NodeJS Express 4 Cassandra Application - MySubscribers is a very simple starter application that lets you create, read, update, and delete users/subscribers. It was created only to accompany the YouTube course.
- An Odyssey of Cassandra - An older, republished article about transitioning from SQL to NoSQL with Cassandra.
- DataStax Enterprise - Most widely used commercial distribution of Apache Cassandra, integrated with Apache Spark (for SparkSQL, analytics), Apache Solr (for secondary index), Apache TinkerPop based Graph stored in Cassandra, and OpsCenter.
- DDACS - DataStax Distribution of Apache Cassandra, a production-ready distribution with a bulk loader, supported by DataStax.
- Elassandra: Elassandra = Elasticsearch as a Cassandra secondary index.
- ScyllaDB - NoSQL data store using the Seastar framework, compatible with Apache Cassandra.
- YugaByte Database - YugaByteDB is a transactional, high-performance database for building distributed cloud services. It supports Cassandra-compatible and Redis-compatible APIs, with PostgreSQL in Beta.
- Microsoft Azure Cosmos DB: Apache Cassandra API - Azure Cosmos DB provides the Cassandra API (preview) for applications that are written for Apache Cassandra that need premium capabilities.
- Installing the Cassandra / Spark OSS Stack
- Install Cassandra and Spark: quick user guide for integration with Cassandra and Spark
- The Cassandra Query Language
- The LIMIT Clause in Apache Cassandra might not work as you think
- Building a Performant API using Go and Cassandra
- Cassandra Data Copy Tool - Java tool to copy data from one Cassandra table to another.
- Spring Data Cassandra Examples - Examples for the Spring Data Cassandra Project.
- Introduction to Spark & Cassandra
- From Cassandra to S3, with Spark
- Import CSV files with Spark - How to import a file from S3 into Cassandra using Spark.
- Using Apache Cassandra — A few things before you start - Great advice to read before diving deep into Cassandra.
- Top 5 reasons to use the Apache Cassandra Database - A few good reasons why you'd want to consider Apache Cassandra.
Cassandra from Relational
- RDBMS to NoSQL: Your roadmap to understanding whether NoSQL is right for you.
- MySQL to C*: MySQL to Cassandra migration guide.
- Real-Time Replication from MySQL to Cassandra
- Cassandra Schemas for Beginners (like me) - Great article for new developers to Cassandra.
- Cassandra and Relational database schema comparison – Query vs relationship modeling
- Cassandra Query Language: CQL vs SQL
Cassandra Data Modeling
- Basic Rules Of Cassandra Data Modeling: Picking the right data model is the hardest part of using Cassandra. If you have a relational background, CQL will look familiar, but the way you use it can be very different.
- Cassandra Query Language: CQL vs. SQL
- CQL: This is not the SQL you are Looking For
- A Deep Look at the CQL Where Clause
- killrvideo-sample-schema - Sample Cassandra CQL schema for a YouTube clone.
- Spring Data Cassandra Examples
- Common Problems in Cassandra Data Models - Presentation and article on wide partitions, tombstones, and data skew.
- Cassandra Time Series Data Modeling for Massive Scale
- Cassandra Data Modeling Notes - Simple notes on how to estimate the size of your cluster.
- The Gossip Protocol - Inside Apache Cassandra. - Good visual explanation of how Cassandra keeps cluster state consistent across nodes.
- Introduction To The Apache Cassandra 3.x Storage Engine - The 3.x storage engine makes it easier for Cassandra to get bytes off disk.
- Dropping columns in Apache Cassandra 3.0
- Hinted Handoff and GC Grace Demystified - Tuning the balance between GC Grace and Hinted Handoff.
- Deletes and Tombstones - Explains what tombstones are and how deletes create them in Cassandra.
- About Deletes and Tombstones in Cassandra - Deleting distributed and replicated data from a system such as Apache Cassandra is far trickier than in a relational database.
- Null bindings on prepared statements and undesired tombstone creation - Good follow-up to the last article on tombstones.
- Undetectable tombstones in Apache Cassandra - In-depth analysis of cell and range tombstones.
- Common Problems with Cassandra Tombstones - "Large Number of Tombstones Causes Latency and Heap Pressure"
- Curious Case of Tombstones - How someone dealt with tombstone issues and reclaimed space in their cluster.
- Understanding the Nuance of Compaction in Apache Cassandra - Overview of how Cassandra manages data on disk.
- Guide to Cassandra Thread Pools - This guide provides a description of the different thread pools and how to monitor them. Includes what to alert on, common issues and solutions. Old but very useful reference.
- Cassandra Architecture and Operations - A high level overview in one page of how Cassandra works.
- Resources for Monitoring Datastax, Cassandra, Spark, & Solr Performance
- How to Monitor Cassandra: A guide to help you monitor Cassandra performance and work metrics regardless of which monitoring tool you choose to use.
- Cassandra metrics and their use in Grafana
- Monitoring Cassandra with Prometheus
- Monitoring Cassandra With Grafana And Influx DB
- Cassandra Monitoring - Introduction (1/2)
- Cassandra Monitoring - Graphite/InfluxDB & Grafana on Docker (1/2)
- Monitoring Cassandra using Intel Snap and Grafana - This blog post describes how to monitor Apache Cassandra using the Intel Snap open source telemetry framework.
- Running commands cluster-wide without any management tool - Some tips and tricks to do basic Cluster operations without tools like Chef, Ansible, or Salt.
- Limiting Nodetool Parallel Threads - Little-known tool for running nodetool operations with fewer resources.
- Bootstrapping Cassandra Nodes - In-depth article on how to add nodes to a running Cassandra cluster.
- Node Replacement without Bootstrapping - How to avoid the long bootstrapping process.
- Cassandra Backup and Restore - Backup in AWS using EBS Volumes - In-depth article about backup and recovery in AWS.
- Backup Strategies for Cassandra - Good comparison of different backup and restoration strategies for Cassandra.
- Cassandra backup util - https://github.com/instaclustr/cassandra-backup
- sstable tools - A toolkit for parsing, creating and doing other fun stuff with Cassandra 3.x SSTables.
- cassandra-sstable-tools - Tools for working with sstables
Cassandra Performance Tuning
- Jon Haddad: Cassandra Summit Recap - Diagnosing Problems in Production
- A Deeper Dive - Diagnosing DSE Performance Issues with Ttop and Multidump - A good review of how to look deeper into Cassandra threads.
- Ryan Svihla's Cassandra 2.0 checklist
- Amy's Cassandra 2.1 tuning guide
- Secret HotSpot option improving GC pauses on large heaps
- DSE 5.1: Tuning Java Resource
- Analyzing Cassandra Performance with Flame Graphs - Examining Cassandra performance visually using flame graphs.
- Garbage Collection Tuning for Cassandra - Optimizing garbage collection for better performance.
- Cassandra Node Diagnostics Tools - Monitoring and audit power kit for Apache Cassandra.
- TWCS part 1 - how does it work and when should you use it? - Best suited for time series data that expires, Time Window Compaction Strategy comes with some caveats.
- Performing User Defined Compactions in Apache Cassandra - This is a process by which we tell Cassandra to create a compaction task for one or more tables explicitly.
- Graphing cassandra-stress - Benchmarking schemas and configuration changes with the cassandra-stress tool before pushing them to production is something every Cassandra developer should know and regularly practice.
- Modeling real life workloads with cassandra-stress is hard
- Gatling DSE Stress
- Gatling DSE Plugin for Gatling Load injector - This project is a plugin for the Gatling load injector. It adds CQL support in Gatling for Datastax Enterprise. It allows for benchmarking Datastax Enterprise features, including DSE Graph Fluent API.
- Gatling DSE Stress Simulation Catalog - The goal of the repo is to provide a sample of the Gatling DSE Stress Framework's usage. Feel free to submit a pull request with example simulations.
- Securing Apache Cassandra with Application Level Encryption - Discusses how to do application level data encryption to properly manage secure information in Cassandra.
- Hardening Cassandra Step by Step: Part 1 - Inter-Node Encryption (And a Gentle Intro to Certificates)
- LDAP Authenticator for Apache Cassandra - This is a pluggable authentication implementation for Apache Cassandra, providing a way to authenticate and create users based on a configured LDAP server.
- Encrypting EC2 ephemeral volumes with LUKS and AWS KMS - The example used here is Cassandra data stored on ephemeral disks.
- Docker Meet Cassandra. Cassandra Meet Docker. - Article reviewing how to set up a complete Cassandra application with monitoring on Docker.
- Example code from the Docker Meet Cassandra Article
- Docker-Cassandra: A set of scripts and config files to run a Cassandra cluster from Docker.
- Cassandra & Zeppelin Notebook on Docker: Docker-Compose script for Cassandra + Zeppelin setup.
- Packer: Cassandra Image - Cassandra Image using Packer for Docker and EC2 AMI. Covers managing EC2 Cassandra clusters with Ansible.
- Cassandra Docker - This is the Instaclustr public docker image for Apache Cassandra. It contains docker images for Cassandra 3.0 and 3.11.1.
- Cassandra / Elassandra Docker - Apache Cassandra and Elassandra docker images.
- Kubernetes Cassandra Operator - The Cassandra operator manages Cassandra clusters deployed to Kubernetes and automates tasks related to operating a Cassandra cluster.
- Running Cassandra on DC/OS (Mesos) - This blog shows how to set up DC/OS in the Amazon cloud, how to install Apache Cassandra on a DC/OS cluster, and finally new ways to interact with Apache Cassandra after it is installed.
- How To Setup A Highly Available Multi-AZ Cassandra Cluster On AWS EC2
- CloudFormation Cassandra AWS - A Cassandra cluster for development using CloudFormation.
- tlp-cluster, a tool for launching Cassandra clusters in AWS - A provisioning tool for Apache Cassandra designed for developers looking to both benchmark and test the correctness of Apache Cassandra. It assists with builds and starting instances on AWS.
Integrating with Cassandra
- Building a Streaming Data Hub with Elasticsearch, Kafka and Cassandra
- Docker container for Kafka - Spark streaming - Cassandra - This Dockerfile sets up a complete streaming environment for experimenting with Kafka, Spark streaming (PySpark), and Cassandra.
- sample KafkaSparkCassandra - Introductory sample Scala app using Apache Spark Streaming to accept data from Kafka and write a summary to Cassandra.
- sample Spark Cassandra with SSL - Simple sample job illustrating the use of Spark to run analytics against Cassandra over an SSL connection.
- DataStax Spark Cassandra Connector: This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
- Stratio Deep (deprecated): Deep is a thin integration layer between Apache Spark and several NoSQL datastores. It currently supports Apache Cassandra and MongoDB, with support for several other datastores planned.
- sample Spark Job Server Cassandra - Simple sample job illustrating the use of Spark Jobserver to execute Apache Spark analytics with Cassandra.
- fluxcapacitor/pipeline: End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark ML, GraphX, Spark Streaming, Kafka, NiFi, Cassandra, ElasticSearch, Redis, Tachyon, HDFS, Zeppelin, iPython/Jupyter Notebook, Tableau, Twitter Algebird.
Search / Secondary Indexes
- Tuning DSE Search - Indexing latency and query latency
- Elassandra: Elassandra = Elasticsearch as a Cassandra secondary index.
- Cassandra Lucene Index: Lucene based secondary indexes for Cassandra
- OLD - Solandra: Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra.
- cassandra-trigger - Cassandra trigger to push real-time updates to Elasticsearch.
- express-cassandra - Cassandra ORM/ODM/OGM for Node.js with optional support for Elassandra & JanusGraph
- DataStax Java Driver: A Java client driver for Apache Cassandra.
- DataStax C++ Driver: A modern, feature-rich, and highly tunable C/C++ client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's native protocol and Cassandra Query Language v3. http://datastax.github.io/cpp-driver/
- DataStax Python Driver: A modern, feature-rich and highly-tunable Python client library for Apache Cassandra (2.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
- DataStax Ruby Driver: A Ruby client driver for Apache Cassandra. This driver works exclusively with the Cassandra Query Language version 3 (CQL3) and Cassandra's native protocol.
- DataStax NodeJS Driver: A modern, feature-rich and highly tunable Node.js client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
- DataStax C# Driver: A modern, feature-rich and highly tunable C# client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
- DataStax PHP Driver: DataStax PHP Driver for Apache Cassandra http://datastax.github.io/php-driver/
- Achilles: Achilles is an open source persistence manager for Apache Cassandra, with features such as advanced bean mapping (compound primary key, composite partition key, timeUUID...), native collections and map support, and so on.
- phpcassa: PHP client library for Apache Cassandra
- Caffinitas: Caffinitas is an advanced object mapper for Apache Cassandra which has been especially designed to work with Datastax Java Driver 2.1+ against Apache Cassandra 2.1, 2.0 or 1.2.
- Spring Data for Apache Cassandra - Spring Data for Apache Cassandra offers a familiar interface to those who have used other Spring Data modules in the past.
- gocql - Package gocql implements a fast and robust Cassandra client for the Go programming language.
- OLD - Netflix Astyanax: Astyanax is a high level Java client for Apache Cassandra, based on Thrift protocol. Not maintained.
- DbSchema - Cassandra Designer - A diagram designer and GUI admin tool that supports Cassandra among other databases.
- DBeaver - Free Universal Database Tool - A third-party tool for working with all sorts of databases, including Cassandra.
- RazorSQL - Multi DB Manager Tool - A multi-db tool for Linux, Mac, and Windows that works with Apache Cassandra.
- KDM - The Kashlev Data Modeler - An automated big data modeling tool for Apache Cassandra
- Cassandra Reaper: Automated repairs for Apache Cassandra. Supports all versions.
- cstar perf: Apache Cassandra performance testing platform.
- Spark Cassandra Stress: A tool for testing the DataStax Spark Connector against Apache Cassandra or DSE.
- trireme: Migration tool providing support for Apache Cassandra, DataStax Enterprise Cassandra, & DataStax Enterprise Solr.
- cqlmigrate - Cassandra CQL migration tool. cqlmigrate is a library for performing schema migrations on a Cassandra cluster.
- cassandra-migration-tool-java - A lightweight Java tool for executing schema and data migrations on a Cassandra database.
- cassalog - Cassalog is a schema change management library and tool for Apache Cassandra that can be used with applications running on the JVM.
- cdeploy - cdeploy is a simple tool to manage your Cassandra schema migrations in the style of dbdeploy.
- Web: Cassandra Calculator: A simple calculator to see how cluster size and replication factor affect the system's consistency.
- Cassandra-web - A web interface for Apache Cassandra.
- CassandraRestfulAPI - The CassandraRestfulAPI project exposes Cassandra data tables through a RESTful API. https://github.com/rohitsakala/CassandraRestfulAPI
- Netflix: Staash - A language-agnostic and storage-agnostic web interface for storing data in persistent storage systems. The metadata layer abstracts away many storage details, and the pattern automation APIs automate common data access patterns.
- cql-vim - Cassandra CQL Syntax Highlighter for Vim
- Presto - Distributed SQL Query Engine for Big Data. Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores.
- Sstable Tools - A toolkit for parsing, creating and doing other fun stuff with Cassandra 3.x SSTables.
- cassandra-exporter - Simple Tool to Export / Import Cassandra Tables into JSON
- Cassandra SStable Tools - A few different tools combined into one that helps admins get summaries, metadata, partition info, cell info.
- Cassandra-Client - A simple GUI tool for browsing tables and data in Cassandra.
- CQL Data Modeler - A very useful tool to test out a CQL schema and visualize what the partition would look like in relation to the columns and rows.
Admin / Monitor
- DataStax OpsCenter: Simplified management for DataStax Enterprise and Cassandra database clusters.
- Cassandra Cluster Admin: Cassandra Cluster Admin is a GUI tool to help people administrate their Apache Cassandra cluster.
- Cassandra StatsD Agent: Java agent for Cassandra integration with StatsD.
- Cassandra Scripts: Python-based Cassandra ops scripts to monitor cfstats.
- Cassandra-Tools: Python Fabric scripts to help automate the launching and managing of cluster testing on AWS.
- Cassandra Opstools: Generic scripts to review and monitor Cassandra, from Spotify.
- CCM (Cassandra Cluster Manager): A script/library to create, launch and remove an Apache Cassandra cluster on localhost.
- Cassandra Nagios: Perl-based scripts to get metrics for monitoring using Jolokia.
- Cassandra Log Tools: Simple scripts for working with Apache Cassandra logs.
- Cassandra CFStats to CSV Parser: Converts the output of CFStats to CSV.
- Netflix Priam: Co-process for backup/recovery, token management, and centralized configuration management for Cassandra.
- CStar - Apache Cassandra cluster orchestration tool for the command line.
- ctop - A very simple console tool for monitoring column family read/write activity on a remote Cassandra host.
Queues / Schedulers
- CMB: A highly available, horizontally scalable queuing and notification service compatible with AWS SQS and SNS
- CassieQ: A Distributed queue built off of Cassandra.
- Cherami: Distributed, scalable, durable, and highly available message queue system.
- scheduler: A Scala library for scheduling arbitrary code to run at an arbitrary time.
- cassandra-log4j-appender: Cassandra appenders for Log4j
Open Source Applications
- Twissandra - Twissandra is an example project, created to learn and demonstrate how to use Cassandra. Running the project will present a website that has similar functionality to Twitter.
- FiloDB - High-performance distributed analytical database + Spark SQL queries + built for streaming.
- ChronoServer - A test server for sampling how long it takes mobile & web clients to make various types of requests to a server doing common request patterns.
- Apache Cassandra Documentation: Definitive documentation for all published versions.
- DataStax Documentation: Documentation and drivers from DataStax.
- DataStax Academy: Free online courses on Cassandra
- Apache Cassandra Users Mailing List
- Apache Cassandra Developers Mailing List
- Apache Cassandra Commits Mailing List
- Datastax Academy Slack
- Cassandra Slack
- StackOverflow: Cassandra
- StackOverflow: cql
- StackOverflow: spark-cassandra-connector
- Quora: Cassandra
- Meetups: Cassandra
- Datastax Academy
- Codecentric: Cassandra
- Pythian: Cassandra
- Cassandra Zone - Findings and musings on Apache Cassandra
- DOAN DuyHai's Blog: Cassandra
- Amy Tobey
- Christopher Batey: Cassandra
- Distributed Bytes: Cassandra
- The Netflix Tech Blog
- Ryan Svihla
- Best Practices for Running Cassandra on AWS
- Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) | C* Summit 2016
- GumGum: Multi-Region Cassandra in AWS
- Tuning the Spark Cassandra Connector - Great talk by Russell Spitzer maintainer of the Spark Cassandra connector.
- Cassandra DataTables Using Restful API - A case study on how to create a performant API using Python / Flask.
- HAPI Cassandra - A simple REST API built with the hapi Node.js framework on top of an Apache Cassandra database.
- GumGum: Multi-Region Cassandra in AWS
- CQL: This is not the SQL you are Looking For
- Hardening Cassandra for compliance or paranoia
- Securing Cassandra
- Tuning the Spark Cassandra Connector - Slides by Russell Spitzer maintainer of the Spark Cassandra connector.