Cassandra Query Language (CQL) v2.0
Table of Contents
To aid in specifying the CQL syntax, we will use the following conventions in this document:
- Language rules will be given in a BNF-like notation, looking like this:
<Start> ::= <CQL_Statement>* ;
- Nonterminal symbols in syntax rules will have
- Terminal symbols will be shown in
- As an additional shortcut notation to BNF, optional symbols (that can occur zero or one times) will be followed by a
?marker. Optional symbols that can occur zero or any number of times will be followed by a
*marker. Multiple symbols may be grouped together in
(parentheses)to signify that they are all optional or repeatable together.
- In a few cases where meaning is more easily conveyed through prose, we explain a symbol and its expansion without BNF.
- Sample code will also be shown in a code block:
SELECT sample_usage FROM cql;
- References to keywords or pieces of CQL code in running text will be shown in a
CQL consists of statements. As in SQL, some statements directly make changes to data, some look up data, and some change the way data is stored.
All statements end with a semicolon.
<CQL_Statement> ::= <statementBody> ";" ; <statementBody> ::= <useStatement> | <selectStatement> | <dataChangeStatement> | <schemaChangeStatement> ; <dataChangeStatement> ::= <insertStatement> | <updateStatement> | <batchStatement> | <deleteStatement> | <truncateStatement> ; <schemaChangeStatement> ::= <createKeyspaceStatement> | <createColumnFamilyStatement> | <createIndexStatement> | <dropKeyspaceStatement> | <dropColumnFamilyStatement> | <dropIndexStatement> | <alterTableStatement> ;
String literals and identifiers (including keyspace and column family names) are case-sensitive, but CQL keywords are not. We show CQL keywords in this document in
UPPERCASE merely as a convention to aid readability.
Literal values can be expressed in several ways in CQL.
<term> ::= "KEY" | <identifier> | <stringLiteral> | <integer> | <float> | <uuid> ;
<identifier>is a letter followed by any sequence of letters, digits, or the underscore (
<stringLiteral>is encased in
'single quotes'. A single quote itself can be represented in a string literal by doubling it, as in SQL:
'Single quote -> '' <-'.
<integer>consists of an optional minus sign (
-) followed by one or more digits (
<uuid>can be expressed in the canonical UUID form: 32 hex digits (
a-f, case insensitive), separated by dashes (
-) after the 8th, 12th, 16th, and 20th digits. Example:
<float>is a series of one or more decimal digits, followed by a period (
.), and one or more decimal digits following. Note that there is no provision for “e” notation, no optional
+sign, and the forms
42.are not accepted. Use
- Whitespace is not significant except to separate terms, and inside string literals.
Comments in CQL can begin with a double dash (
--) or a double slash (
//) and extend to the end of the line. Multiline comments are enclosed in
/* ... */.
Data Storage Types
<storageType> ::= "ascii" | "bigint" | "blob" | "boolean" | "counter" | "decimal" | "double" | "float" | "int" | "text" | "timestamp" | "uuid" | "varchar" | "varint" ;
The following table gives additional information on the available data types.
|ascii||ASCII character string|
|bigint||64-bit signed long|
|blob||Arbitrary bytes (no validation)|
|boolean||true or false|
|counter||Counter column (64-bit long)|
|double||64-bit IEEE-754 floating point|
|float||32-bit IEEE-754 floating point|
|int||32-bit signed int|
|text||UTF8 encoded string|
|timestamp||A timestamp. See Working with dates below for more information.|
|uuid||Type 1 or type 4 UUID|
|varchar||UTF8 encoded string|
Note: In addition to the recognized types listed above, it is also possible to supply a string containing the name of a class (a sub-class of
AbstractType loadable by Cassandra). The class name should either be fully qualified, or relative to the
Working with dates
Values serialized with the
timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as “the epoch”: January 1 1970 at 00:00:00 GMT.
Timestamp types can be input in CQL as simple long integers, giving the number of milliseconds since the epoch, as defined above.
Timestamp types can also be input as string literals in any of the following ISO 8601 formats, each representing the time and date Jan 2, 2003, at 04:05:00 AM, GMT.:
+0000 above is an RFC 822 4-digit time zone specification;
+0000 refers to GMT. US Pacific Standard Time is
-0800. The time zone may be omitted if desired— the date will be interpreted as being in the time zone under which the coordinating Cassandra node is configured.
There are clear difficulties inherent in relying on the time zone configuration being as expected, though, so it is recommended that the time zone always be specified for timestamps when feasible.
The time of day may also be omitted, if the date is the only piece that matters:
In that case, the time of day will default to 00:00:00, in the specified or default time zone.
<useStatement> ::= "USE" <term> ;
USE statement consists of the
USE keyword, followed by a valid keyspace name. Its purpose is to assign the per-connection, current working keyspace. All subsequent keyspace-specific actions will be performed in the context of the keyspace selected, unless otherwise specified, until another USE statement is issued or the connection terminates.
<selectStatement> ::= "SELECT" <whatToSelect> "FROM" ( <name> "." )? <name> ( "USING" "CONSISTENCY" <consistencylevel> )? ( "WHERE" <selectWhereClause> )? ( "LIMIT" <integer> )? ; <whatToSelect> ::= <term> ( "," <term> )* | ("FIRST" <integer> )? "REVERSED"? <columnRange> | "COUNT" "(" <countTarget> ")" ; <columnRange> ::= <term> ".." <term> | "*" ; <countTarget> ::= "*" | "1" ; <name> ::= <identifier> | <stringLiteral> | <integer> ; <selectWhereClause> ::= <relation> ( "AND" <relation> )* | <term> "IN" "(" <term> ( "," <term> )* ")" ; <relation> ::= <term> <relationOperator> <term> ; <relationOperator> ::= "=" | "<" | ">" | "<=" | ">=" ;
SELECT Name, Occupation FROM People WHERE key IN (199, 200, 207); SELECT FIRST 3 REVERSED 'time199'..'time100' FROM Events; SELECT COUNT(*) FROM system.Migrations;
SELECT is used to read one or more records from a Cassandra column family. It returns a result-set of rows, where each row consists of a key and a collection of columns corresponding to the query.
SELECT col1, col2 FROM ... SELECT range_lo..range_hi FROM ... SELECT * FROM ... SELECT FIRST 4 REVERSED range_hi..range_lo FROM ...
SELECT expression determines which columns will appear in the results and can take a few different forms, as shown above. The simplest is a comma-separated list of column names. Note that column names in Cassandra can be specified with string literals or integers, in addition to identifiers.
It is also possible to specify a range of column names. The range notation consists of start and end column names, separated by two periods (
..). The set of columns returned for a range is start and end inclusive. A single star (
*) may be used as a range to request “all columns”.
When using a range, it is sometimes useful to limit the number of columns that can be returned as part of each row (since Cassandra is schemaless, it is not necessarily possible to determine ahead of time how many columns will be in the result set). To accomplish this, use the
FIRST clause with an integer to specify an upper limit on the number of columns returned per row. The default limit is 10,000 columns.
REVERSED option causes the sort order of the columns returned to be reversed. This affects the
FIRST clause; when limiting the columns returned, the columns at the end of the range will be selected instead of the ones at the beginning of the range.
SELECT expression may also be
COUNT(*). In this case, the result will be only one value: the number of rows which matched the query.
It is worth noting that unlike the projection in a SQL SELECT, there is no guarantee that the results will contain all of the columns specified, because Cassandra is schemaless.
SELECT ... FROM MyApp.LocationSnapshots ...; SELECT ... FROM EventTimeline ...;
FROM clause is used to specify the Cassandra column family applicable to a
SELECT query. Unlike other operations on column families, the keyspace in which the column family exists may also be specified by giving its name before the column family name, and separating them by a dot (
.). If the keyspace is not specified, the current keyspace will be used, as per normal.
SELECT ... USING CONSISTENCY QUORUM;
Following the column family clause is an optional consistency level specification.
SELECT ... WHERE KEY = 11194251 AND startdate = '2011-10-08-0500'; SELECT ... WHERE KEY >= 'AM' and KEY =< 'AZ' AND module = 17; SELECT ... WHERE keyalias IN ('key1', 'key2', 'key3', ...);
WHERE clause provides for filtering the rows that appear in results. The clause can filter on a key name, or range of keys, and in the case of indexed columns, on column values. Key filters are specified using the
KEY keyword or key alias name, followed by a relational operator (one of
<=), and then a term value. When terms appear on both sides of a relational operator it is assumed the filter applies to an indexed column. With column index filters, the term on the left of the operator must be the name of the indexed column, and the term on the right is the value to filter on.
Note: The greater-than and less-than operators (
<) result in key ranges that are inclusive of the terms. There is no supported notion of “strictly” greater-than or less-than; these operators are merely supported as aliases to
SELECT ... WHERE favoriteArtist = 'The Mighty Mighty Bosstones' LIMIT 90000;
LIMIT option to a
SELECT expression limits the number of rows returned by a query.
LIMIT defaults to 10,000 when left unset.
<insertStatement> ::= "INSERT" "INTO" <name> "(" <term> "," <term> ( "," <term> )* ")" "VALUES" "(" <term> "," <term> ( "," <term> )* ")" ( "USING" <usingOption> ( "AND" <usingOption> )* )? ; <usingOption> ::= "CONSISTENCY" <consistencylevel> | "TIMESTAMP" <integer> | "TTL" <integer> ;
INSERT INTO NerdMovies (KEY, 11924) VALUES ('Serenity', 'Nathan Fillion') USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;
INSERT is used to write one or more columns to a record in a Cassandra column family. No results are returned.
The first column name in the
INSERT list must be the name of the column family key. Also, there must be more than one column name specified (Cassandra rows are not considered to exist with only a key and no associated columns).
Unlike in SQL, the semantics of
UPDATE are identical. In either case a record is created if none existed before, and updated when it does. For information on query modifiers and types, see the
UPDATE section below.
<updateStatement> ::= "UPDATE" <name> ( "USING" <usingOption> ( "AND" <usingOption> )* )? "SET" <assignment> ( "," <assignment> )* "WHERE" <updateWhereClause> ; <assignment> ::= <term> "=" <term> | <term> "=" <term> "+" <term> | <term> "=" <term> "-" <term> ; <updateWhereClause> ::= <term> "=" <term> | <term> "IN" "(" <term> ( "," <term> )* ")" ;
UPDATE NerdMovies USING CONSISTENCY ALL AND TTL 400 SET 'A 1194' = 'The Empire Strikes Back', 'B 1194' = 'Han Solo' WHERE KEY = B70DE1D0-9908-4AE3-BE34-5573E5B09F14; UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;
UPDATE is used to write one or more columns to a record in a Cassandra column family. No results are returned. The row key can be specified using the
KEY keyword or by a key alias set per column family.
Statements begin with the
UPDATE keyword followed by a Cassandra column family name.
UPDATE Foo USING CONSISTENCY EACH_QUORUM ...
Following the column family identifier is an optional
USING clause, which can specify the consistency level for the update, or the timestamp and/or the TTL for the new columns.
UPDATE Foo USING TIMESTAMP=1318452291034 ...
UPDATE supports setting client-supplied optional timestamp for modification.
UPDATE Foo USING TTL=6800 ...
UPDATE supports setting a time to live (TTL), in seconds, for each of the added columns.
Specifying Columns and Row
UPDATE ... SET col1 = val1, col2 = val2 WHERE KEY = key1; UPDATE ... SET col3 = val3 WHERE KEY IN (key1, key2, key3); UPDATE ... SET col4 = 22 WHERE keyalias = key4;
Rows are created or updated by supplying column names and values, after the
SET keyword, in term assignment format. Multiple columns can be set by separating the name/value pairs using commas. Each update statement requires a precise set of row keys to be specified using a
WHERE clause and the
KEY keyword or key alias.
Updating Counter Columns
UPDATE ... SET name1 = name1 + <value> ...; UPDATE ... SET name1 = name1 - <value> ...;
Counter columns can be incremented or decremented by an arbitrary numeric value though the assignment of an expression that adds or substracts the value.
<deleteStatement> ::= "DELETE" ( <term> ( "," <term> )* )? "FROM" <name> ( "USING" <deleteOption> ( "AND" <deleteOption> )* )? "WHERE" <updateWhereClause> ; <deleteOption> ::= "CONSISTENCY" <consistencylevel> | "TIMESTAMP" <integer> ;
DELETE col1, col2, col3 FROM Planeteers USING CONSISTENCY ONE WHERE KEY = 'Captain'; DELETE FROM MastersOfTheUniverse WHERE KEY IN ('Man-At-Arms', 'Teela');
DELETE is used to perform the removal of one or more columns from one or more rows. The key can be given using the
KEY keyword or by the key alias set per column family.
DELETE keyword is an optional comma-delimited list of column name terms. When no column names are specified, the remove applies to the entire row(s) matched by the WHERE clause.
The column family name follows the list of column names and the keyword
Following the column family identifier is an optional consistency level specification.
DELETE ... WHERE KEY = 'some_key_value'; DELETE ... WHERE keyalias IN (key1, key2);
WHERE clause is used to determine to which row(s) a
DELETE applies. The first form allows the specification of a single keyname using the
KEY keyword (or by key alias) and the
= operator. The second form allows a list of keyname terms to be specified using the
IN notation and a parenthesized list of comma-delimited keyname terms.
<truncateStatement> ::= "TRUNCATE" <name> ;
TRUNCATE accepts a single argument for the column family name, and permanently removes all data from said column family.
<batchStatement> ::= "BEGIN" "BATCH" ( "USING" <usingOption> ( "AND" <usingOption> )* )? <batchStatementMember> ( ";" <batchStatementMember> )* "APPLY" "BATCH" ; <batchStatementMember> ::= <insertStatement> | <updateStatement> | <deleteStatement> ;
BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 8640000 INSERT INTO users (KEY, password, name) VALUES ('user2', 'ch@ngem3b', 'second user') UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2' INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c') DELETE name FROM users WHERE key = 'user2' INSERT INTO users (KEY, password, name) VALUES ('user4', 'ch@ngem3c', 'Andrew') APPLY BATCH;
BATCH supports setting a client-supplied optional global timestamp which will be used for each of the operations included in the batch.
A single consistency level is used for the entire batch. It appears after the
BEGIN BATCH statement, and uses the standard consistency level specification. Batched statements default to
CONSISTENCY.ONE when left unspecified.
Only data modification statements (specifically,
DELETE) are allowed in a
BATCH is not an analogue for SQL transactions.
NOTE: While there are no isolation guarantees,
UPDATE queries are atomic within a given record.
<createKeyspaceStatement> ::= "CREATE" "KEYSPACE" <name> "WITH" <optionName> "=" <optionVal> ( "AND" <optionName> "=" <optionVal> )* ; <optionName> ::= <identifier> | <optionName> ":" <identifier> | <optionName> ":" <integer> ; <optionVal> ::= <stringLiteral> | <identifier> | <integer> ;
CREATE KEYSPACE Excelsior WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1; CREATE KEYSPACE Excalibur WITH strategy_class = 'NetworkTopologyStrategy' AND strategy_options:DC1 = 1 AND strategy_options:DC2 = 3;
CREATE KEYSPACE statement creates a new top-level namespace (aka “keyspace”). Valid names are any string constructed of alphanumeric characters and underscores. Names which do not work as valid identifiers or integers should be quoted as string literals. Properties such as replication strategy and count are specified during creation using the following accepted keyword arguments:
|strategy_class||yes||The name of the replication strategy class which should be used for the new keyspace. Some often-used classes are
|strategy_options||no||Most strategies require additional arguments which can be supplied by appending the option name to the
<createColumnFamilyStatement> ::= "CREATE" "COLUMNFAMILY" <name> "(" <term> <storageType> "PRIMARY" "KEY" ( "," <term> <storageType> )* ")" ( "WITH" <optionName> "=" <cfOptionVal> ( "AND" <optionName> "=" <cfOptionVal> )* )? ; <optionName> ::= <identifier> | <optionName> ":" <identifier> | <optionName> ":" <integer> ;
CREATE COLUMNFAMILY Fish (KEY blob PRIMARY KEY);
CREATE COLUMNFAMILY FastFoodEatings (user text PRIMARY KEY)
WITH comparator=timestamp AND default_validation=int;
CREATE COLUMNFAMILY MonkeyTypes (
KEY uuid PRIMARY KEY,
) WITH comment=‘Important biological records’
AND read_repair_chance = 1.0;
CREATE COLUMNFAMILYstatements create new column family namespaces under the current keyspace. Valid column family names are strings of alphanumeric characters and underscores, which begin with a letter.
h3(#keytypes). Specifying Key Type
CREATE … (KEY ascii PRIMARY KEY, … ) …
When creating a new column family, you must specify the key type. The list of possible types is identical to column comparators/validators (see Data Storage Types), except it probably does not make sense to use
counterfor a key. It’s important to note that the key type you use must be compatible with the partitioner in use. For example,
CollatingOrderPreservingPartitionerboth require UTF-8 keys. If you use an identifier for the primary key name, instead of the
KEYkeyword, a key alias will be set automatically.
h3. Specifying Column Types (optional)
CREATE … ( … , name1 type1, name2 type2, … ) …
It is possible to assign columns a type during column family creation. Columns configured with a type are validated accordingly when a write occurs, and intelligent CQL drivers and interfaces will be able to decode the column values correctly when receiving them. Column types are specified as a parenthesized, comma-separated list of column term and type pairs. See Data Storage Types for the list of recognized types.
h3. Column Family Options (optional)
CREATE COLUMNFAMILY … WITH keyword1 = arg1 AND keyword2 = arg2;
A number of optional keyword arguments can be supplied to control the configuration of a new column family.
|. keyword|. default|. description|
|comparator|text|Determines the storage type of column names (which itself determines the sorting and validation of column names). Valid values are listed in the Data Storage Types table above.|
|comment|none|A free-form, human-readable comment.|
|row_cache_provider|SerializingCacheProvider if JNA is present, otherwise ConcurrentHashMapCacheProvider|A factory for the cache with which to back the row cache.|
|row_cache_size|0|Number of rows whose entire contents to cache in memory.|
|key_cache_size|200000|Number of keys per SSTable whose locations are kept in memory in “mostly LRU” order.|
|read_repair_chance|1.0|The probability with which read repairs should be invoked on non-quorum reads.|
|gc_grace_seconds|864000|Time to wait before garbage collecting tombstones (deletion markers).|
|default_validation|text|Determines the default storage type of column values (which itself determines the validation for column values). This option does not affect the types of columns which were defined in a
CREATE COLUMNFAMILYstatement— only new columns. Valid values are listed in the Data Storage Types table above.|
|min_compaction_threshold|4|Minimum number of SSTables needed to start a minor compaction.|
|max_compaction_threshold|32|Maximum number of SSTables allowed before a minor compaction is forced.|
|row_cache_save_period_in_seconds|0|Number of seconds between saving row caches.|
|key_cache_save_period_in_seconds|14400|Number of seconds between saving key caches.|
|compaction_strategy_options|none|CompactionStrategy specific options such as “sstable_size_in_mb” for LeveledCompactionStrategy and “min_sstable_size” for SizeTieredCompactionStrategy|
|compression_parameters|none|Compression parameters such as "sstablecompressor" and "chunklengthkb"|
h2. CREATE INDEX
::= “CREATE” “INDEX” ? “ON”
CREATE INDEX userIndex ON NerdMovies (user);
CREATE INDEX ON Mutants (abilityId);
CREATE INDEXstatement is used to create a new, automatic secondary index on the given column family, for the named column. A name for the index itself can be specified before the
ONkeyword, if desired. A single column name must be specified inside the parentheses. It is not necessary for the column to exist on any current rows (Cassandra is schemaless), but the column must already have a type (specified during the
CREATE COLUMNFAMILY, or added afterwards with
h2. DROP KEYSPACE
::= “DROP” “KEYSPACE”
bc(sample). DROP KEYSPACE MyTwitterClone;
DROP KEYSPACEstatement results in the immediate, irreversible removal of a keyspace, including all column families in it, and all data contained in those column families.
h2. DROP COLUMNFAMILY
::= “DROP” “COLUMNFAMILY”
bc(sample). DROP COLUMNFAMILY worldSeriesAttendees;
DROP COLUMNFAMILYstatement results in the immediate, irreversible removal of a column family, including all data contained in it.
h2. DROP INDEX
::= “DROP” “INDEX”
bc(sample). DROP INDEX cf_col_idx;
DROP INDEXstatement is used to drop an existing secondary index.
h2. ALTER COLUMNFAMILY
::= “ALTER” “COLUMNFAMILY”
::= “ALTER” “TYPE”
ALTERstatement is used to manipulate column family column metadata. It allows you to add new columns, drop existing columns, or change the data storage type of existing columns. No results are returned.
Specify the name of the column family to be changed after the
ALTER COLUMNFAMILYkeywords, and the name of the column to be changed, added, or dropped after the keyword corresponding to the type of change desired (
h3. Changing the type of a typed column
ALTER COLUMNFAMILY addamsFamily ALTER lastKnownLocation TYPE uuid;
ALTER COLUMNFAMILY ... ALTERchanges the expected storage type for a column. The column must already have a type in the column family metadata. The column may or may not already exist in current rows— but be aware that no validation of existing data is done. The bytes stored in values for that column will remain unchanged, and if existing data is not deserializable according to the new type, this may cause your CQL driver or interface to report errors.
h3. Adding a typed column
ALTER COLUMNFAMILY addamsFamily ADD gravesite varchar;
ALTER COLUMNFAMILY ... ADDvariant adds a typed column to a column family. The column must not already have a type in the column family metadata. The same warnings from the above
ALTERsection, about there being no validation of existing data, apply here as well.
h3. Dropping a typed column
ALTER COLUMNFAMILY addamsFamily DROP gender;
ALTER COLUMNFAMILY ... DROPstatement removes the type of a column from the column family metadata. Note that this does not remove the column from current rows; it just removes the metadata saying that the bytes stored under that column are expected to be deserializable according to a certain type.
h2. Common Idioms
h3(#consistency). Specifying Consistency
… USING CONSISTENCY LOCAL_QUORUM …
Consistency level specifications are made up the keywords
USING CONSISTENCY, followed by a consistency level identifier. Valid consistency level identifiers are as listed above. When not specified,
USING CONSISTENCY ONEis the default.
Consult your Cassandra documentation for information about how consistency levels work.
Versioning of the CQL language adheres to the Semantic Versioning guidelines. Versions take the form X.Y.Z where X, Y, and Z are integer values representing major, minor, and patch level respectively. There is no correlation between Cassandra release versions and the CQL language version.
|. version|. description|
|Major|The major version must be bumped when backward incompatible changes are introduced. This should rarely (if ever) occur.|
|Minor|Minor version increments occur when new, but backward compatible, functionality is introduced.|
|Patch|The patch version is incremented when bugs are fixed.|
Wed, 12 Oct 2011 16:53:00 -0500 – paul cannon
* Rework whole doc, adding syntax specifics and additional explanations
Fri, 09 Sep 2011 11:43:00 -0500 – Jonathan Ellis
* add int data type
Wed, 07 Sep 2011 09:01:00 -0500 – Jonathan Ellis
* Updated version to 2.0; Documented row-based count()
* Updated list of supported data types
Wed, 10 Aug 2011 11:22:00 -0500 – Eric Evans
* Improved INSERT vs. UPDATE wording.
* Documented counter column incr/descr.
Sat, 01 Jun 2011 15:58:00 -0600 – Pavel Yaskevich
* Updated to support ALTER (CASSANDRA-1709)
Tue, 22 Mar 2011 18:10:28 -0700 – Eric Evans <email@example.com>
* Initial version, 1.0.0