# Difference between partition key, composite key and clustering key in Cassandra

http://stackoverflow.com/questions/24949676/difference-between-partition-key-composite-key-and-clustering-key-in-cassandra

In [1]:
%load_ext cql

In [2]:
%%cql
DROP KEYSPACE IF EXISTS demo;

'No results.'

In [3]:
%%cql
CREATE KEYSPACE demo 
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

'No results.'

In [4]:
%keyspace demo

Using keyspace demo


There is a lot of confusion around these terms, so I will try to make this as simple as possible.

The primary key is the general term for the one or more columns used to identify and retrieve rows from a table.

The primary key may be **SIMPLE**

In [6]:
%%cql 
create table stackoverflow (
      key text PRIMARY KEY,
      data text      
  );

'No results.'

That means it is made of a single column.

In [11]:
%cql insert into stackoverflow (key, data) VALUES ('han', 'solo');
%cql select * from stackoverflow where key='han';

key,data
han,solo


But the primary key can also be **COMPOSITE** (aka **COMPOUND**), built from more than one column.

In [8]:
%%cql
create table stackoverflow2 (
  key_part_one text,
  key_part_two int,
  data text,
  PRIMARY KEY(key_part_one, key_part_two)      
);

'No results.'

With a **COMPOSITE** primary key, the first part of the key is called the **PARTITION KEY** (in this example, key_part_one) and the second part is the **CLUSTERING KEY** (key_part_two).

In [12]:
%cql insert into stackoverflow2 (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player');
%cql insert into stackoverflow2 (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player');
%cql select * from stackoverflow2 where key_part_one = 'ronaldo';

key_part_one,key_part_two,data
ronaldo,9,football player
ronaldo,10,ex-football player


You can also query using every column of the key:

In [16]:
%cql select * from stackoverflow2 where key_part_one = 'ronaldo' and key_part_two  = 10;

key_part_one,key_part_two,data
ronaldo,10,ex-football player


**Important note:** the partition key is the minimum that must be specified to run a query with a WHERE clause. Suppose you have a composite partition key, like the following:

e.g. PRIMARY KEY((col1, col2), col10, col4)

You can query only by supplying at least both col1 and col2, the two columns that define the partition key. The general rule is that you must supply all partition key columns; after that, you may add clustering columns one at a time, in the order in which they are declared.

**So the valid queries are** (excluding secondary indexes):

- col1 and col2
- col1 and col2 and col10
- col1 and col2 and col10 and col4

**Invalid:**

- col1 and col2 and col4
- anything that does not contain both col1 and col2
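
The rule above can be sketched as a small Python check. This is only an illustrative model of the rule as stated here, not how Cassandra validates queries: the function name `is_valid_restriction` is hypothetical, and it considers equality restrictions only, with no secondary indexes and no ALLOW FILTERING.

```python
def is_valid_restriction(partition_key, clustering_key, restricted):
    """Model of the rule: all partition key columns must be restricted,
    and restricted clustering columns must form a prefix of the
    declared clustering order. (Hypothetical helper, equality only.)"""
    restricted = set(restricted)

    # Every partition key column has to appear in the WHERE clause.
    if not set(partition_key) <= restricted:
        return False

    # Whatever is left must be a leading run of the clustering columns.
    remaining = restricted - set(partition_key)
    prefix_len = 0
    for col in clustering_key:
        if col not in remaining:
            break
        prefix_len += 1

    return remaining == set(clustering_key[:prefix_len])


# PRIMARY KEY((col1, col2), col10, col4)
PARTITION_KEY = ["col1", "col2"]
CLUSTERING_KEY = ["col10", "col4"]

for cols in (["col1", "col2"],
             ["col1", "col2", "col10"],
             ["col1", "col2", "col10", "col4"],
             ["col1", "col2", "col4"],
             ["col1", "col10"]):
    print(cols, is_valid_restriction(PARTITION_KEY, CLUSTERING_KEY, cols))
```

The first three column sets are accepted and the last two are rejected, matching the valid and invalid lists above.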

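The clustering key also determines the order in which rows are stored within a partition. The default is ascending, and it can be reversed with CLUSTERING ORDER BY:

In [9]: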
%%cql
create table stackoverflow3 (
  key_part_one text,
  key_part_two int,
  data text,
  PRIMARY KEY(key_part_one, key_part_two)      
)
WITH CLUSTERING ORDER BY (key_part_two DESC);

'No results.'

In [13]:
%cql insert into stackoverflow3 (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player');
%cql insert into stackoverflow3 (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player');
%cql select * from stackoverflow3 where key_part_one = 'ronaldo';

key_part_one,key_part_two,data
ronaldo,10,ex-football player
ronaldo,9,football player
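
To make the effect of the clustering order concrete, here is a minimal Python sketch that treats a partition as the list of rows sharing a partition key value and sorts it by the clustering column. The function `read_partition` and the tuple layout are assumptions made for illustration; Cassandra keeps rows sorted on disk rather than sorting at read time.

```python
def read_partition(rows, partition_value, descending=False):
    """Return the rows of one partition, ordered by the clustering column.

    Each row is a (key_part_one, key_part_two, data) tuple, where
    key_part_one is the partition key and key_part_two the clustering key.
    """
    matching = [row for row in rows if row[0] == partition_value]
    return sorted(matching, key=lambda row: row[1], reverse=descending)


rows = [
    ("ronaldo", 10, "ex-football player"),
    ("ronaldo", 9, "football player"),
]

# Default (ascending) clustering order, as in stackoverflow2.
print(read_partition(rows, "ronaldo"))

# CLUSTERING ORDER BY (key_part_two DESC), as in stackoverflow3.
print(read_partition(rows, "ronaldo", descending=True))
```

The two calls reproduce the row order returned by the stackoverflow2 and stackoverflow3 queries respectively.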


**Please note that both the partition key and the clustering key can be made of more than one column:**

In [10]:
%%cql
create table stackoverflow4 (
  k_part_one text,
  k_part_two int,
  k_clust_one text,
  k_clust_two int,
  k_clust_three uuid,
  data text,
  PRIMARY KEY((k_part_one,k_part_two), k_clust_one, k_clust_two, k_clust_three)      
);

'No results.'
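
The way the parentheses split the key can be shown with a short Python sketch. The helper `split_primary_key` is hypothetical and deliberately naive: it expects only the table-level PRIMARY KEY(...) clause as input, not a whole CREATE TABLE statement, and it does not handle the inline `key text PRIMARY KEY` form.

```python
import re


def split_primary_key(clause):
    """Split a PRIMARY KEY(...) clause into (partition, clustering) columns.

    Hypothetical helper for illustration; not a CQL parser.
    """
    match = re.search(r"PRIMARY\s+KEY\s*\((.*)\)", clause,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no PRIMARY KEY(...) clause found")
    body = match.group(1).strip()

    if body.startswith("("):
        # Inner parentheses group the columns of a composite partition key.
        close = body.index(")")
        partition = [col.strip() for col in body[1:close].split(",")]
        rest = body[close + 1:]
    else:
        # Without inner parentheses, only the first column is the partition key.
        first, _, rest = body.partition(",")
        partition = [first.strip()]

    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition, clustering


print(split_primary_key(
    "PRIMARY KEY((k_part_one,k_part_two), k_clust_one, k_clust_two, k_clust_three)"))
print(split_primary_key("PRIMARY KEY(key_part_one, key_part_two)"))
```

For stackoverflow4 this yields two partition key columns and three clustering columns; for stackoverflow2 it yields one of each.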

To summarize what is behind these names:

- The **Partition Key** is responsible for **data distribution** across your nodes.
- The **Clustering Key** is responsible for **data sorting** within the partition.
- The **Primary Key** is equivalent to the **Partition Key** in a single-field-key table.
- The **Composite/Compound Key** is just a key made of **multiple columns**.
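
As a rough illustration of the "data distribution" point, the Python sketch below hashes a partition key to pick a node. It is a simplification under stated assumptions: `node_for` is a hypothetical function, md5 from the standard library stands in for Cassandra's partitioner hash, a plain modulo replaces the real token ring, and the ('han', 1, 'solo') row is made up for the example.

```python
import hashlib


def node_for(partition_key, node_count):
    """Map a partition key (a tuple of column values) to a node index.

    Assumption: md5 + modulo is only a stand-in for a real partitioner.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % node_count


rows = [
    ("ronaldo", 9, "football player"),
    ("ronaldo", 10, "ex-football player"),
    ("han", 1, "solo"),  # hypothetical extra row
]

for key_part_one, key_part_two, data in rows:
    # Only the partition key takes part in placement; the clustering
    # key (key_part_two) does not, so both 'ronaldo' rows land together.
    node = node_for((key_part_one,), node_count=3)
    print(key_part_one, key_part_two, "-> node", node)
```

Rows that share a partition key always map to the same node, which is why a query restricted to a full partition key can be served from one place.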