In this demo, we are going to walk through the basics of creating a table with a good Primary Key in Apache Cassandra, inserting rows of data, and running a simple CQL (Cassandra Query Language) query to validate the data.

In [1]:
import cassandra

In [2]:
from cassandra.cluster import Cluster

In [3]:
try:
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
except Exception as e:
    print(e)

Create a keyspace to do our work in:

In [4]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity
    WITH REPLICATION = 
    {'class' : 'SimpleStrategy', 'replication_factor' : 1}
    """
    )
    
except Exception as e:
    print(e)

Connect to the keyspace:

In [5]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)

Let's imagine we would like to start creating a new music library of albums.

We want to ask one question of our data:

1. Give me every album in my music library that was released in a given year <br>
```select * from music_library WHERE year=1970```

![title](demo_2.png)

How should we model this data? What should be our Primary Key and Partition Key? Since our query filters on YEAR, let's start with that. Is partitioning our data by year a good idea? Our sample data is very small, but with a larger dataset of albums, partitioning by YEAR may be a fine choice as long as the albums are spread fairly evenly across years. We would need to validate that against the actual dataset, because we want an even spread of data across partitions.
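To check the spread before committing to YEAR as the partition key, one rough heuristic is to count how many rows each key value would receive. The sketch below does that in plain Python on the demo's five albums; `partition_sizes` and the skew ratio are hypothetical illustrations, not part of Cassandra or its driver, and row counts are only a proxy (a real check would also look at partition size in bytes).

```python
from collections import Counter

# The five sample albums from this demo: (year, artist_name, album_name, city)
albums = [
    (1970, "The Beatles", "Let It Be", "Liverpool"),
    (1965, "The Beatles", "Rubber Soul", "Oxford"),
    (1965, "The Who", "My Generation", "London"),
    (1966, "The Monkees", "The Monkees", "Los Angeles"),
    (1970, "The Carpenters", "Close To You", "San Diego"),
]


def partition_sizes(rows, key_index):
    """Hypothetical helper: count the rows each partition would hold
    if the column at key_index were used as the partition key."""
    return Counter(row[key_index] for row in rows)


sizes = partition_sizes(albums, key_index=0)  # column 0 is year
print(dict(sizes))  # {1970: 2, 1965: 2, 1966: 1}

# A simple skew indicator: largest partition divided by smallest partition
skew = max(sizes.values()) / min(sizes.values())
print(f"skew ratio: {skew}")  # skew ratio: 2.0
```

On five rows any ratio is noise; the same count over the full catalogue is what would tell us whether YEAR spreads the data evenly.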

Table Name: music_library<br>
column 1: Year<br>
column 2: Artist Name<br>
column 3: Album Name<br>
column 4: City<br>
PRIMARY KEY (year)

In [12]:
query = "CREATE TABLE IF NOT EXISTS music_library "
query = query + "(year int, artist_name text, album_name text, city text, PRIMARY KEY (year))"
try:
    session.execute(query)
except Exception as e:
    print(e)

Let's insert data into the table:

In [13]:
query = "INSERT INTO music_library (year, artist_name, album_name, city)"
query = query + " VALUES (%s, %s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Let It Be", "Liverpool"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Beatles", "Rubber Soul", "Oxford"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Who", "My Generation", "London"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1966, "The Monkees", "The Monkees", "Los Angeles"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1970, "The Carpenters", "Close To You", "San Diego"))
except Exception as e:
    print(e)

Let's validate our data model. Did it work? If we look for albums from 1965, we should expect to see 2 rows:<br>
```select * from music_library WHERE YEAR=1965```

In [14]:
query = "SELECT * FROM music_library WHERE YEAR = 1965"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate if the query succeeded; otherwise `rows` would be undefined
    for row in rows:
        print(row.year, row.artist_name, row.album_name, row.city)

1965 The Who My Generation London


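We should have had two rows as output instead of one. This did not work because YEAR alone does not uniquely identify a row. With PRIMARY KEY (year), year is the entire primary key, and in Cassandra an INSERT with an existing primary key overwrites the stored row instead of adding a new one (an upsert). The second 1965 insert (The Who) therefore replaced the first (The Beatles), and the same happened to the two 1970 albums.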
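The overwrite can be reproduced without a cluster. The sketch below is a toy, in-memory model: a Python dict keyed by the primary key stands in for the table, so writing an existing key replaces the stored row. It is not the Cassandra driver API; the dict and the `insert` function are hypothetical, and the model assumes every column is written on each insert, as in this demo.

```python
# Toy in-memory model of the music_library table (not the Cassandra driver API).
# Rows live in a dict keyed by the primary key, so writing an existing key
# replaces the stored row, as an INSERT that sets every column does in Cassandra.
music_library = {}


def insert(year, artist_name, album_name, city):
    # PRIMARY KEY (year): the year alone identifies a row
    music_library[year] = (year, artist_name, album_name, city)


insert(1970, "The Beatles", "Let It Be", "Liverpool")
insert(1965, "The Beatles", "Rubber Soul", "Oxford")
insert(1965, "The Who", "My Generation", "London")  # same key: replaces Rubber Soul
insert(1966, "The Monkees", "The Monkees", "Los Angeles")
insert(1970, "The Carpenters", "Close To You", "San Diego")  # replaces Let It Be

print(len(music_library))   # 3
print(music_library[1965])  # (1965, 'The Who', 'My Generation', 'London')
```

Five inserts leave only three rows, one per distinct year, which matches the single 1965 row returned by the query above.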

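Now let's focus on making the PRIMARY KEY unique. Looking at our dataset, is there anything that is unique for each row? If we choose CITY and ALBUM NAME, the key would be unique, but it would not support the query we need, which looks up albums in a particular year. Cassandra serves queries by partition key, so to look up albums by YEAR, YEAR should be the partition key.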
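One quick way to answer "is anything unique for each row?" is to test candidate column combinations against the data. The plain-Python sketch below does this for the five sample albums; `is_unique_key` is a hypothetical helper. Passing this check is necessary but not sufficient: it only shows there is no collision in these rows, not in future data.

```python
# The five sample albums from this demo: (year, artist_name, album_name, city)
COLUMNS = ("year", "artist_name", "album_name", "city")
albums = [
    (1970, "The Beatles", "Let It Be", "Liverpool"),
    (1965, "The Beatles", "Rubber Soul", "Oxford"),
    (1965, "The Who", "My Generation", "London"),
    (1966, "The Monkees", "The Monkees", "Los Angeles"),
    (1970, "The Carpenters", "Close To You", "San Diego"),
]


def is_unique_key(rows, key_columns):
    """Hypothetical helper: True if no two rows share the same values
    for the given key columns (checked only against these rows)."""
    positions = [COLUMNS.index(name) for name in key_columns]
    keys = [tuple(row[p] for p in positions) for row in rows]
    return len(keys) == len(set(keys))


candidates = [("year",), ("artist_name",), ("city", "album_name"), ("year", "album_name")]
for candidate in candidates:
    print(candidate, is_unique_key(albums, candidate))
```

Here `("year",)` and `("artist_name",)` both collide, while `("city", "album_name")` and `("year", "album_name")` are unique. Only the second starts with year, so only it can serve the query by year.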

Let's choose a composite key of YEAR and ALBUM NAME. In PRIMARY KEY (year, album_name), year is the partition key and album_name is a clustering column. This assumes that no two albums released in the same year share a name (a good bet). But this is just a demo; in practice you need to understand your dataset fully (no betting!).
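A toy model of the composite key shows why the next query returns both 1965 albums, and in which order. In the sketch below, a dict of dicts stands in for partitions keyed by year, each holding rows keyed by album_name; this is a hypothetical in-memory illustration, not the driver API. Python's default string sort stands in for Cassandra's default ascending order on a text clustering column.

```python
# Toy in-memory model of PRIMARY KEY (year, album_name), not the driver API:
# year is the partition key and album_name is the clustering column.
partitions = {}  # year -> {album_name: row}


def insert(year, artist_name, album_name, city):
    # Rows sharing a year go to the same partition; within it, album_name
    # distinguishes rows, so only an identical (year, album_name) is replaced
    partitions.setdefault(year, {})[album_name] = (year, artist_name, album_name, city)


def select_by_year(year):
    # Return the partition's rows ordered by the clustering column (ascending)
    partition = partitions.get(year, {})
    return [partition[name] for name in sorted(partition)]


insert(1970, "The Beatles", "Let It Be", "Liverpool")
insert(1965, "The Beatles", "Rubber Soul", "Oxford")
insert(1965, "The Who", "My Generation", "London")
insert(1966, "The Monkees", "The Monkees", "Los Angeles")
insert(1970, "The Carpenters", "Close To You", "San Diego")

for row in select_by_year(1965):
    print(*row)
```

This prints "My Generation" before "Rubber Soul", because rows within the 1965 partition come back sorted by album_name rather than in insertion order.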

In [15]:
query = "CREATE TABLE IF NOT EXISTS music_library_1 "
query = query + "(year int, artist_name text, album_name text, city text, PRIMARY KEY (year, album_name))"
try:
    session.execute(query)
except Exception as e:
    print(e)

In [16]:
query = "INSERT INTO music_library_1 (year, artist_name, album_name, city)"
query = query + " VALUES (%s, %s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Let It Be", "Liverpool"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Beatles", "Rubber Soul", "Oxford"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Who", "My Generation", "London"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1966, "The Monkees", "The Monkees", "Los Angeles"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1970, "The Carpenters", "Close To You", "San Diego"))
except Exception as e:
    print(e)

Let's validate our data model. Did it work? If we look for albums from 1965, we should expect to see 2 rows:

In [17]:
query = "SELECT * FROM music_library_1 WHERE YEAR = 1965"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate if the query succeeded; otherwise `rows` would be undefined
    for row in rows:
        print(row.year, row.artist_name, row.album_name, row.city)

1965 The Who My Generation London
1965 The Beatles Rubber Soul Oxford


Success! Both 1965 albums are returned, ordered by the clustering column album_name.

Drop tables:

In [18]:
query = "DROP TABLE music_library"
try:
    session.execute(query)
except Exception as e:
    print(e)

query = "DROP TABLE music_library_1"
try:
    session.execute(query)
except Exception as e:
    print(e)

In [19]:
# Close session and cluster connection:
session.shutdown()
cluster.shutdown()