# Creating a Table with Apache Cassandra

### The agenda is to create a table, insert rows, and query the table to validate the data.

We will use the Python driver for Apache Cassandra, which is imported as cassandra. Install it with:

```bash
pip install cassandra-driver
```

In [24]:
# Import Apache Cassandra

import cassandra

## Let's create a connection to the database

Creating the connection lets us connect to the database and create the session that we will use to execute queries.

#### Note: This block of code will be standard in all notebooks

In [25]:
from cassandra.cluster import Cluster

try:
    cluster = Cluster(['127.0.0.1']) # If you have a locally installed Apache Cassandra instance
    session = cluster.connect()
except Exception as e:
    print(e)

### Let's test the connection

We are trying to do a select * on a table we have not created yet, so we should expect to see a nicely handled error.

In [26]:
try:
    session.execute("""select * from music_library""")
except Exception as e:
    print(e)

Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"


### Let's create a keyspace to do our work in

Note that on a one-node local instance, the replication strategy will be SimpleStrategy and the replication factor will be 1. More information follows in a subsequent video.

In [27]:
try:
    session.execute("""
    create keyspace if not exists udacity
    with replication =
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1}"""
                   )
except Exception as e:
    print(e)
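
The replication settings in the statement above can be factored into a small helper. This is a minimal sketch: `build_keyspace_cql` is a hypothetical name, not part of the driver, and it only assembles the CQL string. The result would still need to be passed to `session.execute`.

```python
def build_keyspace_cql(name, replication_factor=1):
    """Return a CREATE KEYSPACE statement using SimpleStrategy (hypothetical helper)."""
    # Rough guard against malformed names; this is not a full CQL identifier check.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )


print(build_keyspace_cql("udacity"))
```

On a single local node a replication factor of 1 is the only value that makes sense, which is why it is the default here.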

### Connect to the keyspace. Compare this to how we had to create a new session in PostgreSQL.

In [28]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)

#### Unlike in an RDBMS, in NoSQL we can't model our data and create our table without more information

## What queries will I be performing on this data?


##### In this case I would like to be able to get every album that was released in a particular year.

```bash
select * from music_library WHERE year=1970
```

### Because of this, I need to be able to do a WHERE on YEAR. YEAR will become my partition key, and artist name will be my clustering column to make each primary key unique. Remember, there are no duplicates in Apache Cassandra.

In [29]:
query = "CREATE TABLE IF NOT EXISTS music_library"
query = query + "(year int, artist_name text, album_name text, PRIMARY KEY (year, artist_name))"

try:
    session.execute(query)
except Exception as e:
    print(e)
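
To see why `year` works as the partition key, it can help to picture the table as a nested mapping: partition key first, then clustering column. The following is a pure-Python analogy, not a description of how Cassandra stores data internally. `MusicLibraryModel` and the sample rows are made up for illustration.

```python
from collections import defaultdict


class MusicLibraryModel:
    """Toy in-memory analogy for PRIMARY KEY (year, artist_name)."""

    def __init__(self):
        # partition key (year) -> clustering column (artist_name) -> album_name
        self.partitions = defaultdict(dict)

    def insert(self, year, artist_name, album_name):
        self.partitions[year][artist_name] = album_name

    def select_by_year(self, year):
        # One partition lookup; rows are returned ordered by the clustering column.
        partition = self.partitions.get(year, {})
        return [(year, artist, partition[artist]) for artist in sorted(partition)]


library = MusicLibraryModel()
library.insert(1970, "The Who", "Live at Leeds")
library.insert(1970, "Black Sabbath", "Paranoid")
library.insert(1971, "Led Zeppelin", "Led Zeppelin IV")
print(library.select_by_year(1970))
```

In this picture, a WHERE on `year` touches a single partition, which is the access pattern the table was designed around.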

### Let's check if the table was created

```bash
select count (*)
```

This query shouldn't be run on large datasets; it is here only for the sake of the demo.

In [30]:
query = "select count(*) from music_library"
try:
    count = session.execute(query)
except Exception as e:
    print(e)
    
print(count.one())

Row(count=0)


### Let's insert two rows

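Note the syntax here: the driver uses %s as the placeholder for every value, whatever its type, and the values are passed as a tuple.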

In [31]:
query = "INSERT INTO music_library (year, artist_name, album_name)"
query = query + " VALUES (%s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)
    

try:
    session.execute(query, (1970, "The Beatles", "Let it Be"))
except Exception as e:
    print(e)
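
When the same insert runs many times, the driver also supports prepared statements, which use `?` placeholders instead of `%s`. A minimal sketch, assuming a connected `session` with the `udacity` keyspace already set; `insert_albums` and `INSERT_CQL` are hypothetical names introduced here.

```python
INSERT_CQL = (
    "INSERT INTO music_library (year, artist_name, album_name) "
    "VALUES (?, ?, ?)"
)


def insert_albums(session, albums):
    """Insert (year, artist_name, album_name) tuples; return how many succeeded."""
    prepared = session.prepare(INSERT_CQL)  # the statement is prepared once
    inserted = 0
    for album in albums:
        try:
            session.execute(prepared, album)
            inserted += 1
        except Exception as e:  # same broad handling as the rest of this notebook
            print(e)
    return inserted
```

It could be called as `insert_albums(session, [(1970, "The Beatles", "Let it Be")])`.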

### Validate that the data was inserted



In [32]:
query = 'SELECT * FROM music_library'

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:  # the for loop is for printing, it will not be required if executing in cqlsh
    print(row.year, row.album_name, row.artist_name)

1970 Let it Be The Beatles
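
Only one row comes back even though two inserts ran. Both inserts share the primary key (1970, "The Beatles"), and an INSERT with an existing primary key overwrites the non-key columns rather than adding a second row. The dictionary below is only an analogy for that behavior, keyed by the primary key.

```python
# Analogy only: a dict keyed by the primary key (year, artist_name).
table = {}


def upsert(year, artist_name, album_name):
    # Writing to an existing key replaces the stored album_name.
    table[(year, artist_name)] = album_name


upsert(1970, "The Beatles", "Rubber Soul")
upsert(1970, "The Beatles", "Let it Be")

for (year, artist_name), album_name in table.items():
    print(year, album_name, artist_name)
```

To keep both albums, the rows would need different primary keys, for example by adding `album_name` as a further clustering column.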


### Let's validate our data model with our original query

select * from music_library WHERE year=1970

In [33]:
query = 'select * from music_library WHERE year=1970'

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    
for row in rows:  # the for loop is for printing, it will not be required if executing in cqlsh
    print(row.year, row.album_name, row.artist_name)

1970 Let it Be The Beatles


### For the sake of the demo, let's drop the table

In [34]:
query = "drop table music_library"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)

### And finally, close the session and cluster connection

In [35]:
session.shutdown()
cluster.shutdown()
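
If an earlier cell raises before reaching these calls, the connection stays open. One way to guard against that is a context manager. This is a sketch, with `cassandra_session` as a hypothetical helper that expects an already constructed `Cluster` object.

```python
from contextlib import contextmanager


@contextmanager
def cassandra_session(cluster, keyspace=None):
    """Yield a session and always shut down the session and cluster afterwards."""
    session = None
    try:
        session = cluster.connect(keyspace)
        yield session
    finally:
        # Runs on normal exit and on errors, including a failed connect.
        if session is not None:
            session.shutdown()
        cluster.shutdown()


# Usage, assuming a local Cassandra node is running:
# from cassandra.cluster import Cluster
# with cassandra_session(Cluster(['127.0.0.1']), 'udacity') as session:
#     session.execute("select * from music_library WHERE year=1970")
```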