# Apache Cassandra 

### In this demo we are going to walk through the basics of creating a table in Apache Cassandra, inserting rows of data, and doing a simple CQL query to validate the information.

#### We will use the Python driver for Apache Cassandra (imported as `cassandra`, installed as `cassandra-driver`) to run the Apache Cassandra queries. This library can be installed with:

! pip install cassandra-driver 


<font color='red'>*Note: I have not installed Apache Cassandra locally, or any of the relevant libraries. This code is notes only; it will not run.*</font>

#### Import Apache Cassandra Python package 

In [None]:
import cassandra

### Create connection to database

This connects to our local instance of Apache Cassandra. The connection reaches out to the database and checks that we have the correct access rights to connect; otherwise it returns an error.

In [None]:
from cassandra.cluster import Cluster 

try:
    cluster = Cluster(['127.0.0.1']) # If you have a locally installed Apache Cassandra instance 
    session = cluster.connect()
except Exception as e:
    print(e)

### Test our Connection

We are trying to do a select * on a table we have not yet created. We should expect to see a nicely handled error.

In [None]:
try:
    session.execute(""" select * from music_library""")
except Exception as e:
    print(e)

*An error would be printed here, because that table does not exist yet.*
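The try/except pattern used in these cells can be wrapped in a small helper so it is not repeated everywhere. The sketch below is only an illustration: `run_query` and `FakeSession` are hypothetical names, and `FakeSession` is a stand-in that always fails so the example runs without a Cassandra instance. With a real connection, the driver's `session` would be passed in instead.

```python
def run_query(session, query, params=None):
    """Run a query and return its result, or None if the query fails."""
    try:
        return session.execute(query, params)
    except Exception as e:
        # Same handling as the cells above: report the error and carry on.
        print(e)
        return None


class FakeSession:
    """Hypothetical stand-in for a driver session; not part of cassandra-driver."""

    def execute(self, query, params=None):
        # Simulate querying a table that has not been created yet.
        raise RuntimeError("table music_library does not exist")


result = run_query(FakeSession(), "select * from music_library")
print(result)
```

Returning `None` on failure lets later cells check the result before iterating over it.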

### Let's create a keyspace to do our work

Remember, a keyspace is roughly the NoSQL equivalent of a database in a relational setup.

*Note: Ignore the Replication Strategy and Factor Information for now. These will be discussed in later lessons. Just know that on a one node local instance this will be the strategy and replication factor*

In [None]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity 
    WITH REPLICATION =
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1}
    """)
except Exception as e:
    print(e)

#### Connect to our keyspace. Compare this to how we had to create a new session in PostgreSQL.

In [None]:
try:
    session.set_keyspace('udacity')
    
except Exception as e:
    print(e)

#### We are working with Apache Cassandra, a NoSQL database. We can't model our data and create our table without more information.

## What queries will I be performing on this data?

#### In this case I would like to be able to get every album that was released in a particular year.

`SELECT * FROM music_library WHERE year=1970`

#### Because of this, I need to be able to do a WHERE clause on YEAR. So, YEAR will become my partition key, and artist name will be my clustering column to make each Primary Key unique. Remember there are no duplicates in Apache Cassandra.

`Table Name: music_library`

`column 1: Album Name`

`column 2: Artist Name`

`column 3: Year`

`PRIMARY KEY(Year, Artist Name)`
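To see what this primary key implies, here is a toy in-memory model in plain Python. It is not the driver and not how Cassandra stores data internally. It assumes only the two properties described above: rows are grouped by the partition key `year`, and a row is identified by the full primary key `(year, artist_name)`, so a second write with the same key replaces the first. `ToyTable`, `insert`, and `select_by_year` are hypothetical names, and the extra rows are illustrative sample data.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a table with PRIMARY KEY (year, artist_name)."""

    def __init__(self):
        # partition key -> {clustering column value -> remaining column}
        self.partitions = defaultdict(dict)

    def insert(self, year, artist_name, album_name):
        # A write with an existing (year, artist_name) replaces the earlier row.
        self.partitions[year][artist_name] = album_name

    def select_by_year(self, year):
        # Read a single partition; rows are ordered by the clustering column.
        rows = self.partitions.get(year, {})
        return [(year, artist, rows[artist]) for artist in sorted(rows)]


table = ToyTable()
table.insert(1970, "The Beatles", "Let it Be")
table.insert(1965, "The Beatles", "Rubber Soul")
table.insert(1970, "The Carpenters", "Close To You")
# Same primary key as the first row, so this replaces "Let it Be".
table.insert(1970, "The Beatles", "Hey Jude")

print(table.select_by_year(1970))
```

In this model, an artist with two albums in the same year keeps only the last one written. If that matters for your data, one option would be to add the album name as a second clustering column.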

### Now translate this information into a Create Table Statement

In [None]:
query = """CREATE TABLE IF NOT EXISTS music_library"""
query = query + " (year int, artist_name text, album_name text, PRIMARY KEY (year, artist_name))"
try:
    session.execute(query)
except Exception as e:
    print(e)

##### Now execute a select statement to confirm the table was created. The result should be 0 because the table is empty.

Note: Depending on the version of Apache Cassandra you have installed, this might throw an "ALLOW FILTERING" error instead of a result of "0". This is to be expected, as this type of query should not be performed on large datasets; we are just doing it here for the sake of the demo.

In [None]:
check_query = "select count(*) from music_library"
try:
    count = session.execute(check_query)
    # Print inside the try block so a failed query does not leave `count` undefined
    print(count.one())
except Exception as e:
    print(e)

### Let's insert two rows of data

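Note the syntax here: the `%s` placeholders are filled in by the driver from the tuple passed as the second argument to `session.execute`.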

In [None]:
insert_query = "INSERT INTO music_library (year, artist_name, album_name)"
insert_query = insert_query + " VALUES (%s, %s, %s)"

try:
    session.execute(insert_query, (1970, "The Beatles", "Let it Be"))
except Exception as e:
    print(e)
    
try:
    session.execute(insert_query, (1965, "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)
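With more than a couple of rows, repeating the try/except block gets noisy. One possible sketch is to loop over a list of tuples and reuse the same parameterized query. `insert_albums` and `RecordingSession` are hypothetical names. `RecordingSession` only records what it is asked to run so that the example works without a database, and a real driver `session` would be passed in its place.

```python
INSERT_QUERY = (
    "INSERT INTO music_library (year, artist_name, album_name) "
    "VALUES (%s, %s, %s)"
)


def insert_albums(session, albums):
    """Insert each (year, artist_name, album_name) tuple; return how many succeeded."""
    inserted = 0
    for album in albums:
        try:
            session.execute(INSERT_QUERY, album)
            inserted += 1
        except Exception as e:
            # Report the failure and continue with the remaining rows.
            print(e)
    return inserted


class RecordingSession:
    """Hypothetical stand-in for a driver session; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


albums = [
    (1970, "The Beatles", "Let it Be"),
    (1965, "The Beatles", "Rubber Soul"),
]
session_stub = RecordingSession()
inserted_count = insert_albums(session_stub, albums)
print(inserted_count)
```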

### Validate your data was inserted into the table.

Note: The for loop is used for printing the results. If executing queries in cqlsh, this would not be required.

Note: Depending on the version of Apache Cassandra installed, this might throw an "ALLOW FILTERING" error instead of printing the 2 rows inserted into the table. This is to be expected, as this type of query should not be performed on large datasets; we are only doing this for the sake of the demo.

In [None]:
test_query = "select * from music_library"
try:
    rows = session.execute(test_query)
    # Iterate inside the try block so a failed query does not leave `rows` undefined
    for row in rows:
        print(row.year, row.album_name, row.artist_name)
except Exception as e:
    print(e)

### Let's validate our data model with our original query

`select * from music_library WHERE year=1970`

In [None]:
my_query = "select * from music_library where year=1970"
try:
    rows = session.execute(my_query)
    # Iterate inside the try block so a failed query does not reuse stale `rows`
    for row in rows:
        print(row.year, row.album_name, row.artist_name)
except Exception as e:
    print(e)

### Finally, close the session and cluster connection 

In [None]:
session.shutdown()
cluster.shutdown() 

#### End