# SQL Tutorial

SQL is a language for interacting with relational databases. In practical terms, a relational database is a fast and efficient data storage and retrieval system that can encode relationships between different kinds of data stored within the database, in addition to the data itself.

In a relational database, data is stored in tables. Each row in a table corresponds to one object, and each column corresponds to one of the object's features. For example, a store might maintain a PRODUCTS table whose rows correspond to products available in the store, and whose columns hold the inventory number, price, description, number of items available in the store, etc. Moreover, tables can have various relationships between them. For example, there may be an ORDERS table in the example store's database as well. This table would store an order total, order data, as well as pointers to all the products in the order. This last bit is what makes the database relational: there is a many-to-many relationship between rows of orders and rows of products (although other relationship types can be stored in databases as well).
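As a rough illustration of the store example, the sketch below builds such tables in an in-memory SQLite database. It is written for Python 3 (the notebook cells in this tutorial use Python 2 `print` statements). The table and column names, the sample rows, and the extra `order_items` linking table used to represent the many-to-many relationship are all assumptions made for this sketch, not a real store schema.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema for the store example.
cur.executescript("""
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    description TEXT,
    price       REAL,
    in_stock    INTEGER
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_total REAL
);
-- Linking table: one row per (order, product) pair.
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id)
);
""")

# Made-up sample data.
cur.executemany("INSERT INTO products VALUES (?, ?, ?, ?)",
                [(1, "Pen", 1.5, 100), (2, "Notebook", 3.0, 40)])
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(10, 4.5), (11, 1.5)])
cur.executemany("INSERT INTO order_items VALUES (?, ?)",
                [(10, 1), (10, 2), (11, 1)])

# One order can point at many products...
cur.execute("SELECT product_id FROM order_items "
            "WHERE order_id = 10 ORDER BY product_id")
order_10_products = [row[0] for row in cur.fetchall()]

# ...and one product can appear in many orders.
cur.execute("SELECT order_id FROM order_items "
            "WHERE product_id = 1 ORDER BY order_id")
orders_with_pen = [row[0] for row in cur.fetchall()]

print(order_10_products)  # [1, 2]
print(orders_with_pen)    # [10, 11]
conn.close()
```

The linking table is one common way to store "pointers to all the products in the order"; other designs are possible.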

SQL, which stands for Structured Query Language, is a programmatic way of accessing the rows of a database through queries. These queries get executed on a particular database program. There are many relational database programs, such as MySQL, PostgreSQL, and SQLite. Each implements a slightly different set of features and uses a slightly different version of SQL, although they all try to implement the same standards. For the purposes of this tutorial, however, these differences won't matter much.

## SQLite Example Database

We will demonstrate some features of relational databases using SQLite. Typically, SQL database systems require some setup (running a server, accepting connections through which queries for the data can be sent, etc.), so since SQLite stores its data in a local file and requires minimal setup, it is a good choice for a first encounter with databases.

We will use an example database called Chinook, available at http://chinookdatabase.codeplex.com. It stores data for a fictitious digital media store, including tables for artists, albums, media tracks, invoices and customers. The media data is modeled after somebody's iTunes library, and the customer data is fictitious.

To connect to this database, you could open a terminal prompt, and run 

    $ sqlite3 Chinook_Sqlite.sqlite
    
You would be greeted with a command prompt for the database into which you could input various commands. In this tutorial, we will use the Python interface for SQLite instead. The queries themselves are identical, but Python allows for nicer printing and the other conveniences of a full programming interface.

This is also a good time to mention some useful software for working with SQLite databases. In particular, SQLite Manager is a free Firefox add-on which can be used to easily query a SQLite database. Other database systems have similar software, for example MySQL Workbench.

### Database Schema

The database schema for our example database is visible in the image below. Each table is listed in a gray box, columns are listed underneath each table's name, and relationships between tables are indicated by arrows.

![caption](ChinookDatabaseSchema1.1.png)

You can also use queries to modify tables, create new tables, change existing columns in tables, etc. However, we will start with queries for selecting data of interest from the database.

#### `SELECT` and `SELECT ... WHERE` for picking out rows of interest

First, let's make a database connection and learn how to issue commands.

In [1]:
# Connecting to database
import sqlite3
connection = sqlite3.connect('Chinook_Sqlite.sqlite')

Now that we have a connection to the local database, we will need a "cursor" through which we can interact with the database.

In [2]:
c = connection.cursor()

Let's now pick out all artists from the Artist table and print them out neatly.

In [3]:
# The string 'SELECT * FROM Artist' is a SELECT query. 
# You don't have to capitalize it like this, but it improves 
# readability. The * here means that we want to pick ALL columns
# and FROM Artist states that we want them from table Artist.
c.execute('SELECT * FROM Artist')

# Get the results
artist_rows = c.fetchall()

# Neatly print results:
for r in artist_rows:
    print '%3i. %s' % r

  1. AC/DC
  2. Accept
  3. Aerosmith
  4. Alanis Morissette
  5. Alice In Chains
  6. Antônio Carlos Jobim
  7. Apocalyptica
  8. Audioslave
  9. BackBeat
 10. Billy Cobham
 11. Black Label Society
 12. Black Sabbath
 13. Body Count
 14. Bruce Dickinson
 15. Buddy Guy
 16. Caetano Veloso
 17. Chico Buarque
 18. Chico Science & Nação Zumbi
 19. Cidade Negra
 20. Cláudio Zoli
 21. Various Artists
 22. Led Zeppelin
 23. Frank Zappa & Captain Beefheart
 24. Marcos Valle
 25. Milton Nascimento & Bebeto
 26. Azymuth
 27. Gilberto Gil
 28. João Gilberto
 29. Bebel Gilberto
 30. Jorge Vercilo
 31. Baby Consuelo
 32. Ney Matogrosso
 33. Luiz Melodia
 34. Nando Reis
 35. Pedro Luís & A Parede
 36. O Rappa
 37. Ed Motta
 38. Banda Black Rio
 39. Fernanda Porto
 40. Os Cariocas
 41. Elis Regina
 42. Milton Nascimento
 43. A Cor Do Som
 44. Kid Abelha
 45. Sandra De Sá
 46. Jorge Ben
 47. Hermeto Pascoal
 48. Barão Vermelho
 49. Edson, DJ Marky & DJ Patife Featuring Fernanda Porto
 50. Metallica
 

That was quite a list! We can limit the size of the output by printing out only the first 10 rows with the LIMIT clause:

In [4]:
# We just added LIMIT 10 here...
c.execute('SELECT * FROM Artist LIMIT 10')

# Get the results
artist_rows = c.fetchall()

# Neatly print results:
for r in artist_rows:
    print '%3i. %s' % r

  1. AC/DC
  2. Accept
  3. Aerosmith
  4. Alanis Morissette
  5. Alice In Chains
  6. Antônio Carlos Jobim
  7. Apocalyptica
  8. Audioslave
  9. BackBeat
 10. Billy Cobham


Let's look at the last artist returned:

In [5]:
print r

(10, u'Billy Cobham')


We see the two entries corresponding to the two columns of Artist. The first column, ArtistId, is the table's *primary key*: a unique value used by the database to identify that row. When creating a table, the user specifies which column to use as the primary key. The second column, Name, is just a string.

In other words, each column has a type. The types in SQLite are NULL, INTEGER, REAL, TEXT and BLOB (a blob of binary data). MySQL and PostgreSQL have more data types.

You can read more about SQLite types here: http://www.tutorialspoint.com/sqlite/sqlite_data_types.htm
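To see these types in action, the short sketch below asks SQLite for the type of a few literal values using its built-in `typeof()` function. It is written for Python 3 and uses an in-memory database rather than Chinook; the literal values are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# typeof() reports the storage type SQLite assigns to each value.
# x'00ff' is SQLite's syntax for a BLOB literal.
cur.execute(
    "SELECT typeof(NULL), typeof(42), typeof(3.14), "
    "typeof('abc'), typeof(x'00ff')"
)
type_names = cur.fetchone()
print(type_names)  # ('null', 'integer', 'real', 'text', 'blob')
conn.close()
```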

Next, let's get all albums by artist Alice In Chains.

In [6]:
c.execute('SELECT * FROM Album WHERE ArtistId = 5')

# Get and print the results:
rows = c.fetchall()
for r in rows:
    print r

(7, u'Facelift', 5)
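When the value in the `WHERE` clause comes from a Python variable, the `sqlite3` module lets you write a `?` placeholder in the query and pass the value separately, instead of pasting it into the query string. The sketch below is written for Python 3 and uses a tiny in-memory stand-in for the Album table (two rows only), so it runs without the Chinook file; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-in for the Album table, not the real Chinook data.
cur.execute(
    "CREATE TABLE Album "
    "(AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
cur.executemany(
    "INSERT INTO Album VALUES (?, ?, ?)",
    [(5, "Big Ones", 3), (7, "Facelift", 5)],
)

# The ? is filled in from the tuple passed as the second argument.
artist_id = 5
cur.execute("SELECT * FROM Album WHERE ArtistId = ?", (artist_id,))
albums = cur.fetchall()
print(albums)  # [(7, 'Facelift', 5)]
conn.close()
```

Passing values this way avoids quoting mistakes and is the usual recommendation whenever the value is supplied by a user.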


#### `JOIN`s for capturing table relationships

It is a bit annoying having to know the ArtistId of 'Alice In Chains'. 
We can use a `JOIN` query to combine two tables based on a shared key.
We will only talk about `INNER` joins here, although there are 
other kinds as well (`OUTER` and `CROSS`), which you can read more about here: http://www.tutorialspoint.com/sqlite/sqlite_using_joins.htm
        
Once we've joined two tables, we end up with a "temporary" table on which 
we can use a `SELECT ... WHERE` query. Let's get albums by "Led Zeppelin" (whose ArtistId we would have to hunt down) this way.

Let's write the query here and explain how it works.

    SELECT alb.Title 
    FROM Album alb 
    INNER JOIN Artist art on alb.ArtistId = art.ArtistId 
    WHERE art.Name = 'Led Zeppelin'
    
First, we exploit the fact that Python joins adjacent string literals into a single string, which lets us split the query across several lines to keep it readable. We add quotes to each line, and leave an extra space at the end (so that the last word of a line doesn't get merged with the first word of the next line).

    "SELECT alb.Title "
    "FROM Album alb "
    "INNER JOIN Artist art on alb.ArtistId = art.ArtistId "
    "WHERE art.Name = 'Led Zeppelin'"
    
With the formatting out of the way, let's look at the query. 
- The second line, `FROM Album alb`, indicates that we are selecting from table Album and giving it a temporary name `alb`; the first line just says to take column `Title` from table `alb`.
- The third line, `INNER JOIN Artist art on alb.ArtistId = art.ArtistId`, gives a temporary name `art` to the Artist table and then merges rows from the `alb` and `art` tables if they have the same ArtistId. So, it will take an artist (specified later as 'Led Zeppelin') and create a temporary row for each album that belongs to that artist. The new rows will have columns from both the `alb` and `art` tables, so the first line makes sense.
- The last line `WHERE art.Name = 'Led Zeppelin'` is what we wanted: to be able to search by name.
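To make the "temporary" joined table concrete, the sketch below runs the same kind of join on a tiny in-memory stand-in for the Artist and Album tables and prints every column of the joined rows before filtering by name. It is written for Python 3. The three album rows are a small subset chosen for illustration, and the AlbumId values 100 and 101 are placeholders, not the real Chinook ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-ins for the two Chinook tables.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute(
    "CREATE TABLE Album "
    "(AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
cur.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(5, "Alice In Chains"), (22, "Led Zeppelin")],
)
cur.executemany(
    "INSERT INTO Album VALUES (?, ?, ?)",
    [(7, "Facelift", 5), (100, "Coda", 22), (101, "Presence", 22)],
)

# SELECT * on the join shows the merged rows: Album columns, then Artist columns.
cur.execute(
    "SELECT * "
    "FROM Album alb "
    "INNER JOIN Artist art ON alb.ArtistId = art.ArtistId "
    "ORDER BY alb.AlbumId"
)
joined_rows = cur.fetchall()
for row in joined_rows:
    print(row)

# Filtering the joined rows by artist name leaves only that artist's albums.
cur.execute(
    "SELECT alb.Title "
    "FROM Album alb "
    "INNER JOIN Artist art ON alb.ArtistId = art.ArtistId "
    "WHERE art.Name = 'Led Zeppelin' "
    "ORDER BY alb.Title"
)
zeppelin_titles = [row[0] for row in cur.fetchall()]
print(zeppelin_titles)  # ['Coda', 'Presence']
conn.close()
```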

In [7]:
c.execute(
    "SELECT alb.Title "
    "FROM Album alb "
    "INNER JOIN Artist art on alb.ArtistId = art.ArtistId "
    "WHERE art.Name = 'Led Zeppelin'"
)
rows = c.fetchall()
for r in rows:
    print r

(u'BBC Sessions [Disc 1] [Live]',)
(u'Physical Graffiti [Disc 1]',)
(u'BBC Sessions [Disc 2] [Live]',)
(u'Coda',)
(u'Houses Of The Holy',)
(u'In Through The Out Door',)
(u'IV',)
(u'Led Zeppelin I',)
(u'Led Zeppelin II',)
(u'Led Zeppelin III',)
(u'Physical Graffiti [Disc 2]',)
(u'Presence',)
(u'The Song Remains The Same (Disc 1)',)
(u'The Song Remains The Same (Disc 2)',)
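The multi-line query above relies on Python concatenating adjacent string literals. The small Python 3 sketch below shows the resulting single string, and what happens when the space at the end of a line is forgotten; the variable names `query` and `broken` are just for illustration.

```python
# Adjacent string literals inside parentheses are joined into one string.
query = (
    "SELECT alb.Title "
    "FROM Album alb "
    "INNER JOIN Artist art on alb.ArtistId = art.ArtistId "
    "WHERE art.Name = 'Led Zeppelin'"
)
print(query)

# Without the trailing spaces, neighbouring words run together.
broken = (
    "SELECT alb.Title"
    "FROM Album alb"
)
print(broken)  # SELECT alb.TitleFROM Album alb
```

Printing the assembled query before executing it is a quick way to catch this kind of mistake.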


Here we add one line to order the output by title. If the Album table had some release date information, we could select that column as well and sort by date, for instance.

In [8]:
c.execute(
    "SELECT alb.Title "
    "FROM Album alb "
    "INNER JOIN Artist art on alb.ArtistId = art.ArtistId "
    "WHERE art.Name = 'Led Zeppelin' "
    "ORDER BY alb.Title ASC"
)
rows = c.fetchall()
for r in rows:
    print r

(u'BBC Sessions [Disc 1] [Live]',)
(u'BBC Sessions [Disc 2] [Live]',)
(u'Coda',)
(u'Houses Of The Holy',)
(u'IV',)
(u'In Through The Out Door',)
(u'Led Zeppelin I',)
(u'Led Zeppelin II',)
(u'Led Zeppelin III',)
(u'Physical Graffiti [Disc 1]',)
(u'Physical Graffiti [Disc 2]',)
(u'Presence',)
(u'The Song Remains The Same (Disc 1)',)
(u'The Song Remains The Same (Disc 2)',)


Some queries perform inner joins implicitly. That is, we don't have to use the `INNER JOIN` syntax, but that's what happens behind the scenes. For instance, this query selects all Artist - Album pairs:

In [9]:
c.execute('SELECT Name, Title FROM Artist, Album '
          'WHERE Artist.ArtistId = Album.ArtistId ')
for artist_album in c.fetchall():
    print artist_album[0] + ' - ' + artist_album[1]

AC/DC - For Those About To Rock We Salute You
Accept - Balls to the Wall
Accept - Restless and Wild
AC/DC - Let There Be Rock
Aerosmith - Big Ones
Alanis Morissette - Jagged Little Pill
Alice In Chains - Facelift
Antônio Carlos Jobim - Warner 25 Anos
Apocalyptica - Plays Metallica By Four Cellos
Audioslave - Audioslave
Audioslave - Out Of Exile
BackBeat - BackBeat Soundtrack
Billy Cobham - The Best Of Billy Cobham
Black Label Society - Alcohol Fueled Brewtality Live! [Disc 1]
Black Label Society - Alcohol Fueled Brewtality Live! [Disc 2]
Black Sabbath - Black Sabbath
Black Sabbath - Black Sabbath Vol. 4 (Remaster)
Body Count - Body Count
Bruce Dickinson - Chemical Wedding
Buddy Guy - The Best Of Buddy Guy - The Millenium Collection
Caetano Veloso - Prenda Minha
Caetano Veloso - Sozinho Remix Ao Vivo
Chico Buarque - Minha Historia
Chico Science & Nação Zumbi - Afrociberdelia
Chico Science & Nação Zumbi - Da Lama Ao Caos
Cidade Negra - Acústico MTV [Live]
Cidade Negra - Cidade Negra - Hi
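As a quick check that the comma-separated form and the explicit `INNER JOIN` form return the same rows, the Python 3 sketch below runs both on a small in-memory stand-in for the two tables. The artist "No Albums Yet" and its id 99 are hypothetical, added only to show that an artist without albums appears in neither result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-ins for the two Chinook tables.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute(
    "CREATE TABLE Album "
    "(AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
cur.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(1, "AC/DC"), (2, "Accept"), (99, "No Albums Yet")],
)
cur.executemany(
    "INSERT INTO Album (Title, ArtistId) VALUES (?, ?)",
    [
        ("For Those About To Rock We Salute You", 1),
        ("Balls to the Wall", 2),
        ("Restless and Wild", 2),
        ("Let There Be Rock", 1),
    ],
)

# Implicit join: list both tables and match keys in WHERE.
implicit = cur.execute(
    "SELECT Name, Title FROM Artist, Album "
    "WHERE Artist.ArtistId = Album.ArtistId "
    "ORDER BY Title"
).fetchall()

# Explicit join: same condition, written with INNER JOIN ... ON.
explicit = cur.execute(
    "SELECT Name, Title FROM Artist "
    "INNER JOIN Album ON Artist.ArtistId = Album.ArtistId "
    "ORDER BY Title"
).fetchall()

print(implicit == explicit)  # True
for name, title in explicit:
    print(name + " - " + title)
conn.close()
```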

You can also apply functions to the columns. A very simple and useful one is `COUNT`, especially when used in conjunction with the `GROUP BY` clause. Here, the last line groups rows (in effect, albums) by artist name, and `COUNT` returns the number of rows in each group.

In [10]:
c.execute('SELECT Name, COUNT(Title) FROM Artist, Album '
          'WHERE Artist.ArtistId = Album.ArtistId '
          'GROUP BY Name ')
for r in c.fetchall():
    print r[0], r[1]

AC/DC 2
Aaron Copland & London Symphony Orchestra 1
Aaron Goldberg 1
Academy of St. Martin in the Fields & Sir Neville Marriner 1
Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner 1
Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair 1
Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart 1
Accept 2
Adrian Leaper & Doreen de Feis 1
Aerosmith 1
Aisha Duo 1
Alanis Morissette 1
Alberto Turco & Nova Schola Gregoriana 1
Alice In Chains 1
Amy Winehouse 2
Anne-Sophie Mutter, Herbert Von Karajan & Wiener Philharmoniker 1
Antal Doráti & London Symphony Orchestra 1
Antônio Carlos Jobim 2
Apocalyptica 1
Aquaman 1
Audioslave 3
BackBeat 1
Barry Wordsworth & BBC Concert Orchestra 1
Battlestar Galactica 2
Battlestar Galactica (Classic) 1
Berliner Philharmoniker & Hans Rosbaud 1
Berliner Philharmoniker & Herbert Von Karajan 3
Berliner Philharmoniker, Claudio Abbado & Sabine Meyer 1
Billy Cobham 1
Black Label Society 2
Blac
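The grouping is easier to verify by hand on a handful of rows. The Python 3 sketch below repeats the `COUNT` / `GROUP BY` query on a small in-memory stand-in for the two tables (five albums only, taken from the listings above), with an added `ORDER BY` so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-ins for the two Chinook tables.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute(
    "CREATE TABLE Album "
    "(AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)"
)
cur.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(1, "AC/DC"), (2, "Accept"), (3, "Aerosmith")],
)
cur.executemany(
    "INSERT INTO Album (Title, ArtistId) VALUES (?, ?)",
    [
        ("For Those About To Rock We Salute You", 1),
        ("Balls to the Wall", 2),
        ("Restless and Wild", 2),
        ("Let There Be Rock", 1),
        ("Big Ones", 3),
    ],
)

# One output row per artist name, with the number of joined album rows.
cur.execute(
    "SELECT Name, COUNT(Title) FROM Artist, Album "
    "WHERE Artist.ArtistId = Album.ArtistId "
    "GROUP BY Name "
    "ORDER BY Name"
)
album_counts = cur.fetchall()
print(album_counts)  # [('AC/DC', 2), ('Accept', 2), ('Aerosmith', 1)]
conn.close()
```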

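So far we have only read data. We can also add rows to a table with an `INSERT` query; here we add a new artist.

In [11]: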
c.execute(
    "INSERT INTO Artist(ArtistId, Name) "
    "VALUES (276, 'My Funk Band')"
)

<sqlite3.Cursor at 0x1069f5650>

Let's verify that the new artist was added.

In [12]:
c.execute("SELECT * FROM Artist "
          "WHERE Name = 'My Funk Band'")
for r in c.fetchall():
    print r

(276, u'My Funk Band')


Trying to add again would result in an error:

In [13]:
try:
    c.execute(
        "INSERT INTO Artist(ArtistId, Name) "
        "VALUES (276, 'My Funk Band')"
    )
except sqlite3.IntegrityError:
    print "Primary id already exists!"

Primary id already exists!


You can use an `INSERT OR IGNORE` query to skip the insert silently instead of raising such an error.

In [14]:
try:
    c.execute(
        "INSERT OR IGNORE INTO Artist(ArtistId, Name) "
        "VALUES (276, 'My Funk Band')"
    )
except sqlite3.IntegrityError:
    # This shouldn't be triggered.
    print "Primary id already exists!"
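The difference between the two forms can be checked on a scratch table. The Python 3 sketch below uses an in-memory stand-in for the Artist table; the name 'Another Name' is a made-up value used to show that the existing row is left untouched when the insert is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-in for the Artist table.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("INSERT INTO Artist (ArtistId, Name) VALUES (276, 'My Funk Band')")

# A plain INSERT with an existing primary key raises IntegrityError.
duplicate_rejected = False
try:
    cur.execute(
        "INSERT INTO Artist (ArtistId, Name) VALUES (276, 'My Funk Band')"
    )
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INSERT OR IGNORE skips the conflicting row without raising.
cur.execute(
    "INSERT OR IGNORE INTO Artist (ArtistId, Name) "
    "VALUES (276, 'Another Name')"
)

cur.execute("SELECT ArtistId, Name FROM Artist")
rows = cur.fetchall()
print(duplicate_rejected)  # True
print(rows)                # [(276, 'My Funk Band')]
conn.close()
```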

Sometimes you may not know the primary key you wish to insert. In that case, you can leave it out and have SQLite automatically assign a new key. This has a slight overhead, however (which doesn't matter for simple environments).

In [15]:
c.execute(
    "INSERT INTO Artist(Name) "
    "VALUES ('My Rock Band')"
)
c.execute("SELECT * FROM Artist "
          "WHERE Name = 'My Rock Band'")
for r in c.fetchall():
    print r

(277, u'My Rock Band')
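Rather than querying by name to find the key SQLite picked, you can read it from the cursor's `lastrowid` attribute right after the insert. The Python 3 sketch below shows this on an in-memory stand-in for the Artist table; it assumes the key column is declared `INTEGER PRIMARY KEY`, which is the case in which SQLite assigns the next integer automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-in for the Artist table.
cur.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("INSERT INTO Artist (ArtistId, Name) VALUES (276, 'My Funk Band')")

# No ArtistId given: SQLite assigns one larger than the current maximum.
cur.execute("INSERT INTO Artist (Name) VALUES ('My Rock Band')")
new_id = cur.lastrowid
print(new_id)  # 277

cur.execute("SELECT Name FROM Artist WHERE ArtistId = ?", (new_id,))
new_name = cur.fetchone()[0]
print(new_name)  # My Rock Band
conn.close()
```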


Here's an example of how to use data from one table to create a new entry in another table. We will add an album for one of the bands added above by replacing the explicitly defined `VALUES` with a `SELECT` query in which one field (Title) is given manually.

In [16]:
c.execute(
    "INSERT INTO Album (Title, ArtistId) "
    "SELECT 'First Album Title', ArtistId FROM Artist "
    "WHERE Name='My Rock Band'"
)
c.execute("SELECT Title FROM Album "
          "WHERE ArtistId = 277")
for r in c.fetchall():
    print r

(u'First Album Title',)


Finally, we can update values.

In [17]:
c.execute(
    "UPDATE Album "
    "SET Title='My Improved Rock Album' "
    "WHERE Title='First Album Title'"
)
c.execute("SELECT Title FROM Album "
          "WHERE ArtistId = 277")
for r in c.fetchall():
    print r

(u'My Improved Rock Album',)


We are done with using SQL for now, so we close the cursor and disconnect.

In [18]:
c.close()
connection.close()
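One caveat: with the default settings of Python's `sqlite3` module, rows added or changed by `INSERT` and `UPDATE` are only saved to the database file once `connection.commit()` is called. The cells above never call it, so their changes would be discarded when the connection is closed. The Python 3 sketch below demonstrates the difference on a scratch file in a temporary directory; the file name and band names are made up for the example.

```python
import os
import sqlite3
import tempfile

# Scratch database file in a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "scratch.sqlite")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.commit()

# Insert WITHOUT committing, then close: the pending change is discarded.
conn.execute("INSERT INTO Artist (Name) VALUES ('Uncommitted Band')")
conn.close()

# Insert and commit before closing: the change is saved.
conn = sqlite3.connect(db_path)
conn.execute("INSERT INTO Artist (Name) VALUES ('Committed Band')")
conn.commit()
conn.close()

# Reopen the file and see what actually persisted.
conn = sqlite3.connect(db_path)
names = [row[0] for row in conn.execute("SELECT Name FROM Artist")]
conn.close()
print(names)  # ['Committed Band']
```

If you want to keep the edits made in this tutorial, call `connection.commit()` before `connection.close()`.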

To learn about table creation, and have a nice reference, you can use the official SQLite documentation, or an online resource like: http://www.tutorialspoint.com/sqlite/sqlite_create_table.htm