# Building and Organizing Complex Queries

## Introduction

In the previous two missions, we learned a lot about joining data. We went from creating basic joins between two tables to building complex joins using multiple tables, subqueries, unusual join types, and aggregate functions.

In this mission, we're going to keep practicing complex joins, while also learning how to:
* Build and format queries for readability.
* Create named subqueries and views.
* Combine data using set operations.

As in the previous mission, we'll be working with the Chinook database. For easy reference, its schema is shown again below.

![Chinook database schema](https://s3.amazonaws.com/dq-content/190/chinook-schema.svg)


## Writing Readable Queries

> "Code is read much more often than it is written, so plan accordingly."
>
> "Even if you don't intend anybody else to read your code, there's still a very good chance that somebody will have to stare at your code and figure out what it does: That person is probably going to be you, twelve months from now."
>
> *Raymond Chen*

This philosophy, often quoted and paraphrased, is especially important in SQL, where queries can quickly become visually complex. Writing queries that are easy to understand takes a little extra effort up front, but it saves you time when you return to queries you wrote long ago, and it helps your colleagues when you work on a data team.

One obvious place to start is **capitalization and whitespace**. SQL ignores extra whitespace between keywords, and keywords are not case-sensitive, so both can be used to convey the structure of a complex query. Let's compare the same query written twice, first without whitespace and capitalization:

```sql
select ta.artist_name artist, count(*) tracks_sold from invoice_line il
inner join (select t.track_id, ar.name artist_name from track t
inner join album al on al.album_id = t.album_id
inner join artist ar on ar.artist_id = al.artist_id) ta
on ta.track_id = il.track_id group by 1 order by 2 desc limit 10;
```

And now, with whitespace and capitalization:

```sql
SELECT
    ta.artist_name artist,
    COUNT(*) tracks_sold
FROM invoice_line il
INNER JOIN (
            SELECT
                t.track_id,
                ar.name artist_name
            FROM track t
            INNER JOIN album al ON al.album_id = t.album_id
            INNER JOIN artist ar ON ar.artist_id = al.artist_id
           ) ta
           ON ta.track_id = il.track_id
GROUP BY 1
ORDER BY 2 DESC LIMIT 10;
```

As you can see, a little time spent on whitespace and capitalization pays off. A few tips to make your queries more readable:
* If a `SELECT` statement has more than one column, put each column on its own line, indented from the `SELECT` keyword.
* Always capitalize SQL function names and keywords.
* Put each clause of your query on a new line.
* Use indentation to make subqueries appear logically separate.
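To check that this formatting really is cosmetic, here is a small sketch using Python's built-in `sqlite3` module. It assumes a hypothetical two-table toy database (not the real Chinook data) and runs the same query twice: once compact and lowercase, once formatted with the tips above. The two result sets should be identical.

```python
import sqlite3

# Hypothetical toy tables standing in for Chinook's track/invoice_line
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice_line (invoice_line_id INTEGER PRIMARY KEY, track_id INTEGER);
    INSERT INTO track VALUES (1, 'Song A'), (2, 'Song B');
    INSERT INTO invoice_line VALUES (1, 1), (2, 1), (3, 2);
""")

# Same query, no formatting
compact = "select t.name track, count(*) sold from invoice_line il inner join track t on t.track_id = il.track_id group by 1 order by 2 desc;"

# Same query, formatted for readability
formatted = """
SELECT
    t.name track,
    COUNT(*) sold
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
GROUP BY 1
ORDER BY 2 DESC;
"""

compact_rows = conn.execute(compact).fetchall()
formatted_rows = conn.execute(formatted).fetchall()
print(compact_rows == formatted_rows)  # True
print(formatted_rows)  # [('Song A', 2), ('Song B', 1)]
```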

Another important consideration is the use of **aliases and shortcuts**. Aliases should be clear. A common convention is to use the first letter of the table name, but if a query is complex, consider using more descriptive aliases. Similarly, shortcuts like `GROUP BY 1` can be confusing, and naming the column explicitly makes your query more readable.
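A minimal sketch of the `GROUP BY 1` point, using a hypothetical in-memory `invoice` table with made-up rows: positional references (`GROUP BY 1`, `ORDER BY 2`) and explicit column names return the same rows, but only the explicit version tells the reader what is grouped and sorted without counting columns in the `SELECT` list.

```python
import sqlite3

# Hypothetical toy invoice table (made-up values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, billing_country TEXT, total REAL);
    INSERT INTO invoice VALUES (1, 'USA', 5.0), (2, 'Canada', 3.0), (3, 'USA', 2.0);
""")

# Positional shortcuts: the reader must count columns to decode them
positional = """
SELECT billing_country, SUM(total) total_sales
FROM invoice
GROUP BY 1
ORDER BY 2 DESC;
"""

# Explicit names: same result, self-describing
explicit = """
SELECT billing_country, SUM(total) total_sales
FROM invoice
GROUP BY billing_country
ORDER BY total_sales DESC;
"""

positional_rows = conn.execute(positional).fetchall()
explicit_rows = conn.execute(explicit).fetchall()
print(explicit_rows)  # [('USA', 7.0), ('Canada', 3.0)]
```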

If you work on a team, consider adopting a SQL style guide; a good one is available at [SQL style guide](http://www.sqlstyle.guide/). Remember, though, that readability is more important than consistency: if you have a complex query and breaking the style guide would make it more readable, break it.

![SQL style guide](https://s3.amazonaws.com/dq-content/190/SQL_style_guide.png)

Throughout the rest of our SQL missions, make a point of writing queries that are easy to read and understand. We will continue to check answers based on query results rather than formatting, but practicing this habit now will earn the thanks of your future colleagues (and your future self).

Let's now learn another way to make your queries more readable: named subqueries.

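```python
import sqlite3
import pandas as pd
from matplotlib import pyplot as plt

# Render matplotlib plots inline in the notebook
%matplotlib inline

# Connect to a local copy of the Chinook database
conn = sqlite3.connect("data/chinook.db")
```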

## The With Clause

When constructing complex queries, it's often useful to create intermediate tables that help produce the final result. Subqueries can create these intermediate tables, but the way subqueries are written makes queries harder to read: the reader has to find the subquery and then read from the inside out.

One way to alleviate this is to use a **`WITH` clause**. A `WITH` clause lets you define one or more named subqueries before the start of the main query. The main query then refers to each subquery by its alias, just as if it were a table in the database.

The syntax for the `WITH` clause is relatively straightforward:

```sql
WITH [alias_name] AS ([subquery])
SELECT [main_query]
```
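The template above shows a single named subquery. A `WITH` clause can also define several, separated by commas, and a later named subquery can read from an earlier one. The sketch below assumes a simplified, hypothetical `track` table that stores a genre name directly (Chinook itself uses a `genre_id` column and a separate `genre` table).

```python
import sqlite3

# Hypothetical, simplified track table (made-up values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, genre TEXT, milliseconds INTEGER);
    INSERT INTO track VALUES
        (1, 'Song A', 'Rock', 200000),
        (2, 'Song B', 'Rock', 400000),
        (3, 'Song C', 'Jazz', 300000);
""")

query = """
WITH
    track_seconds AS (
        SELECT name, genre, milliseconds / 1000.0 length_seconds
        FROM track
    ),
    genre_totals AS (
        -- this named subquery reads from the one defined above it
        SELECT genre, SUM(length_seconds) total_seconds
        FROM track_seconds
        GROUP BY genre
    )
SELECT genre, total_seconds
FROM genre_totals
ORDER BY total_seconds DESC;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('Rock', 600.0), ('Jazz', 300.0)]
```

Each named subquery reads top to bottom like a pipeline step, which is what makes this form easier to follow than nested subqueries.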

Let's look at a simple example: a query that gathers some information about the tracks from a single album. First, here's the query written with a standard subquery and **no** `WITH` clause:

```sql
SELECT * FROM
    (
     SELECT
         t.name,
         ar.name artist,
         al.title album_name,
         mt.name media_type,
         g.name genre,
         t.milliseconds length_milliseconds
     FROM track t
     INNER JOIN media_type mt ON mt.media_type_id = t.media_type_id
     INNER JOIN genre g ON g.genre_id = t.genre_id
     INNER JOIN album al ON al.album_id = t.album_id
     INNER JOIN artist ar ON ar.artist_id = al.artist_id
    )
WHERE album_name = 'Jagged Little Pill';
```

Moving the subquery before the main query with a `WITH` clause makes the intent of the main query much easier to understand.

```sql
WITH track_info AS
    (                
     SELECT
         t.name,
         ar.name artist,
         al.title album_name,
         mt.name media_type,
         g.name genre,
         t.milliseconds length_milliseconds
     FROM track t
     INNER JOIN media_type mt ON mt.media_type_id = t.media_type_id
     INNER JOIN genre g ON g.genre_id = t.genre_id
     INNER JOIN album al ON al.album_id = t.album_id
     INNER JOIN artist ar ON ar.artist_id = al.artist_id
    )

SELECT * FROM track_info
WHERE album_name = 'Jagged Little Pill';
```

The difference is subtle in this example, but a `WITH` clause helps a lot once your main query has even slight complexity. Let's get some practice using `WITH` in a more complex example.

* Create a query that shows summary data about the playlists in the Chinook database:
  * Use a `WITH` clause to create a named subquery with the following info:
    * The unique ID for the playlist.
    * The name of the playlist.
    * The name of each track from the playlist.
    * The length of each track in seconds.
  * Your final table should have the following columns, in order:
    * `playlist_id` - The unique ID for the playlist.
    * `playlist_name` - The name of the playlist.
    * `number_of_tracks` - A count of the number of tracks in the playlist.
    * `length_seconds` - The total length of the playlist's tracks, in seconds.

```python
pd.read_sql('SELECT * FROM playlist', conn)
```

| | playlist_id | name |
|---|---|---|
| 0 | 1 | Music |
| 1 | 2 | Movies |
| 2 | 3 | TV Shows |
| 3 | 4 | Audiobooks |
| 4 | 5 | 90’s Music |
| 5 | 6 | Audiobooks |
| 6 | 7 | Movies |
| 7 | 8 | Music |
| 8 | 9 | Music Videos |
| 9 | 10 | TV Shows |


```python
pd.read_sql('SELECT * FROM track LIMIT 5', conn)
```

| | track_id | name | album_id | media_type_id | genre_id | composer | milliseconds | bytes | unit_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
| 1 | 2 | Balls to the Wall | 2 | 2 | 1 | | 342562 | 5510424 | 0.99 |
| 2 | 3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... | 230619 | 3990994 | 0.99 |
| 3 | 4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 |
| 4 | 5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |


```python
query = '''
    WITH playlist_info AS
        (
         SELECT
             p.playlist_id,
             p.name playlist_name,
             t.name track_name,
             t.milliseconds length_milliseconds
         FROM playlist p
         INNER JOIN playlist_track pt ON pt.playlist_id = p.playlist_id
         INNER JOIN track t ON t.track_id = pt.track_id
        )

    SELECT
        playlist_id,
        playlist_name,
        COUNT(track_name) number_of_tracks,
        -- divide by 1000.0 (not 1000) to avoid integer division
        SUM(length_milliseconds) / 1000.0 length_seconds
    FROM playlist_info
    GROUP BY playlist_id, playlist_name
    ORDER BY playlist_id;
'''

pd.read_sql(query, conn)
```

| | playlist_id | playlist_name | number_of_tracks | length_seconds |
|---|---|---|---|---|
| 0 | 1 | Music | 3290 | 877683.083 |
| 1 | 3 | TV Shows | 213 | 501094.957 |
| 2 | 5 | 90’s Music | 1477 | 398705.153 |
| 3 | 8 | Music | 3290 | 877683.083 |
| 4 | 9 | Music Videos | 1 | 294.294 |
| 5 | 10 | TV Shows | 213 | 501094.957 |
| 6 | 11 | Brazilian Music | 39 | 9486.559 |
| 7 | 12 | Classical | 75 | 21770.592 |
| 8 | 13 | Classical 101 - Deep Cuts | 25 | 6755.73 |
| 9 | 14 | Classical 101 - Next Steps | 25 | 7575.051 |
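The output above has no rows for playlists 2, 4, 6 and 7, even though they appear in the `playlist` table; presumably they have no matching rows in `playlist_track`, so the `INNER JOIN`s drop them. If you wanted every playlist in the summary, one option is to switch to `LEFT JOIN`s. Here is a sketch against a hypothetical two-playlist toy database (made-up rows, not Chinook):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE playlist (playlist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, milliseconds INTEGER);
    CREATE TABLE playlist_track (playlist_id INTEGER, track_id INTEGER);
    INSERT INTO playlist VALUES (1, 'Music'), (2, 'Movies');
    INSERT INTO track VALUES (1, 'Song A', 180000), (2, 'Song B', 240000);
    -- playlist 2 gets no tracks
    INSERT INTO playlist_track VALUES (1, 1), (1, 2);
""")

query = """
WITH playlist_info AS (
    SELECT
        p.playlist_id,
        p.name playlist_name,
        t.name track_name,
        t.milliseconds length_milliseconds
    FROM playlist p
    LEFT JOIN playlist_track pt ON pt.playlist_id = p.playlist_id
    LEFT JOIN track t ON t.track_id = pt.track_id
)
SELECT
    playlist_id,
    playlist_name,
    COUNT(track_name) number_of_tracks,
    COALESCE(SUM(length_milliseconds), 0) / 1000.0 length_seconds
FROM playlist_info
GROUP BY playlist_id, playlist_name
ORDER BY playlist_id;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Music', 2, 420.0), (2, 'Movies', 0, 0.0)]
```

`COUNT(track_name)` ignores the `NULL`s produced by the `LEFT JOIN`, so an empty playlist gets 0 tracks, whereas `COUNT(*)` would report 1. `COALESCE` turns the empty playlist's `NULL` total into 0.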


## Creating Views

When we use a `WITH` clause, we create a temporary named subquery that exists only within that query. But what if we find ourselves using the same `WITH` clause in lots of different queries? It would be nice to define that subquery permanently so we can reuse it again and again.

We can do this by creating a **view**, which we can then use in all future queries. One way to think about it is that a `WITH` clause creates a temporary view. The syntax for creating a view is:

```sql
CREATE VIEW database.view_name AS
    SELECT * FROM database.table;
```

We'll be specifying the database name using `[database name].[view or table name]` syntax instead of just `[view or table name]`. You'll need to use this syntax with any views in our interface, because we have [manually attached the database](https://sqlite.org/lang_attach.html). If you're working with SQLite on your local machine, or in one of our Jupyter projects, you don't need to specify the database name, as in the following example:

```sql
CREATE VIEW view_name AS
    SELECT * FROM table;
```
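As a rough illustration of what a view is, the sketch below creates a view on a hypothetical toy `customer` table in an in-memory SQLite database and queries it. A view stores the query rather than a copy of the rows, so a row added to the underlying table afterwards shows up in the view automatically.

```python
import sqlite3

# Hypothetical toy customer table (made-up rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
    INSERT INTO customer VALUES (1, 'Ana', 'Brazil'), (2, 'Jack', 'USA'), (3, 'Dan', 'USA');

    -- the view stores the query, not a copy of the rows
    CREATE VIEW customer_usa AS
        SELECT * FROM customer
        WHERE country = 'USA';
""")

before = conn.execute("SELECT first_name FROM customer_usa ORDER BY customer_id").fetchall()

# A new row in the underlying table appears in the view without redefining it
conn.execute("INSERT INTO customer VALUES (4, 'Kathy', 'USA')")
after = conn.execute("SELECT first_name FROM customer_usa ORDER BY customer_id").fetchall()

print(before)  # [('Jack',), ('Dan',)]
print(after)   # [('Jack',), ('Dan',), ('Kathy',)]
```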

Here's an example of how to create a view called `customer_2` that is identical to the existing `customer` table:


```sql
CREATE VIEW chinook.customer_2 AS
    SELECT * FROM chinook.customer;
```
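The `[database name].[view or table name]` form comes from attaching a database under a schema name. A sketch of that setup, assuming an empty in-memory database attached under the name `chinook` (standing in for the real file) with a cut-down, hypothetical `customer` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attach a second (in-memory) database under the schema name "chinook"
conn.execute("ATTACH DATABASE ':memory:' AS chinook")

conn.executescript("""
    CREATE TABLE chinook.customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO chinook.customer VALUES (1, 'Luís'), (2, 'Leonie');

    -- the view lives in the attached "chinook" schema
    CREATE VIEW chinook.customer_2 AS
        SELECT * FROM chinook.customer;
""")

rows = conn.execute("SELECT * FROM chinook.customer_2").fetchall()

# Each attached schema has its own sqlite_master catalog listing its views
views = conn.execute(
    "SELECT name FROM chinook.sqlite_master WHERE type = 'view'"
).fetchall()

print(rows)   # [(1, 'Luís'), (2, 'Leonie')]
print(views)  # [('customer_2',)]
```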

```python
q = '''
    CREATE VIEW customer_2 AS
        SELECT * FROM customer;
'''
conn.execute(q)
```


If we wanted to modify this view and tried to redefine it, we'd get an error:

```python
q = '''
    CREATE VIEW customer_2 AS
    SELECT
        customer_id,
        first_name || ' ' || last_name name,
        phone,
        email,
        support_rep_id
    FROM customer;
'''
conn.execute(q)
```

OperationalError: table customer_2 already exists

To redefine a view, we first have to delete (or *drop*) the existing view:

```python
q = '''
    DROP VIEW customer_2;
'''
conn.execute(q)
```

```python
# Commit any open transaction
conn.commit()
```
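SQLite does not support `CREATE OR REPLACE VIEW`, so redefining a view means dropping it and creating it again. A sketch of that round trip on a hypothetical one-row `customer` table; `DROP VIEW IF EXISTS` avoids an error when the view doesn't exist yet:

```python
import sqlite3

# Hypothetical one-row customer table plus an initial view
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Luís', 'Gonçalves');
    CREATE VIEW customer_2 AS SELECT * FROM customer;
""")

# Drop (if present), then recreate with the new definition
conn.executescript("""
    DROP VIEW IF EXISTS customer_2;
    CREATE VIEW customer_2 AS
        SELECT
            customer_id,
            first_name || ' ' || last_name name
        FROM customer;
""")
conn.commit()

rows = conn.execute("SELECT * FROM customer_2").fetchall()
print(rows)  # [(1, 'Luís Gonçalves')]
```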

We're going to create two views that give us versions of the `customer` table containing only customers who meet specific criteria. The first is a view of all customers who live in the USA.

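```python
# In our interface this view is created as chinook.customer_usa;
# with a local connection the database name is not needed
q = '''
    CREATE VIEW customer_usa AS
        SELECT * FROM customer
        WHERE country = 'USA';
'''
conn.execute(q)
conn.commit()
```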

We have created this view for you, so you can query it from your console or code editor. Once a view is created, you query it exactly like a table: you don't need to specify that it's a view, and you can select from it, filter it, and join it to other tables (keeping in mind that in our interface you'll have to use `[database name].[view_name]`). Note, however, that views are read-only in SQLite, so you can't `INSERT`, `UPDATE`, or `DELETE` through them.
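A quick sketch of both sides of that behavior, using a hypothetical toy `customer` table: a `SELECT` against the view works just like a `SELECT` against a table, while an `INSERT` into the view raises `sqlite3.OperationalError`.

```python
import sqlite3

# Hypothetical toy customer table and a filtered view on it
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    INSERT INTO customer VALUES (1, 'Brazil'), (2, 'USA');
    CREATE VIEW customer_usa AS
        SELECT * FROM customer WHERE country = 'USA';
""")

# Reading from a view works exactly like reading from a table
count = conn.execute("SELECT COUNT(*) FROM customer_usa").fetchone()[0]
print(count)  # 1

# Writing to a view fails in SQLite (unless an INSTEAD OF trigger is defined)
try:
    conn.execute("INSERT INTO customer_usa VALUES (3, 'USA')")
    insert_failed = False
except sqlite3.OperationalError as exc:
    insert_failed = True
    print(exc)  # SQLite's error message explaining the view can't be modified
```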

Let's create a second view of customers who have purchased more than \$90 from our store.

* Create a view called `customer_gt_90_dollars`:
  * The view should contain the columns from `customer`, in their original order.
  * The view should contain only customers who have purchased more than \$90 in tracks from the store.
* After the SQL query that creates the view, write a second query to display your newly created view: `SELECT * FROM chinook.customer_gt_90_dollars;`.
  * Make sure you use a semicolon (`;`) to indicate the end of each query.

```python
pd.read_sql('SELECT customer.* FROM customer LIMIT 5', conn)
```

| | customer_id | first_name | last_name | company | address | city | state | country | postal_code | phone | fax | email | support_rep_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 1 | 2 | Leonie | Köhler | | Theodor-Heuss-Straße 34 | Stuttgart | | Germany | 70174 | +49 0711 2842222 | | leonekohler@surfeu.de | 5 |
| 2 | 3 | François | Tremblay | | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | | ftremblay@gmail.com | 3 |
| 3 | 4 | Bjørn | Hansen | | Ullevålsveien 14 | Oslo | | Norway | 0171 | +47 22 44 22 22 | | bjorn.hansen@yahoo.no | 4 |
| 4 | 5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague | | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |


```python
q = '''
    CREATE VIEW customer_gt_90_dollars AS
        SELECT
            c.*
        FROM invoice i
        INNER JOIN customer c
            ON i.customer_id = c.customer_id
        GROUP BY c.customer_id
        HAVING SUM(i.total) > 90;
'''

conn.execute(q)
conn.commit()
```

```python
pd.read_sql('SELECT * FROM customer_gt_90_dollars;', conn)
```

| | customer_id | first_name | last_name | company | address | city | state | country | postal_code | phone | fax | email | support_rep_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 1 | 3 | François | Tremblay | | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | | ftremblay@gmail.com | 3 |
| 2 | 5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague | | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |
| 3 | 6 | Helena | Holý | | Rilská 3174/6 | Prague | | Czech Republic | 14300 | +420 2 4177 0449 | | hholy@gmail.com | 5 |
| 4 | 13 | Fernanda | Ramos | | Qe 7 Bloco G | Brasília | DF | Brazil | 71020-677 | +55 (61) 3363-5547 | +55 (61) 3363-7855 | fernadaramos4@uol.com.br | 4 |
| 5 | 17 | Jack | Smith | Microsoft Corporation | 1 Microsoft Way | Redmond | WA | USA | 98052-8300 | +1 (425) 882-8080 | +1 (425) 882-8081 | jacksmith@microsoft.com | 5 |
| 6 | 20 | Dan | Miller | | 541 Del Medio Avenue | Mountain View | CA | USA | 94040-111 | +1 (650) 644-3358 | | dmiller@comcast.com | 4 |
| 7 | 21 | Kathy | Chase | | 801 W 4th Street | Reno | NV | USA | 89503 | +1 (775) 223-7665 | | kachase@hotmail.com | 5 |
| 8 | 22 | Heather | Leacock | | 120 S Orange Ave | Orlando | FL | USA | 32801 | +1 (407) 999-7788 | | hleacock@gmail.com | 4 |
| 9 | 30 | Edward | Francis | | 230 Elgin Street | Ottawa | ON | Canada | K2P 1L7 | +1 (613) 234-3322 | | edfrancis@yachoo.ca | 3 |
