# Module 10 Part 2: Detailed Look at SQL Queries, SQL Joins and Manipulating Tables

This module consists of 2 parts:

- **Part 1** - Databases and SQL Basics
- **Part 2** - Detailed look at SQL Queries, SQL Joins, and Manipulating Tables

Each part is provided in a separate notebook file. It is recommended that you follow the order of the notebooks.

This notebook represents **Part 2** of the module. In this part, we will continue learning SQL queries. We will take a more detailed look at the queries we covered at the end of Part 1 and learn SQL joins. We will conclude the module by looking at advanced SQL queries and commands for manipulating database tables.

## SQL Queries -  A Closer Look

So far we've discussed syntax for the creation, updating and deletion of tables. In doing this, we've only covered queries at a high level. In this section we look more closely at what can actually be done within querying clauses. 

First, let's load the same database we used in Part 1 of this module, the Chinook database:

In [1]:
import requests
import os
import shutil
import sqlite3
import pandas as pd
from time import sleep

# Code for grabbing our sqlite file off the internet.
# `dump` holds the raw download stream shared by the two functions below.
dump = None

def download_file():
    """Used to download the Chinook Database."""
    global dump
    url = "https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
    dump = requests.get(url, stream=True).raw

def save_file():
    """Used to save the downloaded Chinook Database into an sqlite database file."""
    global dump
    location = os.path.relpath("exampledb.sqlite")
    # Copy the downloaded stream into the local file
    with open(location, 'wb') as out_file:
        shutil.copyfileobj(dump, out_file)
    del dump

In [2]:
"""
This code snippet downloads the Chinook database, connects to it, and prepares for queries to be executed.
"""
# Grabbing copy of database
download_file()
# Saving copy of database to a local file
save_file()
# Create a connection object that represents a database    
conn = sqlite3.connect("exampledb.sqlite")
# Once the connection to the database is opened, we create a Cursor object to execute queries
c = conn.cursor()

### Aliasing

When joining or selecting from tables, it is often useful to specify **aliases**, so that you do not have to write out the full table and database names every time.

```
-- Example : output attributes
SELECT column_name AS column_alias
FROM table_name

-- Example : table inputs
SELECT column_name
FROM table_name  AS table_alias
```

In [3]:
"""
Using an alias to query for all invoices with an id less than 10.
In the example below, we use alias `a` for the table `invoice`.

Same as: 
SELECT invoiceid FROM invoice WHERE invoiceid < 10;
"""
c.execute("""
SELECT a.invoiceid AS invoiceid FROM invoice AS a WHERE a.invoiceid < 10;
""")

c.fetchall()

[(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]

Aliases are also useful if there are fields with the same name in different tables (e.g. in a self join, which we will review later in this notebook), or when creating consistent functions (e.g. in procedural SQL or complicated queries).

**NOTE**: When you alias a table you can omit the `AS`. The following statement will return the same result as the one above which uses `AS`:

In [4]:
c.execute("""
SELECT a.invoiceid invoiceid FROM invoice a WHERE a.invoiceid < 10;
""")

c.fetchall()

[(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]

### SELECT clause

In a `SELECT` statement, the columns to retrieve are usually specified explicitly. **Aliasing** allows us to refer to them with short forms, which matters most when the same table appears more than once in a query. Consider the query below. How would we write the `WHERE` clause if this table were joined to itself?

In [5]:
"""
Querying invoice table
"""
c.execute("""
SELECT * FROM invoice;
""")
# Here we use fetchmany() function to get only 2 records back. 
# See the NOTE below for more information.
  
c.fetchmany(2) 

[(1,
  2,
  '2009-01-01 00:00:00',
  'Theodor-Heuss-Straße 34',
  'Stuttgart',
  None,
  'Germany',
  '70174',
  1.98),
 (2,
  4,
  '2009-01-02 00:00:00',
  'Ullevålsveien 14',
  'Oslo',
  None,
  'Norway',
  '0171',
  3.96)]

**NOTE:** In order to reduce the size of the output, without altering the query, we've used a different fetch method for retrieving results. Refer to the following table for more details:

| Function | Description |
| ---: | :--- |
| `fetchone()` | Fetches the next row of a query result set, returning a single sequence, or None when no more data is available. |
| `fetchmany(size=cursor.arraysize)` | Fetches the next set of rows of a query result, returning a list. An empty list is returned when no more rows are available. |
| `fetchall()` | Fetches all (remaining) rows of a query result, returning a list. Note that the cursor’s `arraysize` attribute can affect the performance of this operation. An empty list is returned when no rows are available. |

For more details on these methods, please refer to the documentation for the [`sqlite3` interface](https://docs.python.org/3/library/sqlite3.html#cursor-objects).
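
The three fetch methods can be compared side by side on a tiny result set. The sketch below uses a hypothetical in-memory table named `numbers` (not part of the Chinook database) so that it runs without any download:

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the fetch methods.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE numbers (n INTEGER)")
demo_cur.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 6)])

demo_cur.execute("SELECT n FROM numbers ORDER BY n")
first_row = demo_cur.fetchone()    # the next single row, as a tuple
next_rows = demo_cur.fetchmany(2)  # the next two rows, as a list of tuples
remaining = demo_cur.fetchall()    # every row that is left
exhausted = demo_cur.fetchone()    # None, because the result set is used up

print(first_row, next_rows, remaining, exhausted)
demo_conn.close()
```

Each call continues where the previous one stopped, which is why `fetchall()` here returns only the last two rows.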

The `*` notation of the `SELECT` clause returns all *columns* of the tables accessed in the `FROM` clause. In the example below, the `SELECT` statement selects the columns for `a` and for `b` separately, even though both are the same set of columns from the table `invoice`. A table that appears more than once in `FROM` needs its own **alias** for each occurrence, so that the column names become unique to that identifier.

The following illustrates the point further where the `invoiceid` attribute is referenced.

In [6]:
"""
The statement below will construct a self-joining query.
It grabs a join/comparison of all invoices against themselves, 
but without self-comparisons (where the ids are equal).
To simplify the query, we **alias** the tables using the aliases `a` and `b`.
"""
c.execute("""
SELECT * FROM invoice AS a, invoice AS b WHERE a.invoiceid != b.invoiceid;
""")
c.fetchmany(2)

[(1,
  2,
  '2009-01-01 00:00:00',
  'Theodor-Heuss-Straße 34',
  'Stuttgart',
  None,
  'Germany',
  '70174',
  1.98,
  2,
  4,
  '2009-01-02 00:00:00',
  'Ullevålsveien 14',
  'Oslo',
  None,
  'Norway',
  '0171',
  3.96),
 (1,
  2,
  '2009-01-01 00:00:00',
  'Theodor-Heuss-Straße 34',
  'Stuttgart',
  None,
  'Germany',
  '70174',
  1.98,
  3,
  8,
  '2009-01-03 00:00:00',
  'Grétrystraat 63',
  'Brussels',
  None,
  'Belgium',
  '1000',
  5.94)]

**NOTE**: Because SQL has a clear division between its querying language and any mutable operations, there is no ambiguity between assignment and comparison. SQL therefore does not (usually) use `==` to check equality conditions; a single `=` is used instead.

#### CASE statement

SQL has a `CASE` expression that comes in very handy when recoding or creating new variables. It is essentially a method for creating categorical values, behaving somewhat similarly to an if-statement.

In [7]:
"""
Mapping numerical ranges to string descriptions. Could potentially be categories.
"""
c.execute("""
SELECT total, CASE WHEN total > 10  THEN 'Hey big spender!' 
    WHEN total < 5 THEN 'Dig this blender!' 
    ELSE 'Rainbow suspenders!' END
FROM invoice;
""")
c.fetchmany(5)

[(1.98, 'Dig this blender!'),
 (3.96, 'Dig this blender!'),
 (5.94, 'Rainbow suspenders!'),
 (8.91, 'Rainbow suspenders!'),
 (13.86, 'Hey big spender!')]

The `CASE` statement in SQL works like the **functional variant** of IF-THEN-ELSE in other programming languages. Its main use is to relabel variables, regroup variables, and make continuous variables discrete.

**NOTE**: There can be more than one form of if-statements in a variant of SQL. This is because the statements are actually operations defined ***per clause***.

###### Example

Suppose, you want to classify the tracks based on their length such as:
- less than a minute - the track is short
- between 1 and 5 minutes - the track is medium
- greater than 5 minutes - the track is long. 

The following query achieves the creation of this new category.

In [8]:
c.execute("""
SELECT trackId, name, 
CASE WHEN milliseconds < 60000 THEN 'short' 
WHEN milliseconds >= 60000 AND milliseconds <= 300000 THEN 'medium' 
ELSE 'long' 
END category
FROM track;
""")
c.fetchmany(5)

[(1, 'For Those About To Rock (We Salute You)', 'long'),
 (2, 'Balls to the Wall', 'long'),
 (3, 'Fast As a Shark', 'medium'),
 (4, 'Restless and Wild', 'medium'),
 (5, 'Princess of the Dawn', 'long')]

**NOTE**: `category` is a new attribute name given to the `CASE` value. Otherwise an automatically generated default name would have been assigned to the resulting table's attribute. 

#### DISTINCT

If a query returns duplicate records, one way to remove them is to use `DISTINCT` in the `SELECT` statement.

In [9]:
# Try removing DISTINCT to see how large the actual results are in comparison.
c.execute("""
SELECT DISTINCT city
FROM customer;
""")
c.fetchall()

[('São José dos Campos',),
 ('Stuttgart',),
 ('Montréal',),
 ('Oslo',),
 ('Prague',),
 ('Vienne',),
 ('Brussels',),
 ('Copenhagen',),
 ('São Paulo',),
 ('Rio de Janeiro',),
 ('Brasília',),
 ('Edmonton',),
 ('Vancouver',),
 ('Mountain View',),
 ('Redmond',),
 ('New York',),
 ('Cupertino',),
 ('Reno',),
 ('Orlando',),
 ('Boston',),
 ('Chicago',),
 ('Madison',),
 ('Fort Worth',),
 ('Tucson',),
 ('Salt Lake City',),
 ('Toronto',),
 ('Ottawa',),
 ('Halifax',),
 ('Winnipeg',),
 ('Yellowknife',),
 ('Lisbon',),
 ('Porto',),
 ('Berlin',),
 ('Frankfurt',),
 ('Paris',),
 ('Lyon',),
 ('Bordeaux',),
 ('Dijon',),
 ('Helsinki',),
 ('Budapest',),
 ('Dublin',),
 ('Rome',),
 ('Amsterdam',),
 ('Warsaw',),
 ('Madrid',),
 ('Stockholm',),
 ('London',),
 ('Edinburgh ',),
 ('Sidney',),
 ('Buenos Aires',),
 ('Santiago',),
 ('Delhi',),
 ('Bangalore',)]

**NOTE**: In some SQL variants, `DISTINCT` is multi-objective and applicable to groups of attributes. In other variants, it can be strictly a single-column operation. In some variants it is strictly record specific, meaning only whole records are allowed to be unique in a query.
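
In SQLite, `DISTINCT` applies to the whole list of selected columns. A minimal sketch, assuming a hypothetical in-memory table `customer_demo` that stands in for `customer`:

```python
import sqlite3

# Hypothetical table; the rows are made up for illustration only.
distinct_conn = sqlite3.connect(":memory:")
distinct_cur = distinct_conn.cursor()
distinct_cur.execute("CREATE TABLE customer_demo (city TEXT, country TEXT)")
distinct_cur.executemany(
    "INSERT INTO customer_demo VALUES (?, ?)",
    [("Paris", "France"), ("Paris", "France"), ("Paris", "USA"), ("Lyon", "France")],
)

# DISTINCT over one column: each city appears once.
cities = distinct_cur.execute(
    "SELECT DISTINCT city FROM customer_demo ORDER BY city"
).fetchall()

# DISTINCT over two columns: each (city, country) combination appears once.
city_country = distinct_cur.execute(
    "SELECT DISTINCT city, country FROM customer_demo ORDER BY city, country"
).fetchall()

print(cities)
print(city_country)
distinct_conn.close()
```

Only the exact duplicate `('Paris', 'France')` row is removed in the second query, because uniqueness is judged on the combination of both columns.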

#### Scalar functions

SQL scalar functions return a single value, based on the input value. The functions one can expect are similar to those found in the standard library of the C programming language, which underlies most systems and many programming languages, including Python: string functions, math functions, and boolean functions. This should give a sense of what is possible not only in the `SELECT` clause of SQL, but also in the `WHERE` and `HAVING` clauses.

| ***MySQL*** Commands | Description |
| ---: | :--- |
| `UCASE()` or `UPPER()` | Converts a field to upper case |
| `LCASE()` or `LOWER()` | Converts a field to lower case |
| `MID()` | Extract characters from a text field |
| `CHAR_LENGTH()` or `LENGTH()` | Returns the length of a text field, in characters or in bytes respectively (SQL Server uses `LEN()` instead) |
| `ROUND()` | Rounds a numeric field to the number of decimals specified |
| `NOW()` | Returns the current system date and time |
| `FORMAT()` | Formats how a field is to be displayed |

In [10]:
# NOTE: Not all DBMS's have full unicode support. 
# Letters might render properly, but `lower()` and `upper()` may not behave as intended.
# Also note that there is no LEN() in SQLite, but there is LENGTH()

c.execute("""
SELECT DISTINCT UPPER(city), LOWER(city), LENGTH(city) 
FROM customer;
""")
c.fetchmany(5)

[('SãO JOSé DOS CAMPOS', 'são josé dos campos', 19),
 ('STUTTGART', 'stuttgart', 9),
 ('MONTRéAL', 'montréal', 8),
 ('OSLO', 'oslo', 4),
 ('PRAGUE', 'prague', 6)]

In [11]:
"""
Here we use `DATE()` and `TIME()` and the `'now'` string to show how to make calendar related values.
(WHERE clause operators are consistent for times and dates. i.e., `<`, `>`, `=`, etc.)
"""
# NOTE: Not all DBMSs have the same function names. 
# Many functions are vendor-specific extensions rather than part of the SQL standard.
c.execute("""
SELECT DISTINCT UPPER(SUBSTR(city, 1, 3)) AS locationCode, 
DATE('now') AS date, TIME('now') AS time, ROUND(TIME('now')) AS hour
FROM customer;
""")
c.fetchmany(5)

[('SãO', '2020-01-16', '06:51:50', 6.0),
 ('STU', '2020-01-16', '06:51:50', 6.0),
 ('MON', '2020-01-16', '06:51:50', 6.0),
 ('OSL', '2020-01-16', '06:51:50', 6.0),
 ('PRA', '2020-01-16', '06:51:50', 6.0)]

### WHERE clause

All conditions in the `WHERE` clause are boolean checks. As such, they take a form similar to the boolean expressions seen in previous modules.

The statement below will return all rows where the invoice total is not equal to 0.99 and is less than or equal to 10 (`NOT total > 10`).

In [12]:
c.execute("""
SELECT *
FROM invoice
WHERE NOT total > 10 AND total <> 0.99;
""")
c.fetchmany(5)

[(1,
  2,
  '2009-01-01 00:00:00',
  'Theodor-Heuss-Straße 34',
  'Stuttgart',
  None,
  'Germany',
  '70174',
  1.98),
 (2,
  4,
  '2009-01-02 00:00:00',
  'Ullevålsveien 14',
  'Oslo',
  None,
  'Norway',
  '0171',
  3.96),
 (3,
  8,
  '2009-01-03 00:00:00',
  'Grétrystraat 63',
  'Brussels',
  None,
  'Belgium',
  '1000',
  5.94),
 (4,
  14,
  '2009-01-06 00:00:00',
  '8210 111 ST NW',
  'Edmonton',
  'AB',
  'Canada',
  'T6G 2C7',
  8.91),
 (7,
  38,
  '2009-02-01 00:00:00',
  'Barbarossastraße 19',
  'Berlin',
  None,
  'Germany',
  '10779',
  1.98)]

**NOTE**: `d <> a` is the same as `d != a`, which is the same as `NOT (d = a)`. Operators like `<=`, `>=`, `!=` (and even the ones stated prior) may not be supported, depending on the DBMS. It is always a good idea to quickly reference the [documentation of the database system](https://www.sqlite.org/docs.html). Additionally, different DBMSs have varying development support, so it is equally important to make sure the documentation applies to the version of the DBMS you are using. Some modern DBMSs in popular use are *over 30 years old*.
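
For SQLite specifically, the equivalence of the inequality forms can be checked directly. The sketch below uses a hypothetical in-memory table `invoice_demo` with made-up totals:

```python
import sqlite3

# Hypothetical table with a few invoice totals.
ops_conn = sqlite3.connect(":memory:")
ops_cur = ops_conn.cursor()
ops_cur.execute("CREATE TABLE invoice_demo (total REAL)")
ops_cur.executemany(
    "INSERT INTO invoice_demo VALUES (?)", [(0.99,), (1.98,), (0.99,), (5.94,)]
)

# Three spellings of "total is not equal to 0.99".
conditions = {
    "<>": "total <> 0.99",
    "!=": "total != 0.99",
    "NOT (=)": "NOT (total = 0.99)",
}
results = {
    label: ops_cur.execute(
        "SELECT total FROM invoice_demo WHERE " + cond + " ORDER BY total"
    ).fetchall()
    for label, cond in conditions.items()
}

print(results)
ops_conn.close()
```

All three queries return the same rows in SQLite; other systems may accept only some of these spellings.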

#### String pattern matching

The `LIKE` pattern matching operator can be used in the conditional selection of the `WHERE` clause. It allows a weak form of regular-expression-like pattern matching. The percent sign `'%'` can be used as a wildcard that matches any sequence of characters (including none) appearing before or after the characters specified.

In [13]:
c.execute("""SELECT * FROM artist WHERE name LIKE 'Bl%';""")
output1 = c.fetchmany(5)
c.execute("""SELECT * FROM artist WHERE name LIKE '%y';""")
output2 = c.fetchmany(5)

output1

[(11, 'Black Label Society'), (12, 'Black Sabbath'), (169, 'Black Eyed Peas')]

In [14]:
output2

[(11, 'Black Label Society'),
 (15, 'Buddy Guy'),
 (54, 'Green Day'),
 (66, 'Santana Feat. Eagle-Eye Cherry'),
 (73, 'Vinícius E Qurteto Em Cy')]

In [15]:
# Interestingly, you can implement a quick hacky word search in this fashion.
c.execute("""SELECT * FROM artist WHERE name LIKE '%the %' OR name LIKE '% A %';""")
c.fetchmany(5)

[(35, 'Pedro Luís & A Parede'),
 (64, 'Santana Feat. The Project G&B'),
 (135, 'System Of A Down'),
 (137, 'The Black Crowes'),
 (138, 'The Clash')]

**NOTE**: In SQL, string literals are single quoted, and double quotes are reserved for identifiers such as column names. Depending on the DBMS, double quoted strings may not be interpreted as strings. SQLite falls back to treating a double quoted value as a string only when it does not match any identifier, so single quotes are the safer choice.

In the first example, all columns are retrieved where `name` *starts* with the substring `'Bl'`. In the second example, all columns are retrieved where `name` *ends* with `'y'`. The `LIKE` operator is separate from regular expression checks because it is implemented using only string functions. As such, the operator typically scales better for very large database queries. 

Consider using `LIKE` where simple pattern matching is all that is needed.
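
The note on quoting above can be made concrete with a small experiment. This is a sketch on a hypothetical in-memory table `artist_demo`; the row whose value is literally `'name'` is made up to expose the difference:

```python
import sqlite3

# Hypothetical table; one artist is deliberately called 'name'.
quote_conn = sqlite3.connect(":memory:")
quote_cur = quote_conn.cursor()
quote_cur.execute("CREATE TABLE artist_demo (name TEXT)")
quote_cur.executemany(
    "INSERT INTO artist_demo VALUES (?)",
    [("name",), ("Black Sabbath",), ("Green Day",)],
)

# Single quotes: 'name' is a string literal, so only one row matches.
single_quoted = quote_cur.execute(
    "SELECT name FROM artist_demo WHERE name = 'name'"
).fetchall()

# Double quotes: "name" is read as the column name, so the condition
# compares the column with itself and every row matches.
double_quoted = quote_cur.execute(
    'SELECT name FROM artist_demo WHERE name = "name" ORDER BY name'
).fetchall()

print(single_quoted)
print(double_quoted)
quote_conn.close()
```

The second query silently returns the whole table, which is the kind of surprise that single quoted strings avoid.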

##### LIKE Statement

The `LIKE` operator performs a basic pattern-matching using wild-card characters. *For Microsoft SQL Server*, the wild-card characters are defined as the following.

| Patterns | Descriptions |
| ---: | :--- |
| `_` (underscore) | matches any single character |
| `%` | matches a string of zero or more characters |
| `[ ]` | matches any single character within the specified range (e.g. `'[a-f]'`) or set (e.g. `'[abcdef]'`). |
| `[^]` | matches any single character not within the specified range (e.g. `'[^a-f]'`) or set (e.g. `'[^abcdef]'`). |

**NOTE**: some features of `LIKE` may vary between DBMS vendors. Similarly, `LIKE` is not a regular expression operator. DBMSs often have a separate operator or type for matching with regular expressions.

In [16]:
# SQLite only supports two wild card characters in `LIKE`,`_` and `%`.
c.execute("""SELECT * FROM artist WHERE (name NOT LIKE 'B%') AND (name NOT LIKE '___ %'); """)
c.fetchmany(5)

[(1, 'AC/DC'),
 (2, 'Accept'),
 (3, 'Aerosmith'),
 (4, 'Alanis Morissette'),
 (5, 'Alice In Chains')]

The query above finds all artists whose name neither starts with `'B'` nor begins with a three-character word followed by a space. Assuming you are working with the `artist` table, which contains a `name` column, the following table lists possible `WHERE` clauses that could conceivably be used in other databases (the bracket patterns in the last two rows are not supported by `LIKE` in SQLite).

| Example | Explanation |
| ---: | :--- |
| `WHERE Name LIKE '_2'` | finds all two-character artist names that end with `'2'` (e.g. `'U2'`). |
| `WHERE Name LIKE '%nic'` | finds all artists and groups whose name ends with `'nic'` (e.g. `'Leonard Bernstein & New York Philharmonic'`). |
| `WHERE Name LIKE '%nic%'` | finds all artists and groups where name includes `'nic'` anywhere in the name (for example, in addition to the names ending with `'nic'`, it will find `'Mônica Marianno'`). |
| `WHERE Name LIKE '[JT]im'` | finds three-letter names that end with `'im'` and begin with either `'J'` or `'T'` |
| `WHERE Name LIKE 'M[^a]%'` | finds all names beginning with `'M'` where the following (second) letter is not `'a'`. |
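
The patterns in the table can be tried against SQLite to see which ones carry over. The sketch below assumes a hypothetical in-memory table `artist_demo` with a handful of made-up names. For the bracket pattern it swaps in SQLite's `GLOB` operator, which is a different, case-sensitive operator that understands character sets such as `[JT]`:

```python
import sqlite3

# Hypothetical table with a few names chosen to exercise each pattern.
like_conn = sqlite3.connect(":memory:")
like_cur = like_conn.cursor()
like_cur.execute("CREATE TABLE artist_demo (name TEXT)")
like_cur.executemany(
    "INSERT INTO artist_demo VALUES (?)",
    [("U2",), ("Tim",), ("Jim",), ("Kim",), ("Mônica Marianno",), ("Philharmonic",)],
)

def match(condition):
    """Return the names satisfying the given WHERE condition, sorted."""
    sql = "SELECT name FROM artist_demo WHERE " + condition + " ORDER BY name"
    return [row[0] for row in like_cur.execute(sql)]

two_char = match("name LIKE '_2'")         # exactly two characters, ending in '2'
ends_nic = match("name LIKE '%nic'")       # ends with 'nic'
has_nic = match("name LIKE '%nic%'")       # contains 'nic' anywhere
bracket_like = match("name LIKE '[JT]im'") # brackets are literal characters in SQLite LIKE
bracket_glob = match("name GLOB '[JT]im'") # GLOB treats [JT] as a character set

print(two_char, ends_nic, has_nic, bracket_like, bracket_glob)
like_conn.close()
```

The bracket pattern matches nothing with `LIKE` in SQLite, while `GLOB` finds the intended names.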

#### Membership

In this section we look at operations and checks relevant to sequences. While these were originally notational short forms to reduce query size, in more modern databases they are vital for working with complex attribute types, such as a list, XML, or JSON node. Here we focus on notations relevant to list-like structures.

##### IN Statement

Similar to the `in` operator of the Python programming language, `IN` tests membership, replacing a chain of equality checks joined with `OR`.

In [17]:
c.execute("""
SELECT * 
FROM artist
WHERE artistId IN (3,4,5);
""")
c.fetchmany(5)

[(3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains')]

##### BETWEEN Statement

`BETWEEN` is a range check that includes both endpoints. It should be noted, though, that the numeric type of the attribute changes the nature of the range check. Thus, a real number is matched against the real interval rather than the integer sequence.

In [18]:
c.execute("""
SELECT CustomerId, Total
FROM invoice
WHERE total BETWEEN 0 AND 1;
""")
c.fetchall() # In case you were wondering why we discarded 0.99 from totals in a previous example.

[(37, 0.99),
 (16, 0.99),
 (54, 0.99),
 (33, 0.99),
 (12, 0.99),
 (50, 0.99),
 (29, 0.99),
 (8, 0.99),
 (46, 0.99),
 (25, 0.99),
 (4, 0.99),
 (42, 0.99),
 (21, 0.99),
 (38, 0.99),
 (17, 0.99),
 (55, 0.99),
 (34, 0.99),
 (13, 0.99),
 (51, 0.99),
 (30, 0.99),
 (9, 0.99),
 (47, 0.99),
 (26, 0.99),
 (5, 0.99),
 (43, 0.99),
 (22, 0.99),
 (1, 0.99),
 (18, 0.99),
 (56, 0.99),
 (35, 0.99),
 (14, 0.99),
 (52, 0.99),
 (31, 0.99),
 (10, 0.99),
 (48, 0.99),
 (27, 0.99),
 (6, 0.99),
 (44, 0.99),
 (23, 0.99),
 (2, 0.99),
 (40, 0.99),
 (57, 0.99),
 (36, 0.99),
 (15, 0.99),
 (53, 0.99),
 (32, 0.99),
 (11, 0.99),
 (49, 0.99),
 (28, 0.99),
 (7, 0.99),
 (45, 0.99),
 (24, 0.99),
 (3, 0.99),
 (41, 0.99),
 (20, 0.99)]
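
Whether the endpoints of a `BETWEEN` range are included is easy to check on boundary values. A minimal sketch on a hypothetical in-memory table `invoice_demo`, with totals chosen to sit on and just outside the range:

```python
import sqlite3

# Hypothetical totals placed on and around the range boundaries.
between_conn = sqlite3.connect(":memory:")
between_cur = between_conn.cursor()
between_cur.execute("CREATE TABLE invoice_demo (total REAL)")
between_cur.executemany(
    "INSERT INTO invoice_demo VALUES (?)", [(0.0,), (0.99,), (1.0,), (1.01,)]
)

# BETWEEN keeps both endpoints (0 and 1) in the result.
between_rows = between_cur.execute(
    "SELECT total FROM invoice_demo WHERE total BETWEEN 0 AND 1 ORDER BY total"
).fetchall()

# The same range written with explicit comparisons.
explicit_rows = between_cur.execute(
    "SELECT total FROM invoice_demo WHERE total >= 0 AND total <= 1 ORDER BY total"
).fetchall()

print(between_rows)
between_conn.close()
```

Both queries return the same rows, so `BETWEEN 0 AND 1` behaves as `>= 0 AND <= 1`.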

### GROUP BY clause

`GROUP BY` is very similar to the `SELECT` clause, but the effect is that the output of the clause is a table with grouped rows. The effect is similar to indexing the table with a hierarchical tree, where the remainder of the query affects not the table but the groupings. The groupings are effectively the lowest position of the hierarchy.

Grouping is a mechanism providing a method to take aggregates of aggregates (of aggregates of ...) all within a single query. Unfortunately, `GROUP BY` by itself is only good for calculating aggregate values with the `SELECT` clause.

**NOTE**: `GROUP BY` can group by multiple columns or be grouped with nested groups, similar to an index. By this point much overlap should have been noticed with `pandas` data-frame operations. This isn't a coincidence. SQL is remarkably stable and robust, and the developers of `pandas` have made a conscious effort not to deviate from it. You might remember this diagram from the `pandas` module, as an illustration of aggregation and grouping:

<img src='split-apply-combine.png' height="500" width="500" alt="split-apply-combine">

**Picture 4.** Aggregation and grouping, illustration of a split-apply-combine concept (McKinney, 2017).


#### Aggregate functions
SQL aggregate functions return a single value, calculated from values in a column. The functions available depend largely on the variant of SQL used, but the following are commonly available.

| Function | Description (given an attribute) |
| ---: | :--- |
| `AVG()` | Returns the average value |
| `COUNT()` | Returns the number of rows |
| `FIRST()` | Returns the first value (not available in every DBMS, e.g. not in SQLite) |
| `LAST()` | Returns the last value (not available in every DBMS, e.g. not in SQLite) |
| `MAX()` | Returns the largest value |
| `MIN()` | Returns the smallest value |
| `SUM()` | Returns the sum |

**NOTE**: `DISTINCT` can be thought of as an aggregate operation, as its behavior changes with the grouping used. 

*It is usually faster to do aggregation operations like these in the database rather than returning all the records and then aggregating them in memory. Grouping, aggregation and sorting are typically computed together with joins. Joins are a very expensive operation, which is why as much work as possible is done alongside the join to make the most of a necessary performance bottleneck. We will discuss joins in more detail in the next section of this module.*

```
-- General example
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name; 
```

The example below will do the following:
- Select all rows from the table `invoice`, so that all calculations are done on all records in the table
- Calculate the following amounts based on the `total` column across all rows:
    - Sum of all invoices in the table (1st number in the output)
    - Average amount of the sale (2nd number in the output)
    - Total number of invoices (3rd number in the output)
    - Highest invoice amount (4th number in the output)
    - Smallest invoice amount (5th number in the output)


In [19]:
c.execute("""
SELECT SUM(total) AS totalSales, AVG(total) AS averageSale, 
COUNT(total) AS totalInvoices, MAX(total) AS highestSale, MIN(total) AS smallestSale
FROM invoice;
""")
c.fetchall()

[(2328.600000000004, 5.651941747572825, 412, 25.86, 0.99)]
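
The query above aggregates over the whole table. Adding `GROUP BY` computes the same aggregates once per group. The sketch below assumes a hypothetical in-memory table `invoice_demo` with made-up customer ids and totals:

```python
import sqlite3

# Hypothetical invoices for three customers.
group_conn = sqlite3.connect(":memory:")
group_cur = group_conn.cursor()
group_cur.execute("CREATE TABLE invoice_demo (customerid INTEGER, total REAL)")
group_cur.executemany(
    "INSERT INTO invoice_demo VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 10.0), (3, 1.0), (3, 1.0), (3, 4.0)],
)

# One output row per customer: invoice count, sum and average of totals.
per_customer = group_cur.execute("""
SELECT customerid, COUNT(*) AS invoices, SUM(total) AS spent, AVG(total) AS average
FROM invoice_demo
GROUP BY customerid
ORDER BY customerid;
""").fetchall()

print(per_customer)
group_conn.close()
```

Each aggregate is now evaluated separately over the rows that share a `customerid`.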

### HAVING clause

The `HAVING` clause is strictly for conditions containing aggregate functions, or conditions concerning attributes used in `GROUP BY`. It is essentially another `WHERE` clause, but for selecting groups, or retrieving records within groups meeting aggregate thresholds and conditions.

The `HAVING` clause was added because the `WHERE` keyword could not be used with aggregate functions. With only `WHERE`, the equivalent query would consist of nested sub-queries to obtain the same results.
```
-- General Form
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value; 
```

In [20]:
# Ignoring minor purchases, find big spenders, and return all repeating customers.

c.execute("""
SELECT DISTINCT customerid
FROM invoice
WHERE total > 1
GROUP BY customerid
HAVING SUM(total) > 40 AND COUNT(customerid) > 5;
""")
c.fetchall()

[(6,), (7,), (24,), (25,), (26,), (28,), (37,), (44,), (45,), (46,), (57,)]
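
To illustrate the remark that `WHERE` alone would require nested sub-queries, the sketch below writes the same filter twice: once with `HAVING`, and once with a sub-query whose aggregates are filtered by an outer `WHERE`. The table `invoice_demo` and its rows are hypothetical:

```python
import sqlite3

# Hypothetical invoices for three customers.
having_conn = sqlite3.connect(":memory:")
having_cur = having_conn.cursor()
having_cur.execute("CREATE TABLE invoice_demo (customerid INTEGER, total REAL)")
having_cur.executemany(
    "INSERT INTO invoice_demo VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 10.0), (3, 1.0), (3, 1.0), (3, 4.0)],
)

# Filter groups directly with HAVING.
with_having = having_cur.execute("""
SELECT customerid
FROM invoice_demo
GROUP BY customerid
HAVING SUM(total) > 5 AND COUNT(*) > 1
ORDER BY customerid;
""").fetchall()

# Same filter without HAVING: aggregate in a sub-query, then use WHERE outside.
with_subquery = having_cur.execute("""
SELECT customerid
FROM (SELECT customerid, SUM(total) AS spent, COUNT(*) AS invoices
      FROM invoice_demo
      GROUP BY customerid) AS per_customer
WHERE spent > 5 AND invoices > 1
ORDER BY customerid;
""").fetchall()

print(with_having, with_subquery)
having_conn.close()
```

Customer 2 spends the most but has only one invoice, so both versions exclude it.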

### **EXERCISE 2:** Breaking down the SQL queries

In this exercise, we will practice all the concepts we have learned so far in this module by writing SQL queries.

**Task 1:** Write an SQL query to retrieve all records from the `track` table where the `composer` field contains the string `Smith` anywhere and the length of the track is greater than or equal to 6 minutes. The query should return 3 columns: composer name, name of the track, and track length.

In [21]:
# Type your code here


**Task 2:** We want to know the names of companies that our customers represent. This information can be found in the table `customer`. Generate a list of all company names. The list should not contain any duplicates.

In [22]:
# Type your code here


**Task 3:** Write an SQL query to generate a list of all customers who are from Canada or Germany. The list should contain first and last names of the customer and the country they are from. The list should be sorted by country and then by the last name of the customer. *(**Hint:** you might want to look into the [`ORDER BY` clause](https://www.sqlite.org/lang_select.html#orderby) to answer the last requirement of this task.)*

In [23]:
# Type your code here


**Task 4:** What if we want to know how many tracks are in each and every album and which albums have the highest number of tracks? We will need to look again at the `track` table which contains `trackid` (the primary key) and `albumid` (a foreign key). Create an SQL query which will output the top 10 album IDs that have the highest number of tracks. The output should also contain composer name.

In [24]:
# Type your code here


**Solutions**

**Task 1:** This is a simple `SELECT` statement with two conditions after the `WHERE` clause, which we need to connect with an `AND`. The first condition is to find all rows where the `composer` field contains the string `Smith` anywhere within the composer name. The second condition is the length of the track, `>= 6 minutes`. We have to remember that the values in this column are not in minutes but in milliseconds, hence we are searching for all records where the value of `milliseconds` is greater than or equal to `360000`. Here is the query:

In [25]:
c.execute("""
SELECT composer, name, milliseconds  
FROM track
WHERE composer LIKE '%Smith%' AND milliseconds >= 360000;""")
c.fetchall()

[('Deaffy & R.A. Smith-Diesel', 'Princess of the Dawn', 375418),
 ('Adrian Smith/Steve Harris', 'Paschendale', 508107),
 ('Adrian Smith/Bruce Dickinson/Steve Harris', 'Face In The Sand', 391105),
 ('Smith/Dickinson', '2 Minutes To Midnight', 366550),
 ('Adrian Smith/Bruce Dickinson', '2 Minutes To Midnight', 386821),
 ('Adrian Smith/Steve Harris', '22 Acacia Avenue', 395572),
 ('Adrian Smith/Steve Harris', 'The Prisoner', 361299),
 ('Smith, Toby', 'Too Young To Die', 365818),
 ('Smith, Toby', 'Music Of The Wind', 383033),
 ('Smith, Toby', 'Blow Your Mind', 512339),
 ('Smith, Toby', 'Revolution 1993', 616829),
 ('Toby Smith', 'Just Another Story', 529684),
 ('Toby Smith', 'Manifest Destiny', 382197),
 ('Anthony Kiedis/Chad Smith/Flea/John Frusciante', 'Sir Psycho Sexy', 496692),
 ('Anthony Kiedis, Flea, John Frusciante, and Chad Smith',
  'Venice Queen',
  369110),
 ('Astor Campbell, Delroy "Chris" Cooper, Donovan Jackson, Dorothy Fields, Earl Chinna Smith, Felix Howard, Gordon Williams

**Task 2:** This query uses `DISTINCT` to make sure we have unique values. Please note that one of the values is `None`, which is how Python represents SQL `NULL`. This means that this column in the table contains `NULL` values. 

In [26]:
c.execute("""
SELECT DISTINCT company
FROM customer;""")
c.fetchall()

[('Embraer - Empresa Brasileira de Aeronáutica S.A.',),
 (None,),
 ('JetBrains s.r.o.',),
 ('Woodstock Discos',),
 ('Banco do Brasil S.A.',),
 ('Riotur',),
 ('Telus',),
 ('Rogers Canada',),
 ('Google Inc.',),
 ('Microsoft Corporation',),
 ('Apple Inc.',)]

**Task 3:** For this task, we will use the `SELECT` clause, `IN` statement, and `ORDER BY` clause:

In [27]:
c.execute("""
SELECT firstname, lastname, country
FROM customer
WHERE country IN ('Canada', 'Germany')
ORDER BY country, lastname;
""")
c.fetchall()

[('Robert', 'Brown', 'Canada'),
 ('Edward', 'Francis', 'Canada'),
 ('Aaron', 'Mitchell', 'Canada'),
 ('Jennifer', 'Peterson', 'Canada'),
 ('Mark', 'Philips', 'Canada'),
 ('Martha', 'Silk', 'Canada'),
 ('Ellie', 'Sullivan', 'Canada'),
 ('François', 'Tremblay', 'Canada'),
 ('Leonie', 'Köhler', 'Germany'),
 ('Hannah', 'Schneider', 'Germany'),
 ('Niklas', 'Schröder', 'Germany'),
 ('Fynn', 'Zimmermann', 'Germany')]

As you can see, the customers' list is sorted by the country name first, then by the last name. All customers are either from Canada or Germany.

**Task 4:** This query will use the following clauses and statements:
- `GROUP BY` clause to group the rows by the `albumid`, because all tracks from the same album will have the same album ID number
- then we need to use `COUNT()` to calculate the number of tracks per album within each group
- the result will be sorted by the number of tracks, `ORDER BY`, in descending order, `DESC`
- and we will need the top 10 rows of the resulting output using `LIMIT 10` statement

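The result will be quite interesting. As we can see below, the composer name is populated only for 4 albums out of 10. Note that `composer` is neither grouped nor aggregated, so SQLite reports the value from an arbitrary track of each album, and many other DBMSs would reject the query. It would be interesting to see the album name, not just the album ID. The album name can be found in the `album` table. In the next section of the module we will learn how to retrieve the data from multiple tables.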

In [28]:
c.execute("""
SELECT albumid, composer, COUNT(trackid)
FROM track
GROUP BY albumid
ORDER BY COUNT(trackid) DESC LIMIT 10;
""")
c.fetchall()

[(141, 'Vandenberg', 57),
 (23, None, 34),
 (73, 'Gilberto Gil', 30),
 (229, None, 26),
 (230, None, 25),
 (251, None, 25),
 (83, 'alan bergman/marilyn bergman/peggy lipton jones/quincy jones', 24),
 (231, None, 24),
 (253, None, 24),
 (24, 'Chico Science', 23)]

## SQL JOINS

**Joins** in SQL allow the user to select columns from one or more tables, and create a set (result) that could be stored in another table, view, or be used as required (for example exported out for further analysis).

ANSI-standard SQL specifies five types of joins: 

| Type | Description |
| :---: | :--- |
| `INNER JOIN` or `JOIN` | returns rows when there is a match in both tables. |
| `LEFT OUTER JOIN` | returns all rows from the left table, even if there are no matches in the right table.  Fills missing data with NULLs. |
| `RIGHT OUTER JOIN` | returns all rows from the right table, even if there are no matches in the left table. Fills missing data with NULLs. |
| `FULL OUTER JOIN` | combines left and right outer joins. The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side. |
| `CROSS JOIN` | returns the Cartesian product of the sets of records from the two or more joined tables. Equivalent to listing the tables in a plain `FROM` clause without a join condition (i.e. all possible row combinations, regardless of matching attributes). |

**NOTE**: A self join joins a table to itself as if the table were two tables. It is not a separate keyword; it requires aliasing at least one of the table references in the SQL statement.

An SQL `JOIN` clause combines columns from one or more tables in a relational database by using values common to each, which allows us to create more descriptive views of the data.
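
The join types in the table above can be compared on two very small tables. This is a sketch with hypothetical in-memory tables `artist_demo` and `album_demo` and made-up rows. `RIGHT OUTER JOIN` and `FULL OUTER JOIN` are left out on the assumption that some SQLite versions may not support them:

```python
import sqlite3

# Hypothetical tables: three artists, one of whom has no album.
join_conn = sqlite3.connect(":memory:")
join_cur = join_conn.cursor()
join_cur.execute("CREATE TABLE artist_demo (artistid INTEGER PRIMARY KEY, name TEXT)")
join_cur.execute(
    "CREATE TABLE album_demo (albumid INTEGER PRIMARY KEY, title TEXT, artistid INTEGER)"
)
join_cur.executemany(
    "INSERT INTO artist_demo VALUES (?, ?)",
    [(1, "AC/DC"), (2, "Accept"), (3, "Aerosmith")],
)
join_cur.executemany(
    "INSERT INTO album_demo VALUES (?, ?, ?)",
    [(10, "Let There Be Rock", 1), (11, "Balls to the Wall", 2), (12, "Restless and Wild", 2)],
)

# INNER JOIN: only artists that have at least one matching album.
inner_rows = join_cur.execute("""
SELECT a.name, b.title
FROM artist_demo a INNER JOIN album_demo b ON a.artistid = b.artistid
ORDER BY b.albumid;
""").fetchall()

# LEFT OUTER JOIN: every artist; missing album data is filled with NULL (None).
left_rows = join_cur.execute("""
SELECT a.name, b.title
FROM artist_demo a LEFT OUTER JOIN album_demo b ON a.artistid = b.artistid
ORDER BY a.artistid, b.albumid;
""").fetchall()

# CROSS JOIN: every artist paired with every album (3 x 3 rows).
cross_count = join_cur.execute(
    "SELECT COUNT(*) FROM artist_demo CROSS JOIN album_demo"
).fetchone()[0]

print(len(inner_rows), len(left_rows), cross_count)
join_conn.close()
```

The left join returns one row more than the inner join: the artist without an album, paired with `None`.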

### Join cases

Why all the different types of join cases? The reasoning boils down to query efficiency. It is not a stretch to say that most modern database management systems are effectively compilers that handle operations on data too large to fit in memory. As such, the type of join, the skewness/size difference of the tables being joined, and the indexes on the tables all have a significant effect on performance. While the DBMS does its best to optimize queries, *joins are the most expensive operation for databases to perform* from the system performance perspective. Hence the right join case can make an order-of-magnitude difference in performance.

#### Join

A simple `JOIN` returns all rows from multiple tables where the join condition is met. In general, it is a good practice to specify the type of join you need.

The first query below will return the list of track names with the album title.

In [29]:
c.execute("""
SELECT a.TrackId, a.Name, b.Title 
FROM track a JOIN album b
ON a.AlbumId = b.AlbumId;
""")
outputInnerJoin = c.fetchall()
# The size of the join. Same as the number of returned records.
len(outputInnerJoin)

3503

You could also write a statement like this:

In [30]:
c.execute("""
SELECT a.TrackId, a.Name, b.Title 
FROM track a, album b
WHERE a.AlbumId = b.AlbumId;
""")
outputStandardJoin = c.fetchall()
# The size of the join. Same as the number of returned records.
len(outputStandardJoin)

3503

The result will be the same. Conceptually, the second example creates all possible combinations of rows from both tables and then filters them based on the condition. In practice this is not always what happens, because the query planner may rewrite the query and take advantage of indexes on the columns. The explicit (`INNER`) `JOIN` form generally performs at least as well and states the intent more clearly, while query planning reduces the consequences of poor join choices.
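One way to check what the planner actually does with the two forms is SQLite's `EXPLAIN QUERY PLAN`. The sketch below is an illustration only: it runs on a hypothetical in-memory database with toy `album` and `track` tables (not the full Chinook schema), and the plan text it prints depends on the SQLite version.

```python
import sqlite3

# Hypothetical toy tables, created in memory for illustration only
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE track (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
INSERT INTO album VALUES (1, 'First'), (2, 'Second');
INSERT INTO track VALUES (1, 't1', 1), (2, 't2', 1), (3, 't3', 2);
""")

explicit_join = """SELECT a.TrackId, b.Title
                   FROM track a JOIN album b ON a.AlbumId = b.AlbumId"""
implicit_join = """SELECT a.TrackId, b.Title
                   FROM track a, album b WHERE a.AlbumId = b.AlbumId"""

for label, query in [("explicit", explicit_join), ("implicit", implicit_join)]:
    plan = demo_conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The last column of each plan row is a human-readable description of the step
    print(label, [step[-1] for step in plan])

# Both forms return the same rows
explicit_rows = sorted(demo_conn.execute(explicit_join).fetchall())
implicit_rows = sorted(demo_conn.execute(implicit_join).fetchall())
print(explicit_rows == implicit_rows)
demo_conn.close()
```

If the two printed plans are identical, the planner has treated both spellings as the same join. Other database systems may behave differently.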

#### Left Join

`LEFT JOIN` and `RIGHT JOIN` are the same type of join; the table order is simply reversed. Their main purpose is to keep optional records when acting over *one-to-many (1:N) or many-to-one (N:1)* relationships, respectively, although nothing stops them from being used for many-to-many relationships. The main difference between these joins and simple joins lies in how they show missing information. Simple joins do not materialize records without matches, while `LEFT JOIN` (`RIGHT JOIN`) displays at least one record for every row of the first (second) table, with the missing match values from the second (first) table substituted with `NULL` values.

**Example: LEFT JOIN**

In [31]:
c.execute("""
SELECT 
   artist.ArtistId, 
   albumId
FROM artist
LEFT JOIN album 
ON album.ArtistId = artist.ArtistId
ORDER BY AlbumId;
""")
outputLeftJoin = c.fetchall()
len(outputLeftJoin)

418

One album belongs to one artist. However, one artist may have zero or more albums.
We can find the artists who do not have any albums by using the `LEFT JOIN` clause. We select artists and their corresponding albums. If an artist does not have any albums, then `AlbumId` will be `NULL`.
(The `LEFT JOIN` keyword returns *all records from the left table*, and the matched records from the right table. The result is `NULL` from the right side, if there is no match.)

Thus, all artists who have no albums will be returned.

In [32]:
c.execute("""
SELECT artist.ArtistId, AlbumId
FROM artist
LEFT JOIN album ON album.ArtistId = artist.ArtistId
WHERE AlbumId IS NULL;
""")
outputLeftJoinNull = c.fetchall()
len(outputLeftJoinNull)

71

#### Self Join

The self-join is a special kind of join that allows you to join a table to itself using any `JOIN` clause. A self-join is normally used to create a result set that joins rows with other rows of the same table. The main purpose of such an operation is to materialize unary relationships, i.e. relationships between a table and itself. Because the same table name cannot be referred to more than once in a query without ambiguity, you need to use a table alias to assign the table a different name when you use a self-join.

**Example of self join**

In [33]:
c.execute("""
SELECT 
m.FirstName || ' ' || m.LastName AS manager,
e.FirstName || ' ' || e.LastName AS directReport
FROM employee e
INNER JOIN employee m 
ON m.EmployeeId = e.ReportsTo
ORDER BY manager;
""")
outputSelfJoin = c.fetchall()
outputSelfJoin

[('Andrew Adams', 'Nancy Edwards'),
 ('Andrew Adams', 'Michael Mitchell'),
 ('Michael Mitchell', 'Robert King'),
 ('Michael Mitchell', 'Laura Callahan'),
 ('Nancy Edwards', 'Jane Peacock'),
 ('Nancy Edwards', 'Margaret Park'),
 ('Nancy Edwards', 'Steve Johnson')]

The first column in the output is the manager name, the second column is the name of his/her direct report.

The `employee` table stores not only employee data but also organizational data. The `ReportsTo` column specifies the reporting relationship between employees. If an employee reports to a manager, the value of `ReportsTo` in the employee's row is equal to the value of `EmployeeId` in the manager's row. If an employee does not report to anyone, `ReportsTo` is `NULL`.

**NOTES**: 
1. The CEO does not report to anyone. If we used `LEFT JOIN` instead of `INNER JOIN`, we would see his name in the output with `NULL` in the `manager` column. See the modified query below.
2. The `||` operator concatenates two or more strings into a single string.

In [34]:
c.execute("""
SELECT 
m.FirstName || ' ' || m.LastName AS manager,
e.FirstName || ' ' || e.LastName AS directReport
FROM employee e
LEFT JOIN employee m 
ON m.EmployeeId = e.ReportsTo
ORDER BY manager;
""")
c.fetchall()

[(None, 'Andrew Adams'),
 ('Andrew Adams', 'Nancy Edwards'),
 ('Andrew Adams', 'Michael Mitchell'),
 ('Michael Mitchell', 'Robert King'),
 ('Michael Mitchell', 'Laura Callahan'),
 ('Nancy Edwards', 'Jane Peacock'),
 ('Nancy Edwards', 'Margaret Park'),
 ('Nancy Edwards', 'Steve Johnson')]

### Set clauses

Set operations are applicable to tables. Their behaviour is fairly easy to predict, and intuitive.  A list of records could implement a set of records if duplicate records are not allowed.

Most set operators tend to have both a version allowing duplicates (i.e. a list operation) and a version enforcing distinctness (i.e. a true set operation).

#### Membership

As discussed before, membership can be tested in a `WHERE` clause to reduce the effort of equality checks. Similarly, we've briefly mentioned the ability to use nested queries. We are now able to expand properly on the topic without confusion. How does one check that a value is present in a table, list or sub-query? 

Clearly stated, all of the above are modeled as a list. Thus, they all use the same syntax, `IN`.

In [35]:
# This query returns 3 rows, where track id equals to 2, 3 or 4

c.execute("""
SELECT *
FROM track
WHERE trackid IN (2,4,3);
""")

c.fetchall()

[(2, 'Balls to the Wall', 2, 2, 1, None, 342562, 5510424, 0.99),
 (3,
  'Fast As a Shark',
  3,
  2,
  1,
  'F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman',
  230619,
  3990994,
  0.99),
 (4,
  'Restless and Wild',
  3,
  2,
  1,
  'F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman',
  252051,
  4331779,
  0.99)]

In [36]:
'''
This is a nested query. 
First, the inner query (in the brackets) is executed; it selects 
the first 3 records from `track` and returns a list of 3 track IDs.
The outer query then returns all columns from the same table, `track`, 
for the records whose track IDs were returned by the inner query.
'''

c.execute("""
SELECT *
FROM track
WHERE trackid IN (SELECT trackid FROM track LIMIT 3);
""")

c.fetchall()

[(1,
  'For Those About To Rock (We Salute You)',
  1,
  1,
  1,
  'Angus Young, Malcolm Young, Brian Johnson',
  343719,
  11170334,
  0.99),
 (6,
  'Put The Finger On You',
  1,
  1,
  1,
  'Angus Young, Malcolm Young, Brian Johnson',
  205662,
  6713451,
  0.99),
 (7,
  "Let's Get It Up",
  1,
  1,
  1,
  'Angus Young, Malcolm Young, Brian Johnson',
  233926,
  7636561,
  0.99)]

**NOTE**: There is only a list version of `IN` because `DISTINCT` can be used separately within a nested query in order to force records to be unique.
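When the list of values comes from Python rather than from a sub-query, one common approach is to generate a `?` placeholder per value and let `sqlite3` bind them. The following is a minimal sketch under that assumption; the in-memory `track` table and the `wanted_ids` list are hypothetical stand-ins, not part of the Chinook database.

```python
import sqlite3

# Hypothetical toy table standing in for `track`
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE track (TrackId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.executemany(
    "INSERT INTO track VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four"), (5, "five")],
)

wanted_ids = [2, 4, 3]  # values to test membership against
# One "?" per value, e.g. "?, ?, ?"; only placeholders are interpolated, never the values
placeholders = ", ".join("?" for _ in wanted_ids)
query = f"SELECT TrackId, Name FROM track WHERE TrackId IN ({placeholders}) ORDER BY TrackId"

rows = demo_conn.execute(query, wanted_ids).fetchall()
print(rows)  # [(2, 'two'), (3, 'three'), (4, 'four')]
demo_conn.close()
```

Binding the values this way keeps the query safe even if the list comes from user input.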

#### Comparison

Set comparisons are a generalization on membership checks. The motivation for comparisons is largely syntax based, as SQL itself provides this functionality through regular querying. But, the syntactic sugar provides more human-readable queries.

The set comparisons make use of regular `WHERE` clause comparisons, but add the following new keywords. In a sense, they add **quantifiers** to `WHERE` clauses:

| Keyword | Description |
| ---: | :--- |
| `ALL` | The preceding operator is modified to evaluate true, only where all individual nested query records hold for the condition. |
| `SOME` | The preceding operator is modified to evaluate true as long as at least one record is able to satisfy the constraints |

**Example**

```
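-- Rows of A whose b is at least as large as every selected c
SELECT a
FROM A
WHERE b >= ALL (SELECT c FROM B WHERE d > 100);

-- Rows of A whose b is at least as large as at least one selected c
SELECT a
FROM A
WHERE b >= SOME (SELECT c FROM B WHERE d > 100);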
```

**NOTES**: 
1. Equally applicable to `HAVING` clause. *But*, doing so means the comparison has to be with a grouped column.
2. SQLite does not support this type of clause. However, nesting queries and `HAVING` can achieve equivalent results.
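Since SQLite lacks these quantifiers, the following sketch shows one possible rewrite using correlated `EXISTS` / `NOT EXISTS` sub-queries. The tables `A` and `B` and their contents are hypothetical and mirror the column names of the example above. The rewrite is assumed to be equivalent only when the compared columns contain no `NULL` values.

```python
import sqlite3

# Hypothetical tables mirroring the A/B example above
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE A (a TEXT, b INTEGER);
CREATE TABLE B (c INTEGER, d INTEGER);
INSERT INTO A VALUES ('low', 5), ('mid', 15), ('high', 30);
INSERT INTO B VALUES (10, 150), (20, 200), (99, 50);
""")

# b >= SOME (...): at least one qualifying c is less than or equal to b
some_rows = demo_conn.execute("""
SELECT a FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE d > 100 AND A.b >= B.c)
ORDER BY b;
""").fetchall()

# b >= ALL (...): no qualifying c is greater than b
all_rows = demo_conn.execute("""
SELECT a FROM A
WHERE NOT EXISTS (SELECT 1 FROM B WHERE d > 100 AND A.b < B.c)
ORDER BY b;
""").fetchall()

print(some_rows)  # [('mid',), ('high',)]
print(all_rows)   # [('high',)]
demo_conn.close()
```

Only the rows of `B` with `d > 100` (where `c` is 10 or 20) take part in the comparison, so `'mid'` satisfies the `SOME` version but not the `ALL` version.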

#### Union

The `UNION` operator selects only distinct values by default. `UNION ALL` only concatenates the records without enforcing a uniqueness constraint; it returns *all* record instances. The selection behaves much like the Python `or` operation, but for tables.

In [37]:
# Union of artists whose names start with `a` or end with `e` (a Set)
c.execute("""
SELECT name FROM artist WHERE name LIKE 'a%' OR name LIKE 'A%'
UNION 
SELECT name FROM artist WHERE name LIKE '%e' OR name LIKE '%E';
""")
c.fetchall()

[('A Cor Do Som',),
 ('AC/DC',),
 ('Aaron Copland & London Symphony Orchestra',),
 ('Aaron Goldberg',),
 ('Academy of St. Martin in the Fields & Sir Neville Marriner',),
 ('Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner',),
 ('Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair',),
 ('Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart',),
 ('Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett',),
 ('Accept',),
 ('Adrian Leaper & Doreen de Feis',),
 ('Aerosmith',),
 ("Aerosmith & Sierra Leone's Refugee Allstars",),
 ('Aisha Duo',),
 ('Alanis Morissette',),
 ('Alberto Turco & Nova Schola Gregoriana',),
 ('Alice In Chains',),
 ('Amy Winehouse',),
 ('Anne-Sophie Mutter, Herbert Von Karajan & Wiener Philharmoniker',),
 ('Antal Doráti & London Symphony Orchestra',),
 ('Antônio Carlos Jobim',),
 ('Apocalyptica',),
 ('Aquaman',),
 ('Audioslave',),
 ('Avril Lavigne',),
 ('Azymuth',),
 (

In [38]:
# Union of artists whose names start with `a` or end with `e` (a List)
c.execute("""
SELECT name FROM artist WHERE name LIKE 'a%' OR name LIKE 'A%'
UNION ALL
SELECT name FROM artist WHERE name LIKE '%e' OR name LIKE '%E';
""")
c.fetchall()

[('AC/DC',),
 ('Accept',),
 ('Aerosmith',),
 ('Alanis Morissette',),
 ('Alice In Chains',),
 ('Antônio Carlos Jobim',),
 ('Apocalyptica',),
 ('Audioslave',),
 ('Azymuth',),
 ('A Cor Do Som',),
 ('Aquaman',),
 ("Aerosmith & Sierra Leone's Refugee Allstars",),
 ('Avril Lavigne',),
 ('Aisha Duo',),
 ('Aaron Goldberg',),
 ('Alberto Turco & Nova Schola Gregoriana',),
 ('Anne-Sophie Mutter, Herbert Von Karajan & Wiener Philharmoniker',),
 ('Academy of St. Martin in the Fields & Sir Neville Marriner',),
 ('Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner',),
 ('Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair',),
 ('Aaron Copland & London Symphony Orchestra',),
 ('Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett',),
 ('Antal Doráti & London Symphony Orchestra',),
 ('Amy Winehouse',),
 ('Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart',),
 ('Adrian Leaper & Doreen de Feis',),
 (

#### Intersect

The **intersect** operation retrieves common records between two queries or tables. The purpose of intersect is to provide similar support to its equivalent set operation.

In [39]:
# Intersection of artists whose names start with `a` and end with `e` (a Set). 
# There is no list equivalent in SQLite.

c.execute("""
SELECT name FROM artist WHERE name LIKE 'a%' OR name LIKE 'A%'
INTERSECT
SELECT name FROM artist WHERE name LIKE '%e' OR name LIKE '%E';
""")
c.fetchall() #These are the artists that had duplicates in the prior query

[('Alanis Morissette',),
 ('Amy Winehouse',),
 ('Audioslave',),
 ('Avril Lavigne',)]

#### Except

The `EXCEPT` operator is essentially a minus operation for sets. In terms of sets, one might expect `EXCEPT` to be the complement of a set, but what is the complement of an arbitrary query? Is it the records not selected? Or all possible other queries? 
Rather than attempting to implement such a theoretically intensive framework, `EXCEPT` is simply a binary operator: it returns the records of the first query that do not appear in the second, so the user chooses which set is subtracted from which.
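As a minimal sketch of `EXCEPT`, the example below subtracts one query from another and then swaps the operands to show that the order matters. It uses a hypothetical in-memory `artist` table with a handful of names rather than the Chinook data.

```python
import sqlite3

# Hypothetical toy table with a few artist names
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.executemany(
    "INSERT INTO artist (Name) VALUES (?)",
    [("AC/DC",), ("Accept",), ("Audioslave",), ("Avril Lavigne",), ("Deep Purple",)],
)

# Names starting with `a`, minus the names ending with `e`
a_not_e = demo_conn.execute("""
SELECT Name FROM artist WHERE Name LIKE 'a%'
EXCEPT
SELECT Name FROM artist WHERE Name LIKE '%e'
ORDER BY Name;
""").fetchall()

# Operands swapped: names ending with `e`, minus the names starting with `a`
e_not_a = demo_conn.execute("""
SELECT Name FROM artist WHERE Name LIKE '%e'
EXCEPT
SELECT Name FROM artist WHERE Name LIKE 'a%'
ORDER BY Name;
""").fetchall()

print(a_not_e)  # [('AC/DC',), ('Accept',)]
print(e_not_a)  # [('Deep Purple',)]
demo_conn.close()
```

Unlike `UNION` and `INTERSECT`, swapping the two queries around `EXCEPT` changes the result.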

#### Did you know that...

Some database languages **do** attempt to fully align to proper set-theory. One such attempt is known as *Datalog*. This is out of scope for our discussion of SQL. However, it should be noted that much of the logic regarding the use of a schema for maintaining data integrity almost exclusively stemmed from the rigor of Datalog.

Well, now you know!

### **EXERCISE 3:** Practicing JOINs

**Task 1:** One of the columns in the table `track` is `genreid`. This is a foreign key which allows us to join the `track` table with the `genre` table to get a genre name. Write an SQL query which will count the number of tracks for each genre. The output should also include two numbers for each genre: the duration of the longest and of the shortest track.

In [40]:
# Type your code here

**Task 2:** Rewrite the query from Task 4 of Exercise 2 above to get the album name and the number of tracks for each album. Display the top 10 results based on the number of tracks, sorted in descending order.

In [41]:
# Type your code here


**Solutions:**

**Task 1:** This query will join two tables based on the `genreid` key, then it will group the rows by the genre and count the number of rows within each group (genre). It will also find the shortest and longest tracks within each group:

In [42]:
c.execute("""
SELECT genre.name AS genre, COUNT(*), MIN(Milliseconds), MAX(Milliseconds)
FROM track 
JOIN genre ON genre.genreid = track.genreid
GROUP BY genre;
""")
c.fetchall()

[('Alternative', 40, 204078, 672773),
 ('Alternative & Punk', 332, 4884, 558602),
 ('Blues', 81, 135053, 589531),
 ('Bossa Nova', 15, 137482, 409965),
 ('Classical', 74, 51780, 596519),
 ('Comedy', 17, 1268268, 2541875),
 ('Drama', 64, 112712, 5088838),
 ('Easy Listening', 24, 89730, 292075),
 ('Electronica/Dance', 30, 143830, 529684),
 ('Heavy Metal', 28, 48013, 516649),
 ('Hip Hop/Rap', 35, 7941, 410409),
 ('Jazz', 130, 126511, 907520),
 ('Latin', 579, 33149, 543007),
 ('Metal', 374, 41900, 816509),
 ('Opera', 1, 174813, 174813),
 ('Pop', 48, 129666, 663426),
 ('R&B/Soul', 61, 127399, 418293),
 ('Reggae', 58, 173008, 366733),
 ('Rock', 1297, 1071, 1612329),
 ('Rock And Roll', 12, 106266, 163265),
 ('Sci Fi & Fantasy', 26, 2622622, 2960293),
 ('Science Fiction', 13, 2563938, 2713755),
 ('Soundtrack', 43, 32287, 383764),
 ('TV Shows', 93, 1237791, 5286953),
 ('World', 28, 39131, 300605)]

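**Task 2:** The query below produces the same count of tracks per album as in the previous exercise, and also fetches the album name from the `album` table. Note that because it groups by `title`, any albums that happen to share a title would be counted together; grouping by `album.albumId` would keep them separate.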

In [43]:
c.execute("""
SELECT album.title AS title, COUNT(*)
FROM track
JOIN album ON album.albumId = track.albumId
GROUP BY title
ORDER BY COUNT(*) DESC LIMIT 10;
""")
c.fetchall()

[('Greatest Hits', 57),
 ('Minha Historia', 34),
 ('Unplugged', 30),
 ('Lost, Season 3', 26),
 ('Lost, Season 1', 25),
 ('The Office, Season 3', 25),
 ('Battlestar Galactica (Classic), Season 1', 24),
 ('Lost, Season 2', 24),
 ('My Way: The Best Of Frank Sinatra [Disc 1]', 24),
 ('Afrociberdelia', 23)]

## Manipulating Tables
In this section we cover the SQL operations that mutate the database. Because they change its state, the same query may return different results before and after they run.

### CREATE TABLE

A table is created either from an explicit schema declaration or from a query whose result is used to infer the schema. As such, a query is often enough for creating a new table, but it is also necessary to know how to declare the schema. The following template shows how to build a new table from a query over preexisting tables:

```
-- General Form
CREATE TABLE [table name] AS [query];
```



The following query will create a new table in the Chinook database with the following parameters:
- name of the table - `customers_two`
- it will be populated with data from two existing tables, `customer` and `invoice`, keeping only those records that meet the criteria specified by the `SELECT` statement

In [44]:
c.execute("CREATE TABLE customers_two AS SELECT * FROM Customer AS a, invoice AS b WHERE b.total > 5 AND a.customerid = b.customerid ORDER BY a.customerid;")
c.execute("SELECT COUNT(*) FROM customers_two;")
c.fetchall()

[(179,)]

The table `customers_two` was created and it contains 179 rows of data. Let's take a look at a couple of rows to confirm that we got what was expected:

In [45]:
c.execute("SELECT * FROM customers_two limit 2;")
c.fetchall()

[(1,
  'Luís',
  'Gonçalves',
  'Embraer - Empresa Brasileira de Aeronáutica S.A.',
  'Av. Brigadeiro Faria Lima, 2170',
  'São José dos Campos',
  'SP',
  'Brazil',
  '12227-000',
  '+55 (12) 3923-5555',
  '+55 (12) 3923-5566',
  'luisg@embraer.com.br',
  3,
  143,
  1,
  '2010-09-15 00:00:00',
  'Av. Brigadeiro Faria Lima, 2170',
  'São José dos Campos',
  'SP',
  'Brazil',
  '12227-000',
  5.94),
 (1,
  'Luís',
  'Gonçalves',
  'Embraer - Empresa Brasileira de Aeronáutica S.A.',
  'Av. Brigadeiro Faria Lima, 2170',
  'São José dos Campos',
  'SP',
  'Brazil',
  '12227-000',
  '+55 (12) 3923-5555',
  '+55 (12) 3923-5566',
  'luisg@embraer.com.br',
  3,
  327,
  1,
  '2012-12-07 00:00:00',
  'Av. Brigadeiro Faria Lima, 2170',
  'São José dos Campos',
  'SP',
  'Brazil',
  '12227-000',
  13.86)]

When only the schema is specified, an empty table (one with no rows) is created.

```
-- General Form
CREATE TABLE [table name] ([[attribute name] [attribute type] [, ...]]);
```

The following is an example of declaring a table schema. 

The first table which we are creating is a new table `phonebook` which will have the following 4 columns:
- `phone`, of INTEGER type
- `firstname`, of VARCHAR(32) type
- `lastname`, of VARCHAR(32) type
- `address`, of VARCHAR(255) type

Take a look at the second `CREATE TABLE prodsales` statement and try to understand what the table will look like after it is created.

In [46]:
"""
The following is an example of declaring a table schema. 
We are creating two new, empty tables: `phonebook` and `prodsales`.
"""
c.execute("CREATE TABLE  phonebook (phone INT, firstname VARCHAR(32), lastname VARCHAR(32), address VARCHAR(255));")
output1 = c.fetchall()

c.execute("CREATE TABLE prodsales (product CHAR(3), mnth SMALLINT, sales MONEY);")
output2 = c.fetchall()

output1 + output2 # empty because output isn't from any query.

[]

### DELETE

The `DELETE` statement is used to delete rows from a table. It's important to note that deletes can fail. The schema is used to check the validity not just of queries but of mutating operations as well. As such, the values used in the condition must be consistent with the column types when deleting records.

```
-- General form
DELETE 
FROM [table_name]
WHERE [condition];
```

In [47]:
c.execute("SELECT count(*) FROM invoice WHERE total < 1;")
outputBefore = c.fetchall()

c.execute("DELETE FROM invoice WHERE total < 1;")
c.execute("SELECT count(*) FROM invoice WHERE total < 1;")
outputAfter = c.fetchall()

assert outputBefore[0][0] != outputAfter[0][0] 

outputBefore + outputAfter

[(55,), (0,)]
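One way a delete can fail is by violating a foreign key constraint. The sketch below illustrates this on a hypothetical in-memory schema (toy `artist` and `album` tables, not the Chinook file). It assumes foreign key enforcement has to be switched on with `PRAGMA foreign_keys = ON`, since SQLite leaves it off by default in most builds, which is also why the delete on `invoice` above succeeded.

```python
import sqlite3

# Hypothetical toy schema with a foreign key from album to artist
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off unless requested
demo_conn.execute("CREATE TABLE artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.execute("""
CREATE TABLE album (
    AlbumId  INTEGER PRIMARY KEY,
    Title    TEXT,
    ArtistId INTEGER REFERENCES artist(ArtistId)
)
""")
demo_conn.execute("INSERT INTO artist VALUES (1, 'AC/DC')")
demo_conn.execute("INSERT INTO album VALUES (1, 'Let There Be Rock', 1)")

try:
    # The artist is still referenced by an album, so this delete is rejected
    demo_conn.execute("DELETE FROM artist WHERE ArtistId = 1")
    delete_failed = False
except sqlite3.IntegrityError as err:
    delete_failed = True
    print("Delete rejected:", err)

remaining = demo_conn.execute("SELECT COUNT(*) FROM artist").fetchone()[0]
print(delete_failed, remaining)  # True 1
demo_conn.close()
```

Deleting the referencing `album` row first, or declaring the key with `ON DELETE CASCADE`, would let the delete go through.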

### DROP

The `DROP` statement is similar to `DELETE`, but is used to delete whole tables and databases. As with `DELETE`, dropping a table can violate key constraints in other tables, so some database systems provide an option to cascade the change to other tables and prevent integrity failures.

```
-- General forms
DROP TABLE table_name;
DROP DATABASE database_name; 
```



In [48]:
c.execute("DROP TABLE Customer;") # delete
c.execute("SELECT * FROM sqlite_master WHERE type = 'table' AND tbl_name = 'Customer';") # check against schema
output = c.fetchall()

assert output == []
output

[]

### TRUNCATE

If we only want to delete the data inside the table, and not the table itself, we can use the `TRUNCATE TABLE` statement.

```
-- General Form
TRUNCATE TABLE table_name;

-- Equivalent to but faster than
DELETE FROM table_name;
```



In [49]:
c.execute("SELECT count(*) FROM employee;")
outputBefore = c.fetchall()
outputAfter = []
try:
    c.execute("TRUNCATE TABLE employee;")
except sqlite3.OperationalError:
    # SQLite rejects TRUNCATE as a syntax error, so the table is left untouched
    outputAfter = []
    
outputBefore + outputAfter

[(8,)]

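Unfortunately, `TRUNCATE` isn't implemented as a statement in SQLite. *Or rather, it is implemented, but as an optimization for `DELETE`: a `DELETE` with no `WHERE` clause can clear the whole table without visiting each row individually.*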
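In SQLite, the effect of `TRUNCATE TABLE` can be obtained with a `DELETE` that has no `WHERE` clause. The following is a minimal sketch on a hypothetical in-memory `employee` table (not the Chinook one), showing that the rows disappear while the table itself stays in the schema.

```python
import sqlite3

# Hypothetical toy table standing in for `employee`
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT)")
demo_conn.executemany(
    "INSERT INTO employee (LastName) VALUES (?)",
    [("Adams",), ("Edwards",), ("Peacock",)],
)

rows_before = demo_conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# No WHERE clause: every row is removed, but the table definition is kept
demo_conn.execute("DELETE FROM employee")

rows_after = demo_conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
table_still_exists = demo_conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'employee'"
).fetchone()[0] == 1

print(rows_before, rows_after, table_still_exists)  # 3 0 True
demo_conn.close()
```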


### UPDATE

The `UPDATE` statement is used to update records in a table. A `WHERE` condition selects the rows to be updated with the new values; if it is omitted, every row in the table is updated.

```
-- General Form
UPDATE [table_name] 
SET [column1=value1 [, ...] ]
WHERE [condition];
```

In [50]:
c.execute("""

    -- In case it's unclear by this point, this is what a comment in SQL looks like.
    -- 2 dashes AND a space. Don't forget that space.
    
    SELECT * 
    FROM artist
    WHERE name = 'Iron Maiden' OR name = 'censored'
;
""")
outputBefore = c.fetchall()
c.execute("""
    UPDATE artist
    SET name = 'censored'
    WHERE name = 'Iron Maiden'
;
""")
c.execute("""
    SELECT * 
    FROM artist
    WHERE name = 'Iron Maiden' OR name = 'censored'
;
""")
outputAfter = c.fetchall()

[outputBefore , outputAfter]

[[(90, 'Iron Maiden')], [(90, 'censored')]]
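To see how many rows an `UPDATE` touched, the cursor's `rowcount` attribute can be inspected. The sketch below uses a hypothetical in-memory `artist` table and contrasts a targeted update with one that has no `WHERE` clause and therefore changes every row.

```python
import sqlite3

# Hypothetical toy table standing in for `artist`
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.executemany(
    "INSERT INTO artist (Name) VALUES (?)",
    [("Iron Maiden",), ("Queen",), ("Accept",)],
)

# Targeted update: only the matching row is changed
targeted = demo_conn.execute(
    "UPDATE artist SET Name = ? WHERE Name = ?", ("censored", "Iron Maiden")
)
targeted_count = targeted.rowcount

# No WHERE clause: every row in the table is updated
everything = demo_conn.execute("UPDATE artist SET Name = 'censored'")
everything_count = everything.rowcount

print(targeted_count, everything_count)  # 1 3
demo_conn.close()
```

Checking `rowcount` before committing is a cheap safeguard against an update that affected more rows than intended.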

**NOTE:** In order to save changes to the database (i.e., save to disk and not just memory), we commit changes in memory to disk.

In [51]:
conn.commit()
# Close the connection to the database
conn.close()
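The effect of committing can be illustrated by contrasting it with `rollback()`, which discards changes that have not been committed yet. This is a minimal sketch on a hypothetical in-memory database with a separate connection named `demo_conn`. Because nothing is written to a file here, it shows the transaction behaviour rather than persistence to disk.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
demo_conn.commit()

def artist_count():
    """Number of rows currently visible in the toy artist table."""
    return demo_conn.execute("SELECT COUNT(*) FROM artist").fetchone()[0]

# Uncommitted change: rollback() throws it away
demo_conn.execute("INSERT INTO artist (Name) VALUES ('Temporary')")
demo_conn.rollback()
after_rollback = artist_count()

# Committed change: a later rollback() has nothing left to undo
demo_conn.execute("INSERT INTO artist (Name) VALUES ('Permanent')")
demo_conn.commit()
demo_conn.rollback()
after_commit = artist_count()

print(after_rollback, after_commit)  # 0 1
demo_conn.close()
```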

## Installing SQLite Studio

Before you install SQLite Studio, we should remind you that SQLite is an *embeddable database*. This means it doesn't have to be installed separately on a computer and doesn't need any special user privileges to run. The database is *embeddable* in the sense that an application or program can use it internally. When the database is embedded, the host process can limit its own memory use by leveraging out-of-core operations and committing state information and data to disk. Python's `sqlite3` library therefore already came with SQLite when Python was installed via `Anaconda Navigator`. The tool described here is an alternative interface.
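A quick way to confirm that SQLite is already available from Python, without installing anything else, is to ask the `sqlite3` module for the version of the SQLite library it is linked against and to run a trivial query in memory. The version printed depends on your installation.

```python
import sqlite3

# Version of the SQLite library bundled with (or linked to) this Python
print(sqlite3.sqlite_version)

# A throwaway in-memory database: no file and no server process are needed
demo_conn = sqlite3.connect(":memory:")
answer = demo_conn.execute("SELECT 1 + 1").fetchone()[0]
print(answer)  # 2
demo_conn.close()
```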

### Installing SQLite Studio on your laptop

* Go to https://sqlitestudio.pl/index.rvt?act=download  and choose the right installation for your laptop from the table `Latest stable release (3.2.1)`
* Unzip the downloaded file into a directory of your choice
* On Windows, run `SQLiteStudio.exe`
* On Mac, double-click `sqlitestudio-3.2.1.dmg` which you downloaded, and move `SQLiteStudio.app` into "Applications."

### Downloading Chinook database

- Download the Chinook database from this [SQLite Sample Database page](http://www.sqlitetutorial.net/sqlite-sample-database). Open the URL, scroll to the middle of the page, find the "Download SQLite sample database" section and click on the button with the caption "Download SQLite sample database".
- The sample database file is in ZIP format, so you need to extract it to a folder, for example, `C:\sqlite\db`. The name of the file is `chinook.db`
- After you have downloaded and unzipped the sample database, open SQLite Studio
- From the `Database` menu, select `Add a Database`
- In the next window, use the `File` field to navigate to the folder where you saved the sample database, select the file `chinook.db` and click `OK`.
- The database will be added to the database list and opened in SQLite Studio.

After you have successfully completed all the steps above, you can practice the same queries that we used in this module using the SQLite Studio interface.

## Resources

### Web

* Bayer, M., (2009). SQLAlchemy. Retrieved from: [SQLAlchemy webpage](https://www.sqlalchemy.org/)
* W3Schools.com (2018). SQL Tutorial. Retrieved from [W3Schools webpage](http://www.w3schools.com/sql/)
* Eder, L., (2016). A Beginner’s Guide to the True Order of SQL Operations. Retrieved from the [Database Zone website](https://dzone.com/articles/a-beginners-guide-to-the-true-order-of-sqlnbspoper)

### Books

* Beaulieu, A. (2009). _Learning SQL: Master SQL Fundamentals, 2nd Ed._ O'Reilly.
* Forta, B. (2012). _SQL in 10 Minutes, Sams Teach Yourself, 4th Ed._ Sams.
* Viescas, J.L. and Hernandez, M.J. (2014). _SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL, 3rd Ed._ Addison-Wesley.
* Molinaro, A. (2006). _SQL Cookbook_. O'Reilly.
* Fehily, C. (2015). _SQL (Database Programming)_. Questing Vole Press.

## References

International team of developers. (2018). *SQLite Version 3.24.0 (2018-06-04)*. Retrieved from [https://www.sqlite.org/index.html](https://www.sqlite.org/index.html)

Python Documentation (2018). The Python Standard Library. 12.6. sqlite3. Article *12.6.3. Cursor Objects*. Retrieved from [https://docs.python.org/3/library/sqlite3.html#cursor-objects](https://docs.python.org/3/library/sqlite3.html#cursor-objects)

McKinney, W. (2017). 10.1 GroupBy Mechanics. Python for Data Analysis (p 290). O'Reilly: Boston
