# SQL Review and Practice - Part 2


Many of the examples have been adapted from Data100 and J. Canny's course.

## Setup 

This notebook makes use of a Python module, `utils.py`, as well as four data files: 

* `pls_fy2009_pupld09a.csv` 
* `pls_fy2014_pupld14a.csv` 
* `us_counties_2000.csv`
* `us_counties_2010.csv` 


In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

#from pathlib import Path
#from sqlalchemy import create_engine
from utils import fetch_and_cache

In [None]:
%%html
<style>
table {margin-left: 0 !important;}
</style>

For this lab, we will be using SQLite to connect to databases. It is a simple, lightweight module; there are, of course, other modules for interacting with databases. 

SQLite has a [nice tutorial](https://www.sqlitetutorial.net/) if you want to learn more about database queries and explore additional options. 

In [None]:
import sqlite3

A couple of things to know about SQLite.  Other database systems such as `MySQL` and `PostgreSQL` use *static typing*, where a column is declared with a specific data type and can only store data of that type. 

SQLite uses a *dynamic type system*, where the data type belongs to the value stored in a column, not to the column itself.  
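
As a minimal sketch of dynamic typing, the cell below stores four different kinds of values in one `INTEGER` column and asks SQLite for the type of each with `typeof()`. The in-memory database, the table name `typing_demo`, and the sample values are assumptions made for illustration; they are not part of the lab's data.

```python
import sqlite3

# Throwaway in-memory database so the lab's database files are untouched.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE typing_demo (val INTEGER)")

# Four values of different kinds go into the same INTEGER column.
demo.executemany(
    "INSERT INTO typing_demo (val) VALUES (?)",
    [(1,), ("hello",), (2.5,), ("42",)],
)

# typeof() reports the storage class of each stored value.
rows = demo.execute(
    "SELECT val, typeof(val) FROM typing_demo ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
demo.close()
```

The declared column type acts only as a preference: the text `'42'` is converted to an integer, while `'hello'` and `2.5` are stored as text and real values.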

# Setting up Examples

## Example 1 

*Example from Practical SQL, by Anthony DeBarros, 2018 - Available on O'Reilly's learning platform*  


Let's connect to a new, empty database `analysis2.db`

In [None]:
conn = sqlite3.connect("analysis2.db")

### Printing out SQL calls and results 

We define a function `print_sql(c, s)` that, given a database connection `c` and a SQL query `s`, executes the query and prints each row of the result.

In [None]:
def print_sql(c, s):
    print('>', s)
    for result in c.execute(s):
        print(result)
    print()

In [None]:
def pretty_print_sql(c, s):
    print('>', s)
    df = pd.read_sql(s, c)
    display(df)

## Example 2

Banking example from J. Freire consisting of the following relations. 

The first table, `Account`, has banking account information: 

    number    custID    owner        balance      type
    ------    ------    ---------    ---------    ---------
    101       1         J. Smith     1000.00      checking
    102       2         W. Wei       2000.00      checking
    103       1         J. Smith     5000.00      saving
    104       3         M. Jones     1000.00      checking
    105       4         H. Martin    10000.00     checking

The second table, `Deposit`, has transaction information: 

    account    transID    dDate       amount
    -------    -------    --------    ---------
    102        1          10/22/21    500.00
    102        2          10/29/21    200.00
    104        3          10/29/21    1000.00
    105        4          11/2/21     10000.00

The third table `CheckInfo`, has information on checks written: 

    account    checkNum    cDate       amount
    -------    --------    --------    ---------
    101        924         10/23/21    125.00
    101        925         10/24/21    23.98

## Setup Examples 1 and 2

#### Example 1 

Create `teachers` table and add data to it. 

In [None]:
# Example 1 - Create table teachers
conn.executescript("""
CREATE TABLE teachers (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    first_name TEXT,
    last_name TEXT,
    school TEXT,
    hire_date TEXT,
    salary INTEGER
);
""");

In [None]:
# Example 1 - Add data into teachers table
conn.executescript("""
INSERT INTO teachers (first_name, last_name, school, hire_date, salary)
VALUES ('Janet', 'Smith', 'F.D. Roosevelt HS', '2011-10-30', 36200),
       ('Lee', 'Reynolds', 'F.D. Roosevelt HS', '1993-05-22', 65000),
       ('Samuel', 'Cole', 'Myers Middle School', '2005-08-01', 43500),
       ('Samantha', 'Bush', 'Myers Middle School', '2011-10-30', 36200),
       ('Betty', 'Diaz', 'Myers Middle School', '2005-08-30', 43500),
       ('Kathleen', 'Roush', 'F.D. Roosevelt HS', '2010-10-22', 38500);
""");

In [None]:
# Example 1 - add row into teachers table
conn.executescript("""
INSERT INTO teachers (first_name, last_name, school, hire_date, salary) 
VALUES ('Hank', 'Smith', 'Jefferson HS', '2018-01-01', 32000);
""");

#### Example 2 

Create and populate the tables `Account`, `Deposit`, `CheckInfo`, and `ATMwithdraw`.  The `ATMwithdraw` table has the schema `ATMwithdraw(transID, custID, accountID, amount, wDate)`, where `transID` is the primary key.   

In [None]:
# Example 2  - Create tables in banking database 
conn2 = sqlite3.connect("banking2.db")

conn2.executescript("""
CREATE TABLE Account (
    number INTEGER PRIMARY KEY,
    custID INTEGER,
    owner TEXT,
    balance REAL,
    type TEXT
);
""");

conn2.executescript("""
CREATE TABLE Deposit (
    accountID INTEGER,
    transID INTEGER PRIMARY KEY, 
    dDate TEXT, 
    amount REAL, 
    FOREIGN KEY (accountID)
       REFERENCES Account (number)
);
""")

conn2.executescript("""
CREATE TABLE CheckInfo (
    accountID INTEGER NOT NULL,
    checkNum INTEGER NOT NULL, 
    cDate TEXT,
    amount REAL,
    PRIMARY KEY (accountID, checkNum), 
    FOREIGN KEY (accountID)
       REFERENCES Account (number)
);
""")

In [None]:
conn2.executescript("""
INSERT INTO Account (number, custID, owner, balance, type)
VALUES (101, 1, 'J. Smith', 1000.00, 'checking'), 
       (102, 2, 'W. Wei', 2000.00, 'checking'),
       (103, 1, 'J. Smith', 5000.00, 'saving'), 
       (104, 3, 'M. Jones', 1000.00, 'checking'),
       (105, 4, 'H. Martin', 10000.00, 'checking');
""")

conn2.executescript("""
INSERT INTO Deposit (accountID, transID, dDate, amount)
VALUES (102, 1, '10/22/21', 500.00), 
       (102, 2, '10/29/21', 200.00), 
       (104, 3, '10/29/21', 1000.00), 
       (105, 4, '11/2/21', 10000.00);
""")

conn2.executescript("""
INSERT INTO CheckInfo (accountID, checkNum, cDate, amount)
VALUES (101, 924, '10/23/21', 125.00),
       (101, 925, '10/24/21', 23.98);
""")

conn2.executescript("""
CREATE TABLE ATMwithdraw (
    transID INTEGER NOT NULL,
    custID INTEGER, 
    accountID INTEGER,
    amount REAL,
    wDate TEXT,
    PRIMARY KEY (transID), 
    FOREIGN KEY (accountID)
       REFERENCES Account (number),
    FOREIGN KEY (custID)
       REFERENCES Account (custID)
);
""")

conn2.executescript("""
INSERT INTO ATMwithdraw (transID, custID, accountID, amount, wDate)
VALUES (1, 2, 102, 25.00, '11/01/21 09:45:00'), 
       (2, 2, 102, 150.00, '11/10/2021 13:15:00'), 
       (3, 1, 101, 40.00, '11/01/2021 10:05:00'), 
       (4, 1, 101, 40.00, '11/01/2021 10:07:00'), 
       (5, 1, 101, 200.00, '11/8/2021 14:14:00');
""")

### Importing Data 

Many times we might have data from a delimited file.  We can import or export this data. 

In SQLite, this is done at the command line with the `.import` command.  In PostgreSQL, this is done with the `COPY` command. 

Here we can take advantage of `pandas` to write a DataFrame to a SQL database.

In [None]:
df = pd.read_csv("us_counties_2010.csv")
df.to_sql('us_counties_2010', conn, if_exists='append', index=False)
df.head()

## Query Language of SQL 

SQL - Structured Query Language 

```mysql 
SELECT <column_list>
FROM <table_name_list>
[WHERE <condition>] 
[GROUP BY <column_name>]
[HAVING <condition>]
[ORDER BY <column_name> [ASC|DESC]]
```

# Multi-Relation Queries

Interesting queries often combine data from more than one relation.

We can address several relations in one query by listing them all in the FROM clause.

Distinguish attributes of the same name by `<relation>.<attribute>`



The **formal semantics** of multi-relation queries is almost the same as for single-relation queries: 

1. Start with the product of all the relations in the FROM clause 
2. Apply the selection condition from the WHERE clause 
3. Project onto the list of attributes and expressions in the SELECT clause 


A general SQL query has the form: 
```mysql 
SELECT A1, A2, ..., An 
FROM R1, R2, ..., Rm 
WHERE P 
```

This is equivalent to the relational algebra expression: 

$\pi_{A1, A2, ..., An} (\sigma_P (R1 \times R2\; \times \;...\; \times Rm))$ 
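
The three steps above can be sketched directly in Python and compared with what SQLite returns. This is only an illustration of the semantics, not how a database engine actually evaluates a query; the relations `R1` and `R2`, their columns, and their contents are hypothetical.

```python
import sqlite3
from itertools import product

# Two tiny relations as lists of tuples (hypothetical sample data).
R1 = [(1, "a"), (2, "b")]          # schema: (x, label)
R2 = [(1, 10), (2, 20), (2, 30)]   # schema: (y, amount)

# 1. Product of all relations in the FROM clause.
pairs = list(product(R1, R2))
# 2. Selection: keep combinations satisfying the WHERE condition x = y.
selected = [(r1, r2) for r1, r2 in pairs if r1[0] == r2[0]]
# 3. Projection: keep only the attributes in the SELECT list.
by_hand = [(r1[1], r2[1]) for r1, r2 in selected]

# The same query in SQL, for comparison.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE R1 (x INTEGER, label TEXT)")
demo.execute("CREATE TABLE R2 (y INTEGER, amount INTEGER)")
demo.executemany("INSERT INTO R1 VALUES (?, ?)", R1)
demo.executemany("INSERT INTO R2 VALUES (?, ?)", R2)
by_sql = demo.execute(
    "SELECT label, amount FROM R1, R2 WHERE x = y"
).fetchall()
demo.close()

print(sorted(by_hand))
print(sorted(by_sql))
```

Both lists contain the same rows. The product step builds all 6 combinations before filtering, which is why real engines avoid evaluating queries this literally.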
 

#### Example 2

For example, we can create an SQL query from Example 2 with the Account and Deposit tables.  Remember the schema as: 

* Account(number, custID, owner, balance, type) 
* Deposit(accountID, transID, dDate, amount) 

```mysql
SELECT number, balance 
FROM Account, Deposit 
WHERE accountID = number AND amount > 1000; 
```

The equivalent relational algebra expression is: 

$\pi_{number, balance} (\sigma_{accountID=number \;and\; amount > 1000}\;\; (Account \times Deposit))$ 

In [None]:
pretty_print_sql(conn2, """
SELECT number, balance 
FROM Account, Deposit 
WHERE accountID = number AND amount > 1000;
""")

As another example, the query 
```mysql 
SELECT * 
FROM Account, Deposit
``` 
is equivalent to the relational algebra expression $Account \times Deposit$, the Cartesian product of the two relations.

In [None]:
pretty_print_sql(conn2, """
SELECT * 
FROM Account, Deposit;
""")

#### Self-Join Example 

From the `teachers` relation, find all pairs of teachers from the same school. 
* Do not return pairs like (Janet, Janet). 
* Produce each pair once, in alphabetical order: (Janet, Lee), not (Lee, Janet). 

In [None]:
pretty_print_sql(conn, """
SELECT t1.first_name, t2.first_name, t1.school
FROM teachers t1, teachers t2
WHERE t1.school = t2.school AND 
t1.first_name < t2.first_name
ORDER BY t1.school, t1.first_name;
""")

Note that `t1` is the correlation name for the first copy of the `teachers` table and `t2` is the correlation name for the second copy.  

You choose the correlation names when you write a query. They are particularly useful for disambiguating attribute names.  

Here is another example of correlation names with Example 2. 

In [None]:
pretty_print_sql(conn2, """
SELECT A.owner, A.balance
FROM Account AS A, Deposit AS D
WHERE D.accountID = A.number AND A.balance > 1000;
""")

Note that W. Wei appears twice: the result of a query is a "bag" (multiset), not a set, so duplicate rows are kept. 
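
When set semantics are wanted, `SELECT DISTINCT` removes the duplicate rows. The sketch below shows the difference, assuming a throwaway in-memory database that holds only a cut-down copy of the `Account` and `Deposit` rows needed for the comparison.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE Account (number INTEGER PRIMARY KEY, owner TEXT, balance REAL);
CREATE TABLE Deposit (accountID INTEGER, transID INTEGER PRIMARY KEY, amount REAL);
INSERT INTO Account VALUES (102, 'W. Wei', 2000.00), (105, 'H. Martin', 10000.00);
INSERT INTO Deposit VALUES (102, 1, 500.00), (102, 2, 200.00), (105, 4, 10000.00);
""")

# Shared FROM/WHERE/ORDER BY part of both queries.
join = """
FROM Account AS A, Deposit AS D
WHERE D.accountID = A.number AND A.balance > 1000
ORDER BY A.owner
"""
# Bag semantics: one output row per matching (account, deposit) pair.
bag = demo.execute("SELECT A.owner, A.balance " + join).fetchall()
# Set semantics: DISTINCT collapses identical output rows.
as_set = demo.execute("SELECT DISTINCT A.owner, A.balance " + join).fetchall()
demo.close()

print(bag)
print(as_set)
```

The first result lists W. Wei once per deposit; the second lists each owner once.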

## Joining Tables in Relational Database 

To connect tables in a query, we use `JOIN ... ON` statements (or other `JOIN` variants).  `JOIN` links one table with another in the database during a query, using matching values in columns we specify in both tables.  

```mysql 
SELECT *
FROM table_a JOIN table_b
ON table_a.key_column = table_b.foreign_key_column
```

Matching based on equality is the most common, but the `ON` clause can use anything that evaluates to *Boolean*, e.g., `ON table_a.key_column >= table_b.foreign_key_column`. 

#### Relating Tables with Key Columns 

Let's start with creating a few new tables in Example 1's database.  

In [None]:
print_sql(conn, """
CREATE TABLE departments (
    dept_id INTEGER,
    dept TEXT,
    city TEXT,
    CONSTRAINT dept_key PRIMARY KEY (dept_id),
    CONSTRAINT dept_city_unique UNIQUE (dept, city)
);
""")

The primary key is defined for `departments` with the `CONSTRAINT` keyword.  The `dept_id` column uniquely identifies the department.  

In [None]:
print_sql(conn, """
CREATE TABLE employees (
    emp_id INTEGER,
    first_name TEXT,
    last_name TEXT,
    salary INTEGER,
    dept_id INTEGER REFERENCES departments (dept_id),
    CONSTRAINT emp_key PRIMARY KEY (emp_id),
    CONSTRAINT emp_dept_unique UNIQUE (emp_id, dept_id)
);
""")

In the `employees` table, the `emp_id` column uniquely identifies each row.  The table also includes a `dept_id` column that refers to values in the `departments` table's primary key.  This is called a *foreign key*.  A foreign key constraint requires a value entered in a column to already exist in the primary key of the table it references.  

The `UNIQUE` constraint guarantees that values in a column, or a combination of values in more than one column, are unique. 
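
One SQLite-specific caveat: in most builds, foreign key constraints are parsed but not enforced unless `PRAGMA foreign_keys = ON` is issued on the connection. The sketch below illustrates this with a hypothetical helper `insert_orphan` and simplified in-memory copies of the two tables; it assumes the default build settings of the SQLite library bundled with Python.

```python
import sqlite3

def insert_orphan(enforce):
    """Try to insert an employee whose dept_id has no matching department."""
    demo = sqlite3.connect(":memory:")
    if enforce:
        # Must be set per connection, before any transaction is open.
        demo.execute("PRAGMA foreign_keys = ON")
    demo.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept TEXT)")
    demo.execute(
        "CREATE TABLE employees ("
        " emp_id INTEGER PRIMARY KEY,"
        " dept_id INTEGER REFERENCES departments (dept_id))"
    )
    demo.execute("INSERT INTO departments VALUES (1, 'Tax')")
    try:
        demo.execute("INSERT INTO employees VALUES (1, 99)")  # no department 99
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"
    finally:
        demo.close()

print(insert_orphan(enforce=False))
print(insert_orphan(enforce=True))
```

Without the pragma the orphan row is accepted; with it, the insert raises `sqlite3.IntegrityError`.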

Now let's add some instances to both tables. 

In [None]:
print_sql(conn, """
INSERT INTO departments (dept_id, dept, city)
VALUES
    (1, 'Tax', 'Atlanta'),
    (2, 'IT', 'Boston');
""")

In [None]:
print_sql(conn,"SELECT * FROM departments;")

In [None]:
print_sql(conn, """
INSERT INTO employees (emp_id, first_name, last_name, salary, dept_id)
VALUES
    (1, 'Julia', 'Reyes', 115300, 1),
    (2, 'Janet', 'King', 98000, 1),
    (3, 'Arthur', 'Pappas', 72700, 2),
    (4, 'Michael', 'Taylor', 89500, 2);
""")

In [None]:
print_sql(conn, "SELECT * FROM employees;")

### Querying Multiple Tables using Join

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM employees JOIN departments
ON employees.dept_id = departments.dept_id;
""")

The result of the `JOIN` operation includes all values from both tables where values in the `dept_id` columns match.  The `dept_id` field appears twice because you selected all columns of both tables. 

##### Example 1 - Add tables 

In Example 1, two new tables are created and populated. 

In [None]:
conn.executescript("""
CREATE TABLE district1 (
    id INTEGER CONSTRAINT id_key_1 PRIMARY KEY,
    school_1 TEXT
);
""");
conn.executescript("""
CREATE TABLE district2 (
    id INTEGER CONSTRAINT id_key_2 PRIMARY KEY,
    school_2 TEXT
  );
""");
conn.executescript("""
INSERT INTO district1 VALUES
    (1, 'Oak Street School'),
    (2, 'Roosevelt High School'),
    (5, 'Dover Middle School'),
    (6, 'Webutuck High School');
""");
conn.executescript("""
INSERT INTO district2 VALUES
    (1, 'Oak Street School'),
    (2, 'Roosevelt High School'),
    (3, 'Morrison Elementary'),
    (4, 'Chase Magnet Academy'),
    (6, 'Webutuck High School');
""");

#### Join Types 

* **`JOIN`** Returns rows from both tables where matching values are found in the joined columns of both tables. Alternative syntax: `INNER JOIN`. 
* **`LEFT JOIN`**  Returns every row from the left table plus rows that match values in the joined column from the right table. When a left table row doesn’t have a match in the right table, the right table's columns are filled with `NULL`.
* **`RIGHT JOIN`**  Returns every row from the right table plus rows that match the key values in the key column from the left table. When a right table row doesn’t have a match in the left table, the left table's columns are filled with `NULL`.  *NOTE: `RIGHT JOIN` is only supported in SQLite 3.39.0 and later*.
* **`FULL OUTER JOIN`**  Returns every row from both tables and joins the rows where values in the joined columns match. If there’s no match for a row in either the left or right table, the columns of the other table are filled with `NULL`. *NOTE: `FULL OUTER JOIN` is only supported in SQLite 3.39.0 and later*.
* **`CROSS JOIN`** Returns every possible combination of rows from both tables.
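
To make the differences concrete, the sketch below counts the rows returned by three of these join types and shows how an unmatched row is padded. It assumes a throwaway in-memory copy of the `district1` and `district2` tables, and it leaves out `RIGHT JOIN` and `FULL OUTER JOIN` so that it runs on any SQLite version.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE district1 (id INTEGER PRIMARY KEY, school_1 TEXT);
CREATE TABLE district2 (id INTEGER PRIMARY KEY, school_2 TEXT);
INSERT INTO district1 VALUES (1, 'Oak Street School'), (2, 'Roosevelt High School'),
                             (5, 'Dover Middle School'), (6, 'Webutuck High School');
INSERT INTO district2 VALUES (1, 'Oak Street School'), (2, 'Roosevelt High School'),
                             (3, 'Morrison Elementary'), (4, 'Chase Magnet Academy'),
                             (6, 'Webutuck High School');
""")

on = " ON district1.id = district2.id"
counts = {}
# CROSS JOIN takes no ON clause; the other two share the same condition.
for kind, clause in [("JOIN", on), ("LEFT JOIN", on), ("CROSS JOIN", "")]:
    sql = f"SELECT * FROM district1 {kind} district2{clause}"
    counts[kind] = len(demo.execute(sql).fetchall())
print(counts)

# The unmatched left row (id 5) is padded with NULLs, which Python sees as None.
unmatched = demo.execute(
    "SELECT * FROM district1 LEFT JOIN district2" + on +
    " WHERE district2.id IS NULL"
).fetchall()
print(unmatched)
demo.close()
```

The inner join keeps the 3 ids present in both tables, the left join keeps all 4 rows of `district1`, and the cross join produces 4 × 5 = 20 combinations.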



##### JOIN 

Use `JOIN` or `INNER JOIN`. 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 JOIN district2
ON district1.id = district2.id;
""");

We can also specify this as an `INNER JOIN`.

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 INNER JOIN district2
ON district1.id = district2.id
ORDER BY district1.id;
""");

We can also specify the `JOIN` without the `ON` argument and use the `USING` option. 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 JOIN district2
USING (id) 
ORDER BY district1.id;
""");

##### Left Join 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 LEFT JOIN district2
ON district1.id = district2.id
ORDER BY district1.id;
""")

##### Right Join 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 RIGHT JOIN district2
ON district1.id = district2.id
ORDER BY district1.id;
""")

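On an SQLite version older than 3.39.0 the `RIGHT JOIN` query raises a syntax error. A common workaround, sketched below, is to swap the table order and use `LEFT JOIN`. The in-memory tables hold only a two-row sample of each district, chosen for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE district1 (id INTEGER PRIMARY KEY, school_1 TEXT);
CREATE TABLE district2 (id INTEGER PRIMARY KEY, school_2 TEXT);
INSERT INTO district1 VALUES (1, 'Oak Street School'), (5, 'Dover Middle School');
INSERT INTO district2 VALUES (1, 'Oak Street School'), (3, 'Morrison Elementary');
""")

# Swap the table order and use LEFT JOIN; list the columns explicitly so they
# come out in the same order a RIGHT JOIN would produce.
emulated = demo.execute("""
SELECT district1.id, district1.school_1, district2.id, district2.school_2
FROM district2 LEFT JOIN district1
ON district1.id = district2.id
ORDER BY district2.id;
""").fetchall()
print(emulated)

# Version of the SQLite library bundled with this Python installation.
print(sqlite3.sqlite_version)
if sqlite3.sqlite_version_info >= (3, 39, 0):
    # Only attempted where the native syntax is available.
    native = demo.execute("""
    SELECT * FROM district1 RIGHT JOIN district2
    ON district1.id = district2.id
    ORDER BY district2.id;
    """).fetchall()
    print(native == emulated)
demo.close()
```

Every row of `district2` is kept, and the `district1` columns are `None` where no match exists.
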
##### Full Outer Join 


In [None]:
pretty_print_sql(conn, """
SELECT * 
FROM district1 FULL OUTER JOIN district2
ON district1.id = district2.id
ORDER BY district1.id;
""")

In SQLite versions before 3.39.0, which lack `FULL OUTER JOIN`, we can emulate it with two `LEFT JOIN` queries combined by `UNION ALL`. 

In [None]:
pretty_print_sql(conn, """
SELECT district1.*, district2.*
FROM district1 LEFT JOIN district2
ON district1.id = district2.id
UNION ALL 
SELECT district1.*, district2.*
FROM district2 LEFT JOIN district1
ON district1.id = district2.id
WHERE district1.id IS NULL;
""")

##### Cross Join 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1 CROSS JOIN district2
ORDER BY district1.id, district2.id;
""")

Alternatively, CROSS JOIN can be written with a comma-join syntax. 

In [None]:
pretty_print_sql(conn, """
SELECT *
FROM district1, district2
ORDER BY district1.id, district2.id;
""")

### Joining Multiple Tables 

Let's look at joining multiple tables together. 

In [None]:
conn.executescript("""
CREATE TABLE district1_enrollment (
    id INTEGER,
    enrollment INTEGER
);
""");
conn.executescript("""
CREATE TABLE district1_grades (
    id INTEGER,
    grades TEXT
  );
""");
conn.executescript("""
INSERT INTO district1_enrollment 
VALUES
    (1, 360),
    (2, 1001),
    (5, 450),
    (6, 927);
""");
conn.executescript("""
INSERT INTO district1_grades 
VALUES
    (1, 'K-3'),
    (2, '9-12'),
    (5, '6-8'),
    (6, '9-12');
""");

In [None]:
pretty_print_sql(conn, """
SELECT d1.id, 
       d1.school_1,
       en.enrollment, 
       gr.grades
FROM district1 AS d1 JOIN district1_enrollment AS en
    ON d1.id = en.id
JOIN district1_grades AS gr 
    ON d1.id = gr.id
ORDER BY d1.id;
""")

### Connections to Relational Algebra 

Here we can now explore additional connections to relational algebra operators. 

#### Union 

The union operator $p \cup q$ is related to 

```mysql 
SELECT * 
FROM p 
UNION 
SELECT * 
FROM q
```

In [None]:
pretty_print_sql(conn, """
SELECT * 
FROM district1 
UNION 
SELECT * 
FROM district2
ORDER BY id;
""")

In [None]:
pretty_print_sql(conn, """
SELECT * 
FROM district1 
UNION ALL
SELECT * 
FROM district2
ORDER BY id;
""")

In [None]:
pretty_print_sql(conn, """
SELECT '1' AS num, 
       school_1 AS school
FROM district1 
UNION ALL 
SELECT '2' AS num,
       school_2 AS school
FROM district2
ORDER BY school, num;
""")

#### Intersection 

The intersection operator $p \cap q$ is related to 

```mysql 
SELECT * 
FROM p 
INTERSECT 
SELECT * 
FROM q
```

In [None]:
pretty_print_sql(conn, """
SELECT * FROM district1
INTERSECT
SELECT * FROM district2
ORDER BY id;
""")

#### Set Difference 

The set difference operator $p - q$ is related to 

```mysql 
SELECT * 
FROM p 
EXCEPT 
SELECT * 
FROM q
```

In [None]:
pretty_print_sql(conn, """
SELECT * FROM district1
EXCEPT
SELECT * FROM district2
ORDER BY id;
""")

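As a cross-check, the three set operators can be compared with Python's own set operations on the same rows. This is a minimal sketch on an in-memory copy of the two district tables; the helper name `compound` is hypothetical.

```python
import sqlite3

d1 = [(1, "Oak Street School"), (2, "Roosevelt High School"),
      (5, "Dover Middle School"), (6, "Webutuck High School")]
d2 = [(1, "Oak Street School"), (2, "Roosevelt High School"),
      (3, "Morrison Elementary"), (4, "Chase Magnet Academy"),
      (6, "Webutuck High School")]

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE district1 (id INTEGER PRIMARY KEY, school_1 TEXT)")
demo.execute("CREATE TABLE district2 (id INTEGER PRIMARY KEY, school_2 TEXT)")
demo.executemany("INSERT INTO district1 VALUES (?, ?)", d1)
demo.executemany("INSERT INTO district2 VALUES (?, ?)", d2)

def compound(op):
    """Run SELECT * FROM district1 <op> SELECT * FROM district2."""
    sql = f"SELECT * FROM district1 {op} SELECT * FROM district2 ORDER BY id"
    return demo.execute(sql).fetchall()

union = compound("UNION")
union_all = compound("UNION ALL")
intersect = compound("INTERSECT")
except_ = compound("EXCEPT")
demo.close()

# UNION, INTERSECT and EXCEPT behave like Python set operations on the rows.
print(union == sorted(set(d1) | set(d2)))
print(intersect == sorted(set(d1) & set(d2)))
print(except_ == sorted(set(d1) - set(d2)))
# UNION ALL keeps duplicates, so it returns every row of both tables.
print(len(union_all), len(d1) + len(d2))
```

`UNION`, `INTERSECT`, and `EXCEPT` remove duplicate rows, matching set semantics, while `UNION ALL` keeps them and so follows bag semantics.
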
### Performing Math on Joined Table 

Math functions can also be used when working with joined tables. 

First, let's load in data from the 2000 Census.  Note, the file format and headers are slightly different, but we can handle that. 

In [None]:
df = pd.read_csv("us_counties_2000.csv")
df.to_sql('us_counties_2000', conn, if_exists='append', index=False)
df.head()

Now we can look at calculating the change in population from 2000 to 2010 as a percentage. 

In [None]:
print_sql(conn, """
SELECT DISTINCT c2010.NAME,
       c2010.STUSAB AS state,
       c2010.P0010001 AS pop_2010,
       c2000.p0010001 AS pop_2000,
       c2010.P0010001 - c2000.p0010001 AS raw_change,
       round( (CAST(c2010.P0010001 AS FLOAT) - c2000.p0010001)
             / c2000.p0010001 * 100, 1 ) AS pct_change
FROM us_counties_2010 c2010 INNER JOIN us_counties_2000 c2000
ON c2010.STATE = c2000.state_fips
   AND c2010.COUNTY = c2000.county_fips
   AND c2010.P0010001 <> c2000.p0010001
ORDER BY pct_change DESC;
""")

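The `CAST(... AS FLOAT)` in the query above matters because SQLite performs integer division when both operands are integers. The sketch below shows the percentage-change formula, (new - old) / old * 100, with and without the cast; the two population figures are made up for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
pop_2000, pop_2010 = 1000, 1250   # hypothetical county populations

# Without a cast, SQLite divides two integers and truncates 250 / 1000 to 0.
truncated = demo.execute(
    "SELECT (? - ?) / ? * 100", (pop_2010, pop_2000, pop_2000)
).fetchone()[0]

# Casting one operand to a floating-point type forces real division.
pct_change = demo.execute(
    "SELECT round((CAST(? AS FLOAT) - ?) / ? * 100, 1)",
    (pop_2010, pop_2000, pop_2000),
).fetchone()[0]
demo.close()

print(truncated, pct_change)
```

The uncast version reports a change of 0, while the cast version reports the expected 25.0 percent.
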
### Information from Grouping and Summarizing

We will load in survey data from the Institute of Museum and Library Services (IMLS) in its annual Public Libraries Survey.  The survey collects data from more than 9,000 library entities. 

We will create two tables, one from the 2014 survey and one from the 2009 survey. 

In [None]:
df = pd.read_csv("pls_fy2014_pupld14a.csv")
df.to_sql('pls_fy2014_pupld14a', conn, if_exists='append', index=False)
df.head()

In [None]:
df = pd.read_csv("pls_fy2009_pupld09a.csv")
df.to_sql('pls_fy2009_pupld09a', conn, if_exists='append', index=False)
df.head()

##### Counting Rows and Values using count()

The `count()` aggregate function can be used to check the number of rows.  With `*` as its argument, `count(*)` returns the number of table rows regardless of whether they include `NULL` values. 

In [None]:
print_sql(conn, "SELECT count(*) FROM pls_fy2014_pupld14a;")

In [None]:
print_sql(conn, "SELECT count(*) FROM pls_fy2009_pupld09a;")

##### Counting Values in a Column 

Passing a column name instead counts only the rows where that column is not `NULL`.  Here we count the rows from 2014 that have a value in the `SALARIES` column. 

In [None]:
print_sql(conn, "SELECT count(SALARIES) FROM pls_fy2014_pupld14a;")

We can count the number of distinct values using the `DISTINCT` keyword. 

In [None]:
print_sql(conn, "SELECT count(LIBNAME) FROM pls_fy2014_pupld14a;")

In [None]:
print_sql(conn, "SELECT count(DISTINCT LIBNAME) FROM pls_fy2014_pupld14a;")

We might expect each library agency name to be unique, but there are only 8,515 distinct names among the 9,305 rows.  

Looking closely we can see several duplicate names.  For example, there are nine library agencies named Oxford Public Library, each in a city or town named Oxford, in different states, e.g., Alabama, Connecticut, Kansas, etc. 

In [None]:
print_sql(conn, """
SELECT LIBNAME, count(LIBNAME)
FROM pls_fy2014_pupld14a
GROUP BY LIBNAME
ORDER BY count(LIBNAME) DESC;
""")

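The three forms of `count()` used in this section can be compared side by side on a tiny table. This is a minimal sketch; the table `libs` and its three rows are hypothetical and only loosely modeled on the survey columns.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE libs (libname TEXT, salaries INTEGER)")
demo.executemany(
    "INSERT INTO libs VALUES (?, ?)",
    [
        ("OXFORD PUBLIC LIBRARY", 52000),
        ("OXFORD PUBLIC LIBRARY", None),   # None is stored as NULL
        ("DOVER PUBLIC LIBRARY", 48000),
    ],
)
# count(*): all rows; count(col): non-NULL values; count(DISTINCT col): unique values.
counts = demo.execute("""
SELECT count(*), count(salaries), count(DISTINCT libname)
FROM libs;
""").fetchone()
demo.close()
print(counts)
```

The result is `(3, 2, 2)`: three rows in total, two non-`NULL` salaries, and two distinct library names.
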
##### Finding Maximum and Minimum Values 

In [None]:
pretty_print_sql(conn, "SELECT max(VISITS), min(VISITS) FROM pls_fy2014_pupld14a;")

##### Aggregating Data Using GROUP BY 

We can use GROUP BY to see all the states represented in the survey. 

In [None]:
pretty_print_sql(conn, """
SELECT STABR 
FROM pls_fy2014_pupld14a
GROUP BY STABR 
ORDER BY STABR;
""")

Why are there more than 50 rows? The survey also covers territories, e.g., PR (Puerto Rico), GU (Guam), etc. 

We can also group by multiple columns, e.g., city and state. 

In [None]:
print_sql(conn, """
SELECT CITY, STABR
FROM pls_fy2014_pupld14a
GROUP BY CITY, STABR
ORDER BY CITY, STABR;
""")

We can use `GROUP BY` with aggregate functions.  For example, we can look at using `sum()` or `count()` for each state with the survey data. 

In [None]:
print_sql(conn, """
SELECT STABR, count(*)
FROM pls_fy2014_pupld14a
GROUP BY STABR
ORDER BY count(*) DESC;
""")

We can also combine aggregate functions with grouping on multiple columns.  Here we consider both city and state.  

In [None]:
print_sql(conn, """
SELECT CITY, STABR, count(*)
FROM pls_fy2014_pupld14a
GROUP BY CITY, STABR
ORDER BY count(*) DESC;
""")

In [None]:
print_sql(conn, """
SELECT STABR, STATADDR, count(*)
FROM pls_fy2014_pupld14a
GROUP BY STABR, STATADDR
ORDER BY STABR, STATADDR;
""")

Let's look at `sum()` and library visits.  Negative values in `VISITS` are special codes in the data rather than actual counts, so we exclude them from the sum. 

In [None]:
print_sql(conn, """
SELECT sum(VISITS) AS visits_2014
FROM pls_fy2014_pupld14a
WHERE VISITS >= 0;
""")

In [None]:
print_sql(conn, """
SELECT sum(VISITS) AS visits_2009
FROM pls_fy2009_pupld09a
WHERE VISITS >= 0;
""")

In [None]:
pretty_print_sql(conn, """
SELECT sum(pls14.VISITS) AS visits_2014,
         sum(pls09.VISITS) AS visits_2009
FROM pls_fy2014_pupld14a pls14 JOIN pls_fy2009_pupld09a pls09
ON pls14.FSCSKEY = pls09.FSCSKEY
WHERE pls14.VISITS >= 0 AND pls09.VISITS >= 0;
""")

Now we can look at grouping visit sums by state. 

In [None]:
print_sql(conn, """
SELECT pls14.STABR,
         sum(pls14.VISITS) AS visits_2014,
         sum(pls09.VISITS) AS visits_2009,
         round( (CAST(sum(pls14.VISITS) AS FLOAT) - sum(pls09.VISITS)) /
                      sum(pls09.VISITS) * 100, 2 ) AS pct_change
FROM pls_fy2014_pupld14a pls14 JOIN pls_fy2009_pupld09a pls09
ON pls14.FSCSKEY = pls09.FSCSKEY
WHERE pls14.VISITS >= 0 AND pls09.VISITS >= 0
GROUP BY pls14.STABR
ORDER BY pct_change DESC;
""")

We can also refine the results by filtering with the `HAVING` clause. 

In [None]:
print_sql(conn, """
SELECT pls14.STABR,
         sum(pls14.VISITS) AS visits_2014,
         sum(pls09.VISITS) AS visits_2009,
         round( (CAST(sum(pls14.VISITS) AS FLOAT) - sum(pls09.VISITS)) /
                      sum(pls09.VISITS) * 100, 2 ) AS pct_change
FROM pls_fy2014_pupld14a pls14 JOIN pls_fy2009_pupld09a pls09
ON pls14.FSCSKEY = pls09.FSCSKEY
WHERE pls14.VISITS >= 0 AND pls09.VISITS >= 0
GROUP BY pls14.STABR
HAVING sum(pls14.VISITS) > 50000000
ORDER BY pct_change DESC;
""")

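The difference between `WHERE` and `HAVING` is easiest to see on a small table: `WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after the aggregates are computed. The sketch below assumes a hypothetical table `library_visits` with made-up numbers, where `-1` stands in for a coded missing value.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE library_visits (stabr TEXT, visits INTEGER)")
demo.executemany(
    "INSERT INTO library_visits VALUES (?, ?)",
    [("NY", 600), ("NY", 500), ("NY", -1), ("VT", 300), ("VT", 100)],
)
rows = demo.execute("""
SELECT stabr, sum(visits) AS total
FROM library_visits
WHERE visits >= 0          -- row filter: drops the -1 code before grouping
GROUP BY stabr
HAVING sum(visits) > 1000  -- group filter: keeps states whose total exceeds 1000
ORDER BY stabr;
""").fetchall()
demo.close()
print(rows)
```
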
Note that the answer to a relational query is always a table, so we can use the answer from one query as input to another query. 

This means we can create arbitrarily complex queries. 
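
As a minimal sketch of one query feeding another, the cell below uses a subquery in two places: as a single value inside `WHERE`, and as a table inside `FROM`. It assumes an in-memory table with four of the `teachers` rows, reduced to three columns for brevity.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teachers (first_name TEXT, school TEXT, salary INTEGER)")
demo.executemany(
    "INSERT INTO teachers VALUES (?, ?, ?)",
    [
        ("Janet", "F.D. Roosevelt HS", 36200),
        ("Lee", "F.D. Roosevelt HS", 65000),
        ("Samuel", "Myers Middle School", 43500),
        ("Betty", "Myers Middle School", 43500),
    ],
)

# Scalar subquery in WHERE: teachers paid more than the overall average.
above_avg = demo.execute("""
SELECT first_name
FROM teachers
WHERE salary > (SELECT avg(salary) FROM teachers)
ORDER BY first_name;
""").fetchall()

# Subquery in FROM: the inner result is treated as a table named per_school.
top_school = demo.execute("""
SELECT school, avg_salary
FROM (SELECT school, avg(salary) AS avg_salary
      FROM teachers
      GROUP BY school) AS per_school
ORDER BY avg_salary DESC
LIMIT 1;
""").fetchall()
demo.close()
print(above_avg, top_school)
```

The average salary of these four rows is 47,050, so only Lee is above it, and F.D. Roosevelt HS has the higher per-school average.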

Links to examples of subqueries:   
https://www.sqlitetutorial.net/sqlite-subquery/  
https://www.sqltutorial.org/sql-subquery/


### Other Topics

There are many more topics associated with databases including modifying data, statistical functions, working with dates and times, advanced query methods, etc. 

This is meant to be a review of the basic methods. 