# Exercises

Put your grouping & aggregating skills to the test with these challenges:

1. We saw that library visits have declined recently in most places. But what is the pattern in library employment? All three library survey tables contain the column `totstaff`, which is the number of paid full-time equivalent employees. Calculate the percentage change in the sum of the column over time, examining all states as well as states with the most visitors. Watch out for negative values!
2. The library survey tables contain a column called `obereg`, a two-digit Bureau of Economic Analysis Code that classifies each library agency according to a region of the United States, such as New England, Rocky Mountains, & so on. Just as we calculated the percent change in visits grouped by state, do the same to group percent changes in visits by US region using `obereg`. Consult the survey documentation to find the meaning of each region code. For a bonus challenge, create a table with the `obereg` code as the primary key & the region name as text, & join it to the summary query to group by the region name rather than the code.
3. Thinking back on the types of joins you've learned, which join type will show you all the rows in all three tables, including those without a match? Write such a query & add an `IS NULL` filter in a `WHERE` clause to show agencies not included in one or more tables.

---

# 1.

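The survey uses negative numbers as codes for missing or not-applicable responses, so we keep only rows where `totstaff` & `visits` are zero or greater in all three years.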

```
SELECT pls18.stabr,
       sum(pls18.totstaff) AS staff_2018,
       sum(pls17.totstaff) AS staff_2017,
       sum(pls16.totstaff) AS staff_2016,
       round((sum(pls18.totstaff) -
           sum(pls17.totstaff)) / sum(pls17.totstaff) *
           100, 1) AS chg_2018_17,
       round((sum(pls17.totstaff) -
           sum(pls16.totstaff)) / sum(pls16.totstaff) *
           100, 1) AS chg_2017_16
FROM pls_fy2018_libraries AS pls18
JOIN pls_fy2017_libraries AS pls17
    ON pls18.fscskey = pls17.fscskey
JOIN pls_fy2016_libraries AS pls16
    ON pls18.fscskey = pls16.fscskey
WHERE pls18.totstaff >= 0 AND pls18.visits >= 0
    AND pls17.totstaff >= 0 AND pls17.visits >= 0
    AND pls16.totstaff >= 0 AND pls16.visits >= 0
GROUP BY pls18.stabr
ORDER BY chg_2018_17 DESC;
```

<img src = "Exercise Images/Total Paid Full-Time Employees in 2018, 2017, & 2016.png" width = "600" style = "margin:auto"/>
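To see why the negative codes matter, here is a minimal sketch on made-up rows (the `toy` table & its values are hypothetical, not survey data). It compares a sum over every row with a sum over only the non-negative rows.

```sql
-- Hypothetical toy rows, not survey data: -1 & -3 stand in for
-- the survey's placeholder codes.
WITH toy (fscskey, totstaff) AS (
    VALUES ('A', 10.5),
           ('B', -1),
           ('C', 4.0),
           ('D', -3)
)
SELECT sum(totstaff) AS sum_all_rows,                               -- 10.5
       sum(totstaff) FILTER (WHERE totstaff >= 0) AS sum_valid_rows -- 14.5
FROM toy;
```

The placeholder codes pull the first total down by 4. The solution uses a `WHERE` clause rather than `FILTER`, which also drops an agency from every year's total when any one year has a negative value, so the years stay comparable.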

The next query keeps only the states whose libraries totaled more than 50 million visits in 2018, ordered by the percent change in paid full-time equivalent employees from 2017 to 2018.

```
SELECT pls18.stabr,
       sum(pls18.totstaff) AS staff_2018,
       sum(pls17.totstaff) AS staff_2017,
       sum(pls16.totstaff) AS staff_2016,
       round((sum(pls18.totstaff) -
           sum(pls17.totstaff)) / sum(pls17.totstaff) *
           100, 1) AS chg_2018_17,
       round((sum(pls17.totstaff) -
           sum(pls16.totstaff)) / sum(pls16.totstaff) *
           100, 1) AS chg_2017_16
FROM pls_fy2018_libraries AS pls18
JOIN pls_fy2017_libraries AS pls17
    ON pls18.fscskey = pls17.fscskey
JOIN pls_fy2016_libraries AS pls16
    ON pls18.fscskey = pls16.fscskey
WHERE pls18.totstaff >= 0 AND pls18.visits >= 0
    AND pls17.totstaff >= 0 AND pls17.visits >= 0
    AND pls16.totstaff >= 0 AND pls16.visits >= 0
GROUP BY pls18.stabr
HAVING sum(pls18.visits) > 50000000
ORDER BY chg_2018_17 DESC;
```

<img src = "Exercise Images/Total Paid Full-Time Employees in 2018, 2017, & 2016 with over 50 Million Total Visits.png" width = "600" style = "margin:auto"/>
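The split between `WHERE` & `HAVING` in that query can be sketched on a few made-up rows (the `toy` table, state codes, & threshold of 50 are all hypothetical). `WHERE` removes individual rows before grouping, while `HAVING` removes whole groups after the sums are calculated.

```sql
-- Hypothetical toy rows, not survey data.
WITH toy (stabr, visits) AS (
    VALUES ('AA', 30),
           ('AA', 40),
           ('BB', 60),
           ('BB', -1),   -- dropped by WHERE before the sum
           ('CC', 20)
)
SELECT stabr,
       sum(visits) AS total_visits
FROM toy
WHERE visits >= 0            -- row-level filter
GROUP BY stabr
HAVING sum(visits) > 50      -- group-level filter
ORDER BY stabr;
```

This returns `AA` with 70 & `BB` with 60. `CC` is grouped but then removed by `HAVING` because its total is only 20.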

# 2.

From page A-12 of the [documentation for 2018](https://www.imls.gov/sites/default/files/2018_pls_data_file_documentation.pdf), we have the data definitions for `obereg`. We'll use them to create a lookup table to join to the `pls_fy2018_libraries` table. Since the `obereg` column in `pls_fy2018_libraries` is a text column, we'll make it a text column here as well.

```
CREATE TABLE obereg_definitions (
    obereg text PRIMARY KEY,
    obereg_description text
);

INSERT INTO obereg_definitions
VALUES ('01', 'New England'),
       ('02', 'Mid East'),
       ('03', 'Great Lakes'),
       ('04', 'Plains'),
       ('05', 'Southeast'),
       ('06', 'Southwest'),
       ('07', 'Rocky Mountains'),
       ('08', 'Far West'),
       ('09', 'Outlying Areas');

SELECT * FROM obereg_definitions;
```

<img src = "Exercise Images/Creating the obereg_definitions Table.png" width = "600" style = "margin:auto"/>

Here are the total visits & percent change of total visits year-over-year, grouped by `obereg`.

```
SELECT pls18.obereg, 
	   obe_def.obereg_description,
	   sum(pls18.visits) AS visits_2018,
	   sum(pls17.visits) AS visits_2017,
	   sum(pls16.visits) AS visits_2016,
	   round((sum(pls18.visits::numeric) -
	   	   sum(pls17.visits)) / sum(pls17.visits) * 100,
		   2) AS change_2018_17,
	   round((sum(pls17.visits::numeric) -
	   	   sum(pls16.visits)) / sum(pls16.visits) * 100,
		   2) AS change_2017_16
FROM pls_fy2018_libraries AS pls18
JOIN pls_fy2017_libraries AS pls17
	ON pls18.fscskey = pls17.fscskey
JOIN pls_fy2016_libraries AS pls16
	ON pls18.fscskey = pls16.fscskey
JOIN obereg_definitions AS obe_def
	ON pls18.obereg = obe_def.obereg
WHERE pls18.visits >= 0 
	AND pls17.visits >= 0
	AND pls16.visits >= 0
GROUP BY pls18.obereg, obe_def.obereg_description
ORDER BY obereg;
```

<img src = "Exercise Images/Percent Change in Total Visits Grouped by obereg.png" width = "600" style = "margin:auto"/>
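Percent change is (new - old) / old * 100. The `::numeric` cast in the query is presumably there because `visits` is stored as an integer, & integer division in PostgreSQL discards the fractional part. A minimal sketch with made-up numbers (visits falling from 200 to 150) shows the difference:

```sql
-- Toy numbers, not survey data: visits fall from 200 to 150.
SELECT (150 - 200) / 200 * 100 AS integer_math,                    -- 0
       round((150::numeric - 200) / 200 * 100, 2) AS numeric_math; -- -25.00
```

Without the cast, -50 / 200 is truncated to 0 before it is multiplied by 100, so the decline disappears. Casting one operand to `numeric` is enough to make the whole expression use decimal arithmetic.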

# 3.

We've been using `JOIN` or `INNER JOIN`, which returns only the rows that have a match in both tables. To return all rows, matched or not, we use `FULL JOIN` or `FULL OUTER JOIN`. Filtering with `IS NULL` then shows the agencies that are absent from at least one year of the three-year span, for example because they closed or first appeared in the survey during that time.

```
SELECT pls18.libname AS lib2018,
	   pls17.libname AS lib2017,
	   pls16.libname AS lib2016
FROM pls_fy2018_libraries AS pls18
FULL JOIN pls_fy2017_libraries AS pls17
	ON pls18.fscskey = pls17.fscskey
FULL JOIN pls_fy2016_libraries AS pls16
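	-- coalesce lets a 2016 row match an agency that is in 2017 but missing from 2018
	ON coalesce(pls18.fscskey, pls17.fscskey) = pls16.fscskey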
WHERE pls18.libname IS NULL
	OR pls17.libname IS NULL
	OR pls16.libname IS NULL;
```

<img src = "Exercise Images/Finding Non-Matching Values with FULL JOIN & IS NULL.png" width = "600" style = "margin:auto"/>
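Chained full joins are easier to follow on a tiny example. In this sketch the three `y..` tables & their keys are hypothetical: `K1` appears in every year, `K2` is missing from 2018, & `K3` appears only in 2018. The third table is joined on `coalesce(y18.fscskey, y17.fscskey)` so that a key missing from 2018 can still line up with its 2016 row.

```sql
-- Hypothetical keys, not survey data.
WITH y18 (fscskey) AS (VALUES ('K1'), ('K3')),
     y17 (fscskey) AS (VALUES ('K1'), ('K2')),
     y16 (fscskey) AS (VALUES ('K1'), ('K2'))
SELECT y18.fscskey AS key_2018,
       y17.fscskey AS key_2017,
       y16.fscskey AS key_2016
FROM y18
FULL JOIN y17
    ON y18.fscskey = y17.fscskey
FULL JOIN y16
    -- use whichever earlier key is present to find the 2016 match
    ON coalesce(y18.fscskey, y17.fscskey) = y16.fscskey
WHERE y18.fscskey IS NULL
    OR y17.fscskey IS NULL
    OR y16.fscskey IS NULL
ORDER BY coalesce(y18.fscskey, y17.fscskey, y16.fscskey);
```

The result has two rows: `K2` with a `NULL` in the 2018 column, & `K3` with `NULL` in the 2017 & 2016 columns. `K1` is filtered out because it matches in all three tables. If the last join compared only `y18.fscskey` to `y16.fscskey`, `K2` would be split across two rows, since its 2018 key is `NULL` & `NULL` never equals anything.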