# Graded Assignment 3: 9 to 5

Time to show off your SQL skills! For each question, copy the SQL query you used and make note of the answer.

## The Dataset

For this assignment, you will be using the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results, which can be found on [Kaggle](https://www.kaggle.com/datasets/bls/employment).

## Business Issue

You work for the Bureau of Labor Statistics, part of the United States government, and your supervisor has asked you to meet with Dolly Parton, whose nonprofit is looking to shed light on the state of employment in the United States. As part of the 9 to 5 project, their research is focused on production and nonsupervisory employees and how those employees fare compared to all employees in the United States. While the data the BLS collects from the CES is publicly available, Dolly Parton and her colleagues need your assistance navigating the thousands of rows in each table in LaborStatisticsDB.

## About the Dataset

This dataset comes directly from the Bureau of Labor Statistics’ Current Employment Survey (CES). Here are some things you need to know:

1. The industry table contains an NAICS code. This is different from the industry code. NAICS stands for North American Industry Classification System.
1. A series ID is composed of several codes: a survey prefix (CES stands for Current Employment Survey, the name of the survey that collected the data), followed by the industry code as specified by the BLS and the data type code as specified in the `datatype` table.
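
As a rough illustration of how a series ID breaks apart, the sketch below splits one ID taken from later in this notebook. It assumes a fixed layout of a 3-character prefix, an 8-character industry code, and a 2-character data type code; that layout is inferred from the IDs seen in this database, not from official BLS documentation. The column aliases are made up for the example.

```sql
-- Split one series ID into its parts.
-- Assumed layout: prefix (3 chars) + industry code (8 chars) + data type code (2 chars).
SELECT
    series_id,
    LEFT(series_id, 3)         AS survey_prefix,   -- 'CES'
    SUBSTRING(series_id, 4, 8) AS industry_code,   -- '55522110'
    RIGHT(series_id, 2)        AS data_type_code   -- '10'
FROM (VALUES ('CES5552211010')) AS example (series_id);
```

This is why filtering on `RIGHT(series_id, 2)` works as a shortcut for filtering on the data type code.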

## Set Up

To connect to the database, use the same connection info used during the SQL lessons. 

For the assignment, we will be using `LaborStatisticsDB`.

## Database Exploration

To start with, let’s get to know the database further.

1. Use this space to make note of each table in the database, the columns within each table, each column’s data type, and how the tables are connected. You can write this down or draw a diagram. Whatever method helps you get an understanding of what is going on with `LaborStatisticsDB`.
   
   To add a photo, diagram, or document to your file, drop the file into the folder that holds this notebook. Then use the link button to the right of the </> symbol in the gray part of this cell; the link is just the name of your file.

The database has 9 tables. The "Related tables" column lists the numbers of the tables each one connects to.

| # | Table | Related tables | Description | Columns | Rows |
|---|---|---|---|---|---|
| 1 | `annual_2016` | 3, 6, 8 (2) | | `id`, `series_id`, `year`, `period`, `value`, `footnote_codes`, `original_file` | 29,042 |
| 2 | `datatype` | 8 | Codes and descriptions for data types found on other tables | `data_type_code`, `data_type_text` | 45 |
| 3 | `footnote` | | Codes and descriptions for footnote types | `footnote_code`, `footnote_text` | 2 |
| 4 | `industry` | 8 | | `id`, `industry_code`, `naics_code`, `publishing_status`, `industry_name`, `display_level`, `selectable`, `sort_sequence` | 915 |
| 5 | `january_2017` | 3, 6, 8 (2) | | `id`, `series_id`, `year`, `period`, `value`, `footnote_codes`, `original_file` | 58,995 |
| 6 | `period` | | Codes and descriptions for the 12 months plus the annual average | `period_code`, `month_abbr`, `month` | 13 |
| 7 | `seasonal` | 4 | Y/N flag for seasonally adjusted | `industry_code`, `seasonal_text` | 2 |
| 8 | `series` | 9, 2 | Series by code and description | `series_id`, `supersector_code`, `industry_code`, `data_type_code`, `seasonal`, `series_title` | 26,709 |
| 9 | `supersector` | 8 | Codes and descriptions for supersector types found on other tables | `supersector_code`, `supersector_name` | 22 |
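
The notes above can be cross-checked against the database's own metadata. A minimal sketch, assuming the database runs on SQL Server and the connection has permission to read the `INFORMATION_SCHEMA` views:

```sql
-- List every column in the database with its table and data type.
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION;

-- Row count for a single table; repeat for each table of interest.
SELECT COUNT(*) AS row_count
FROM dbo.annual_2016;
```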

2. What is the datatype for women employees?

In [None]:
SELECT * FROM dbo.datatype
WHERE data_type_text LIKE '%women%';
--data_type_code = 10

3. What is the series id for women employees in the commercial banking industry in the financial activities supersector?

In [None]:
SELECT * FROM dbo.series
WHERE data_type_code = '10' AND industry_code = '55522110' AND supersector_code = '55'
--CES5552211010	55	55522110	10	S	Women employees
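
The query above relies on codes that were looked up beforehand. As an alternative sketch, the same series could be found by joining on the descriptive names. The `LIKE` patterns are assumptions about how the names are spelled in the `industry` and `supersector` tables, and may need adjusting.

```sql
-- Find the series by name instead of hard-coded codes.
SELECT
    s.series_id,
    s.series_title,
    i.industry_name,
    sp.supersector_name
FROM dbo.series AS s
JOIN dbo.datatype AS d
    ON s.data_type_code = d.data_type_code
JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
JOIN dbo.supersector AS sp
    ON s.supersector_code = sp.supersector_code
WHERE d.data_type_text LIKE '%women%'
  AND i.industry_name LIKE '%commercial banking%'
  AND sp.supersector_name LIKE '%financial activities%';
```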

## Aggregate Your Friends and Code some SQL

Put together the following:

1. How many employees were reported in 2016 in all industries? Round to the nearest whole number.

In [None]:
SELECT ROUND(SUM(value), 0) AS total_employees_2016
FROM annual_2016
WHERE RIGHT(series_id, 2) IN ('01', '25', '26');
--2,340,612

2. How many women employees were reported in 2016 in all industries? Round to the nearest whole number. 

In [None]:
SELECT ROUND(SUM(value), 0) AS total_women_employees_2016
FROM annual_2016
WHERE RIGHT(series_id, 2) = '10';
--1,125,490

3. How many production/nonsupervisory employees were reported in 2016? Round to the nearest whole number. 

In [None]:
SELECT ROUND(SUM(value), 0) AS total_prod_nonsup_emp_2016
FROM annual_2016
WHERE RIGHT(series_id, 2) = '06';
--1,263,650
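
The three totals above can also be produced in one pass with conditional aggregation, which makes it easier to compare the groups side by side. This is only a sketch: the filters are copied from the three queries above, and the column aliases are made up for the example.

```sql
-- One row with all three 2016 totals.
-- CASE without ELSE yields NULL for non-matching rows, which SUM ignores.
SELECT
    ROUND(SUM(CASE WHEN RIGHT(series_id, 2) IN ('01', '25', '26') THEN value END), 0) AS all_employees_2016,
    ROUND(SUM(CASE WHEN RIGHT(series_id, 2) = '10' THEN value END), 0) AS women_employees_2016,
    ROUND(SUM(CASE WHEN RIGHT(series_id, 2) = '06' THEN value END), 0) AS prod_nonsup_employees_2016
FROM annual_2016;
```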

4. In January 2017, what is the average weekly hours worked by production and nonsupervisory employees across all industries?

In [None]:
SELECT AVG(value) as avg_weeklyhrs_prod_nonsup_emp_2017
FROM january_2017
WHERE RIGHT (series_id, 2) = '07';
--36.06

5. What is the total weekly payroll for production and nonsupervisory employees across all industries in January 2017? Round to the nearest penny.

In [None]:
SELECT ROUND(SUM(value), 2) AS total_wkly_payroll_prod_nonsup_emp_2017
FROM january_2017
WHERE RIGHT(series_id, 2) = '82';
--1,838,753,220.00

6. In January 2017, for which industry was the average weekly hours worked by production and nonsupervisory employees the highest? Which industry was the lowest?

In [None]:
SELECT
    (SELECT TOP 1 series_id
     FROM january_2017
     WHERE RIGHT(series_id, 2) = '07'
     ORDER BY value DESC) AS max_series_id,
     
    MAX(value) AS max_hours_prod_nonsup_emp_2017,
    
    (SELECT TOP 1 series_id
     FROM january_2017
     WHERE RIGHT(series_id, 2) = '07'
     ORDER BY value ASC) AS min_series_id,
     
    MIN(value) AS min_hours_prod_nonsup_emp_2017
FROM january_2017
WHERE RIGHT(series_id, 2) = '07';

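-- Look up the industry codes for the two series found above
SELECT industry_code, series_id
FROM series
WHERE series_id IN ('CES3133635007', 'CEU7071394007');

-- Look up the industry names for those industry codes
SELECT industry_name, industry_code
FROM industry
WHERE industry_code IN ('31336350', '70713940');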
--MAX - Motor vehicle power train components	31336350 CES3133635007 49.8
--MIN - Fitness and recreational sports centers	70713940 CEU7071394007 16.7
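
The three lookups above could be collapsed into a single joined query per extreme. A minimal sketch, assuming each `series_id` in `january_2017` has a matching row in `series` and `industry`:

```sql
-- Highest average weekly hours, with the industry name attached.
-- Change DESC to ASC to get the lowest instead.
SELECT TOP 1
    i.industry_name,
    s.industry_code,
    j.series_id,
    j.value AS avg_weekly_hours
FROM january_2017 AS j
JOIN series AS s
    ON j.series_id = s.series_id
JOIN industry AS i
    ON s.industry_code = i.industry_code
WHERE RIGHT(j.series_id, 2) = '07'
ORDER BY j.value DESC;
```

`TOP 1` returns a single row even if several industries tie; `TOP 1 WITH TIES` would return all of them.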

7. In January 2017, for which industry was the total weekly payroll for production and nonsupervisory employees the highest? Which industry was the lowest?

In [None]:
SELECT TOP 1
    s.industry_code,
    i.industry_name,
    SUM(j.value) AS total_weekly_payroll
FROM january_2017 j
JOIN series s ON j.series_id = s.series_id
JOIN industry i ON s.industry_code = i.industry_code
WHERE RIGHT(j.series_id, 2) = '82'
  AND j.period = 'M01'
GROUP BY s.industry_code, i.industry_name
ORDER BY total_weekly_payroll DESC;
--Total Private (Industry) was highest total production/nonsup employee weekly payroll
--$295,944,946
SELECT TOP 1
    s.industry_code,
    i.industry_name,
    SUM(j.value) AS total_weekly_payroll
FROM january_2017 j
JOIN series s ON j.series_id = s.series_id
JOIN industry i ON s.industry_code = i.industry_code
WHERE RIGHT(j.series_id, 2) = '82'
  AND j.period = 'M01'
GROUP BY s.industry_code, i.industry_name
ORDER BY total_weekly_payroll ASC;
-- Coin-operated laundries and drycleaners (Industry) was lowest total production/nonsup employee weekly payroll
-- $40,448


## Join in on the Fun

Time to start joining! You can choose the type of join you use; just make sure to make a note!

1. Join `annual_2016` with `series` on `series_id`. We only want the data in the `annual_2016` table to be included in the result.

In [None]:
SELECT TOP 50 *
FROM annual_2016 a
Inner Join series s
    ON a.series_id = s.series_id
ORDER BY a.id;
--inner join
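
An inner join drops any `annual_2016` row whose `series_id` has no match in `series`. If the prompt is read as "keep every row from `annual_2016`", a left join is one way to express that. The sketch below shows that reading, plus a check for whether the two joins would differ at all on this data.

```sql
-- Keep every annual_2016 row, matched or not.
SELECT TOP 50 a.*, s.series_title
FROM annual_2016 AS a
LEFT JOIN series AS s
    ON a.series_id = s.series_id
ORDER BY a.id;
--left join

-- Count annual_2016 rows with no matching series row.
-- A result of 0 means the inner and left joins return the same rows.
SELECT COUNT(*) AS unmatched_rows
FROM annual_2016 AS a
LEFT JOIN series AS s
    ON a.series_id = s.series_id
WHERE s.series_id IS NULL;
```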

2. Join `series` and `datatype` on `data_type_code`.

In [None]:

SELECT TOP 50 *
FROM series s
INNER JOIN datatype d
    ON s.data_type_code = d.data_type_code
ORDER BY s.series_id;
--inner join

3. Join `series` and `industry` on `industry_code`.

In [None]:
SELECT TOP 50 *
FROM series s
INNER JOIN industry i
    ON s.industry_code = i.industry_code
ORDER BY s.series_id;
--inner join


## Subqueries, Unions, Derived Tables, Oh My!

1. Write a query that returns the `series_id`, `industry_code`, `industry_name`, and `value` from the `january_2017` table but only if that value is greater than the average value for `annual_2016` of `data_type_code` 82.

In [None]:
SELECT j.series_id, s.industry_code, i.industry_name, j.value
FROM january_2017 j
JOIN series s ON j.series_id = s.series_id
JOIN industry i ON s.industry_code = i.industry_code
WHERE RIGHT(j.series_id,2) = '82'
  AND j.value > (
      SELECT AVG(value)
      FROM annual_2016
      WHERE RIGHT(series_id,2) = '82'
  );


2. Create a `Union` table comparing average weekly earnings of production and nonsupervisory employees between `annual_2016` and `january_2017` using the data type 30.  Round to the nearest penny.  You should have a column for the average earnings and a column for the year, and the period.

In [None]:
SELECT ROUND(AVG(a.value), 2) AS avg_earnings, '2016' AS year, 'Annual' AS period
FROM annual_2016 a
JOIN series s ON a.series_id = s.series_id
WHERE RIGHT(a.series_id, 2) = '30'

UNION ALL

SELECT ROUND(AVG(j.value), 2) AS avg_earnings, '2017' AS year, 'January' AS period
FROM january_2017 j
JOIN series s ON j.series_id = s.series_id
WHERE RIGHT(j.series_id, 2) = '30';
-- $797.20	2016	Annual
-- $808.53	2017	January

## Summarize Your Results

With what you know now about the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results and working with the Labor Statistics Database, answer the following questions. Note that while this is subjective, you should include relevant data to back up your opinion.

1. During which time period did production and nonsupervisory employees fare better?

Based on the results for average weekly earnings, average weekly hours, and total weekly payroll, workers fared better in January 2017. Average weekly earnings rose from $797.20 (2016 annual) to $808.53, hours stayed close to full time (36.06 average weekly hours), and weekly payroll points to growth in both employment and pay.

2. In which industries did production and nonsupervisory employees fare better?

The highest total weekly payroll ($295,944,946) went to the Total Private sector, which had a larger number of employees and higher wages. Motor Vehicle Power Train Components had the highest average hours worked per week (49.8), meaning production stayed high and led to strong wages.

3. Now that you have explored the datasets, is there any data or information that you wish you had in this analysis?

It would be helpful to have more information on demographics and more descriptive metrics in general, which would let us explore how companies of all sizes perform. A full monthly time series would also help, since we could then track performance across industries month by month. There is also no distinction between rural, metro, or other regional data, so we cannot compare by location or setting. Knowing why some industries perform better would be a great next step if more data were available.