# 1. Introduction

In "Introduction to SQL," we wrote queries that filtered rows and columns in a database table. Each of the queries we ran returned multiple rows of values. In this lesson, we'll learn how to summarize those rows with a single value, such as their sum, average, minimum, or maximum.

We'll also learn how to calculate summary statistics on subsets of a database table by working with data on job outcomes, compiled by FiveThirtyEight.

Let's start with some questions about how the data breaks down:

* How many majors had a higher representation among women? How many had a higher representation among men? What proportion of majors had a higher representation among women?
* Which category of majors had the lowest unemployment rates? Which category of majors had the highest representation among women?
* Which majors had the largest spread (difference) between the 25th and 75th percentile starting salaries?

In [3]:
%load_ext sql
%sql sqlite://


The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [4]:
%sql sqlite:////home/mohammeds/datasets/jobs.db

# 2. A Simple Question

In [8]:
%%sql

SELECT MIN(Unemployment_rate)
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


MIN(Unemployment_rate)
0.0


# 3. Aggregate Functions

In [10]:
%%sql

SELECT SUM(Total)
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


SUM(Total)
6776015


# 4. Order of Execution

In [11]:
%%sql

SELECT COUNT(Major)
    FROM recent_grads
WHERE Men > Women;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


COUNT(Major)
76
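
The count above depends on the order in which SQL evaluates a query: the `WHERE` clause filters rows first, and the aggregate function then runs over whatever rows remain. Here is a minimal sketch of that behavior using Python's built-in `sqlite3` module. The `toy_grads` table and its three rows are invented for illustration and are not taken from the dataset.

```python
import sqlite3

# In-memory database with a tiny, made-up stand-in for recent_grads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT, Men INTEGER, Women INTEGER)")
conn.executemany(
    "INSERT INTO toy_grads VALUES (?, ?, ?)",
    [("A", 100, 50), ("B", 20, 80), ("C", 70, 30)],
)

# No WHERE clause: COUNT sees every row.
all_majors = conn.execute("SELECT COUNT(Major) FROM toy_grads").fetchone()[0]

# WHERE runs first, so COUNT only sees the rows where Men > Women.
more_men = conn.execute(
    "SELECT COUNT(Major) FROM toy_grads WHERE Men > Women"
).fetchone()[0]

print(all_majors, more_men)  # 3 2
conn.close()
```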


# 5. Missing Values

In [13]:
%%sql

SELECT COUNT(*)
    FROM recent_grads
    
SELECT COUNT(Major)
    FROM recent_grads
    

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
(sqlite3.OperationalError) near "SELECT": syntax error
[SQL: SELECT COUNT(*) FROM recent_grads
    
SELECT COUNT(Major)
    FROM recent_grads]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
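
The error above comes from sending two `SELECT` statements as one. With no separator between them, the parser reads the second `SELECT` as a continuation of the first and rejects it. The sketch below reproduces the failure and shows one possible fix, using Python's built-in `sqlite3` module. The `toy_grads` table and its rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical two-row table; the second row has a missing Major.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT)")
conn.executemany("INSERT INTO toy_grads VALUES (?)", [("A",), (None,)])

# Two SELECT statements run together with no separator: a syntax error.
try:
    conn.execute(
        "SELECT COUNT(*) FROM toy_grads SELECT COUNT(Major) FROM toy_grads"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# A single SELECT with both aggregates returns both counts in one row.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(Major) FROM toy_grads"
).fetchone()

print(error_message)
print(counts)  # (2, 1)
conn.close()
```

Putting both aggregates in one `SELECT` also places the two counts side by side, which makes a difference between them easy to spot.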


In [30]:
%%sql

SELECT COUNT(Rank), 
    COUNT(Major_code),
    COUNT(Major), 
    COUNT(Major_category), 
    COUNT(Total), 
    COUNT(Sample_size), 
    COUNT(Men), 
    COUNT(Women),
    COUNT(ShareWomen), 
    COUNT(Employed), 
    COUNT(Full_time_year_round), 
    COUNT(Unemployed), 
    COUNT(Unemployment_rate), 
    COUNT(Median), 
    COUNT(P25th), 
    COUNT(P75th), 
    COUNT(College_jobs), 
    COUNT(Non_college_jobs), 
    COUNT(Low_wage_jobs)

    FROM recent_grads;
    

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


COUNT(Rank),COUNT(Major_code),COUNT(Major),COUNT(Major_category),COUNT(Total),COUNT(Sample_size),COUNT(Men),COUNT(Women),COUNT(ShareWomen),COUNT(Employed),COUNT(Full_time_year_round),COUNT(Unemployed),COUNT(Unemployment_rate),COUNT(Median),COUNT(P25th),COUNT(P75th),COUNT(College_jobs),COUNT(Non_college_jobs),COUNT(Low_wage_jobs)
173,173,173,173,173,173,173,173,173,173,173,173,172,173,173,173,173,173,173


In [33]:
%%sql

SELECT COUNT(*), COUNT(Unemployment_rate)
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


COUNT(*),COUNT(Unemployment_rate)
173,172
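
The gap between the two counts comes from how `COUNT` treats missing values: `COUNT(*)` counts rows, while `COUNT(column)` skips rows where that column is `NULL`. A small sketch, assuming a hypothetical three-row table (the rows are made up, not taken from `recent_grads`), shows the difference and one way to locate the row with the missing value using `IS NULL`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO toy_grads VALUES (?, ?)",
    [("A", 0.05), ("B", None), ("C", 0.10)],  # None is stored as NULL
)

# COUNT(*) counts rows; COUNT(column) ignores NULLs in that column.
total_rows, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(Unemployment_rate) FROM toy_grads"
).fetchone()

# IS NULL finds the rows responsible for the gap.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT Major FROM toy_grads WHERE Unemployment_rate IS NULL"
    )
]

print(total_rows, non_null, missing)  # 3 2 ['B']
conn.close()
```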


# 6. Combining Multiple Aggregation Functions

In [35]:
%%sql

SELECT AVG(Total), MIN(Men), MAX(Women)
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


AVG(Total),MIN(Men),MAX(Women)
39167.71676300578,119,307087


# 7. Customizing the Results

In [40]:
%%sql

SELECT COUNT(*) AS "Number of Majors",
    MAX(Unemployment_rate) AS "Highest Unemployment Rate"
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


Number of Majors,Highest Unemployment Rate
173,0.177226407


# 8. Counting Unique Values

In [47]:
%%sql

SELECT COUNT(DISTINCT Major) AS unique_majors, 
    COUNT(DISTINCT Major_category) AS unique_major_categories,
    COUNT(DISTINCT Major_code) AS unique_major_codes
    FROM recent_grads;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


unique_majors,unique_major_categories,unique_major_codes
173,16,173
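
`COUNT(DISTINCT column)` counts each distinct value once, whereas plain `COUNT(column)` counts every non-NULL row. The sketch below contrasts the two on a hypothetical three-row table (the category assignments are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT, Major_category TEXT)")
conn.executemany(
    "INSERT INTO toy_grads VALUES (?, ?)",
    [("A", "Engineering"), ("B", "Engineering"), ("C", "Education")],
)

# Three rows have a category, but only two different categories appear.
n_rows, n_categories = conn.execute(
    "SELECT COUNT(Major_category), COUNT(DISTINCT Major_category) "
    "FROM toy_grads"
).fetchone()

print(n_rows, n_categories)  # 3 2
conn.close()
```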


# 9. Data Types

Let's consider the different kinds of values we got:

* In the Major column, we see text.
* In the Total, Men, and Women columns, we see integers.
* In the Unemployment_rate column, we see decimal numbers.

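Each of the above is a data type. A column is normally declared with a single type, and every column in this table holds one kind of value. SQLite itself is more permissive than most databases: outside of STRICT tables, a column's declared type acts as a preference (a "type affinity") rather than a hard constraint.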
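
The three kinds of values listed above can be inspected with SQLite's `typeof()` function, which reports the storage type of a value. This is a minimal sketch using Python's built-in `sqlite3` module. The single-row `toy_grads` table is a hypothetical stand-in with the same column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_grads (Major TEXT, Total INTEGER, Unemployment_rate REAL)"
)
conn.execute("INSERT INTO toy_grads VALUES ('A', 2573, 0.177)")

# typeof() returns the storage type of each value as a lowercase string.
types = conn.execute(
    "SELECT typeof(Major), typeof(Total), typeof(Unemployment_rate) "
    "FROM toy_grads"
).fetchone()

print(types)  # ('text', 'integer', 'real')
conn.close()
```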

You can read more about SQLite data types in the official SQLite documentation ("Datatypes In SQLite"). We'll explore them from the point of view of the database when we learn how to create tables.

For now, we'll focus on some of the things we can do with different data types.

# 10. String Functions and Operations

In [52]:
%%sql

SELECT 'Major: ' || LOWER(Major) AS Major,
        Total,
        Men,
        Women,
        Unemployment_rate,
        LENGTH(Major) AS Length_of_name
    FROM recent_grads
ORDER BY Unemployment_rate DESC
LIMIT 3;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


Major,Total,Men,Women,Unemployment_rate,Length_of_name
Major: nuclear engineering,2573,2200,373,0.177226407,19
Major: public administration,5629,2947,2682,0.1594905999999999,21
Major: computer networking and telecommunications,7613,5291,2322,0.151849807,42
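
The query above combines three string tools: the `||` operator concatenates strings, `LOWER()` converts text to lowercase, and `LENGTH()` returns the number of characters. The sketch below applies them to a hypothetical two-row table. It uses a single-quoted string literal, which is the standard SQL form; SQLite accepts double quotes here only as a fallback when no column matches the quoted name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT)")
conn.executemany(
    "INSERT INTO toy_grads VALUES (?)",
    [("NUCLEAR ENGINEERING",), ("LIBRARY SCIENCE",)],
)

# LENGTH() measures the original value; the label is built from LOWER(Major).
labels = conn.execute(
    "SELECT 'Major: ' || LOWER(Major), LENGTH(Major) "
    "FROM toy_grads ORDER BY Major"
).fetchall()

print(labels)
# [('Major: library science', 15), ('Major: nuclear engineering', 19)]
conn.close()
```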


# 11. Performing Arithmetic in SQL

In [59]:
%%sql

SELECT Major,
        Major_category,
        (P75th - P25th) AS quartile_spread
    FROM recent_grads
ORDER BY quartile_spread ASC
LIMIT 20;

   sqlite://
 * sqlite:////home/mohammeds/datasets/jobs.db
Done.


Major,Major_category,quartile_spread
MILITARY TECHNOLOGIES,Industrial Arts & Consumer Services,0
SCHOOL STUDENT COUNSELING,Education,2000
LIBRARY SCIENCE,Education,2000
COURT REPORTING,Law & Public Policy,4000
PHARMACOLOGY,Biology & Life Science,5000
EDUCATIONAL ADMINISTRATION AND SUPERVISION,Education,6000
COUNSELING PSYCHOLOGY,Psychology & Social Work,6800
SPECIAL NEEDS EDUCATION,Education,10000
MATHEMATICS TEACHER EDUCATION,Education,10000
SOCIAL WORK,Psychology & Social Work,10000
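
The query above sorts in ascending order, so it lists the smallest spreads first. To find the largest spread instead, the same computed column can be sorted in descending order. This sketch uses Python's built-in `sqlite3` module and a hypothetical three-row table (the salary figures are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_grads (Major TEXT, P25th INTEGER, P75th INTEGER)")
conn.executemany(
    "INSERT INTO toy_grads VALUES (?, ?, ?)",
    [("A", 30000, 40000), ("B", 25000, 60000), ("C", 50000, 52000)],
)

# The arithmetic runs once per row; DESC puts the widest spread first.
widest = conn.execute(
    "SELECT Major, (P75th - P25th) AS quartile_spread "
    "FROM toy_grads "
    "ORDER BY quartile_spread DESC "
    "LIMIT 1"
).fetchone()

print(widest)  # ('B', 35000)
conn.close()
```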


# 12. Next Steps

In this lesson, we did the following:

* Explored how to calculate summary statistics in SQL.
* Used aggregate functions (MIN, MAX, SUM, AVG, COUNT), string functions (LOWER, LENGTH), and arithmetic on columns.
* Learned about data types in SQL.

In the next lesson, we'll learn how to calculate statistics within specific subgroups using the GROUP BY clause.