# Introduction to SQL

By Maggie Orton and Alex Cao

<a href="http://cscar.research.umich.edu/" target="_blank">CSCAR</a> at The University of Michigan

Structured Query Language ("SQL") allows you to extract or change specific information in a relational database (i.e. a series of tables). 

MySQL, SQLite, PostgreSQL, SQL Server, etc. are all database management systems that rely on SQL. Each has its own dialect of SQL, but the general format of the queries is the same.

We'll practice SQL using the W3Schools online database <a href="https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all" target="_blank">here</a>.

# SQL Query Format



# SELECT, FROM, AS
The SELECT command specifies the desired columns 

The FROM command specifies the table from which those columns should be selected

The AS command temporarily renames a column or table with the specified name (an "alias")

An '\*' character selects all columns in a table

## Example
The 'Employees' table contains the columns:

>   `EmployeeID, LastName, FirstName, BirthDate, Photo, Notes`

To retrieve all columns from this table, you would use the command 

> `SELECT * FROM Employees`
    
To retrieve only the name-related columns from this table, you would use the command

> `SELECT LastName, FirstName FROM Employees`
    
To do the above while renaming the columns as "Last" and "First":

> `SELECT LastName AS Last, FirstName AS First FROM Employees`
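
The same idea can be tried outside the W3Schools editor. Below is a minimal sketch using Python's built-in `sqlite3` module; the two rows and the aliases `Surname` and `GivenName` are made up for illustration and are not the W3Schools data.

```python
import sqlite3

# Hypothetical stand-in for the Employees table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, LastName TEXT, FirstName TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Davolio", "Nancy"), (2, "Fuller", "Andrew")],
)

# SELECT picks the columns, FROM names the table, AS renames them in the result.
cursor = conn.execute(
    "SELECT LastName AS Surname, FirstName AS GivenName FROM Employees"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```

The aliases only change the column names in the result; the table itself still has `LastName` and `FirstName`.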

## Practice
1. Retrieve the Products table

2. Retrieve only the product names and prices from Products

3. Retrieve the product names and prices from Products, but call them PN and Dollars

### Answers
1. SELECT * FROM Products (77)
2. SELECT ProductName, Price FROM Products (77)
3. SELECT ProductName AS 'PN', Price AS 'Dollars' FROM Products (77)

# WHERE, AND, OR
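The WHERE command retrieves rows that satisfy a given condition

The AND and OR commands combine conditions: AND keeps rows that satisfy both conditions, OR keeps rows that satisfy at least one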

The BETWEEN command gives a range of values (inclusive)

The IN command gives multiple possible matching values

The LIKE command specifies a pattern to match; see the next section for SQL patterns

Potential conditions:
- =          Equals  
- != or <>   Does not equal
- \> or <    Is greater/less than
- \>= or <=  Is greater/less than or equal to
- BETWEEN ... AND ...
- IN (...,...,...)
- LIKE '...'

BETWEEN, IN, and LIKE can all be modified using the "NOT" keyword to retrieve rows that are not a match
    

## Example
Using the 'Employees' table again, suppose you want to retrieve all employee information for those born before 1960:
    
    SELECT * FROM Employees WHERE BirthDate < '1960-01-01'
    
Filtering out everyone whose last name is King:
    
    SELECT * FROM Employees WHERE LastName != 'King'

Applying both filters: 

    SELECT * FROM Employees WHERE LastName != 'King' AND BirthDate < '1960-01-01'

Retrieving Employee IDs for employees named Nancy or Andrew:

    SELECT EmployeeID FROM Employees WHERE FirstName IN ('Nancy', 'Andrew')
    
Retrieving Employee IDs for employees not named Nancy or Andrew:

    SELECT EmployeeID FROM Employees WHERE FirstName NOT IN ('Nancy', 'Andrew')    
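
To see how these conditions filter rows, here is a small sketch using Python's `sqlite3` module. The four rows (including the birth dates) and the helper `employee_ids` are hypothetical, chosen only so that each filter returns something different.

```python
import sqlite3

# Hypothetical stand-in for the Employees table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees "
    "(EmployeeID INTEGER, LastName TEXT, FirstName TEXT, BirthDate TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "Davolio", "Nancy", "1968-12-08"),
        (2, "Fuller", "Andrew", "1952-02-19"),
        (3, "King", "Robert", "1955-05-29"),
        (4, "Peacock", "Margaret", "1958-09-19"),
    ],
)


def employee_ids(query):
    """Run a query that selects EmployeeID and return the IDs, sorted."""
    return sorted(row[0] for row in conn.execute(query))


born_before_1960 = employee_ids(
    "SELECT EmployeeID FROM Employees WHERE BirthDate < '1960-01-01'"
)
both_filters = employee_ids(
    "SELECT EmployeeID FROM Employees "
    "WHERE LastName != 'King' AND BirthDate < '1960-01-01'"
)
either_filter = employee_ids(
    "SELECT EmployeeID FROM Employees "
    "WHERE LastName = 'King' OR FirstName = 'Nancy'"
)
in_list = employee_ids(
    "SELECT EmployeeID FROM Employees WHERE FirstName IN ('Nancy', 'Andrew')"
)
not_in_list = employee_ids(
    "SELECT EmployeeID FROM Employees WHERE FirstName NOT IN ('Nancy', 'Andrew')"
)
id_range = employee_ids(
    "SELECT EmployeeID FROM Employees WHERE EmployeeID BETWEEN 2 AND 3"
)

print(born_before_1960, both_filters, either_filter)
print(in_list, not_in_list, id_range)
```

Because BETWEEN is inclusive, `id_range` contains both endpoints (2 and 3).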

## Practice
Retrieve the Product ID and Quantity from the OrderDetails table for Order IDs between 10250 and 10350 (inclusive)

### Answers
SELECT ProductID, Quantity FROM OrderDetails WHERE OrderID BETWEEN 10250 AND 10350 (269)

# LIKE, wildcards
The LIKE command specifies a pattern to match, such as 'Nancy' or '1999'. 

Wildcards are characters that stand in for a range of possible values. 

Wildcards:
    
    %      A string of 0+ characters
    
    _      A single character
    
    [...]  A single character from the range or list in the brackets
    
    [^...] A single character not from the range or list in the brackets

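Without any wildcards, LIKE will match only values equal to the exact pattern.

The bracket wildcards are specific to some databases (e.g. SQL Server and MS Access); MySQL, PostgreSQL, and SQLite do not support them in LIKE.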

## Example
To match all customers with names starting with M, we use the query

    SELECT * FROM Customers WHERE CustomerName LIKE 'M%'

To match all customers with names starting with letters after D, we can use the query

    SELECT * FROM Customers WHERE CustomerName LIKE '[^a-d]%'
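
The `%` and `_` wildcards can be tried with Python's `sqlite3` module, as in the sketch below. The four customers and the helper `customer_names` are hypothetical. The bracket wildcards are left out because SQLite's LIKE does not support them.

```python
import sqlite3

# Hypothetical stand-in for the Customers table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("Magazzini Alimentari", "Giovanni Rovelli", "Italy"),
        ("Around the Horn", "Thomas Hardy", "UK"),
        ("Great Lakes Food Market", "Howard Snyder", "USA"),
        ("Morgenstern Gesundkost", "Alexander Feuer", "Germany"),
    ],
)


def customer_names(query):
    """Run a query that selects CustomerName and return the names, sorted."""
    return sorted(row[0] for row in conn.execute(query))


# '%' matches any run of characters, so 'M%' means "starts with M".
starts_with_m = customer_names(
    "SELECT CustomerName FROM Customers WHERE CustomerName LIKE 'M%'"
)
# Each '_' matches exactly one character, so '___' means "exactly three characters".
three_letter_country = customer_names(
    "SELECT CustomerName FROM Customers WHERE Country LIKE '___'"
)
# NOT LIKE keeps the rows that do not match the pattern.
not_starting_with_m = customer_names(
    "SELECT CustomerName FROM Customers WHERE CustomerName NOT LIKE 'M%'"
)

print(starts_with_m)
print(three_letter_country)
print(not_starting_with_m)
```

Note that SQLite's LIKE ignores case for ASCII letters by default, so `'m%'` would return the same rows here; other databases may behave differently.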

## Practice
1. Retrieve all information about customers from two- or three-letter named countries
2. Retrieve the customer names from people whose contacts have a last name starting with S

### Answers
1. SELECT * FROM Customers WHERE Country LIKE '__' OR Country LIKE '___' (the three-letter pattern alone returns 13)
2. SELECT CustomerName FROM Customers WHERE ContactName LIKE '% S%' (7)

# DISTINCT
The DISTINCT command retrieves only distinct combinations of the specified columns.

## Example
To retrieve a list of all CustomerIDs with orders from the Orders table:

    SELECT DISTINCT CustomerID FROM Orders

To retrieve all past combinations of employee IDs and shipper IDs:
    
    SELECT DISTINCT EmployeeID, ShipperID FROM Orders
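
A quick way to see what DISTINCT removes is to compare it with a plain SELECT. The sketch below uses Python's `sqlite3` module with four made-up orders; the values are illustrative only.

```python
import sqlite3

# Hypothetical stand-in for the Orders table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(OrderID INTEGER, CustomerID INTEGER, EmployeeID INTEGER, ShipperID INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(1, 90, 5, 3), (2, 81, 6, 1), (3, 90, 5, 3), (4, 90, 4, 3)],
)

# Without DISTINCT there is one result row per order, repeats included.
all_customer_ids = sorted(
    row[0] for row in conn.execute("SELECT CustomerID FROM Orders")
)
# With DISTINCT each CustomerID appears once.
distinct_customer_ids = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT CustomerID FROM Orders")
)
# With several columns, DISTINCT applies to the combination of values.
distinct_pairs = sorted(
    conn.execute("SELECT DISTINCT EmployeeID, ShipperID FROM Orders").fetchall()
)

print(all_customer_ids)
print(distinct_customer_ids)
print(distinct_pairs)
```

Orders 1 and 3 share the same employee and shipper, so that pair appears only once in `distinct_pairs`.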

## Practice

### Answers

# ORDER BY, TOP, LIMIT
ORDER BY sorts the results according to the specified columns. Default is ascending, but you can use ASC/DESC to specify the ordering.

TOP (SQL Server, MS Access) and LIMIT (MySQL, PostgreSQL, SQLite) retrieve only the first x results; TOP can also take a percent of results. Both are good for checking results before running a very large query

## Example
To retrieve the first three rows of customers:
    
    SELECT TOP 3 * FROM Customers

To retrieve the ten most expensive products:

    SELECT TOP 10 ProductName, Price FROM Products ORDER BY Price DESC

To retrieve the top ten percent of products by price:

    SELECT TOP 10 PERCENT ProductName, Price FROM Products ORDER BY Price DESC
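
In databases that use LIMIT instead of TOP, the limit goes at the end of the query. The sketch below shows this with Python's `sqlite3` module (SQLite has no TOP keyword); the four products are made up, and the PERCENT form is not shown because LIMIT takes a row count only.

```python
import sqlite3

# Hypothetical stand-in for the Products table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductName TEXT, SupplierID INTEGER, Price REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("Chais", 1, 18.0),
        ("Chang", 1, 19.0),
        ("Aniseed Syrup", 1, 10.0),
        ("Ikura", 4, 31.0),
    ],
)

# LIMIT version of "SELECT TOP 2 ... ORDER BY Price DESC".
two_most_expensive = conn.execute(
    "SELECT ProductName, Price FROM Products ORDER BY Price DESC LIMIT 2"
).fetchall()

# Sorting on two columns: SupplierID first, then ProductName within each supplier.
sorted_names = [
    row[0]
    for row in conn.execute(
        "SELECT ProductName FROM Products "
        "ORDER BY SupplierID ASC, ProductName ASC"
    )
]

print(two_most_expensive)
print(sorted_names)
```

ORDER BY is applied before LIMIT, so the limit keeps the first rows of the sorted result rather than an arbitrary subset.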

## Practice
1. Return the names of the oldest 50% of employees
2. Return the top 10 results for products sorted by supplier ID (ascending) then product name (ascending)

### Answers
1. SELECT TOP 50 PERCENT LastName, FirstName FROM Employees ORDER BY BirthDate
2. SELECT TOP 10 * FROM Products ORDER BY SupplierID ASC, ProductName ASC

# JOIN
JOIN connects two tables where the specified columns match.


INNER JOIN - only rows where the specified columns match in both tables (default)

LEFT JOIN - all rows from left table, plus matching rows from right table (NULL where there is no match)

RIGHT JOIN - all rows from right table, plus matching rows from left table (NULL where there is no match)

FULL OUTER JOIN - all rows from both tables, matched where possible

<img src="pcs.png">
Image excerpt from <a href="http://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf" target="_blank">Pandas Cheat Sheet</a>

## Example
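As a minimal sketch (not taken from the W3Schools data), the difference between INNER JOIN and LEFT JOIN can be seen with Python's `sqlite3` module. The tables below are hypothetical stand-ins with three customers and three orders. RIGHT JOIN and FULL OUTER JOIN are left out because older SQLite builds do not support them.

```python
import sqlite3

# Hypothetical stand-ins for Customers and Orders (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds"), (2, "Ana"), (3, "Antonio")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 3)],
)

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.CustomerName, o.OrderID "
    "FROM Customers AS c INNER JOIN Orders AS o "
    "ON c.CustomerID = o.CustomerID "
    "ORDER BY c.CustomerID, o.OrderID"
).fetchall()

# LEFT JOIN keeps every customer; the order columns are NULL (None in Python)
# for customers without an order.
left_rows = conn.execute(
    "SELECT c.CustomerName, o.OrderID "
    "FROM Customers AS c LEFT JOIN Orders AS o "
    "ON c.CustomerID = o.CustomerID "
    "ORDER BY c.CustomerID, o.OrderID"
).fetchall()

print(inner_rows)
print(left_rows)
```

The customer "Ana" has no orders, so she is missing from the INNER JOIN result and appears with `None` as the order ID in the LEFT JOIN result.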


## Practice

### Answers

# UNION
UNION combines the results of 2+ SELECT queries.

Requirements: same number, type, and order of columns

Default is only distinct results; use UNION ALL for all results

## Example
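As a minimal sketch, the difference between UNION and UNION ALL can be checked with Python's `sqlite3` module. The two single-column tables and their cities are hypothetical and only loosely modelled on the Customers and Suppliers tables.

```python
import sqlite3

# Hypothetical stand-ins for Customers and Suppliers (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT)")
conn.execute("CREATE TABLE Suppliers (City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?)", [("Berlin",), ("London",), ("London",)]
)
conn.executemany("INSERT INTO Suppliers VALUES (?)", [("London",), ("Tokyo",)])

# UNION returns each distinct city once across both queries.
union_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
    )
]

# UNION ALL keeps every row from both queries, duplicates included.
union_all_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers "
        "ORDER BY City"
    )
]

print(union_cities)
print(union_all_cities)
```

Both SELECT queries return a single text column, which satisfies the requirement that the number, type, and order of columns match.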


## Practice

### Answers

# primary keys

## Example

## Practice

### Answers