## Selecting Data With SQL

### SELECT Statement


```python
# Importing the sqlite3 library
import sqlite3
# pandas displays query results as tables
import pandas as pd
# Connecting to the database (replace the path with the location of data.sqlite on your machine)
conn = sqlite3.connect(r'C:\Users\nrmmw\Documents\Flatiron\Repos\Phase_2\dsc-selecting-data-v2-4\data.sqlite')
```

This is the schema for the `data.sqlite` database:

<img src="https://curriculum-content.s3.amazonaws.com/data-science/images/Database-Schema.png">

```python
# Get every column and row from the employees table
pd.read_sql("""
SELECT * FROM employees;
""", conn)
```

| | employeeNumber | lastName | firstName | extension | email | officeCode | reportsTo | jobTitle |
|---|---|---|---|---|---|---|---|---|
| 0 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | | President |
| 1 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002.0 | VP Sales |
| 2 | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1 | 1002.0 | VP Marketing |
| 3 | 1088 | Patterson | William | x4871 | wpatterson@classicmodelcars.com | 6 | 1056.0 | Sales Manager (APAC) |
| 4 | 1102 | Bondur | Gerard | x5408 | gbondur@classicmodelcars.com | 4 | 1056.0 | Sale Manager (EMEA) |
| 5 | 1143 | Bow | Anthony | x5428 | abow@classicmodelcars.com | 1 | 1056.0 | Sales Manager (NA) |
| 6 | 1165 | Jennings | Leslie | x3291 | ljennings@classicmodelcars.com | 1 | 1143.0 | Sales Rep |
| 7 | 1166 | Thompson | Leslie | x4065 | lthompson@classicmodelcars.com | 1 | 1143.0 | Sales Rep |
| 8 | 1188 | Firrelli | Julie | x2173 | jfirrelli@classicmodelcars.com | 2 | 1143.0 | Sales Rep |
| 9 | 1216 | Patterson | Steve | x4334 | spatterson@classicmodelcars.com | 2 | 1143.0 | Sales Rep |


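Triple quotes (`"""`) are preferred because most SQL statements span several lines. An ordinary single-line string (`"..."`) works just as well for short queries.

Indentation and line breaks inside the query string do not affect the result; SQL ignores extra whitespace between keywords.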
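One way to check both points is to run the same query written two ways. The sketch below uses plain `sqlite3` with an in-memory database and a hypothetical two-row `employees` table (rows copied from the output above), so it does not need `data.sqlite`. It uses the name `demo_conn` so it will not clobber the notebook's `conn`:

```python
import sqlite3

# Hypothetical in-memory copy of two employees rows (not the real data.sqlite)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT, lastName TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Diane", "Murphy"), ("Mary", "Patterson")],
)

# The query in an ordinary one-line string
one_line = "SELECT firstName, lastName FROM employees;"

# The same query in a triple-quoted string, split and indented freely
multi_line = """
SELECT firstName,
       lastName
    FROM employees;
"""

rows_one = demo_conn.execute(one_line).fetchall()
rows_multi = demo_conn.execute(multi_line).fetchall()
print(rows_one == rows_multi)  # True: whitespace does not change the query
demo_conn.close()
```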

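```python
# Retrieving specific columns
# Columns are returned in the order they are listed after SELECT.
# SQLite column names are case-insensitive, so firstname matches firstName.
pd.read_sql("""
SELECT firstname, lastname
    FROM employees;
""", conn)
```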

| | firstName | lastName |
|---|---|---|
| 0 | Diane | Murphy |
| 1 | Mary | Patterson |
| 2 | Jeff | Firrelli |
| 3 | William | Patterson |
| 4 | Gerard | Bondur |
| 5 | Anthony | Bow |
| 6 | Leslie | Jennings |
| 7 | Leslie | Thompson |
| 8 | Julie | Firrelli |
| 9 | Steve | Patterson |


```python
# The AS keyword sets an alias (a new display name) for a column
pd.read_sql("""
SELECT firstname AS name
    FROM employees;
""", conn).head()
```

| | name |
|---|---|
| 0 | Diane |
| 1 | Mary |
| 2 | Jeff |
| 3 | William |
| 4 | Gerard |
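The alias is what client code sees as the column name. pandas builds its column labels from the cursor's column descriptions, and the plain `sqlite3` sketch below (a hypothetical one-row in-memory table) shows where that name comes from via `cursor.description`:

```python
import sqlite3

# Hypothetical one-row table standing in for employees
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT)")
demo_conn.execute("INSERT INTO employees VALUES ('Diane')")

cur = demo_conn.execute("SELECT firstname AS name FROM employees")
# cursor.description holds one tuple per result column; item 0 is the column's name
column_names = [col[0] for col in cur.description]
print(column_names)  # ['name']
rows = cur.fetchall()
demo_conn.close()
```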


## CASE Statements

A `CASE` expression works like an `if`/`elif`/`else` chain in Python:

* Python: `if`, `elif`, `else`
* SQL: `CASE`, `WHEN`, `THEN`, `ELSE`, `END`

The pieces are:

* `CASE` - Starts the conditional expression
* `END` - Ends the conditional expression
* `WHEN` - A condition to test, similar to `if`/`elif` in Python
* `THEN` - The value returned when the preceding `WHEN` condition is true. Conditions are checked top to bottom, and the first true one wins; the remaining `WHEN`s are skipped
* `ELSE` - The value returned when no `WHEN` condition is true. Without an `ELSE`, the result is `NULL`
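To see the correspondence side by side, the sketch below writes the same logic as a Python `if`/`elif` function and as a `CASE` expression, then checks that they agree. The job titles come from the employees output above; the `'Executive'` label is a hypothetical category added for illustration, and the in-memory table stands in for `data.sqlite`:

```python
import sqlite3

# Job titles copied from the employees output above
titles = ["President", "VP Sales", "Sales Rep"]

def role_python(job_title):
    """Python if/elif chain mirroring the CASE expression below."""
    if job_title == "Sales Rep":
        return "Sales Rep"
    elif job_title == "President":
        return "Executive"  # hypothetical label for illustration
    # No else branch: return None, just as CASE without ELSE returns NULL
    return None

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (jobTitle TEXT)")
demo_conn.executemany("INSERT INTO employees VALUES (?)", [(t,) for t in titles])

sql_roles = [row[0] for row in demo_conn.execute("""
    SELECT CASE
               WHEN jobTitle = 'Sales Rep' THEN 'Sales Rep'
               WHEN jobTitle = 'President' THEN 'Executive'
           END
    FROM employees
    ORDER BY rowid
""")]
python_roles = [role_python(t) for t in titles]

print(sql_roles)                  # ['Executive', None, 'Sales Rep']
print(sql_roles == python_roles)  # True
demo_conn.close()
```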

Let us categorize each employee by `jobTitle` as either a Sales Rep or not (binning the column values):

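```python
# In SQL, single quotes mark strings and double quotes mark identifiers;
# SQLite only treats "..." as a string as a fallback when no column matches.
pd.read_sql("""
SELECT firstname, lastname, jobTitle,
        CASE
        WHEN jobTitle = 'Sales Rep' THEN 'Sales Rep'
        ELSE 'Not Sales Rep'
        END AS Role
    FROM employees;
""", conn).head(10)
```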

| | firstName | lastName | jobTitle | Role |
|---|---|---|---|---|
| 0 | Diane | Murphy | President | Not Sales Rep |
| 1 | Mary | Patterson | VP Sales | Not Sales Rep |
| 2 | Jeff | Firrelli | VP Marketing | Not Sales Rep |
| 3 | William | Patterson | Sales Manager (APAC) | Not Sales Rep |
| 4 | Gerard | Bondur | Sale Manager (EMEA) | Not Sales Rep |
| 5 | Anthony | Bow | Sales Manager (NA) | Not Sales Rep |
| 6 | Leslie | Jennings | Sales Rep | Sales Rep |
| 7 | Leslie | Thompson | Sales Rep | Sales Rep |
| 8 | Julie | Firrelli | Sales Rep | Sales Rep |
| 9 | Steve | Patterson | Sales Rep | Sales Rep |


`CASE` can also make a column more readable, for example by translating office codes into the locations they stand for:

```python
pd.read_sql("""
SELECT firstname, lastname, officeCode,
        CASE
        WHEN officeCode = '1' THEN 'San Francisco, CA'
        WHEN officeCode = '2' THEN 'Boston, MA'
        WHEN officeCode = '3' THEN 'New York, NY'
        WHEN officeCode = '4' THEN 'Paris, France'
        WHEN officeCode = '5' THEN 'Tokyo, Japan'
        END AS Office
    FROM employees;
""", conn).head()
# There is no WHEN for officeCode '6' and no ELSE,
# so that row gets a NULL (empty) Office value
```

| | firstName | lastName | officeCode | Office |
|---|---|---|---|---|
| 0 | Diane | Murphy | 1 | San Francisco, CA |
| 1 | Mary | Patterson | 1 | San Francisco, CA |
| 2 | Jeff | Firrelli | 1 | San Francisco, CA |
| 3 | William | Patterson | 6 | |
| 4 | Gerard | Bondur | 4 | Paris, France |


When every `WHEN` compares the same column, the simple `CASE` form names that column once, right after `CASE`, to avoid repetition:

```python
# Simple CASE form: officeCode is written once, after CASE
pd.read_sql("""
SELECT firstname, lastname, officeCode,
        CASE officeCode
        WHEN '1' THEN 'San Francisco, CA'
        WHEN '2' THEN 'Boston, MA'
        WHEN '3' THEN 'New York, NY'
        WHEN '4' THEN 'Paris, France'
        WHEN '5' THEN 'Tokyo, Japan'
        ELSE 'HQ'
        END AS office
    FROM employees;
""", conn).head()
```

| | firstName | lastName | officeCode | office |
|---|---|---|---|---|
| 0 | Diane | Murphy | 1 | San Francisco, CA |
| 1 | Mary | Patterson | 1 | San Francisco, CA |
| 2 | Jeff | Firrelli | 1 | San Francisco, CA |
| 3 | William | Patterson | 6 | HQ |
| 4 | Gerard | Bondur | 4 | Paris, France |


## Built-in SQL Functions for String Manipulation

`length` - returns the number of characters

```python
# length(column) counts the characters in each value
pd.read_sql("""
SELECT firstname AS Fname, length(firstname) AS Fname_length,
       lastname AS Lname, length(lastname) AS Lname_length
    FROM employees;
""", conn).head()
```

| | Fname | Fname_length | Lname | Lname_length |
|---|---|---|---|---|
| 0 | Diane | 5 | Murphy | 6 |
| 1 | Mary | 4 | Patterson | 9 |
| 2 | Jeff | 4 | Firrelli | 8 |
| 3 | William | 7 | Patterson | 9 |
| 4 | Gerard | 6 | Bondur | 6 |
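For text values, SQLite's `length` counts characters rather than bytes, and like most SQL functions it returns `NULL` when its input is `NULL`. A small sketch with literal values (no table needed), comparing against Python's `len`:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
# "Zo\u00eb" is "Zoë": 3 characters, but 4 bytes when encoded as UTF-8
row = demo_conn.execute(
    "SELECT length('Diane'), length(?), length(NULL)", ("Zo\u00eb",)
).fetchone()
print(row)  # (5, 3, None)
print(len("Zo\u00eb"), len("Zo\u00eb".encode("utf-8")))  # 3 4
demo_conn.close()
```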


`upper` converts the characters to uppercase

`lower` converts the characters to lowercase

```python
# upper() and lower() change the letter case of each value
pd.read_sql("""
SELECT firstname, upper(firstname) AS name_caps, lastname,
       lower(lastname) AS name_small
    FROM employees;
""", conn).head()
```

| | firstName | name_caps | lastName | name_small |
|---|---|---|---|---|
| 0 | Diane | DIANE | Murphy | murphy |
| 1 | Mary | MARY | Patterson | patterson |
| 2 | Jeff | JEFF | Firrelli | firrelli |
| 3 | William | WILLIAM | Patterson | patterson |
| 4 | Gerard | GERARD | Bondur | bondur |


`substr` extracts part of a string (a substring).

For example, picking out each employee's first initial:

Syntax: `substr(firstname, 1, 1) AS name_initial`

**Breakdown**
* `firstname` - The column to take the substring from
* `1` - Start at position 1 (SQL uses 1-based indexing, so the first character is position 1)
* `1` - Return 1 character

```python
pd.read_sql("""
SELECT firstname, substr(firstname, 1, 1) AS name_initial
    FROM employees;
""", conn).head()
```

| | firstName | name_initial |
|---|---|---|
| 0 | Diane | D |
| 1 | Mary | M |
| 2 | Jeff | J |
| 3 | William | W |
| 4 | Gerard | G |
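If you are used to Python slicing, `substr(text, start, count)` with a positive `start` behaves like `text[start - 1:start - 1 + count]`. The sketch below checks this with a hypothetical helper, `substr_py`, against SQLite on a few names from the table (literal values, no table needed):

```python
import sqlite3

def substr_py(text, start, count):
    """Hypothetical Python equivalent of SQLite substr(text, start, count) for start >= 1."""
    # SQL positions start at 1, Python indexes start at 0
    return text[start - 1:start - 1 + count]

demo_conn = sqlite3.connect(":memory:")
cases = [("Diane", 1, 1), ("William", 2, 3), ("Jeff", 3, 10)]

results = []
for text, start, count in cases:
    (sql_result,) = demo_conn.execute(
        "SELECT substr(?, ?, ?)", (text, start, count)
    ).fetchone()
    results.append((sql_result, substr_py(text, start, count)))

print(results)  # [('D', 'D'), ('ill', 'ill'), ('ff', 'ff')]
demo_conn.close()
```

The last case shows that asking for more characters than remain simply returns the rest of the string, in both SQL and Python.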


SQL concatenates strings with the `||` operator:

```python
# Join first and last names, and build initials like "D.M"
pd.read_sql("""
SELECT firstname, lastname,
       firstname || ' ' || lastname AS names,
       substr(firstname, 1, 1) || '.' || substr(lastname, 1, 1) AS name_initials
    FROM employees;
""", conn)
```

| | firstName | lastName | names | name_initials |
|---|---|---|---|---|
| 0 | Diane | Murphy | Diane Murphy | D.M |
| 1 | Mary | Patterson | Mary Patterson | M.P |
| 2 | Jeff | Firrelli | Jeff Firrelli | J.F |
| 3 | William | Patterson | William Patterson | W.P |
| 4 | Gerard | Bondur | Gerard Bondur | G.B |
| 5 | Anthony | Bow | Anthony Bow | A.B |
| 6 | Leslie | Jennings | Leslie Jennings | L.J |
| 7 | Leslie | Thompson | Leslie Thompson | L.T |
| 8 | Julie | Firrelli | Julie Firrelli | J.F |
| 9 | Steve | Patterson | Steve Patterson | S.P |
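One thing to watch with `||`: if either side is `NULL`, the whole result is `NULL`. None of the employees above has a missing name, so the sketch below uses an in-memory table with a hypothetical second row whose last name is `NULL`:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT, lastName TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Diane", "Murphy"), ("Mary", None)],  # second row is hypothetical
)
names = demo_conn.execute(
    "SELECT firstName || ' ' || lastName FROM employees ORDER BY rowid"
).fetchall()
print(names)  # [('Diane Murphy',), (None,)]
demo_conn.close()
```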


## Built-in SQL Functions for Math Ops

Let's use the `orderDetails` table:

```python
# Preview the orderDetails table
pd.read_sql("""
SELECT * FROM orderDetails;
""", conn)
```

| | orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber |
|---|---|---|---|---|---|
| 0 | 10100 | S18_1749 | 30 | 136.00 | 3 |
| 1 | 10100 | S18_2248 | 50 | 55.09 | 2 |
| 2 | 10100 | S18_4409 | 22 | 75.46 | 4 |
| 3 | 10100 | S24_3969 | 49 | 35.29 | 1 |
| 4 | 10101 | S18_2325 | 25 | 108.06 | 4 |
| ... | ... | ... | ... | ... | ... |
| 2991 | 10425 | S24_2300 | 49 | 127.79 | 9 |
| 2992 | 10425 | S24_2840 | 31 | 31.82 | 5 |
| 2993 | 10425 | S32_1268 | 41 | 83.79 | 11 |
| 2994 | 10425 | S32_2509 | 11 | 50.32 | 6 |


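`round` rounds a number. Without a second argument it rounds to the nearest whole number, although the result is still a floating-point value (`55.0`, not `55`):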

```python
pd.read_sql("""
SELECT priceEach, round(priceEach) AS price_rounded
    FROM orderDetails;
""", conn)
```

| | priceEach | price_rounded |
|---|---|---|
| 0 | 136.00 | 136.0 |
| 1 | 55.09 | 55.0 |
| 2 | 75.46 | 75.0 |
| 3 | 35.29 | 35.0 |
| 4 | 108.06 | 108.0 |
| ... | ... | ... |
| 2991 | 127.79 | 128.0 |
| 2992 | 31.82 | 32.0 |
| 2993 | 83.79 | 84.0 |
| 2994 | 50.32 | 50.0 |
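`round` also accepts an optional second argument for the number of decimal places. One difference from Python is worth knowing: for exact halfway values with no decimal places, SQLite rounds away from zero, while Python's built-in `round` rounds to the nearest even number. A literal-value sketch:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
whole, one_place, half, neg_half = demo_conn.execute(
    "SELECT round(127.79), round(55.09, 1), round(2.5), round(-2.5)"
).fetchone()
print(whole, one_place, half, neg_half)  # 128.0 55.1 3.0 -3.0
print(round(2.5))  # 2 (Python rounds half to even)
demo_conn.close()
```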


`CAST(expression AS type)` converts a value to another data type. Here we cast the rounded price to `INTEGER`, which drops the trailing `.0`:

```python
pd.read_sql("""
SELECT priceEach, CAST(round(priceEach) AS INTEGER) AS price_rounded
    FROM orderDetails;
""", conn)
```

| | priceEach | price_rounded |
|---|---|---|
| 0 | 136.00 | 136 |
| 1 | 55.09 | 55 |
| 2 | 75.46 | 75 |
| 3 | 35.29 | 35 |
| 4 | 108.06 | 108 |
| ... | ... | ... |
| 2991 | 127.79 | 128 |
| 2992 | 31.82 | 32 |
| 2993 | 83.79 | 84 |
| 2994 | 50.32 | 50 |
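The order of operations matters in the query above. Casting a real number straight to `INTEGER` truncates toward zero instead of rounding, which is why the price is rounded first. A literal-value sketch using a price from the output:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
truncated, rounded, kind = demo_conn.execute(
    """
    SELECT CAST(127.79 AS INTEGER),
           CAST(round(127.79) AS INTEGER),
           typeof(CAST(round(127.79) AS INTEGER))
    """
).fetchone()
print(truncated, rounded, kind)  # 127 128 integer
demo_conn.close()
```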


```python
# Arithmetic operators (+, -, *, /) work directly on numeric columns
pd.read_sql("""
SELECT priceEach, quantityOrdered, priceEach * quantityOrdered AS total_price
    FROM orderDetails;
""", conn)
```

| | priceEach | quantityOrdered | total_price |
|---|---|---|---|
| 0 | 136.00 | 30 | 4080.00 |
| 1 | 55.09 | 50 | 2754.50 |
| 2 | 75.46 | 22 | 1660.12 |
| 3 | 35.29 | 49 | 1729.21 |
| 4 | 108.06 | 25 | 2701.50 |
| ... | ... | ... | ... |
| 2991 | 127.79 | 49 | 6261.71 |
| 2992 | 31.82 | 31 | 986.42 |
| 2993 | 83.79 | 41 | 3435.39 |
| 2994 | 50.32 | 11 | 553.52 |
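Be careful with `/`: when both operands are integers, SQLite performs integer division and drops the remainder. Making at least one operand a real number (for example with `2.0` or `CAST(... AS REAL)`) gives a fractional result, and `%` returns the remainder. A literal-value sketch:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
int_div, real_div, cast_div, remainder = demo_conn.execute(
    "SELECT 7 / 2, 7 / 2.0, CAST(7 AS REAL) / 2, 7 % 2"
).fetchone()
print(int_div, real_div, cast_div, remainder)  # 3 3.5 3.5 1
demo_conn.close()
```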


## Built-in SQL Functions for Date-Time Ops

Let's use the `orders` table:

```python
# Preview the orders table
pd.read_sql("""
SELECT * FROM orders;
""", conn)
```

| | orderNumber | orderDate | requiredDate | shippedDate | status | comments | customerNumber |
|---|---|---|---|---|---|---|---|
| 0 | 10100 | 2003-01-06 | 2003-01-13 | 2003-01-10 | Shipped | | 363 |
| 1 | 10101 | 2003-01-09 | 2003-01-18 | 2003-01-11 | Shipped | Check on availability. | 128 |
| 2 | 10102 | 2003-01-10 | 2003-01-18 | 2003-01-14 | Shipped | | 181 |
| 3 | 10103 | 2003-01-29 | 2003-02-07 | 2003-02-02 | Shipped | | 121 |
| 4 | 10104 | 2003-01-31 | 2003-02-09 | 2003-02-01 | Shipped | | 141 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 321 | 10421 | 2005-05-29 | 2005-06-06 | | In Process | Custom shipping instructions were sent to ware... | 124 |
| 322 | 10422 | 2005-05-30 | 2005-06-11 | | In Process | | 157 |
| 323 | 10423 | 2005-05-30 | 2005-06-05 | | In Process | | 314 |
| 324 | 10424 | 2005-05-31 | 2005-06-08 | | In Process | | 141 |


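`julianday` converts a date into a Julian day number, the (fractional) number of days since noon in Greenwich on November 24, 4714 B.C. Subtracting two Julian day numbers gives the number of days between the dates: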

```python
pd.read_sql("""
SELECT requiredDate, orderDate,
       julianday(requiredDate) - julianday(orderDate) AS date_diff
    FROM orders;
""", conn)
```

| | requiredDate | orderDate | date_diff |
|---|---|---|---|
| 0 | 2003-01-13 | 2003-01-06 | 7.0 |
| 1 | 2003-01-18 | 2003-01-09 | 9.0 |
| 2 | 2003-01-18 | 2003-01-10 | 8.0 |
| 3 | 2003-02-07 | 2003-01-29 | 9.0 |
| 4 | 2003-02-09 | 2003-01-31 | 9.0 |
| ... | ... | ... | ... |
| 321 | 2005-06-06 | 2005-05-29 | 8.0 |
| 322 | 2005-06-11 | 2005-05-30 | 12.0 |
| 323 | 2005-06-05 | 2005-05-30 | 6.0 |
| 324 | 2005-06-08 | 2005-05-31 | 8.0 |
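Because `julianday` returns a day count, the difference is the same number of days that Python's `datetime.date` subtraction produces, just as a float. A sketch comparing the two for the first order in the table:

```python
import sqlite3
from datetime import date

demo_conn = sqlite3.connect(":memory:")
(sql_diff,) = demo_conn.execute(
    "SELECT julianday('2003-01-13') - julianday('2003-01-06')"
).fetchone()
py_diff = (date(2003, 1, 13) - date(2003, 1, 6)).days
print(sql_diff, py_diff)  # 7.0 7
demo_conn.close()
```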


To find the date 7 days after each order, pass the modifier `'+7 days'` to `date`:

```python
pd.read_sql("""
SELECT orderDate, date(orderDate, '+7 days') AS one_week_later
    FROM orders;
""", conn)
```

| | orderDate | one_week_later |
|---|---|---|
| 0 | 2003-01-06 | 2003-01-13 |
| 1 | 2003-01-09 | 2003-01-16 |
| 2 | 2003-01-10 | 2003-01-17 |
| 3 | 2003-01-29 | 2003-02-05 |
| 4 | 2003-01-31 | 2003-02-07 |
| ... | ... | ... |
| 321 | 2005-05-29 | 2005-06-05 |
| 322 | 2005-05-30 | 2005-06-06 |
| 323 | 2005-05-30 | 2005-06-06 |
| 324 | 2005-05-31 | 2005-06-07 |
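`date` accepts other modifiers too, and several can be chained; they are applied left to right. The sketch below uses order dates from the output above. Note that `'+1 month'` on January 31 gives "February 31", which SQLite normalizes forward into March rather than clamping to the last day of February:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
week_later, month_start, month_later, month_end = demo_conn.execute(
    """
    SELECT date('2003-01-06', '+7 days'),
           date('2003-01-29', 'start of month'),
           date('2003-01-31', '+1 month'),
           -- last day of the month: first of month, forward a month, back a day
           date('2003-01-29', 'start of month', '+1 month', '-1 day')
    """
).fetchone()
print(week_later, month_start, month_later, month_end)
# 2003-01-13 2003-01-01 2003-03-03 2003-01-31
demo_conn.close()
```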


`strftime(format, date)` formats a date according to a format string. With `%d`, `%m`, and `%Y` it extracts the day, month, and year:

```python
pd.read_sql("""
SELECT orderDate,
        strftime('%d', orderDate) AS day,
        strftime('%m', orderDate) AS month,
        strftime('%Y', orderDate) AS year
    FROM orders;
""", conn)
```

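| | orderDate | day | month | year |
|---|---|---|---|---|
| 0 | 2003-01-06 | 06 | 01 | 2003 |
| 1 | 2003-01-09 | 09 | 01 | 2003 |
| 2 | 2003-01-10 | 10 | 01 | 2003 |
| 3 | 2003-01-29 | 29 | 01 | 2003 |
| 4 | 2003-01-31 | 31 | 01 | 2003 |
| ... | ... | ... | ... | ... |
| 321 | 2005-05-29 | 29 | 05 | 2005 |
| 322 | 2005-05-30 | 30 | 05 | 2005 |
| 323 | 2005-05-30 | 30 | 05 | 2005 |
| 324 | 2005-05-31 | 31 | 05 | 2005 |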
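`strftime` returns text, which is why the day and month above keep their leading zeros. If you need numbers (for example, to compare or do arithmetic on years), wrap the call in `CAST`. The format codes used here mean the same thing as in Python's `datetime.strftime`. A literal-value sketch:

```python
import sqlite3
from datetime import date

demo_conn = sqlite3.connect(":memory:")
day_text, year_num, year_type = demo_conn.execute(
    """
    SELECT strftime('%d', '2003-01-06'),
           CAST(strftime('%Y', '2003-01-06') AS INTEGER),
           typeof(CAST(strftime('%Y', '2003-01-06') AS INTEGER))
    """
).fetchone()
print(repr(day_text), year_num, year_type)  # '06' 2003 integer
print(date(2003, 1, 6).strftime("%d"))  # 06
demo_conn.close()
```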


```python
# Once done, close the database connection
conn.close()
```
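Closing by hand is easy to forget, especially if a query raises an error first. One option is `contextlib.closing`, which calls `close()` when the `with` block exits. Using a `sqlite3` connection directly in a `with` statement is not a substitute: that only commits or rolls back the current transaction and leaves the connection open. A sketch with an in-memory database:

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as demo_conn:
    (answer,) = demo_conn.execute("SELECT 1 + 1").fetchone()
    print(answer)  # 2

# The connection is closed here, so using it again raises ProgrammingError
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print(still_open)  # False
```

In this notebook, the same pattern would wrap `sqlite3.connect` on the `data.sqlite` path, with the `pd.read_sql` calls inside the `with` block.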