### Objectives
You will be able to:

- Retrieve a subset of columns from a table
- Create an alias in a SQL query
- Use SQL CASE statements to transform selected columns
- Use built-in SQL functions to transform selected columns

### The Data
We connect to a SQLite database using the Python `sqlite3` library.

In [1]:
!ls


Database Admin 101 - Lab.ipynb
Database Admin 101.ipynb
Filtering Data with SQL - Lab.ipynb
Filtering, Ordering, and Limiting Data with SQL - Lab.ipynb
Getting Started with SQL - Recap.ipynb
Grouping Data with SQL - Lab.ipynb
Ordering and Limiting Data with SQL.ipynb
SQL Database Data Types.ipynb
connect to SQL Databases.ipynb
connect to SQL Databases2.ipynb
contact_list.pickle
data
filtering.ipynb
grouping data with SQL.ipynb
pets_database.db
school.sqlite
selectData.ipynb
sql_queries.ipynb


In [4]:
import sqlite3
conn = sqlite3.connect('data/data.sqlite')

If we want to get all information about the employee records, we might do something like this (* means all columns):

In [5]:
import pandas as pd
pd.read_sql(''' select * from employees;''',conn)

Unnamed: 0,employeeNumber,lastName,firstName,extension,email,officeCode,reportsTo,jobTitle
0,1002,Murphy,Diane,x5800,dmurphy@classicmodelcars.com,1,,President
1,1056,Patterson,Mary,x4611,mpatterso@classicmodelcars.com,1,1002.0,VP Sales
2,1076,Firrelli,Jeff,x9273,jfirrelli@classicmodelcars.com,1,1002.0,VP Marketing
3,1088,Patterson,William,x4871,wpatterson@classicmodelcars.com,6,1056.0,Sales Manager (APAC)
4,1102,Bondur,Gerard,x5408,gbondur@classicmodelcars.com,4,1056.0,Sale Manager (EMEA)
5,1143,Bow,Anthony,x5428,abow@classicmodelcars.com,1,1056.0,Sales Manager (NA)
6,1165,Jennings,Leslie,x3291,ljennings@classicmodelcars.com,1,1143.0,Sales Rep
7,1166,Thompson,Leslie,x4065,lthompson@classicmodelcars.com,1,1143.0,Sales Rep
8,1188,Firrelli,Julie,x2173,jfirrelli@classicmodelcars.com,2,1143.0,Sales Rep
9,1216,Patterson,Steve,x4334,spatterson@classicmodelcars.com,2,1143.0,Sales Rep


### Readability

In [23]:
employees = pd.read_sql('''
select * 
from employees
''',conn)
employees

Unnamed: 0,employeeNumber,lastName,firstName,extension,email,officeCode,reportsTo,jobTitle
0,1002,Murphy,Diane,x5800,dmurphy@classicmodelcars.com,1,,President
1,1056,Patterson,Mary,x4611,mpatterso@classicmodelcars.com,1,1002.0,VP Sales
2,1076,Firrelli,Jeff,x9273,jfirrelli@classicmodelcars.com,1,1002.0,VP Marketing
3,1088,Patterson,William,x4871,wpatterson@classicmodelcars.com,6,1056.0,Sales Manager (APAC)
4,1102,Bondur,Gerard,x5408,gbondur@classicmodelcars.com,4,1056.0,Sale Manager (EMEA)
5,1143,Bow,Anthony,x5428,abow@classicmodelcars.com,1,1056.0,Sales Manager (NA)
6,1165,Jennings,Leslie,x3291,ljennings@classicmodelcars.com,1,1143.0,Sales Rep
7,1166,Thompson,Leslie,x4065,lthompson@classicmodelcars.com,1,1143.0,Sales Rep
8,1188,Firrelli,Julie,x2173,jfirrelli@classicmodelcars.com,2,1143.0,Sales Rep
9,1216,Patterson,Steve,x4334,spatterson@classicmodelcars.com,2,1143.0,Sales Rep


In [7]:
q='''SELECT * FROM EMPLOYEES'''
pd.read_sql(q,conn)

Unnamed: 0,employeeNumber,lastName,firstName,extension,email,officeCode,reportsTo,jobTitle
0,1002,Murphy,Diane,x5800,dmurphy@classicmodelcars.com,1,,President
1,1056,Patterson,Mary,x4611,mpatterso@classicmodelcars.com,1,1002.0,VP Sales
2,1076,Firrelli,Jeff,x9273,jfirrelli@classicmodelcars.com,1,1002.0,VP Marketing
3,1088,Patterson,William,x4871,wpatterson@classicmodelcars.com,6,1056.0,Sales Manager (APAC)
4,1102,Bondur,Gerard,x5408,gbondur@classicmodelcars.com,4,1056.0,Sale Manager (EMEA)
5,1143,Bow,Anthony,x5428,abow@classicmodelcars.com,1,1056.0,Sales Manager (NA)
6,1165,Jennings,Leslie,x3291,ljennings@classicmodelcars.com,1,1143.0,Sales Rep
7,1166,Thompson,Leslie,x4065,lthompson@classicmodelcars.com,1,1143.0,Sales Rep
8,1188,Firrelli,Julie,x2173,jfirrelli@classicmodelcars.com,2,1143.0,Sales Rep
9,1216,Patterson,Steve,x4334,spatterson@classicmodelcars.com,2,1143.0,Sales Rep


In [5]:
cur = conn.cursor()
cur.execute("""SELECT name FROM sqlite_master WHERE type = 'table';""")
table_names = cur.fetchall()
table_names

[('orderdetails',),
 ('payments',),
 ('offices',),
 ('customers',),
 ('orders',),
 ('productlines',),
 ('products',),
 ('employees',)]

### Retrieving a Subset of Columns

In [10]:
pd.read_sql('''
      select lastName,FirstName from employees      
            ''',conn).head()

Unnamed: 0,lastName,firstName
0,Murphy,Diane
1,Patterson,Mary
2,Firrelli,Jeff
3,Patterson,William
4,Bondur,Gerard


We can also select the columns in a different order than the one in which they appear in the database:

In [13]:
pd.read_sql('''
          select FirstName,LastName from employees  
            ''',conn).head()

Unnamed: 0,firstName,lastName
0,Diane,Murphy
1,Mary,Patterson
2,Jeff,Firrelli
3,William,Patterson
4,Gerard,Bondur


### Use Aliases (AS Keyword) to change the column names in our query result:

The `AS` keyword is technically optional when assigning an alias in SQL. In other words, you could just write `SELECT firstName name` and it would work the same as `SELECT firstName AS name`. However, we recommend being more explicit and including the `AS`, so that it's clearer what your code is doing.

In [14]:
pd.read_sql(''' 
select FirstName as Name from employees
''',conn).head()

Unnamed: 0,Name
0,Diane
1,Mary
2,Jeff
3,William
4,Gerard
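
To check the claim that `AS` is optional, here is a minimal sketch that runs both forms and compares the results. It assumes a small in-memory table standing in for `employees` (the two rows and the name `demo_conn` are illustrative only, not part of the lesson's database), and it uses plain `sqlite3` cursors rather than pandas so that it runs on its own.

```python
import sqlite3

# Hypothetical in-memory stand-in for the employees table (two sample rows).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT, lastName TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Diane", "Murphy"), ("Mary", "Patterson")],
)

# Same query, once with the explicit AS keyword and once without it.
with_as = demo_conn.execute("SELECT firstName AS name FROM employees")
without_as = demo_conn.execute("SELECT firstName name FROM employees")

# cursor.description holds one tuple per result column; item 0 is the column name.
with_as_columns = [col[0] for col in with_as.description]
without_as_columns = [col[0] for col in without_as.description]
with_as_rows = with_as.fetchall()
without_as_rows = without_as.fetchall()

print(with_as_columns, without_as_columns)
print(with_as_rows == without_as_rows)

demo_conn.close()
```

Both cursors report the same column name and the same rows, so the choice between the two forms is about readability only.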


### Using SQL CASE Statements
CASE statements are a type of conditional statement, similar to `if` statements in Python. Whereas Python uses the keywords `if`, `elif`, and `else`, SQL uses `CASE`, `WHEN`, `THEN`, `ELSE`, and `END`.

### CASE to Bin Column Values
One of the most common uses of CASE statements is to bin column values. This works for both numeric and categorical columns.

In the example below, we use the `jobTitle` field to bin all employees into `role` categories based on whether or not their job title is "Sales Rep":

In [20]:
pd.read_sql(''' 
select Firstname,LastName,jobTitle,
            CASE
            WHEN jobTitle = 'Sales Rep' then 'Sales Rep'
            ELSE 'Not Sales Rep'
            END AS role
From employees
''',conn)

Unnamed: 0,firstName,lastName,jobTitle,role
0,Diane,Murphy,President,Not Sales Rep
1,Mary,Patterson,VP Sales,Not Sales Rep
2,Jeff,Firrelli,VP Marketing,Not Sales Rep
3,William,Patterson,Sales Manager (APAC),Not Sales Rep
4,Gerard,Bondur,Sale Manager (EMEA),Not Sales Rep
5,Anthony,Bow,Sales Manager (NA),Not Sales Rep
6,Leslie,Jennings,Sales Rep,Sales Rep
7,Leslie,Thompson,Sales Rep,Sales Rep
8,Julie,Firrelli,Sales Rep,Sales Rep
9,Steve,Patterson,Sales Rep,Sales Rep
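
To make the comparison with Python's `if`/`else` concrete, the sketch below writes the same binning rule twice: once as a SQL CASE expression and once as a Python function. The helper `role_for` and the two-row in-memory table are assumptions made for illustration; they are not part of the lesson's database.

```python
import sqlite3


def role_for(job_title):
    """Python counterpart of the CASE expression: bin a job title into a role."""
    if job_title == "Sales Rep":
        return "Sales Rep"
    else:
        return "Not Sales Rep"


# Hypothetical in-memory stand-in for the employees table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT, jobTitle TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Diane", "President"), ("Leslie", "Sales Rep")],
)

# The binning done by SQLite with CASE ... WHEN ... THEN ... ELSE ... END.
sql_roles = demo_conn.execute("""
    SELECT jobTitle,
           CASE
               WHEN jobTitle = 'Sales Rep' THEN 'Sales Rep'
               ELSE 'Not Sales Rep'
           END AS role
    FROM employees
""").fetchall()

# The same binning done in Python on the job titles returned by the query.
python_roles = [(title, role_for(title)) for title, _ in sql_roles]

print(sql_roles)
print(sql_roles == python_roles)

demo_conn.close()
```

Doing the transformation in SQL means the new column arrives already computed, which is convenient when the result goes straight into a DataFrame.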


### CASE to Make Values Human-Readable
Another typical way to use CASE is to translate the column values into something that your eventual audience will understand. This is especially true of data that is entered into the database as a "code" or "ID" rather than a human-readable name.

In the example below, we use a CASE statement with multiple WHENs to transform the `officeCode` column into an `Office` column that uses a more meaningful name for the office:

In [37]:
pd.read_sql('''
select FirstName,lastName,officeCode,
            CASE
            WHEN officeCode = '1' then 'San Francisco, CA'
            WHEN officeCode = '2' then 'Boston, MA'
            WHEN officeCode = '3' then 'New York, NY'
            WHEN officeCode = '4' then 'Paris, France'
            WHEN officeCode = '5' then 'Tokyo, Japan'
            END as Office
from employees
''',conn).head(10)

Unnamed: 0,firstName,lastName,officeCode,Office
0,Diane,Murphy,1,"San Francisco, CA"
1,Mary,Patterson,1,"San Francisco, CA"
2,Jeff,Firrelli,1,"San Francisco, CA"
3,William,Patterson,6,
4,Gerard,Bondur,4,"Paris, France"
5,Anthony,Bow,1,"San Francisco, CA"
6,Leslie,Jennings,1,"San Francisco, CA"
7,Leslie,Thompson,1,"San Francisco, CA"
8,Julie,Firrelli,2,"Boston, MA"
9,Steve,Patterson,2,"Boston, MA"


Note that because we did not specify a name for officeCode "6", and did not include an ELSE, the associated office value for William Patterson is NULL (represented as None in Python).
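
A minimal sketch of the difference an `ELSE` makes is shown below. It assumes a two-row in-memory table with one known and one unlisted office code; the fallback label `'Unknown office'` is a made-up placeholder, not a value from the lesson's data.

```python
import sqlite3

# Hypothetical in-memory stand-in for the employees table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE employees (firstName TEXT, officeCode TEXT)")
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Diane", "1"), ("William", "6")],
)

# No ELSE: codes that match no WHEN produce NULL, which Python sees as None.
without_else = dict(demo_conn.execute("""
    SELECT firstName,
           CASE
               WHEN officeCode = '1' THEN 'San Francisco, CA'
           END AS office
    FROM employees
""").fetchall())

# With ELSE: unmatched codes receive the fallback label instead of NULL.
with_else = dict(demo_conn.execute("""
    SELECT firstName,
           CASE
               WHEN officeCode = '1' THEN 'San Francisco, CA'
               ELSE 'Unknown office'
           END AS office
    FROM employees
""").fetchall())

print(without_else)
print(with_else)

demo_conn.close()
```

Whether a NULL or a visible placeholder is preferable depends on the audience; a NULL is easier to filter on later, while a label is easier to read in a report.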

There is also a shorter syntax possible if all of the WHENs are just checking if a value is equal to another value (e.g. in this case where we are repeating officeCode = over and over). Instead we can specify officeCode right after CASE, then only specify the potential matching values:

In [51]:
pd.read_sql('''
select FirstName,lastName,officeCode,
            CASE officeCode
            WHEN '1' then 'San Francisco, CA'
            WHEN '2' then 'Boston, MA'
            WHEN '3' then 'New York, NY'
            WHEN '4' then 'Paris, France'
            WHEN '5' then 'Tokyo, Japan'
            END as office
from employees
            ''',conn).head(10)

Unnamed: 0,firstName,lastName,officeCode,office
0,Diane,Murphy,1,"San Francisco, CA"
1,Mary,Patterson,1,"San Francisco, CA"
2,Jeff,Firrelli,1,"San Francisco, CA"
3,William,Patterson,6,
4,Gerard,Bondur,4,"Paris, France"
5,Anthony,Bow,1,"San Francisco, CA"
6,Leslie,Jennings,1,"San Francisco, CA"
7,Leslie,Thompson,1,"San Francisco, CA"
8,Julie,Firrelli,2,"Boston, MA"
9,Steve,Patterson,2,"Boston, MA"


### Using Built-in SQL Functions
Similar to the Python built-in functions, SQL also has built-in functions. The available functions will differ somewhat by the type of SQL you are using, but in general you should be able to find functions for:
- String manipulation
- Math operations
- Date and time operations

### Built-in SQL Functions for String Manipulation
#### `length`
Returns the number of characters in a string. If we wanted to find the length of the first name of each employee, that would look like this:

In [49]:
pd.read_sql('''
select length(firstName)  as name_length
    from employees          
''',conn).head(5)

Unnamed: 0,name_length
0,5
1,4
2,4
3,7
4,6


#### `upper`
Returns the string in all caps. For example, to return all employee first names in all caps:

In [52]:
pd.read_sql('''
select upper(firstName) as name_in_all_caps
        from employees;
''',conn).head(5)

Unnamed: 0,name_in_all_caps
0,DIANE
1,MARY
2,JEFF
3,WILLIAM
4,GERARD


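#### `substr`
Returns part of a string, given a starting position and a length. In Python we would do this with string slicing. Here we take the first letter of each first name: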

In [8]:
pd.read_sql('''
select substr(firstname,1,1) as first_initial
from employees;
''',conn).head(5)

Unnamed: 0,first_initial
0,D
1,M
2,J
3,W
4,G
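
The sketch below puts `substr` and Python slicing side by side. The point to watch is indexing: SQLite's `substr(X, start, length)` counts positions from 1 and takes a length, while a Python slice counts from 0 and takes an end index. The sample name is bound as a query parameter, so no table is needed; `demo_conn` is an illustrative in-memory connection.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
name = "William"

# substr(X, start, length): start is 1-based, third argument is a length.
sql_initial, sql_middle = demo_conn.execute(
    "SELECT substr(?, 1, 1), substr(?, 2, 3)", (name, name)
).fetchone()

# Python slicing: start is 0-based, second number is an exclusive end index.
py_initial = name[0:1]
py_middle = name[1:4]

print(sql_initial, py_initial)
print(sql_middle, py_middle)

demo_conn.close()
```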


In [10]:
pd.read_sql('''
select substr(firstName,1,1) || '.' as first_initial     -- || is the concatenation operator
    from employees
            ''',conn)

Unnamed: 0,first_initial
0,D.
1,M.
2,J.
3,W.
4,G.
5,A.
6,L.
7,L.
8,J.
9,S.


We can also combine multiple column values, not just string literals.

Let's combine the first and last name:

In [80]:

pd.read_sql('''
select firstname || ' ' || lastname as name
from employees           
''',conn).head()

Unnamed: 0,name
0,Diane Murphy
1,Mary Patterson
2,Jeff Firrelli
3,William Patterson
4,Gerard Bondur


### Built-in SQL Functions for Math Operations
For these examples, let's switch over to using the orderDetails table:

In [82]:
pd.read_sql('''
select * from orderDetails
''',conn)

Unnamed: 0,orderNumber,productCode,quantityOrdered,priceEach,orderLineNumber
0,10100,S18_1749,30,136.00,3
1,10100,S18_2248,50,55.09,2
2,10100,S18_4409,22,75.46,4
3,10100,S24_3969,49,35.29,1
4,10101,S18_2325,25,108.06,4
...,...,...,...,...,...
2991,10425,S24_2300,49,127.79,9
2992,10425,S24_2840,31,31.82,5
2993,10425,S32_1268,41,83.79,11
2994,10425,S32_2509,11,50.32,6


#### `round`

In [86]:
pd.read_sql('''
select round(priceEach) as rounded_price
            from orderDetails
            ''',conn)

Unnamed: 0,rounded_price
0,136.0
1,55.0
2,75.0
3,35.0
4,108.0
...,...
2991,128.0
2992,32.0
2993,84.0
2994,50.0
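
SQLite's `round` also accepts an optional second argument giving the number of decimal places to keep. The sketch below shows both forms on a single literal price; the value and the in-memory connection `demo_conn` are illustrative only.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# round(X) rounds to zero decimal places; round(X, 1) keeps one decimal place.
rounded_whole, rounded_tenths = demo_conn.execute(
    "SELECT round(55.09), round(55.09, 1)"
).fetchone()

# Both results come back to Python as floats.
print(rounded_whole, rounded_tenths)

demo_conn.close()
```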


#### `CAST`
The previous result looks ok, but it's returning floating point numbers. What if we want integers instead?

In [91]:
pd.read_sql('''
select cast(round(priceEach) as INTEGER) as rounded_price_int
        from orderDetails
            ''',conn) 

Unnamed: 0,rounded_price_int
0,136
1,55
2,75
3,35
4,108
...,...
2991,128
2992,32
2993,84
2994,50
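
It is worth keeping the `round` inside the `CAST`, because casting a floating point value to `INTEGER` in SQLite truncates toward zero rather than rounding. The following sketch compares the two on one illustrative price, using a throwaway in-memory connection.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# CAST alone drops the fractional part; rounding first gives the nearest integer.
truncated, rounded = demo_conn.execute(
    "SELECT CAST(127.79 AS INTEGER), CAST(round(127.79) AS INTEGER)"
).fetchone()

print(truncated, rounded)

demo_conn.close()
```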


### Basic Math Operations
Just like when performing math operations with Python, you don't always need to use a function. Sometimes all you need is an operator like +, -, /, or *. For example, below we multiply the price times the quantity ordered to find the total price:

In [94]:
pd.read_sql('''
select priceEach * quantityOrdered as Total_Price
    from orderdetails
            ''',conn)

Unnamed: 0,Total_Price
0,4080.00
1,2754.50
2,1660.12
3,1729.21
4,2701.50
...,...
2991,6261.71
2992,986.42
2993,3435.39
2994,553.52
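
One operator behaves differently from Python: in SQLite, `/` between two integers performs integer division, whereas Python's `/` always returns a float. A small sketch, using literal values on an illustrative in-memory connection:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# Integer / integer truncates; making either operand a REAL keeps the fraction.
int_division, real_division, cast_division = demo_conn.execute(
    "SELECT 7 / 2, 7 / 2.0, CAST(7 AS REAL) / 2"
).fetchone()

print(int_division, real_division, cast_division)

demo_conn.close()
```

The multiplication in the query above is not affected, since `priceEach` already holds decimal values.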


### Built-in SQL Functions for Date and Time Operations
For these examples, we'll look at yet another table within the database, this time the orders table:

In [95]:
pd.read_sql('''
select * from orders;
            ''',conn)

Unnamed: 0,orderNumber,orderDate,requiredDate,shippedDate,status,comments,customerNumber
0,10100,2003-01-06,2003-01-13,2003-01-10,Shipped,,363
1,10101,2003-01-09,2003-01-18,2003-01-11,Shipped,Check on availability.,128
2,10102,2003-01-10,2003-01-18,2003-01-14,Shipped,,181
3,10103,2003-01-29,2003-02-07,2003-02-02,Shipped,,121
4,10104,2003-01-31,2003-02-09,2003-02-01,Shipped,,141
...,...,...,...,...,...,...,...
321,10421,2005-05-29,2005-06-06,,In Process,Custom shipping instructions were sent to ware...,124
322,10422,2005-05-30,2005-06-11,,In Process,,157
323,10423,2005-05-30,2005-06-05,,In Process,,314
324,10424,2005-05-31,2005-06-08,,In Process,,141


What if we wanted to know how many days there are between the requiredDate and the orderDate for each order?

In [99]:
pd.read_sql('''
select julianday(requiredDate)- julianday(orderDate) AS days_from_order_to_required
    from orders;
            ''',conn)

Unnamed: 0,days_from_order_to_required
0,7.0
1,9.0
2,8.0
3,9.0
4,9.0
...,...
321,8.0
322,12.0
323,6.0
324,8.0
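
The subtraction works because `julianday` converts each date into a floating point day count, so the difference is a number of days. The sketch below repeats the calculation for the first order's dates as literals, and additionally casts the result to an integer; the in-memory connection is illustrative only.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# julianday returns a day count as a float, so the difference is in days.
days_float, days_int = demo_conn.execute("""
    SELECT julianday('2003-01-13') - julianday('2003-01-06'),
           CAST(julianday('2003-01-13') - julianday('2003-01-06') AS INTEGER)
""").fetchone()

print(days_float, days_int)

demo_conn.close()
```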


If we wanted to select the order dates as well as the dates 1 week after the order dates, we could pass a modifier to the `date` function:

In [19]:
pd.read_sql(''' 
select orderDate, date(orderDate,'+7 days') AS one_week_later
    from orders
            ''',conn)

Unnamed: 0,orderDate,one_week_later
0,2003-01-06,2003-01-13
1,2003-01-09,2003-01-16
2,2003-01-10,2003-01-17
3,2003-01-29,2003-02-05
4,2003-01-31,2003-02-07
...,...,...
321,2005-05-29,2005-06-05
322,2005-05-30,2005-06-06
323,2005-05-30,2005-06-06
324,2005-05-31,2005-06-07


You can also use the strftime function, which is very similar to the Python version. This is useful if you want to split apart a date or time value into different sub-parts. For example, here we extract the year, month, and day of month from the order date:

In [109]:
pd.read_sql(''' 
select orderDate,
            strftime('%m',orderDate) AS month,
            strftime('%Y',orderDate) AS year,
            strftime('%d',orderDate) as day
from orders;
''',conn)

Unnamed: 0,orderDate,month,year,day
0,2003-01-06,01,2003,06
1,2003-01-09,01,2003,09
2,2003-01-10,01,2003,10
3,2003-01-29,01,2003,29
4,2003-01-31,01,2003,31
...,...,...,...,...
321,2005-05-29,05,2005,29
322,2005-05-30,05,2005,30
323,2005-05-30,05,2005,30
324,2005-05-31,05,2005,31
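
To illustrate the similarity with Python, the sketch below formats the same date with SQLite's `strftime` and with `datetime.strftime`. Note that SQLite returns the parts as text (hence the leading zeros in the output above), so a `CAST` is needed if a number is wanted. The sample date and the in-memory connection are illustrative.

```python
import sqlite3
from datetime import datetime

demo_conn = sqlite3.connect(":memory:")
order_date = "2003-01-06"

# SQLite: the format string comes first, then the date value.
sql_parts = demo_conn.execute(
    "SELECT strftime('%Y', ?), strftime('%m', ?), strftime('%d', ?)",
    (order_date, order_date, order_date),
).fetchone()

# Python: parse the text into a datetime, then format it with the same codes.
parsed = datetime.strptime(order_date, "%Y-%m-%d")
py_parts = (parsed.strftime("%Y"), parsed.strftime("%m"), parsed.strftime("%d"))

# strftime yields text; cast it when an integer month is needed.
month_as_int = demo_conn.execute(
    "SELECT CAST(strftime('%m', ?) AS INTEGER)", (order_date,)
).fetchone()[0]

print(sql_parts, py_parts, month_as_int)

demo_conn.close()
```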


### Now that we are finished with our queries, we can close the database connection.
 

In [110]:
conn.close()
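
As an alternative to calling `close()` by hand, the connection can be wrapped in `contextlib.closing` so that it is closed even if a query raises an error. This is a sketch of one possible pattern, not the approach used in this lesson; `demo_conn` is an illustrative in-memory connection.

```python
import sqlite3
from contextlib import closing

# closing(...) calls demo_conn.close() when the with block exits.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    answer = demo_conn.execute("SELECT 1 + 1").fetchone()[0]

# Using a closed connection raises sqlite3.ProgrammingError.
try:
    demo_conn.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(answer, connection_closed)
```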

### Summary
In this lesson, you saw how to execute several kinds of SQL SELECT queries. First, there were examples of specifying the selection of particular columns, rather than always using SELECT * to select all columns. Then you saw some examples of how to use CASE to transform column values using conditional logic. Finally, we walked through how to use built-in SQL functions, particularly for string, numeric, and date/time fields.