![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/py_logo.png)


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#SQL-Recap" data-toc-modified-id="SQL-Recap-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>SQL Recap</a></span><ul class="toc-item"><li><span><a href="#SELECT" data-toc-modified-id="SELECT-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>SELECT</a></span><ul class="toc-item"><li><span><a href="#Selecting-Multiple-Columns" data-toc-modified-id="Selecting-Multiple-Columns-4.1.1"><span class="toc-item-num">4.1.1&nbsp;&nbsp;</span>Selecting Multiple Columns</a></span></li><li><span><a href="#Selecting-Everything-from-table-with-*" data-toc-modified-id="Selecting-Everything-from-table-with-*-4.1.2"><span class="toc-item-num">4.1.2&nbsp;&nbsp;</span>Selecting Everything from table with *</a></span></li><li><span><a href="#Syntax-for-the-SQL-DISTINCT-Statement" data-toc-modified-id="Syntax-for-the-SQL-DISTINCT-Statement-4.1.3"><span class="toc-item-num">4.1.3&nbsp;&nbsp;</span>Syntax for the SQL DISTINCT Statement</a></span></li></ul></li><li><span><a href="#WHERE" data-toc-modified-id="WHERE-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>WHERE</a></span><ul class="toc-item"><li><span><a href="#Syntax-for-the-SQL-WHERE" data-toc-modified-id="Syntax-for-the-SQL-WHERE-4.2.1"><span class="toc-item-num">4.2.1&nbsp;&nbsp;</span>Syntax for the SQL WHERE</a></span></li><li><span><a href="#Syntax-for-AND" data-toc-modified-id="Syntax-for-AND-4.2.2"><span class="toc-item-num">4.2.2&nbsp;&nbsp;</span>Syntax for AND</a></span></li><li><span><a href="#Syntax-for-OR" data-toc-modified-id="Syntax-for-OR-4.2.3"><span class="toc-item-num">4.2.3&nbsp;&nbsp;</span>Syntax for OR</a></span></li><li><span><a href="#Wildcards" data-toc-modified-id="Wildcards-4.2.4"><span class="toc-item-num">4.2.4&nbsp;&nbsp;</span>Wildcards</a></span></li><li><span><a href="#IMPORTANT-NOTE!" 
data-toc-modified-id="IMPORTANT-NOTE!-4.2.5"><span class="toc-item-num">4.2.5&nbsp;&nbsp;</span>IMPORTANT NOTE!</a></span></li></ul></li><li><span><a href="#SQL-ORDER-BY" data-toc-modified-id="SQL-ORDER-BY-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>SQL ORDER BY</a></span></li><li><span><a href="#SQL-GROUP-BY" data-toc-modified-id="SQL-GROUP-BY-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>SQL GROUP BY</a></span></li></ul></li></ul></div>

# SQL Recap

To see how these queries work, we'll connect to the Sakila SQLite database and run each query with `pd.read_sql_query`, which takes a query and a connection and returns the result as a DataFrame.

In [23]:
import pandas as pd
import sqlalchemy as db
engine = db.create_engine('sqlite:///sakila.db')
connection = engine.connect()

## SELECT

### Selecting Multiple Columns

In [24]:
# Select multiple columns example
query = ''' SELECT first_name,last_name
            FROM customer; '''

# Run the query and show the first five rows
pd.read_sql_query(query, connection).head()

Unnamed: 0,first_name,last_name
0,MARY,SMITH
1,PATRICIA,JOHNSON
2,LINDA,WILLIAMS
3,BARBARA,JONES
4,ELIZABETH,BROWN


### Selecting Everything from table with *

In [25]:
# Select all columns example
query = ''' SELECT *
            FROM customer; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
1,2,1,PATRICIA,JOHNSON,PATRICIA.JOHNSON@sakilacustomer.org,6,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
2,3,1,LINDA,WILLIAMS,LINDA.WILLIAMS@sakilacustomer.org,7,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
3,4,2,BARBARA,JONES,BARBARA.JONES@sakilacustomer.org,8,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
4,5,1,ELIZABETH,BROWN,ELIZABETH.BROWN@sakilacustomer.org,9,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28


### Syntax for the SQL DISTINCT Statement

A column may contain duplicate values, and sometimes you only want to list the distinct (unique) ones. The DISTINCT keyword returns each distinct value once.

SELECT DISTINCT column_name <br/>
FROM table_name;

In [26]:
# Select distinct country_ids from the city table.
# DISTINCT is a keyword rather than a function, so no parentheses are needed.
query = ''' SELECT DISTINCT country_id
            FROM city'''

pd.read_sql_query(query, connection).head()

Unnamed: 0,country_id
0,1
1,2
2,3
3,4
4,5
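
The effect of DISTINCT is easiest to see next to the same query without it. The sketch below uses a small in-memory table as a hypothetical stand-in for `city` (the rows are invented for illustration, not taken from the Sakila data) and the standard-library `sqlite3` module instead of the notebook's connection.

```python
import sqlite3

# Hypothetical toy table standing in for sakila's `city` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city TEXT, country_id INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("city_a", 1), ("city_b", 2), ("city_c", 2), ("city_d", 2)],
)

# Without DISTINCT: one value per row, duplicates included.
all_ids = [row[0] for row in conn.execute(
    "SELECT country_id FROM city ORDER BY country_id")]

# With DISTINCT: each value appears once.
distinct_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT country_id FROM city ORDER BY country_id")]

print(all_ids)       # [1, 2, 2, 2]
print(distinct_ids)  # [1, 2]
conn.close()
```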


## WHERE

### Syntax for the SQL WHERE 

The WHERE clause filters records: it extracts only the records that satisfy the specified condition.

SELECT column_name <br/>
FROM table_name <br/>
WHERE column_name comparison_operator desired_value;<br/>

In [27]:
# Select all customer info from the 1st store.
query = ''' SELECT *
            FROM customer
            WHERE store_id = 1'''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
1,2,1,PATRICIA,JOHNSON,PATRICIA.JOHNSON@sakilacustomer.org,6,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
2,3,1,LINDA,WILLIAMS,LINDA.WILLIAMS@sakilacustomer.org,7,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
3,5,1,ELIZABETH,BROWN,ELIZABETH.BROWN@sakilacustomer.org,9,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
4,7,1,MARIA,MILLER,MARIA.MILLER@sakilacustomer.org,11,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28


Note that there are several comparison operators you can use in a WHERE clause.

<table>
<tr>
<th>Operator</th>
<th>Description</th>
</tr>
<tr>
<td>=</td>
<td>Equal</td>
</tr>
<tr>
<td>&lt;&gt;</td>
<td>Not equal. Note: In some versions of SQL this operator may be written !=</td>
</tr>
<tr>
<td>&gt;</td>
<td>Greater than</td>
</tr>
<tr>
<td>&lt;</td>
<td>Less than</td>
</tr>
<tr>
<td>&gt;=</td>
<td>Greater than or equal</td>
</tr>
<tr>
<td>&lt;=</td>
<td>Less than or equal</td>
</tr>
</table>
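
As a rough illustration of these operators, the sketch below runs a few conditions against a hypothetical three-row `film` table (titles and lengths are made up). The `titles` helper is an assumption introduced here for brevity. It splices fixed condition strings into the query, which is fine for a demo but not something to do with user input.

```python
import sqlite3

# Hypothetical toy table; not the real sakila `film` data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("FILM A", 54), ("FILM B", 82), ("FILM C", 129)],
)

def titles(condition):
    """Return the titles matching a WHERE condition, sorted by title."""
    query = f"SELECT title FROM film WHERE {condition} ORDER BY title"
    return [row[0] for row in conn.execute(query)]

equal = titles("length = 82")       # ['FILM B']
not_equal = titles("length <> 82")  # ['FILM A', 'FILM C']
at_least = titles("length >= 82")   # ['FILM B', 'FILM C']
less_than = titles("length < 82")   # ['FILM A']

print(equal, not_equal, at_least, less_than)
conn.close()
```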




SQL requires single quotes around text values, while numeric values are not quoted. For example, here is a WHERE clause that compares against a text value:

In [28]:
# Select all info for customers whose first name is Mary.
query = ''' SELECT *
            FROM customer
            WHERE first_name = 'MARY'  '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28


### Syntax for AND

The AND operator is used to filter records based on more than one condition.

The AND operator displays a record if both the first condition AND the second condition are true.


In [29]:
# Select all films from 2006 that are rated R.

query = ''' SELECT *
            FROM film
            WHERE release_year = 2006
            AND rating = 'R' '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,film_id,title,description,release_year,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features,last_update
0,8,AIRPORT POLLOCK,A Epic Tale of a Moose And a Girl who must Con...,2006,1,,6,4.99,54,15.99,R,Trailers,2011-09-14 18:05:33
1,17,ALONE TRIP,A Fast-Paced Character Study of a Composer And...,2006,1,,3,0.99,82,14.99,R,"Trailers,Behind the Scenes",2011-09-14 18:05:33
2,20,AMELIE HELLFIGHTERS,A Boring Drama of a Woman And a Squirrel who m...,2006,1,,4,4.99,79,23.99,R,"Commentaries,Deleted Scenes,Behind the Scenes",2011-09-14 18:05:33
3,21,AMERICAN CIRCUS,A Insightful Drama of a Girl And a Astronaut w...,2006,1,,3,4.99,129,17.99,R,"Commentaries,Behind the Scenes",2011-09-14 18:05:33
4,23,ANACONDA CONFESSIONS,A Lacklusture Display of a Dentist And a Denti...,2006,1,,3,0.99,92,9.99,R,"Trailers,Deleted Scenes",2011-09-14 18:05:33


### Syntax for OR


The OR operator displays a record if either the first condition OR the second condition is true.

In [30]:
# Select all films rated R or PG.

query = ''' SELECT *
            FROM film
            WHERE rating = 'PG'
            OR rating = 'R' '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,film_id,title,description,release_year,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features,last_update
0,1,ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist...,2006,1,,6,0.99,86,20.99,PG,"Deleted Scenes,Behind the Scenes",2011-09-14 18:05:32
1,6,AGENT TRUMAN,A Intrepid Panorama of a Robot And a Boy who m...,2006,1,,3,2.99,169,17.99,PG,Deleted Scenes,2011-09-14 18:05:33
2,8,AIRPORT POLLOCK,A Epic Tale of a Moose And a Girl who must Con...,2006,1,,6,4.99,54,15.99,R,Trailers,2011-09-14 18:05:33
3,12,ALASKA PHANTOM,A Fanciful Saga of a Hunter And a Pastry Chef ...,2006,1,,6,0.99,136,22.99,PG,"Commentaries,Deleted Scenes",2011-09-14 18:05:33
4,13,ALI FOREVER,A Action-Packed Drama of a Dentist And a Croco...,2006,1,,4,4.99,150,21.99,PG,"Deleted Scenes,Behind the Scenes",2011-09-14 18:05:33
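
When AND and OR appear in the same WHERE clause, AND is evaluated first, so parentheses matter. The following sketch shows the difference on a hypothetical four-row `film` table (invented rows, standard-library `sqlite3` rather than the notebook's connection).

```python
import sqlite3

# Hypothetical toy table; not the real sakila `film` data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, release_year INTEGER, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [("A", 2006, "R"), ("B", 2005, "R"), ("C", 2006, "PG"), ("D", 2005, "PG")],
)

# Parsed as: rating = 'PG' OR (rating = 'R' AND release_year = 2006)
without_parens = [row[0] for row in conn.execute(
    """SELECT title FROM film
       WHERE rating = 'PG' OR rating = 'R' AND release_year = 2006
       ORDER BY title""")]

# Parentheses force the OR to be evaluated first.
with_parens = [row[0] for row in conn.execute(
    """SELECT title FROM film
       WHERE (rating = 'PG' OR rating = 'R') AND release_year = 2006
       ORDER BY title""")]

print(without_parens)  # ['A', 'C', 'D']
print(with_parens)     # ['A', 'C']
conn.close()
```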


Before we look at wildcards, ORDER BY, and GROUP BY, let's take a look at aggregate functions.

* AVG() - Returns the average value.
* COUNT() - Returns the number of rows.
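* FIRST() - Returns the first value (available in some dialects such as MS Access, but not in SQLite).
* LAST() - Returns the last value (likewise not available in SQLite).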
* MAX() - Returns the largest value.
* MIN() - Returns the smallest value.
* SUM() - Returns the sum.

You can call any of these aggregate functions on a column to get the resulting values back. For example:

In [31]:
# Count the number of customers
query = ''' SELECT COUNT(customer_id)
            FROM customer; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,COUNT(customer_id)
0,599


Go ahead and experiment with the other aggregate functions. The usual syntax is:

```sql
SELECT aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
```
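
As a minimal sketch of that syntax, the query below computes several aggregates in one pass over a hypothetical `film` table. The three rows are invented so the results are easy to check by hand.

```python
import sqlite3

# Hypothetical toy table; not the real sakila `film` data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("FILM A", 50), ("FILM B", 100), ("FILM C", 150)],
)

# Several aggregates can be requested in the same SELECT.
count, average, shortest, longest, total = conn.execute(
    """SELECT COUNT(length), AVG(length), MIN(length), MAX(length), SUM(length)
       FROM film"""
).fetchone()

print(count, average, shortest, longest, total)  # 3 100.0 50 150 300
conn.close()
```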



### Wildcards

A wildcard character can be used to substitute for any other characters in a string. In SQL, wildcard characters are used with the SQL LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are several wildcard operators:

<table>
<tr>
<th>Wildcard</th>
<th>Description</th>
</tr>
<tr>
<td>%</td>
<td>A substitute for zero or more characters</td>
</tr>
<tr>
<td>_</td>
<td>A substitute for a single character</td>
</tr>
<tr>
<td>[character_list]</td>
<td>Sets and ranges of characters to match</td>
</tr>
</table>

Let's see them in action now!

In [95]:
# First the % wildcard

# Select any customers whose first name starts with an M
query = ''' SELECT *
            FROM customer
            WHERE first_name LIKE 'M%' ; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
1,7,1,MARIA,MILLER,MARIA.MILLER@sakilacustomer.org,11,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
2,9,2,MARGARET,MOORE,MARGARET.MOORE@sakilacustomer.org,13,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
3,21,1,MICHELLE,CLARK,MICHELLE.CLARK@sakilacustomer.org,25,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
4,30,1,MELISSA,KING,MELISSA.KING@sakilacustomer.org,34,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29


In [32]:
# Next the _ wildcard

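# Select any customers whose last name is exactly one character followed by ING
# (for example KING); use '%ING' to match every last name that ends with ING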
query = ''' SELECT *
            FROM customer
            WHERE last_name LIKE '_ING' ; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,30,1,MELISSA,KING,MELISSA.KING@sakilacustomer.org,34,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
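
To make the difference between `%` and `_` concrete, here is a small sketch on a hypothetical `customer` table with four invented last names. It uses the standard-library `sqlite3` module; the `matching` helper is an assumption added for this example.

```python
import sqlite3

# Hypothetical toy table; the names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("KING",), ("STERLING",), ("RING",), ("INGRAM",)],
)

def matching(pattern):
    """Return last names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT last_name FROM customer WHERE last_name LIKE ? ORDER BY last_name",
        (pattern,),
    )
    return [row[0] for row in rows]

one_char_then_ing = matching("_ING")  # exactly one character before ING
ends_with_ing = matching("%ING")      # any number of characters before ING
contains_ing = matching("%ING%")      # ING anywhere in the name

print(one_char_then_ing)  # ['KING', 'RING']
print(ends_with_ing)      # ['KING', 'RING', 'STERLING']
print(contains_ing)       # ['INGRAM', 'KING', 'RING', 'STERLING']
conn.close()
```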



### IMPORTANT NOTE!
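Bracketed character lists are not part of SQLite's LIKE syntax, and support for them varies between SQL dialects.

In SQL Server (T-SQL), for example, you would use:

WHERE value LIKE '[charlist]%'

In SQLite you use GLOB, which takes Unix-style patterns (`*` instead of `%`, `?` instead of `_`) and is case-sensitive:

WHERE value GLOB '[charlist]*'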

In [33]:
# Finally the [character_list] wildcard

# Select any customers whose first name begins with an A or a B
query = ''' SELECT *
            FROM customer
            WHERE first_name GLOB '[AB]*' ; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,4,2,BARBARA,JONES,BARBARA.JONES@sakilacustomer.org,8,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
1,14,2,BETTY,WHITE,BETTY.WHITE@sakilacustomer.org,18,1,2006-02-14 22:04:36.000,2011-09-14 18:10:28
2,29,2,ANGELA,HERNANDEZ,ANGELA.HERNANDEZ@sakilacustomer.org,33,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
3,31,2,BRENDA,WRIGHT,BRENDA.WRIGHT@sakilacustomer.org,35,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
4,32,1,AMY,LOPEZ,AMY.LOPEZ@sakilacustomer.org,36,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
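
One practical difference worth checking is case sensitivity: with SQLite's default settings, LIKE ignores case for ASCII letters while GLOB does not. The sketch below demonstrates this on a hypothetical `customer` table; the mixed-case rows are invented (the Sakila names shown above are all upper case).

```python
import sqlite3

# Hypothetical toy table with mixed-case names, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("AMY",), ("amy",), ("BETTY",), ("CARL",)],
)

def names(where_clause):
    """Return first names matching a fixed WHERE clause, sorted by name."""
    query = f"SELECT first_name FROM customer WHERE {where_clause} ORDER BY first_name"
    return [row[0] for row in conn.execute(query)]

like_a = names("first_name LIKE 'a%'")      # case-insensitive by default
glob_a = names("first_name GLOB 'a*'")      # case-sensitive
glob_ab = names("first_name GLOB '[AB]*'")  # character list: starts with A or B

print(like_a)   # ['AMY', 'amy']
print(glob_a)   # ['amy']
print(glob_ab)  # ['AMY', 'BETTY']
conn.close()
```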


## SQL ORDER BY 

The ORDER BY keyword is used to sort the result-set by one or more columns. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. The syntax is:

```sql
SELECT column_name 
FROM table_name
ORDER BY column_name ASC|DESC
```

Let's see it in action:

In [98]:
# Select all customers and order results by last name
query = ''' SELECT *
            FROM customer
            ORDER BY last_name ; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,505,1,RAFAEL,ABNEY,RAFAEL.ABNEY@sakilacustomer.org,510,1,2006-02-14 22:04:37.000,2011-09-14 18:10:42
1,504,1,NATHANIEL,ADAM,NATHANIEL.ADAM@sakilacustomer.org,509,1,2006-02-14 22:04:37.000,2011-09-14 18:10:42
2,36,2,KATHLEEN,ADAMS,KATHLEEN.ADAMS@sakilacustomer.org,40,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
3,96,1,DIANA,ALEXANDER,DIANA.ALEXANDER@sakilacustomer.org,100,1,2006-02-14 22:04:36.000,2011-09-14 18:10:30
4,470,1,GORDON,ALLARD,GORDON.ALLARD@sakilacustomer.org,475,1,2006-02-14 22:04:37.000,2011-09-14 18:10:41


In [99]:
# Select all customers and order results by last name, DESCENDING
query = ''' SELECT *
            FROM customer
            ORDER BY last_name DESC; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update
0,28,1,CYNTHIA,YOUNG,CYNTHIA.YOUNG@sakilacustomer.org,32,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
1,413,2,MARVIN,YEE,MARVIN.YEE@sakilacustomer.org,418,1,2006-02-14 22:04:37.000,2011-09-14 18:10:40
2,402,1,LUIS,YANEZ,LUIS.YANEZ@sakilacustomer.org,407,1,2006-02-14 22:04:37.000,2011-09-14 18:10:39
3,318,1,BRIAN,WYMAN,BRIAN.WYMAN@sakilacustomer.org,323,1,2006-02-14 22:04:37.000,2011-09-14 18:10:37
4,31,2,BRENDA,WRIGHT,BRENDA.WRIGHT@sakilacustomer.org,35,1,2006-02-14 22:04:36.000,2011-09-14 18:10:29
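
ORDER BY can also take several columns, each with its own direction; later columns only break ties in earlier ones. A minimal sketch on a hypothetical four-row `customer` table (invented names, standard-library `sqlite3`):

```python
import sqlite3

# Hypothetical toy table; the names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("MARY", "SMITH"), ("BARBARA", "JONES"), ("ANNA", "SMITH"), ("CARL", "JONES")],
)

# Sort by last name ascending, then by first name descending within each last name.
ordered = conn.execute(
    """SELECT last_name, first_name
       FROM customer
       ORDER BY last_name ASC, first_name DESC"""
).fetchall()

for row in ordered:
    print(row)
# ('JONES', 'CARL')
# ('JONES', 'BARBARA')
# ('SMITH', 'MARY')
# ('SMITH', 'ANNA')
conn.close()
```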


## SQL GROUP BY 

The GROUP BY clause is used with aggregate functions to compute one result per group of rows that share the same value in one or more columns. The syntax is:

SELECT column_name, aggregate_function(column_name) <br/>
FROM table_name <br/>
WHERE column_name operator value <br/>
GROUP BY column_name; 

Let's see how it works.

In [34]:
# Count the number of customers per store

query = ''' SELECT store_id , COUNT(customer_id)
            FROM customer
            GROUP BY store_id; '''

pd.read_sql_query(query, connection).head()

Unnamed: 0,store_id,COUNT(customer_id)
0,1,326
1,2,273

