<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_02_IntroToSQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to SQL: SELECT
### Database and SQL Through Pop Culture | Brendan Shea, PhD
In this chapter, we dive into the heart of SQL and explore some of its most fundamental and powerful features. We start by understanding the basics of the SELECT and FROM statements, which allow us to retrieve data from a table. We then learn how to sort this data using ORDER BY and filter it based on specific conditions using WHERE. The chapter also covers how to handle NULL values, search for patterns using LIKE, and perform calculations across sets of rows using aggregate functions. Throughout, we use the real-world example of a database of sci-fi books to illustrate these concepts in a practical context.

Learning Outcomes: By the end of this chapter, you will be able to:

1.  Retrieve data from a table using SELECT and FROM statements
2.  Sort query results using ORDER BY
3.  Filter data based on specific conditions using WHERE
4.  Handle NULL values in your queries
5.  Search for patterns in data using LIKE
6.  Use aggregate functions to perform calculations across sets of rows
7.  Make your query results more readable using aliases with AS

##  What is the Relational Model?

The **relational model** of data has revolutionized the way we store, organize, and interact with information. This model, proposed by English computer scientist Edgar F. Codd in 1970 while working at IBM, forms the foundation of modern database systems.

At its core, the relational model is a way of structuring data in a database using tables. These tables, which resemble spreadsheets, represent specific entities or concepts. For instance, a library database might have tables for "Books," "Authors," "Publishers," and "Borrowers."

Some ideas of the relational model:

-   An **entity** is a specific object or concept that data is being stored about, such as a book, author, or customer.
-   An **attribute** is a characteristic or property of an entity, such as a book's title, author, or publication date.
-   A **table** is a collection of data elements organized in terms of rows and columns. Each table represents a specific entity.
-   A **row** (record or tuple) is a single instance of an entity in a table, such as one specific book in the "Books" table.
-   A **column (field)** represents an attribute of the entity in a table, such as the "Title" or "Author" of a book.
-   A **primary key** is a unique identifier for each row in a table, used to distinguish one record from another.
-   A **foreign key** is a field in one table that refers to the primary key of a row in another table, used to establish relationships between tables.

Here's an example of what a simple "Books" table might look like:

| id | title | author | publication_year |
| --- | --- | --- | --- |
| 1 | Dune | Frank Herbert | 1965 |
| 2 | Ender's Game | Orson Scott Card | 1985 |
| 3 | Hitchhiker's Guide to the Galaxy | Douglas Adams | 1979 |

In this table, each row represents a specific book, with columns for the book's unique identifier (id), title, author, and publication year.

The real strength of the relational model lies in its ability to establish relationships between tables. Instead of duplicating data across tables, we can link related data using keys.

For example, let's say we have a separate "Authors" table:

| id | name | birth_year |
| --- | --- | --- |
| 1 | Frank Herbert | 1920 |
| 2 | Orson Scott Card | 1951 |
| 3 | Douglas Adams | 1952 |

We can link the "Books" and "Authors" tables using the "author_id" field in the "Books" table, which would serve as a foreign key referencing the "id" field (the primary key) in the "Authors" table:

| id | title | author_id | publication_year |
| --- | --- | --- | --- |
| 1 | Dune | 1 | 1965 |
| 2 | Ender's Game | 2 | 1985 |
| 3 | Hitchhiker's Guide to the Galaxy | 3 | 1979 |

This structure allows us to efficiently store and manage data, as changes to an author's details only need to be made in the "Authors" table, and will automatically be reflected wherever that author is referenced.
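
As a minimal sketch of this idea, the two tables above can be rebuilt in an in-memory SQLite database using Python's built-in `sqlite3` module. The in-memory setup and the changed author name are assumptions made for illustration, and the query uses a JOIN, which this chapter does not cover.

```python
import sqlite3

# Throwaway in-memory database holding the two sample tables from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)")
conn.execute(
    "CREATE TABLE Books ("
    "id INTEGER PRIMARY KEY, title TEXT, "
    "author_id INTEGER REFERENCES Authors(id), "  # foreign key to Authors.id
    "publication_year INTEGER)"
)
conn.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?)",
    [(1, "Frank Herbert", 1920), (2, "Orson Scott Card", 1951), (3, "Douglas Adams", 1952)],
)
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [(1, "Dune", 1, 1965), (2, "Ender's Game", 2, 1985),
     (3, "Hitchhiker's Guide to the Galaxy", 3, 1979)],
)

# Change the author's name in ONE place (illustrative edit only)...
conn.execute("UPDATE Authors SET name = 'Douglas N. Adams' WHERE id = 3")

# ...and every book that references that author picks up the new name.
rows = conn.execute(
    "SELECT Books.title, Authors.name "
    "FROM Books JOIN Authors ON Books.author_id = Authors.id "
    "ORDER BY Books.id"
).fetchall()
for title, name in rows:
    print(title, "|", name)
```

Because the book rows store only `author_id`, the author's name lives in exactly one row of "Authors".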

### BrendyBot is Here to Answer Your Questions
![image.png](https://github.com/brendanpshea/colab-utilities/raw/main/brendy_bot_pic.png)

If you have questions about the content of this chapter, you can try out "BrendyBot", an AI chatbot I've trained on the lecture notes for this class (note that BrendyBot is still experimental, and can definitely make mistakes!).

https://poe.com/BrendyBot

## What is SQL?

**SQL**, which stands for Structured Query Language, is a programming language used to communicate with and manipulate relational databases. SQL was developed in the early 1970s at IBM to work with these relational databases, and it has since become the standard language for interacting with them.

One of the key features of SQL is that it is a **declarative language**. This means that when you write SQL queries, you tell the database what you want it to do, but not how to do it. It's a bit like telling a taxi driver where you want to go without dictating the exact route they should take. The database figures out the best way to execute your query and retrieve the data you requested.

A **query** is simply a question or a request for data from a database. With SQL, you can write queries to retrieve, insert, update, and delete data in a database. For example, if you have a database of sci-fi books, you might write a query to find all the books published after a certain year, like this:

```sql
SELECT title, author, publication_year
FROM Books
WHERE publication_year > 1980;
```

This query tells the database, "Give me the title, author, and publication year for all the books in the 'Books' table that were published after 1980."
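
To see the query in action, here is a small sketch that runs it against the three-row sample "Books" table from earlier in the chapter. It assumes an in-memory SQLite database created with Python's built-in `sqlite3` module, not the course database.

```python
import sqlite3

# In-memory stand-in for the small sample "Books" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT, author TEXT, publication_year INTEGER)"
)
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [
        (1, "Dune", "Frank Herbert", 1965),
        (2, "Ender's Game", "Orson Scott Card", 1985),
        (3, "Hitchhiker's Guide to the Galaxy", "Douglas Adams", 1979),
    ],
)

# We state WHAT we want; SQLite decides HOW to find the matching rows.
rows = conn.execute(
    "SELECT title, author, publication_year "
    "FROM Books "
    "WHERE publication_year > 1980"
).fetchall()
print(rows)  # [("Ender's Game", 'Orson Scott Card', 1985)]
```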

SQL is divided into several sublanguages, each with a specific purpose:

1.  **Data Definition Language (DDL).** This part of SQL is used to define and modify the structure of a database, including creating, altering, and deleting tables and other database objects.
2.  **Data Manipulation Language (DML).** DML is used to manipulate the data stored in a database. This includes inserting new data, updating existing data, and deleting data.
3.  **Data Query Language (DQL).** This is the part of SQL used to retrieve data from a database. The most common DQL command is SELECT, which is used to query data from one or more tables.
4.  **Data Control Language (DCL).** DCL is used to manage access to a database. This includes granting and revoking permissions for users to perform specific actions on the database.
5.  **Transaction Control Language (TCL).** TCL is used to manage database transactions. This includes commands to commit (permanently save) or rollback (undo) changes made to the database.
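
The sketch below labels one statement from four of these sublanguages, using Python's built-in `sqlite3` module and a made-up one-column table. DCL is left out because SQLite has no GRANT or REVOKE statements. Passing `isolation_level=None` is a choice made here so that the explicit `BEGIN` and `ROLLBACK` control the transaction.

```python
import sqlite3

# isolation_level=None: Python does not open transactions implicitly,
# so the explicit BEGIN / ROLLBACK below are in charge.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure of a (hypothetical) table.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

# DML: change the data stored in the table.
conn.execute("INSERT INTO books (title) VALUES ('Dune')")

# TCL: start a transaction, make a change, then undo it.
conn.execute("BEGIN")
conn.execute("DELETE FROM books")  # DML inside the transaction
conn.execute("ROLLBACK")           # the DELETE is undone

# DQL: read the data back.
titles = [row[0] for row in conn.execute("SELECT title FROM books")]
print(titles)  # ['Dune']
```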

By learning SQL, you gain a powerful tool for working with relational databases. Whether you're a data analyst, a software developer, or just someone who wants to understand how to manage and query data effectively, SQL is a valuable skill to have. It's used in a wide variety of settings, from small personal projects to large-scale enterprise applications, making it a versatile and essential part of working with data.

##  What is SQLite?

**SQLite** is a lightweight, file-based relational database management system (RDBMS) that is embedded directly into the application that uses it. Unlike traditional database systems that run as separate server processes, SQLite is a serverless, self-contained library that requires minimal support from external libraries or operating systems. This unique architecture makes SQLite an ideal choice for many scenarios, particularly those where a full-fledged database server might be overkill or impractical. It's one of the most widely installed pieces of software in the world, and is the database of choice for many mobile, web, and desktop applications, as well as "embedded" databases in the internet-of-things.

One of the key advantages of SQLite is its simplicity and ease of use. Setting up a new SQLite database is as simple as creating a new file on your computer. There is no need to install any additional software or configure a complex server environment. This makes SQLite an excellent choice for beginners learning SQL and for developers who need a quick and easy way to store and manage data in their applications. However, SQLite isn't well-suited for traditional business/enterprise use (where there are large numbers of users simultaneously writing to huge databases across a network). For this, you'd need to use a traditional "client-server database" (such as Oracle, SQL Server, MySQL, or Postgres).
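
As a rough illustration of the "a database is just a file" point, the following sketch creates a new SQLite database with Python's built-in `sqlite3` module. The file name `my_library.db` and the temporary directory are assumptions chosen so the example cleans up after itself.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_library.db")  # hypothetical file name
    existed_before = os.path.exists(path)

    # Connecting to a path that does not exist yet creates the database file.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE books (title TEXT)")
    conn.commit()
    conn.close()

    exists_after = os.path.exists(path)
    size_in_bytes = os.path.getsize(path)

print(existed_before, exists_after)  # False True
```

No server was started and nothing was configured: the whole database is the single file at `path`.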

Happily, SQLite supports most of the core features of SQL, including complex queries, transactions, and constraints. This means that you can use the same SQL syntax and concepts you would use with other RDBMSs, making it a great learning tool and a gateway to more advanced database systems.

Throughout this book, we'll be using SQLite to explore SQL and relational databases. By the end, you'll have a solid foundation in SQL and a practical understanding of how to use SQLite to manage and query data in your own projects, whether you're building a mobile app, a small web application, or a desktop tool for managing your personal library of sci-fi books.

## Getting Started With SQLite in Google Colab
Google Colab is a free, cloud-based Jupyter notebook environment that allows you to write and execute Python code in your browser. It's a great platform for learning and experimenting with SQL and SQLite, as it provides a simple, interactive way to run SQL queries and visualize the results.

To get started with SQLite in Google Colab, we'll use the IPython SQL magic extension. This extension allows us to write SQL queries directly in our notebook cells, prefixed with the `%%sql` magic command.
First, if needed, let's install the IPython SQL extension and load it into our notebook:

In [None]:
# !pip install ipython-sql # Not needed--already installed
%load_ext sql

Now that we have the SQL magic extension loaded, we can connect to our SQLite database. In this case, we'll be using a pre-populated database file called `sci_fi_books.db`, which contains data about a collection of sci-fi books.

To load the `sci_fi_books.db` file into our Colab notebook, we can use the `wget` command to download it from a web link:

In [None]:
!wget -N 'https://github.com/brendanpshea/database_sql/raw/main/data/sci_fi_books.db' -q

Once the file is downloaded, we can connect to it using the SQL magic extension:

In [None]:
%config SqlMagic.autopandas=True
%sql sqlite:///sci_fi_books.db

This command establishes a connection to the `sci_fi_books.db` SQLite database file in our current working directory.

With the connection established, we can now run SQL queries on our database directly in our Colab notebook cells. For example, to see a list of all the tables in our database, we can use the following command.



In [None]:
%%sql
SELECT name FROM sqlite_master WHERE type='table';

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,name
0,books
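
The same `sqlite_master` lookup works outside the `%%sql` magic too. The sketch below assumes a throwaway in-memory database with two made-up tables, created with Python's built-in `sqlite3` module, and lists them the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables, just so there is something to list.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")

# sqlite_master is SQLite's built-in catalog of tables, indexes, and so on.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
print(tables)  # ['authors', 'books']
```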


## Getting to Know the Books Table

The data we'll be working with throughout this book comes from Goodreads, a popular social cataloging website that allows users to search its database of books, annotations, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys, polls, blogs, and discussions.

For our purposes, we'll be using a dataset that contains information about a collection of science fiction books. This data has been extracted from Goodreads and formatted to fit into a SQLite database, allowing us to easily explore and analyze it using SQL.

To work effectively with a database, it's crucial to understand its structure or schema. A database **schema** is a blueprint that defines how the data is organized. It specifies what tables exist in the database, what columns each table has, and what type of data each column can store.

Here's a breakdown of the schema for our "Books" table:

-   `id`: A unique identifier for each book, serving as the PRIMARY KEY for our table. This ensures that each record in the table can be uniquely identified and accessed.
-   `title`: The title of the book, stored as a TEXT data type.
-   `series`: If the book is part of a series, this column will contain the name of that series. It's also stored as TEXT.
-   `author`: The author of the book, stored as TEXT.
-   `rating`: The average rating of the book on Goodreads, stored as a REAL number (a decimal number).
-   `language`: The language the book is written in, stored as TEXT.
-   `pages`: The number of pages in the book, stored as an INT (integer).
-   `publisher`: The name of the book's publisher, stored as TEXT.
-   `numRatings`: The total number of ratings the book has received on Goodreads, stored as an INT.
-   `firstPublishDate`: The date when the book was first published, stored as TEXT.
-   `publishDate`: The most recent publication date of the book, also stored as TEXT.

Understanding this schema is essential for writing effective SQL queries. It tells us what information we have access to and how it's structured. For example, knowing that the `rating` is stored as a REAL number tells us that we can perform mathematical operations on it, like finding the average rating across all books. Knowing that `pages` is an INT tells us that we can use it for numerical comparisons, like finding all books over 500 pages.
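
One way to inspect a schema programmatically is SQLite's `PRAGMA table_info`. The sketch below uses a cut-down, hypothetical version of the "Books" table (four of the columns listed above) in an in-memory database; the rating value is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the real Books table.
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT, rating REAL, pages INT)")
conn.execute("INSERT INTO Books VALUES (1, 'Dune', 4.25, 661)")  # rating is illustrative

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull, default value, pk)
columns = [
    (name, col_type)
    for _cid, name, col_type, *_rest in conn.execute("PRAGMA table_info(Books)")
]
print(columns)
# [('id', 'INTEGER'), ('title', 'TEXT'), ('rating', 'REAL'), ('pages', 'INT')]

# typeof() reports how a stored value is actually represented.
storage = conn.execute("SELECT typeof(rating), typeof(pages) FROM Books").fetchone()
print(storage)  # ('real', 'integer')
```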

##  SELECT and FROM

Now that we understand the structure of our database, let's dive into the fundamental building blocks of SQL queries: SELECT and FROM.

The SELECT statement is used to retrieve data from one or more tables in a database. Its basic syntax is as follows:

```sql
SELECT column1, column2, ...
FROM table_name;
```

Here's what each part of this statement means:

-   `SELECT`: This keyword initiates the query and indicates that we're about to specify which columns we want to retrieve.
-   `column1, column2, ...`: These are the names of the columns we want to include in our result set. You can list as many columns as you want, separated by commas. If you want to select all columns, you can use an asterisk (`*`) instead.
-   `FROM`: This keyword indicates that we're about to specify which table we want to retrieve data from.
-   `table_name`: This is the name of the table we're querying.

So, in plain English, this statement tells the database: "Give me the specified columns from this table."

Let's look at some examples of SELECT queries using our "Books" table.

### Selecting all columns

To select all columns from the "Books" table, we can use the asterisk (`*`) shorthand:

In [None]:
%%sql
SELECT * -- Select all columns
FROM Books -- From the Books table

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,firstPublishDate,publishDate,title,series,author,rating,language,pages,publisher,numRatings,id
0,,2008-09-14,The Hunger Games,The Hunger Games #1,Suzanne Collins,4.33,English,374.0,Scholastic Press,6376780,1
1,2003-06-21,2004-09-28,Harry Potter and the Order of the Phoenix,Harry Potter #5,J.K. Rowling,4.50,English,870.0,Scholastic Inc.,2507623,2
2,1945-08-17,1996-04-28,Animal Farm,,George Orwell,3.95,English,141.0,Signet Classics,2740713,3
3,1956-10-28,2002-09-16,The Chronicles of Narnia,The Chronicles of Narnia (Publication Order) #...,C.S. Lewis,4.26,English,767.0,HarperCollins,517740,4
4,1955-10-20,2012-09-25,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,The Lord of the Rings #0-3,J.R.R. Tolkien,4.60,English,1728.0,Ballantine Books,110146,5
...,...,...,...,...,...,...,...,...,...,...,...
6586,,1998-04-02,To Hold Infinity,,John Meaney,3.73,English,560.0,Bantam,141,6599
6587,,2015-10-01,The Natural Way of Things,,Charlotte Wood,3.53,English,320.0,Allen & Unwin,10894,6600
6588,,1983-01-01,Arafel's Saga,Arafel #1-2,C.J. Cherryh,3.69,English,408.0,"Nelson Doubleday, Inc.",1070,6601
6589,,2017-01-12,Nameless Fate,Fated Mate #1,Stephanie West,3.93,English,445.0,,508,6602


### Selecting specific columns
If we only want to retrieve certain columns, we can list them explicitly after the SELECT keyword:

In [None]:
%%sql
SELECT
  title, -- Select the title column
  author, -- Select the author column
  publishDate -- Select the publishDate column
FROM Books -- From the Books table
LIMIT 10; -- Limit the result to the first 10 rows

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,publishDate
0,The Hunger Games,Suzanne Collins,2008-09-14
1,Harry Potter and the Order of the Phoenix,J.K. Rowling,2004-09-28
2,Animal Farm,George Orwell,1996-04-28
3,The Chronicles of Narnia,C.S. Lewis,2002-09-16
4,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,J.R.R. Tolkien,2012-09-25
5,The Hitchhiker's Guide to the Galaxy,Douglas Adams,2007-06-23
6,Fahrenheit 451,Ray Bradbury,2011-11-29
7,Divergent,Veronica Roth,2012-02-28
8,Ender's Game,Orson Scott Card,2004-09-30
9,Harry Potter and the Sorcerer's Stone,J.K. Rowling,2003-11-01


###  Selecting distinct values
Sometimes, you might want to retrieve only the unique values from a column. You can do this using the DISTINCT keyword:

In [None]:
%%sql
SELECT DISTINCT author -- Select distinct values from the author column
FROM Books -- From the Books table
LIMIT 10; -- Limit the result to the first 10 distinct authors

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,author
0,Suzanne Collins
1,J.K. Rowling
2,George Orwell
3,C.S. Lewis
4,J.R.R. Tolkien
5,Douglas Adams
6,Ray Bradbury
7,Veronica Roth
8,Orson Scott Card
9,Audrey Niffenegger


You can see the difference this makes if you try the same query without using DISTINCT.

In [None]:
%%sql
SELECT author -- Select all values from the author column
FROM Books -- From the Books table
LIMIT 10; -- Limit the result to the first 10 rows

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,author
0,Suzanne Collins
1,J.K. Rowling
2,George Orwell
3,C.S. Lewis
4,J.R.R. Tolkien
5,Douglas Adams
6,Ray Bradbury
7,Veronica Roth
8,Orson Scott Card
9,J.K. Rowling


You'll notice that some names (such as J.K. Rowling) appear twice. Sometimes, you'll want this (for example, if you are trying to count the number of books she wrote). Other times, though, you'll want to use DISTINCT to get rid of these "duplicates."
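
A compact way to see the effect is to compare how many rows each version returns. This sketch assumes a tiny four-row stand-in table in an in-memory database (Python's built-in `sqlite3` module), not the full course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?)",
    [
        ("The Hunger Games", "Suzanne Collins"),
        ("Harry Potter and the Order of the Phoenix", "J.K. Rowling"),
        ("Animal Farm", "George Orwell"),
        ("Harry Potter and the Sorcerer's Stone", "J.K. Rowling"),
    ],
)

# Without DISTINCT: one row per book, so repeated authors show up repeatedly.
all_authors = [row[0] for row in conn.execute("SELECT author FROM Books")]

# With DISTINCT: each author appears once.
unique_authors = [row[0] for row in conn.execute("SELECT DISTINCT author FROM Books")]

print(len(all_authors), len(unique_authors))  # 4 3
```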

## ORDER BY
The ORDER BY clause is used to sort the results of a SQL query based on one or more columns. Its basic syntax is as follows:

```sql
SELECT
  column1,
  column2,
  ...
FROM
  table_name
ORDER BY
  column1 [ASC|DESC],
  column2 [ASC|DESC],
  ...;
```

Here's what each part of this statement means:

-   `SELECT` and `FROM` are used as before to specify the columns to retrieve and the table to retrieve them from.
-   `ORDER BY` is followed by the column(s) you want to use for sorting the results.
-   For each column, you can optionally specify `ASC` for ascending order (smallest to largest, or A to Z) or `DESC` for descending order (largest to smallest, or Z to A). If not specified, `ASC` is used by default.
-   If multiple columns are specified, the results are sorted by the first column, then by the second column within each value of the first column, and so on.

Now, let's look at a simple example:


In [None]:
%%sql
SELECT
  title, -- Select the title column
  author, -- Select the author column
  rating -- Select the rating column
FROM Books -- From the Books table
ORDER BY rating -- Sort the results by rating in ascending order
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,rating
0,Revealing Eden,Victoria Foyt,1.99
1,Skull Flowers,Jazon Dion Fletcher,2.48
2,موسم صيد الغزلان,أحمد مراد,2.74
3,Blueprint: Blaupause,Charlotte Kerner,2.78
4,Lost,Gregory Maguire,2.82
5,Redemption Prep,Samuel Miller,2.82
6,Alpha Centauri,William Barton,2.87
7,Corvus,L. Lee Lowe,2.9
8,Buck Rogers in the 25th Century: The Western P...,Paul S. Newman,2.91
9,L'an 2440,Louis-Sébastien Mercier,2.92


We can also do this in descending order (with the highest ratings first):

In [None]:
%%sql
SELECT
  title, -- Select the title column
  author, -- Select the author column
  rating -- Select the rating column
FROM Books -- From the Books table
ORDER BY rating DESC -- Sort the results by rating in descending order
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,rating
0,"Kiss Me, I'm Irish",John Blandly,5.0
1,The Present,Kenneth Thomas,4.92
2,Maya of the Inbetween,Sita Bennett,4.86
3,Predestination: The Future is History,S. W. Cotton,4.85
4,The Beachhead,David Anderson,4.84
5,Assault On Utopia: Part 1,Steven P Sharp,4.8
6,Insectland,Neil D. Ostroff,4.8
7,"The Way of Kings, Part 2",Brandon Sanderson,4.79
8,"Harry Potter Boxed Set, Books 1-5 (Harry Potte...",J.K. Rowling,4.78
9,Words of Radiance,Brandon Sanderson,4.75


Finally, we can order by multiple columns (for example, by author and rating).
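
As a sketch of a multi-column sort, the example below orders a handful of books by author (A to Z) and then, within each author, by rating from highest to lowest. It assumes a five-row in-memory table built with Python's built-in `sqlite3` module; the titles and ratings are taken from the outputs shown elsewhere in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, author TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("Harry Potter and the Goblet of Fire", "J.K. Rowling", 4.56),
        ("The Way of Kings", "Brandon Sanderson", 4.63),
        ("The Name of the Wind", "Patrick Rothfuss", 4.53),
        ("Words of Radiance", "Brandon Sanderson", 4.75),
        ("Harry Potter and the Deathly Hallows", "J.K. Rowling", 4.62),
    ],
)

# First sort key: author ascending. Second sort key: rating descending,
# used only to order books that share the same author.
rows = conn.execute(
    "SELECT author, title, rating "
    "FROM Books "
    "ORDER BY author ASC, rating DESC"
).fetchall()
for author, title, rating in rows:
    print(author, "|", title, "|", rating)
# Brandon Sanderson | Words of Radiance | 4.75
# Brandon Sanderson | The Way of Kings | 4.63
# J.K. Rowling | Harry Potter and the Deathly Hallows | 4.62
# J.K. Rowling | Harry Potter and the Goblet of Fire | 4.56
# Patrick Rothfuss | The Name of the Wind | 4.53
```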

##  WHERE

The WHERE clause in SQL is used to filter the results of a query based on a specified condition. It allows you to retrieve only the rows that satisfy a particular criterion, making it a powerful tool for narrowing down your query results to only the data you're interested in.

The basic syntax of a SQL query with a WHERE clause is as follows:

```sql
SELECT
  column1,
  column2,
  ...
FROM
  table_name
WHERE
  condition;
```

Here's what each part of this statement means:

-   `SELECT` and `FROM` are used as before to specify the columns to retrieve and the table to retrieve them from.
-   `WHERE` is followed by a condition that each row must satisfy to be included in the result set.
-   The condition is usually a comparison between a column value and a constant, or between two column values.


Let's look at an example. Let's find all books in our "Books" table with a rating higher than 4.5:


In [None]:
%%sql
SELECT
  title, -- Select the title column
  author, -- Select the author column
  rating -- Select the rating column
FROM Books -- From the Books table
WHERE rating > 4.5 -- Filter the results to only include books with a rating higher than 4.5
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,rating
0,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,J.R.R. Tolkien,4.6
1,Harry Potter and the Deathly Hallows,J.K. Rowling,4.62
2,Harry Potter and the Prisoner of Azkaban,J.K. Rowling,4.57
3,Harry Potter and the Goblet of Fire,J.K. Rowling,4.56
4,Harry Potter and the Half-Blood Prince,J.K. Rowling,4.57
5,The Name of the Wind,Patrick Rothfuss,4.53
6,A Storm of Swords,George R.R. Martin,4.53
7,The Wise Man's Fear,Patrick Rothfuss,4.56
8,The Way of Kings,Brandon Sanderson,4.63
9,Harry Potter Series Box Set,J.K. Rowling,4.73


The condition in the WHERE clause can use various comparison operators:

-   `=` for equality
-   `<>` or `!=` for inequality
-   `<` for less than
-   `>` for greater than
-   `<=` for less than or equal to
-   `>=` for greater than or equal to

You can also combine multiple conditions using logical operators:

-   `AND`: Both conditions must be true.
-   `OR`: At least one condition must be true.
-   `NOT`: The condition must not be true.

For example, to find all books with more than 500 pages and more than 10,000 ratings:

In [None]:
%%sql
SELECT
  title,
  author,
  pages,
  numRatings
FROM
  Books
WHERE
  -- Long books that have many ratings
  pages > 500 AND numRatings > 10000
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,pages,numRatings
0,Harry Potter and the Order of the Phoenix,J.K. Rowling,870,2507623
1,The Chronicles of Narnia,C.S. Lewis,767,517740
2,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,J.R.R. Tolkien,1728,110146
3,A Game of Thrones,George R.R. Martin,835,2003043
4,Dune,Frank Herbert,661,765785
5,The Stand,Stephen King,1153,616021
6,Harry Potter and the Deathly Hallows,J.K. Rowling,759,2811637
7,The Fellowship of the Ring,J.R.R. Tolkien,527,2355237
8,Atlas Shrugged,Ayn Rand,1168,353814
9,Harry Potter and the Goblet of Fire,J.K. Rowling,734,2594622
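
To compare AND, OR, and NOT side by side, here is a small sketch that applies each to the same five books. It assumes an in-memory table built with Python's built-in `sqlite3` module; the `titles` helper is a hypothetical convenience function, and the page and rating counts are copied from outputs shown in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, pages INT, numRatings INT)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("Dune", 661, 765785),
        ("Animal Farm", 141, 2740713),
        ("To Hold Infinity", 560, 141),
        ("The Natural Way of Things", 320, 10894),
        ("Arafel's Saga", 408, 1070),
    ],
)

def titles(condition):
    """Hypothetical helper: titles matching a WHERE condition, sorted by title."""
    sql = f"SELECT title FROM Books WHERE {condition} ORDER BY title"
    return [row[0] for row in conn.execute(sql)]

both = titles("pages > 500 AND numRatings > 10000")   # long AND popular
either = titles("pages > 500 OR numRatings > 10000")  # long OR popular (or both)
not_long = titles("NOT (pages > 500)")                # everything that is not long

print(both)      # ['Dune']
print(either)    # ['Animal Farm', 'Dune', 'The Natural Way of Things', 'To Hold Infinity']
print(not_long)  # ['Animal Farm', "Arafel's Saga", 'The Natural Way of Things']
```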


In [1]:
import base64
from IPython.display import Image, display
import matplotlib.pyplot as plt

def mm(graph):
    """Render a Mermaid diagram definition as an image via mermaid.ink."""
    # The service expects the diagram source as URL-safe base64 in the URL path.
    graphbytes = graph.encode("utf8")
    base64_bytes = base64.urlsafe_b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))


mm("""
graph TD
    A[SQL Query] --> B[SELECT]
    A --> C[FROM]
    A --> D[WHERE]
    B -->|Specifies| E[Columns to retrieve]
    C -->|Identifies| F[Table or tables]
    D -->|Filters| G[Rows based on conditions]
    E --> H[Result Set]
    F --> H
    G --> H
""")

## SQLite Syntax Rules
Before continuing our study of SQL, it's worth taking a moment to review some basic syntax rules. As we've been discussing, a basic SQLite query follows this structure:

```sql
SELECT column1, column2
FROM table_name
WHERE condition;
```

### Key Syntax Rules

#### 1. Semicolons (;)
- End each SQL statement with a semicolon.
- This tells SQLite that the statement is complete.

Example:
```sql
SELECT * FROM users;
```

#### 2. Capitalization
- SQLite is not case-sensitive for keywords (SELECT, FROM, WHERE, etc.).
- However, it's a good practice to capitalize keywords for readability.
- Table and column names are not case-sensitive in SQLite either, even when they're in double quotes. (Some other databases, such as PostgreSQL, do treat double-quoted names as case-sensitive.)

Example:
```sql
select * from Users;  -- This works
SELECT * FROM users;  -- This also works
SELECT * FROM "Users";  -- Also works in SQLite; case-sensitive in some other databases
```

#### 3. Line Breaks
- You can write your query on a single line or split it across multiple lines for better readability.
- SQLite ignores extra whitespace.

Example:
```sql
SELECT name, age FROM users WHERE age > 18;

-- This is equivalent to:
SELECT name, age
FROM users
WHERE age > 18;
```

#### 4. Single vs Double Quotes
- Use single quotes ('') for string literals (text values).
- Use double quotes ("") for table or column names if they contain spaces or other special characters, or if they match a reserved keyword.

Example:
```sql
SELECT * FROM users WHERE name = 'Sam Q';
SELECT * FROM "User Table" WHERE "First Name" = 'Sam Q';
```
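
The quoting rules can be tried out with a short sketch like the one below, which assumes an in-memory database created with Python's built-in `sqlite3` module and reuses the made-up "User Table" name from the example above. It also shows the usual way to put an apostrophe inside a string literal: write it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Double quotes around identifiers that contain spaces.
conn.execute('CREATE TABLE "User Table" ("First Name" TEXT)')

# Single quotes around the string literal 'Sam Q'.
conn.execute("""INSERT INTO "User Table" ("First Name") VALUES ('Sam Q')""")

rows = conn.execute(
    """SELECT "First Name" FROM "User Table" WHERE "First Name" = 'Sam Q'"""
).fetchall()
print(rows)  # [('Sam Q',)]

# An apostrophe inside a string literal is escaped by doubling it.
escaped = conn.execute("SELECT 'Ender''s Game'").fetchone()[0]
print(escaped)  # Ender's Game
```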

#### 5. Comments
- Use `--` for single-line comments.
- Use `/* */` for multi-line comments.

Example:
```sql
-- This is a single-line comment
SELECT * FROM users; /* This is a
multi-line comment */
```

## Handling NULL Values

In SQL, NULL is a special value that represents a missing or unknown value. It's important to understand how to handle NULLs in your queries, as they can sometimes lead to unexpected results.

In our "Books" table, both the "series" and "firstPublishDate" columns have some NULL values. This means that for some books, the series information or the original publication date is not known or not applicable.

When filtering data using the WHERE clause, you need to use special operators to check for NULL values:

-   `IS NULL`: Checks if a value is NULL.
-   `IS NOT NULL`: Checks if a value is not NULL.

For example, to find all books that are part of a series (i.e., the "series" column is not NULL):

In [None]:
%%sql
SELECT
  title,
  author,
  series
FROM
  Books
WHERE
  -- books that ARE part of a series
  series IS NOT NULL
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,series
0,The Hunger Games,Suzanne Collins,The Hunger Games #1
1,Harry Potter and the Order of the Phoenix,J.K. Rowling,Harry Potter #5
2,The Chronicles of Narnia,C.S. Lewis,The Chronicles of Narnia (Publication Order) #...
3,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,J.R.R. Tolkien,The Lord of the Rings #0-3
4,The Hitchhiker's Guide to the Galaxy,Douglas Adams,The Hitchhiker's Guide to the Galaxy #1
5,Divergent,Veronica Roth,Divergent #1
6,Ender's Game,Orson Scott Card,Ender's Saga #1
7,Harry Potter and the Sorcerer's Stone,J.K. Rowling,Harry Potter #1
8,A Wrinkle in Time,Madeleine L'Engle,Time Quintet #1
9,A Game of Thrones,George R.R. Martin,A Song of Ice and Fire #1


Similarly, to find all books where the original publication date is unknown:

In [None]:
%%sql
SELECT
  title,
  author,
  firstPublishDate
FROM
  Books
WHERE
  -- books where the original publication date is unknown
  firstPublishDate IS NULL
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author,firstPublishDate
0,The Hunger Games,Suzanne Collins,
1,Harry Potter and the Deathly Hallows,J.K. Rowling,
2,Insurgent,Veronica Roth,
3,The Maze Runner,James Dashner,
4,The Selection,Kiera Cass,
5,Catching Fire,Suzanne Collins,
6,Uglies,Scott Westerfeld,
7,Cinder,Marissa Meyer,
8,The Wise Man's Fear,Patrick Rothfuss,
9,Ready Player One,Ernest Cline,


It's important to note that you cannot use the equality operator (=) to check for NULLs. For example, the following query will not return any results:

In [None]:
%%sql
SELECT
  title,
  author,
  firstPublishDate
FROM
  Books
WHERE
  firstPublishDate = NULL; -- This will not work as expected!

 * sqlite:///sci_fi_books.db
Done.


This is because in SQL, any comparison with NULL (even `NULL = NULL`) evaluates to NULL, which is treated as false in the context of a WHERE clause.
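
This behavior is easy to check directly. The sketch below assumes a three-row in-memory table built with Python's built-in `sqlite3` module, in which two books have no `firstPublishDate`. Python's `None` is stored as NULL and is also what a NULL result comes back as.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, firstPublishDate TEXT)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?)",
    [
        ("Animal Farm", "1945-08-17"),
        ("The Hunger Games", None),  # None is stored as NULL
        ("Cinder", None),
    ],
)

# "= NULL" never evaluates to true, so no rows come back.
with_equals = conn.execute(
    "SELECT title FROM Books WHERE firstPublishDate = NULL"
).fetchall()

# "IS NULL" is the correct test for missing values.
with_is_null = conn.execute(
    "SELECT title FROM Books WHERE firstPublishDate IS NULL"
).fetchall()

print(len(with_equals), len(with_is_null))  # 0 2

# The comparison itself yields NULL (None in Python), while IS NULL yields 1 (true).
comparison = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(comparison)  # (None, 1)
```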

##  LIKE

The LIKE operator in SQL is used to search for a specified pattern in a column. It is often used in conjunction with the WHERE clause to filter rows based on a partial match rather than an exact match.

The basic syntax of using LIKE in a WHERE clause is as follows:

```sql
SELECT
  column1,
  column2,
  ...
FROM
  table_name
WHERE
  column LIKE pattern;
```

Here:

-   `column` is the name of the column you want to search.
-   `pattern` is the pattern you want to match. It can include wildcards:
    -   `%` matches any sequence of zero or more characters.
    -   `_` matches any single character.

For example:
- `title LIKE 'a%'` matches any title beginning with 'a' (case-insensitive).
- `title LIKE '%a'` matches any title ending with 'a' (case-insensitive).
- `title LIKE '%a%'` matches any title with an 'a' (case-insensitive) anywhere in the title.
- `title LIKE '_a%'` matches any title with an 'a' as the second character (case-insensitive).
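
To experiment with the two wildcards, the sketch below runs several patterns against five made-up rows in an in-memory database (Python's built-in `sqlite3` module). The `matching` helper is hypothetical. Note that `_` stands for exactly one character, so a pattern made only of underscores matches titles of exactly that length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO Books (title) VALUES (?)",
    [("Dune",), ("Ubik",), ("Solaris",), ("Anathem",), ("Gateway",)],
)

def matching(pattern):
    """Hypothetical helper: titles matching a LIKE pattern, in insertion order."""
    sql = "SELECT title FROM Books WHERE title LIKE ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_a = matching("a%")   # LIKE ignores ASCII case by default in SQLite
ends_with_s = matching("%s")
contains_a = matching("%a%")
second_char_u = matching("_u%")  # any one character, then 'u', then anything
four_chars = matching("____")    # exactly four characters

print(starts_with_a)  # ['Anathem']
print(ends_with_s)    # ['Solaris']
print(contains_a)     # ['Solaris', 'Anathem', 'Gateway']
print(second_char_u)  # ['Dune']
print(four_chars)     # ['Dune', 'Ubik']
```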

Let's look at some examples using our "Books" table.

### Example: Book titles starting with 'The'
To find all books whose title starts with "The":

In [None]:
%%sql
SELECT
  title,
  author
FROM
  Books
WHERE
  -- books whose title starts with "The"
  title LIKE 'The%'
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author
0,The Hunger Games,Suzanne Collins
1,The Chronicles of Narnia,C.S. Lewis
2,The Hitchhiker's Guide to the Galaxy,Douglas Adams
3,The Time Traveler's Wife,Audrey Niffenegger
4,The Princess Bride,William Goldman
5,The Handmaid's Tale,Margaret Atwood
6,The Giver,Lois Lowry
7,The Stand,Stephen King
8,The Fellowship of the Ring,J.R.R. Tolkien
9,The Road,Cormac McCarthy


### Example: Author Names Containing `Lewis`
Now, let's look for author names that contain "lewis" anywhere.

In [None]:
%%sql
SELECT
  title,
  author
FROM
  Books
WHERE
  -- author names that contain "lewis" anywhere
  author LIKE '%lewis%'
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,author
0,The Chronicles of Narnia,C.S. Lewis
1,The Voyage of the Dawn Treader,C.S. Lewis
2,The Last Battle,C.S. Lewis
3,The Silver Chair,C.S. Lewis
4,Prince Caspian,C.S. Lewis
5,Perelandra,C.S. Lewis
6,That Hideous Strength,C.S. Lewis
7,Space Trilogy: Out of the Silent Planet / Pere...,C.S. Lewis
8,Out of the Silent Planet,C.S. Lewis
9,It Can't Happen Here,Sinclair Lewis


### Case Study: Snape's SQL Lesson
Professor Snape swept into the dungeon classroom, his robes swirling around him. He turned to face the students, his dark eyes glittering.

"Today," he said, his voice barely above a whisper, "we will be learning the fundamentals of SQL. Open your textbooks to page 394."

He waved his wand, and a complex diagram appeared on the chalkboard. "SQL, or Structured Query Language, is the language we use to communicate with databases. It allows us to retrieve, manipulate, and analyze data with precision and efficiency. Now, who can tell me the basic structure of an SQL query? Mr. Weasley?"

Ron gulped. "Um... SELECT, FROM, WHERE?"

Snape's lip curled. "Correct, but incomplete. A full SQL query consists of SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT clauses, in that order. Five points from Gryffindor for your lack of thoroughness."

He turned to the class. "Now, let's start with a simple query. Miss Granger, write a query to select all columns and rows from the 'students' table."

Hermione stepped up to the chalkboard and confidently wrote: `SELECT * FROM students;`

"Correct," Snape said grudgingly. "Five points to Gryffindor. Now, Mr. Malfoy, write a query to select only the 'name' and 'house' columns from the 'students' table, but with a syntax error."

Malfoy smirked and wrote: `SELECT name, house students;`

"Incorrect," Snape said. "You've forgotten the FROM keyword. The correct query is: `SELECT name, house FROM students;` Five points from Slytherin."

"Miss Lovegood," Snape said, turning to Luna. "Write a query to select all students from Ravenclaw."

Luna dreamily approached the chalkboard and wrote: `SELECT * FROM students WHERE house = 'Ravenclaw';`

"Correct," Snape said, nodding slightly. "Five points to Ravenclaw. Now, Mr. Potter, write a query to select all students whose names begin with the letter 'H', but with a logical error."

Harry stepped up and wrote: `SELECT * FROM students WHERE name = 'H%';`

"Incorrect," Snape said. "You've used the equals operator instead of the LIKE operator. The correct query is: `SELECT * FROM students WHERE name LIKE 'H%';` Five points from Gryffindor."

The lesson continued, with Snape challenging the students with increasingly complex queries.

"Miss Granger," he said, "write a query to select the first 5 students in alphabetical order by name."

Hermione wrote: `SELECT * FROM students ORDER BY name ASC LIMIT 5;`

"Correct. Five points to Gryffindor."

"Miss Lovegood, write a query to select all students who are not in Slytherin."

Luna wrote: `SELECT * FROM students WHERE house <> 'Slytherin';`

"Correct. Five points to Ravenclaw."

As the class ended, Snape assigned a three-foot essay on the different types of SQL operators and their uses. The students filed out, their heads spinning with the intricacies of SQL, but Hermione and Luna shared a satisfied smile, knowing they had mastered the day's lesson.
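Harry's mistake is easy to reproduce. The sketch below builds a hypothetical four-row `students` table (the rows are invented for illustration) and compares `=` with `LIKE` for the pattern `'H%'`.

```python
import sqlite3

# Hypothetical 'students' table with a handful of invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, house TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [
        ("Harry", "Gryffindor"),
        ("Hermione", "Gryffindor"),
        ("Luna", "Ravenclaw"),
        ("Draco", "Slytherin"),
    ],
)

# With =, 'H%' is treated as a literal two-character string, so nothing matches.
equals_matches = [
    row[0] for row in conn.execute("SELECT name FROM students WHERE name = 'H%'")
]

# With LIKE, % acts as a wildcard, so every name starting with H matches.
like_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students WHERE name LIKE 'H%' ORDER BY name"
    )
]

print(equals_matches)
print(like_matches)
```

The first query runs without error but returns no rows, which is what makes it a logical error rather than a syntax error.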

## Aggregate Functions and Aliasing with AS

Aggregate functions are powerful tools in SQL that allow you to perform calculations on sets of rows. They operate on a set of values and return a single result. Some common aggregate functions are:

-   COUNT(): Counts the number of rows that match the specified criteria.
-   SUM(): Calculates the sum of a set of values.
-   AVG(): Calculates the average of a set of values.
-   MAX(): Returns the largest value in a set of values.
-   MIN(): Returns the smallest value in a set of values.
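Before running these on the full dataset, it can help to try all five functions on a table small enough to verify by hand. This is a minimal sketch with three invented rows, using Python's built-in `sqlite3` module.

```python
import sqlite3

# Three invented rows, so every aggregate can be checked by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, rating REAL, pages INTEGER)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [("Book A", 4.0, 300), ("Book B", 3.5, 450), ("Book C", 4.5, 150)],
)

# Each aggregate collapses the three rows into a single value.
row = conn.execute(
    """
    SELECT COUNT(*), SUM(pages), AVG(rating), MAX(rating), MIN(rating)
    FROM Books
    """
).fetchone()

total_books, total_pages, avg_rating, max_rating, min_rating = row
print(row)
```

The query returns one row: 3 books, 900 pages in total, an average rating of 4.0, a maximum of 4.5, and a minimum of 3.5.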

### Examples of Aggregate Functions

Let's look at some examples using our "Books" table.

In [None]:
%%sql
--Find total number of books
SELECT
  COUNT(*) AS total_books
FROM
  Books;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,total_books
0,6591


In [None]:
%%sql
--Average ratings of books
SELECT
  AVG(rating) AS avg_rating
FROM
  Books;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,avg_rating
0,3.988885


In [None]:
%%sql
--Total number of pages across all books
SELECT
  SUM(pages) AS total_pages
FROM
  Books;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,total_pages
0,2495231


In [None]:
%%sql
--Find maximum and minimum rating
SELECT
  MAX(rating) AS max_rating,
  MIN(rating) AS min_rating
FROM
  Books;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,max_rating,min_rating
0,5.0,1.99


### Combining Aggregate Functions with WHERE Clauses
We can also combine aggregate functions with WHERE clauses to calculate values based on certain conditions.

In [None]:
%%sql
--Find the average rating of books published after 2000
SELECT
  AVG(rating) AS avg_rating_after_2000
FROM
  Books
WHERE
  publishDate > 2000;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,avg_rating_after_2000
0,3.994752


In [None]:
%%sql
--Find the number of books with more than 500 pages
SELECT
  COUNT(*) AS books_over_500_pages
FROM
  Books
WHERE
  pages > 500;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,books_over_500_pages
0,1108


In [None]:
%%sql
--Find the maximum rating of books with at least 10,000 ratings
SELECT
  MAX(rating) AS max_rating_over_10000
FROM
  Books
WHERE
  numRatings >= 10000;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,max_rating_over_10000
0,4.79
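The key idea in these examples is that `WHERE` removes rows first, and the aggregate is then computed over whatever remains. The sketch below uses three invented rows to show this, and also how `>=` and `>` differ when a value sits exactly on the boundary.

```python
import sqlite3

# Invented rows; "Book B" sits exactly on the 10,000 boundary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, rating REAL, numRatings INTEGER)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [("Book A", 4.8, 500), ("Book B", 4.2, 10000), ("Book C", 3.9, 25000)],
)


def max_rating(where_clause=""):
    """Return MAX(rating), optionally restricted by a WHERE clause."""
    sql = f"SELECT MAX(rating) FROM Books {where_clause}"
    return conn.execute(sql).fetchone()[0]


max_all = max_rating()                                   # all three rows
max_at_least = max_rating("WHERE numRatings >= 10000")   # Book B and Book C
max_more_than = max_rating("WHERE numRatings > 10000")   # Book C only

print(max_all, max_at_least, max_more_than)
```

The highest-rated book overall drops out once the filter is applied, and switching from `>=` to `>` changes the answer again.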


### Aliasing with AS
The `AS` keyword is used to give a column or an expression a temporary name (an **alias**) in the result set. This can make the result set more readable.

In the previous examples, we used AS to give descriptive names to the results of the aggregate functions, like "total_books", "avg_rating", "max_rating", etc. This makes the output easier to understand at a glance.

Aggregate functions are essential for summarizing and analyzing data in SQL. They allow you to quickly calculate totals, averages, maximums, minimums, and other summary statistics across entire sets of rows or based on specific conditions. Combined with the aliasing capabilities of the AS keyword, they provide a powerful way to extract insights from your data.
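One way to see what an alias does is to inspect the column names a query reports. The sketch below uses Python's built-in `sqlite3` module and a hypothetical two-row table; `cursor.description` holds the name of each result column.

```python
import sqlite3

# Hypothetical two-row table, just enough to run a COUNT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT)")
conn.executemany("INSERT INTO Books VALUES (?)", [("Book A",), ("Book B",)])

# Without AS, the database chooses the column name itself.
cur = conn.execute("SELECT COUNT(*) FROM Books")
unaliased_name = cur.description[0][0]

# With AS, the result column carries the name we asked for.
cur = conn.execute("SELECT COUNT(*) AS total_books FROM Books")
aliased_name = cur.description[0][0]
total_books = cur.fetchone()[0]

print(unaliased_name)
print(aliased_name, total_books)
```

The alias changes only the label on the result; the counted value is the same either way.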


## Basic Math Operations and Functions in SQL

SQL supports various mathematical operations and functions that allow you to perform calculations on numeric data directly within your queries. These can be used in the SELECT clause, the WHERE clause, or in combination with aggregate functions.

The basic arithmetic operators in SQL are:

-   Addition: `+`
-   Subtraction: `-`
-   Multiplication: `*`
-   Division: `/`
-   Modulo (remainder): `%`

For example, let's say we want to calculate the price after a 10% discount for a book that costs $20:

In [None]:
%%sql
SELECT
  20 - (20 * 0.1) AS discounted_price;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,discounted_price
0,18.0
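The operators can be tried without any table at all. One detail worth knowing is that in SQLite, dividing one integer by another gives an integer result. The sketch below runs each operator once using Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No FROM clause is needed when the query only uses literal values.
row = conn.execute(
    """
    SELECT
      7 + 2,    -- addition
      7 - 2,    -- subtraction
      7 * 2,    -- multiplication
      7 / 2,    -- integer division in SQLite: the fraction is dropped
      7 / 2.0,  -- a decimal operand gives a decimal result
      7 % 2     -- remainder after division
    """
).fetchone()

print(row)
```

If you need a decimal result from two integer columns, making one operand a decimal (for example, multiplying by `1.0`) is a common workaround.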


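SQL also provides a set of built-in mathematical functions, though availability varies by database system. In SQLite, `ROUND()` and `ABS()` are always available, while `FLOOR()`, `CEIL()`, `SQRT()`, and `POWER()` require a build that includes the optional math functions (SQLite 3.35 or later). Some commonly used ones are: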

-   `ROUND(number, decimal_places)`: Rounds a number to a specified number of decimal places.
-   `FLOOR(number)`: Returns the largest integer value that is less than or equal to the number.
-   `CEIL(number)` or `CEILING(number)`: Returns the smallest integer value that is greater than or equal to the number.
-   `ABS(number)`: Returns the absolute value of the number.
-   `SQRT(number)`: Returns the square root of the number.
-   `POWER(number, power)`: Returns the result of raising the number to the specified power.
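Here is a small sketch of these functions in SQLite, using Python's built-in `sqlite3` module. It assumes nothing about how your SQLite was built: `ROUND()` and `ABS()` are called directly, while the other four are wrapped in a `try` block in case they are unavailable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ROUND and ABS are core SQLite functions.
rounded, absolute = conn.execute(
    "SELECT ROUND(3.14159, 2), ABS(-7)"
).fetchone()

# FLOOR, CEIL, SQRT, and POWER exist only in builds with the math functions.
try:
    extra = conn.execute(
        "SELECT FLOOR(3.7), CEIL(3.2), SQRT(16), POWER(2, 3)"
    ).fetchone()
except sqlite3.OperationalError:
    extra = None  # this SQLite build does not include the math functions

print(rounded, absolute)
print(extra)
```

If `extra` prints as `None`, your environment's SQLite lacks the optional math functions, and queries that use them will fail with a "no such function" error.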

Let's look at an example using `ROUND()`. Say we want to see the average rating of each book rounded to 1 decimal place:

In [None]:
%%sql
--Get rounded rating
SELECT
  title,
  ROUND(rating, 1) AS rounded_rating
FROM
  Books
LIMIT 10;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,title,rounded_rating
0,The Hunger Games,4.3
1,Harry Potter and the Order of the Phoenix,4.5
2,Animal Farm,4.0
3,The Chronicles of Narnia,4.3
4,J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...,4.6
5,The Hitchhiker's Guide to the Galaxy,4.2
6,Fahrenheit 451,4.0
7,Divergent,4.2
8,Ender's Game,4.3
9,Harry Potter and the Sorcerer's Stone,4.5


Mathematical operations and functions can be particularly useful when combined with aggregate functions.

For example, let's calculate the average number of pages across all books, rounded to the nearest whole number:

In [None]:
%%sql
--Calculate average number of pages, round to 0 decimal places
SELECT
  ROUND(AVG(pages), 0) AS avg_pages
FROM
  Books;

 * sqlite:///sci_fi_books.db
Done.


Unnamed: 0,avg_pages
0,391.0


## Giles the Librarian on Writing Good SQL Queries
*clears throat and speaks in a British accent*

Ah, yes, writing efficient SQL queries for a single table. It's rather like navigating a labyrinth of ancient tomes, you see. One must be precise and methodical to find the desired information without getting lost in the stacks.

Now, let's start at the beginning, shall we? Always begin your query with the SELECT statement, followed by the FROM clause. It's like declaring your intent to the powers that be. "I seek the knowledge contained within these columns, and I shall find it in this table!"

For example, if you were searching for a particular vampire's name and age in the "Vampires" table, you might start with:

```sql
-- Good capitalization makes things clear
SELECT name, age
FROM Vampires;
```

Notice how I've capitalized the keywords? It's not strictly necessary, but it does make the incantation - er, query - much easier to read. Like using a bookmark to find your place in a grimoire.

Now, let's say you only wanted to find vampires over a certain age. You'd use a WHERE clause to filter the results, like so:

```sql
--Most queries need a WHERE clause...
SELECT name, age
FROM Vampires
WHERE age > 100;
```

This would return only the names and ages of vampires over a century old. It's rather like only pulling the relevant books off the shelf, instead of lugging the entire stack to your desk.

If you needed the results in a particular order, perhaps alphabetically by name, you'd invoke the ORDER BY clause:

```sql
--ORDER BY is helpful for interpreting data
--It can slow things down on large tables, though
SELECT name, age
FROM Vampires
WHERE age > 100
ORDER BY name;
```

But be judicious with your sorting! Like reorganizing the entire library for each query, it can slow things down considerably.

When trying out a new query, it's wise to use the LIMIT clause to retrieve only a few results at first. Think of it as skimming a book's table of contents before committing to reading the entire tome. For instance:

```sql
--LIMIT caps how many rows come back
SELECT name, age
FROM Vampires
WHERE age > 100
ORDER BY name
LIMIT 5;
```

This would return only the first five results, allowing you to check your work before retrieving every matching row.
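To watch the clauses work together, the final query can be run against a tiny "Vampires" table. This is only a sketch: the table contents and ages below are invented for illustration, and it uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical Vampires table; the ages are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vampires (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Vampires VALUES (?, ?)",
    [
        ("Spike", 126),
        ("Angel", 243),
        ("Darla", 400),
        ("Drusilla", 140),
        ("Harmony", 20),
        ("The Master", 600),
        ("Kakistos", 1000),
    ],
)

# WHERE filters the rows, ORDER BY sorts what is left, LIMIT keeps the first 5.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT name, age
        FROM Vampires
        WHERE age > 100
        ORDER BY name
        LIMIT 5
        """
    )
]

print(names)
```

Harmony is removed by the `WHERE` clause, and "The Master" is cut off by `LIMIT` because it sorts last among the six remaining names.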

Remember to comment your queries, like leaving helpful notes in the margins of a book. It will make it much easier to decipher your work later on, or for another researcher to follow in your footsteps.

And finally, always double-check your query before executing it, especially if it modifies data! One misplaced symbol can have catastrophic consequences, like accidentally setting fire to the rare books section. Believe me, I've seen it happen.

*takes off glasses and polishes them*

So, in summary, writing effective SQL queries is rather like being a responsible librarian. Be precise, be organized, and above all, respect the power you wield. The knowledge you seek is at your fingertips - use it wisely!

### Practice Your SQL
You can run the following program to practice writing SQL queries:


In [None]:
!wget https://github.com/brendanpshea/colab-utilities/raw/main/sql_select_quiz.py -q -nc
from sql_select_quiz import sql_select_quiz_from_id
sql_select_quiz_from_id("books")


### Review With Quizlet

In [None]:
%%html
<iframe src="https://quizlet.com/819299445/learn/embed?i=psvlh&x=1jj1" height="600" width="100%" style="border:0"></iframe>

## Glossary

| Term | Definition |
|------|------------|
| Relational model | A database model that organizes data into one or more tables (relations) of rows and columns, with a unique key for each row |
| Table (relation) | A collection of related data entries consisting of rows and columns |
| Row (record) | A single, implicitly structured data item in a table |
| Column (field) | A set of data values of a particular type, one for each row of the table |
| Primary key | A unique identifier for each row in a table |
| Foreign key | A field in one table that uniquely identifies a row of another table, used to establish relationships between tables |
| SQL | Structured Query Language, a standard language for managing and manipulating relational databases |
| Declarative language | A programming paradigm that expresses the logic of a computation without describing its control flow |
| Query | A request for data or information from a database |
| SQLite | A self-contained, serverless, and zero-configuration relational database engine |

### SQL Terms
| Term | Definition |
|------|------------|
| SELECT | In the query "_____ column1, column2, etc. FROM table", this term specifies the columns to retrieve. |
| * | In the query "SELECT _____ FROM table", this symbol is used to select all columns. |
| FROM | In the query "SELECT column1, column2 _____ table", this clause specifies the table to query. |
| ORDER BY | In "SELECT column FROM table _____ column", this clause sorts the result set. |
| ASC | In "ORDER BY column _____", this keyword sorts results in ascending order (default if omitted). |
| DESC | In "ORDER BY column _____", this keyword sorts results in descending order. |
| LIMIT | In "SELECT column FROM table _____ 10", this clause restricts the number of rows returned. |
| DISTINCT | In "SELECT _____ column FROM table", this keyword eliminates duplicate rows from the result set. |
| WHERE | In "SELECT column FROM table _____ condition", this clause filters rows based on a condition. |
| AND | In "WHERE condition1 _____ condition2", this operator combines multiple conditions (all must be true). |
| OR | In "WHERE condition1 _____ condition2", this operator combines multiple conditions (at least one must be true). |
| NOT | In "WHERE _____ condition", this operator negates a condition. |
| IS NULL | In "WHERE column _____", this checks if a value is missing or undefined in the database. |
| IS NOT NULL | In "WHERE column _____", this checks if a value is present and defined in the database. |
| LIKE | In "WHERE column _____ pattern", this operator performs pattern matching with wildcard characters. |
| % | In "WHERE column LIKE '____text'", this wildcard matches any sequence of zero or more characters. |
| _ | In "WHERE column LIKE '____'", this wildcard matches any single character. |
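| [CBA] | In "WHERE column LIKE '___at'", this matches any single character in the set (C, B, or A). Supported by `LIKE` in some systems such as SQL Server, but not by SQLite's `LIKE`. |
| [A-Z] | In "WHERE column LIKE '____at'", this matches any single character in the range A to Z. Supported by `LIKE` in some systems such as SQL Server, but not by SQLite's `LIKE`. |
| [^C] | In "WHERE column LIKE '___at'", this matches any single character except C. Supported by `LIKE` in some systems such as SQL Server, but not by SQLite's `LIKE`. |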
| COUNT | In "SELECT _____(*) FROM table", this function returns the number of rows. |
| SUM | In "SELECT _____(numeric_column) FROM table", this function calculates the sum of a set of values. |
| AVG | In "SELECT _____(numeric_column) FROM table", this function calculates the average of a set of values. |
| MAX | In "SELECT _____(column) FROM table", this function returns the maximum value in a set. |
| MIN | In "SELECT _____(column) FROM table", this function returns the minimum value in a set. |
| ROUND | In "SELECT _____(numeric_column, decimal_places) FROM table", this function rounds a number to specified decimal places. |
| FLOOR | In "SELECT _____(numeric_column) FROM table", this function returns the largest integer less than or equal to a number. |
| CEIL | In "SELECT _____(numeric_column) FROM table", this function returns the smallest integer greater than or equal to a number. |
| AS | In "SELECT column _____ new_name FROM table", this keyword creates a column alias. |
| + | In "SELECT (column1 _____ column2) AS sum FROM table", this symbol performs addition. |
| - | In "SELECT (column1 _____ column2) AS difference FROM table", this symbol performs subtraction. |
| * | In "SELECT (column1 _____ column2) AS product FROM table", this symbol performs multiplication. |
| / | In "SELECT (column1 _____ column2) AS quotient FROM table", this symbol performs division. |
| % | In "SELECT (column1 _____ column2) AS remainder FROM table", this symbol calculates the remainder (modulo). |
