# **Structured Query Language (SQL)**

SQL is a special-purpose programming language used for communicating with databases.

### **Table and Schema**

<img src="https://ds100.org/course-notes/sql_I/images/sql_terminology.png" alt="SQL terminology illustrated with the Dragon table" width="700" height="300">

Looking at the `Dragon` table above, we can see that it contains three distinct columns:
* `"name"` contains text data 
* `"year"` contains integer data that must be greater than $2000$
* `"cute"` contains integer data that has no restrictions on allowable values 

A **schema** is the structure that defines how data is organized within the database 
* It includes the tables themselves, their columns, data types, constraints, and relationships with other tables 

The `CREATE TABLE` statement is used to specify the schema of a table: a description of the logic used to organize the table 
* If you want to create a table as a filtered version of another, you can use `CREATE TABLE AS`, where the SQL query against the other table goes after the `AS`
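
As a rough illustration of `CREATE TABLE AS`, here is a minimal sketch that runs the SQL through Python's built-in `sqlite3` module against an in-memory database. The sample rows and the `CuteDragon` table name are assumptions made up for this example, and other database engines may differ in details.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")

# Hypothetical sample rows, used only for illustration.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

# The query after AS decides which rows and columns the new table receives.
conn.execute("CREATE TABLE CuteDragon AS SELECT name, cute FROM Dragon WHERE cute > 0")

cute_rows = conn.execute("SELECT name, cute FROM CuteDragon ORDER BY name").fetchall()
print(cute_rows)
```

The new table is a copy of the query result at creation time, so later changes to `Dragon` would not appear in `CuteDragon`.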


Each column in the schema is described using the following format: 

* `ColName`: The name of a column 

* `DataType`: The type of data to be stored in a column: 
    * `INT` (integers)
    * `FLOAT` (floating point numbers)
    * `TEXT` (strings)
    * `BLOB` (arbitrary data, such as audio / video files)
    * `DATETIME` (a date and time)

* `Constraint`: Some restriction on the data to be stored in the column 
    * `CHECK` (data must obey a certain condition)
    * `PRIMARY KEY` (designate a column as the table's primary key)
    * `NOT NULL` (data cannot be null)
    * `DEFAULT` (a default fill value if no specific entry is given)
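
To make the column format concrete, the sketch below writes one possible schema for the `Dragon` table described earlier and checks that the `CHECK` constraint is enforced. It assumes SQLite via Python's `sqlite3` module; the exact constraint wording (`year > 2000`) follows the description above, and the inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each line inside the parentheses follows: ColName DataType Constraint
conn.execute("""
    CREATE TABLE Dragon (
        name TEXT PRIMARY KEY,
        year INTEGER CHECK (year > 2000),
        cute INTEGER
    )
""")

# Read the schema back: PRAGMA table_info lists one row per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Dragon)")]
print(columns)

# A row that satisfies the CHECK constraint is accepted.
conn.execute("INSERT INTO Dragon VALUES ('hiccup', 2010, 10)")

# A row that violates it is rejected with an IntegrityError.
try:
    conn.execute("INSERT INTO Dragon VALUES ('ancient', 1999, 5)")
    check_rejected = False
except sqlite3.IntegrityError as err:
    check_rejected = True
    print("rejected:", err)
```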




#### **Primary Keys**

* The **primary key** is a column or set of columns that uniquely identifies each record in the table 
* No two entries in a table can have the same primary key 
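
A small sketch of the uniqueness rule, assuming SQLite through Python's `sqlite3` module and a `Dragon` table whose primary key is `name` (the rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
conn.execute("INSERT INTO Dragon VALUES ('hiccup', 2010, 10)")

# A second row with the same primary key value is refused.
try:
    conn.execute("INSERT INTO Dragon VALUES ('hiccup', 2015, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("rejected:", err)

row_count = conn.execute("SELECT COUNT(*) FROM Dragon").fetchone()[0]
print(row_count)
```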

#### **Foreign Keys**
* A **foreign key** is a column or set of columns that references a *primary key in another table*
* A foreign key constraint ensures that every foreign key value matches an existing primary key in the referenced table 

Let's say we have two tables, `student` and `assignment`, as follows:

```sql
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);
```

```sql
CREATE TABLE assignment (
    assignment_id INTEGER PRIMARY KEY,
    description VARCHAR
);
```

* Note that each table has a primary key that uniquely identifies each student and assignment 

Now, suppose we want to create the table `grade` to store the score each student got on each assignment:

```sql
CREATE TABLE grade (
    student_id INTEGER,
    assignment_id INTEGER,
    score REAL,
    FOREIGN KEY (student_id) REFERENCES student(student_id),
    FOREIGN KEY (assignment_id) REFERENCES assignment(assignment_id)
);
```
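
The sketch below shows what the foreign key constraints buy us, assuming SQLite through Python's `sqlite3` module. One SQLite-specific detail: foreign key enforcement is off by default and has to be switched on per connection with `PRAGMA foreign_keys = ON`. The student and assignment rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name VARCHAR,
        email VARCHAR
    );
    CREATE TABLE assignment (
        assignment_id INTEGER PRIMARY KEY,
        description VARCHAR
    );
    CREATE TABLE grade (
        student_id INTEGER,
        assignment_id INTEGER,
        score REAL,
        FOREIGN KEY (student_id) REFERENCES student(student_id),
        FOREIGN KEY (assignment_id) REFERENCES assignment(assignment_id)
    );
    -- Hypothetical rows for illustration
    INSERT INTO student VALUES (1, 'Ada', 'ada@example.com');
    INSERT INTO assignment VALUES (10, 'Homework 1');
    INSERT INTO grade VALUES (1, 10, 95.0);
""")

# student_id 999 does not exist in student, so this grade is rejected.
try:
    conn.execute("INSERT INTO grade VALUES (999, 10, 50.0)")
    orphan_rejected = False
except sqlite3.IntegrityError as err:
    orphan_rejected = True
    print("rejected:", err)

grades = conn.execute("SELECT student_id, assignment_id, score FROM grade").fetchall()
print(grades)
```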


### **Queries** 

We refer to pieces of SQL code as **queries**

#### `SELECT`**ing From Tables**

The basic unit of a SQL query is the `SELECT` statement
* Specifies what columns we would like to extract from a given table 

We use `FROM` to tell SQL which table we want to `SELECT` our data from 

```sql
SELECT *
FROM Dragon;
```
* In SQL, `*` means "everything", so the above query grabs *all* columns in `Dragon`

If we don't want all of our columns, we can also grab specific columns 

```sql
SELECT cute, year
FROM Dragon;
```

* This will output the selected columns *in the order* in which they were listed 

**Every** SQL query that reads from a table must include both a `SELECT` and a `FROM` clause 

SQL enforces a strict "order of operations" -- SQL clauses must *always* follow the same sequence (and have the same general structure)

```sql
SELECT <column list>
FROM <table>
[additional clauses]
```

### **Aliasing with `AS`** 

The `AS` keyword allows us to give a column a new name (called an **alias**) after it has been `SELECT`ed. The general syntax is: 


```sql
SELECT column_in_input_table AS new_name_in_output_table
```
<br>

```sql
SELECT cute AS cuteness, year AS birth
FROM Dragon;
```

* You can think of this as choosing the names under which the selected columns appear in the output
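
One way to see the effect of aliasing is to inspect the column names of the result. This sketch assumes SQLite via Python's `sqlite3` module, where `cursor.description` holds the output column names; the sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
conn.execute("INSERT INTO Dragon VALUES ('hiccup', 2010, 10)")

cursor = conn.execute("SELECT cute AS cuteness, year AS birth FROM Dragon")

# The first item of each description entry is the column name in the output.
output_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(output_columns)
print(rows)
```

The aliases only rename columns in the output; the `Dragon` table itself keeps its original column names.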

### **Uniqueness with `DISTINCT`** 

To `SELECT` only the *unique* values in a column, we use the `DISTINCT` keyword
* Any duplicate entries in a column will be removed 

```sql
SELECT DISTINCT year
FROM Dragon;
```
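
As a quick check of what `DISTINCT` removes, this sketch compares the two queries on a handful of hypothetical rows, assuming SQLite via Python's `sqlite3` module. The results are sorted in Python because a query without `ORDER BY` promises no particular row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows; two years appear more than once.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

all_years = sorted(row[0] for row in conn.execute("SELECT year FROM Dragon"))
distinct_years = sorted(row[0] for row in conn.execute("SELECT DISTINCT year FROM Dragon"))

print(all_years)
print(distinct_years)
```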

### **Applying `WHERE` conditions** 

The `WHERE` keyword is used to select only some rows of a table, filtered on a given boolean condition 

```sql
SELECT name, year
FROM Dragon
WHERE cute > 0;
```

We can make these `WHERE` conditions more complicated using `AND`, `OR`, and `NOT`

```sql
SELECT name, year
FROM Dragon
WHERE cute > 0 OR year > 2013;
```

We can also filter for entries that are `IN` a specified list of values 

```sql
SELECT name, year
FROM Dragon
WHERE name IN ('hiccup', 'puff');
```
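
The conditions above can be tried side by side. This is a minimal sketch assuming SQLite via Python's `sqlite3` module; the rows and the `names` helper are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

def names(query):
    """Run a query that selects one column and return its values, sorted."""
    return sorted(row[0] for row in conn.execute(query))

# OR keeps a row when at least one condition is true.
either = names("SELECT name FROM Dragon WHERE cute > 0 OR year > 2013")
# AND keeps a row only when both conditions are true.
both = names("SELECT name FROM Dragon WHERE cute > 50 AND year = 2010")
# NOT inverts a condition.
negated = names("SELECT name FROM Dragon WHERE NOT year = 2010")
# IN keeps rows whose value appears in the list.
listed = names("SELECT name FROM Dragon WHERE name IN ('hiccup', 'puff')")

print(either, both, negated, listed)
```

Note that `smaug`, whose `cute` value is `NULL`, is not matched by `cute > 0`: a comparison with `NULL` is never true.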


#### **Strings in SQL** 

Unlike in Python, in SQL `""` and `''` serve different purposes 
* Double quotes `""` are used for *column names* 

* Single quotes `''` are used for *strings*

Here is an example:

```sql
SELECT "birth weight"
FROM patient
WHERE "first name" = 'Joey';
```
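
To show why the distinction matters, the sketch below runs the query twice: once with the column name in double quotes, and once with it mistakenly written in single quotes. It assumes SQLite via Python's `sqlite3` module, and the `patient` rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names containing spaces must be written in double quotes.
conn.execute('CREATE TABLE patient ("first name" TEXT, "birth weight" REAL)')
conn.executemany("INSERT INTO patient VALUES (?, ?)", [("Joey", 3.4), ("Maya", 2.9)])

# Double quotes: "first name" is read as a column, so Joey's row matches.
by_column = conn.execute(
    """SELECT "birth weight" FROM patient WHERE "first name" = 'Joey'"""
).fetchall()

# Single quotes: 'first name' is just a string, and it never equals 'Joey'.
by_literal = conn.execute(
    """SELECT "birth weight" FROM patient WHERE 'first name' = 'Joey'"""
).fetchall()

print(by_column)
print(by_literal)
```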

#### **`WHERE` with `NULL` Values** 

* If we want to filter our table to include only non-`NULL` values, we can do something similar to the example below: 

```SQL
SELECT name, cute
FROM Dragon
WHERE cute IS NOT NULL;
```
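
The reason for writing `IS NOT NULL` rather than an ordinary comparison is that comparing anything with `NULL` using `=` never evaluates to true. A minimal sketch, assuming SQLite via Python's `sqlite3` module and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows; Python's None is stored as NULL.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

# "= NULL" matches nothing, even for the row whose cute value is NULL.
equals_null = conn.execute("SELECT name FROM Dragon WHERE cute = NULL").fetchall()
# IS NULL / IS NOT NULL are the comparisons that actually test for NULL.
is_null = conn.execute("SELECT name FROM Dragon WHERE cute IS NULL").fetchall()
not_null_count = conn.execute(
    "SELECT COUNT(*) FROM Dragon WHERE cute IS NOT NULL"
).fetchone()[0]

print(equals_null, is_null, not_null_count)
```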

### **Sorting and Restricted Output**

What if we want our data to be in a certain order? The `ORDER BY` keyword can help us!

```SQL
SELECT *
FROM Dragon
ORDER BY cute;
```

By default, `ORDER BY` will display results in ascending order (`ASC`) with the lowest values at the top of the table 

* We can change this to sort in descending order by using the `DESC` keyword 

```SQL
SELECT *
FROM Dragon
ORDER BY cute DESC;
```

We can also `ORDER BY` two columns at once

```SQL
SELECT *
FROM Dragon
ORDER BY year, cute DESC;
```
* In this example, `year` is sorted in ascending order and `cute` in descending order 
* If you want `year` to be ordered in descending order as well, you need to specify `year DESC, cute DESC;`
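
The two orderings can be compared directly. This sketch assumes SQLite via Python's `sqlite3` module and hypothetical rows; where `NULL` values land in a sort differs between database engines, and SQLite treats `NULL` as smaller than any other value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

# year ascending; ties within a year are broken by cute, descending.
year_asc = [row[0] for row in conn.execute(
    "SELECT name FROM Dragon ORDER BY year, cute DESC")]

# DESC must be repeated for every column that should sort descending.
year_desc = [row[0] for row in conn.execute(
    "SELECT name FROM Dragon ORDER BY year DESC, cute DESC")]

print(year_asc)
print(year_desc)
```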

#### `LIMIT` vs. `OFFSET`

`LIMIT` restricts output to a specified number of rows 
* It serves a similar function to that of `.head()` in `pandas`

```SQL
SELECT *
FROM Dragon
LIMIT 2;
```

<br>

`OFFSET` indicates the index at which `LIMIT` should start
* We can use this keyword to shift where the `LIMIT`ing begins by a specified number of rows

```SQL
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
```

* The above example skips the first row and returns the rows at positions $2$ and $3$
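
Here is a small sketch of how `LIMIT` and `OFFSET` interact, assuming SQLite via Python's `sqlite3` module and hypothetical rows. An `ORDER BY name` is added (it is not part of the queries above) so that "the first row" is well defined and the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

# First two rows of the sorted output.
first_two = [row[0] for row in conn.execute(
    "SELECT name FROM Dragon ORDER BY name LIMIT 2")]

# Skip one row, then take two: the rows at positions 2 and 3.
shifted = [row[0] for row in conn.execute(
    "SELECT name FROM Dragon ORDER BY name LIMIT 2 OFFSET 1")]

print(first_two)
print(shifted)
```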

## **Summary**

So far, here are the current building blocks we can use to make our queries: 

```SQL

SELECT <column list>
FROM <table>
[WHERE <predicate>]
[ORDER BY <column list>]
[LIMIT <number of rows>]
[OFFSET <number of rows>]

```
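
As one possible instance of this template, the sketch below fills in every clause at once. It assumes SQLite via Python's `sqlite3` module, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0),
     ("puff", 2010, 100), ("smaug", 2011, None)],
)

# The clauses appear in the same sequence as in the template.
query = """
    SELECT name, cute
    FROM Dragon
    WHERE cute IS NOT NULL
    ORDER BY cute DESC
    LIMIT 2
    OFFSET 1;
"""
result = conn.execute(query).fetchall()
print(result)
```

Read top to bottom: pick two columns from `Dragon`, keep rows with a known `cute` value, sort from cutest to least cute, skip the top row, and return the next two.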