# 1. Splitting data into related tables 

It's often preferable to make sure that a particular column of data is only stored in a single location, so there are fewer places to update and less risk of having different data in different places. If we do that, we need to make sure we have a way to relate the data across the tables. 

When we split data into related tables, no single table looks nearly as readable as the old table that had all of the information stuffed into every row. But tables are often not designed to be readable to humans; they're designed to be easy to maintain and less prone to bugs. In many cases, it may be best to split information into multiple related tables, so that there is less redundant data and fewer places to update. 

It's important to understand how to use SQL to deal with data that has been split up into multiple related tables, and bring the data back together across the tables when we need it. 
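As a minimal sketch of such a split, the schema below keeps student details in one table and grades in another, linked by `student_id`. The table and column names follow the queries used later in these notes; the column types and the sample rows are assumptions made for illustration, and the script is meant to be run in a fresh database.

```SQL
/* Student details live in exactly one place. */
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT
);

/* Grades refer to a student by id instead of repeating name and email. */
CREATE TABLE student_grades (
    id INTEGER PRIMARY KEY,
    student_id INTEGER REFERENCES students (id),
    test TEXT,
    grade INTEGER
);

/* Hypothetical sample rows */
INSERT INTO students (id, first_name, last_name, email) VALUES
    (1, 'Peter', 'Rabbit', 'peter@example.com'),
    (2, 'Alice', 'Wonderland', 'alice@example.com');

INSERT INTO student_grades (student_id, test, grade) VALUES
    (1, 'Nutrition', 95),
    (1, 'Chemistry', 85),
    (2, 'Nutrition', 92),
    (2, 'Chemistry', 95);

/* Changing an email now touches a single row, not one row per grade. */
UPDATE students SET email = 'peter.rabbit@example.com' WHERE id = 1;
```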

# 2. JOINing related tables 

<img src="Img/Join1.png" width="400" height="200"> 

What we want to do is output the student name and email next to each test grade. 

## 2.1 CROSS JOIN 

The **CROSS JOIN** pairs each row of the first table with each row of the second table. This join type is also known as a Cartesian join. 

This results in a table with eight rows, because each of the two rows in one table is paired with each of the four rows in the other. 

```SQL
/* Cross Join */ 
SELECT * FROM student_grades, students; 
```

<img src="Img/Join2.png" width="400" height="200"> 
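The comma form above can also be written with the explicit `CROSS JOIN` keyword, which makes the intent easier to spot. The sketch below assumes a fresh database and uses hypothetical sample rows (two students, four grades) so that it can be run on its own.

```SQL
/* Hypothetical sample data: 2 students and 4 grades */
CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE student_grades (id INTEGER PRIMARY KEY, student_id INTEGER, test TEXT, grade INTEGER);

INSERT INTO students (id, first_name) VALUES (1, 'Peter'), (2, 'Alice');
INSERT INTO student_grades (student_id, test, grade) VALUES
    (1, 'Nutrition', 95),
    (1, 'Chemistry', 85),
    (2, 'Nutrition', 92),
    (2, 'Chemistry', 95);

/* Same rows as "FROM student_grades, students", with the join spelled out */
SELECT * FROM student_grades CROSS JOIN students;

/* The row count is the product of the two table sizes: 4 * 2 = 8 */
SELECT COUNT(*) AS paired_rows FROM student_grades CROSS JOIN students;
```

Most of the eight rows pair a grade with the wrong student, which is why a join condition is usually needed.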

## 2.2 INNER JOIN 

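The **INNER JOIN** keyword selects records that have matching values in both tables. One way to write it is to add a WHERE clause that keeps only the combinations where a row of one table matches a row of the other (an implicit join). The other way is to spell out the JOIN and its ON condition (an explicit join). 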

```SQL
/*implicit Inner Join */ 
SELECT * FROM student_grades, students
    WHERE student_grades.student_id = students.id; 
    
/* explicit Inner Join */ 
SELECT first_name, last_name, email, test, grade FROM students 
    JOIN student_grades 
    ON students.id = student_grades.student_id; 
```

<img src="Img/Join3.png" width="400" height="200">

If both tables contain columns with the same name, we should prefix each column with the name of the table it comes from. Table aliases declared with AS keep those prefixes short. 

```SQL 
SELECT s.first_name, s.last_name, s.email, sg.test, sg.grade FROM students AS s 
    JOIN student_grades AS sg 
    ON s.id = sg.student_id; 
```

# 3. Joining related tables with left outer joins

<img src="Img/Join4.png" width="400" height="200"> 

What we want is a list of student names and the projects they're working on. We could try an INNER JOIN first. However, INNER JOIN only creates rows when there are matching records in both tables, so students without a project would be left out of the list. 

## 3.1 LEFT OUTER JOIN 

The **LEFT JOIN** keyword returns all records from the left table, along with the matching records from the right table. If there is no match, the columns from the right table are NULL. 

The LEFT tells SQL to retain every row from the left table, which is the one after the FROM, and the OUTER tells it to retain those rows even if there's no match in the right table. 

```SQL
/* outer join */ 
SELECT students.first_name, students.last_name, student_projects.title
  FROM students
  LEFT OUTER JOIN student_projects
  ON students.id = student_projects.student_id;
```
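To see the difference on actual rows, the sketch below runs both join types on a tiny data set in which one student has no project. The sample rows and the project title are hypothetical, and the script assumes a fresh database.

```SQL
/* Hypothetical sample data: Alice has no project */
CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE student_projects (id INTEGER PRIMARY KEY, student_id INTEGER, title TEXT);

INSERT INTO students (id, first_name, last_name) VALUES
    (1, 'Peter', 'Rabbit'),
    (2, 'Alice', 'Wonderland');
INSERT INTO student_projects (student_id, title) VALUES
    (1, 'Volcano model');

/* INNER JOIN: only Peter appears, because Alice has no matching project row */
SELECT students.first_name, students.last_name, student_projects.title
  FROM students
  JOIN student_projects
  ON students.id = student_projects.student_id;

/* LEFT OUTER JOIN: Alice is kept, with NULL in the title column */
SELECT students.first_name, students.last_name, student_projects.title
  FROM students
  LEFT OUTER JOIN student_projects
  ON students.id = student_projects.student_id;

/* Filtering on the NULL side lists the students without any project */
SELECT students.first_name, students.last_name
  FROM students
  LEFT OUTER JOIN student_projects
  ON students.id = student_projects.student_id
  WHERE student_projects.id IS NULL;
```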

## 3.2 RIGHT OUTER JOIN 

There's also a **RIGHT OUTER JOIN**, which does the opposite: it keeps every row from the right table and joins in the matching rows from the left. 
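A short sketch of the right-hand variant, reusing the `students` and `student_projects` tables with hypothetical sample rows in a fresh database. SQLite only accepts `RIGHT OUTER JOIN` from version 3.39.0 onward; on older versions the swapped `LEFT OUTER JOIN` shown second gives the same rows.

```SQL
/* Hypothetical sample data: Alice has no project */
CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE student_projects (id INTEGER PRIMARY KEY, student_id INTEGER, title TEXT);

INSERT INTO students (id, first_name, last_name) VALUES
    (1, 'Peter', 'Rabbit'),
    (2, 'Alice', 'Wonderland');
INSERT INTO student_projects (student_id, title) VALUES
    (1, 'Volcano model');

/* students is the right table here, so every student is kept */
SELECT students.first_name, students.last_name, student_projects.title
  FROM student_projects
  RIGHT OUTER JOIN students
  ON students.id = student_projects.student_id;

/* Equivalent query: swap the table order and use LEFT OUTER JOIN */
SELECT students.first_name, students.last_name, student_projects.title
  FROM students
  LEFT OUTER JOIN student_projects
  ON students.id = student_projects.student_id;
```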

## 3.3 FULL OUTER JOIN 

The **FULL OUTER JOIN** keeps the rows from both the left and the right side, matching them where it can and filling in NULLs on whichever side has no match. 
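The sketch below shows a full outer join on hypothetical rows where one student has no project and one project has no student. It assumes a fresh database and SQLite 3.39.0 or newer; for engines without `FULL OUTER JOIN` (older SQLite, MySQL), the second query is one common way to emulate it with two left joins.

```SQL
/* Hypothetical sample data: Alice has no project, the poster has no student */
CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE student_projects (id INTEGER PRIMARY KEY, student_id INTEGER, title TEXT);

INSERT INTO students (id, first_name) VALUES (1, 'Peter'), (2, 'Alice');
INSERT INTO student_projects (student_id, title) VALUES
    (1, 'Volcano model'),
    (NULL, 'Unassigned poster');

/* Three rows: one match, one student-only row, one project-only row */
SELECT s.first_name, sp.title
  FROM students AS s
  FULL OUTER JOIN student_projects AS sp
  ON s.id = sp.student_id;

/* Emulation: all students first, then only the projects that matched nobody */
SELECT s.first_name, sp.title
  FROM students AS s
  LEFT OUTER JOIN student_projects AS sp
  ON s.id = sp.student_id
UNION ALL
SELECT s.first_name, sp.title
  FROM student_projects AS sp
  LEFT OUTER JOIN students AS s
  ON s.id = sp.student_id
  WHERE s.id IS NULL;
```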

# 4. Joining tables to themselves with self-joins 

The **SELF JOIN** is a regular join, but the table is joined with itself. 

```SQL
SELECT s1.first_name, s1.last_name, s2.email AS buddy_email
  FROM students AS s1 
  JOIN students AS s2
  ON s1.buddy_id = s2.id; 
```

# 5. Combining multiple joins 

<img src="Img/Join5.png" width="400" height="200"> 

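Here we want to list the titles of the two projects in each pair recorded in `project_pairs`. Each pair refers to two rows of `student_projects`, so we join that table twice under different aliases.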

```SQL
SELECT a.title, b.title FROM project_pairs
    JOIN student_projects a
    ON project_pairs.project1_id = a.id
    JOIN student_projects b
    ON project_pairs.project2_id = b.id;
```

# 6. More efficient SQL with query planning and optimization

SQL is a declarative language: each query declares what we want the SQL engine to do, but it doesn't say how. As it turns out, the how (the "plan") is what affects the efficiency of the queries. 

## 6.1 Why do SQL queries need a plan? 

There are two different ways that SQL could find the results : 

- **Full table scan** : Look at every single row in the table and return the matching rows. 
- **Creating an index** : Make a copy of the table sorted by the column in the condition (such as author), then do a binary search to find the rows where the condition is met. 

## 6.2 The lifecycle of a SQL query 

1. **Parse** : The query parser makes sure that the query is syntactically and semantically correct, and returns errors if not. If it's correct, the parser turns it into an algebraic expression and passes it to the next step. 
2. **Optimize** : The query planner and optimizer does the hard thinking work. It first performs straightforward optimizations. It then considers different "query plans", which may have different optimizations, estimates the cost (CPU and time) of each query plan based on the number of rows in the relevant tables, then picks the optimal plan and passes it on to the next step. 
3. **Execute** : The query executor takes the plan and turns it into operations for the database, returning the results back to us if there are any. 

## 6.3 Where do humans come in? 

Many times, especially for complex queries, there are indeed ways we can help optimize a query, and that's known as **query tuning**. 

The first step is to identify which queries we want to tune. We can figure that out by looking at which of our database calls take the longest or use the most resources, for example with a SQL profiler. 

The next step is to understand how a particular SQL engine is executing a query, and all SQL systems come with a way to ask the engine. In SQLite, we can put **EXPLAIN QUERY PLAN** in front of any SQL statement to see what it's doing behind the scenes. 
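A minimal sketch of asking SQLite for its plan, using a `student_grades` table shaped like the one earlier in these notes (the column types are an assumption) and a fresh database. The statement only describes the plan; it does not run the query itself.

```SQL
/* Table without any index on student_id */
CREATE TABLE student_grades (
    id INTEGER PRIMARY KEY,
    student_id INTEGER,
    test TEXT,
    grade INTEGER
);

/* Ask the engine how it would find the matching rows */
EXPLAIN QUERY PLAN
SELECT test, grade FROM student_grades WHERE student_id = 1;
```

With no index on `student_id`, the reported plan is normally a scan of the whole `student_grades` table. The exact wording of the output differs between SQLite versions.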

Then comes manual optimization to improve that execution plan. If we create an index on the columns a query filters or joins on, the SQL engine can use that index to find the matching rows efficiently. Creating indexes can often make repeated queries more efficient. 
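As a sketch of that tuning step, the script below adds an index on the filtered column and asks for the plan again. The index name `idx_student_grades_student_id` is hypothetical, the column types are an assumption, and the script is meant for a fresh database.

```SQL
CREATE TABLE student_grades (
    id INTEGER PRIMARY KEY,
    student_id INTEGER,
    test TEXT,
    grade INTEGER
);

/* Hypothetical index name; it covers the column used in the WHERE clause */
CREATE INDEX idx_student_grades_student_id
    ON student_grades (student_id);

/* The plan should now mention a search using the index instead of a full scan */
EXPLAIN QUERY PLAN
SELECT test, grade FROM student_grades WHERE student_id = 1;
```

An index is not free: it takes extra storage and has to be updated on every insert, update, and delete, so it pays off mainly for columns that are searched or joined on often.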

# 7. Project : Famous people

```SQL
/* Create table about the people and what they do here */
CREATE TABLE Famous_People (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    birthdate TEXT,     /* stored as MM-DD-YYYY text */
    occupation INTEGER  /* refers to Profession.id */
);

/* String literals use single quotes, which is the standard SQL form */
INSERT INTO Famous_People (first_name, last_name, birthdate, occupation) VALUES
    ('Leonardo', 'Dicaprio', '11-11-1974', 1),
    ('Willie', 'Nelson', '04-29-1933', 2),
    ('Misty', 'Copeland', '09-10-1982', 3),
    ('Doutzen', 'Kroes', '01-23-1985', 4),
    ('Jack', 'Nicholson', '04-22-1937', 1),
    ('Christoph', 'Waltz', '10-04-1956', 1),
    ('Bob', 'Fosse', '04-27-1911', 3),
    ('Thom', 'Yorke', '10-07-1968', 2),
    ('Kate', 'Moss', '01-16-1974', 4),
    ('Cary', 'Grant', '01-18-1904', 5);


CREATE TABLE Profession (
    id INTEGER PRIMARY KEY,
    title TEXT
);

/* Ids 1 and 5 share the title 'Actor'; grouping by title counts them together */
INSERT INTO Profession (id, title) VALUES
    (1, 'Actor'),
    (2, 'Musician'),
    (3, 'Dancer'),
    (4, 'Model'),
    (5, 'Actor');

/* Count the number of people per profession title */ 
SELECT p.title, COUNT(*) AS number_of_profession
  FROM Famous_People AS fp  
  JOIN Profession AS p 
  ON fp.occupation = p.id
  GROUP BY p.title; 
```