# Advanced SQL for Data Scientists (PostgreSQL)

## Chapter 1: Overview

### Data Management Operations
* Linking data from different data stores
* Filtering and reformatting data for different uses
* Aggregating data to provide 'big picture' summaries
* Answering specific questions about business operations

### Data Sources
* Relational Databases
* NoSQL Databases
    * Non-relational databases (e.g., document or key-value stores)
* Manually Managed Data

### Types of SQL Commands

* Example Table Schema
```sql
CREATE TABLE address_book (
    pk_id INTEGER PRIMARY KEY,
    first_name VARCHAR(20),
    last_name VARCHAR(20),
    address VARCHAR(200)
);

CREATE TABLE order_book (
    order_id INTEGER PRIMARY KEY,
    f_pk_id INTEGER REFERENCES address_book (pk_id),  -- links each order to a person
    item VARCHAR(20),
    quantity INTEGER
);
```
#### Data Manipulation
  * Used to consolidate, clean, and view data before analysis
  * Common Commands:
    * INSERT: Adds one or more rows to a table
    * UPDATE: Changes values in existing row(s)
    * DELETE: Removes row(s) from a table
    * SELECT: Retrieves data from one or more tables
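A minimal sketch of the four commands in sequence, assuming a temporary copy of `address_book` (so no permanent table is touched) and hypothetical sample rows:

```sql
-- Temporary copy of address_book; it disappears when the session ends
CREATE TEMP TABLE address_book (
    pk_id INTEGER PRIMARY KEY,
    first_name VARCHAR(20),
    last_name VARCHAR(20),
    address VARCHAR(200)
);

-- INSERT: one statement can add several rows (sample values are hypothetical)
INSERT INTO address_book (pk_id, first_name, last_name, address)
VALUES
    (1, 'Jane', 'Doe', '12 Example St'),
    (2, 'John', 'Smith', '34 Sample Rd');

-- UPDATE: change a value in every row matching the WHERE clause
UPDATE address_book
SET address = '56 Demo Ave'
WHERE pk_id = 2;

-- DELETE: remove every row matching the WHERE clause
DELETE FROM address_book
WHERE pk_id = 1;

-- SELECT: read back what is left (one row: John Smith at 56 Demo Ave)
SELECT pk_id, first_name, last_name, address
FROM address_book;
```

Omitting the WHERE clause on UPDATE or DELETE affects every row in the table.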

#### Data Definition
  * Used to define structures for organizing data in a database
  * Data Structures:
    * Tables: Collections of related data records
    * Indexes: Lookup structures that map column values to record locations, speeding up searches
    * Views: Saved queries that present data derived from one or more tables
        * Help show or retrieve only the relevant data, without unnecessary columns or rows
    * Schemas: Namespaces that group all of the above
  * Common Commands:
    * CREATE TABLE: Defines a new table (see the example tables above)
    * CREATE INDEX: Builds an index to speed up row lookups on one or more columns
```sql
CREATE INDEX idx_last_name
    ON address_book USING btree (last_name);  -- USING names the index method; btree is the default
```
  * CREATE VIEW: Saves a query as a named, virtual table
```sql
CREATE VIEW comb_sales AS
    SELECT ob.quantity, ob.item, ab.first_name
    FROM order_book ob
    LEFT JOIN address_book ab
    ON ab.pk_id = ob.f_pk_id;
```
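A view stores the query, not a copy of the rows, so it always reflects the current contents of its tables. A minimal sketch of that behavior, using small temporary stand-ins for the two tables; the names `demo_people`, `demo_orders`, and `demo_sales` and the sample rows are hypothetical:

```sql
-- Hypothetical, temporary stand-ins for address_book and order_book
CREATE TEMP TABLE demo_people (
    pk_id INTEGER PRIMARY KEY,
    first_name VARCHAR(20)
);
CREATE TEMP TABLE demo_orders (
    order_id INTEGER PRIMARY KEY,
    f_pk_id INTEGER,
    item VARCHAR(20),
    quantity INTEGER
);
INSERT INTO demo_people VALUES (1, 'Jane'), (2, 'John');
INSERT INTO demo_orders VALUES (10, 1, 'apples', 3), (11, 3, 'pears', 5);

-- Same shape as comb_sales: every order, plus the buyer's name when one matches
CREATE TEMP VIEW demo_sales AS
    SELECT o.quantity, o.item, p.first_name
    FROM demo_orders o
    LEFT JOIN demo_people p
    ON p.pk_id = o.f_pk_id;

-- Query the view like a table; order 11 has no matching person, so first_name is NULL
SELECT * FROM demo_sales ORDER BY item;

-- New rows in the base table show up in the view immediately
INSERT INTO demo_orders VALUES (12, 2, 'plums', 1);
SELECT COUNT(*) FROM demo_sales;  -- now 3
```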
  * CREATE SCHEMA: Creates a namespace for organizing related tables, views, and indexes

```sql
CREATE SCHEMA data_sci;
```

```
  ____________Schema___________
|                              |
|   ___________                |
|  | Table     |               |
|  |  A Index  |               |
|  |  B Index  |               |
|  |___________|  ----> View   |
|  | Table     |               |
|  |  C Index  |               |
|  |  D Index  |               |
|  |___________|               |
|                              |
|______________________________|

```
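A minimal sketch of placing objects inside the `data_sci` schema, assuming your role is allowed to create schemas; the table `data_sci.demo_notes` and its row are hypothetical, and unlike a temporary table they persist after the session:

```sql
-- Create the schema only if it does not exist yet
CREATE SCHEMA IF NOT EXISTS data_sci;

-- Option 1: qualify the object name with the schema (demo_notes is hypothetical)
CREATE TABLE IF NOT EXISTS data_sci.demo_notes (
    note_id INTEGER PRIMARY KEY,
    body TEXT
);

-- Option 2: put the schema first on the search path so unqualified names resolve to it
SET search_path TO data_sci, public;
INSERT INTO demo_notes (note_id, body)
VALUES (1, 'stored in data_sci')
ON CONFLICT (note_id) DO NOTHING;
RESET search_path;

-- List the tables that now live in the schema
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'data_sci';
```

`DROP TABLE data_sci.demo_notes;` removes the demo table afterwards.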

---

## Chapter 2: Basic Statistics

* Example Table Schema
```sql
CREATE TABLE staff (
    id INTEGER PRIMARY KEY,
    last_name VARCHAR(100),
    email VARCHAR(200),
    gender VARCHAR(10),
    department VARCHAR(100),
    start_date DATE,
    salary INTEGER,
    job_title VARCHAR(100),
    region_id INTEGER
);
```

### Aggregate Function Fun

* How many people does each department have?
```sql
SELECT
    department, 
    COUNT(department)
FROM
    staff
GROUP BY
    department
```
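`COUNT(department)` counts only non-NULL values, so if some staff rows have no department, their group reports 0. A sketch on hypothetical inline rows contrasts it with `COUNT(*)`, which counts rows:

```sql
-- Hypothetical inline rows; one has no department recorded
SELECT
    department,
    COUNT(department) AS count_department,  -- ignores NULLs, so the NULL group shows 0
    COUNT(*)          AS count_rows         -- counts every row in the group
FROM (VALUES ('Toys'), ('Toys'), (NULL), ('Grocery')) AS s(department)
GROUP BY
    department
ORDER BY
    department;
```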

* What is the salary of the highest paid employee?
```sql
SELECT
    MAX(salary)
FROM staff
```

* What is the highest paying salary per department?
```sql
SELECT
    department,
    MAX(salary)
FROM
    staff
GROUP BY
    department
```

### Statistical Function Fun
* What is the average salary paid per employee in each department?
```sql
SELECT
    department,
    avg(salary)
FROM
    staff
GROUP BY
    department
```

* Repeat the above, adding the population variance and standard deviation to measure the spread of salaries
```sql
SELECT
    department,
    avg(salary),
    var_pop(salary),
    stddev_pop(salary)
FROM
    staff
GROUP BY
    department
```
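`var_pop`/`stddev_pop` treat the rows as the entire population (dividing by n), while `var_samp`/`stddev_samp` (also available under the historical names `variance`/`stddev`) treat them as a sample (dividing by n - 1). A sketch on four hypothetical values shows the difference:

```sql
-- Hypothetical values; squared deviations from the mean 2.5 sum to 5
SELECT
    avg(x)         AS mean,         -- 2.5
    var_pop(x)     AS var_pop,      -- 5 / 4 = 1.25
    var_samp(x)    AS var_samp,     -- 5 / 3, about 1.667
    stddev_pop(x)  AS stddev_pop,   -- sqrt(1.25), about 1.118
    stddev_samp(x) AS stddev_samp   -- sqrt(5 / 3), about 1.291
FROM (VALUES (1), (2), (3), (4)) AS s(x);
```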

### Filtering and Grouping Fun
* Who are the 10 highest-paid employees outside the Grocery department earning more than 50,000 and less than 100,000?

```sql
SELECT
    last_name, department, salary
FROM
    staff
WHERE
    salary > 50000
    AND salary < 100000
    AND department != 'Grocery'
ORDER BY
    salary DESC
LIMIT 10
```
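The strict comparisons above exclude salaries of exactly 50,000 or 100,000. SQL's `BETWEEN` is inclusive on both ends, so it is not a drop-in replacement here. A sketch on hypothetical boundary values:

```sql
-- Hypothetical salaries at and inside the boundaries
SELECT
    salary,
    salary BETWEEN 50000 AND 100000       AS between_inclusive,  -- true for all three
    (salary > 50000 AND salary < 100000)  AS strictly_between    -- true only for 75000
FROM (VALUES (50000), (75000), (100000)) AS s(salary);
```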

---

## Chapter 3: Data Manipulation

* Example Table Schema
```sql
CREATE TABLE staff (
    id INTEGER PRIMARY KEY,
    last_name VARCHAR(100),
    email VARCHAR(200),
    gender VARCHAR(10),
    department VARCHAR(100),
    start_date DATE,
    salary INTEGER,
    job_title VARCHAR(100),
    region_id INTEGER
);
```

#### Reformatting Character Data

* Return all unique departments in upper case
```sql
SELECT
    DISTINCT UPPER(department)
FROM
    staff
```

* Return all unique department - job title combinations in lower case
```sql
SELECT
    DISTINCT LOWER(department || ' - ' || job_title)
FROM
    staff
```
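One caveat with `||`: if either operand is NULL, the whole result is NULL, so a staff row with a missing job title would collapse to NULL. `CONCAT_WS` skips NULL arguments instead. A sketch with hypothetical inline rows:

```sql
-- Hypothetical inline rows; the second has no job title
SELECT
    LOWER(department || ' - ' || job_title)         AS with_pipes,      -- NULL for the second row
    LOWER(CONCAT_WS(' - ', department, job_title))  AS with_concat_ws   -- 'toys' for the second row
FROM (VALUES ('Toys', 'Buyer'), ('Toys', NULL)) AS s(department, job_title);
```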

#### Filtering Data
* List each distinct job title with a flag showing whether it contains 'Assistant'
```sql
SELECT
    DISTINCT job_title,
    (job_title LIKE '%Assistant%') AS is_assistant
FROM
    staff
```
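`LIKE` is case-sensitive, so a title such as 'Sales assistant' would not be flagged. PostgreSQL's `ILIKE` matches case-insensitively. A sketch on hypothetical titles:

```sql
-- Hypothetical job titles
SELECT
    job_title,
    job_title LIKE '%Assistant%'  AS like_match,   -- case-sensitive
    job_title ILIKE '%assistant%' AS ilike_match   -- case-insensitive (PostgreSQL extension)
FROM (VALUES ('Assistant Manager'), ('Sales assistant'), ('Engineer')) AS s(job_title);
```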

* Strip the leading 'Assistant' from every job title that starts with it, returning the unique remainders
```sql
SELECT
    DISTINCT TRIM(SUBSTRING(job_title FROM 10))  -- 'Assistant' is 9 characters; TRIM drops the leading space
FROM
    staff
WHERE
    job_title LIKE 'Assistant%'
```
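`SUBSTRING` keeps everything after 'Assistant', which can be several words. If only the second space-separated word is wanted, PostgreSQL's `SPLIT_PART(string, delimiter, n)` returns the n-th field. A sketch on hypothetical titles:

```sql
-- Hypothetical job titles
SELECT
    job_title,
    TRIM(SUBSTRING(job_title FROM 10))  AS after_assistant,  -- 'Professor', 'Media Planner'
    SPLIT_PART(job_title, ' ', 2)       AS second_word       -- 'Professor', 'Media'
FROM (VALUES ('Assistant Professor'), ('Assistant Media Planner')) AS s(job_title);
```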

* Abbreviate the leading 'Assistant' in job titles to 'Asst.'
```sql
SELECT
    OVERLAY(job_title PLACING 'Asst.' FROM 1 FOR 9)  -- replace characters 1 to 9 ('Assistant')
FROM
    staff
WHERE
    job_title LIKE 'Assistant%'
```
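`OVERLAY` replaces a fixed character range, whereas `REPLACE` substitutes every occurrence of a substring. A sketch on a hypothetical title where the two differ:

```sql
-- Hypothetical title containing 'Assistant' twice
SELECT
    OVERLAY(job_title PLACING 'Asst.' FROM 1 FOR 9)  AS with_overlay,  -- 'Asst. to the Assistant Manager'
    REPLACE(job_title, 'Assistant', 'Asst.')         AS with_replace   -- 'Asst. to the Asst. Manager'
FROM (VALUES ('Assistant to the Assistant Manager')) AS s(job_title);
```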

* Find all job titles that start with 'Assistant' and end in level III or IV, using SIMILAR TO (SQL-standard regular expressions that must match the whole string)
```sql
SELECT
    job_title
FROM
    staff
WHERE
    job_title SIMILAR TO 'Assistant%(III|IV)'
```
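PostgreSQL also offers POSIX regular expressions through the `~` operator. Unlike `SIMILAR TO`, a POSIX pattern may match anywhere in the string, so the anchors `^` and `$` are needed to express the same condition. A sketch on hypothetical titles:

```sql
-- Hypothetical job titles
SELECT
    job_title,
    job_title SIMILAR TO 'Assistant%(III|IV)' AS similar_match,
    job_title ~ '^Assistant.*(III|IV)$'       AS posix_match
FROM (VALUES ('Assistant Professor III'), ('Assistant Professor II'), ('Senior Assistant IV')) AS s(job_title);
```

Only the first title matches under either syntax.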

* Find all job titles starting with E, P, or S
```sql
SELECT
    job_title
FROM
    staff
WHERE
    job_title SIMILAR TO '[EPS]%'  -- no commas: every character inside [] is a literal option
```
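Inside a bracket expression every character is a literal choice, so writing `[E,P,S]` also admits a comma as the first character. A sketch on hypothetical titles:

```sql
-- Hypothetical job titles, including one that starts with a comma
SELECT
    job_title,
    job_title SIMILAR TO '[EPS]%'   AS without_commas,
    job_title SIMILAR TO '[E,P,S]%' AS with_commas     -- also true for ', Misc'
FROM (VALUES ('Engineer'), ('Manager'), (', Misc')) AS s(job_title);
```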

#### Reformatting Numbers
* Get the average salary from each department rounded to 2 decimal places. 
```sql
SELECT
    department,
    ROUND(AVG(salary), 2)
FROM
    staff
GROUP BY
    department
```
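Applying `ROUND` and `TRUNC` (used in the next example) to a literal shows the difference: `ROUND` rounds to the nearest value at the given precision, while `TRUNC` simply drops the extra digits. A sketch on a hypothetical number:

```sql
-- Hypothetical numeric literal
SELECT
    ROUND(12345.678, 2) AS rounded_2dp,    -- 12345.68
    TRUNC(12345.678, 2) AS truncated_2dp,  -- 12345.67
    ROUND(12345.678)    AS rounded_0dp,    -- 12346
    TRUNC(12345.678)    AS truncated_0dp;  -- 12345
```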

* Get the average salary from each department with no decimal places and without rounding. 
```sql
SELECT
    department,
    TRUNC(AVG(salary))
FROM
    staff
GROUP BY
    department
```

---