### BASIC SQL STATEMENTS

- **SELECT** and **FROM**
  - These clauses are required in every query that reads from a table
  - SQL keywords are case-insensitive. A `;` marks the end of a statement; whether it is mandatory depends on the SQL environment, so it is a good habit to always include it
  - The `LIMIT` clause is useful when you want to see just the first few rows of a table. If you use `LIMIT`, it always appears last
    
- **ORDER BY**    
  - Always comes after the SELECT, FROM, and WHERE clauses, and before LIMIT
  - DESC can be added after the column in your ORDER BY statement to sort in descending order
  - Default behaviour is to sort in ascending order.
  - When you provide a list of columns in an ORDER BY command, the sorting occurs using the leftmost column in your list first, then the next column from the left, and so on
    
- **WHERE**
  - Common symbols used in WHERE statements include:
  - `>` (greater than) `<` (less than)
  - `>=` (greater than or equal to) `<=` (less than or equal to)
  - `=` (equal to) `!=` (not equal to)
  - `WHERE` can also be used with non-numeric data (with the `LIKE`, `NOT`, or `IN` operators)
  - SQL requires single quotes, not double quotes, around text values: `WHERE sample_column_name = 'Sample'`

- **Sample Code** 
```SQL
    SELECT *
    FROM table_name
    WHERE condition
    ORDER BY column_name1, column_name2
    LIMIT 10;
```
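
To make the clause order and the left-to-right sorting concrete, here is a minimal sketch. The `sample_orders` data is made up for illustration (it is not a real table), and the query is written so that it should run as-is in PostgreSQL.

```SQL
-- Hypothetical inline data so the query runs without any existing tables.
WITH sample_orders (account_id, total) AS (
    VALUES (1001, 50), (1001, 300), (1002, 120), (1002, 80), (1003, 10)
)
SELECT account_id, total
FROM sample_orders
WHERE total > 20                  -- filter rows first
ORDER BY account_id, total DESC   -- account ascending, then largest total first within each account
LIMIT 3;                          -- keep only the first three sorted rows
-- Returns (1001, 300), (1001, 50), (1002, 120)
```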

### Arithmetic Operators
- **Derived Column**
 - new column that is a combination of existing columns (aka "calculated" or "computed" column ) 
 - mathematical operators : `*` (Multiplication) `+` (Addition) `-` (Subtraction) `/` (Division)
 - use `AS` to define new column name 
 
 - **Sample Code** 
```SQL
     SELECT (standard_usd/total_usd)*100 AS std_percent 
     FROM orders
     LIMIT 10;
```

###  Logical Operators

1. LIKE 
 - This allows you to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.
 - Usually used with the wildcard `%`, which matches any sequence of zero or more characters
 - The text passed to the `LIKE` operator must be in single quotes
 - `LIKE` is case-sensitive in PostgreSQL, so lowercase and uppercase letters within the string are not treated as the same

```SQL
SELECT name
FROM accounts
WHERE name LIKE '%one%';
```
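
The case sensitivity of `LIKE` can be checked with a small sketch. The `sample_accounts` rows are invented for illustration; `ILIKE` is a PostgreSQL-specific operator, while wrapping the column in `LOWER()` is the more portable option.

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_accounts (name) AS (
    VALUES ('Honeywell'), ('ONE Gas'), ('Capital One'), ('Boeing')
)
SELECT name,
       name LIKE '%one%'        AS matches_like,        -- case-sensitive: true only for 'Honeywell'
       LOWER(name) LIKE '%one%' AS matches_lowercased,  -- compares a lowercased copy of the name
       name ILIKE '%one%'       AS matches_ilike        -- PostgreSQL-only case-insensitive LIKE
FROM sample_accounts;
```

Both case-insensitive columns are true for 'Honeywell', 'ONE Gas', and 'Capital One', and false for 'Boeing'.
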
2. IN
 - This allows you to perform operations similar to using WHERE and =, but for more than one condition.
 - Check one, two or many column values to pull data,  all within the same query
 
```SQL
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN ('Walmart', 'Target', 'Nordstrom');
```
 
3. NOT
 - This is used with IN and LIKE to select all of the rows NOT LIKE or NOT IN a certain condition.
 - Comes before and provides Inverse Results for `IN`, `LIKE` and similar operators
 
```SQL

SELECT *
FROM web_events
WHERE channel NOT IN ('organic', 'adwords');

SELECT name
FROM accounts
WHERE name NOT LIKE '%one%';

```

4. AND & BETWEEN
 - These allow you to combine operations where all combined conditions must be true.
 - `AND` can express a range as two separate comparisons, but `BETWEEN` is usually cleaner when filtering the same column against a lower and an upper bound
 - The `BETWEEN` operator in SQL is inclusive; that is, the endpoint values are included.

```SQL
SELECT *
FROM web_events
WHERE channel IN ('organic', 'adwords') AND occurred_at BETWEEN '2016-01-01' AND '2017-01-01'
ORDER BY occurred_at DESC;
```
**Reminder:** `BETWEEN` includes its endpoints, but a date given without a time is read as 00:00:00 (midnight) on that date, so an upper bound of `'2017-01-01'` excludes everything later on January 1st.
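
The midnight behaviour is easier to see with concrete timestamps. This is a minimal sketch with made-up `sample_events` rows, assuming PostgreSQL; the half-open range in the last column is one common way to select "all of 2016" without relying on where the upper endpoint falls.

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_events (id, occurred_at) AS (
    VALUES (1, TIMESTAMP '2016-06-15 09:00:00'),
           (2, TIMESTAMP '2017-01-01 00:00:00'),
           (3, TIMESTAMP '2017-01-01 10:30:00')
)
SELECT id,
       occurred_at,
       (occurred_at BETWEEN '2016-01-01' AND '2017-01-01')         AS in_between,         -- true, true, false
       (occurred_at >= '2016-01-01' AND occurred_at < '2017-01-01') AS in_half_open_range  -- true, false, false
FROM sample_events;
```

Row 2 sits exactly at midnight, so `BETWEEN` includes it, while row 3 (later the same day) is excluded.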

5. OR
 - This allows you to combine operations where at least one of the combined conditions must be true.
 - Arithmetic operators (+, *, -, /) and logical operators (LIKE, IN, NOT, AND, and BETWEEN) can all be linked together using the OR operator.
 
 
```SQL 
SELECT *
FROM accounts
WHERE (name LIKE 'C%' OR name LIKE 'W%') 
           AND ((primary_poc LIKE '%ana%' OR primary_poc LIKE '%Ana%') 
           AND primary_poc NOT LIKE '%eana%')
```
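
The parentheses in the query above matter because `AND` is evaluated before `OR`. A minimal sketch with invented `sample_accounts` rows (assuming PostgreSQL) shows the two readings side by side:

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_accounts (name, primary_poc) AS (
    VALUES ('Costco',  'Tom Lee'),
           ('Walmart', 'Diana Ross'),
           ('Target',  'Ana Diaz')
)
SELECT name,
       primary_poc,
       -- (C or W) and poc contains 'ana'
       ((name LIKE 'C%' OR name LIKE 'W%') AND primary_poc LIKE '%ana%') AS with_parentheses,
       -- parsed as: C or (W and poc contains 'ana')
       (name LIKE 'C%' OR name LIKE 'W%' AND primary_poc LIKE '%ana%')   AS without_parentheses
FROM sample_accounts;
```

For 'Costco' the first column is false and the second is true, so dropping the parentheses silently changes which rows pass the filter.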

6. IS
 - Mostly used with `NULL`. Since NULL is not a value but a property of the data, use `IS` instead of `=`

 - **Sample Code** 
 
```SQL
SELECT *
FROM table_name
WHERE channel IN ('condition1', 'condition2')
      AND time_stamp BETWEEN '2016-01-01T00:00:00.000Z' AND '2016-12-31T23:59:59.000Z'
      -- alternative: AND time_stamp BETWEEN '2016-01-01' AND '2017-01-01'
ORDER BY time_stamp DESC
LIMIT 30;
```

###  SQL CheatSheet

| Statement   | How to Use It                 | Other Details | 		
| ----------- | -----------                   | -----------   |
| SELECT      | SELECT Col1, Col2, ...        |Provide the columns you want|
| FROM        | FROM Table                    |Provide the table where the columns exist|
| LIMIT       | LIMIT 10                      |Limits the number of rows returned|
| ORDER BY    | ORDER BY Col                  |Orders the results by the column. Add DESC for descending order.|
| WHERE       | WHERE Col > 5                 |A conditional statement to filter your results|
| LIKE        | WHERE Col LIKE '%me%'         |Only pulls rows where column has 'me' within the text|
| IN          | WHERE Col IN ('Y', 'N')       |A filter for only rows with column of 'Y' or 'N'|
| NOT         | WHERE Col NOT IN ('Y', 'N')   |NOT is frequently used with LIKE and IN|
| AND         |WHERE Col1 > 5 AND Col2 < 3    |Filter rows where two or more conditions must be true|
| OR          | WHERE Col1 > 5 OR Col2 < 3    |Filter rows where at least one condition must be true|
| BETWEEN     | WHERE Col BETWEEN 3 AND 5     |Often easier syntax than using an AND|


### JOINS 
- JOIN Clause
  - Tells the query about an additional table from which data will be pulled
  - The `ON` clause specifies the logical condition used to match rows from the table in the FROM clause with rows from the table in the JOIN clause
  
**Sample Code**   
```SQL
SELECT source_table.*, target_table.*
FROM source_table
JOIN target_table
ON source_table.source_id = target_table.target_id;

SELECT orders.standard_qty, orders.gloss_qty, 
       orders.poster_qty,  accounts.website, 
       accounts.primary_poc
FROM orders
JOIN accounts
ON orders.account_id = accounts.id
```


####  Entity Relationship Diagrams - ERD

- While a database is a collection of tables that share connected data stored in a computer, an entity relationship diagram (ERD) is a common way to view data in a database. Below is the ERD for the database we will use from Parch & Posey. These diagrams help you visualize the data you are analyzing including:

  - The names of the tables.
  - The columns in each table.
  - The way the tables work together.

- A **primary key - PK**   exists in every table, and it is a column that has a unique value for every row. It is common that the primary key is the first column in our tables in most databases

- A **foreign key - FK** is a column in one table that is a primary key in a different table. The primary-foreign key link connects these tables.

- An FK value can appear in many rows; it doesn't have to be unique. Foreign keys are always associated with a primary key, and the crow's-foot notation in the diagram below shows that they can appear multiple times in a particular table.

![Entity Relationship Diagrams - ERD](https://video.udacity-data.com/topher/2017/October/59e946e7_erd/erd.png)



 
**JOIN more than Two Tables** 
  
```SQL
SELECT *
FROM web_events
JOIN accounts
ON web_events.account_id = accounts.id
JOIN orders
ON accounts.id = orders.account_id
```
  - The ON statement should always occur with the foreign key being equal to the primary key.

  - JOIN statements allow us to pull data from multiple tables in a SQL database.

**aliases** are used instead of long table names or columns

  - We can simply write our alias directly after the column name (in the SELECT) or table name (in the FROM or JOIN) by writing the alias directly following the column or table we would like to alias. 
  
  - This will allow you to create clear column names even if calculations are used to create the column, and you can be more efficient with your code by aliasing table name
  
  - You might also see these statements without `AS`. Each alias can be written either way, and both forms produce exactly the same results:
  
   - with `AS`
  ```SQL
  SELECT t1.column1 AS aliasname, t2.column2 AS aliasname2,
         col1 + col2 AS total, col3
  FROM tablename AS t1
  JOIN tablename2 AS t2
  -- ON ... (join condition omitted)
  ```
 
   - without `AS`
  ```SQL
  SELECT t1.column1 aliasname, t2.column2 aliasname2,
         col1 + col2 total, col3
  FROM tablename t1
  JOIN tablename2 t2
  -- ON ... (join condition omitted)
  ```

**Sample Code** 
```SQL
SELECT r.name region, a.name account, 
       o.total_amt_usd/(o.total + 0.01) unit_price
FROM region r
JOIN sales_reps s
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id;
```


**INNER vs OUTER JOIN** 

- `LEFT OUTER JOIN` or `LEFT JOIN` returns all rows from the left table, even if there are no matches in the right table. This is the form most commonly used in practice.


- `RIGHT OUTER JOIN` or `RIGHT JOIN` returns all rows from the right table, even if there are no matches in the left table. It is rarely used, because the same result can be written as a LEFT JOIN.


   - A LEFT JOIN and RIGHT JOIN do the same thing if we swap the tables in the FROM and JOIN clauses.
   - A LEFT JOIN returns at least all the rows that an INNER JOIN would return.
   - JOIN and INNER JOIN are the same.
   - A LEFT OUTER JOIN is the same as LEFT JOIN.
   
   
- `FULL OUTER JOIN` or `FULL JOIN` returns the inner join result set, as well as any unmatched rows from either of the two tables being joined


![inner-join-left-join-right-join-and-full-join](https://i.stack.imgur.com/VQ5XP.png)
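
A small sketch of the difference between an inner and a left join, using made-up `sample_accounts` and `sample_orders` rows (assuming PostgreSQL). 'Nordstrom' has no orders in this invented data.

```SQL
-- Hypothetical inline data; not real tables.
WITH sample_accounts (id, name) AS (
    VALUES (1, 'Walmart'), (2, 'Target'), (3, 'Nordstrom')
),
sample_orders (id, account_id, total) AS (
    VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75)
)
SELECT a.name, o.id AS order_id, o.total
FROM sample_accounts a
LEFT JOIN sample_orders o      -- change to JOIN to see the inner-join result
ON o.account_id = a.id
ORDER BY a.name, o.id;
```

With `LEFT JOIN` the result has four rows, including 'Nordstrom' with NULL in `order_id` and `total`. With a plain `JOIN` the 'Nordstrom' row disappears and three rows remain.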

**JOINS and FILTERING**

=> When the database executes a query, it evaluates the join and everything in the ON clause first. Think of this as building the new result set. That result set is then filtered using the WHERE clause.

=> To filter before the join, move the filter into the ON clause with `AND`. It's almost like joining to a newly created, filtered table.

=> For outer joins, the two placements can produce different results. Because inner joins only return the rows for which the two tables match, moving a filter to the ON clause of an inner join produces the same result as keeping it in the WHERE clause.

```SQL
SELECT r.name as region,
       acc.name as account,
       (total_amt_usd/(total+0.01)) as unitprice
FROM orders o
JOIN accounts acc
ON o.account_id=acc.id 
   AND o.standard_qty>100 
JOIN sales_reps sr 
ON sr.id=acc.sales_rep_id
JOIN region r
ON r.id=sr.region_id
   AND sr.name LIKE 'A%'
```
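
The difference between filtering in `ON` and filtering in `WHERE` only shows up with outer joins. The following sketch uses invented `sample_accounts` and `sample_orders` rows (assuming PostgreSQL) and runs the same `LEFT JOIN` with the filter in each position.

```SQL
-- Filter in ON: every account is kept; only orders over 90 are attached.
WITH sample_accounts (id, name) AS (
    VALUES (1, 'Walmart'), (2, 'Target'), (3, 'Nordstrom')
),
sample_orders (id, account_id, total) AS (
    VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75)
)
SELECT a.name, o.total
FROM sample_accounts a
LEFT JOIN sample_orders o
ON o.account_id = a.id
   AND o.total > 90
ORDER BY a.name, o.total;
-- Four rows: Nordstrom (NULL), Target (NULL), Walmart (100), Walmart (250)

-- Filter in WHERE: the joined result is filtered afterwards,
-- so rows with a NULL total are removed as well.
WITH sample_accounts (id, name) AS (
    VALUES (1, 'Walmart'), (2, 'Target'), (3, 'Nordstrom')
),
sample_orders (id, account_id, total) AS (
    VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75)
)
SELECT a.name, o.total
FROM sample_accounts a
LEFT JOIN sample_orders o
ON o.account_id = a.id
WHERE o.total > 90
ORDER BY a.name, o.total;
-- Two rows: Walmart (100), Walmart (250)
```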

**SELECT DISTINCT**
```SQL
SELECT DISTINCT a.name, w.channel
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
WHERE a.id = '1001';
```

**TIME COMPARISON**
```SQL
SELECT o.occurred_at, a.name, o.total, o.total_amt_usd
FROM accounts a
JOIN orders o
ON o.account_id = a.id
WHERE o.occurred_at BETWEEN '2015-01-01' AND '2016-01-01'
ORDER BY o.occurred_at DESC;
```

**Reminder:** If two or more columns in your SELECT have the same name after the table name, such as accounts.name and sales_reps.name, you will need to alias them. Otherwise many tools will show only one of the columns. You can alias them like accounts.name AS AccountName, sales_reps.name AS SalesRepName



### NULLS 

`NULL` is a marker that indicates where no data exists in SQL; it is not a datatype or a value. NULLs are often ignored in aggregation functions.
 - NULLs are different from zero: they are cells where data does not exist.
 - When identifying NULLs in a `WHERE` clause, `IS NULL` or `IS NOT NULL` is used instead of `=`, because NULL isn't considered a value in SQL. Rather, it is a property of the data.
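
A minimal sketch of why `=` does not work for NULLs, using made-up `sample_accounts` rows (assuming PostgreSQL with default settings):

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_accounts (name, primary_poc) AS (
    VALUES ('Walmart', 'Tamara Tuma'), ('Goldman Sachs', NULL)
)
SELECT name,
       primary_poc = NULL  AS equals_null,  -- NULL for every row, never true
       primary_poc IS NULL AS is_null       -- true only where the value is missing
FROM sample_accounts;
```

Because `primary_poc = NULL` never evaluates to true, a `WHERE primary_poc = NULL` filter returns no rows at all.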
 



### SQL AGGREGATIONS

**1. COUNT :**  counts rows
  - `COUNT(*)` returns the number of rows in a table, including rows that contain NULLs
  - `COUNT(column)` counts only the rows where that column is not NULL
  - can also be used on non-numeric columns; it just looks for non-null data

```SQL
    SELECT COUNT(*) as total_rows
    FROM table_name;
    
    SELECT COUNT(table_name.id) as total_rows
    FROM table_name;
```
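
The two forms of `COUNT` give different answers as soon as a column contains NULLs. A small sketch with invented `sample_accounts` rows (assuming PostgreSQL):

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_accounts (id, primary_poc) AS (
    VALUES (1, 'Tamara Tuma'), (2, NULL), (3, 'Sung Shields')
)
SELECT COUNT(*)           AS total_rows,    -- 3: every row is counted
       COUNT(primary_poc) AS rows_with_poc  -- 2: the NULL is skipped
FROM sample_accounts;
```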
  
**2. SUM :**
  - requires a column name; `*` doesn't work
  - can only be used on numeric data
  - NULLs are ignored, which gives the same total as treating them as zero
  
```SQL
    SELECT SUM(standard_amt_usd) as total_cost, 
           SUM(standard_qty) as total_amount,
           SUM(standard_amt_usd)/SUM(standard_qty) as avg_per_unit
    FROM ORDERS 
```  

**3. MIN and MAX:**
  - `NULL` values are ignored 
  - They can be used on non-numerical columns. Depending on the column type, `MIN` will return the lowest number, the earliest date, or the non-numerical value as early in the alphabet as possible.
  - `MAX` does the opposite: it returns the highest number, the latest date, or the non-numerical value closest alphabetically to "Z".

**4. AVG:** 
   - the sum of all of the values in the column divided by the number of values in the column
   - works only on numerical columns
   - ignores NULLs in both the numerator and the denominator (to treat NULLs as zero, use `SUM(column)/COUNT(*)` instead)
   - a median might be a more appropriate measure of center for some sorts of data
   
```SQL
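   -- MIN, MAX, AVERAGE
   SELECT AVG(standard_qty) mean_standard, 
       MIN(standard_qty) min_standard, 
       MAX(standard_qty) max_standard
   FROM orders;
   
   -- MEDIAN: the inner LIMIT is hard-coded to just past the midpoint of this
   -- orders table, so the two rows returned are the middle values to average.
   SELECT *
   FROM (SELECT total_amt_usd
         FROM orders
         ORDER BY total_amt_usd
         LIMIT 3457) AS Table1
   ORDER BY total_amt_usd DESC
   LIMIT 2;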
```     
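
The effect of NULLs on `AVG`, and a median that does not depend on a hard-coded row count, can be sketched as follows. The `sample_orders` rows are invented, and `PERCENTILE_CONT ... WITHIN GROUP` is offered here as a PostgreSQL alternative (available since version 9.4), not as the method used in the query above.

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_orders (id, standard_qty) AS (
    VALUES (1, 10), (2, NULL), (3, 20), (4, 90)
)
SELECT AVG(standard_qty)                     AS avg_ignoring_nulls,  -- (10 + 20 + 90) / 3 = 40
       SUM(standard_qty)::numeric / COUNT(*) AS avg_nulls_as_zero,   -- 120 / 4 = 30
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY standard_qty)
                                             AS median_qty           -- middle of 10, 20, 90 = 20
FROM sample_orders;
```

The cast to `numeric` avoids integer division in the second column.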

  
**5. GROUP BY** 
   - can be used to aggregate data within subsets of the data. 
   - Any column in the SELECT statement that is not within an aggregator must be in the GROUP BY clause. 
   - The `GROUP BY` always goes between `WHERE` and `ORDER BY`.
   - SQL evaluates the aggregations before the LIMIT clause.
   
```SQL
   SELECT a.name, SUM(total_amt_usd) total_sales
   FROM orders o
   JOIN accounts a
   ON a.id = o.account_id
   GROUP BY a.name;
```

   - `GROUP BY` and `ORDER BY` can be used with multiple columns
   
     - The order of column names in your GROUP BY clause doesn’t matter—the results will be the same regardless. If we run the same query and reverse the order in the GROUP BY clause, you can see we get the same results.
     - The order of columns listed in the ORDER BY clause does make a difference. You are ordering the columns from left to right.
     - A reminder here that any column that is not within an aggregation must show up in your GROUP BY statement. 
     - As with ORDER BY, you can substitute numbers for column names in the GROUP BY clause. It’s generally recommended to do this only when you’re grouping many columns, or if something else is causing the text in the GROUP BY clause to be excessively long.

```SQL
   SELECT s.name, w.channel, COUNT(*) num_events
   FROM accounts a
   JOIN web_events w
   ON a.id = w.account_id
   JOIN sales_reps s
   ON s.id = a.sales_rep_id
   GROUP BY s.name, w.channel
   ORDER BY num_events DESC;
```   
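
Grouping by several columns with positional references can be sketched like this. The `sample_events` rows are made up for illustration (assuming PostgreSQL); `GROUP BY 1, 2` refers to the first and second columns of the SELECT list.

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_events (account_id, channel) AS (
    VALUES (1001, 'direct'), (1001, 'direct'), (1001, 'facebook'),
           (1002, 'direct'), (1002, 'organic')
)
SELECT account_id, channel, COUNT(*) AS num_events
FROM sample_events
GROUP BY 1, 2                                   -- same as GROUP BY account_id, channel
ORDER BY num_events DESC, account_id, channel;  -- here the column order does matter
-- First row: (1001, 'direct', 2); the other three groups each have 1 event
```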
     
**6. DISTINCT**
  - provides the unique rows for all columns written in the SELECT statement.
  - you only use DISTINCT once in any particular SELECT statement.

```SQL
SELECT DISTINCT a.name, w.channel
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
WHERE a.id = '1001';
```


**7. HAVING** 

 - The `WHERE` clause doesn't work on aggregated columns; you need to use `HAVING` instead

 - `WHERE SUM(total_usd)` is not possible, so use `HAVING SUM(total_usd)`
    
```SQL
SELECT s.id, s.name, COUNT(*) num_accounts
FROM accounts a
JOIN sales_reps s
ON s.id = a.sales_rep_id
GROUP BY s.id, s.name
HAVING COUNT(*) > 5
ORDER BY num_accounts;
```
 - WHERE subsets the returned data based on a logical condition.
 - WHERE appears after the FROM, JOIN, and ON clauses, but before GROUP BY.
 - HAVING appears after the GROUP BY clause, but before the ORDER BY clause.
 - HAVING is like WHERE, but it works on logical statements involving aggregations.

```SQL
SELECT a.id, a.name, w.channel, COUNT(*) use_of_channel
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
GROUP BY a.id, a.name, w.channel
HAVING COUNT(*) > 6 AND w.channel = 'facebook'
ORDER BY use_of_channel;
```
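
A condition on a plain column, such as the channel filter above, can also live in `WHERE`, which keeps `HAVING` for the aggregate condition only. This is a minimal sketch with invented `sample_events` rows (assuming PostgreSQL), not a rewrite of the query above.

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_events (account_id, channel) AS (
    VALUES (1001, 'facebook'), (1001, 'facebook'), (1001, 'direct'),
           (1002, 'facebook'), (1002, 'direct'),   (1002, 'direct')
)
SELECT account_id, channel, COUNT(*) AS use_of_channel
FROM sample_events
WHERE channel = 'facebook'   -- row-level filter, applied before grouping
GROUP BY account_id, channel
HAVING COUNT(*) > 1          -- group-level filter, applied after aggregation
ORDER BY use_of_channel;
-- Returns one row: (1001, 'facebook', 2)
```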

      
**8. DATE FUNCTIONS**

   - In SQL, dates are stored from largest unit to smallest (year, month, day, hour, minute, second), which makes them easy to sort and truncate
   - `DATE_TRUNC` allows you to truncate your date to a particular part of your date-time column. Common truncations are day, month, and year.
   - `DATE_PART` can be useful for pulling a specific portion of a date, but notice that pulling the month or day of the week (dow) means that you are no longer keeping the years in order. Rather, you are grouping by certain components regardless of which year they belong to.
   
```SQL
SELECT DATE_PART('year', occurred_at) ord_year,  SUM(total_amt_usd) total_spent
FROM orders
GROUP BY 1;

SELECT DATE_PART('month', occurred_at) ord_month, COUNT(*) total_sales
FROM orders
WHERE occurred_at BETWEEN '2014-01-01' AND '2017-01-01'
GROUP BY 1;
```

```SQL
SELECT DATE_TRUNC('month', o.occurred_at) ord_date, 
       SUM(o.gloss_amt_usd) tot_spent
FROM orders o 
JOIN accounts a
ON a.id = o.account_id
WHERE a.name = 'Walmart'
GROUP BY 1
```
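
To see what the two functions return for a single value, here is a short sketch on a literal timestamp (chosen arbitrarily for illustration, assuming PostgreSQL):

```SQL
SELECT DATE_TRUNC('month', TIMESTAMP '2016-03-15 10:30:00') AS truncated_to_month,  -- 2016-03-01 00:00:00
       DATE_PART('month',  TIMESTAMP '2016-03-15 10:30:00') AS month_number,        -- 3
       DATE_PART('dow',    TIMESTAMP '2016-03-15 10:30:00') AS day_of_week;         -- 2 (Tuesday; Sunday is 0)
```

`DATE_TRUNC` keeps the year, so March 2016 and March 2017 stay in separate groups, while `DATE_PART('month', ...)` returns 3 for both.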

### CASE STATEMENTS

 - The CASE statement most often goes in the SELECT clause.
 - CASE must include the following components: WHEN, THEN, and END. ELSE is an optional component to catch cases that didn't meet any of the previous CASE conditions.
 - You can make any conditional statement using any conditional operator between WHEN and THEN. This includes stringing together multiple conditional statements using AND and OR.
 - You can include multiple WHEN statements, as well as an ELSE statement, to deal with any unaddressed conditions
 - A WHERE clause keeps only the rows that match one condition, whereas CASE can label every row with its category in a single query

=> skip division by zero 
```SQL
SELECT account_id, 
       CASE WHEN standard_qty = 0 OR standard_qty IS NULL THEN 0
       ELSE standard_amt_usd/standard_qty END AS unit_price
FROM orders
LIMIT 10
```
=> used with aggregation
```SQL
SELECT s.name, COUNT(*), SUM(o.total_amt_usd) total_spent, 
     CASE WHEN COUNT(*) > 200 OR SUM(o.total_amt_usd) > 750000 THEN 'top'
     WHEN COUNT(*) > 150 OR SUM(o.total_amt_usd) > 500000 THEN 'middle'
     ELSE 'low' END AS sales_rep_level
FROM orders o
JOIN accounts a
ON o.account_id = a.id 
JOIN sales_reps s
ON s.id = a.sales_rep_id
GROUP BY s.name
ORDER BY 3 DESC
```
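
CASE labels can also be grouped and counted directly, which is one way to get a count per category in a single query. A minimal sketch with made-up `sample_orders` rows and an arbitrary threshold of 500 (assuming PostgreSQL):

```SQL
-- Hypothetical inline data; not a real table.
WITH sample_orders (id, total) AS (
    VALUES (1, 50), (2, 700), (3, 1200), (4, 300)
)
SELECT CASE WHEN total >= 500 THEN 'Over 500'
            ELSE '500 or under' END AS total_group,
       COUNT(*) AS order_count
FROM sample_orders
GROUP BY 1    -- group by the CASE label
ORDER BY 1;
-- Returns ('500 or under', 2) and ('Over 500', 2)
```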
