# Lesson 7: Basic SQL

`ERD = Entity Relationship Diagram`

An ERD describes the relationships between the tables in a database.



# Writing queries

Every query you write will have at least two parts: `SELECT` and `FROM`. The `SELECT` statement lists the columns you want returned (`*` means all columns). The `FROM` statement names the table you want to pull the data from. For example:

`SELECT *
FROM orders;`

`SELECT id, account_id, occurred_at
FROM orders;`

It is considered best practice to put a __semicolon__ at the end of each SQL statement. The semicolon also lets you run several statements at once, if your environment is able to show multiple results.

## `LIMIT` 

Use this to cap the number of rows returned. It works like `.head()` in pandas. For example:
    
`SELECT * 
FROM orders
LIMIT 10;`

This will only return 10 rows from the orders table.

## `ORDER BY`

Allows you to sort the returned rows by the values in a column. By default the sort is ascending (`ASC`), which you do not need to specify. If you want descending order, add `DESC` after the column name.

The `ORDER BY` statement is always after the `SELECT` and `FROM` statements, but it is before the `LIMIT` statement. 

*Example:* Write a query to return the 10 earliest orders in the orders table. Include the id, occurred_at, and total_amt_usd.

`SELECT id, occurred_at, total_amt_usd
FROM orders
ORDER BY occurred_at
LIMIT 10;`

*Example:* Write a query to return the top 5 orders in terms of largest total_amt_usd. Include the id, account_id, and total_amt_usd.

`SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC 
LIMIT 5;`

*Example:* Write a query to return the bottom 20 orders in terms of least total. Include the id, account_id, and total.

`SELECT id, account_id, total
FROM orders
ORDER BY total
LIMIT 20;`
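
The point that `ORDER BY` is applied before `LIMIT` can be checked on a tiny table. The sketch below is only an illustration: it assumes an in-memory SQLite database driven from Python's `sqlite3` module, and the four rows are invented rather than taken from the real orders table.

```python
import sqlite3

# Hypothetical miniature of the orders table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, account_id INTEGER, total_amt_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 990.5), (3, 10, 75.25), (4, 12, 1200.0)],
)

# The rows are sorted first; LIMIT then keeps the leading rows of the sorted result.
top_two = conn.execute(
    "SELECT id, total_amt_usd FROM orders ORDER BY total_amt_usd DESC LIMIT 2;"
).fetchall()
bottom_two = conn.execute(
    "SELECT id, total_amt_usd FROM orders ORDER BY total_amt_usd LIMIT 2;"
).fetchall()

print(top_two)     # the two largest orders
print(bottom_two)  # the two smallest orders
conn.close()
```

Only the sort direction differs between the two queries, which is what switches the result from the largest orders to the smallest ones.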


## `ORDER BY` (multiple columns)

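When several columns are listed, the rows are sorted by the leftmost column first. Each column further to the right only breaks ties among rows that have the same values in the columns before it. We can still flip the order with `DESC`, which applies only to the column it follows.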

*Example:* Write a query that returns the top 5 rows from orders ordered from newest to oldest, with the largest total_amt_usd listed first for each date.

`SELECT * 
FROM orders
ORDER BY occurred_at DESC, total_amt_usd DESC
LIMIT 5;`

*Example:* Write a query that returns the top 10 rows from orders ordered from oldest to newest, with the smallest total_amt_usd listed first for each date.

`SELECT * 
FROM orders
ORDER BY occurred_at, total_amt_usd 
LIMIT 10;`
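
To see how the second column only matters when the first one ties, here is a small sketch. It assumes an in-memory SQLite database and invented rows whose `occurred_at` values are plain dates, so that two orders can share a date. If the real column stores full timestamps, ties will be rare and the second sort column will seldom change the order. The helper `ids` is hypothetical and exists only for this example.

```python
import sqlite3

# Hypothetical rows; occurred_at is stored as a plain date so that ties occur.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, occurred_at TEXT, total_amt_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2016-12-30", 100.0),
        (2, "2016-12-31", 50.0),
        (3, "2016-12-31", 300.0),
        (4, "2016-12-30", 400.0),
    ],
)


def ids(order_by):
    """Return the order ids sorted by the given ORDER BY expression."""
    rows = conn.execute(f"SELECT id FROM orders ORDER BY {order_by};")
    return [row[0] for row in rows]


# Newest date first; within a date, largest amount first.
newest_largest = ids("occurred_at DESC, total_amt_usd DESC")
# Oldest date first; within a date, smallest amount first.
oldest_smallest = ids("occurred_at, total_amt_usd")
# DESC applies per column: newest date first, but smallest amount first within a date.
newest_smallest = ids("occurred_at DESC, total_amt_usd")

print(newest_largest, oldest_smallest, newest_smallest)
conn.close()
```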

## `WHERE (numeric)`

This is just like an if statement. It lets you filter data. Common comparison operators to use: `>`, `<`, `>=`, `<=`, `=`, `!=`.

*Example:* Pull the first 5 rows and all columns from the orders table that have a dollar amount of gloss_amt_usd greater than or equal to 1000.

`SELECT *
FROM orders
WHERE gloss_amt_usd >= 1000
LIMIT 5;`

*Example:* Pull the first 10 rows and all columns from the orders table that have a total_amt_usd less than 500.

`SELECT *
FROM orders
WHERE total_amt_usd < 500
LIMIT 10;`






## `WHERE (non-numeric / strings)`

The `WHERE` statement can also be used with non-numerical data. We can use the = and != operators here. You also need to be sure to use **single quotes** (just be careful if you have quotes in the original text) with the text data. Commonly when we are using `WHERE` with non-numeric data fields, we use the `LIKE`, `NOT`, or `IN` operators. 

*Example:* Filter the accounts table to show the company name, website, and primary point of contact (primary_poc) for Exxon Mobil.

`SELECT name, website, primary_poc
FROM accounts
WHERE name = 'Exxon Mobil';`


    

## `AS` (for Derived Columns)

A new column that is computed from existing columns is called a derived column. Use `AS` to give the derived column a name.

*Example:* Create a column that divides the standard_amt_usd by the standard_qty to find the unit price for standard paper for each order. Limit the results to the first 10 orders, and include the id and account_id fields. 

`SELECT id, account_id, standard_amt_usd/standard_qty AS unit_price
FROM orders
LIMIT 10;`


*Example:* Write a query that finds the percentage of revenue that comes from poster paper for each order. You will need to use only the columns that end with _usd. (Try to do this without using the total column). Include the id and account_id fields.

`SELECT id, account_id, 
       poster_amt_usd/(standard_amt_usd + gloss_amt_usd + poster_amt_usd) AS post_per
FROM orders;`
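
The poster-paper query above returns a fraction between 0 and 1. If a percentage on a 0 to 100 scale is wanted, one option is to multiply by 100. The sketch below also guards against orders whose three amounts sum to zero by using `NULLIF`, so that such rows yield NULL. This is an assumption about what is desirable, not part of the original exercise. It uses an in-memory SQLite database with invented rows; other databases may raise an error on division by zero instead of returning NULL.

```python
import sqlite3

# Hypothetical rows: (id, account_id, standard_amt_usd, gloss_amt_usd, poster_amt_usd).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, account_id INTEGER, "
    "standard_amt_usd REAL, gloss_amt_usd REAL, poster_amt_usd REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 100.0, 50.0, 50.0), (2, 11, 0.0, 0.0, 0.0), (3, 12, 30.0, 0.0, 90.0)],
)

# NULLIF(x, 0) turns a zero denominator into NULL, so the division yields NULL
# for an order with no revenue instead of dividing by zero.
post_per = conn.execute(
    """
    SELECT id, account_id,
           100.0 * poster_amt_usd
             / NULLIF(standard_amt_usd + gloss_amt_usd + poster_amt_usd, 0) AS post_per
    FROM orders
    ORDER BY id;
    """
).fetchall()

print(post_per)  # NULL comes back to Python as None
conn.close()
```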

    

## `LIKE`

This allows you to perform operations similar to using `WHERE` and `=`, but for cases when you might not know exactly what you are looking for. You will normally use `LIKE` with a `WHERE` clause. It is also frequently used with **`%`** which is a **wildcard** (* for Excel).

*Example:* All the companies whose names start with 'C'. 

`SELECT name 
FROM accounts
WHERE name LIKE 'C%';`

*Example:* All companies whose names contain the string 'one' somewhere in the name.

`SELECT name 
FROM accounts
WHERE name LIKE '%one%';`

*Example:* All companies whose names end with 's'. 

`SELECT name 
FROM accounts
WHERE name LIKE '%s';`


## `IN`

This allows you to perform operations similar to using `WHERE` and `=`, but for more than one value. You can use it on text and numeric columns. You can use `OR` but `IN` is much cleaner. **Note:** Some SQL environments accept either single or double quotation marks around text, but single quotes are the safer choice. If the text you are attempting to pull contains an apostrophe, write the apostrophe as two single quotes. Example: Macy's in our work space would be 'Macy''s'.

*Example*: Use the accounts table to find the account name, primary_poc, and sales_rep_id for Walmart, Target, and Nordstrom.

`SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN ('Walmart', 'Target', 'Nordstrom');`

*Example*: Use the web_events table to find all information regarding individuals who were contacted via the channel of organic or adwords.

`SELECT *
FROM web_events
WHERE channel IN ('organic', 'adwords');`
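
The apostrophe rule from the note above can be tried out directly. The sketch below assumes an in-memory SQLite database and invented account rows (the contact names are placeholders). It runs the same `IN` filter twice: once with the apostrophe doubled inside the SQL text, and once with `?` placeholders, which let the database driver handle the quoting when the query is sent from Python.

```python
import sqlite3

# Hypothetical accounts; the contact names are placeholders, not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, primary_poc TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("Walmart", "Contact A"), ("Macy's", "Contact B"), ("Target", "Contact C")],
)

# Inside a SQL string literal, an apostrophe is written as two single quotes.
literal = conn.execute(
    "SELECT name FROM accounts WHERE name IN ('Walmart', 'Macy''s') ORDER BY name;"
).fetchall()

# With placeholders, the values are passed separately and need no escaping.
wanted = ["Walmart", "Macy's"]
placeholders = ", ".join("?" for _ in wanted)
bound = conn.execute(
    f"SELECT name FROM accounts WHERE name IN ({placeholders}) ORDER BY name;",
    wanted,
).fetchall()

print(literal)
print(bound)
conn.close()
```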


## `NOT`

This is used with `IN` and `LIKE` to select all of the rows that do not match a condition, written as `NOT IN` or `NOT LIKE`.

*Example:* Use the accounts table to find the account name, primary poc, and sales rep id for all stores except Walmart, Target, and Nordstrom.

`SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name NOT IN ('Walmart', 'Target', 'Nordstrom');`

*Example:* Use the web_events table to find all information regarding individuals who were contacted via any method except using organic or adwords methods.

`SELECT *
FROM web_events
WHERE channel NOT IN ('organic', 'adwords');`

*Example*: Use the accounts table to find:

All the companies whose names do not start with 'C'.

`SELECT *
FROM accounts
WHERE name NOT LIKE 'C%';`

All companies whose names do not contain the string 'one' somewhere in the name.

`SELECT *
FROM accounts
WHERE name NOT LIKE '%one%';`


All companies whose names do not end with 's'.

`SELECT *
FROM accounts
WHERE name NOT LIKE '%s';`



## `AND & BETWEEN`

`AND` combines conditions so that a row is returned only when all of them are true. When two conditions test a range on the same column, it is recommended to use `BETWEEN` instead of two comparisons joined with `AND`.

*Example*: Write a query that returns all the orders where the standard_qty is over 1000, the poster_qty is 0, and the gloss_qty is 0.

`SELECT * 
FROM orders
WHERE standard_qty > 1000 AND poster_qty = 0 AND gloss_qty =0;`

*Example*: Using the accounts table, find all the companies whose names do not start with 'C' but do end with 's'.

`SELECT name 
FROM accounts
WHERE name NOT LIKE 'C%' AND name LIKE '%s';`

*Example*: Use the web_events table to find all information regarding individuals who were contacted via organic or adwords and started their account at any point in 2016 sorted from newest to oldest.

`SELECT *
FROM web_events
WHERE channel IN ('organic', 'adwords') AND occurred_at BETWEEN '2016-01-01' AND '2017-01-01'
ORDER BY occurred_at DESC;`

You will notice that using `BETWEEN` is tricky for dates! `BETWEEN` includes both endpoints, but a date written without a time is treated as midnight (00:00:00) on that day. An endpoint of '2016-12-31' would therefore leave out everything that happened later on December 31. This is the reason why we set the right-side endpoint of the period at '2017-01-01'.
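
The effect of the right-side endpoint can be seen on a handful of rows. The sketch below assumes an in-memory SQLite database, where the timestamps are stored as text and compared as strings; for the invented rows used here that gives the same outcome as the midnight behaviour described above. It compares a `BETWEEN` that stops at '2016-12-31' with a half-open range (`>=` the first day, `<` the first day of the next year), which is one common way to avoid thinking about the endpoint's time at all.

```python
import sqlite3

# Hypothetical web_events rows: (id, channel, occurred_at).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (id INTEGER, channel TEXT, occurred_at TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?, ?)",
    [
        (1, "organic", "2015-12-31 23:59:59"),
        (2, "adwords", "2016-01-01 00:00:00"),
        (3, "organic", "2016-12-31 18:30:00"),
        (4, "direct", "2016-06-15 12:00:00"),
        (5, "adwords", "2017-01-01 09:00:00"),
    ],
)

# Ending at '2016-12-31' drops event 3, which happened on the evening of that day.
narrow = [row[0] for row in conn.execute(
    """
    SELECT id FROM web_events
    WHERE channel IN ('organic', 'adwords')
      AND occurred_at BETWEEN '2016-01-01' AND '2016-12-31'
    ORDER BY occurred_at DESC;
    """
)]

# Half-open range: everything from the start of 2016 up to, but not including, 2017.
half_open = [row[0] for row in conn.execute(
    """
    SELECT id FROM web_events
    WHERE channel IN ('organic', 'adwords')
      AND occurred_at >= '2016-01-01' AND occurred_at < '2017-01-01'
    ORDER BY occurred_at DESC;
    """
)]

print(narrow)
print(half_open)
conn.close()
```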



## `OR`

This allows you to combine operations where at least one of the combined conditions must be true. This operator works with all of the operations we have seen so far: arithmetic operators (+, *, -, /), `LIKE`, `IN`, `NOT`, `AND`, and `BETWEEN` logic can all be linked together using `OR`. When `AND` and `OR` appear in the same `WHERE`, use parentheses to make the grouping explicit.

*Example*: Find the list of order ids where either gloss_qty or poster_qty is greater than 4000. Only include the id field in the resulting table.

`SELECT id
FROM orders
WHERE gloss_qty > 4000 OR poster_qty > 4000;`

*Example*: Write a query that returns a list of orders where the standard_qty is zero and either the gloss_qty or poster_qty is over 1000.

`SELECT *
FROM orders
WHERE standard_qty = 0 AND (gloss_qty > 1000 OR poster_qty > 1000);`

*Example*: Find all the company names that start with a 'C' or 'W', and the primary contact contains 'ana' or 'Ana', but it doesn't contain 'eana'.

`SELECT *
FROM accounts
WHERE (name LIKE 'C%' OR name LIKE 'W%') AND ((primary_poc LIKE '%ana%' OR primary_poc LIKE '%Ana%') AND (primary_poc NOT LIKE '%eana%'));`
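
The parentheses in the queries above matter because `AND` is evaluated before `OR`. The sketch below runs the standard_qty example with and without its parentheses on four invented rows in an in-memory SQLite database (an assumption for illustration; the precedence rule itself is standard SQL).

```python
import sqlite3

# Hypothetical rows: (id, standard_qty, gloss_qty, poster_qty).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, standard_qty INTEGER, "
    "gloss_qty INTEGER, poster_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 0, 1500, 0), (2, 0, 10, 20), (3, 500, 0, 2000), (4, 0, 0, 1200)],
)

# Intended meaning: standard_qty is zero AND at least one of the other two is large.
with_parens = [row[0] for row in conn.execute(
    """
    SELECT id FROM orders
    WHERE standard_qty = 0 AND (gloss_qty > 1000 OR poster_qty > 1000)
    ORDER BY id;
    """
)]

# Without parentheses this is read as
# (standard_qty = 0 AND gloss_qty > 1000) OR poster_qty > 1000,
# so order 3 slips in even though its standard_qty is not zero.
without_parens = [row[0] for row in conn.execute(
    """
    SELECT id FROM orders
    WHERE standard_qty = 0 AND gloss_qty > 1000 OR poster_qty > 1000
    ORDER BY id;
    """
)]

print(with_parens)
print(without_parens)
conn.close()
```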


