# Entity Relationship Diagrams (ERD)

An ERD is a diagram that shows how data is structured in a relational database. It tells us:
- The names of the tables.
- The columns in each table.
- The way the tables work together (for example, one-to-one and one-to-many relationships).
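
As a rough illustration of a one-to-many relationship, the sketch below builds two tiny tables in an in-memory SQLite database with Python's built-in `sqlite3` module. The table and column names (`accounts`, `orders`, `account_id`) mirror those used later in these notes, but the rows and the `total` column are made up, and SQLite is only a stand-in for whichever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each order points at exactly one account through account_id.
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        total      REAL
    );
    INSERT INTO accounts VALUES (1001, 'Walmart'), (1011, 'Exxon Mobil');
    INSERT INTO orders VALUES (1, 1001, 973.43), (2, 1001, 1718.03), (3, 1011, 776.18);
""")

# One account can own many orders: account 1001 has two rows in orders.
walmart_orders = conn.execute(
    "SELECT id FROM orders WHERE account_id = 1001 ORDER BY id"
).fetchall()
print(walmart_orders)  # [(1,), (2,)]
```

In an ERD, this link would be drawn as a line from `accounts.id` to `orders.account_id`.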


# Why SQL


There are some major advantages to using traditional relational databases, which we interact with using SQL. The five most apparent are:

- SQL is easy to understand.
- Traditional databases allow us to access data directly.
- Traditional databases allow us to audit and replicate our data.
- SQL is a great tool for analyzing multiple tables at once.
- SQL allows you to analyze more complex questions than dashboard tools like Google Analytics.


# SQL vs. NoSQL

NoSQL environments tend to be particularly popular for web-based data, but less popular for data that lives in spreadsheets, which is the kind of data we have been analyzing up to this point.

- One of the most popular NoSQL databases is MongoDB.

- Udacity Free Course: [Data Wrangling with MongoDB](https://www.udacity.com/course/data-wrangling-with-mongodb--ud032)


# Types of Databases


[Understand RDBMS and Differences among SQLite, MySQL and PostgreSQL](https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems)


# Major Statements

- `CREATE TABLE`
- `DROP TABLE`
- `SELECT`: queries data, and is the focus of this course


# Basic Statements

- `SELECT ... FROM`
- `LIMIT`
- `ORDER BY` / `ORDER BY <col> DESC`


## Practice

- Write a query to return the top 5 orders in terms of largest `total_amt_usd`.

```SQL
SELECT id, occurred_at, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC
LIMIT 5
```

- Return the orders sorted from largest to smallest `total_amt_usd`, with ties broken by `account_id` in ascending order.

```SQL
SELECT id, occurred_at, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC, account_id
```
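To see how a two-column `ORDER BY` breaks ties, here is a small sketch that runs against an in-memory SQLite database through Python's `sqlite3` module. The four rows are invented for illustration; only the column names come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, account_id INTEGER, total_amt_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1021, 500.0), (2, 1001, 900.0), (3, 1011, 500.0), (4, 1001, 120.0)],
)

# Largest totals first; rows with equal totals are ordered by ascending account_id.
top_orders = conn.execute(
    """
    SELECT id, account_id, total_amt_usd
    FROM orders
    ORDER BY total_amt_usd DESC, account_id
    LIMIT 3
    """
).fetchall()
print(top_orders)  # [(2, 1001, 900.0), (3, 1011, 500.0), (1, 1021, 500.0)]
```

Orders 3 and 1 share the same total, so the second sort key decides that account 1011 comes before account 1021.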


# SQL with Conditions

### `WHERE` Clause

```SQL
SELECT *
FROM orders
WHERE gloss_amt_usd >= 1000
ORDER BY gloss_amt_usd
LIMIT 5;
```

### Logical Operators

#### `=` or `!=` 

`=` or `!=` can be used with numeric and non-numeric values, for example:

```SQL
SELECT name, website, primary_poc
FROM accounts
WHERE name = 'Exxon Mobil';
```
> Remember that SQL uses single quotes, not double quotes, around text values.


#### `LIKE`, often used with wildcards

`LIKE` is often used with the wildcard `%`, which matches any sequence of zero or more characters, so it can stand in for whatever precedes or follows a particular set of characters.
- e.g. `'%google%'` matches any text containing `google`.

Find the companies whose names start with 'C':

```SQL
SELECT name
FROM accounts
WHERE name LIKE 'C%'
```
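The sketch below tries two wildcard patterns on a handful of made-up company names, using an in-memory SQLite database via Python's `sqlite3` module. One caveat: SQLite's `LIKE` ignores ASCII case by default, while PostgreSQL's `LIKE` is case-sensitive. The sample names are chosen so that the difference does not affect the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [("Walmart",), ("Chevron",), ("Costco",), ("Apple",)],
)

# 'C%' : a leading C followed by anything (including nothing).
starts_with_c = [
    row[0]
    for row in conn.execute("SELECT name FROM accounts WHERE name LIKE 'C%' ORDER BY name")
]

# '%mart%' : the text "mart" anywhere in the name.
contains_mart = [
    row[0]
    for row in conn.execute("SELECT name FROM accounts WHERE name LIKE '%mart%'")
]

print(starts_with_c)  # ['Chevron', 'Costco']
print(contains_mart)  # ['Walmart']
```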
> [SQL wildcard quick reference](https://www.w3schools.com/sql/sql_wildcards.asp)

#### `IN`

`IN` can be used with both numeric and non-numeric data

Find the account name, primary_poc, and sales_rep_id for Walmart, Target, and Macy's.
```SQL
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN('Walmart', 'Target', 'Macy''s')
```

Find all information from `web_events` where `account_id` is either 1001 or 1021.
```SQL
SELECT *
FROM web_events
WHERE account_id IN(1001, 1021)
```
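An apostrophe inside a text value has to be escaped by doubling it (`''`). The following sketch, which uses an in-memory SQLite database and invented rows, shows the doubled form matching `Macy's`, while a literal that contains a double quote character matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [("Walmart",), ("Target",), ("Macy's",), ("Apple",)],
)

# Inside a single-quoted SQL literal, '' stands for one apostrophe.
matched = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM accounts "
        "WHERE name IN ('Walmart', 'Target', 'Macy''s') ORDER BY name"
    )
]

# A double quote is just another character here, so this literal is Macy"s.
mismatched = conn.execute(
    """SELECT name FROM accounts WHERE name = 'Macy"s'"""
).fetchall()

print(matched)     # ["Macy's", 'Target', 'Walmart']
print(mismatched)  # []
```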

#### `NOT`

`NOT` is often used with `IN` and `LIKE`.
- For example, `WHERE url NOT LIKE '%google%'` returns all sites whose URL does not contain `google`.

Find name, primary_poc and sales_rep_id for those companies other than Walmart, Target and Nordstrom.
```SQL
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name NOT IN('Walmart', 'Target', 'Nordstrom')
```

Find all the companies whose names do not start with 'C'.
```SQL
SELECT name
FROM accounts
WHERE name NOT LIKE 'C%'
```

#### `BETWEEN ... AND`

`BETWEEN ... AND` selects a range of values in a single column.
- Note that both endpoints are included, so the result is equivalent to combining `>=` and `<=`.

For example,

```SQL
WHERE column >= 6 AND column <= 10
```
can be written more concisely as:

```SQL
WHERE column BETWEEN 6 AND 10
```
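A quick way to convince yourself that both endpoints are included is to run the two forms side by side. This sketch uses an in-memory SQLite database and a hypothetical `samples` table holding the integers 4 through 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (n INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?)", [(n,) for n in range(4, 13)])

between = [
    row[0]
    for row in conn.execute("SELECT n FROM samples WHERE n BETWEEN 6 AND 10 ORDER BY n")
]
explicit = [
    row[0]
    for row in conn.execute("SELECT n FROM samples WHERE n >= 6 AND n <= 10 ORDER BY n")
]

print(between)              # [6, 7, 8, 9, 10]
print(between == explicit)  # True
```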


### Connecting multiple statements

#### `AND`

The `AND` operator is used within a `WHERE` clause to apply more than one condition at a time.


Return all the orders where `standard_qty` is over 1000, `poster_qty` is 0, and `gloss_qty` is 0:
```SQL
SELECT *
FROM orders
WHERE standard_qty > 1000 AND poster_qty = 0 AND gloss_qty = 0
```

Find all the companies whose names do not start with 'C' but do end with 's'.
```SQL
SELECT name
FROM accounts
WHERE name NOT LIKE 'C%' AND name LIKE '%s'
```

Find all information regarding individuals who were contacted via the organic or adwords channels and started their account at any point in 2016, sorted from newest to oldest.
> Note that `occurred_at` is a timestamp in a format such as `2015-12-31T11:01:11.000Z`, so date literals used in the comparison, such as `'2016-01-01'`, need single quotes. The range is bounded on both sides so that events from 2017 onward are excluded.

```SQL
SELECT *
FROM web_events
WHERE channel IN('organic', 'adwords')
AND occurred_at >= '2016-01-01' AND occurred_at < '2017-01-01'
ORDER BY occurred_at DESC
```
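Restricting a timestamp column to one calendar year is easy to get subtly wrong, so here is a small sketch of a half-open range (`>=` the first day, `<` the first day of the next year). It uses an in-memory SQLite database with invented rows. SQLite stores these timestamps as text and compares them as strings, whereas PostgreSQL compares real timestamps, but for ISO-formatted values like these the outcome is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (channel TEXT, occurred_at TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        ("organic", "2015-12-31T11:01:11.000Z"),
        ("adwords", "2016-01-01T00:00:00.000Z"),
        ("direct", "2016-06-01T09:00:00.000Z"),
        ("organic", "2016-12-31T23:10:00.000Z"),
        ("adwords", "2017-01-02T08:00:00.000Z"),
    ],
)

# Half-open range: includes all of 2016, excludes 2015 and 2017.
in_2016 = [
    row[0]
    for row in conn.execute(
        """
        SELECT occurred_at FROM web_events
        WHERE channel IN ('organic', 'adwords')
          AND occurred_at >= '2016-01-01' AND occurred_at < '2017-01-01'
        ORDER BY occurred_at DESC
        """
    )
]

# BETWEEN with '2016-12-31' as the upper bound stops at midnight on Dec 31,
# so the late-evening event on that day is missed.
too_narrow = [
    row[0]
    for row in conn.execute(
        """
        SELECT occurred_at FROM web_events
        WHERE channel IN ('organic', 'adwords')
          AND occurred_at BETWEEN '2016-01-01' AND '2016-12-31'
        ORDER BY occurred_at DESC
        """
    )
]

print(in_2016)     # ['2016-12-31T23:10:00.000Z', '2016-01-01T00:00:00.000Z']
print(too_narrow)  # ['2016-01-01T00:00:00.000Z']
```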


#### `OR`

When combining several of these operators, we frequently need parentheses to ensure that the logic is evaluated in the order we intend, for example:

```SQL
WHERE (standard_qty = 0 OR gloss_qty = 0 OR poster_qty = 0)
AND occurred_at >= '2016-01-01'
```

Return a list of orders where `standard_qty` is zero and either `gloss_qty` or `poster_qty` is over 1000:

```SQL
SELECT id
FROM orders
WHERE standard_qty = 0 AND (gloss_qty > 1000 OR poster_qty > 1000)
```

Find all the company names that start with 'C' or 'W' and whose primary contact contains 'ana' or 'Ana' but does not contain 'eana'.
```SQL
SELECT name
FROM accounts
WHERE (name LIKE 'C%' OR name LIKE 'W%') 
AND ((primary_poc LIKE '%ana%' OR primary_poc LIKE '%Ana%')
AND primary_poc NOT LIKE '%eana%')
```
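The parentheses matter because `AND` binds more tightly than `OR`. The sketch below runs the same conditions with and without parentheses on four invented orders in an in-memory SQLite database and shows that the results differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, standard_qty INTEGER, gloss_qty INTEGER, poster_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 0, 1500, 0), (2, 0, 10, 20), (3, 500, 0, 2000), (4, 0, 0, 1200)],
)

# Intended logic: standard_qty must be zero in every returned row.
with_parens = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM orders "
        "WHERE standard_qty = 0 AND (gloss_qty > 1000 OR poster_qty > 1000) ORDER BY id"
    )
]

# Without parentheses this reads as
# (standard_qty = 0 AND gloss_qty > 1000) OR poster_qty > 1000.
without_parens = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM orders "
        "WHERE standard_qty = 0 AND gloss_qty > 1000 OR poster_qty > 1000 ORDER BY id"
    )
]

print(with_parens)     # [1, 4]
print(without_parens)  # [1, 3, 4]
```

Order 3 has a non-zero `standard_qty`, yet it slips into the second result because its `poster_qty` alone satisfies the `OR`.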




# Derived column (manipulated with simple arithmetic operations)

```SQL
SELECT id, account_id, standard_amt_usd/standard_qty AS unit_price
FROM orders
LIMIT 10
```

```SQL
SELECT id, account_id, poster_amt_usd/total_amt_usd AS percnt_poster
FROM orders
```
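One thing to watch for with a derived column like `unit_price` is a zero divisor. As a hedged sketch, the example below wraps the divisor in `NULLIF(standard_qty, 0)`, which turns a zero into `NULL` so the division yields `NULL` instead of failing. The rows are invented and the database is an in-memory SQLite instance; SQLite would return `NULL` for division by zero anyway, while PostgreSQL raises an error, so the guard is mainly relevant there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, account_id INTEGER, standard_amt_usd REAL, standard_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1001, 500.0, 100), (2, 1001, 0.0, 0)],
)

# AS names the derived column; NULLIF guards against dividing by zero.
unit_prices = conn.execute(
    """
    SELECT id, standard_amt_usd / NULLIF(standard_qty, 0) AS unit_price
    FROM orders
    ORDER BY id
    """
).fetchall()
print(unit_prices)  # [(1, 5.0), (2, None)]
```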