## Part 2 - The Northwind Database

In [1]:
import sqlite3_helper as sql3

### Create Connection

In [2]:
conn = sql3.create_connection('northwind_small.sqlite3', verbose=True)

Using SQLite version: 2.6.0
Creating Connection to northwind_small.sqlite3...
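
`sqlite3_helper` is the author's own module and its source is not shown here. The sketch below is a guess at what `create_connection` might look like, assuming it wraps the standard-library `sqlite3.connect`; the function body and the `demo_conn` name are hypothetical. It prints `sqlite3.sqlite_version`, which is the version of the SQLite library itself. A value such as 2.6.0 more likely comes from the version attribute of the Python module.

```python
import sqlite3


def create_connection(db_path, verbose=False):
    """Hypothetical stand-in for sql3.create_connection: open a SQLite database."""
    if verbose:
        # sqlite_version is the version of the underlying SQLite library
        print(f"Using SQLite library version: {sqlite3.sqlite_version}")
        print(f"Creating Connection to {db_path}...")
    return sqlite3.connect(db_path)


# An in-memory database keeps the example self-contained
demo_conn = create_connection(":memory:", verbose=True)
```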


### Northwind ERD

![](northwind_erd.png)

### Print all tables in database

In [3]:
tables = sql3.get_sql_tables(conn, verbose=True)

-------------- TABLES IN SQLITE3 DATABASE --------------
('Category',)
('Customer',)
('CustomerCustomerDemo',)
('CustomerDemographic',)
('Employee',)
('EmployeeTerritory',)
('Order',)
('OrderDetail',)
('Product',)
('Region',)
('Shipper',)
('Supplier',)
('Territory',)
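
The table list above can be reproduced with plain `sqlite3` by querying the `sqlite_master` catalog. This is a sketch under the assumption that `get_sql_tables` does something similar; the implementation and the two toy tables are illustrative, not taken from the helper module.

```python
import sqlite3


def get_sql_tables(conn, verbose=False):
    """Hypothetical stand-in for sql3.get_sql_tables: list user tables by name."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    tables = conn.execute(query).fetchall()
    if verbose:
        print("-------------- TABLES IN SQLITE3 DATABASE --------------")
        for row in tables:
            print(row)
    return tables


# Toy in-memory database with two tables, only to exercise the function
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Region (Id INTEGER PRIMARY KEY, RegionDescription TEXT)")
demo_conn.execute("CREATE TABLE Shipper (Id INTEGER PRIMARY KEY, CompanyName TEXT)")
demo_tables = get_sql_tables(demo_conn, verbose=True)
```

Each row comes back as a one-element tuple, which matches the `('Category',)` style of the output above.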


### Get `Customer` table info

In [4]:
customer_info = sql3.get_table_info(conn, 'Customer', verbose=True)

-------------- COLUMN INFO FOR Customer --------------
cid            name           type           notnull        dflt_value     pk             
0              Id             VARCHAR(8000)  0                             1              
1              CompanyName    VARCHAR(8000)  0                             0              
2              ContactName    VARCHAR(8000)  0                             0              
3              ContactTitle   VARCHAR(8000)  0                             0              
4              Address        VARCHAR(8000)  0                             0              
5              City           VARCHAR(8000)  0                             0              
6              Region         VARCHAR(8000)  0                             0              
7              PostalCode     VARCHAR(8000)  0                             0              
8              Country        VARCHAR(8000)  0                             0              
9              Phone          VARCH

### Get `Product` table info

In [5]:
product_info = sql3.get_table_info(conn, 'Product', verbose=True)

-------------- COLUMN INFO FOR Product --------------
cid            name           type           notnull        dflt_value     pk             
0              Id             INTEGER        0                             1              
1              ProductName    VARCHAR(8000)  0                             0              
2              SupplierId     INTEGER        1                             0              
3              CategoryId     INTEGER        1                             0              
4              QuantityPerUnit VARCHAR(8000) 0                             0              
5              UnitPrice      DECIMAL        1                             0              
6              UnitsInStock   INTEGER        1                             0              
7              UnitsOnOrder   INTEGER        1                             0              
8              ReorderLevel   INTEGER        1                             0              
9              Discontinued   INTEGE

### Get `Employee` table info

In [6]:
employee_info = sql3.get_table_info(conn, 'Employee', verbose=True)

-------------- COLUMN INFO FOR Employee --------------
cid            name           type           notnull        dflt_value     pk             
0              Id             INTEGER        0                             1              
1              LastName       VARCHAR(8000)  0                             0              
2              FirstName      VARCHAR(8000)  0                             0              
3              Title          VARCHAR(8000)  0                             0              
4              TitleOfCourtesy VARCHAR(8000) 0                             0              
5              BirthDate      VARCHAR(8000)  0                             0              
6              HireDate       VARCHAR(8000)  0                             0              
7              Address        VARCHAR(8000)  0                             0              
8              City           VARCHAR(8000)  0                             0              
9              Region         VARCH

### Get `Category` table info

In [7]:
category_info = sql3.get_table_info(conn, 'Category', verbose=True)

-------------- COLUMN INFO FOR Category --------------
cid            name           type           notnull        dflt_value     pk             
0              Id             INTEGER        0                             1              
1              CategoryName   VARCHAR(8000)  0                             0              
2              Description    VARCHAR(8000)  0                             0              
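
The column listings in this section have the shape of SQLite's `PRAGMA table_info` result (`cid`, `name`, `type`, `notnull`, `dflt_value`, `pk`). A minimal sketch of a helper that produces them, assuming that is what `get_table_info` wraps; the implementation is hypothetical. A space is joined between the fixed-width columns so that a 15-character name cannot run into the type column.

```python
import sqlite3


def get_table_info(conn, table_name, verbose=False):
    """Hypothetical stand-in for sql3.get_table_info, built on PRAGMA table_info."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier by hand
    quoted = '"' + table_name.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted});").fetchall()
    if verbose:
        header = ("cid", "name", "type", "notnull", "dflt_value", "pk")
        print(f"-------------- COLUMN INFO FOR {table_name} --------------")
        print(" ".join(f"{h:<15}" for h in header))
        for row in rows:
            cells = ["" if value is None else str(value) for value in row]
            print(" ".join(f"{c:<15}" for c in cells))
    return rows


# Toy table mirroring the Category columns shown above
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Category ("
    "Id INTEGER PRIMARY KEY, CategoryName VARCHAR(8000), Description VARCHAR(8000))"
)
demo_info = get_table_info(demo_conn, "Category", verbose=True)
```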


### Answer the following questions (each is from a single table):

#### What are the ten most expensive items (per unit price) in the database?

In [8]:
most_expensive_items_query = """SELECT ProductName, UnitPrice
FROM Product
ORDER BY UnitPrice DESC
LIMIT 10;
"""
results = sql3.select_query(conn, most_expensive_items_query, verbose=True)

('Côte de Blaye', 263.5)
('Thüringer Rostbratwurst', 123.79)
('Mishi Kobe Niku', 97)
("Sir Rodney's Marmalade", 81)
('Carnarvon Tigers', 62.5)
('Raclette Courdavault', 55)
('Manjimup Dried Apples', 53)
('Tarte au sucre', 49.3)
('Ipoh Coffee', 46)
('Rössle Sauerkraut', 45.6)


#### What is the average age of an employee at the time of their hiring? (Hint: a lot of arithmetic works with dates.)

In [9]:
avg_age_employee_at_hiring_query = """SELECT ROUND(AVG(HireDate-BirthDate), 2) as `Average Age of Employee at Hire` 
                                      FROM Employee;"""
results = sql3.select_query(conn, avg_age_employee_at_hiring_query, verbose=True)

(37.22,)
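
A caveat on `HireDate-BirthDate`: both columns are declared `VARCHAR`, and when SQLite subtracts text values it converts each one to a number from its leading numeric prefix. For a string such as `'2010-01-01'` that prefix is the year, so the expression yields a difference in calendar years and ignores months and days. The sketch below compares it with a day-based calculation using `julianday`, on two made-up rows. It assumes the dates are stored as ISO-8601 strings (`YYYY-MM-DD`), and the 365.25 divisor is an approximation.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Employee ("
    "Id INTEGER PRIMARY KEY, BirthDate VARCHAR(8000), HireDate VARCHAR(8000))"
)
# Made-up rows, not Northwind data
demo_conn.executemany(
    "INSERT INTO Employee (BirthDate, HireDate) VALUES (?, ?)",
    [("1980-12-31", "2010-01-01"), ("1975-11-15", "2000-02-01")],
)

# Text minus text: each string is read as its leading year, so this is (30 + 25) / 2
year_diff = demo_conn.execute(
    "SELECT AVG(HireDate - BirthDate) FROM Employee"
).fetchone()[0]

# Day-based age: difference in days divided by the average year length
day_based = demo_conn.execute(
    "SELECT AVG((julianday(HireDate) - julianday(BirthDate)) / 365.25) FROM Employee"
).fetchone()[0]

print(year_diff, round(day_based, 2))
```

On these rows the year-only difference overstates the age, because both hires fall before the employee's birthday in the hire year.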


#### (*Stretch*) How does the average age of employee at hire vary by city?

In [10]:
avg_age_employee_at_hire_vary_by_city_query = """SELECT ROUND(AVG(HireDate-BirthDate), 2) as `Average Age of Employee at Hire`, City
FROM Employee
GROUP BY City;"""
results = sql3.select_query(conn, avg_age_employee_at_hire_vary_by_city_query, verbose=True)

(29.0, 'Kirkland')
(32.5, 'London')
(56.0, 'Redmond')
(40.0, 'Seattle')
(40.0, 'Tacoma')


## Part 3 - Sailing the Northwind Seas

#### What are the ten most expensive items (per unit price) in the database and their suppliers?

In [11]:
most_expensive_items_supplier_query = """SELECT ProductName, UnitPrice, CompanyName
FROM Product
JOIN Supplier 
    ON Product.SupplierId = Supplier.Id
ORDER BY UnitPrice DESC
LIMIT 10;"""
results = sql3.select_query(conn, most_expensive_items_supplier_query, verbose=True)

('Côte de Blaye', 263.5, 'Aux joyeux ecclésiastiques')
('Thüringer Rostbratwurst', 123.79, 'Plutzer Lebensmittelgroßmärkte AG')
('Mishi Kobe Niku', 97, 'Tokyo Traders')
("Sir Rodney's Marmalade", 81, 'Specialty Biscuits, Ltd.')
('Carnarvon Tigers', 62.5, 'Pavlova, Ltd.')
('Raclette Courdavault', 55, 'Gai pâturage')
('Manjimup Dried Apples', 53, "G'day, Mate")
('Tarte au sucre', 49.3, "Forêts d'érables")
('Ipoh Coffee', 46, 'Leka Trading')
('Rössle Sauerkraut', 45.6, 'Plutzer Lebensmittelgroßmärkte AG')


#### What is the largest category (by number of unique products in it)?

In [12]:
largest_category_query = """SELECT CategoryName, COUNT(DISTINCT ProductName) AS Count
FROM Category
JOIN Product 
    ON Category.Id = Product.CategoryId
GROUP BY CategoryName
ORDER BY Count DESC
LIMIT 1;"""
results = sql3.select_query(conn, largest_category_query, verbose=True)

('Confections', 13)


#### (*Stretch*) Who's the employee with the most territories? Use TerritoryId (not name, region, or other fields) as the unique identifier for territories.

In [13]:
most_territories_query = """SELECT E.FirstName, E.LastName, COUNT(ET.TerritoryId) AS TerritoryCount 
FROM Employee AS E
JOIN EmployeeTerritory AS ET 
    ON E.Id = ET.EmployeeId
GROUP BY E.Id
ORDER BY TerritoryCount DESC
LIMIT 1;"""
results = sql3.select_query(conn, most_territories_query, verbose=True)

('Robert', 'King', 10)


#### Closing connection

In [14]:
conn.close()

## Part 4 - Questions (and your Answers)

Answer the following questions, baseline ~3-5 sentences each, as if they were interview screening questions (a form you fill out when applying for a job):

- In the Northwind database, what is the type of relationship between the Employee and Territory tables?

The relationship between `Employee` and `Territory` is **Many-to-Many (M:M)**. It is resolved through a *junction table*, `EmployeeTerritory`, which links to `Employee` and to `Territory` through **two** **One-to-Many (1:M)** relationships: one employee can appear in many `EmployeeTerritory` rows, and so can one territory. In other words, the M:M relationship between `Employee` and `Territory` is implemented as a pair of **1:M** relationships.

Reference: [Many-to-Many](https://en.wikipedia.org/wiki/Many-to-many_%28data_model%29)
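
The junction-table pattern can be sketched on a toy schema. The table names follow Northwind, but the columns are trimmed and the rows are invented for illustration.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(
    """
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Territory (Id TEXT PRIMARY KEY, TerritoryDescription TEXT);
    -- Junction table: each row pairs one employee with one territory
    CREATE TABLE EmployeeTerritory (
        EmployeeId INTEGER REFERENCES Employee(Id),
        TerritoryId TEXT REFERENCES Territory(Id),
        PRIMARY KEY (EmployeeId, TerritoryId)
    );
    INSERT INTO Employee VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO Territory VALUES ('T1', 'North'), ('T2', 'South');
    INSERT INTO EmployeeTerritory VALUES (1, 'T1'), (1, 'T2'), (2, 'T1');
    """
)

# One employee can have many territories ...
per_employee = demo_conn.execute(
    "SELECT EmployeeId, COUNT(*) FROM EmployeeTerritory "
    "GROUP BY EmployeeId ORDER BY EmployeeId"
).fetchall()

# ... and one territory can have many employees
per_territory = demo_conn.execute(
    "SELECT TerritoryId, COUNT(*) FROM EmployeeTerritory "
    "GROUP BY TerritoryId ORDER BY TerritoryId"
).fetchall()

print(per_employee, per_territory)
```

Employee 1 covers two territories and territory `T1` has two employees, which is only expressible because each pairing is its own row in `EmployeeTerritory`.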

- What is a situation where a document store (like MongoDB) is appropriate, and what is a situation where it is not appropriate?

A document store like MongoDB stores each record as a document of **Key-Value** pairs, and values can themselves be nested **Key-Value** pairs, which allows *flexibility in storing data without specifying its structure up front*. This makes prototyping faster and suits situations where you need to develop and deploy rapidly to get quick feedback. As opposed to relational databases, which require a rigid schema, a NoSQL document store is schema-free.

One situation where MongoDB would be appropriate: you are a small startup that needs to develop and iterate on prototypes rapidly and scale easily, in order to demonstrate functionality to investors and keep up with increasing demand.

A situation where it is not appropriate: you are dealing with financial data, as banks do, where mission-critical data demands reliability and integrity more than scalability. A bank, for example, needs a relational database with an up-front schema, as opposed to a schema-free document store.
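
To make "schema-free" concrete, here is a small sketch that uses plain Python dictionaries as stand-ins for documents. No MongoDB driver is involved, and the field names and values are invented.

```python
import json

# Two "documents" in the same collection with different shapes
order_a = {
    "_id": 1,
    "customer": {"name": "Example Co", "city": "Seattle"},  # nested key-value pairs
    "items": [{"product": "Ipoh Coffee", "quantity": 2}],
}
order_b = {
    "_id": 2,
    "customer": {"name": "Sample Ltd"},  # no city recorded
    "items": [],
    "gift_note": "Happy birthday",  # a field the first document lacks
}
collection = [order_a, order_b]

# Readers must tolerate missing fields, since no schema enforces them
cities = [doc["customer"].get("city") for doc in collection]

print(json.dumps(collection, indent=2))
print(cities)
```

The flexibility that makes prototyping quick is the same property that moves validation into application code, which is why it fits poorly with data that needs strict integrity guarantees.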

- What is "NewSQL", and what is it trying to achieve?

**NewSQL** is a buzzword for a class of RDBMSs that seek to provide the scalability of NoSQL databases while maintaining the ACID guarantees of a traditional database. In short, the goal is to combine the high scalability of NoSQL systems with a relational data model that conforms to ACID properties.
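
As a reminder of what the ACID side of that trade-off means in practice, the sketch below shows atomicity on a single-node SQLite database. It is not a NewSQL system and says nothing about scalability; the `Account` table and its values are invented.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Account ("
    "Id INTEGER PRIMARY KEY, Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
demo_conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
demo_conn.commit()

try:
    # The connection as a context manager commits on success, rolls back on error
    with demo_conn:
        demo_conn.execute("UPDATE Account SET Balance = Balance + 150 WHERE Id = 2")
        # This debit would make the balance negative and violates the CHECK constraint
        demo_conn.execute("UPDATE Account SET Balance = Balance - 150 WHERE Id = 1")
except sqlite3.IntegrityError:
    pass  # the whole transfer is undone, including the credit that succeeded

balances = demo_conn.execute("SELECT Id, Balance FROM Account ORDER BY Id").fetchall()
print(balances)
```

Either both updates take effect or neither does; NewSQL systems aim to keep this guarantee while distributing data across many nodes.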