## Final Project Submission
* Student name: Pedro Jofre Lora
* Student pace: self paced
* Scheduled project review date/time: 
* Instructor name: Eli Thomas
* Blog post URL: 

### The Deliverables
The goal of your project is to query the database to get the data needed to perform a statistical analysis. In this statistical analysis, you'll need to perform a hypothesis test (or perhaps several) to answer the following question:

Do discounts have a statistically significant effect on the number of products customers order? If so, at what level(s) of discount?

In addition to answering this question with a hypothesis test, you will also need to come up with at least 3 other hypotheses to test on your own. These can be anything that you think could be important information for the company.

For this hypothesis, be sure to specify both the null hypothesis and the alternative hypothesis for your question. You should also specify whether this is a one-tailed or a two-tailed test.
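
As an illustration only, one possible way to state the discount question as a one-tailed test is shown below. The symbols are assumptions introduced here: $\mu_{d}$ is the mean quantity per order line with a discount and $\mu_{0}$ is the mean quantity per order line without one.

$$
H_0: \mu_{d} = \mu_{0} \qquad\qquad H_a: \mu_{d} > \mu_{0}
$$

A two-tailed version would instead use $H_a: \mu_{d} \neq \mu_{0}$, which asks only whether discounts change the quantity ordered, in either direction.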

# Making recommendations to the Northwind Trading Company to improve performance based on hypothesis testing using frequentist and Bayesian methods
### Table of Contents
1. [Introduction](#1) <br>
2. [Previewing Data](#2) <br>
3. [Forming Questions for Analysis](#3) <br>
4. [Modeling Data](#4) <br>
5. [Interpreting Data](#5)

<a id="1"></a>
## 1. Introduction

<a id="2"></a>
## 2. Previewing Data
Given that Northwind Trading Company is a fictitious entity, I need to look at the example data stored in the database to orient myself and begin to ask meaningful questions. I will look at the first 10 entries of every table in the database without being selective about the data. The function below also lets me look at some of the metadata and properties of each dataframe.

In [21]:
import pandas as pd
import sqlite3 as sql
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
from ipywidgets import interact, interactive
# I'll learn to use plotly in this project since there's a lot of talk about how much more powerful it is than matplotlib.

In [69]:
# Open a connection to the SQLite database file. I'll use sqlite3 for now, but I may switch to another SQL module later if I need it.
connect = sql.connect('Northwind_small.sqlite')
c = connect.cursor()
# Get the table names from SQLite's catalog table
raw = c.execute("""select name from sqlite_master where type = 'table'""").fetchall()
# Each row is a 1-tuple, so keep only the name itself
tables = [row[0] for row in raw]

In [72]:
@interact
def table_preview(Table = tables, Preview = ['Head','Description', 'Shape','Unique','Types']):
    # Brackets quote the table name so that names that clash with SQL keywords still work
    statement = "SELECT * FROM [" + Table + "]"
    df = pd.read_sql_query(statement, connect)
    # Map each dropdown choice to the corresponding view of the dataframe
    preview_return = {'Head':df.head(10), 'Description':df.describe(), 
                      'Shape':df.shape, 'Unique':df.nunique(axis = 0), 'Types':df.dtypes}
    return preview_return[Preview]


The following is an overview of the available, pertinent data:
*  **Employees**
    - There are 9 employees. 5 are in the USA, and 4 are in the UK
    - The notes include information about education
    - The hire date is likely incorrect, since the date is in the future (2024) and the order dates are in 2012.
*  **Categories**
    - Northwind Trading Company trades in food products
    - There are 8 categories of food products
*  **Customers**
    - There are 91 customers
    - Contact Title may be interesting to explore (e.g. President, Sales Rep). There are 12 unique values
    - Region and Country may also be interesting to explore. There are 9 regions and 21 countries
*  **Shipper**
    - There are three shippers
*  **Suppliers**
    - There are 29 suppliers
    - The suppliers are scattered globally in 11 distinct regions
*  **Orders**
    - There are 830 orders
    - Only 89 customers have made purchases
    - There are only 387 confirmed ship dates. This is concerning...
*  **Products**
    - There are 77 unique products, 2 of which are discontinued
    - 62 products have unit prices
*  **Order Details**
    - There are a total of 2155 unique product orders. These represent the products that were ordered in the 830 customer orders.
    - The discount information resides in the Order Details
    - There are more Unit Prices than unique products, which is a signal that either 1) customers receive different unit prices, or 2) Unit Prices include the applied discount
*  **Territories and Regions**
    - There are 4 nondescript regions and 52 unique territories.
    - Only 49 territories are assigned to employees
*  **There Are No Customer Demographic Data**
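
Counts like the ones above can be double-checked directly in SQL. The following is a minimal sketch using only the standard `sqlite3` module; the helper name `count_summary` is hypothetical, and the toy table and its columns (`Order`, `CustomerId`, `ShippedDate`) are assumptions standing in for the real database, so the printed numbers are not Northwind results.

```python
import sqlite3

def count_summary(conn, table, column):
    """Return (total rows, non-null values, distinct values) for one column."""
    # Table and column names cannot be bound as SQL parameters, so they are
    # bracket-quoted here; only pass names that came from sqlite_master.
    query = (
        f"SELECT COUNT(*), COUNT([{column}]), COUNT(DISTINCT [{column}]) "
        f"FROM [{table}]"
    )
    return conn.execute(query).fetchone()

# Toy stand-in for the real database (hypothetical rows, not Northwind data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE [Order] (Id INTEGER, CustomerId TEXT, ShippedDate TEXT)")
demo.executemany(
    "INSERT INTO [Order] VALUES (?, ?, ?)",
    [(1, "ALFKI", "2012-07-10"), (2, "ALFKI", None), (3, "BONAP", "2012-07-12")],
)
print(count_summary(demo, "Order", "CustomerId"))   # (3, 3, 2)
print(count_summary(demo, "Order", "ShippedDate"))  # (3, 2, 2)
```

Comparing the first and second numbers shows how many values are missing in a column, which is the kind of check that surfaces a gap such as unconfirmed ship dates.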
    


<a id="3"></a>
## 3. Forming Questions for Analysis

Let's assume that we're consultants tasked with improving the performance of the Northwind Trading Company by giving broad recommendations about new procedures to implement. These procedures could involve anything from the rate and frequency of discounts, to the distribution of employees to customers. In order to give recommendations, we should look to determine what differences, if any, exist between a set of conditions. We can, and should, try to control for other factors whenever possible.

It is helpful to identify the metrics by which to measure success in order to narrow the field of questions that can be asked. The important metrics of success, or performance, are the following listed from most to least important:
1. Gross Profit (Greater is better)
2. Gross Profit Margin (Greater is better)
3. Order Regularity as measured by variability (Lesser is better)
4. Order Volume (Greater is better)
5. Customer Satisfaction as measured by a synthetic analog (e.g. shipping times) (Greater is better)
6. Employee Productivity as measured by a synthetic analog (e.g. profit normalized by number of assigned customers) (Greater is better)

We must assume that the order of importance listed above aligns with the priorities of the Northwind Trading Company. In reality, the metrics of success would likely be co-developed by the client and the consultant to ensure that both parties share the same expectations about the results. The consultant brings information into the equation that the client may be blind to (e.g. knowledge from other industries, best practices, etc.), just as the client brings information into the equation that the consultant simply cannot know (e.g. the values of the business, the vision, etc.). The above list represents a likely way that a corporate entity would prioritize success.
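
To make "Order Regularity as measured by variability" concrete, here is one possible definition, sketched under assumptions: regularity is taken to be the coefficient of variation of the number of days between consecutive orders, so a lower value means more regular ordering. The function name `gap_variability` and the example dates are hypothetical and are not drawn from the database.

```python
from datetime import date
from statistics import mean, pstdev

def gap_variability(order_dates):
    """Coefficient of variation of the gaps (in days) between consecutive orders."""
    ordered = sorted(order_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # too few orders to say anything about regularity
    return pstdev(gaps) / mean(gaps)

# Hypothetical order histories for two customers.
steady = [date(2012, 1, 1), date(2012, 1, 8), date(2012, 1, 15), date(2012, 1, 22)]
erratic = [date(2012, 1, 1), date(2012, 1, 2), date(2012, 1, 20), date(2012, 1, 22)]
print(gap_variability(steady))   # 0.0, every gap is exactly 7 days
print(gap_variability(erratic))  # larger value, gaps of 1, 18 and 2 days
```

Dividing by the mean gap keeps a customer who orders weekly comparable to one who orders monthly; other definitions of regularity are equally defensible.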

Having defined the metrics that can be measured, it is now possible to begin to ask questions that will likely modify those metrics. Some questions are listed below in association with their metric. This is not an exhaustive list, and again, some of these questions would be co-constructed with the client:

1. Gross Profit
    - Do discounts improve the profit of an order? 
    - Does any one category produce more profit than every other on a per order basis? 
    - Does any one employee produce more profit than every other on a per order basis?
    - Are there month(s) of the year that produce more profit?
    - Does a customer with a higher corporate rank (e.g. President vs Associate) produce more profit?
2. Gross Profit Margin
    - Does any one category produce a higher profit margin than every other after accounting for order quantities?
    - Does any one supplier have a higher profit margin than every other supplier?
3. Order Regularity
    - Does any one employee have more order regularity than every other employee?
    - Is any one product category more regular than every other?
    - Does any one region order with more regularity than every other region?
4. Order Volume
    - Do discounts increase order volume?
        - Are discounts more successful at increasing order volume in certain regions?
    - Does any one customer have a higher order volume than every other?
    - Is any one supplier's foods favored?
5. Customer Satisfaction
    - Does customer satisfaction impact gross profits?
6. Employee Productivity
    - Does any one employee produce more orders after normalizing for their number of customers?
   

Each of the above questions can be followed up by asking, "if so, to what extent is there a difference?" It is important to determine that there is, in fact, a statistical difference in the first place.
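
One common way to express "to what extent" is a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation using only the standard library; it is offered as one reasonable choice rather than the measure used later in this notebook, and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical measurements for two conditions, not Northwind data.
condition_a = [12, 15, 11, 14, 13]
condition_b = [10, 9, 12, 8, 11]
print(round(cohens_d(condition_a, condition_b), 3))
```

A value near 0 means the two groups overlap almost completely, while larger absolute values indicate a difference that is big relative to the spread within each group.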

For the sake of this notebook, I will attend to three of the above questions to show the kind of work that is necessary to perform the full analysis. One of the questions will be answered using a frequentist approach (two-sample t-test of means), while the other two questions will be answered using a Bayesian approach.
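
As a rough preview of the frequentist approach, the sketch below computes a two-sample t statistic by hand. It uses Welch's variant, which does not assume equal variances; that choice is an assumption made here, not something the analysis has committed to. The function name `welch_t` and the order quantities are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error of each mean
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof

# Hypothetical order quantities, not taken from the Northwind database.
discounted = [30, 25, 40, 35, 20]
full_price = [20, 15, 30, 25, 10]
t_stat, dof = welch_t(discounted, full_price)
print(round(t_stat, 3), round(dof, 1))  # 2.0 8.0
```

The p-value would then come from the t distribution with `dof` degrees of freedom. In practice, `scipy.stats.ttest_ind(discounted, full_price, equal_var=False)` performs the same test and returns the p-value directly.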