# Project 1 - Iowa Liquor 

You are a data scientist in residence at the Iowa State tax board. The Iowa State legislature is considering changes in the liquor tax rates and wants a report of current liquor sales by county and projections for the rest of the year. 

Your task is as follows:

* Calculate the yearly liquor sales for each store using the provided data. You can add up the transactions for each year, and store sales in 2015 specifically will be used later as your target variable.
* Use the data from 2015 to make a linear model using as many variables as you find useful to predict the yearly sales of all stores. You must use the sales from Jan to March as one of your variables.
* Use your model for 2015 to estimate total sales in 2016, extrapolating from the sales so far for Jan-March of 2016.
* Report your findings, including any projected increase or decrease in total sales (over the entire state) for the tax committee of the Iowa legislature.
* Use cross-validation to check how well your model predicts held-out data, compared with the model metrics on the full dataset.
* Fit your model(s) using one or both of the regularization tactics covered. Explain whether the regularized or the non-regularized model performed better and what the selected regression(s) are doing.
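
The first two tasks can be sketched on a toy table. This is only an illustration: the rows are made up, and the column names (`Date`, `StoreNumber`, `SaleDollars`) assume the cleaned names produced later in this notebook.

```python
import pandas as pd

# Toy transactions (made-up values, not the real data).
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2015-01-15', '2015-02-10', '2015-07-04',
                            '2015-03-01', '2015-12-24', '2016-01-20']),
    'StoreNumber': [1, 1, 1, 2, 2, 1],
    'SaleDollars': [100.0, 50.0, 25.0, 10.0, 40.0, 80.0],
})
toy['Year'] = toy['Date'].dt.year

# Yearly sales per store: sum every transaction within each (store, year).
yearly_sales = toy.groupby(['StoreNumber', 'Year'])['SaleDollars'].sum()

# Jan-March sales per store and year, the required predictor.
is_q1 = toy['Date'].dt.month <= 3
q1_sales = toy[is_q1].groupby(['StoreNumber', 'Year'])['SaleDollars'].sum()

print(yearly_sales)
print(q1_sales)
```

On the real data, the 2015 rows of `yearly_sales` would serve as the target and the 2015 rows of `q1_sales` as one of the predictors.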



# Part 1

### Data Cleaning and EDA

In Part 1 of this two-part project, you will apply the skills you have learned manipulating data in Python with Pandas, Numpy, Matplotlib, Seaborn and other tools to import the Iowa Liquor data, clean the dataset, then perform exploratory analysis using visual and statistical methods.

### Requirements:

**Identify the problem**
- Write a high quality problem statement
- Describe the goals of your study and criteria for success

**Acquire and clean the data**
- Verify the dataset is in the 'Assets' folder of this project - the data is from [Iowa.gov](https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy), filtered and reduced
- Import data using the Pandas Library
- Format, clean, slice, and combine the data in Python

**Explore the data**
- Perform exploratory analysis methods with visualization and statistical analysis
- Identify outliers and skewed distributions of important variables (if any)
- Determine correlations / causations in the data
- State the risks and assumptions of your data
- Identify 5 relationships, trends, or other interesting attributes of the data set
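
One possible way to check for skew and outliers is sketched below on made-up numbers. The 1.5 × IQR cutoff (Tukey's rule) is a common convention, not a requirement of this project, and the values are invented for illustration.

```python
import pandas as pd

# Made-up sale amounts with one very large transaction.
sales = pd.Series([20.0, 25.0, 30.0, 35.0, 40.0, 1000.0], name='SaleDollars')

# A positive skewness value indicates a long right tail.
skewness = sales.skew()

# Flag values more than 1.5 * IQR beyond the quartiles (Tukey's rule).
q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
iqr = q3 - q1
outliers = sales[(sales < q1 - 1.5 * iqr) | (sales > q3 + 1.5 * iqr)]

print(skewness)
print(outliers.tolist())
```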

# Identify the Problem

The goal of this two-part project is to build a model predicting future sales. Write a problem statement and identify SMART goals.

### Problem Statement

_write your problem statement here_

### SMART Goals

_write your SMART goals here_

# Download Data



If you navigate to the [data.iowa.gov](https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy) page for the liquor sales data and click "download", you'll end up with a 3.4GB/12.6 million row file containing all transactions by product for every class-E liquor store in the state since January 1, 2012.

For this project, we are providing a trimmed version of this dataset which contains a sample of the data. If you would like to try using a larger sample of the data, reach out to John or Joseph.


**The following code verifies that the dataset is in the right location**

If the code below returns `False`, reach out to your instructors.

In [1]:
import os
os.path.isfile('../Assets/Iowa_Liquor_sample.csv') 

True

# Load Data and Clean

Start by loading the data with pandas. You may need to parse the date columns appropriately.

In [2]:
import pandas as pd
import numpy as np

In [3]:
## Load the data into a DataFrame
liquor=pd.read_csv('../Assets/Iowa_Liquor_sample.csv')

### Explore the head and tail

View the head and tail of the data set; take a look at the columns. 

Can you identify what each of the columns are describing? 

How many rows / columns are there in the dataset?

In [4]:
# How many rows / columns are there in the dataset?
liquor.shape

(270955, 18)

In [5]:
# View the head of the data set
liquor.head()

Unnamed: 0,Date,Store Number,City,Zip Code,County Number,County,Category,Category Name,Vendor Number,Item Number,Item Description,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons)
0,11/04/2015,3717,SUMNER,50674,9.0,Bremer,1051100.0,APRICOT BRANDIES,55,54436,Mr. Boston Apricot Brandy,750,$4.50,$6.75,12,$81.00,9.0,2.38
1,03/02/2016,2614,DAVENPORT,52807,82.0,Scott,1011100.0,BLENDED WHISKIES,395,27605,Tin Cup,750,$13.75,$20.63,2,$41.26,1.5,0.4
2,02/11/2016,2106,CEDAR FALLS,50613,7.0,Black Hawk,1011200.0,STRAIGHT BOURBON WHISKIES,65,19067,Jim Beam,1000,$12.59,$18.89,24,$453.36,24.0,6.34
3,02/03/2016,2501,AMES,50010,85.0,Story,1071100.0,AMERICAN COCKTAILS,395,59154,1800 Ultimate Margarita,1750,$9.50,$14.25,6,$85.50,10.5,2.77
4,08/18/2015,3654,BELMOND,50421,99.0,Wright,1031080.0,VODKA 80 PROOF,297,35918,Five O'clock Vodka,1750,$7.20,$10.80,12,$129.60,21.0,5.55


In [6]:
# view the tail of the dataset
liquor.tail()

Unnamed: 0,Date,Store Number,City,Zip Code,County Number,County,Category,Category Name,Vendor Number,Item Number,Item Description,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons)
270950,12/22/2015,4057,DES MOINES,50316,77.0,Polk,1022100.0,TEQUILA,410,88291,Patron Tequila Silver Mini,300,$20.30,$30.45,4,$121.80,1.2,0.32
270951,11/04/2015,5151,IDA GROVE,51445,47.0,Ida,1011200.0,STRAIGHT BOURBON WHISKIES,259,17956,Evan Williams Str Bourbon,750,$7.47,$11.21,3,$33.63,2.25,0.59
270952,10/20/2015,5152,WATERLOO,50702,7.0,Black Hawk,1011300.0,TENNESSEE WHISKIES,85,26826,Jack Daniels Old #7 Black Lbl,750,$15.07,$22.61,6,$135.66,4.5,1.19
270953,11/20/2015,3562,WEST BURLINGTON,52655,29.0,Des Moines,1082900.0,MISC. IMPORTED CORDIALS & LIQUEURS,192,65258,Jagermeister Liqueur,1750,$26.05,$39.08,6,$234.48,10.5,2.77
270954,01/27/2015,4446,URBANDALE,50322,77.0,Polk,1031080.0,VODKA 80 PROOF,260,37993,Smirnoff Vodka 80 Prf,200,$2.75,$4.13,8,$33.04,1.6,0.42


### Parse dates

Using '.dtypes' on our dataframe allows us to view the data type of each of our columns. Pandas does its best to infer data types on ingest, but we may still need to make assumptions. [Pandas Dtype Basics](https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes)

In [7]:
# use .dtypes on the liquor dataframe to view the data type of each column
liquor.dtypes

Date                      object
Store Number               int64
City                      object
Zip Code                  object
County Number            float64
County                    object
Category                 float64
Category Name             object
Vendor Number              int64
Item Number                int64
Item Description          object
Bottle Volume (ml)         int64
State Bottle Cost         object
State Bottle Retail       object
Bottles Sold               int64
Sale (Dollars)            object
Volume Sold (Liters)     float64
Volume Sold (Gallons)    float64
dtype: object

Note that the 'Date' column has the dtype 'object' - this is the pandas data type designation for a string or a data type it doesn't recognise. 

We want our 'Date' column to be interpreted as datetime by pandas so we can perform time-based grouping and other functions on this column, so we have to convert it to a datetime data type.

Pandas gives us some options:
- Adjust our pd.read_csv to infer datetimes on import
- Directly convert the date column

[Pandas pd.read_csv datetime handling documentation](https://pandas.pydata.org/pandas-docs/stable/io.html#datetime-handling)

[Pandas pd.to_datetime documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html)

In [8]:
# Option: convert the 'Date' column (the raw values look like MM/DD/YYYY, e.g. 11/04/2015)
liquor["Date"] = pd.to_datetime(liquor["Date"], format='%m/%d/%Y', errors='coerce')
liquor.dtypes

Date                     datetime64[ns]
Store Number                      int64
City                             object
Zip Code                         object
County Number                   float64
County                           object
Category                        float64
Category Name                    object
Vendor Number                     int64
Item Number                       int64
Item Description                 object
Bottle Volume (ml)                int64
State Bottle Cost                object
State Bottle Retail              object
Bottles Sold                      int64
Sale (Dollars)                   object
Volume Sold (Liters)            float64
Volume Sold (Gallons)           float64
dtype: object

In [9]:
# Option: adjust pd.read_csv so the 'Date' column is parsed on import
liquor = pd.read_csv('../Assets/Iowa_Liquor_sample.csv', parse_dates=['Date'])
liquor.dtypes

Date                     datetime64[ns]
Store Number                      int64
City                             object
Zip Code                         object
County Number                   float64
County                           object
Category                        float64
Category Name                    object
Vendor Number                     int64
Item Number                       int64
Item Description                 object
Bottle Volume (ml)                int64
State Bottle Cost                object
State Bottle Retail              object
Bottles Sold                      int64
Sale (Dollars)                   object
Volume Sold (Liters)            float64
Volume Sold (Gallons)           float64
dtype: object
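
A small sketch of why the conversion matters, using made-up strings in the same MM/DD/YYYY layout as the raw file. With `errors='coerce'`, anything that cannot be parsed becomes `NaT` without raising, so it is worth counting those values afterwards.

```python
import pandas as pd

# Two valid dates in MM/DD/YYYY layout and one deliberately bad value.
raw = pd.Series(['11/04/2015', '03/02/2016', 'not a date'])

# An explicit format avoids any month/day ambiguity.
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

# Unparseable strings are turned into NaT; count them.
n_bad = parsed.isnull().sum()

# Once the dtype is datetime, the .dt accessor exposes year, month, and so on.
years = parsed.dt.year
months = parsed.dt.month

print(parsed)
print(n_bad)
```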

### Clean Column Names

Are there spaces in the column names? When columns have spaces in their names, it is difficult to use pandas in the form of df.column.method() (i.e., liquor.City.value_counts())

For a multi-word column name, you would have to use df['Zip Code'] instead of df.ZipCode

Additionally, some of our columns have parentheses in their names. 

We can remove spaces and special characters from our columns, which will help keep our code clean.

In [10]:
# view  column names
liquor.columns.values

array(['Date', 'Store Number', 'City', 'Zip Code', 'County Number',
       'County', 'Category', 'Category Name', 'Vendor Number',
       'Item Number', 'Item Description', 'Bottle Volume (ml)',
       'State Bottle Cost', 'State Bottle Retail', 'Bottles Sold',
       'Sale (Dollars)', 'Volume Sold (Liters)', 'Volume Sold (Gallons)'], dtype=object)

In [11]:
# remove spaces
liquor.columns = liquor.columns.str.replace(' ','')
liquor.columns.values

array(['Date', 'StoreNumber', 'City', 'ZipCode', 'CountyNumber', 'County',
       'Category', 'CategoryName', 'VendorNumber', 'ItemNumber',
       'ItemDescription', 'BottleVolume(ml)', 'StateBottleCost',
       'StateBottleRetail', 'BottlesSold', 'Sale(Dollars)',
       'VolumeSold(Liters)', 'VolumeSold(Gallons)'], dtype=object)

In [12]:
# remove left parentheses
liquor.columns = liquor.columns.str.replace('(', '', regex=False)  # treat '(' as a literal character, not a regex
liquor.columns.values


array(['Date', 'StoreNumber', 'City', 'ZipCode', 'CountyNumber', 'County',
       'Category', 'CategoryName', 'VendorNumber', 'ItemNumber',
       'ItemDescription', 'BottleVolumeml)', 'StateBottleCost',
       'StateBottleRetail', 'BottlesSold', 'SaleDollars)',
       'VolumeSoldLiters)', 'VolumeSoldGallons)'], dtype=object)

In [13]:
# remove right parentheses
liquor.columns = liquor.columns.str.replace(')', '', regex=False)  # treat ')' as a literal character, not a regex
liquor.columns.values

array(['Date', 'StoreNumber', 'City', 'ZipCode', 'CountyNumber', 'County',
       'Category', 'CategoryName', 'VendorNumber', 'ItemNumber',
       'ItemDescription', 'BottleVolumeml', 'StateBottleCost',
       'StateBottleRetail', 'BottlesSold', 'SaleDollars',
       'VolumeSoldLiters', 'VolumeSoldGallons'], dtype=object)
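
As an alternative to three separate calls, the same cleanup can be sketched in one pass with a regular-expression character class. The three column names below are copied from the dataset; the approach itself is just one option.

```python
import pandas as pd

# A few of the original column names.
cols = pd.Index(['Store Number', 'Bottle Volume (ml)', 'Sale (Dollars)'])

# The character class [ ()] matches a space, '(' or ')'; each match is removed.
cleaned = cols.str.replace(r'[ ()]', '', regex=True)

print(list(cleaned))
```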

### Clean Numerics

Look at the .dtypes and the .head() of the dataframe again - are there any columns that should be a numeric data type (float, int, etc) that are still objects? (hint - follow the $money)

[Pandas Series.replace()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html)

[Pandas pd.to_numeric()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html)

In [14]:
liquor.dtypes

Date                 datetime64[ns]
StoreNumber                   int64
City                         object
ZipCode                      object
CountyNumber                float64
County                       object
Category                    float64
CategoryName                 object
VendorNumber                  int64
ItemNumber                    int64
ItemDescription              object
BottleVolumeml                int64
StateBottleCost              object
StateBottleRetail            object
BottlesSold                   int64
SaleDollars                  object
VolumeSoldLiters            float64
VolumeSoldGallons           float64
dtype: object

In [15]:
liquor['StateBottleCost'].head()

0     $4.50
1    $13.75
2    $12.59
3     $9.50
4     $7.20
Name: StateBottleCost, dtype: object

In [16]:
# remove characters from a series of strings/objects
liquor['StateBottleCost'] = liquor['StateBottleCost'].str.replace('$', '', regex=False)  # '$' is a regex anchor, so match it literally
liquor['StateBottleCost'].head()

0     4.50
1    13.75
2    12.59
3     9.50
4     7.20
Name: StateBottleCost, dtype: object

In [17]:
liquor['StateBottleRetail'] = liquor['StateBottleRetail'].str.replace('$', '', regex=False)
liquor['StateBottleRetail'].head()

0     6.75
1    20.63
2    18.89
3    14.25
4    10.80
Name: StateBottleRetail, dtype: object

In [18]:
liquor['SaleDollars'] = liquor['SaleDollars'].str.replace('$', '', regex=False)
liquor['SaleDollars'].head()

0     81.00
1     41.26
2    453.36
3     85.50
4    129.60
Name: SaleDollars, dtype: object

In [19]:
# convert string/object columns to numeric
liquor['StateBottleCost'] = pd.to_numeric(liquor['StateBottleCost'], errors='coerce')
liquor['StateBottleRetail'] = pd.to_numeric(liquor['StateBottleRetail'], errors='coerce')
liquor['SaleDollars'] = pd.to_numeric(liquor['SaleDollars'], errors='coerce')
liquor.dtypes

Date                 datetime64[ns]
StoreNumber                   int64
City                         object
ZipCode                      object
CountyNumber                float64
County                       object
Category                    float64
CategoryName                 object
VendorNumber                  int64
ItemNumber                    int64
ItemDescription              object
BottleVolumeml                int64
StateBottleCost             float64
StateBottleRetail           float64
BottlesSold                   int64
SaleDollars                 float64
VolumeSoldLiters            float64
VolumeSoldGallons           float64
dtype: object

In [20]:
# examine your .dtypes and .head() to confirm your type adjustments
liquor.head()

Unnamed: 0,Date,StoreNumber,City,ZipCode,CountyNumber,County,Category,CategoryName,VendorNumber,ItemNumber,ItemDescription,BottleVolumeml,StateBottleCost,StateBottleRetail,BottlesSold,SaleDollars,VolumeSoldLiters,VolumeSoldGallons
0,2015-11-04,3717,SUMNER,50674,9.0,Bremer,1051100.0,APRICOT BRANDIES,55,54436,Mr. Boston Apricot Brandy,750,4.5,6.75,12,81.0,9.0,2.38
1,2016-03-02,2614,DAVENPORT,52807,82.0,Scott,1011100.0,BLENDED WHISKIES,395,27605,Tin Cup,750,13.75,20.63,2,41.26,1.5,0.4
2,2016-02-11,2106,CEDAR FALLS,50613,7.0,Black Hawk,1011200.0,STRAIGHT BOURBON WHISKIES,65,19067,Jim Beam,1000,12.59,18.89,24,453.36,24.0,6.34
3,2016-02-03,2501,AMES,50010,85.0,Story,1071100.0,AMERICAN COCKTAILS,395,59154,1800 Ultimate Margarita,1750,9.5,14.25,6,85.5,10.5,2.77
4,2015-08-18,3654,BELMOND,50421,99.0,Wright,1031080.0,VODKA 80 PROOF,297,35918,Five O'clock Vodka,1750,7.2,10.8,12,129.6,21.0,5.55
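
The strip-then-convert steps above repeat the same two operations for each money column, so they can also be written as a loop. The sketch below runs on a tiny made-up frame rather than on `liquor`, and the list `money_cols` is a name introduced here for illustration.

```python
import pandas as pd

# Made-up rows in the same "$x.xx" string format as the raw file.
toy = pd.DataFrame({
    'StateBottleCost': ['$4.50', '$13.75'],
    'SaleDollars': ['$81.00', '$41.26'],
})

money_cols = ['StateBottleCost', 'SaleDollars']
for col in money_cols:
    # Remove the literal dollar sign, then convert the remaining text to a float.
    stripped = toy[col].str.replace('$', '', regex=False)
    toy[col] = pd.to_numeric(stripped, errors='coerce')

print(toy.dtypes)
```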


### Null Values

Evaluate the null values in the dataset. 

Questions to guide your process:
- What columns have null values?
- How many null values are in each column?
- Will the missing values affect your analysis? 
- Can you afford to remove (drop) the null values from your dataset?

[df.isnull()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html)

[df.dropna()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)

[O'Reilly article on dealing with nulls using pandas](https://www.oreilly.com/learning/handling-missing-data)

[Data School video on missing values in pandas](https://www.youtube.com/watch?v=fCMrO_VzeL8)

In [21]:
liquor.isnull().sum()

Date                    0
StoreNumber             0
City                    0
ZipCode                 0
CountyNumber         1077
County               1077
Category               68
CategoryName          632
VendorNumber            0
ItemNumber              0
ItemDescription         0
BottleVolumeml          0
StateBottleCost         0
StateBottleRetail       0
BottlesSold             0
SaleDollars             0
VolumeSoldLiters        0
VolumeSoldGallons       0
dtype: int64
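
Raw counts are easier to judge as a share of the data. In the output above, `County` is missing in 1077 of 270955 rows, which is roughly 0.4%. A minimal sketch of computing such shares, on a made-up frame:

```python
import pandas as pd

# Made-up rows with a few missing values.
toy = pd.DataFrame({
    'County': ['Polk', None, 'Story', 'Linn'],
    'CategoryName': ['TEQUILA', 'VODKA 80 PROOF', None, None],
    'SaleDollars': [10.0, 20.0, 30.0, 40.0],
})

# The mean of a boolean mask is the fraction of True values per column.
null_share = toy.isnull().mean()

# Number of rows that would be lost by dropping every row with any null.
rows_with_null = toy.isnull().any(axis=1).sum()

print(null_share)
print(rows_with_null)
```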

In [22]:
# Show rows with missing data.
liquor[liquor.isnull().any(axis=1)]

Unnamed: 0,Date,StoreNumber,City,ZipCode,CountyNumber,County,Category,CategoryName,VendorNumber,ItemNumber,ItemDescription,BottleVolumeml,StateBottleCost,StateBottleRetail,BottlesSold,SaleDollars,VolumeSoldLiters,VolumeSoldGallons
135,2016-01-20,5222,CEDAR RAPIDS,52402,,,1051010.0,AMERICAN GRAPE BRANDIES,115,53214,Paul Masson Grande Amber Brandy,375,3.22,4.83,24,115.92,9.00,2.38
198,2016-03-02,3820,SIOUX CITY,51103,,,1032080.0,IMPORTED VODKA,35,34359,Grey Goose Vodka,200,5.00,7.50,12,90.00,2.40,0.63
272,2016-03-21,4222,EVANSDALE,50707,,,1062300.0,FLAVORED RUM,370,42716,Malibu Coconut Rum,750,7.49,11.24,3,33.72,2.25,0.59
290,2016-03-21,5236,ANAMOSA,52205,,,1081600.0,WHISKEY LIQUEUR,421,64868,Fireball Cinnamon Whiskey,1750,15.33,23.00,6,138.00,10.50,2.77
321,2016-02-23,4203,WAVERLY,50677,,,1051100.0,APRICOT BRANDIES,434,55084,Paramount Blackberry Brandy,375,3.55,5.33,24,127.92,9.00,2.38
863,2016-01-11,2460,HAMPTON,50441,,,1011200.0,STRAIGHT BOURBON WHISKIES,461,77776,Wild Turkey American Honey,750,10.50,15.75,3,47.25,2.25,0.59
896,2015-02-05,4829,DES MOINES,50314,77.0,Polk,1022200.0,,85,3657,Herradura Gold Reposado 6pak,750,23.58,35.37,6,212.22,4.50,1.19
901,2016-02-25,4647,WATERLOO,50707,7.0,Black Hawk,1052100.0,,420,48099,Hennessy VS,200,5.74,8.61,24,206.64,4.80,1.27
964,2015-05-19,4247,BELMOND,50421,,,1012100.0,CANADIAN WHISKIES,55,12408,Canadian Ltd Whisky,1750,9.14,13.71,6,82.26,10.50,2.77
982,2016-03-30,5222,CEDAR RAPIDS,52402,,,1031080.0,VODKA 80 PROOF,300,36904,Mccormick Vodka Pet,375,1.80,2.70,24,64.80,9.00,2.38
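
Dropping rows is not the only option. `Category` is missing in 68 rows but `CategoryName` in 632, so many rows have a code without a name. The sketch below fills names from codes, assuming each code maps to exactly one name (an assumption worth checking on the real data). The rows are made up, and codes that never appear with a name stay null.

```python
import pandas as pd

# Made-up rows: two codes with a known name, one code that never has a name.
toy = pd.DataFrame({
    'Category': [1022100.0, 1022100.0, 1031080.0, 1031080.0, 1052100.0],
    'CategoryName': ['TEQUILA', None, 'VODKA 80 PROOF', None, None],
})

# Learn code -> name from rows where both values are present.
known = toy.dropna(subset=['Category', 'CategoryName'])
code_to_name = known.drop_duplicates('Category').set_index('Category')['CategoryName']

# Fill only the missing names; existing names are left untouched.
toy['CategoryName'] = toy['CategoryName'].fillna(toy['Category'].map(code_to_name))

print(toy)
```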


In [23]:
# Drop any column with all missing data (assign the result; dropna returns a new DataFrame).
liquor = liquor.dropna(axis=1, how='all')
liquor.isnull().sum()

Date                    0
StoreNumber             0
City                    0
ZipCode                 0
CountyNumber         1077
County               1077
Category               68
CategoryName          632
VendorNumber            0
ItemNumber              0
ItemDescription         0
BottleVolumeml          0
StateBottleCost         0
StateBottleRetail       0
BottlesSold             0
SaleDollars             0
VolumeSoldLiters        0
VolumeSoldGallons       0
dtype: int64

In [24]:
# Drop any row with some missing data (assign the result; dropna returns a new DataFrame).
liquor = liquor.dropna(axis=0, how='any')
liquor.isnull().sum()

Date                 0
StoreNumber          0
City                 0
ZipCode              0
CountyNumber         0
County               0
Category             0
CategoryName         0
VendorNumber         0
ItemNumber           0
ItemDescription      0
BottleVolumeml       0
StateBottleCost      0
StateBottleRetail    0
BottlesSold          0
SaleDollars          0
VolumeSoldLiters     0
VolumeSoldGallons    0
dtype: int64

In [25]:
# Not really sure what this code is for. It was just here.
test = {'col1':[1,2,3],'col2':['a','b','c']}
test['col1'] = [4,5,6]
df = pd.DataFrame(test)  # build the frame with pd.DataFrame; pd.liquor does not exist
liquor[liquor.isnull().any(axis=1)]
cols = ['col1','col2']

for col in cols:
    # placeholder replacement values (an assumption): the original call passed no arguments
    df[col] = df[col].replace('a', 'A')

## Exploratory Data Analysis

Using pandas (and other tools such as NumPy, Matplotlib, or Seaborn) explore the Iowa Liquor dataset.

**Remember** Your goal in Part 2 will be to predict future sales. Look for relationships, trends, and features that may assist you in that task.

### Identify 5 trends, relationships, or things that stand out to you in the dataset

Display your findings below

In [26]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline


### Some questions to get you started:

 - What categories had the highest number of sales?
 - What categories had the highest sales in dollars?
 - Are there any strong relationships between any of the features and sales (in dollars)?
 - Are there any outliers? Are there any individual sales, stores, locations, or categories that show strange behavior?
 - What categories are the most profitable?
 - What stores are the most profitable?
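
A few of these questions can be answered with a single grouped aggregation. The sketch below uses made-up rows and assumes "profit" means retail price minus state cost per bottle, times bottles sold. That definition is an assumption, so adjust it if you define profit differently.

```python
import pandas as pd

# Made-up transactions using the cleaned column names.
toy = pd.DataFrame({
    'CategoryName': ['TEQUILA', 'TEQUILA', 'VODKA 80 PROOF'],
    'StateBottleCost': [20.0, 10.0, 5.0],
    'StateBottleRetail': [30.0, 15.0, 7.5],
    'BottlesSold': [2, 4, 10],
    'SaleDollars': [60.0, 60.0, 75.0],
})

# Assumed profit definition: per-bottle markup times bottles sold.
toy['Profit'] = (toy['StateBottleRetail'] - toy['StateBottleCost']) * toy['BottlesSold']

# Number of sales, total dollars, and total profit per category.
by_category = toy.groupby('CategoryName').agg(
    n_sales=('SaleDollars', 'size'),
    total_sales=('SaleDollars', 'sum'),
    total_profit=('Profit', 'sum'),
).sort_values('total_sales', ascending=False)

print(by_category)
```

Swapping `'CategoryName'` for `'StoreNumber'` gives the same summary per store.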
 
 


**Example**

Relationship between mean Bottles Sold and average SaleDollars (by store)

There is a strong positive relationship between the average number of bottles sold per transaction at a store and the average total sale amount per transaction at the same store. 

Digging deeper, by comparing BottlesSold to StateBottleRetail we can see that there is almost no relationship between the average bottle price per transaction at a store and the average number of bottles sold per transaction; that is, stores that sell more bottles per transaction are not necessarily selling cheaper items, and stores that sell fewer bottles are not necessarily selling more expensive items.
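
A relationship like this can also be checked numerically. The sketch below computes per-store means and their Pearson correlation matrix. The rows are made up, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up transactions for three stores.
toy = pd.DataFrame({
    'StoreNumber': [1, 1, 2, 2, 3, 3],
    'BottlesSold': [2, 4, 6, 10, 12, 12],
    'SaleDollars': [20.0, 40.0, 90.0, 110.0, 150.0, 170.0],
    'StateBottleRetail': [10.0, 10.0, 15.0, 11.0, 12.5, 14.2],
})

# One row per store, holding the mean of each numeric column.
store_means = toy.groupby('StoreNumber')[
    ['BottlesSold', 'SaleDollars', 'StateBottleRetail']
].mean()

# Pairwise Pearson correlations between the per-store means.
corr = store_means.corr()

print(corr)
```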


In [None]:
# group by store.
liquor.groupby('StoreNumber').BottlesSold.mean()

In [None]:
liquor.groupby('StoreNumber').BottlesSold.mean().plot(kind='bar')

In [None]:
# sort by store (assign the result; sort_values returns a new DataFrame).
liquor = liquor.sort_values(by=['StoreNumber'])
liquor.head()

In [28]:
# graphing relationship between mean Bottles Sold per store and mean Sale total per store
# take the per-store mean of the numeric columns used in the plots below
liquor_grouped_mean = liquor.groupby('StoreNumber')[['BottlesSold', 'SaleDollars', 'StateBottleRetail']].mean()
liquor_grouped_mean.head()


In [None]:
liquor_grouped_mean.tail()

In [None]:
# seaborn's jointplot provides a scatter of my variables plus histograms (older seaborn versions also annotate a pearson correlation coefficient)

sns.jointplot(x='BottlesSold',y='SaleDollars',data=liquor_grouped_mean)
plt.show()

In [None]:
# graphing relationship between mean Bottles Sold per store and mean Bottle Retail price per store
sns.jointplot(x='BottlesSold',y='StateBottleRetail',data=liquor_grouped_mean)
plt.show()

In [None]:
# histogram of average bottle retail price
liquor_grouped_mean[liquor_grouped_mean.StateBottleRetail <= 50].StateBottleRetail.plot(kind='hist')

liquor_grouped_mean.StateBottleRetail.describe()

### Finding #1

*description of your finding*

In [None]:
# your code here



### Finding #2

*description of your finding*

In [None]:
# your code here



### Finding #3

*description of your finding*

In [None]:
# your code here



### Finding #4

*description of your finding*

In [None]:
# your code here



### Finding #5

*description of your finding*

In [None]:
# your code here

