# Iowa Liquor Sales Functions and Graphs

Below are functions for analyzing spirits purchases made in the state of Iowa by commercial establishments holding a Class “E” liquor license. The original dataset (LiquorSalesSamplev2.csv) contains purchasing data from January 1, 2012 to the present. Each of the following questions has its own functions, with graphs where applicable:

1. What times of the year have liquor sales been the highest?
2. Are there hotspots in the state where liquor sales have been higher than the average?
3. Are there preferred liquor types in the state of Iowa?
4. Is there a surprising/unexpected time of year when liquor sales have gone up?
5. Are there purchase trends during holidays and college football season?
6. Are there any alcohol types that are frequently bought together? 


Libraries used:

In [None]:
import numpy as np
import pandas as pd
import mlxtend
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
import csv

Read the data, print the row count, and split the date into month, day, and year columns for graphing:

In [None]:
data = pd.read_csv("LiquorSalesSamplev2.csv")
print("Number of rows present:", len(data))
data.head(10)  # last expression in the cell, so the notebook displays it


In [None]:
data[['Month', 'Day', 'Year']] = data['Date'].str.split('/', expand = True)
data.head()
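
As an alternative to splitting the date string, the `Date` column can be parsed into real datetimes, which yields numeric month, day, and year values directly. The sketch below is a minimal illustration on a made-up three-row frame, not the notebook's own method. The `%m/%d/%y` format (two-digit year) is an assumption inferred from the `'12'`-style year arguments passed to the functions in this notebook; adjust it if the file stores four-digit years.

```python
import pandas as pd

# Toy rows standing in for the real file; all values are made up for illustration.
sample = pd.DataFrame({
    "Date": ["01/15/12", "02/03/12", "11/27/13"],
    "Sale (Dollars)": [100.0, 250.5, 75.25],
})

# Assumed format: month/day/two-digit year. Use "%m/%d/%Y" for four-digit years.
parsed = pd.to_datetime(sample["Date"], format="%m/%d/%y")

# The .dt accessor exposes integer components, so no string-to-number
# conversion is needed afterwards.
sample["Month"] = parsed.dt.month
sample["Day"] = parsed.dt.day
sample["Year"] = parsed.dt.year

print(sample)
```

With this approach the year becomes an integer such as `2012`, so any later filter would compare against a number rather than the string `'12'`.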

### Question 1: What times of the year have liquor sales been the highest?

#### Function to get total liquor sales per month for a given year:

In [None]:
def timeofyear_sum(filename, Time, Sales, year):
    """Total the Sales column per Time group (e.g. 'Month') for one year."""
    data = pd.read_csv(filename)
    data[['Month', 'Day', 'Year']] = data['Date'].str.split('/', expand=True)
    data['Month'] = pd.to_numeric(data['Month'])  # numeric so months sort 1..12
    data = data[data['Year'] == year]  # year is a string, as split from the date
    sum_of_col = data.groupby(Time)[Sales].sum()  # groupby sorts the keys
    print(sum_of_col)
    return sum_of_col
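
Because `timeofyear_sum` re-reads the CSV on every call, it can be handy to have a variant that works on a DataFrame already in memory. The following is a minimal sketch of that idea; `monthly_totals` is a hypothetical helper (not part of the notebook), and the toy rows are made up so the result is easy to check by hand.

```python
import pandas as pd

def monthly_totals(df, year, sales_col="Sale (Dollars)"):
    """Return total sales per month for one year (hypothetical helper)."""
    parts = df["Date"].str.split("/", expand=True)
    months = pd.to_numeric(parts[0]).rename("Month")
    in_year = parts[2] == year  # year compared as a string, e.g. '12'
    # Group the selected rows by their numeric month and add up the sales.
    return df.loc[in_year, sales_col].groupby(months[in_year]).sum()

# Made-up rows: two January 2012 sales, one March 2012 sale, one 2013 sale.
toy = pd.DataFrame({
    "Date": ["1/05/12", "1/20/12", "3/02/12", "1/09/13"],
    "Sale (Dollars)": [10.0, 15.0, 7.5, 99.0],
})

totals_2012 = monthly_totals(toy, "12")
print(totals_2012)
```

Here January 2012 totals 25.0 and March 2012 totals 7.5; the 2013 row is excluded by the year filter.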

#### Total sales for each month in 2012:

In [None]:
year_2012 = timeofyear_sum('LiquorSalesSamplev2.csv', 'Month', 'Sale (Dollars)', '12')
year_2012.plot();
year_2012.sort_values()

#### Total sales for each month in 2013:

In [None]:
year_2013 = timeofyear_sum('LiquorSalesSamplev2.csv', 'Month', 'Sale (Dollars)', '13')
year_2013.plot();
year_2013.sort_values()

#### Total sales for each month in 2014:

In [None]:
year_2014 = timeofyear_sum('LiquorSalesSamplev2.csv', 'Month', 'Sale (Dollars)', '14')
year_2014.plot();
year_2014.sort_values()

#### Total sales for each month in 2015:

In [None]:
year_2015 = timeofyear_sum('LiquorSalesSamplev2.csv', 'Month', 'Sale (Dollars)', '15')
year_2015.plot();
year_2015.sort_values()
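
The four cells above repeat the same call once per year. One possible way to compare years side by side is to group by month and year together and pivot the years into columns. This is a sketch under assumptions: it uses made-up rows in the same `M/D/YY` layout, and the names `by_month_year` and `peak_months` are introduced here for illustration only.

```python
import pandas as pd

# Made-up rows covering two months in each of two years.
toy = pd.DataFrame({
    "Date": ["1/05/12", "1/20/12", "3/02/12", "1/09/13", "3/11/13"],
    "Sale (Dollars)": [10.0, 15.0, 7.5, 99.0, 1.0],
})
toy[["Month", "Day", "Year"]] = toy["Date"].str.split("/", expand=True)
toy["Month"] = pd.to_numeric(toy["Month"])

# One row per month, one column per year; each cell is that month's total sales.
by_month_year = (
    toy.groupby(["Month", "Year"])["Sale (Dollars)"].sum().unstack("Year")
)
print(by_month_year)

# For each year (column), idxmax returns the month with the highest total.
peak_months = by_month_year.idxmax()
print(peak_months)
```

Calling `by_month_year.plot()` on such a table would draw one line per year on a shared month axis, which makes seasonal peaks easier to compare than four separate plots.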

### Question 2: Are there hotspots in the state where liquor sales have been higher than the average?

#### Function to get total liquor sales by county:

In [None]:
def location_sum(filename, County, Sales):
    """Return total sales per location (e.g. 'County' or 'City'), ascending."""
    data = pd.read_csv(filename)
    sum_of_col = data.groupby(County)[Sales].sum()
    return sum_of_col.sort_values()

#### Function to get the average of the total liquor sales across counties:

In [None]:
def location_mean(filename, County, Sales):
    """Return the mean of the per-location sales totals."""
    data = pd.read_csv(filename)
    # Total sales for each location, then the average of those totals
    totals = data.groupby(County)[Sales].sum()
    return totals.mean()
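
Note that `location_mean` averages the per-location totals; it does not average individual transactions. The small example below, using made-up numbers, shows how far apart the two statistics can be. The county names are only labels for the toy rows.

```python
import pandas as pd

# Made-up transactions: three in one county, one in another.
toy = pd.DataFrame({
    "County": ["Polk", "Polk", "Polk", "Linn"],
    "Sale (Dollars)": [10.0, 20.0, 30.0, 40.0],
})

county_totals = toy.groupby("County")["Sale (Dollars)"].sum()

# What location_mean computes: the average of the per-county totals.
mean_of_totals = county_totals.mean()         # (60 + 40) / 2

# A different statistic: the average size of a single transaction.
mean_per_sale = toy["Sale (Dollars)"].mean()  # (10 + 20 + 30 + 40) / 4

print(mean_of_totals, mean_per_sale)
```

The first value is the right baseline for comparing county totals against each other, which is how it is used in the graphs in this section.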

#### Average of total liquor sales across counties and across cities:

In [None]:
county_mean = location_mean('LiquorSalesSamplev2.csv', 'County', 'Sale (Dollars)')
city_mean = location_mean('LiquorSalesSamplev2.csv', 'City', 'Sale (Dollars)')

print(county_mean)
print(city_mean)
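
To answer Question 2 directly, the per-location totals can be filtered down to those above the average total. The following is a minimal sketch on made-up data; `hotspots` and `average_total` are names introduced here for illustration, and the same filter could be applied to the output of `location_sum`.

```python
import pandas as pd

# Made-up transactions across four counties.
toy = pd.DataFrame({
    "County": ["Polk", "Polk", "Linn", "Scott", "Adair"],
    "Sale (Dollars)": [500.0, 300.0, 400.0, 150.0, 50.0],
})

totals = toy.groupby("County")["Sale (Dollars)"].sum()
average_total = totals.mean()

# Keep only locations whose total exceeds the average, largest first.
hotspots = totals[totals > average_total].sort_values(ascending=False)
print(hotspots)
```

In this toy example the average total is 350.0, so only the two counties with totals of 800.0 and 400.0 qualify as hotspots.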

#### Total sales per county (graph of highest sales counties):

In [None]:
county_sales = location_sum('LiquorSalesSamplev2.csv', 'County', 'Sale (Dollars)')
print(county_sales)
county_sales.nlargest(5).plot(kind='barh');
for index, value in enumerate(county_sales.nlargest(5)): 
    plt.text(value, index,str(value))
plt.axvline(county_mean, color='red');
plt.xticks(rotation=45);
plt.xlabel('Total Sales in Millions');
plt.title('Total Sales per County');
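
The x-axis label says "Total Sales in Millions", while the tick values and bar labels are raw dollar amounts. One way to make the axis match its label is a tick formatter that divides by one million. This is a sketch only: `in_millions` is a hypothetical helper and the two totals are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

def in_millions(value, pos=None):
    """Format a dollar amount as millions with one decimal (hypothetical helper)."""
    return f"{value / 1e6:.1f}"

# Made-up totals standing in for county_sales.nlargest(5).
top = pd.Series({"Polk": 2_500_000.0, "Linn": 1_200_000.0}).sort_values()

ax = top.plot(kind="barh")
ax.xaxis.set_major_formatter(FuncFormatter(in_millions))  # ticks read 0.5, 1.0, ...
for position, value in enumerate(top):
    # Thousands separators keep the exact dollar labels readable.
    ax.text(value, position, f"${value:,.0f}")
ax.set_xlabel("Total Sales in Millions")
ax.set_title("Total Sales per County")
```

`FuncFormatter` calls the helper with the tick value and its position, which is why `in_millions` accepts a second argument it does not use.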

#### Total sales per city (graph of highest sales cities):

In [None]:
city_sales = location_sum('LiquorSalesSamplev2.csv', 'City', 'Sale (Dollars)')
print(city_sales)
city_sales.nlargest(5).plot(kind='barh');
for index, value in enumerate(city_sales.nlargest(5)): 
    plt.text(value, index,str(value))
plt.axvline(city_mean, color='red');
plt.xticks(rotation=45);
plt.xlabel('Total Sales in Millions');
plt.title('Total Sales per City');