### EDA IOWA Dataset - Filter the dataset to full years through the end of 2023

**Status:** PUBLIC Distribution <br>

**Author:** Jaume Manero IE<br>
**Date created:** 2021/02/01<br>
**Last modified:** 2024/01/30<br>
**Description:** Analysis of IOWA Dataset

This dataset contains every wholesale purchase of liquor in the State of Iowa by retailers for sale to individuals, from January 1, 2012 through the end of 2023.
The State of Iowa controls the wholesale distribution of liquor intended for retail sale, which means this dataset offers a complete view of retail liquor sales in the entire state. The dataset contains every wholesale order of liquor by all grocery stores, liquor stores, convenience stores, etc., with details about the store and location, the exact liquor brand and size, and the number of bottles ordered.

In [1]:
# Liquor sales
#    file: https://mydata.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy
# US county boundaries & FIPS codes
#    file: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_20m.zip
# County population
#    file: https://data.iowa.gov/Community-Demographics/County-Population-in-Iowa-by-Year/qtnr-zsrc
# City population in Iowa
#    file: https://data.iowa.gov/Community-Demographics/Total-City-Population-by-Year/acem-thbp

In [1]:
import pandas as pd
from thefuzz import fuzz
import folium
import warnings
import geopandas
import matplotlib.pyplot as plt
import matplotlib as mpl
import plotly.figure_factory as ff
import plotly.graph_objects as go
%matplotlib inline
warnings.filterwarnings('ignore')

In [2]:
file = 'Iowa_Liquor_Sales_20240319.csv'
df = pd.read_csv(file, header=0)

In [3]:
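# Check which years are present in the dataset.
# First, create a datetime column. The Date values shown by df.tail() are MM/DD/YYYY,
# so that format is assumed here; passing it explicitly avoids per-row format inference.
df['date_datetime'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# Sort the rows by datetime
df = df.sort_values(by='date_datetime')
# Print the unique years
print(df['date_datetime'].dt.year.unique())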

[2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024]
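The year list shows 2024 in the data, but on its own it does not say which years are complete. A minimal sketch of one way to check, by counting the distinct calendar months observed in each year. It runs on a small synthetic Series rather than the real file, and `months_per_year` is a hypothetical helper name, not part of this notebook.

```python
import pandas as pd


def months_per_year(dates: pd.Series) -> pd.Series:
    """Count the distinct calendar months that appear in each year."""
    dates = pd.to_datetime(dates)
    return dates.dt.month.groupby(dates.dt.year).nunique()


# Synthetic sample (not the real data): 2023 has all 12 months, 2024 only two.
sample = pd.Series(
    pd.to_datetime(
        [f"2023-{m:02d}-15" for m in range(1, 13)] + ["2024-01-10", "2024-02-29"]
    )
)

coverage = months_per_year(sample)
# Years with fewer than 12 observed months are candidates for trimming.
partial_years = coverage[coverage < 12].index.tolist()
print(coverage)
print(partial_years)
```

On the real DataFrame the same helper could be called as `months_per_year(df['date_datetime'])`. A year with 12 observed months can still have gaps inside a month, so this is only a coarse check.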


In [4]:
df.describe()

Unnamed: 0,Store Number,County Number,Category,Vendor Number,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),date_datetime
count,28592690.0,24132840.0,28575710.0,28592680.0,28592690.0,28592690.0,28592680.0,28592680.0,28592690.0,28592680.0,28592690.0,28592690.0,28592688
mean,3893.087,57.25913,1050259.0,272.5597,12.14036,879.1242,10.64285,15.97455,10.79254,144.5597,9.147597,2.413841,2018-05-25 08:37:02.129630976
min,2106.0,1.0,101220.0,10.0,1.0,0.0,0.0,0.0,-768.0,-9720.0,-1344.0,-355.04,2012-01-03 00:00:00
25%,2621.0,31.0,1012200.0,125.0,6.0,750.0,5.65,8.48,3.0,34.96,1.5,0.4,2015-06-09 00:00:00
50%,3886.0,62.0,1031200.0,260.0,12.0,750.0,8.49,12.74,6.0,77.28,4.8,1.26,2018-07-25 00:00:00
75%,4794.0,77.0,1062400.0,395.0,12.0,1000.0,12.99,19.49,12.0,148.56,10.5,2.77,2021-06-05 00:00:00
max,10370.0,99.0,1901200.0,987.0,336.0,378000.0,24989.02,37483.53,15000.0,279557.3,15000.0,3962.58,2024-02-29 00:00:00
std,1268.075,27.2871,83038.33,144.8834,7.760099,626.4891,13.04729,19.56967,30.37771,509.4256,36.15571,9.551434,


In [5]:
df.tail(10)

Unnamed: 0,Invoice/Item Number,Date,Store Number,Store Name,Address,City,Zip Code,Store Location,County Number,County,...,Item Description,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),date_datetime
4174989,INV-67803000005,02/29/2024,5859,LIQUOR TOBACCO & GROCERY - MASON CITY,18 NORTH MONROE AVENUE,MASON CITY,50401,POINT (-93.209221975 43.152355982),,CERRO GORDO,...,FIVE STAR,24,375,2.09,3.14,48,150.72,18.0,4.75,2024-02-29
4164568,INV-67791500076,02/29/2024,4829,CENTRAL CITY 2,1501 MICHIGAN AVE,DES MOINES,50314,POINT (-93.613286034 41.605834999),,POLK,...,SMIRNOFF KISSED CARAMEL,12,750,8.25,12.38,24,297.12,18.0,4.75,2024-02-29
4174986,INV-67799600035,02/29/2024,6129,EAST END LIQUOR / DES MOINES,3804 HUBBELL AVE,DES MOINES,50317,POINT (-93.541625959 41.631082004),,POLK,...,STATE VODKA,12,750,3.5,5.25,12,63.0,9.0,2.37,2024-02-29
4164562,INV-67792200014,02/29/2024,2106,HILLSTREET NEWS AND TOBACCO,2217 COLLEGE,CEDAR FALLS,50613,POINT (-92.456175029 42.517074986),,BLACK HAWK,...,TEMPLETON RYE 4YR,6,750,15.4,23.1,6,138.6,4.5,1.18,2024-02-29
4174983,INV-67797100009,02/29/2024,10057,KWIK STAR #1186 / ALTOONA,2030 21ST ST NW,ALTOONA,50009,POINT (-93.48478601 41.670812979),,POLK,...,CEDAR RIDGE MALTED RYE,6,750,19.67,29.51,6,177.06,4.5,1.18,2024-02-29
4174980,INV-67799600062,02/29/2024,6129,EAST END LIQUOR / DES MOINES,3804 HUBBELL AVE,DES MOINES,50317,POINT (-93.541625959 41.631082004),,POLK,...,CIROC PINEAPPLE,12,750,16.49,24.74,3,74.22,2.25,0.59,2024-02-29
4174977,INV-67800300005,02/29/2024,2201,HAPPY'S WINE & SPIRITS WHOLESALE,5925 UNIVERSITY AVE,CEDAR FALLS,50613,POINT (-92.429482021 42.512313981),,BLACK HAWK,...,BULLEIT RYE 12 YR,12,750,27.49,41.24,1,41.24,0.75,0.19,2024-02-29
4174976,INV-67802400014,02/29/2024,10280,QUICK SHOP LIQUOR AND VAPE / CLEAR LAKE,904 NORTH 8TH STREET,CLEARLAKE,50428,POINT (-93.37843198 43.142923994),,CERRO GORDO,...,EMPRESS 1908 GIN,6,1000,24.33,36.5,2,73.0,2.0,0.52,2024-02-29
4174973,INV-67800200079,02/29/2024,4988,HAPPY'S WINE & SPIRITS,5925 UNIVERSITY AVE,CEDAR FALLS,50613,POINT (-92.429482021 42.512313981),,BLACK HAWK,...,GOLDSCHLAGER CINNAMON SCHNAPPS,12,750,13.5,20.25,2,40.5,1.5,0.39,2024-02-29
4174411,INV-67797700004,02/29/2024,5982,CASEY'S GENERAL STORE #6 / ALTOONA,407 8TH ST SW,ALTOONA,50009,POINT (-93.468928036 41.644324992),,POLK,...,CROWN ROYAL REGAL APPLE MINI,10,50,5.91,8.87,20,298.6,1.0,0.26,2024-02-29


In [6]:
# Keep rows dated up to and including 2023-12-31 (strictly before 2024-01-01)
df = df[df['date_datetime'] < '2024-01-01']

In [7]:
print(df['date_datetime'].dt.year.unique())


[2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023]
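The unique-year list confirms that 2024 is gone, but it cannot show whether the last weeks of 2023 survived the cutoff. A minimal sketch of a boundary check on a small synthetic frame, assuming the goal is to keep everything through December 31 of the last full year. `keep_through_year` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd


def keep_through_year(frame: pd.DataFrame, date_col: str, last_year: int) -> pd.DataFrame:
    """Keep rows dated strictly before January 1 of the year after last_year."""
    cutoff = pd.Timestamp(year=last_year + 1, month=1, day=1)
    return frame[frame[date_col] < cutoff]


# Synthetic rows around the year boundary (not the real data).
sample = pd.DataFrame(
    {
        "date_datetime": pd.to_datetime(
            ["2023-11-30", "2023-12-01", "2023-12-31", "2024-01-02"]
        ),
        "Bottles Sold": [1, 2, 3, 4],
    }
)

trimmed = keep_through_year(sample, "date_datetime", 2023)
# Inspect the surviving date range rather than only the list of years.
print(trimmed["date_datetime"].min(), trimmed["date_datetime"].max())
```

Printing the minimum and maximum dates of the filtered DataFrame is a cheap way to confirm the boundary before writing the file.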


In [8]:
# Save the 2012-2023 subset
# First drop the helper datetime column for cleanliness
df = df.drop('date_datetime', axis=1)
df.to_csv('Iowa_Liquor_Sales_DEC2023.csv', index=False)

In [9]:
import session_info
session_info.show(html=False)

-----
folium              0.14.0
geopandas           0.14.1
matplotlib          3.7.2
pandas              2.0.3
plotly              5.9.0
session_info        1.0.0
thefuzz             0.19.0
-----
IPython             8.15.0
jupyter_client      8.1.0
jupyter_core        5.3.0
-----
Python 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:15:57) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.22631-SP0
-----
Session information updated at 2024-03-19 13:18
