# Liquor Dataset Example 
Using Autonormalize to normalize a Kaggle dataset of liquor purchase orders.

The dataset is available on Kaggle at https://www.kaggle.com/residentmario/iowa-liquor-sales.

In [None]:
import os
import time

import pandas as pd
from demo.liquor import load_sample
from featuretools.autonormalize import autonormalize as an

In [None]:
df = load_sample()
print("Rows: " + str(df.shape[0]))
print("Columns: " + str(df.shape[1]))
df.head(3)

In [None]:
df.dtypes

In [None]:
df = df.astype({"County Number": "int64", "Category": "int64"})

We load our data into a pandas DataFrame. To keep the example manageable, we keep only the first 13 columns and 1000 rows.
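
For readers working from the full Kaggle file rather than the bundled sample, the same kind of subset can be taken by position with pandas. This is a minimal sketch on a synthetic frame; the column names and sizes below are made up for illustration, and it is not necessarily how load_sample() is implemented.

```python
import pandas as pd

# Synthetic stand-in for the full dataset (hypothetical: 20 columns, 5000 rows).
full = pd.DataFrame({f"col_{i}": range(5000) for i in range(20)})

# Keep the first 1000 rows and the first 13 columns, selected by position.
sample = full.iloc[:1000, :13]

print("Rows: " + str(sample.shape[0]))
print("Columns: " + str(sample.shape[1]))
```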

In [None]:
start = time.time()
entityset = an.auto_entityset(df, accuracy=0.96, name="liquor orders")
time.time() - start

To detect the dependencies, normalize the data, and create an entity set all at once, all we need to do is call an.auto_entityset(). We pass 0.96 (96%) as the desired accuracy and "liquor orders" as the name of our entity set. This takes around 10 seconds for 1000 rows and around 10 minutes for 1.5 million rows.
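
To give some intuition for what an accuracy threshold can mean, the sketch below checks whether one column approximately determines another: for each value of the left-hand column it counts the rows that agree with the most common right-hand value. The helper dependency_holds and the toy data are hypothetical. This is one common way to define an approximate dependency, not necessarily the check Autonormalize performs internally.

```python
import pandas as pd


def dependency_holds(df, lhs, rhs, accuracy):
    """Return True if `lhs` determines `rhs` in at least `accuracy` of the rows."""
    # For each lhs value, count the rows that carry its most common rhs value.
    consistent = df.groupby(lhs)[rhs].agg(lambda s: s.value_counts().max()).sum()
    return bool(consistent / len(df) >= accuracy)


# Toy data: store 3 has one misspelled city, so 9 of 10 rows are consistent.
toy = pd.DataFrame({
    "Store Number": [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "City": ["Ames", "Ames", "Ames", "Boone", "Boone",
             "Cedar", "Cedar", "Cedar", "Cedar", "Cedra"],
})

print(dependency_holds(toy, "Store Number", "City", accuracy=0.85))
print(dependency_holds(toy, "Store Number", "City", accuracy=0.96))
```

With a threshold below 1.0, a dependency can still be accepted when a small share of rows contains noise such as typos.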

In [None]:
entityset.plot()

The plot above shows the entities within entityset and the relationships between them. Each entity contains data from the original df with duplication removed.
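
As an illustration of what "duplication removed" means, the sketch below splits a small toy table by hand. The table and column names are made up for the example, and the split is done manually with pandas rather than by Autonormalize.

```python
import pandas as pd

# Toy orders table in which the city is repeated for every order of a store.
orders = pd.DataFrame({
    "Invoice": ["A1", "A2", "A3", "A4"],
    "Store Number": [1, 1, 2, 2],
    "City": ["Ames", "Ames", "Boone", "Boone"],
})

# Store attributes are kept once per store.
stores = orders[["Store Number", "City"]].drop_duplicates().reset_index(drop=True)

# The orders table keeps only the key that links each order to its store.
orders_norm = orders[["Invoice", "Store Number"]]

# Joining on the key rebuilds the original rows, so no information is lost.
rebuilt = orders_norm.merge(stores, on="Store Number")

print(stores)
print(rebuilt)
```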

In [None]:
entityset