## 1. Collecting relevant data

In this step, we collected the quantitative data that an ML system could use to make predictions. Before collecting it, we identified the channels through which an ML system could realistically harvest data within the identified domestic practice. Examples of such channels are: (1) device sensors that track user interactions; (2) wearable sensors that monitor biometric data, the user's current activity, or their location; (3) fixed sensors that detect environmental data, such as temperature, occupancy, or light intensity; and (4) services that collect information about their users' behavior, such as purchase or resource-consumption patterns. For the grocery automation case study, the collected data took the form of grocery receipts (figure~\ref{fig:receipts}). Supermarkets collect this data automatically and can link it to individual customers through their customer cards or online purchases.
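
To make the receipt format concrete, here is a minimal sketch of how a single receipt could be flattened into one row per purchased item, using column names from our dataset. The nested `receipt` dictionary and the `receipt_to_rows` helper are hypothetical: real supermarket exports will look different. The item values are copied from the first order in our data.

```python
import pandas as pd

# Hypothetical receipt: the field names mirror the dataset's columns, but the
# nesting ("lines" holding the purchased items) is an assumption for illustration.
receipt = {
    "order_ID": 5,
    "date": "2021-11-23",
    "timestamp": "12:32:00",
    "store_type": "supermarket",
    "store_name": "Okay",
    "lines": [
        {"item_id": "RABEKO choco light 250g", "amount": 2, "price_unit": 2.82, "promo": 0},
        {"item_id": "JOYVALLE pudding griesmeel natuur 135g", "amount": 4, "price_unit": 0.99, "promo": 0},
        {"item_id": "BONI tomatensoep met balletjes 950ml", "amount": 1, "price_unit": 1.99, "promo": 0},
    ],
}


def receipt_to_rows(receipt: dict) -> pd.DataFrame:
    """Flatten one receipt into a long table with one row per purchased item."""
    # Receipt-level fields (order, date, store) are repeated on every item row.
    header = {key: value for key, value in receipt.items() if key != "lines"}
    rows = [{**header, **line} for line in receipt["lines"]]
    frame = pd.DataFrame(rows)
    # Line total = quantity x unit price, rounded to two decimals.
    frame["price_total"] = (frame["amount"] * frame["price_unit"]).round(2)
    return frame


rows = receipt_to_rows(receipt)
print(rows[["order_ID", "item_id", "amount", "price_unit", "price_total"]])
```

A long table like this, with one row per item and the receipt fields repeated, is the shape our dataset has. It keeps every purchase as its own observation while `order_ID` still groups items back into receipts.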

In this notebook we:
1. Import the data we collected in an Excel file and convert it to CSV
2. Preview our dataset

### Import libraries 

```python
# pandas: data manipulation and analysis with DataFrames
import pandas as pd
# NumPy: large multi-dimensional arrays and matrices, plus high-level mathematical functions on them
import numpy as np
# Matplotlib's pyplot: a MATLAB-like plotting framework; we use it in our plotter function to plot data
import matplotlib.pyplot as plt
# seaborn: high-level statistical graphics built on top of Matplotlib
import seaborn as sns
# dataframe_image: exports DataFrames (e.g. summary tables) as images
import dataframe_image as dfi
# StrMethodFormatter: formats axis tick labels with str.format-style strings
from matplotlib.ticker import StrMethodFormatter
```

### Load and view data 

```python
# The data was collected in an Excel file; we convert it to CSV for the rest of the analysis.
# Reading an .xlsx file requires an Excel engine such as openpyxl to be installed.
read_file = pd.read_excel(
    r"/workspaces/Plenty-in-the-Pantry/database/Groceries_onehousehold.xlsx",
    sheet_name="Household 1m+1f rural",
)
# index=False: do not write the DataFrame's row index as an extra column
read_file.to_csv(r"/workspaces/Plenty-in-the-Pantry/database/Groceries_onehousehold.csv", index=False, header=True)
```
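
Not writing the row index to the CSV matters because `to_csv` writes it by default, and reading the file back then adds an extra, unnamed column. The sketch below shows this with an in-memory buffer instead of a file. The two-row `sheet` frame (copied from the data preview) and the `roundtrip` helper are illustrative stand-ins, not part of the original pipeline.

```python
import io

import pandas as pd

# Stand-in for the sheet read from Excel: two rows copied from the data preview.
sheet = pd.DataFrame({
    "week": [1, 1],
    "order_ID": [5, 5],
    "item_id": ["RABEKO choco light 250g", "BONI tomatensoep met balletjes 950ml"],
    "amount": [2, 1],
})


def roundtrip(frame: pd.DataFrame, index: bool) -> pd.DataFrame:
    """Write `frame` to an in-memory CSV (instead of a file) and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=index)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = roundtrip(sheet, index=True)
without_index = roundtrip(sheet, index=False)

# With index=True the unnamed RangeIndex is written as a column with an empty
# header, which read_csv turns into an extra 'Unnamed: 0' column.
print(list(with_index.columns))
# With index=False the columns match the original sheet.
print(list(without_index.columns))
```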

```python
df = pd.read_csv(r"/workspaces/Plenty-in-the-Pantry/database/Groceries_onehousehold.csv")
df.describe(include="all")
```

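| | week | order_ID | item_id | amount | price_unit | price_total | date | day | timestamp | time | store_type | store_name | promo | item_type | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 372.0 | 372.0 | 372 | 372.0 | 372.0 | 372.0 | 372 | 372 | 372 | 372 | 372 | 372 | 372.0 | 372 | 372 |
| unique | | | 314 | | | | 26 | 7 | 35 | 4 | 5 | 9 | | 126 | 16 |
| top | | | GROF BROOD GESN. | | | | 2022-01-08 | Saturday | 17:25:00 | morning | supermarket | Okay | | charcuterie | fruit & vegetables |
| freq | | | 6 | | | | 45 | 87 | 44 | 186 | 306 | 127 | | 25 | 103 |
| mean | 4.056452 | 18.88172 | | 1.274194 | 2.602328 | 2.867247 | | | | | | | 0.083333 | | |
| std | 2.06747 | 10.908193 | | 1.103819 | 1.947725 | 2.0192 | | | | | | | 0.276758 | | |
| min | 1.0 | 1.0 | | 1.0 | 0.06468 | 0.06468 | | | | | | | 0.0 | | |
| 25% | 2.0 | 8.0 | | 1.0 | 1.3 | 1.54397 | | | | | | | 0.0 | | |
| 50% | 4.0 | 20.0 | | 1.0 | 2.24025 | 2.46286 | | | | | | | 0.0 | | |
| 75% | 5.0 | 26.25 | | 1.0 | 3.29 | 3.875 | | | | | | | 0.0 | | |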
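
With `include="all"`, `describe` mixes two kinds of summary: numeric columns get count, mean, std, min and quartiles, while text columns get count, unique, top (most frequent value) and freq. Cells that do not apply stay empty (NaN). The toy frame below, loosely based on the preview rows, is only an illustration. It also shows one way to read the `promo` column, assuming it is a 0/1 flag (its minimum and quartiles are 0, and the preview rows only contain 0 and 1): its mean is then the share of line items bought on promotion, so 0.083333 × 372 ≈ 31 items.

```python
import pandas as pd

# Toy frame mixing a count, a 0/1 flag and a text column
# (values loosely based on the preview rows of our dataset).
sample = pd.DataFrame({
    "amount": [2, 4, 1, 1, 1],
    "promo": [0, 0, 0, 1, 1],
    "store_name": ["Okay", "Okay", "Okay", "Albert Heijn", "Albert Heijn"],
})

# Numeric columns get count/mean/std/min/quartiles/max; text columns get
# count/unique/top/freq. Statistics that don't apply to a column are NaN.
summary = sample.describe(include="all")
print(summary)

# Because promo is 0/1, its mean is the fraction of line items bought on promotion.
promo_count = round(sample["promo"].mean() * len(sample))

# Same reasoning applied to the dataset summary: mean(promo) = 0.083333 over 372 rows.
dataset_promo_count = round(0.083333 * 372)
print(promo_count, dataset_promo_count)
```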


```python
df
```

| | week | order_ID | item_id | amount | price_unit | price_total | date | day | timestamp | time | store_type | store_name | promo | item_type | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | RABEKO choco light 250g | 2 | 2.82 | 5.64 | 2021-11-23 | Tuesday | 12:32:00 | noon | supermarket | Okay | 0 | chocolate spread | breakfast & spreads |
| 1 | 1 | 5 | JOYVALLE pudding griesmeel natuur 135g | 4 | 0.99 | 3.96 | 2021-11-23 | Tuesday | 12:32:00 | noon | supermarket | Okay | 0 | pudding | dairy & plant based |
| 2 | 1 | 5 | BONI tomatensoep met balletjes 950ml | 1 | 1.99 | 1.99 | 2021-11-23 | Tuesday | 12:32:00 | noon | supermarket | Okay | 0 | soup | canned foods |
| 3 | 1 | 5 | LIEBIG DELISOUP 9 groenten brik 1L | 1 | 2.59 | 2.59 | 2021-11-23 | Tuesday | 12:32:00 | noon | supermarket | Okay | 0 | soup | canned foods |
| 4 | 1 | 5 | LIEBIG DELISOUP tom. Balletjes brik 1L | 1 | 2.59 | 2.59 | 2021-11-23 | Tuesday | 12:32:00 | noon | supermarket | Okay | 0 | soup | canned foods |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 367 | 8 | 37 | AH GR BIET GEITENKAAS SALADE | 1 | 2.99 | 2.99 | 2022-01-26 | Wednesday | 12:36:00 | noon | supermarket | Albert Heijn | 0 | lunch salad | fruit & vegetables |
| 368 | 8 | 38 | EIERKOEKEN | 1 | 1.69 | 1.69 | 2022-01-27 | Thursday | 11:20:00 | morning | supermarket | Albert Heijn | 1 | egg cakes | snacks |
| 369 | 8 | 38 | AH MLTSALADE | 1 | 5.49 | 5.49 | 2022-01-27 | Thursday | 11:20:00 | morning | supermarket | Albert Heijn | 1 | lunch salad | fruit & vegetables |
| 370 | 8 | 38 | MENTOS | 1 | 3.89 | 3.89 | 2022-01-27 | Thursday | 11:20:00 | morning | supermarket | Albert Heijn | 0 | mints | snacks |
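
The preview rows suggest that `price_total` equals `amount × price_unit` (for example, 2 × 2.82 = 5.64 and 4 × 0.99 = 3.96), and that each `order_ID` corresponds to one receipt. Below is a minimal sketch of a sanity check and a per-receipt summary under those two assumptions. It runs on a few rows copied from the preview rather than on the full dataset, so the basket totals cover only those rows. The helper `check_line_totals` and its 0.01 tolerance are our own illustrative choices. We have not verified that every row of the full dataset satisfies the relation; items sold by weight or with discounts might not.

```python
import pandas as pd

# A few rows copied from the preview above (not the full dataset).
preview = pd.DataFrame({
    "order_ID": [5, 5, 5, 38, 38, 38],
    "item_id": [
        "RABEKO choco light 250g",
        "JOYVALLE pudding griesmeel natuur 135g",
        "BONI tomatensoep met balletjes 950ml",
        "EIERKOEKEN",
        "AH MLTSALADE",
        "MENTOS",
    ],
    "amount": [2, 4, 1, 1, 1, 1],
    "price_unit": [2.82, 0.99, 1.99, 1.69, 5.49, 3.89],
    "price_total": [5.64, 3.96, 1.99, 1.69, 5.49, 3.89],
})


def check_line_totals(frame: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Return rows whose price_total differs from amount * price_unit by more than tol."""
    expected = frame["amount"] * frame["price_unit"]
    return frame[(frame["price_total"] - expected).abs() > tol]


mismatches = check_line_totals(preview)
print(f"{len(mismatches)} row(s) with inconsistent totals")

# Treat each order_ID as one receipt: count lines and items, and sum the line totals.
baskets = preview.groupby("order_ID").agg(
    n_lines=("item_id", "count"),
    n_items=("amount", "sum"),
    total=("price_total", "sum"),
)
print(baskets)
```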
