Gala Groceries is a technology-led grocery store chain based in the USA. They rely heavily on new technologies, such as IoT, to give them a competitive edge over other grocery stores.

They pride themselves on providing the best-quality fresh produce from locally sourced suppliers. However, consistently delivering on this objective year-round comes with many challenges.

Gala Groceries approached Cognizant for help with a supply chain issue. Groceries are highly perishable: if you overstock, you waste money on excess storage and spoiled stock, but if you understock, you risk losing customers. They want to know how to better stock the items they sell.

This is a high-level business problem, so answering it will require diving into the data to formulate questions for the client and recommendations about what additional data we need to answer it.

In [15]:
import pandas as pd

groceries = pd.read_csv('sample_sales_data.csv', index_col=0)
groceries.head()

,transaction_id,timestamp,product_id,category,customer_type,unit_price,quantity,total,payment_type
0,a1c82654-c52c-45b3-8ce8-4c2a1efe63ed,2022-03-02 09:51:38,3bc6c1ea-0198-46de-9ffd-514ae3338713,fruit,gold,3.99,2,7.98,e-wallet
1,931ad550-09e8-4da6-beaa-8c9d17be9c60,2022-03-06 10:33:59,ad81b46c-bf38-41cf-9b54-5fe7f5eba93e,fruit,standard,3.99,1,3.99,e-wallet
2,ae133534-6f61-4cd6-b6b8-d1c1d8d90aea,2022-03-04 17:20:21,7c55cbd4-f306-4c04-a030-628cbe7867c1,fruit,premium,0.19,2,0.38,e-wallet
3,157cebd9-aaf0-475d-8a11-7c8e0f5b76e4,2022-03-02 17:23:58,80da8348-1707-403f-8be7-9e6deeccc883,fruit,gold,0.19,4,0.76,e-wallet
4,a81a6cd3-5e0c-44a2-826c-aea43e46c514,2022-03-05 14:32:43,7f5e86e6-f06f-45f6-bf44-27b095c9ad1d,fruit,basic,4.49,2,8.98,debit card


In [16]:
print('The dataset shape is:', groceries.shape)

The dataset shape is: (7829, 9)


In [17]:
# Checking the datatypes
print('The datatypes in our dataset:\n', groceries.dtypes, sep='\n')

The datatypes in our dataset:

transaction_id     object
timestamp          object
product_id         object
category           object
customer_type      object
unit_price        float64
quantity            int64
total             float64
payment_type       object
dtype: object
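The output shows that `timestamp` is stored as a plain `object` (string) column, so date-based operations such as grouping sales by hour or day will not work until it is parsed. A minimal sketch of that conversion follows. It also casts the presumably low-cardinality text columns (`category`, `customer_type`, `payment_type`) to pandas' `category` dtype, which is an optional choice rather than something the analysis requires. The helper name `parse_columns` is hypothetical, and the three sample rows are copied from `groceries.head()` above so the snippet runs without the CSV file.

```python
import pandas as pd


def parse_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed timestamps and categorical text columns (hypothetical helper)."""
    out = df.copy()
    # Parse the 'YYYY-MM-DD HH:MM:SS' strings into datetime64 values
    out['timestamp'] = pd.to_datetime(out['timestamp'], format='%Y-%m-%d %H:%M:%S')
    # Assumed low-cardinality text columns; stored as categoricals to save memory
    for col in ['category', 'customer_type', 'payment_type']:
        out[col] = out[col].astype('category')
    return out


# Three rows (subset of columns) from groceries.head(); in the notebook, pass `groceries`.
sample = pd.DataFrame({
    'timestamp': ['2022-03-02 09:51:38', '2022-03-06 10:33:59', '2022-03-04 17:20:21'],
    'category': ['fruit', 'fruit', 'fruit'],
    'customer_type': ['gold', 'standard', 'premium'],
    'payment_type': ['e-wallet', 'e-wallet', 'e-wallet'],
})
parsed = parse_columns(sample)
print(parsed.dtypes)
print(parsed['timestamp'].dt.hour.tolist())  # hour of each sale
```

In the notebook this would be `groceries = parse_columns(groceries)`, after which `groceries['timestamp'].dt.hour` or `.dt.date` can be used to group sales by time.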


In [18]:
# Checking for duplicate rows
n_duplicates = groceries.duplicated().sum()
print(f'Duplicates in our data: {n_duplicates} ({100 * n_duplicates / len(groceries):.2f}%)')

Duplicates in our data: 0 (0.00%)
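Every row carries a `transaction_id`, so a full-row `duplicated()` check only catches rows repeated verbatim, ID included. Two further checks may be worth running: whether any ID appears more than once, and whether any rows are identical once the ID is ignored. The latter could indicate the same sale logged twice under different IDs, or simply two genuine identical purchases, so hits are candidates to inspect rather than confirmed errors. The helper `duplicate_report` below is a hypothetical sketch of both checks; the sample rows are copied from `groceries.head()`.

```python
import pandas as pd


def duplicate_report(df: pd.DataFrame, id_col: str = 'transaction_id') -> dict:
    """Count repeated IDs and rows that repeat once the ID column is ignored."""
    other_cols = [c for c in df.columns if c != id_col]
    return {
        # The same transaction ID appearing on more than one row
        'duplicate_ids': int(df[id_col].duplicated().sum()),
        # Rows identical in every column except the ID
        'duplicate_rows_ignoring_id': int(df.duplicated(subset=other_cols).sum()),
    }


# Three rows (subset of columns) from groceries.head(); in the notebook, pass `groceries`.
sample = pd.DataFrame({
    'transaction_id': ['a1c82654-c52c-45b3-8ce8-4c2a1efe63ed',
                       '931ad550-09e8-4da6-beaa-8c9d17be9c60',
                       'ae133534-6f61-4cd6-b6b8-d1c1d8d90aea'],
    'timestamp': ['2022-03-02 09:51:38', '2022-03-06 10:33:59', '2022-03-04 17:20:21'],
    'product_id': ['3bc6c1ea-0198-46de-9ffd-514ae3338713',
                   'ad81b46c-bf38-41cf-9b54-5fe7f5eba93e',
                   '7c55cbd4-f306-4c04-a030-628cbe7867c1'],
    'quantity': [2, 1, 2],
    'total': [7.98, 3.99, 0.38],
})
print(duplicate_report(sample))  # → {'duplicate_ids': 0, 'duplicate_rows_ignoring_id': 0}
```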


In [22]:
# Checking for missing values
missing = groceries.isna().sum().to_frame('Num. of Missing Values')
missing['% of Missing Values'] = (100 * groceries.isna().mean()).round(2)
missing

,Num. of Missing Values,% of Missing Values
transaction_id,0,0.0
timestamp,0,0.0
product_id,0,0.0
category,0,0.0
customer_type,0,0.0
unit_price,0,0.0
quantity,0,0.0
total,0,0.0
payment_type,0,0.0
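The missing-value check confirms that every field is populated, but not that the values agree with each other. Assuming `total` is meant to equal `unit_price × quantity` (true for all five rows shown by `groceries.head()`), a quick consistency check is a natural next step. The sketch below compares with `np.isclose` rather than `==` to tolerate floating-point noise; the helper name `inconsistent_totals` and the half-cent tolerance are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def inconsistent_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose total does not match unit_price * quantity (hypothetical helper)."""
    expected = df['unit_price'] * df['quantity']
    # np.isclose tolerates float noise; atol=0.005 is roughly half a cent
    matches = np.isclose(df['total'], expected, atol=0.005)
    return df[~matches]


# unit_price, quantity and total for the five rows shown by groceries.head()
sample = pd.DataFrame({
    'unit_price': [3.99, 3.99, 0.19, 0.19, 4.49],
    'quantity': [2, 1, 2, 4, 2],
    'total': [7.98, 3.99, 0.38, 0.76, 8.98],
})
print(len(inconsistent_totals(sample)))  # → 0 (all five rows are consistent)
```

On the full dataset, `inconsistent_totals(groceries)` would list any rows worth raising with the client.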
