# Tesco Clubcard Data Analysis Notebook

This notebook helps me understand how the provided JSON file of Tesco Clubcard user data is structured and what information can be drawn from it. The goal is to pre-process and parse the transaction data from a user's Clubcard and identify key figures.



## Imports

In [1]:
import json
import pandas as pd

## Load Transactions from JSON into a DataFrame

In [None]:
# Load raw JSON
with open("../Tesco-Customer-Data.json", "r") as json_file: 
    json_data = json.load(json_file)

# Extract the transactions: the first element of "Purchase" is the list of transaction records
transactions = json_data["Purchase"][0]

# Create the transactions DataFrame
df_transactions = pd.json_normalize(transactions)
df_transactions.head()

| | basketValueGross | purchaseType | overallBasketSavings | storeId | storeAddress | paymentType | timeStamp | basketValueNet | says | storeName | storeFormat | product |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.2 | IN_STORE | 1.35 | 4667 | Units 2&3 Distillers Building,Smithfield, Dubl... | [{'type': 'VisaCredit', 'category': 'NA', 'amo... | 2024-12-15 17:17:32.514 | 5.2 | | Smithfield Express | Express | [{'name': 'Tesco Slievenamon Irish Still Sprin... |
| 1 | 4.74 | IN_STORE | 1.06 | 5980 | 51 - 52 Thomas Street | [{'type': 'VisaCredit', 'category': 'NA', 'amo... | 2024-12-12 16:04:56.304 | 0.74 | | Thomas Street Express | Express | [{'name': 'Kerrygold Butter Sticks 100G', 'qua... |
| 2 | 3.45 | IN_STORE | 0.7 | 5505 | 41 Upper Camden Street | [{'type': 'VisaCredit', 'category': 'NA', 'amo... | 2024-12-06 17:59:38.685 | 3.45 | | Camden Street Express | Express | [{'name': 'Tesco Slievenamon Irish Still Sprin... |
| 3 | 3.45 | IN_STORE | 0.7 | 5980 | 51 - 52 Thomas Street | [{'type': 'VisaCredit', 'category': 'NA', 'amo... | 2024-12-06 14:47:26.859 | 3.45 | | Thomas Street Express | Express | [{'name': 'Tesco Slievenamon Irish Still Sprin... |
| 4 | 4.9 | IN_STORE | 0.85 | 4667 | Units 2&3 Distillers Building,Smithfield, Dubl... | [{'type': 'VisaCredit', 'category': 'NA', 'amo... | 2024-11-28 19:40:34.480 | 4.9 | | Smithfield Express | Express | [{'name': 'Cadbury Dairy Milk Whole Nut Chocol... |
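
The `product` column above holds a list of dictionaries per transaction, so item-level analysis needs one row per product. A minimal sketch of one way to flatten it, using `pd.json_normalize` with `record_path` and `meta`. The sample records are hypothetical stand-ins for the real file, and only the `name` key is used because the other product keys are truncated in the output above.

```python
import pandas as pd

# Hypothetical sample mirroring the structure shown above. Only the "name"
# key of each product is relied on; "Example Item B/C" are made-up values.
sample_transactions = [
    {
        "timeStamp": "2024-12-12 16:04:56.304",
        "storeId": 5980,
        "product": [
            {"name": "Kerrygold Butter Sticks 100G"},
            {"name": "Example Item B"},
        ],
    },
    {
        "timeStamp": "2024-12-06 17:59:38.685",
        "storeId": 5505,
        "product": [{"name": "Example Item C"}],
    },
]

# One row per product; timeStamp and storeId are repeated from the parent record.
df_products = pd.json_normalize(
    sample_transactions,
    record_path="product",
    meta=["timeStamp", "storeId"],
)
print(df_products)
```

On the real data, the same call could presumably be made on `transactions` directly, with any further transaction-level fields added to `meta`.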


## Initial Data Analysis

In [None]:
# Convert the timeStamp column to timezone-aware datetimes; format="mixed" parses each value
# individually because the timestamps are not all formatted the same way
df_transactions["timeStamp"] = pd.to_datetime(df_transactions["timeStamp"], format="mixed", utc=True)
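
A small sketch of what `format="mixed"` does, assuming pandas 2.0 or later (where this option was introduced). The two strings are the first and last transaction dates reported later in this notebook; that the raw file stores them in exactly these shapes (with and without fractional seconds) is an assumption.

```python
import pandas as pd

# Assumed raw shapes: one timestamp without and one with fractional seconds.
raw = pd.Series(["2019-09-11 18:46:52", "2024-12-15 17:17:32.514"])

# format="mixed" infers the format of each element separately;
# utc=True makes the result timezone-aware (UTC).
parsed = pd.to_datetime(raw, format="mixed", utc=True)

# Number of whole days between the earliest and latest timestamp.
span_days = (parsed.max() - parsed.min()).days
print(parsed.dt.tz, span_days)
```

Per-element inference is slower than a single fixed format and can misread ambiguous day/month orders, so it is best suited to cases like this one where the variants are all year-first.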

In [26]:
print("First Transaction Date:", df_transactions["timeStamp"].min())
print("Last Transaction Date:", df_transactions["timeStamp"].max())
print("Number of Transactions:", len(df_transactions))
print("Account Open for %d days" % ((df_transactions["timeStamp"].max() - df_transactions["timeStamp"].min()).days))
print("Visited %d different stores." % (len(df_transactions["storeId"].unique())))


First Transaction Date: 2019-09-11 18:46:52+00:00
Last Transaction Date: 2024-12-15 17:17:32.514000+00:00
Number of Transactions: 250
Account Open for 1921 days
Visited 17 different stores.


In [27]:
# View all the different stores visited
df_transactions["storeName"].unique()

array(['Smithfield Express', 'Thomas Street Express',
       'Camden Street Express', 'Crumlin Express', 'Castlebar Superstore',
       'Spencer Dock Express', 'Temple Bar Express',
       'Baggot Street Upper Metro', 'Dorset Street Express',
       'Cabra Superstore', 'Donnybrook Express', 'Bloomfields Superstore',
       'Parnell Street Metro', 'Merrion Superstore', 'BLOOMFIELDS',
       'CABRA', 'FINGLAS CLEARWATER', 'MERRION', 'TEMPLE BAR METRO',
       'THOMAS STREET EXPRESS', 'DONNYBROOK EXPRESS', 'STILLORGAN',
       'TALBOT STREET EXPRESS'], dtype=object)
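
The list above contains 23 names for only 17 store IDs, and some names differ only by case (for example `Thomas Street Express` and `THOMAS STREET EXPRESS`). A minimal sketch of how such duplicates could be detected and partly normalised; the sample frame is hypothetical, and the pairing of the upper-case name with store ID 5980 is an assumption.

```python
import pandas as pd

# Hypothetical sample; which storeId the upper-case spelling belongs to is assumed.
sample = pd.DataFrame(
    {
        "storeId": [5980, 5980, 4667, 5505],
        "storeName": [
            "Thomas Street Express",
            "THOMAS STREET EXPRESS",
            "Smithfield Express",
            "Camden Street Express",
        ],
    }
)

# Count distinct spellings per store ID and keep the IDs with more than one.
name_counts = sample.groupby("storeId")["storeName"].nunique()
inconsistent_ids = name_counts[name_counts > 1].index.tolist()
print(inconsistent_ids)

# Title-casing merges names that differ only by case.
sample["storeNameNormalised"] = sample["storeName"].str.title()
print(sample["storeNameNormalised"].nunique())
```

Title-casing only handles case differences; pairs such as `BLOOMFIELDS` and `Bloomfields Superstore` would still need a mapping keyed on `storeId` or a manual lookup.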
