**Customer Purchases Data**

Dataset Columns:
* Customer ID
* First Name
* Last Name
* Email Address
* Phone Number
* Product ID
* Purchase Date
* Quantity
* Price per Item
* Total Purchase Value
* Discount Applied (Yes/No)

Features:
* Simulate random customer purchases for a store.
* Calculate total purchase value based on quantity and price per item.
* Apply random discounts to simulate promotional offers.
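The two calculated features above (a total per purchase and a random promotional discount) could be derived roughly as follows. This is a minimal sketch, not the notebook's own method. The discount probability (30%), the discount rate (10%), the seeded `default_rng`, the column name `Total Purchase Value £` and the four-row stand-in table are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical promotion settings (assumptions, not taken from the notebook)
DISCOUNT_PROBABILITY = 0.3  # share of purchases that receive a discount
DISCOUNT_RATE = 0.10        # 10% off the pre-discount total

rng = np.random.default_rng(42)  # seeded so the discount flags are reproducible

# Small stand-in for the generated purchases table
df = pd.DataFrame({
    "Quantity Purchased": [1, 3, 2, 5],
    "Price Per Item £": [30.71, 101.36, 12.50, 4.99],
})

# Total purchase value = quantity x price per item
df["Total Purchase Value £"] = (df["Quantity Purchased"] * df["Price Per Item £"]).round(2)

# Randomly flag purchases as discounted, then reduce the total on those rows only
discounted = rng.random(len(df)) < DISCOUNT_PROBABILITY
df["Discount Applied"] = np.where(discounted, "Yes", "No")
df.loc[discounted, "Total Purchase Value £"] = (
    df.loc[discounted, "Total Purchase Value £"] * (1 - DISCOUNT_RATE)
).round(2)

print(df)
```

Keeping the Yes/No flag in its own column means discounted and full-price sales can still be told apart after the totals have been adjusted.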

Manipulation Ideas:
* Visualize total purchases per customer over time.
* Calculate the total revenue and average purchase value.
* Identify frequent shoppers (customers who purchase often).
* Compare sales with and without discounts.
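The manipulation ideas above map onto a handful of pandas aggregations. This sketch runs on a tiny hand-written table rather than the generated data. The `Total Purchase Value £` and `Discount Applied` columns assume those features have already been added, and the threshold of two purchases for a "frequent shopper" is an arbitrary assumption.

```python
import pandas as pd

# Small stand-in purchases table (illustrative values, not generated data)
df = pd.DataFrame({
    "Customer ID": ["CT0000001", "CT0000002", "CT0000001", "CT0000003", "CT0000001", "CT0000002"],
    "Purchase Date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-11", "2023-03-02", "2023-03-15", "2023-04-01",
    ]),
    "Total Purchase Value £": [30.00, 120.00, 45.50, 10.00, 60.00, 80.00],
    "Discount Applied": ["No", "Yes", "No", "Yes", "No", "No"],
})

# Total revenue and average purchase value
total_revenue = df["Total Purchase Value £"].sum()
average_purchase = df["Total Purchase Value £"].mean()

# Frequent shoppers: customers with at least MIN_PURCHASES purchases (threshold is an assumption)
MIN_PURCHASES = 2
purchase_counts = df["Customer ID"].value_counts()
frequent_shoppers = purchase_counts[purchase_counts >= MIN_PURCHASES]

# Sales with vs without discounts: number of sales, revenue and average value per group
discount_summary = df.groupby("Discount Applied")["Total Purchase Value £"].agg(["count", "sum", "mean"])

# Spend per customer per calendar month (one column per customer), ready for plotting over time
monthly = (
    df.groupby(["Customer ID", pd.Grouper(key="Purchase Date", freq="MS")])["Total Purchase Value £"]
    .sum()
    .unstack("Customer ID", fill_value=0)
)

print(f"Total revenue: £{total_revenue:.2f}")
print(f"Average purchase value: £{average_purchase:.2f}")
print(frequent_shoppers)
print(discount_summary)
print(monthly)
```

With matplotlib installed, `monthly.plot()` would draw one line per customer, which covers the "total purchases per customer over time" idea.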

In [54]:
# import required libraries
import pandas as pd
from faker import Faker
import datetime
import numpy as np

In [70]:
# create faker object
fake = Faker()

# list of product categories
product_categories = [
    "Groceries",
    "Clothing & Apparel",
    "Electronics",
    "Home & Kitchen",
    "Sports & Outdoors",
    "Office Supplies",
    "Automotive",
    "Video Games & Consoles"
]


# function to determine quantity based on product category
# (np.random.randint excludes the upper bound, so Groceries yields 1-9 items)
def get_quantity_by_category(category):
    if category == "Groceries":
        return np.random.randint(1, 10)
    elif category == "Clothing & Apparel":
        return np.random.randint(1, 5)
    else:
        return np.random.randint(1, 3)


# Function to determine price based on product category
def get_price_by_category(category):
    if category == "Groceries":
        return round(np.random.uniform(1, 50), 2)
    elif category == "Clothing & Apparel":
        return round(np.random.uniform(20, 200), 2)
    elif category == "Electronics":
        return round(np.random.uniform(50, 1000), 2)
    elif category == "Home & Kitchen":
        return round(np.random.uniform(20, 500), 2)
    elif category == "Sports & Outdoors":
        return round(np.random.uniform(20, 300), 2)
    elif category == "Office Supplies":
        return round(np.random.uniform(5, 100), 2)
    elif category == "Automotive":
        return round(np.random.uniform(30, 1000), 2)
    elif category == "Video Games & Consoles":
        return round(np.random.uniform(10, 500), 2)
    else:
        # default price range for any unspecified category
        return round(np.random.uniform(1, 50), 2)
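The two `if`/`elif` chains above could also be expressed as lookup tables, with a seeded generator so that reruns reproduce the same data. The sketch below is a hypothetical alternative, not the notebook's code. The dictionary names and the `rng` parameter are illustrative assumptions, while the ranges are copied from the functions above.

```python
import numpy as np

# Hypothetical lookup tables mirroring the ranges in the functions above.
# Quantity bounds are (low, high) with high EXCLUSIVE, as with np.random.randint.
QUANTITY_RANGES = {"Groceries": (1, 10), "Clothing & Apparel": (1, 5)}
DEFAULT_QUANTITY_RANGE = (1, 3)

PRICE_RANGES = {
    "Groceries": (1, 50),
    "Clothing & Apparel": (20, 200),
    "Electronics": (50, 1000),
    "Home & Kitchen": (20, 500),
    "Sports & Outdoors": (20, 300),
    "Office Supplies": (5, 100),
    "Automotive": (30, 1000),
    "Video Games & Consoles": (10, 500),
}
DEFAULT_PRICE_RANGE = (1, 50)

rng = np.random.default_rng(seed=0)  # seeded so runs are reproducible


def get_quantity_by_category(category, rng=rng):
    low, high = QUANTITY_RANGES.get(category, DEFAULT_QUANTITY_RANGE)
    # Generator.integers also excludes `high` by default (endpoint=False)
    return int(rng.integers(low, high))


def get_price_by_category(category, rng=rng):
    low, high = PRICE_RANGES.get(category, DEFAULT_PRICE_RANGE)
    return round(float(rng.uniform(low, high)), 2)


print(get_quantity_by_category("Groceries"), get_price_by_category("Electronics"))
```

Keeping the ranges as data means a new category is one dictionary entry rather than another `elif` branch, and unknown categories still fall back to the same defaults as the original functions.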


In [71]:
# generate the data
N_ROWS = 300

# draw one product category per purchase so that quantity and price are
# based on the same category (drawing a separate category for each column
# could pair a grocery-sized quantity with an electronics price)
categories = np.random.choice(product_categories, size=N_ROWS)

# randint's upper bound is exclusive, so 10_000_000 allows every 7-digit ID
data = {
    "Customer ID": [f"CT{np.random.randint(0, 10_000_000):07d}" for _ in range(N_ROWS)],
    "First Name": [fake.first_name() for _ in range(N_ROWS)],
    "Last Name": [fake.last_name() for _ in range(N_ROWS)],
    "Contact Number": [f"+447{np.random.randint(0, 1_000_000_000):09d}" for _ in range(N_ROWS)],
    "Product ID": [f"PR{np.random.randint(0, 1_000_000):06d}" for _ in range(N_ROWS)],
    "Purchase Date": [fake.date_this_century() for _ in range(N_ROWS)],
    "Quantity Purchased": [get_quantity_by_category(c) for c in categories],
    "Price Per Item £": [get_price_by_category(c) for c in categories],
}

In [72]:
# convert data into pandas dataframe
df = pd.DataFrame(data)
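`fake.date_this_century()` returns `datetime.date` objects, which pandas keeps in an `object` column. Converting that column to `datetime64` before any time-based analysis enables the `.dt` accessor and resampling. The three-row frame and the `Purchase Year` column below are illustrative assumptions.

```python
import datetime

import pandas as pd

# Stand-in frame holding datetime.date values, as Faker produces them
df = pd.DataFrame({
    "Purchase Date": [datetime.date(2017, 10, 5), datetime.date(2010, 8, 28), datetime.date(2024, 3, 29)],
    "Quantity Purchased": [1, 1, 3],
})
print(df["Purchase Date"].dtype)  # object

# Convert to datetime64 so the .dt accessor, resampling and date filtering work
df["Purchase Date"] = pd.to_datetime(df["Purchase Date"])
df["Purchase Year"] = df["Purchase Date"].dt.year
print(df.dtypes)
```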

In [73]:
# display the first 10 rows
print(df.head(10))

  Customer ID First Name  Last Name Contact Number Product ID Purchase Date  \
0   CT4149598    Timothy     Garcia  +447993402637   PR350808    2017-10-05   
1   CT2677864      Kathy     Morton  +447519059848   PR037274    2010-08-28   
2   CT1439671    Kenneth    Johnson  +447045984332   PR766177    2024-03-29   
3   CT2439695      David       West  +447475428077   PR081892    2016-12-12   
4   CT6158489      Diana    Russell  +447368380545   PR551203    2008-06-07   
5   CT8531957    Crystal   Odonnell  +447672503752   PR818735    2020-03-06   
6   CT6577092     Curtis     Malone  +447635806411   PR986683    2009-07-02   
7   CT8023840       Beth  Armstrong  +447026684678   PR225749    2004-04-05   
8   CT0051805    Jessica         Wu  +447912937021   PR029945    2016-10-08   
9   CT5017697       Sara  Rodriguez  +447823104339   PR980094    2001-12-26   

   Quantity Purchased  Price Per Item £  
0                   1             30.71  
1                   1            101.36  
2   