
### **Customer Purchase Analytics using PySpark**

**🧠 Objective:**

Analyze a small sample dataset of customer transactions to find:

- The total amount spent by each customer
- The most popular product (by number of purchases)
- Customers whose total spending exceeds a threshold (100 in this example)



In [1]:


import csv

data = [
    ['CustomerID', 'Product', 'Amount'],
    ['C001', 'Laptop', '1000'],
    ['C002', 'Mobile', '500'],
    ['C001', 'Mouse', '50'],
    ['C003', 'Keyboard', '75'],
    ['C002', 'Laptop', '1000'],
    ['C003', 'Mouse', '50'],
    ['C004', 'Monitor', '150']
]

with open('Purchase.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)
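
Before handing the file to Spark, it can help to confirm that it was written as expected. The following is a minimal sketch using only the standard library; the helper name `load_purchases` is hypothetical and not part of the original notebook, and it assumes the three-column layout (`CustomerID`, `Product`, `Amount`) used above.

```python
import csv
from pathlib import Path


def load_purchases(path):
    """Read the purchases CSV and return one dict per row, with Amount cast to int.

    Hypothetical helper for a quick sanity check; assumes the header row
    CustomerID, Product, Amount.
    """
    with open(path, newline="") as f:
        return [
            {
                "CustomerID": row["CustomerID"],
                "Product": row["Product"],
                "Amount": int(row["Amount"]),
            }
            for row in csv.DictReader(f)
        ]


# Run the check only if the file from the previous cell is present.
if Path("Purchase.csv").exists():
    rows = load_purchases("Purchase.csv")
    print(len(rows), "rows; total amount:", sum(r["Amount"] for r in rows))
```

With the sample data above, this should report 7 rows. The `int()` cast raises a `ValueError` on a malformed amount, which surfaces bad input before Spark's schema inference silently falls back to a string column.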


In [3]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # namespaced so Python's built-in sum is not shadowed

# 1. Create a SparkSession
spark = SparkSession.builder.appName("CustomerPurchaseAnalytics").getOrCreate()

# 2. Load the CSV into a DataFrame; inferSchema reads Amount as an integer column.
#    The relative path is the one the previous cell wrote to (in Colab the
#    working directory is /content).
df = spark.read.option("header", True).option("inferSchema", True).csv("Purchase.csv")

# 3. Show the raw dataset
print("Original Dataset:")
df.show()

# 4. Total amount spent by each customer
print("Total Amount Spent Per Customer:")
total_spent = df.groupBy("CustomerID").agg(F.sum("Amount").alias("TotalSpent"))
total_spent.show()

# 5. Most popular product (by number of purchases).
#    Laptop and Mouse are both bought twice, so show(1) displays only one of the
#    tied products; the secondary sort on Product makes that choice deterministic.
print("Most Popular Product:")
popular_products = df.groupBy("Product").agg(F.count("*").alias("Count"))
popular_products.orderBy(F.col("Count").desc(), F.col("Product").asc()).show(1)

# 6. Customers whose total spending exceeds the threshold
threshold = 100
print(f"High-Value Customers (Spent > {threshold}):")
high_value = total_spent.filter(F.col("TotalSpent") > threshold)
high_value.show()

# 7. Stop the SparkSession
spark.stop()


Original Dataset:
+----------+--------+------+
|CustomerID| Product|Amount|
+----------+--------+------+
|      C001|  Laptop|  1000|
|      C002|  Mobile|   500|
|      C001|   Mouse|    50|
|      C003|Keyboard|    75|
|      C002|  Laptop|  1000|
|      C003|   Mouse|    50|
|      C004| Monitor|   150|
+----------+--------+------+

Total Amount Spent Per Customer:
+----------+----------+
|CustomerID|TotalSpent|
+----------+----------+
|      C003|       125|
|      C004|       150|
|      C001|      1050|
|      C002|      1500|
+----------+----------+

Most Popular Product:
+-------+-----+
|Product|Count|
+-------+-----+
| Laptop|    2|
+-------+-----+
only showing top 1 row

High-Value Customers (Spent > 100):
+----------+----------+
|CustomerID|TotalSpent|
+----------+----------+
|      C003|       125|
|      C004|       150|
|      C001|      1050|
|      C002|      1500|
+----------+----------+
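
The three results above can be cross-checked without Spark. The following is a minimal pure-Python sketch on the same seven rows; the function names `total_spent`, `most_popular`, and `high_value` are hypothetical helpers introduced here for illustration, not part of the notebook. Unlike `show(1)`, `most_popular` returns every product tied for the top count.

```python
from collections import Counter, defaultdict

# Same rows as the sample dataset, with Amount already converted to int.
purchases = [
    ("C001", "Laptop", 1000),
    ("C002", "Mobile", 500),
    ("C001", "Mouse", 50),
    ("C003", "Keyboard", 75),
    ("C002", "Laptop", 1000),
    ("C003", "Mouse", 50),
    ("C004", "Monitor", 150),
]


def total_spent(rows):
    """Sum Amount per CustomerID."""
    totals = defaultdict(int)
    for customer, _product, amount in rows:
        totals[customer] += amount
    return dict(totals)


def most_popular(rows):
    """Return (products tied for the highest purchase count, that count).

    Assumes rows is non-empty.
    """
    counts = Counter(product for _customer, product, _amount in rows)
    top = max(counts.values())
    return sorted(p for p, n in counts.items() if n == top), top


def high_value(totals, threshold):
    """Keep customers whose total is strictly greater than threshold."""
    return {c: t for c, t in totals.items() if t > threshold}


totals = total_spent(purchases)
print(totals)                   # per-customer totals
print(most_popular(purchases))  # (['Laptop', 'Mouse'], 2)
print(high_value(totals, 100))  # all four customers exceed 100
```

The totals match the Spark output (125, 150, 1050, 1500). The sketch also shows that Laptop and Mouse are tied at two purchases each, and that a threshold of 100 filters out nobody in this dataset, since the lowest total is 125.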

