# Project Overview: Analyzing Customer Behavior and Market Trends in a UK-Based Online Retail Company

# Objective:
The objective of this project is to analyze customer behavior and market trends for a UK-based online retail company specializing in unique all-occasion gifts. Using a transactional dataset spanning from 01/12/2010 to 09/12/2011, the project aims to derive actionable insights to inform strategic decision-making and business growth.

# Dataset Information:

Transnational Dataset: Contains all transactions occurring within the specified timeframe.

Company Profile: The company primarily sells unique all-occasion gifts, with many customers being wholesalers.

# Columns:
1. InvoiceNo: A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
2. StockCode: A 5-digit integral number uniquely assigned to each distinct product. Some codes in the data carry a letter suffix (for example, 85123A).
3. Description: Product name.
4. Quantity: The quantities of each product per transaction.
5. InvoiceDate: The date and time when each transaction was generated.
6. UnitPrice: Product price per unit (in sterling).
7. CustomerID: A 5-digit integral number uniquely assigned to each customer.
8. Country: The name of the country where each customer resides.
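
The cancellation rule described for InvoiceNo can be expressed as a simple string check. The sketch below is a minimal illustration on a small hand-made frame: the rows are hypothetical, and the `is_cancellation` column name is an assumption introduced here. The check is case-insensitive because the column description says 'c', while the file itself may use an upper-case prefix.

```python
import pandas as pd

# Hypothetical sample rows for illustration only (not real dataset records).
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "c536383", "536367"],
    "Quantity": [6, -1, -1, 12],
})

# Treat InvoiceNo as text: a leading letter 'c' or 'C' marks a cancellation.
invoice = sample["InvoiceNo"].astype(str)
sample["is_cancellation"] = invoice.str.upper().str.startswith("C")

print(sample)
print("Cancelled lines:", int(sample["is_cancellation"].sum()))
```

Reading InvoiceNo as a string rather than an integer matters here, since a numeric conversion would fail on (or drop) the prefixed rows.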

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans  
from sklearn.preprocessing import StandardScaler 
from sklearn.metrics import silhouette_score  
from scipy import stats  
from mlxtend.frequent_patterns import apriori, association_rules  

In [7]:
data = pd.read_csv("Online Retail.csv", encoding='latin1')



In [9]:
data.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,1/12/10 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,1/12/10 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,1/12/10 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,1/12/10 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,1/12/10 8:26,3.39,17850.0,United Kingdom


In [12]:
data.tail()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,9/12/11 12:50,0.85,12680.0,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,9/12/11 12:50,2.1,12680.0,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,9/12/11 12:50,4.15,12680.0,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,9/12/11 12:50,4.15,12680.0,France
541908,581587,22138,BAKING SET 9 PIECE RETROSPOT,3,9/12/11 12:50,4.95,12680.0,France
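
The first and last rows show InvoiceDate stored as text. A minimal sketch of converting it to a datetime type follows, assuming every value uses the day/month/two-digit-year layout seen in the head and tail output (consistent with the stated range of 01/12/2010 to 09/12/2011). The `raw` and `parsed` names are introduced here for illustration.

```python
import pandas as pd

# InvoiceDate strings copied from the head() and tail() output above.
raw = pd.Series(["1/12/10 8:26", "9/12/11 12:50"], name="InvoiceDate")

# Assumed layout: day/month/two-digit year, then 24-hour time.
parsed = pd.to_datetime(raw, format="%d/%m/%y %H:%M")

print(parsed.min(), "to", parsed.max())
```

Passing an explicit `format` avoids the ambiguity of values such as 1/12/10, which could otherwise be read as 12 January instead of 1 December.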


In [15]:
data.shape

(541909, 8)

In [None]:
# There are 541,909 rows and 8 columns in this dataset.

In [11]:
data.describe()

Unnamed: 0,Quantity,UnitPrice,CustomerID
count,541909.0,541909.0,406829.0
mean,9.55225,4.611114,15287.69057
std,218.081158,96.759853,1713.600303
min,-80995.0,-11062.06,12346.0
25%,1.0,1.25,13953.0
50%,3.0,2.08,15152.0
75%,10.0,4.13,16791.0
max,80995.0,38970.0,18287.0


# Summary Statistics:

1. Quantity Insights:

The average quantity per transaction line is approximately 9.55, with a standard deviation of approximately 218.08. A standard deviation this much larger than the mean, together with a median of 3, indicates a heavily skewed distribution driven by a few extreme values.

Quantity contains negative values, down to a minimum of -80,995, which may represent returns or cancellations. The maximum of 80,995 is its exact mirror, which may point to a single large order that was later reversed. These negative values will be explored further to understand their impact on the analysis.

The 25th percentile (first quartile) is 1, indicating that at least 25% of rows have a quantity of 1 or less; this share includes the negative quantities.
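
One way to begin exploring the negative quantities is to isolate them and to compare the mean with the median, which is far less sensitive to extreme values. The sketch below uses a small hypothetical series, not values from the dataset; in the notebook, `data["Quantity"]` would take its place.

```python
import pandas as pd

# Hypothetical quantities for illustration; replace with data["Quantity"].
quantity = pd.Series([6, 6, 8, 1, 3, 10, -2, 80995, -80995], name="Quantity")

negative = quantity[quantity < 0]   # candidate returns or cancellations
positive = quantity[quantity > 0]   # ordinary sales lines

print("Negative lines:", len(negative))
print("Share negative: {:.1%}".format(len(negative) / len(quantity)))
print("Mean (all rows):", quantity.mean())
print("Median (all rows):", quantity.median())
print("Median (positive only):", positive.median())
```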

2. Unit Price Insights:

The average unit price is approximately £4.61, with a standard deviation of approximately £96.76. This suggests a wide range of prices for the items sold by the company.

The minimum unit price is negative (-£11,062.06). A negative price is not a valid sale price, so these rows are potential data anomalies whose cause and impact on the analysis require further investigation.
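
A first step in that investigation could be to list the rows whose price is not strictly positive, alongside their descriptions. This is a sketch only: the frame below is hypothetical (the placeholder descriptions and the `X000001` invoice number are invented for illustration), and treating zero prices as suspect is an assumption rather than something the dataset description states.

```python
import pandas as pd

# Hypothetical rows for illustration only; in the notebook, use `data` instead.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "X000001", "536366", "536367"],
    "Description": ["WHITE METAL LANTERN", "EXAMPLE ADJUSTMENT",
                    "EXAMPLE FREE GIFT", "EXAMPLE ITEM"],
    "UnitPrice": [3.39, -11062.06, 0.0, 4.95],
})

# Rows whose price is zero or negative are candidates for manual review.
suspect = sample[sample["UnitPrice"] <= 0]
print(suspect)

# Summary statistics on strictly positive prices only, for comparison.
valid = sample[sample["UnitPrice"] > 0]
print(valid["UnitPrice"].describe())
```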

3. CustomerID Insights:

The number of non-null values for CustomerID (406,829) is lower than for the other columns (541,909). This means 135,080 rows, about 24.9% of the dataset, have no CustomerID.
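
The size of the gap follows directly from the two counts reported above. The sketch below works through that arithmetic and then shows the equivalent pandas check on a small hypothetical frame; in the notebook, the same `isna()` calls would be applied to `data["CustomerID"]`.

```python
import numpy as np
import pandas as pd

# Counts reported by data.shape and data.describe() above.
total_rows = 541_909
customer_id_count = 406_829

missing = total_rows - customer_id_count
share = missing / total_rows
print(f"Missing CustomerID: {missing:,} rows ({share:.1%})")

# The same check on a DataFrame, shown on a hypothetical four-row frame.
sample = pd.DataFrame({"CustomerID": [17850.0, np.nan, 12680.0, np.nan]})
print("Missing in sample:", int(sample["CustomerID"].isna().sum()))
print("Share in sample:", sample["CustomerID"].isna().mean())
```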