# Introduction


 - This notebook works with a transnational data set containing all transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

# Objective

 - Identify customers at risk of churn

 - Highlight high-revenue products
 
 - Suggest actions to increase customer retention or cross-selling opportunities
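
As a rough illustration of the first two objectives, the sketch below computes revenue per product and days since each customer's last purchase. It runs on a tiny hypothetical frame (`tx`) that only borrows the dataset's column names. The `Revenue` column and the 90-day inactivity threshold are assumptions made for illustration, not choices taken from this notebook.

```python
import pandas as pd

# Hypothetical transactions; values are made up, column names follow the dataset.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "StockCode": ["A", "B", "A", "C"],
    "Quantity": [2, 1, 5, 3],
    "UnitPrice": [2.50, 10.00, 2.50, 1.00],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-10", "2011-11-30", "2011-03-15", "2011-12-01"]
    ),
})

# High-revenue products: revenue is assumed to be Quantity * UnitPrice.
tx["Revenue"] = tx["Quantity"] * tx["UnitPrice"]
revenue_by_product = (
    tx.groupby("StockCode")["Revenue"].sum().sort_values(ascending=False)
)

# Churn risk: days since each customer's last purchase, measured from the
# most recent invoice in the data.
snapshot = tx["InvoiceDate"].max()
last_purchase = tx.groupby("CustomerID")["InvoiceDate"].max()
recency_days = (snapshot - last_purchase).dt.days

# Assumed rule of thumb: no purchase for more than 90 days counts as at risk.
at_risk = recency_days[recency_days > 90].index.tolist()

print(revenue_by_product)
print("At-risk customers:", at_risk)
```

The threshold would need tuning against the real purchase cadence, since wholesalers may reorder less often than retail customers.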

# Import Statements

In [24]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Data Loading and Initial Exploration

In [25]:
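# Load the dataset. InvoiceDate is read as plain text here and parsed in a
# later cell, because the file uses two-digit years (e.g. 12/1/10 8:26) that
# a four-digit "%Y" date format would not match.
df_raw = pd.read_csv(
    '../data/raw/Online_Retail.csv',
    encoding='latin1',
    low_memory=False
)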



In [26]:
# Inspect the first few rows of the dataset
df_raw.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/10 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/10 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |


In [31]:
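# Parse InvoiceDate explicitly. The file stores two-digit years
# (e.g. 12/1/10 8:26), hence "%y"; values that do not match become NaT.
df_raw["InvoiceDate"] = pd.to_datetime(
    df_raw["InvoiceDate"],
    format="%m/%d/%y %H:%M",
    errors="coerce"
)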
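
Because `errors="coerce"` silently turns unparseable values into `NaT`, it can be worth counting them right after parsing. The following is a minimal sketch on a small hypothetical sample. The `sample` frame and its malformed value are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample: two valid timestamps and one malformed value.
sample = pd.DataFrame(
    {"InvoiceDate": ["12/1/10 8:26", "12/9/11 12:50", "not a date"]}
)

parsed = pd.to_datetime(
    sample["InvoiceDate"],
    format="%m/%d/%y %H:%M",
    errors="coerce",
)

# Values that could not be parsed are NaT; count them before moving on.
n_failed = int(parsed.isna().sum())
print(f"{n_failed} of {len(parsed)} values failed to parse")
```

On the real data, the same check would be `df_raw["InvoiceDate"].isna().sum()`, run right after the conversion cell.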


In [35]:
summary = {
    "rows": len(df_raw),
    "columns": df_raw.shape[1],
    "date_min": df_raw["InvoiceDate"].min(),
    "date_max": df_raw["InvoiceDate"].max(),
    "countries": df_raw["Country"].nunique(),
    "customers": df_raw["CustomerID"].nunique(),
    "products": df_raw["StockCode"].nunique(),
}

summary


{'rows': 541909,
 'columns': 8,
 'date_min': Timestamp('2010-12-01 08:26:00'),
 'date_max': Timestamp('2011-12-09 12:50:00'),
 'countries': 38,
 'customers': 4372,
 'products': 4070}

In [34]:
df_raw["InvoiceDate"].dtype


dtype('<M8[ns]')

In [28]:
# Clean the dataset by dropping every row that has a missing value in any column.
df_cleaned = df_raw.dropna()
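
Since `dropna()` with no arguments removes a row if any single column is missing, it may help to check first which columns have gaps and how many rows would be lost. This sketch uses a hypothetical three-row frame (`sample`) with made-up gaps; it does not report figures for the actual dataset.

```python
import pandas as pd

# Hypothetical sample with gaps in Description and CustomerID.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367"],
    "Description": ["WHITE METAL LANTERN", None, "CREAM CUPID HEARTS COAT HANGER"],
    "CustomerID": [17850.0, 17850.0, None],
})

# Missing values per column, then the effect of a blanket dropna().
missing_per_column = sample.isna().sum()
rows_before = len(sample)
rows_after = len(sample.dropna())

print(missing_per_column)
print(f"dropna() would remove {rows_before - rows_after} of {rows_before} rows")
```

If only some columns matter for a given analysis, `dropna(subset=[...])` is a narrower alternative that keeps more rows.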

In [29]:
# Reload an untouched copy of the raw CSV; InvoiceDate is left as plain strings here.
df = pd.read_csv(
    '../data/raw/Online_Retail.csv',
    encoding='latin1',
    low_memory=False
)


