# Retail Data Analysis

### Customer Segmentation:
- Algorithm: K-Means Clustering, Hierarchical Clustering
- Libraries: scikit-learn (KMeans and AgglomerativeClustering from sklearn.cluster)
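
A minimal sketch of the K-Means option, assuming two per-customer features (purchase frequency and total spend) that this notebook has not yet computed. The feature values are synthetic, and the choice of two clusters is an assumption for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-customer features: [purchase_count, total_spend] (hypothetical values)
rng = np.random.default_rng(0)
occasional = rng.normal(loc=[2, 200], scale=[0.5, 30], size=(50, 2))
frequent = rng.normal(loc=[20, 5000], scale=[2, 400], size=(50, 2))
features = np.vstack([occasional, frequent])

# Scale first so that spend does not dominate the distance metric
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an assumption; in practice compare several values of k
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
```

Scaling matters here because K-Means uses Euclidean distance, and total spend is on a much larger scale than purchase count.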

### Churn Prediction:
- Algorithm: Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machines (SVM)
- Libraries: scikit-learn, xgboost, lightgbm

### Customer Lifetime Value (CLV) Analysis:
- Algorithm: Regression (e.g., Linear Regression, Poisson Regression), Survival Analysis
- Libraries: scikit-learn, lifelines

### Market Basket Analysis:
- Algorithm: Association Rule Mining (e.g., Apriori algorithm)
- Libraries: mlxtend (apriori and association_rules from mlxtend.frequent_patterns)

### Time Series Analysis and Forecasting:
- Algorithm: ARIMA (AutoRegressive Integrated Moving Average), SARIMA, Prophet
- Libraries: statsmodels, prophet (published as fbprophet before version 1.0)

### Fraud Detection:
- Algorithm: Anomaly Detection (e.g., Isolation Forest, One-Class SVM), Autoencoders
- Libraries: scikit-learn, PyOD

### Recommendation Engine:
- Algorithm: Collaborative Filtering (e.g., User-based, Item-based), Matrix Factorization (e.g., Singular Value Decomposition, Alternating Least Squares)
- Libraries: Surprise, Implicit, LightFM

### Customer Purchase Funnel Analysis:
- Algorithm: Sequential Pattern Mining
- Libraries: prefixspan (PrefixSpan is not part of mlxtend)

### A/B Testing:
- Algorithm: Hypothesis Testing (e.g., t-tests, chi-squared tests)
- Libraries: scipy, statsmodels
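
A minimal sketch of a chi-squared test on conversion counts, assuming a two-variant experiment. The counts below are hypothetical and do not come from the retail data.

```python
from scipy.stats import chi2_contingency

# Rows: variant A, variant B. Columns: converted, did not convert (hypothetical counts)
observed = [
    [120, 880],
    [150, 850],
]

# Returns the test statistic, p-value, degrees of freedom, and expected counts
chi2, p_value, dof, expected = chi2_contingency(observed)

# The 0.05 threshold is a common convention, not a property of the test
significant = p_value < 0.05
```

A t-test (scipy.stats.ttest_ind) would be the analogous choice for a continuous metric such as order value.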

### Customer Sentiment Analysis:
- Algorithm: Natural Language Processing (NLP), using Bag-of-Words or TF-IDF features with a sentiment classifier (e.g., Naive Bayes, SVM)
- Libraries: scikit-learn, NLTK, spaCy
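
A minimal sketch of the TF-IDF plus Naive Bayes combination using scikit-learn. The reviews and labels are invented for illustration; the retail files loaded below contain no review text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled reviews
reviews = [
    "great quality and fast delivery",
    "love the fit, will buy again",
    "excellent service at the store",
    "terrible quality, broke in a week",
    "late delivery and rude support",
    "awful fit, asking for a refund",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# TF-IDF turns each review into a weighted term vector; Naive Bayes classifies it
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(reviews, labels)

predicted = model.predict(["excellent quality, love it"])
```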

# 1. Import Libraries

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# 2. Read In Data

In [25]:
cust = pd.read_csv("/Users/brian/Documents/GitHub/Retail-Data-Project/Customer.csv")
trans = pd.read_csv("/Users/brian/Documents/GitHub/Retail-Data-Project/Transactions.csv")
prod = pd.read_csv("/Users/brian/Documents/GitHub/Retail-Data-Project/prod_cat_info.csv")

# 3. Understand Data

## Customer Data

In [3]:
cust.head()

index,customer_Id,DOB,Gender,city_code
0,268408,02-01-1970,M,4.0
1,269696,07-01-1970,F,8.0
2,268159,08-01-1970,F,8.0
3,270181,10-01-1970,F,2.0
4,268073,11-01-1970,M,1.0


In [5]:
cust.shape

(5647, 4)

In [8]:
cust.describe()

statistic,customer_Id,city_code
count,5647.0,5645.0
mean,271037.281034,5.472631
std,2451.261711,2.859918
min,266783.0,1.0
25%,268912.0,3.0
50%,271028.0,5.0
75%,273180.0,8.0
max,275265.0,10.0


In [9]:
cust.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5647 entries, 0 to 5646
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   customer_Id  5647 non-null   int64  
 1   DOB          5647 non-null   object 
 2   Gender       5645 non-null   object 
 3   city_code    5645 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 176.6+ KB
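
DOB is stored as object (string) rather than as a date. The values shown in this notebook (for example 14-02-1970) suggest a day-first format, so a conversion could look like the sketch below. It runs on a small copy of rows from the outputs here, and cust_demo is a name introduced only for the example.

```python
import pandas as pd

# A few rows copied from the customer outputs in this notebook
cust_demo = pd.DataFrame({
    "customer_Id": [268408, 269696, 267199],
    "DOB": ["02-01-1970", "07-01-1970", "14-02-1970"],
})

# An explicit format avoids pandas guessing month-first for ambiguous dates like 02-01-1970
cust_demo["DOB"] = pd.to_datetime(cust_demo["DOB"], format="%d-%m-%Y")
```

With an explicit format, a value that does not match raises an error instead of being silently misparsed.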


In [14]:
cust.isnull().sum()

customer_Id    0
DOB            0
Gender         2
city_code      2
dtype: int64

In [16]:
cust_missing = cust[cust.isnull().any(axis=1)]
cust_missing

index,customer_Id,DOB,Gender,city_code
24,267199,14-02-1970,NaN,2.0
87,271626,02-06-1970,NaN,6.0
115,268447,14-07-1970,M,NaN
149,268709,09-09-1970,F,NaN


In [22]:
cust_drop_na = cust.dropna()

In [23]:
cust_drop_na.head()

index,customer_Id,DOB,Gender,city_code
0,268408,02-01-1970,M,4.0
1,269696,07-01-1970,F,8.0
2,268159,08-01-1970,F,8.0
3,270181,10-01-1970,F,2.0
4,268073,11-01-1970,M,1.0


In [None]:
# cust_impute = cust
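
As an alternative to dropping the four incomplete rows, the missing values could be imputed. The sketch below is one possible approach, not a decision made in this notebook: it fills Gender with an "Unknown" label and city_code with the most frequent code. It works on a small hypothetical sample, and it uses .copy() because plain assignment (cust_impute = cust) would only create a second name for the same DataFrame.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample with one missing Gender and one missing city_code
cust_demo = pd.DataFrame({
    "customer_Id": [268408, 269696, 268159, 267199, 268447],
    "Gender": ["M", "F", "F", np.nan, "M"],
    "city_code": [4.0, 8.0, 8.0, 2.0, np.nan],
})

# Copy so the original frame is left unchanged
cust_impute = cust_demo.copy()

# Keep missing gender visible as its own category instead of guessing
cust_impute["Gender"] = cust_impute["Gender"].fillna("Unknown")

# Fill missing city codes with the most frequent code (8.0 in this sample)
most_common_city = cust_impute["city_code"].mode().iloc[0]
cust_impute["city_code"] = cust_impute["city_code"].fillna(most_common_city)
```

With only 2 missing values per column out of 5647 rows, either dropping or imputing should have little effect on downstream results.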

## Transaction Data

In [26]:
trans.head()

index,transaction_id,cust_id,tran_date,prod_subcat_code,prod_cat_code,Qty,Rate,Tax,total_amt,Store_type
0,80712190438,270351,28-02-2014,1,1,-5,-772,405.3,-4265.3,e-Shop
1,29258453508,270384,27-02-2014,5,3,-5,-1497,785.925,-8270.925,e-Shop
2,51750724947,273420,24-02-2014,6,5,-2,-791,166.11,-1748.11,TeleShop
3,93274880719,271509,24-02-2014,11,6,-3,-1363,429.345,-4518.345,e-Shop
4,51750724947,273420,23-02-2014,6,5,-2,-791,166.11,-1748.11,TeleShop
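
All five rows above have negative Qty, Rate, and total_amt, and one transaction_id appears twice. The sketch below flags such rows and checks two relationships that hold in these five rows: Tax equals 10.5% of |Qty × Rate|, and |total_amt| equals |Qty × Rate| + Tax. Treating negative quantities as returns is an assumption, as is the expectation that these relationships hold across the full file.

```python
import numpy as np
import pandas as pd

# The five rows shown by trans.head(), limited to the columns needed here
trans_demo = pd.DataFrame({
    "transaction_id": [80712190438, 29258453508, 51750724947, 93274880719, 51750724947],
    "Qty": [-5, -5, -2, -3, -2],
    "Rate": [-772, -1497, -791, -1363, -791],
    "Tax": [405.3, 785.925, 166.11, 429.345, 166.11],
    "total_amt": [-4265.3, -8270.925, -1748.11, -4518.345, -1748.11],
})

# Assumption: a negative quantity marks a return
trans_demo["is_return"] = trans_demo["Qty"] < 0

# Pre-tax value of the line, ignoring sign
gross = (trans_demo["Qty"] * trans_demo["Rate"]).abs()

# Consistency checks derived from the rows above
trans_demo["tax_ok"] = np.isclose(trans_demo["Tax"], 0.105 * gross)
trans_demo["total_ok"] = np.isclose(trans_demo["total_amt"].abs(), gross + trans_demo["Tax"])

# Rows whose transaction_id occurs more than once
repeated_id = trans_demo["transaction_id"].duplicated(keep=False)
```

Rows that fail either check, or that share a transaction_id, are candidates for closer inspection during data cleaning.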


## Product Data

In [27]:
prod.head()

index,prod_cat_code,prod_cat,prod_sub_cat_code,prod_subcat
0,1,Clothing,4,Mens
1,1,Clothing,1,Women
2,1,Clothing,3,Kids
3,2,Footwear,1,Mens
4,2,Footwear,3,Women
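
The product table names its subcategory key prod_sub_cat_code, while the transaction table uses prod_subcat_code. A sketch of how the two could be joined on the (category, subcategory) pair follows. It uses small copies of the rows shown above, and trans_demo, prod_demo, and merged are names introduced only for the example.

```python
import pandas as pd

# Two transaction rows and the five product rows shown in this notebook
trans_demo = pd.DataFrame({
    "transaction_id": [80712190438, 29258453508],
    "prod_subcat_code": [1, 5],
    "prod_cat_code": [1, 3],
})
prod_demo = pd.DataFrame({
    "prod_cat_code": [1, 1, 1, 2, 2],
    "prod_cat": ["Clothing", "Clothing", "Clothing", "Footwear", "Footwear"],
    "prod_sub_cat_code": [4, 1, 3, 1, 3],
    "prod_subcat": ["Mens", "Women", "Kids", "Mens", "Women"],
})

# Align the key name, then join on both codes; subcategory codes repeat across categories
prod_renamed = prod_demo.rename(columns={"prod_sub_cat_code": "prod_subcat_code"})
merged = trans_demo.merge(
    prod_renamed,
    on=["prod_cat_code", "prod_subcat_code"],
    how="left",               # keep every transaction even if no product row matches
    validate="many_to_one",   # fail if a (category, subcategory) pair is duplicated in prod
)
```

The second transaction has no match here only because prod_demo contains just the first five product rows.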


# 4. Data Cleaning

# 5. Exploratory Data Analysis