
## Project Title
Cohort analysis for assessing customer retention in the e-commerce industry

# Cohort Analysis
Cohort analysis divides users into distinct groups, or cohorts, based on shared criteria. In this case study, cohorts are defined by the time of user acquisition. Grouping users this way lets us study how different cohorts behave in terms of engagement and retention.

## Cohort Analysis Example
For example, you might create cohorts based on the month users first signed up for your service. You can then track how each cohort's activity, such as its conversion rate or lifetime value, evolves over time. This approach lets you identify trends, anomalies, and areas where you may need to make adjustments to improve user retention and satisfaction.
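As a minimal sketch of month-based cohort assignment, the snippet below labels every row with the month of that customer's earliest record. The `orders` table and its values are hypothetical, and treating the first recorded transaction as the acquisition date is an assumption, since a sign-up date may not be available.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the dataset).
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-03-10",
        "2011-02-01", "2011-02-20",
        "2011-03-15",
    ]),
})

# Earliest transaction per customer, broadcast back to every row of that customer.
first_purchase = orders.groupby("CustomerID")["InvoiceDate"].transform("min")

# The cohort label is the calendar month of that first transaction.
orders["CohortMonth"] = first_purchase.dt.to_period("M")

print(orders)
```

Every row of a given customer carries the same `CohortMonth`, so later aggregations can group on it directly.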

## Retention Rate: Time-Based Cohort Analysis


*   Create cohorts based on user acquisition dates.
*   Measure the percentage of users from each cohort who continue to engage with your product or service over time (e.g., after 1 month, 3 months, etc.).
*   Analyse how retention rates vary across different cohorts and time periods, enabling you to identify trends and make data-driven decisions.

This analysis can reveal whether certain cohorts have better or worse retention rates, helping the business understand the factors contributing to user retention or attrition. It can also assist in optimizing marketing, product development, and customer support strategies.
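The three steps can be sketched on a small, hypothetical table as follows. The column names `CohortMonth` and `CohortIndex` are illustrative choices, and "still engaged" is assumed here to mean "made at least one purchase in that month"; other definitions of engagement are equally valid.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the dataset).
orders = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-02-10", "2011-03-02",
        "2011-01-20", "2011-03-15",
        "2011-02-08",
    ]),
})

# Step 1: cohort = month of each customer's first transaction.
invoice_month = orders["InvoiceDate"].dt.to_period("M")
cohort_month = (
    orders.groupby("CustomerID")["InvoiceDate"].transform("min").dt.to_period("M")
)
orders["CohortMonth"] = cohort_month

# Number of whole months between the cohort month and the transaction month.
orders["CohortIndex"] = (
    (invoice_month.dt.year - cohort_month.dt.year) * 12
    + (invoice_month.dt.month - cohort_month.dt.month)
)

# Step 2: distinct active customers per cohort and month offset.
cohort_counts = (
    orders.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
    .nunique()
    .unstack()
)

# Step 3: divide each row by the cohort size (the month-0 column).
retention = cohort_counts.divide(cohort_counts[0], axis=0)

print(retention)
```

Each row of `retention` is one cohort and each column is the number of months since acquisition. Cells for months a cohort has not reached yet, or in which nobody from the cohort purchased, stay as `NaN`.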






In [21]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [22]:
!git clone https://github.com/TechPius/cohort_analysis.git

fatal: destination path 'cohort_analysis' already exists and is not an empty directory.


In [23]:
!ls /content/cohort_analysis # Listing repo files

Dataset_ecommerce.csv  main.ipynb  README.md


## Importing Libraries

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

## Load Dataset

In [25]:
Dataset_ecommerce = pd.read_csv('/content/cohort_analysis/Dataset_ecommerce.csv')
Dataset_ecommerce.head()

Unnamed: 0,InvoiceNo,InvoiceDate,CustomerID,StockCode,Description,Quantity,UnitPrice,Country
0,536365,2010-12-01 08:26:00,17850.0,SC1734,Electronics,65,10.23,Egypt
1,536365,2010-12-01 08:26:00,17850.0,SC2088,Furniture,95,19.61,Mali
2,536365,2010-12-01 08:26:00,17850.0,SC3463,Books,78,61.49,Mali
3,536365,2010-12-01 08:26:00,17850.0,SC6228,Toys,15,24.73,South Africa
4,536365,2010-12-01 08:26:00,17850.0,SC2149,Toys,50,38.83,Rwanda


## Descriptive Statistics

In [26]:
Dataset_ecommerce.describe(include='all')

Unnamed: 0,InvoiceNo,InvoiceDate,CustomerID,StockCode,Description,Quantity,UnitPrice,Country
count,541909.0,541909,406829.0,541909,541909,541909.0,541909.0,541909
unique,25900.0,23260,,9000,10,,,28
top,573585.0,2011-10-31 14:41:00,,SC2014,Sports Equipment,,,Cote d'Ivoire
freq,1114.0,1114,,96,54765,,,19651
mean,,,15287.69057,,,50.534748,50.476354,
std,,,1713.600303,,,28.849367,28.564775,
min,,,12346.0,,,1.0,1.0,
25%,,,13953.0,,,26.0,25.75,
50%,,,15152.0,,,51.0,50.43,
75%,,,16791.0,,,76.0,75.18,


## Check for Missing Values

In [27]:
Dataset_ecommerce.isnull().sum()

Unnamed: 0,0
InvoiceNo,0
InvoiceDate,0
CustomerID,135080
StockCode,0
Description,0
Quantity,0
UnitPrice,0
Country,0


In [28]:
Dataset_ecommerce.dropna(inplace=True)
Dataset_ecommerce.isnull().sum()

Unnamed: 0,0
InvoiceNo,0
InvoiceDate,0
CustomerID,0
StockCode,0
Description,0
Quantity,0
UnitPrice,0
Country,0
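Only `CustomerID` has missing values in this dataset, so dropping every row with any null is equivalent to dropping rows without a customer. A minimal sketch on hypothetical toy data shows how the share of missing values could be inspected first, and how the drop could be restricted to the one column a cohort analysis depends on:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two rows lacking a customer identifier.
toy = pd.DataFrame({
    "CustomerID": [17850.0, np.nan, 13047.0, np.nan],
    "Quantity": [6, 2, 8, 1],
})

# Fraction of missing values per column (mean of the boolean null mask).
missing_share = toy.isnull().mean()

# Drop only the rows that cannot be attributed to a customer.
cleaned = toy.dropna(subset=["CustomerID"])

print(missing_share)
print(len(cleaned), "rows remain")
```

Using `subset` keeps rows that are incomplete in other columns, which matters if later versions of the data have gaps elsewhere.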


In [29]:
# Get more information on the data
Dataset_ecommerce.info()

<class 'pandas.core.frame.DataFrame'>
Index: 406829 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   InvoiceNo    406829 non-null  object 
 1   InvoiceDate  406829 non-null  object 
 2   CustomerID   406829 non-null  float64
 3   StockCode    406829 non-null  object 
 4   Description  406829 non-null  object 
 5   Quantity     406829 non-null  int64  
 6   UnitPrice    406829 non-null  float64
 7   Country      406829 non-null  object 
dtypes: float64(2), int64(1), object(5)
memory usage: 27.9+ MB


In [30]:
# Convert InvoiceDate to datetime format
Dataset_ecommerce['InvoiceDate'] = pd.to_datetime(Dataset_ecommerce['InvoiceDate'])
Dataset_ecommerce.info()

<class 'pandas.core.frame.DataFrame'>
Index: 406829 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    406829 non-null  object        
 1   InvoiceDate  406829 non-null  datetime64[ns]
 2   CustomerID   406829 non-null  float64       
 3   StockCode    406829 non-null  object        
 4   Description  406829 non-null  object        
 5   Quantity     406829 non-null  int64         
 6   UnitPrice    406829 non-null  float64       
 7   Country      406829 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 27.9+ MB
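The timestamps in the preview follow a `YYYY-MM-DD HH:MM:SS` layout, so the conversion could also be written with an explicit format. The sketch below assumes that layout holds for every row; `errors="coerce"` turns anything that does not match into `NaT` so it can be counted instead of raising. The malformed value is invented for illustration.

```python
import pandas as pd

# Two timestamps as they appear in the outputs above, plus one invented bad value.
raw_dates = pd.Series([
    "2010-12-01 08:26:00",
    "2011-10-31 14:41:00",
    "not a date",
])

# An explicit format avoids per-row format inference; unparseable entries become NaT.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Count how many values failed to parse.
n_bad = parsed.isna().sum()

print(parsed)
print(n_bad, "value(s) could not be parsed")
```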


## Exploratory Data Analysis

In [31]:
quantity_by_country = Dataset_ecommerce.groupby(['Country'])['Quantity'].sum().reset_index()
quantity_by_country = quantity_by_country.sort_values('Quantity',ascending=False).reset_index()
quantity_by_country

Unnamed: 0,index,Country,Quantity
0,23,Togo,741223
1,20,South Africa,740589
2,4,Cote d'Ivoire,740229
3,15,Nigeria,739708
4,9,Libya,739206
5,1,Benin,738133
6,27,Zimbabwe,737522
7,19,Somalia,736219
8,0,Algeria,736181
9,18,Sierra Leone,735477


In [32]:
# Top 10 countries by total quantity sold
top_ten_countries = (
    Dataset_ecommerce.groupby('Country')['Quantity'].sum()
    .reset_index()
    .sort_values('Quantity', ascending=False)
)
top_ten_countries.head(10)

Unnamed: 0,Country,Quantity
23,Togo,741223
20,South Africa,740589
4,Cote d'Ivoire,740229
15,Nigeria,739708
9,Libya,739206
1,Benin,738133
27,Zimbabwe,737522
19,Somalia,736219
0,Algeria,736181
18,Sierra Leone,735477
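One possible way to plot the ten countries with the highest total quantity is a horizontal bar chart. The sketch below hard-codes the totals from the sorted `quantity_by_country` output above so that it runs on its own; in the notebook the same frame could be taken from the sorted result instead. The variable name `top_ten` and the styling choices are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Totals copied from the sorted quantity_by_country output.
top_ten = pd.DataFrame({
    "Country": [
        "Togo", "South Africa", "Cote d'Ivoire", "Nigeria", "Libya",
        "Benin", "Zimbabwe", "Somalia", "Algeria", "Sierra Leone",
    ],
    "Quantity": [
        741223, 740589, 740229, 739708, 739206,
        738133, 737522, 736219, 736181, 735477,
    ],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(top_ten["Country"], top_ten["Quantity"])
ax.invert_yaxis()  # largest total at the top
ax.set_xlabel("Total quantity sold")
ax.set_title("Top 10 countries by quantity sold")
fig.tight_layout()
```

The totals differ by less than one percent, so bars drawn from zero will look almost identical; the table is likely the more informative view here.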
