# Project: WEB ANALYTICS INSIGHTS

 ## Table of Contents

<ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#wrangling">Data Wrangling</a></li>
    <li><a href="#feat_eng">Feature Engineering</a></li>
    <li><a href="#eda">Exploratory Data Analysis</a></li>
    <li><a href="#conclusions">Conclusions</a></li>
</ul>


<a id='intro'></a>
# Introduction

In 2013, visits to the e-commerce website fluctuated. We'd like to understand the probable causes and uncover insights that can be used to develop a data-driven strategy to engage and retain site visitors. The dataset contains traffic records for this e-commerce website.


**Questions to help understand the dataset**
- What is the daily distribution of site visitors?
- What is the monthly distribution of site visitors?
- What is the average number of orders made by clients?
- What is the conversion rate?
- What is the bounce rate?
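The following is a minimal sketch of how the first three questions could be answered with pandas. It runs on a small made-up frame that reuses the dataset's column names (`DAY`, `VISITS`, `ORDERS`). Two readings are assumptions about what the questions intend: "daily distribution" is treated both as totals per calendar day and totals per weekday, and "average orders" is treated as the mean of `ORDERS` per row.

```python
import pandas as pd

# Toy frame mirroring a few columns of the real dataset (values are made up).
toy = pd.DataFrame({
    "DAY": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-01-02", "2013-02-01"]),
    "VISITS": [100, 50, 80, 120],
    "ORDERS": [5, 1, 4, 6],
})

# Q1 (per calendar day): several rows can share a DAY, so sum them.
daily_visits = toy.groupby("DAY")["VISITS"].sum()

# Q1 (alternative reading): total visits per weekday name.
by_weekday = toy.groupby(toy["DAY"].dt.day_name())["VISITS"].sum()

# Q2: total visits per calendar month.
monthly_visits = toy.groupby(toy["DAY"].dt.to_period("M"))["VISITS"].sum()

# Q3 (assumed reading): mean number of orders per row.
avg_orders = toy["ORDERS"].mean()

print(daily_visits, by_weekday, monthly_visits, avg_orders, sep="\n\n")
```

On the real data the same calls would apply to `data`, with `daily_visits` or `monthly_visits` then plotted as a line or bar chart.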



In [41]:
# Importing libraries

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
import warnings 
import statsmodels.api as sm
import scipy.stats as scistat

%matplotlib inline
warnings.filterwarnings('ignore')

In [4]:
# Load Data
df = pd.read_excel('Data/Web_Analytics_Data.xlsx')  # forward slash works on all platforms and avoids backslash escapes
df.head()

|   | DAY | VISITS | ORDERS | HAS_PURCHASED_PRIOR | DEVICE | BOUNCES | ADD_TO_CART | PRODUCT_PAGE_VIEWS | SEARCH_PAGE_VIEWS | GENDER | AGE | INCOME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-01 | 64340 | 2312 | N | iPhone | 21755 | 21501 | 41587 | 45582 | F | 24 | 451529 |
| 1 | 2013-01-02 | 63958 | 2427 | N | iPhone | 15675 | 21355 | 41392 | 45456 | F | 22 | 384768 |
| 2 | 2013-01-03 | 67390 | 2230 | Y | iPhone | 28199 | 17086 | 46559 | 51972 | M | 71 | 283793 |
| 3 | 2013-01-04 | 58305 | 1814 | N | iPhone | 24380 | 17172 | 35612 | 41043 | M | 51 | 417355 |
| 4 | 2013-01-05 | 74434 | 2333 | Y | iPhone | 15518 | 19392 | 44692 | 55954 | F | 32 | 99205 |


In [3]:
df.shape

(5110, 13)

<a id='wrangling'></a>
# Data Wrangling

**Assessing Data**

In [5]:
data = df.copy()

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   DAY                  5110 non-null   datetime64[ns]
 1   VISITS               5110 non-null   int64         
 2   ORDERS               5110 non-null   int64         
 3   HAS_PURCHASED_PRIOR  5110 non-null   object        
 4   DEVICE               5110 non-null   object        
 5   BOUNCES              5110 non-null   int64         
 6   ADD_TO_CART          5110 non-null   int64         
 7   PRODUCT_PAGE_VIEWS   5110 non-null   int64         
 8   SEARCH_PAGE_VIEWS    5110 non-null   int64         
 9   GENDER               5110 non-null   object        
 10  AGE                  5110 non-null   int64         
 11  INCOME               5110 non-null   int64         
dtypes: datetime64[ns](1), int64(8), object(3)
memory usage: 479.2+ KB


In [9]:
data.head(5).T

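|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| DAY | 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 |
| VISITS | 64340 | 63958 | 67390 | 58305 | 74434 |
| ORDERS | 2312 | 2427 | 2230 | 1814 | 2333 |
| HAS_PURCHASED_PRIOR | N | N | Y | N | Y |
| DEVICE | iPhone | iPhone | iPhone | iPhone | iPhone |
| BOUNCES | 21755 | 15675 | 28199 | 24380 | 15518 |
| ADD_TO_CART | 21501 | 21355 | 17086 | 17172 | 19392 |
| PRODUCT_PAGE_VIEWS | 41587 | 41392 | 46559 | 35612 | 44692 |
| SEARCH_PAGE_VIEWS | 45582 | 45456 | 51972 | 41043 | 55954 |
| GENDER | F | F | M | M | F |
| AGE | 24 | 22 | 71 | 51 | 32 |
| INCOME | 451529 | 384768 | 283793 | 417355 | 99205 |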


In [19]:
# Converting objects to category

# Assign whole columns so their dtype actually becomes 'category'
cat_cols = data.select_dtypes(include='object').columns
data[cat_cols] = data[cat_cols].astype('category')
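A quick way to confirm that the conversion took effect is to inspect the dtypes afterwards. The sketch below uses a made-up frame with three of the dataset's columns and lists the text columns explicitly (an assumption, chosen so the example does not depend on how a given pandas version labels string columns). Overwriting whole columns, rather than a row slice such as `.loc[:5, ...]`, is what lets the dtype change to `category`.

```python
import pandas as pd

# Made-up rows using three of the dataset's column names.
toy = pd.DataFrame({
    "DEVICE": ["iPhone", "iPhone", "Android", "iPhone"],
    "GENDER": ["F", "M", "F", "F"],
    "VISITS": [64340, 63958, 67390, 58305],
})

# Replace each text column wholesale with its categorical version.
cat_cols = ["DEVICE", "GENDER"]
toy[cat_cols] = toy[cat_cols].astype("category")

# VISITS is untouched; DEVICE and GENDER now report 'category'.
print(toy.dtypes)
print(toy["DEVICE"].cat.categories.tolist())
```

Running `data.dtypes` after the conversion cell should likewise list `category` for `HAS_PURCHASED_PRIOR`, `DEVICE` and `GENDER`.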

Checking for missing values.

In [34]:
data.isnull().sum()

DAY                    0
VISITS                 0
ORDERS                 0
HAS_PURCHASED_PRIOR    0
DEVICE                 0
BOUNCES                0
ADD_TO_CART            0
PRODUCT_PAGE_VIEWS     0
SEARCH_PAGE_VIEWS      0
GENDER                 0
AGE                    0
INCOME                 0
dtype: int64

In [40]:
# data.isnull().sum(axis=1)

Checking for duplicates.

In [35]:
data.duplicated().sum()

0

**Findings and Actions**

- No missing values.
- Convert `object` data types to `category`.
- No duplicates.


In [36]:
data.head()

|   | DAY | VISITS | ORDERS | HAS_PURCHASED_PRIOR | DEVICE | BOUNCES | ADD_TO_CART | PRODUCT_PAGE_VIEWS | SEARCH_PAGE_VIEWS | GENDER | AGE | INCOME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-01 | 64340 | 2312 | N | iPhone | 21755 | 21501 | 41587 | 45582 | F | 24 | 451529 |
| 1 | 2013-01-02 | 63958 | 2427 | N | iPhone | 15675 | 21355 | 41392 | 45456 | F | 22 | 384768 |
| 2 | 2013-01-03 | 67390 | 2230 | Y | iPhone | 28199 | 17086 | 46559 | 51972 | M | 71 | 283793 |
| 3 | 2013-01-04 | 58305 | 1814 | N | iPhone | 24380 | 17172 | 35612 | 41043 | M | 51 | 417355 |
| 4 | 2013-01-05 | 74434 | 2333 | Y | iPhone | 15518 | 19392 | 44692 | 55954 | F | 32 | 99205 |


In [37]:
data.describe()

|   | VISITS | ORDERS | BOUNCES | ADD_TO_CART | PRODUCT_PAGE_VIEWS | SEARCH_PAGE_VIEWS | AGE | INCOME |
|---|---|---|---|---|---|---|---|---|
| count | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 |
| mean | 183480.110959 | 6410.258904 | 60133.768689 | 54968.698239 | 119285.09726 | 137587.489237 | 48.517417 | 258453.877886 |
| std | 250006.872086 | 8775.732587 | 85182.11411 | 75401.482153 | 162983.051841 | 187651.313442 | 18.147166 | 139310.205224 |
| min | 1518.0 | 46.0 | 343.0 | 394.0 | 941.0 | 1068.0 | 18.0 | 15049.0 |
| 25% | 22844.0 | 799.0 | 7101.5 | 6775.5 | 14765.0 | 17177.75 | 33.0 | 138866.0 |
| 50% | 60452.5 | 2095.5 | 19077.5 | 18086.0 | 39293.0 | 45558.0 | 48.0 | 260371.5 |
| 75% | 284524.25 | 9646.75 | 77862.5 | 82423.75 | 183667.5 | 214094.75 | 64.0 | 379528.75 |
| max | 824880.0 | 32895.0 | 369338.0 | 284697.0 | 575068.0 | 655291.0 | 80.0 | 499838.0 |
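In this summary every count column has a mean well above its median (VISITS: about 183,480 vs 60,452.5; ORDERS: about 6,410 vs 2,095.5), which suggests right-skewed distributions. The sketch below is one way to quantify that; `skew_summary` is a hypothetical helper, and the toy sample is simply the five-number summary of VISITS from the table (min, quartiles, max), not the full column.

```python
import numpy as np
import pandas as pd


def skew_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Compare mean, median and sample skewness for each numeric column."""
    num = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": num.mean(),
        "median": num.median(),
        # A ratio well above 1 hints at a long right tail.
        "mean_to_median": num.mean() / num.median(),
        "skew": num.skew(),
        # log1p compresses large values; compare the skew after transforming.
        "log1p_skew": np.log1p(num).skew(),
    })


# Toy sample: min, 25%, 50%, 75% and max of VISITS from the describe() output.
toy = pd.DataFrame({"VISITS": [1518.0, 22844.0, 60452.5, 284524.25, 824880.0]})
print(skew_summary(toy))
```

On the notebook's frame, `skew_summary(data)` would give one row per numeric column; a strongly positive `skew` is a hint that log-scaled axes may make the EDA plots easier to read.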


Checking the number of unique values in each column.

In [42]:
data.nunique()

DAY                     365
VISITS                 5024
ORDERS                 3706
HAS_PURCHASED_PRIOR       2
DEVICE                    7
BOUNCES                4904
ADD_TO_CART            4869
PRODUCT_PAGE_VIEWS     4979
SEARCH_PAGE_VIEWS      5023
GENDER                    2
AGE                      63
INCOME                 5079
dtype: int64
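`DAY` has 365 unique values across 5110 rows, and 5110 = 365 × 14. DEVICE (7 values) times either HAS_PURCHASED_PRIOR or GENDER (2 values each) also gives 14. One plausible reading, which is an assumption and not stated in the source, is that each day holds one row per segment combination. The hypothetical helpers below test that reading on a made-up frame.

```python
from itertools import product

import pandas as pd


def rows_per_day(frame: pd.DataFrame) -> pd.Series:
    """Count how many rows share each DAY value."""
    return frame["DAY"].value_counts()


def is_balanced_panel(frame: pd.DataFrame, keys: list[str]) -> bool:
    """True if every combination of `keys` appears exactly once on every DAY."""
    counts = frame.groupby(["DAY", *keys], observed=True).size()
    per_day = frame.groupby("DAY").size()
    expected = frame[keys].drop_duplicates().shape[0]
    return bool((counts == 1).all() and (per_day == expected).all())


# Made-up panel: 2 days x 2 devices x 2 prior-purchase flags = 8 rows.
days = pd.to_datetime(["2013-01-01", "2013-01-02"])
toy = pd.DataFrame(
    list(product(days, ["iPhone", "Android"], ["Y", "N"])),
    columns=["DAY", "DEVICE", "HAS_PURCHASED_PRIOR"],
)
print(rows_per_day(toy))
print(is_balanced_panel(toy, ["DEVICE", "HAS_PURCHASED_PRIOR"]))
```

On the notebook's frame, `rows_per_day(data).unique()` and `is_balanced_panel(data, ["DEVICE", "HAS_PURCHASED_PRIOR"])` (or with `"GENDER"`) would show whether the assumed grain holds before per-day totals are computed.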

<a id="feat_eng"></a>
# Feature Engineering

In [52]:
bin_labels = ["{0} - {1}".format(age+1, age + 5) for age in range(15, 80, 5)]
bin_labels

['16 - 20',
 '21 - 25',
 '26 - 30',
 '31 - 35',
 '36 - 40',
 '41 - 45',
 '46 - 50',
 '51 - 55',
 '56 - 60',
 '61 - 65',
 '66 - 70',
 '71 - 75',
 '76 - 80']

In [53]:
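# Explicit edges 15, 20, ..., 80 give 13 right-closed bins, (15, 20] through (75, 80], matching bin_labels
data['AGE_GROUP'] = pd.cut(x=data['AGE'], bins=list(range(15, 81, 5)), labels=bin_labels)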
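With an integer `bins`, `pd.cut` splits the observed AGE range (18 to 80) into equal-width intervals, about 4.8 years each for 13 bins, so the intervals would not line up with labels such as `'16 - 20'`. Passing explicit edges makes each label match its interval. A small check on a few sample ages (chosen for illustration, including the boundary values 20, 21 and 80):

```python
import pandas as pd

# Edges 15, 20, ..., 80 -> 13 right-closed bins: (15, 20], (20, 25], ..., (75, 80].
edges = list(range(15, 81, 5))
labels = [f"{lo + 1} - {lo + 5}" for lo in edges[:-1]]

ages = pd.Series([18, 20, 21, 48, 80])
groups = pd.cut(ages, bins=edges, labels=labels)
print(groups.tolist())
```

Because the bins are right-closed by default, an age of exactly 20 falls in `'16 - 20'` and 21 starts the next group, which is the behaviour the labels imply.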


In [60]:
data['CONVERSION_RATE'] = data['ORDERS']/data['VISITS']
data['BOUNCE_RATE'] = data['BOUNCES']/data['VISITS']
data['ADD_TO_CART_RATE'] = data['ADD_TO_CART']/data['VISITS']
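These columns are per-row rates. For a site-wide answer to the conversion-rate and bounce-rate questions, the unweighted mean of the row rates can differ sharply from the ratio of totals, because a low-traffic row counts as much as a high-traffic one. A toy illustration with made-up numbers (division by zero is not a concern in this dataset, since the minimum VISITS value is 1518):

```python
import pandas as pd

# Two made-up rows: one high-traffic, one low-traffic.
toy = pd.DataFrame({
    "VISITS": [1000, 10],
    "ORDERS": [20, 5],
    "BOUNCES": [300, 1],
})

# Row-level rate, as in the feature-engineering cell.
toy["CONVERSION_RATE"] = toy["ORDERS"] / toy["VISITS"]

# Unweighted mean of row rates: each row counts equally.
mean_of_rates = toy["CONVERSION_RATE"].mean()

# Visit-weighted overall rates: totals divided by total visits.
overall_rate = toy["ORDERS"].sum() / toy["VISITS"].sum()
overall_bounce_rate = toy["BOUNCES"].sum() / toy["VISITS"].sum()

print(mean_of_rates, overall_rate, overall_bounce_rate)
```

Here the mean of row rates (0.26) is roughly ten times the overall conversion rate (25 / 1010, about 0.025), so the ratio of totals is usually the safer headline figure, while the row-level columns remain useful for comparing segments.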

<a id="eda"></a>
# Exploratory Data Analysis

### Univariate Analysis