## **Data Description**

1. `Session primary channel group:` The marketing channel (e.g., Direct, Organic Social)
2. `Date + hour (YYYYMMDDHH):` The date and hour (24-hour clock) that the row covers
3. `Users:` Number of users for that channel in that hour
4. `Sessions:` Number of sessions for that channel in that hour
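5. `Engaged sessions:` Number of sessions that counted as engaged (in Google Analytics 4, whose export format these column names match, a session lasting longer than 10 seconds, containing a key event, or having at least two page or screen views)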
6. `Average engagement time per session:` The average time a user is engaged per session
7. `Engaged sessions per user:` Ratio of engaged sessions to users (engaged sessions divided by users)
8. `Events per session:` Average number of events (actions taken) per session
9. `Engagement rate:` The proportion of sessions that were engaged
10. `Event count:` Total number of events during the period
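The three ratio columns can be recomputed from the count columns. The sketch below does this for the first data row of the file (Direct, 2024041623), assuming engagement rate = engaged sessions / sessions, engaged sessions per user = engaged sessions / users, and events per session = event count / sessions. The variable names are illustrative and not part of the dataset.

```python
# Counts taken from the first data row of the dataset (Direct, 2024041623)
users = 237
sessions = 300
engaged_sessions = 144
event_count = 1402

# Derived metrics under the assumed definitions stated in the lead-in
engagement_rate = engaged_sessions / sessions            # share of sessions that were engaged
engaged_sessions_per_user = engaged_sessions / users     # engaged sessions divided by users
events_per_session = event_count / sessions              # average number of events per session

print(round(engagement_rate, 4))            # 0.48
print(round(engaged_sessions_per_user, 4))  # 0.6076
print(round(events_per_session, 4))         # 4.6733
```

These values agree with the `Engagement rate`, `Engaged sessions per user`, and `Events per session` entries stored for that row.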

In [1]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# importing the data
data = pd.read_csv('website-performance-data.csv')

In [3]:
data.head()

Unnamed: 0,# ----------------------------------------,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9
0,Session primary channel group (Default channel...,Date + hour (YYYYMMDDHH),Users,Sessions,Engaged sessions,Average engagement time per session,Engaged sessions per user,Events per session,Engagement rate,Event count
1,Direct,2024041623,237,300,144,47.526666666666700,0.6075949367088610,4.673333333333330,0.48,1402
2,Organic Social,2024041719,208,267,132,32.09737827715360,0.6346153846153850,4.295880149812730,0.4943820224719100,1147
3,Direct,2024041723,188,233,115,39.93991416309010,0.6117021276595740,4.587982832618030,0.49356223175965700,1069
4,Organic Social,2024041718,187,256,125,32.16015625,0.6684491978609630,4.078125,0.48828125,1044


In [4]:
# use the first row (which holds the real header) as the column names
data.columns = data.iloc[0]

# drop that header row from the data and renumber the index from 0
data = data.drop(0).reset_index(drop=True)
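The header repair above could probably also be done at read time. The sketch below is one possible approach, assuming the file starts with exactly one `#` banner line followed by the real header. The in-memory sample is a hypothetical excerpt with only four of the ten columns, and `sample` is an illustrative name.

```python
import io
import pandas as pd

# Hypothetical excerpt mimicking the file layout: one '#' banner line, then the real header
raw = io.StringIO(
    "# ----------------------------------------\n"
    "Session primary channel group (Default channel group),Date + hour (YYYYMMDDHH),Users,Sessions\n"
    "Direct,2024041623,237,300\n"
    "Organic Social,2024041719,208,267\n"
)

# skiprows=1 drops the banner line, so the next line is used as the header.
# Reading the date column as str keeps it in YYYYMMDDHH form for pd.to_datetime later.
sample = pd.read_csv(raw, skiprows=1, dtype={"Date + hour (YYYYMMDDHH)": str})

print(sample.dtypes)
```

Read this way, the count columns are inferred as integers directly, so the later numeric conversion would likely be unnecessary.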

In [5]:
data.head()

Unnamed: 0,Session primary channel group (Default channel group),Date + hour (YYYYMMDDHH),Users,Sessions,Engaged sessions,Average engagement time per session,Engaged sessions per user,Events per session,Engagement rate,Event count
0,Direct,2024041623,237,300,144,47.526666666666706,0.607594936708861,4.67333333333333,0.48,1402
1,Organic Social,2024041719,208,267,132,32.0973782771536,0.634615384615385,4.29588014981273,0.49438202247191,1147
2,Direct,2024041723,188,233,115,39.9399141630901,0.611702127659574,4.58798283261803,0.493562231759657,1069
3,Organic Social,2024041718,187,256,125,32.16015625,0.668449197860963,4.078125,0.48828125,1044
4,Organic Social,2024041720,175,221,112,46.9185520361991,0.64,4.52941176470588,0.506787330316742,1001


In [6]:
# checking the shape of the dataset
data.shape

(3182, 10)

In [7]:
# checking dtypes
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 10 columns):
 #   Column                                                 Non-Null Count  Dtype 
---  ------                                                 --------------  ----- 
 0   Session primary channel group (Default channel group)  3182 non-null   object
 1   Date + hour (YYYYMMDDHH)                               3182 non-null   object
 2   Users                                                  3182 non-null   object
 3   Sessions                                               3182 non-null   object
 4   Engaged sessions                                       3182 non-null   object
 5   Average engagement time per session                    3182 non-null   object
 6   Engaged sessions per user                              3182 non-null   object
 7   Events per session                                     3182 non-null   object
 8   Engagement rate                                        318

In [8]:
# changing datatype of 'Date + hour (YYYYMMDDHH)' to datetime
data['Date + hour (YYYYMMDDHH)'] = pd.to_datetime(data['Date + hour (YYYYMMDDHH)'], format='%Y%m%d%H')
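To illustrate what the `%Y%m%d%H` format does, here is a minimal sketch on two timestamps taken from the rows shown earlier. The names `stamps`, `parsed`, `hours`, and `day_names` are illustrative, and extracting the hour and weekday is only an example of what the parsed column allows.

```python
import pandas as pd

# Two raw values in YYYYMMDDHH form, as they appear in the dataset
stamps = pd.Series(["2024041623", "2024041719"])

# %Y = 4-digit year, %m = month, %d = day, %H = hour on a 24-hour clock
parsed = pd.to_datetime(stamps, format="%Y%m%d%H")

# Once parsed, the .dt accessor exposes calendar components
hours = parsed.dt.hour
day_names = parsed.dt.day_name()

print(parsed.tolist())
print(hours.tolist())      # [23, 19]
print(day_names.tolist())  # ['Tuesday', 'Wednesday']
```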

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 10 columns):
 #   Column                                                 Non-Null Count  Dtype         
---  ------                                                 --------------  -----         
 0   Session primary channel group (Default channel group)  3182 non-null   object        
 1   Date + hour (YYYYMMDDHH)                               3182 non-null   datetime64[ns]
 2   Users                                                  3182 non-null   object        
 3   Sessions                                               3182 non-null   object        
 4   Engaged sessions                                       3182 non-null   object        
 5   Average engagement time per session                    3182 non-null   object        
 6   Engaged sessions per user                              3182 non-null   object        
 7   Events per session                                     3182 non-null 

In [10]:
# converting the metric columns (everything after the date column) to numeric
for col in data.columns[2:]:
    data[col] = pd.to_numeric(data[col])
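`pd.to_numeric` raises an error by default if a value cannot be parsed. The conversion above succeeds, so this file appears to be clean. For a messier export, a sketch like the following could be used. The column contents here are hypothetical and `n_bad` is an illustrative name.

```python
import pandas as pd

# Hypothetical column containing one malformed entry
col = pd.Series(["237", "208", "n/a"])

# errors="coerce" turns unparseable entries into NaN instead of raising
converted = pd.to_numeric(col, errors="coerce")

# Count how many entries failed to parse
n_bad = int(converted.isna().sum())
print(n_bad)  # 1
```

Counting the coerced values makes it possible to inspect bad rows before deciding whether to drop or repair them.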

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 10 columns):
 #   Column                                                 Non-Null Count  Dtype         
---  ------                                                 --------------  -----         
 0   Session primary channel group (Default channel group)  3182 non-null   object        
 1   Date + hour (YYYYMMDDHH)                               3182 non-null   datetime64[ns]
 2   Users                                                  3182 non-null   int64         
 3   Sessions                                               3182 non-null   int64         
 4   Engaged sessions                                       3182 non-null   int64         
 5   Average engagement time per session                    3182 non-null   float64       
 6   Engaged sessions per user                              3182 non-null   float64       
 7   Events per session                                     3182 non-null 

In [12]:
data.describe()

Unnamed: 0,Date + hour (YYYYMMDDHH),Users,Sessions,Engaged sessions,Average engagement time per session,Engaged sessions per user,Events per session,Engagement rate,Event count
count,3182,3182.0,3182.0,3182.0,3182.0,3182.0,3182.0,3182.0,3182.0
mean,2024-04-20 01:17:07.278441216,41.935889,51.192646,28.325581,66.644581,0.60645,4.675969,0.503396,242.27247
min,2024-04-06 00:00:00,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0
25%,2024-04-13 02:15:00,20.0,24.0,13.0,32.103034,0.561404,3.75,0.442902,103.0
50%,2024-04-20 02:00:00,42.0,51.0,27.0,49.020202,0.666667,4.410256,0.545455,226.0
75%,2024-04-26 22:00:00,60.0,71.0,41.0,71.487069,0.75,5.21769,0.633333,339.0
max,2024-05-03 23:00:00,237.0,300.0,144.0,4525.0,2.0,56.0,1.0,1402.0
std,,29.582258,36.919962,20.650569,127.200659,0.264023,2.795228,0.228206,184.440313
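The summary shows a maximum `Average engagement time per session` of 4525 against a 75th percentile of about 71.5, which suggests a few extreme hours. One common rule of thumb, which the notebook itself does not apply, is the 1.5 × IQR fence. The sketch below plugs in the quartiles printed above, and the variable names are illustrative.

```python
# Quartiles of 'Average engagement time per session' copied from the describe() output
q1 = 32.103034
q3 = 71.487069
observed_max = 4525.0

# Interquartile range and the conventional upper fence at Q3 + 1.5 * IQR
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# True if the maximum lies beyond the fence
is_extreme = observed_max > upper_fence
print(round(upper_fence, 2), is_extreme)
```

Whether such rows are errors or genuine low-traffic hours with one long session cannot be told from the summary alone.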


In [13]:
data.columns

Index(['Session primary channel group (Default channel group)',
       'Date + hour (YYYYMMDDHH)', 'Users', 'Sessions', 'Engaged sessions',
       'Average engagement time per session', 'Engaged sessions per user',
       'Events per session', 'Engagement rate', 'Event count'],
      dtype='object', name=0)
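The `name=0` in the output is left over from promoting row 0 to the header, because the column index inherits that row's label. It is harmless, but it shows up in printed frames. A minimal sketch of clearing it, using a small hypothetical frame `df` rather than the real data:

```python
import pandas as pd

# Hypothetical frame whose first row holds the real header, as in the raw file
df = pd.DataFrame([["Users", "Sessions"], ["237", "300"]])

# Same header promotion as in the notebook
df.columns = df.iloc[0]
df = df.drop(0).reset_index(drop=True)

# The column index carries the promoted row's label as its name
name_before = df.columns.name

# Clearing the name removes the stray label from later output
df.columns.name = None
print(df.columns)
```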