## Main Question
How can web traffic and user engagement be analyzed and optimized?

### Data Understanding
The given dataset contains the following columns:

1. Session primary channel group: The marketing channel (e.g., Direct, Organic Social)
2. Date + hour (YYYYMMDDHH): The specific date and hour of the session
3. Users: Number of users in a given time period
4. Sessions: Number of sessions in that period
5. Engaged sessions: Number of sessions with significant user engagement
6. Average engagement time per session: The average time a user is engaged per session
7. Engaged sessions per user: Number of engaged sessions divided by the number of users
8. Events per session: Average number of events (actions taken) per session
9. Engagement rate: The proportion of sessions that were engaged
10. Event count: Total number of events during the period
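
The ratio columns appear to be derived from the count columns. The sketch below checks this against the first row of the `dataset.head()` output in this notebook (Direct, 2024041623). The formulas are inferred from that single row and are an assumption, not a documented definition.

```python
# Values copied from the first row of the dataset.head() output (Direct, 2024041623)
users = 237
sessions = 300
engaged_sessions = 144
event_count = 1402

# Assumed relationships between the count columns and the ratio columns
engagement_rate = engaged_sessions / sessions          # reported as 0.48
engaged_sessions_per_user = engaged_sessions / users   # reported as 0.607595
events_per_session = event_count / sessions            # reported as 4.673333

print(round(engagement_rate, 6))
print(round(engaged_sessions_per_user, 6))
print(round(events_per_session, 6))
```

All three computed values match the reported ones to six decimal places for this row.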

The primary objective is to analyze and optimize web traffic and user engagement, focusing on:

1. **Session Analysis**: Understanding the temporal distribution and trends in web sessions and user visits to identify peak times and low-traffic periods.
2. **User Engagement Analysis**: Evaluating how engaged users are during their sessions across different channels, aiming to enhance user interaction and satisfaction.
3. **Channel Performance**: Assessing the effectiveness of various traffic channels in attracting and retaining users, to optimize marketing spend and strategy.
4. **Website Traffic Forecasting**: Predicting future traffic trends to better allocate resources and tailor content delivery according to predicted user demand.
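
As a rough illustration of the channel performance objective, the sketch below aggregates the five rows shown in the `dataset.head()` output by channel. The shortened column names and the `channel_summary` variable are assumptions made for readability; the notebook itself may approach this differently.

```python
import pandas as pd

# Five rows copied from the dataset.head() output; column names shortened (assumption)
sample = pd.DataFrame({
    "channel": ["Direct", "Organic Social", "Direct", "Organic Social", "Organic Social"],
    "users": [237, 208, 188, 187, 175],
    "sessions": [300, 267, 233, 256, 221],
    "engaged_sessions": [144, 132, 115, 125, 112],
})

# Total the counts per channel, then recompute the engagement rate from the totals
channel_summary = sample.groupby("channel")[["users", "sessions", "engaged_sessions"]].sum()
channel_summary["engagement_rate"] = (
    channel_summary["engaged_sessions"] / channel_summary["sessions"]
)

print(channel_summary)
```

Summing `users` across hours can count the same person more than once, so the per-channel user totals are best read as an upper bound.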

In [1]:
import pandas as pd

In [4]:
dataset = pd.read_csv("Datasets/data-export.csv")

In [5]:
dataset.head()

| | Session primary channel group (Default channel group) | Date + hour (YYYYMMDDHH) | Users | Sessions | Engaged sessions | Average engagement time per session | Engaged sessions per user | Events per session | Engagement rate | Event count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Direct | 2024041623 | 237 | 300 | 144 | 47.526667 | 0.607595 | 4.673333 | 0.48 | 1402 |
| 1 | Organic Social | 2024041719 | 208 | 267 | 132 | 32.097378 | 0.634615 | 4.29588 | 0.494382 | 1147 |
| 2 | Direct | 2024041723 | 188 | 233 | 115 | 39.939914 | 0.611702 | 4.587983 | 0.493562 | 1069 |
| 3 | Organic Social | 2024041718 | 187 | 256 | 125 | 32.160156 | 0.668449 | 4.078125 | 0.488281 | 1044 |
| 4 | Organic Social | 2024041720 | 175 | 221 | 112 | 46.918552 | 0.64 | 4.529412 | 0.506787 | 1001 |


In [6]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 10 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   Session primary channel group (Default channel group)  3182 non-null   object 
 1   Date + hour (YYYYMMDDHH)                               3182 non-null   int64  
 2   Users                                                  3182 non-null   int64  
 3   Sessions                                               3182 non-null   int64  
 4   Engaged sessions                                       3182 non-null   int64  
 5   Average engagement time per session                    3182 non-null   float64
 6   Engaged sessions per user                              3182 non-null   float64
 7   Events per session                                     3182 non-null   float64
 8   Engagement rate                                        3182 non-null   float64
 9   Event count                                            3182 non-null   int64  

In [8]:
dataset.describe()

| | Date + hour (YYYYMMDDHH) | Users | Sessions | Engaged sessions | Average engagement time per session | Engaged sessions per user | Events per session | Engagement rate | Event count |
|---|---|---|---|---|---|---|---|---|---|
| count | 3182.0 | 3182.0 | 3182.0 | 3182.0 | 3182.0 | 3182.0 | 3182.0 | 3182.0 | 3182.0 |
| mean | 2024043000.0 | 41.935889 | 51.192646 | 28.325581 | 66.644581 | 0.60645 | 4.675969 | 0.503396 | 242.27247 |
| std | 2695.099 | 29.582258 | 36.919962 | 20.650569 | 127.200659 | 0.264023 | 2.795228 | 0.228206 | 184.440313 |
| min | 2024041000.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 25% | 2024041000.0 | 20.0 | 24.0 | 13.0 | 32.103034 | 0.561404 | 3.75 | 0.442902 | 103.0 |
| 50% | 2024042000.0 | 42.0 | 51.0 | 27.0 | 49.020202 | 0.666667 | 4.410256 | 0.545455 | 226.0 |
| 75% | 2024043000.0 | 60.0 | 71.0 | 41.0 | 71.487069 | 0.75 | 5.21769 | 0.633333 | 339.0 |
| max | 2024050000.0 | 237.0 | 300.0 | 144.0 | 4525.0 | 2.0 | 56.0 | 1.0 | 1402.0 |


#### Data Preparation
This step prepares and summarizes the dataset for time series analysis, focusing on how user engagement, measured through sessions, varies over time.  
Converting the columns to appropriate types and grouping the rows by timestamp makes it straightforward to plot time series, compute moving averages, or apply time series forecasting models.

In [9]:
# Converting the date column into an appropriate datetime format!
dataset['Date + hour (YYYYMMDDHH)'] = pd.to_datetime(dataset['Date + hour (YYYYMMDDHH)'], format='%Y%m%d%H')
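
To make the format string concrete, here is a minimal sketch that parses two timestamps taken from the `dataset.head()` output. The values are passed as strings, and the names `raw` and `parsed` are illustrative only.

```python
import pandas as pd

# Two "Date + hour" values from the dataset.head() output, written as strings
raw = pd.Series(["2024041623", "2024041719"])

# %Y = 4-digit year, %m = month, %d = day, %H = hour on a 24-hour clock
parsed = pd.to_datetime(raw, format="%Y%m%d%H")

print(parsed.iloc[0])            # 2024-04-16 23:00:00
print(parsed.dt.hour.tolist())   # [23, 19]
```

After parsing, the `.dt` accessor exposes components such as the hour, which is useful for finding peak traffic times.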

In [13]:
# Ensure Users and Sessions are numeric (a no-op here, since dataset.info() already reports int64)
dataset['Users'] = pd.to_numeric(dataset['Users'])
dataset['Sessions'] = pd.to_numeric(dataset['Sessions'])

In [14]:
# Group rows by timestamp (date + hour) and sum users and sessions across channels
groupped_data = dataset.groupby(dataset['Date + hour (YYYYMMDDHH)']).agg({'Users':'sum', 'Sessions':'sum'})
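
The following sketch shows what this grouping does when several channels report in the same hour, and how a moving average could then be computed on the result. The rows are made-up illustrative values, not taken from the dataset, and the names `toy`, `hourly`, `Sessions_MA3`, and the window of 3 are assumptions.

```python
import pandas as pd

# Hypothetical rows: two channels reporting in each of three consecutive hours
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024041600", "2024041600", "2024041601",
         "2024041601", "2024041602", "2024041602"],
        format="%Y%m%d%H",
    ),
    "Users": [10, 5, 20, 10, 30, 15],
    "Sessions": [12, 6, 24, 12, 36, 18],
})

# One row per hour, with the channel rows summed together
hourly = toy.groupby("timestamp").agg({"Users": "sum", "Sessions": "sum"})

# 3-hour moving average; the first two hours are NaN because the window is incomplete
hourly["Sessions_MA3"] = hourly["Sessions"].rolling(window=3).mean()

print(hourly)
```

The hourly session totals here are 18, 36, and 54, so the moving average for the third hour is 36.0.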