# Toronto.ca Website Traffic and Behavior Analysis
*This is a personal portfolio project by Thamilini P.*

This project analyzes website traffic and user behavior on Toronto.ca, the City of Toronto's website, which markets the city's attractions and provides information and services for residents.

### A breakdown of the 5 Phases of this project
* **[Ask Phase](#section-one)** : 
    - Define the problem
* **[Prepare Phase](#section-two)** : 
    - Make sure the data is credible and unbiased 
    - Organize data 
    - Save datasets as R dataframes
* **[Process Phase](#section-three)** : 
    - Explore and clean data
    - Create and transform data
* **[Analyze Phase](#section-four)** :
    - Format and transform data
    - Identify patterns and draw conclusions
    - Make predictions and recommendations
    - Make data-driven decisions
* **Share Phase (Completed using Tableau)** :
    - Understand visualization
    - Create effective visuals
    - Bring data to life
    - Use data storytelling
    - Communicate to help others understand results



<a id="section-one"></a>
## Ask Phase

### Purpose/Objective

Analyze traffic and behavior of visitors to a website to improve users' experience.

<a id="section-two"></a>
## Prepare Phase

### Datasets License

All datasets used were made available by the [City of Toronto Open Data Portal](https://open.toronto.ca/about/) under the [Open Government Licence – Toronto](https://open.toronto.ca/open-data-license/).

### Load R Packages

In [None]:
library(tidyverse) # metapackage of all tidyverse packages
install.packages("skimr")
library(skimr)
library(janitor)
library(lubridate)
library(ggplot2)

### Import datasets as dataframe

In [None]:

Key_Metrics <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Key Metrics.csv")

Browser <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Browser.csv")

Cities <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Cities.csv")

Countries <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Countries.csv")

Hits_by_Hour_of_Day <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Hits by Hour of Day.csv")

Mobile_Devices <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Mobile Devices.csv")

New_Return_Visitors <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/New vs. Return Visitors.csv")

Referring_Site <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Referring Site.csv")

Top_Pages <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Top Pages.csv")

Visits_by_Weekday <- read_csv("/kaggle/input/webanalyticsmonthlyreport/202211/Visits by Day of Week.csv")

### Data Credibility/Bias

All datasets were downloaded from the City of Toronto Open Data Portal and contain web analytics for November 2022 (last refreshed Dec 4, 2022). The data is therefore original, cited, and current.

<a id="section-three"></a>
## Process Phase

#### Data Understanding

We have 10 different dataframes (i.e. data tables) that do not appear to be related to each other. We will explore each dataframe and check whether it has a column that could be used to merge it with another dataframe. For each dataframe we will use the following:
* get_dupes() checks for duplicate rows
* skim_without_charts() lists each column's name, data type, number of missing values, and summary statistics
* head() prints the first 6 rows of the dataframe

In [None]:
skim_without_charts(Key_Metrics)
get_dupes(Key_Metrics)
head(Key_Metrics)

Let's define some terms:

* Bounce Rate is the percentage of users who enter a site and leave without taking an action (like viewing other pages, clicking a link, filling in a form, etc.).
* Screen Views Per Session is the average number of screens/pages a user views per session.
* Sessions (New) is the number of sessions started by new visitors to the website.
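
To make these definitions concrete, here is a minimal sketch of how the three metrics relate to raw counts. The numbers are invented for illustration and the variable names are hypothetical; the real `Key_Metrics` table already provides these metrics, so nothing here is taken from the dataset.

```r
# Hypothetical one-day counts, invented purely for illustration
sessions             <- 1000  # all sessions that day
single_page_sessions <- 400   # sessions that ended without any further action
screen_views         <- 2500  # total screens/pages viewed
new_sessions         <- 700   # sessions started by first-time visitors

# Bounce rate: share of sessions with no further action, in percent
bounce_rate_pct <- single_page_sessions / sessions * 100

# Screen views per session: average screens/pages viewed in a session
views_per_session <- screen_views / sessions

# Share of sessions that came from new visitors, in percent
new_session_pct <- new_sessions / sessions * 100

bounce_rate_pct
views_per_session
new_session_pct
```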

Let's add a new column with the exact date.

#### Total Sessions

Calculate total sessions for November

In [None]:
total_sessions <- sum(Key_Metrics$Sessions[-1])

total_sessions
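
The `[-1]` index drops the first row of `Key_Metrics` before summing. The notebook does not state why, so the sketch below assumes the first row holds a monthly total rather than a daily value. It shows one way to verify that assumption before dropping the row, using a hypothetical vector in place of the real `Sessions` column.

```r
# Hypothetical Sessions column: a totals row followed by three daily rows
sessions <- c(600, 100, 200, 300)

# TRUE when the first value equals the sum of the remaining values
first_row_is_total <- isTRUE(all.equal(sessions[1], sum(sessions[-1])))

# Drop the first row only when it really is a total
daily_sessions <- if (first_row_is_total) sessions[-1] else sessions

total_sessions_checked <- sum(daily_sessions)
total_sessions_checked
```

If the check returns `FALSE` on the real data, the first row is an ordinary observation and should be kept in the sum.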

#### Total Views

In [None]:
total_views <- sum(Key_Metrics$Views[-1])

total_views

#### Average Session Duration

Let's determine the average session duration for the entire month of November.

In [None]:
average_session_duration <- Key_Metrics %>% 
  summarise(avg_session_duration = mean(`Avg Session Duration (Sec)`))

# Convert the mean (in seconds) to a lubridate period
seconds_to_period(average_session_duration$avg_session_duration)

So the average session duration for November 2022 was about 5 Minutes 36 Seconds.
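
Note that this figure is a plain mean of the per-row averages, so a low-traffic day counts as much as a busy one. A session-weighted mean is an alternative that may better reflect the typical session. The sketch below uses invented values and hypothetical column names to show the difference; it is not the calculation used in this notebook.

```r
# Hypothetical daily rows (not the real November values)
daily <- data.frame(
  sessions         = c(1000, 3000),
  avg_duration_sec = c(400, 200)
)

# Unweighted: every day counts equally, regardless of traffic
unweighted_avg <- mean(daily$avg_duration_sec)

# Weighted: each day's average counts in proportion to its sessions
weighted_avg <- weighted.mean(daily$avg_duration_sec, w = daily$sessions)

unweighted_avg
weighted_avg
```

Here the busier day has shorter sessions, so the weighted mean (250 seconds) is lower than the unweighted mean (300 seconds).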

#### Average Bounce Rate

Let's calculate the average bounce rate.

In [None]:
avg_bounce_rate <- Key_Metrics %>% 
  summarise(avg_bounce_rate = mean(`Bounce Rate %`))

avg_bounce_rate

#### Average Screen Views per Session

Let's calculate the average screen views per session.

In [None]:
avg_screen_views <- Key_Metrics %>% 
  summarise(avg_screen_views = mean(`Screen Views per Session`))

avg_screen_views

In [None]:
skim_without_charts(Browser)
get_dupes(Browser)
head(Browser)

There are 18 unique browser names.

In [None]:
skim_without_charts(Cities)
get_dupes(Cities)
head(Cities)

In [None]:
skim_without_charts(Countries)
get_dupes(Countries)
head(Countries)

In [None]:
skim_without_charts(Hits_by_Hour_of_Day)
get_dupes(Hits_by_Hour_of_Day)
head(Hits_by_Hour_of_Day)

Let's create a column with the 12-hour time.

#### Why?
*12-hour time is easier for viewers to read and understand.*

In [None]:
Hits_by_Hour_of_Day <- Hits_by_Hour_of_Day %>% 
  arrange(`Hour of Day`) %>% 
  mutate(time = case_when(
    `Hour of Day` > 12 ~ paste(`Hour of Day`-12,"PM",sep = " "),
    `Hour of Day` == 12 ~ "12 PM",
    `Hour of Day` == 0 ~ "12 AM",
    `Hour of Day` < 12 & `Hour of Day` > 0  ~ paste(`Hour of Day`,"AM",sep = " ")) 
  )

head(Hits_by_Hour_of_Day)
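
The same conversion can also be written with modular arithmetic, which avoids listing each case separately. This is an alternative sketch in base R, not the method used above; `to_12_hour` is a hypothetical helper name.

```r
# Convert a 24-hour value (0-23) to a 12-hour label such as "1 PM"
to_12_hour <- function(hour) {
  hour12 <- (hour + 11) %% 12 + 1          # maps 0 -> 12, 12 -> 12, 13 -> 1
  suffix <- ifelse(hour < 12, "AM", "PM")  # hours 0-11 are AM, 12-23 are PM
  paste(hour12, suffix)
}

to_12_hour(c(0, 9, 12, 16, 23))
# "12 AM" "9 AM" "12 PM" "4 PM" "11 PM"
```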

In [None]:
skim_without_charts(New_Return_Visitors)
get_dupes(New_Return_Visitors)
head(New_Return_Visitors)

In [None]:
skim_without_charts(Referring_Site)
get_dupes(Referring_Site)
head(Referring_Site)

In [None]:
skim_without_charts(Top_Pages)
get_dupes(Top_Pages)
head(Top_Pages)

Let's create a column with the average view time expressed as a period (minutes and seconds).

In [None]:
Top_Pages <- Top_Pages %>% 
  mutate(avg_view_time_mins = seconds_to_period(`Avg View Time (Sec)`))

head(Top_Pages)
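
`seconds_to_period()` returns a lubridate Period, so the new column holds minutes and seconds rather than decimal minutes. If a plain text label or a decimal-minute value is preferred, base R arithmetic is enough. A minimal sketch, where `format_min_sec` is a hypothetical helper and the inputs are illustrative values rather than rows from `Top_Pages`:

```r
# Format a duration in seconds as a "M min S sec" label
format_min_sec <- function(seconds) {
  seconds <- round(seconds)  # work in whole seconds
  sprintf("%d min %d sec",
          as.integer(seconds %/% 60),  # whole minutes
          as.integer(seconds %% 60))   # leftover seconds
}

format_min_sec(c(123, 59.6, 336))
# "2 min 3 sec" "1 min 0 sec" "5 min 36 sec"

# Decimal minutes, if a numeric column is needed instead
decimal_minutes <- 123 / 60
decimal_minutes
```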

In [None]:
skim_without_charts(Visits_by_Weekday)
get_dupes(Visits_by_Weekday)
head(Visits_by_Weekday)

Now that we've explored the structure of each dataframe, it's clear that no two dataframes can be merged together *(because there is no foreign key)*. We will derive insights from the tables separately.
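
One way to check for a possible key programmatically is to compare column names between two tables. The sketch below uses two small made-up tables with hypothetical column names (the real CSV headers may differ). It also shows why a shared name alone is not enough: a metric column such as `Sessions` can appear in both tables without identifying the same entities.

```r
# Two hypothetical tables standing in for the real CSVs
browser <- data.frame(`Browser Name` = c("Chrome", "Safari"),
                      Sessions = c(10, 5), check.names = FALSE)
cities  <- data.frame(City = c("Toronto", "Ottawa"),
                      Sessions = c(12, 3), check.names = FALSE)

# Columns present in both tables
shared_columns <- intersect(names(browser), names(cities))

# Keep only shared columns that are non-numeric in both tables,
# since a shared metric such as Sessions is not a usable key
is_label <- function(df, col) !is.numeric(df[[col]])
candidate_keys <- Filter(
  function(col) is_label(browser, col) && is_label(cities, col),
  shared_columns
)

shared_columns   # "Sessions"
candidate_keys   # character(0), so there is nothing to join on
```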

<a id="section-four"></a>
## Analyze Phase



Let's determine when website traffic (by sessions) peaks by plotting Sessions against Hour of Day from Hits_by_Hour_of_Day.

In [None]:
Hits_by_Hour_of_Day <- slice(Hits_by_Hour_of_Day,1:24)
ggplot(Hits_by_Hour_of_Day,aes(`Hour of Day`,Sessions))+geom_bar(stat='identity')

We can see that a large share of sessions occurred between 9 AM and 4 PM (hour 16). Let's calculate the number of sessions in that window and the percentage of total sessions it represents.

#### Hourly and Weekday Activity

In [None]:
sessions_between_9AM_4PM <- Hits_by_Hour_of_Day %>%
  filter(`Hour of Day`>= 9 & `Hour of Day` <= 16) %>% 
  summarise(sessions = sum(Sessions))

sessions_between_9AM_4PM

(sessions_between_9AM_4PM/total_sessions)*100

This means that 54.7% of total sessions occurred between 9 AM and 4 PM.

Let's take a look at sessions by weekday.

In [None]:
Visits_by_Weekday <- slice(Visits_by_Weekday,2:n())
ggplot(Visits_by_Weekday, aes(`Day of Week`,Sessions)) + geom_bar(stat='identity')

We can see that the majority of sessions occurred on weekdays rather than on the weekend. Let's determine exactly how much.

In [None]:
# Determine the number and percentage of sessions on weekdays and on weekends

weekday_sessions <- Visits_by_Weekday %>% 
  filter(`Day of Week` != "Saturday" & `Day of Week` != "Sunday") %>% 
  summarise(weekday_sessions = sum(Sessions))

weekday_sessions_pct <- (weekday_sessions/total_sessions)*100

weekend_sessions <- Visits_by_Weekday %>% 
  filter(`Day of Week` == "Saturday" | `Day of Week` == "Sunday") %>% 
  summarise(weekend_sessions = sum(Sessions))

weekend_sessions_pct <- (weekend_sessions/total_sessions)*100 

weekday_sessions 
weekday_sessions_pct
weekend_sessions
weekend_sessions_pct

Therefore, 82.5% of sessions occurred on a weekday and 17.5% occurred on a weekend.
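
The weekday/weekend split can also be computed in a single pass by labeling each day and summing per label. The sketch below uses invented session counts and hypothetical column names, so its percentages are not the real November figures. It also fixes the day order with a factor, since a character column would otherwise be plotted alphabetically.

```r
# Hypothetical session counts per day (not the real November values)
visits <- data.frame(
  day      = c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday"),
  sessions = c(150, 160, 170, 160, 160, 100, 100)
)

# Fix the display order so plots run Monday to Sunday, not alphabetically
visits$day <- factor(visits$day, levels = visits$day)

# Label each day, then total the sessions per label
visits$day_type <- ifelse(visits$day %in% c("Saturday", "Sunday"),
                          "Weekend", "Weekday")
sessions_by_type <- tapply(visits$sessions, visits$day_type, sum)

# Share of all sessions, in percent
share_pct <- round(100 * sessions_by_type / sum(sessions_by_type), 1)
share_pct
```

With these made-up counts, 80% of sessions fall on weekdays and 20% on the weekend.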

#### % of New vs. Returning Users

Let's calculate the percentage of new users and returning users.

In [None]:
#Remove the first row
New_Return_Visitors <- slice(New_Return_Visitors, 2:n())

total_users <- New_Return_Visitors %>% 
  summarise(total_users = sum(Users))

total_users

New_Return_Visitors <- New_Return_Visitors %>% 
  mutate(Users_pct = (Users/sum(Users))*100)

head(New_Return_Visitors)

ggplot(New_Return_Visitors, aes(x="", y=Users_pct, fill=`New-Returning User`)) + geom_bar(stat="identity", width=1) + coord_polar("y", start=0)

So 81.9% of total users are new users, while only 18.1% of total users are returning users.

#### Top 5 Browsers and Mobile Devices

In [None]:
head(Browser)

head(Mobile_Devices)

These tables are useful for confirming that the website is accessible through the listed browsers and devices.
Let's calculate the percentage of sessions that involved a mobile device.

In [None]:
mobile_devices_total_session <- Mobile_Devices %>% 
  summarise(mobile_devices_total_session = sum(Sessions[-1]))

mobile_devices_pct <- (mobile_devices_total_session/total_sessions)*100

mobile_devices_total_session 

mobile_devices_pct

Only 6.4% of total sessions involved a mobile device.

#### Most Popular Page

In [None]:
head(Top_Pages)

The toronto.ca Homepage was the most popular page (by session) and was viewed for an average of 2 Minutes and 3 Seconds.

#### Recommendations

The purpose of a city website is to provide information and services for residents. 

City websites are meant to market the city's attractions and keep residents informed through official announcements and alerts. Therefore, it is important that users visit the website regularly.

* **Insight**: Only 18.1% of total users are returning users, and the toronto.ca homepage is the most popular page.
  + **Recommendations**: Retain users by doing the following:
    - Collect users' email addresses and create a newsletter
    - Use push notifications in the browser or a mobile app
    - Encourage users to follow the city on social media

These steps can include links that bring users back to the site.
Since the toronto.ca homepage is the most popular page, it should link to email subscriptions and push notifications. The site could also embed social media feeds directly on the page.

* **Insight**: People are more likely to visit the site between 9 AM and 4 PM on a weekday (i.e. during working hours).
  + **Recommendations**:
    - Post social media content and send email campaigns during these times
    - Plan when to make changes to the website
    - Schedule site maintenance outside of these hours
    
* **Insight**: Only 6.4% of total sessions involved a mobile device. One reason for this could be that the website is not mobile friendly. We could survey users to ask which device they prefer to use on the website and why.
  + **Recommendations**: As of November 2022, about half of worldwide web traffic came from mobile devices, so it is important that the site is mobile friendly.
    - Make the website responsive to different mobile screen sizes
    - Optimize online forms for mobile
    - Improve mobile navigation
    - Optimize mobile page speed
    
* **Insight**: The average session duration is 5 minutes 36 seconds.
  + **Recommendations**: An AI chatbot or live chat can help increase session duration and help users get answers quickly.
  
  
#### Appendix
* [https://drudesk.com/blog/build-municipal-website](https://drudesk.com/blog/build-municipal-website)
* [https://www.monsterinsights.com/proven-ways-to-increase-your-returning-visitor-rate/](https://www.monsterinsights.com/proven-ways-to-increase-your-returning-visitor-rate/)
* [https://explodingtopics.com/blog/mobile-internet-traffic](https://explodingtopics.com/blog/mobile-internet-traffic)
* [https://www.vertical-leap.uk/blog/9-reasons-website-doesnt-work-mobile/](https://www.vertical-leap.uk/blog/9-reasons-website-doesnt-work-mobile/)