# BIG DATA & AI Bootcamp

### Machine Learning Track (2)
### Capstone Project
### FinTech startup data

### 2nd Notebook: EDA Notebook

#### Team Name: Desert Ninjas
#### Team Members:
1. Reema Alaswad
2. Maha  Alhazzani
3. Aljohara Alkanhal
4. Raghad Aleisa
5. Eman Aldosari

***

#### Project Objective:
###### Predict, from customer behavior and activity logs, whether a customer will invest in the listed company, will not invest, or will drop out of the investment at the last minute.

#### Dataset Description:

This dataset provides customer activity logs: the type of trigger fired on the website (e.g. Pageleave); the operating system, browser, and browser version; the device type; the screen height and width; the viewport height and width of the device; the pathname and current URL being browsed; the timestamp for each URL; the host, which indicates whether the visitor is a user or a system tester; and the source that referred them to the company’s website.
The dataset consists of 13 features and 95818 records, collected over one week only, from September 8th to September 15th, 2022.

#### Dataset Columns:
1. Type: The type of trigger that gets fired when the user performs an action with the cursor. [Autocapture, Pageview, Pageleave]
2. OS: Type of operating system. [IOS, Windows, Android, Mac OS X, Linux, Chrome OS]
3. Host: Company name. [company.sa, prelive-company.manafatech.com, devlocal.company.sa]
4. Pathname: The parent path of the current URL.
5. Current_URL: The current URL the user is in.
6. Referrer: The previous URL that led the user to the current one.
7. Referring Domain: The parent path of the previous URL that led the user to the current one.
8. Browser: Browser type. [Chrome, Mobile Safari, Microsoft Edge, Chrome iOS, Safari, Samsung Internet, Firefox, Android Mobile]
9. Browser Version: The version of the browser the user is using.
10. Screen Height: The screen height of the device the user is using.
11. Screen Width: The screen width of the device the user is using.
12. Viewport Height: The viewport height of the device the user is using.
13. Viewport Width: The viewport width of the device the user is using.
14. Time: The page open time (Timestamp).
15. Event type: What action the user did on a specific page. [Click, Change, Submit]
16. Device Type: The device type the user is using while browsing the website. [Mobile, Desktop, Tablet]
17. Session ID: A unique ID for each user’s visit.
18. Window ID: A unique ID for each user.



#### 🔹 2nd Notebook: EDA Notebook🔹
1. Importing Packages
2. Dataset uploading (Full Pre-processed Dataset/ Pre-processed Dataset "unique visitations")
3. EDA - Exploratory Data Analysis
   

### 1. Importing Packages

In [2]:
# Importing all necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
import datetime
from pandas_profiling import ProfileReport

''' Plots '''
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
from plotly.offline import plot

In [3]:
#from google.colab import drive
#drive.mount('/content/drive')

### 2. Datasets uploading

In [42]:
# Loading the first, fully preprocessed dataset
# df = dataframe
df = pd.read_csv("Preprocessed_full_dataset.csv")

In [7]:
# Loading the second dataset, which represents unique user visits grouped by session ID
# df_visits = visits dataframe
df_visits = pd.read_csv('Final_extracted_dataset.csv')

In [8]:
# 1- Subsetting the df_visits dataframe to investment visitations only
df_visits_investor = df_visits.loc[(df_visits.invest == "Yes")]

# 2- Subsetting the df_visits dataframe to visitations that dropped out of an investment
df_visits_Potential = df_visits.loc[(df_visits.invest == "Maybe")]

# 3- Subsetting the df_visits dataframe to non-investment visitations
df_visits_No = df_visits.loc[(df_visits.invest == "No")]

#### 🔽 2.1 Basic Statistics of dataframes:

In [43]:
# View the first dataframe
pd.set_option('display.max_columns', None) # to take a glance at all df columns
df.head()

Unnamed: 0.1,Unnamed: 0,Type,session_id,window_id,browser,device_type,os,host,current_url,referrer,referring_domain,pathname,path1,path2,path3,path4,path5,number_of_pages,event_type,date,year,month,day,month_name,day_name,week_label,time,day_parts,browser_version,screen_height,screen_width,viewport_height,viewport_width
0,0,autocapture,1831e4150303a0-04a83b401e337e-26021c51-144000-...,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/dashboard,direct,direct,/investor/dashboard,investor,dashboard,,,,2,click,08/09/2022,2022,9,8,September,Thursday,Weekend,17:59:02,Evening,105.0,864,1536,714.0,1536.0
1,1,autocapture,1831e4150303a0-04a83b401e337e-26021c51-144000-...,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/investment-portfolio,https//company.sa/investor/investment-portfolio,company.sa,/investor/investment-portfolio,investor,investment-portfolio,,,,2,click,08/09/2022,2022,9,8,September,Thursday,Weekend,18:00:28,Evening,105.0,864,1536,714.0,1536.0
2,2,autocapture,1831e4150303a0-04a83b401e337e-26021c51-144000-...,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/investment-portfolio,https//company.sa/investor/investment-portfolio,company.sa,/investor/investment-portfolio,investor,investment-portfolio,,,,2,click,08/09/2022,2022,9,8,September,Thursday,Weekend,18:06:05,Evening,105.0,864,1536,714.0,1536.0
3,3,autocapture,1831e4150303a0-04a83b401e337e-26021c51-144000-...,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/investment-portfolio,https//company.sa/investor/investment-portfolio,company.sa,/investor/investment-portfolio,investor,investment-portfolio,,,,2,click,08/09/2022,2022,9,8,September,Thursday,Weekend,18:05:46,Evening,105.0,864,1536,714.0,1536.0
4,4,autocapture,1831e4150303a0-04a83b401e337e-26021c51-144000-...,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/investment-portfolio,https//company.sa/investor/investment-portfolio,company.sa,/investor/investment-portfolio,investor,investment-portfolio,,,,2,click,08/09/2022,2022,9,8,September,Thursday,Weekend,18:01:07,Evening,105.0,864,1536,714.0,1536.0


In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95818 entries, 0 to 95817
Data columns (total 33 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        95818 non-null  int64  
 1   Type              95818 non-null  object 
 2   session_id        95818 non-null  object 
 3   window_id         95818 non-null  object 
 4   browser           95818 non-null  object 
 5   device_type       95818 non-null  object 
 6   os                95818 non-null  object 
 7   host              95818 non-null  object 
 8   current_url       95818 non-null  object 
 9   referrer          95818 non-null  object 
 10  referring_domain  95818 non-null  object 
 11  pathname          95818 non-null  object 
 12  path1             95818 non-null  object 
 13  path2             95818 non-null  object 
 14  path3             38929 non-null  object 
 15  path4             24716 non-null  object 
 16  path5             2517 non-null   object

In [45]:
df.describe()

Unnamed: 0.1,Unnamed: 0,number_of_pages,year,month,day,browser_version,screen_height,screen_width,viewport_height,viewport_width
count,95818.0,95818.0,95818.0,95818.0,95818.0,95818.0,95818.0,95818.0,95818.0,95818.0
mean,47908.5,2.690497,2022.0,9.0,13.000417,63.145437,889.078962,853.406364,746.811075,849.292873
std,27660.418384,0.913613,0.0,0.0,0.740582,43.743653,126.07445,638.672932,144.454272,622.996206
min,0.0,2.0,2022.0,9.0,8.0,12.0,378.0,320.0,173.0,116.0
25%,23954.25,2.0,2022.0,9.0,13.0,15.6,812.0,390.0,661.0,390.0
50%,47908.5,2.0,2022.0,9.0,13.0,100.0,864.0,414.0,722.0,414.0
75%,71862.75,4.0,2022.0,9.0,13.0,105.0,926.0,1368.0,793.0,1368.0
max,95817.0,5.0,2022.0,9.0,15.0,106.0,1440.0,3440.0,2652.0,3440.0


In [46]:
df.describe(include= object)

Unnamed: 0,Type,session_id,window_id,browser,device_type,os,host,current_url,referrer,referring_domain,pathname,path1,path2,path3,path4,path5,event_type,date,month_name,day_name,week_label,time,day_parts
count,95818,95818,95818,95818,95818,95818,95818,95818,95818,95818,95818,95818,95818,38929,24716,2517,95818,95818,95818,95818,95818,95818,95818
unique,3,2949,3499,8,3,6,3,976,563,5,887,5,24,133,732,11,4,8,1,7,2,34808,3
top,autocapture,1833ad7c7a5471-041a2e00eda85f-26021c51-144000-...,1833ad7c7a7f33-0370f7fe91e184-26021c51-144000-...,Chrome,Mobile,iOS,company.sa,https//company.sa/investor/investment-portfolio,https//company.sa/investor/investment-portfolio,company.sa,/investor/investment-portfolio,investor,investment-portfolio,Vn18,form-step1,RsSnrZKP0kY,click,13/09/2022,September,Tuesday,Weekend,10:00:17,Morning
freq,60404,576,576,40285,59881,42498,95651,21921,18885,81913,25770,86384,25770,14081,22252,2338,49033,50417,95818,50417,76060,134,63137


In [13]:
# View the second dataframe
pd.set_option('display.max_columns', None) # to take a glance at all df columns
df_visits.head()

Unnamed: 0.1,Unnamed: 0,session_id,Type,window_id,browser,device_type,os,host,current_url,referrer,referring_domain,pathname,path1,path2,path3,path4,path5,number_of_pages,event_type,date,year,month,day,month_name,day_name,week_label,time,day_parts,browser_version,screen_height,screen_width,viewport_height,viewport_width,min_time,max_time,duration_hours,duration_minutes,duration_seconds,total_pages,invest
0,0.0,1831e4150303a0-04a83b401e337e-26021c51-144000-...,pageview,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/support/investor/list,https//company.sa/investor/investment-portfolio,direct,/support/investor/list,support,opportunity,list,,,3.0,no action,08/09/2022,2022.0,9.0,8.0,September,Thursday,Weekend,18:21:10,Evening,105.0,864.0,1536.0,714.0,1536.0,17:59:00,18:21:10,0.37,22.17,1330.0,28.0,No
1,1.0,1831fa67f29acf-03acc8ab0c94e8-26021c51-144000-...,pageview,1831fa67f2b1161-0c60cc1a8e9b28-26021c51-144000...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/opportunity/Vn16,direct,direct,/investor/opportunity/Vn16,investor,opportunity,Vn16,,,3.0,no action,09/09/2022,2022.0,9.0,9.0,September,Friday,Weekend,0:39:01,Night,105.0,864.0,1536.0,746.0,1536.0,0:29:08,0:39:01,0.16,9.88,593.0,2.0,No
2,2.0,18327baf015d6c-01519b84418c49-26021c51-144000-...,pageview,18327e6980b4a0-0c68f89b440b64-26021c51-144000-...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/transactions,https//company.sa/investor/transactions,company.sa,/investor/transactions,investor,transactions,VYZ_,,,3.0,no action,10/09/2022,2022.0,9.0,10.0,September,Saturday,Weekend,14:56:09,Evening,105.0,864.0,1536.0,714.0,1536.0,14:08:26,14:56:09,0.8,47.72,2863.0,25.0,No
3,3.0,1832bff5050393-0f084138f6836c-26021c51-1fa400-...,pageview,1832bff50533f-0a979546b04846-26021c51-1fa400-1...,Chrome,Desktop,Windows,company.sa,https//company.sa/investor/transactions,https//company.sa/investor/transactions,direct,/investor/transactions,investor,transactions,Vn18,,,3.0,no action,11/09/2022,2022.0,9.0,11.0,September,Sunday,Weekday,10:43:55,During work,105.0,1080.0,1920.0,961.0,1920.0,10:01:36,10:43:55,0.71,42.32,2539.0,24.0,No
4,4.0,1832ce7991833a-0fd268fb672d8b-1b525635-fa000-1...,pageview,1832ce7991b1dc-0b3f97fbda707a-1b525635-fa000-1...,Chrome,Desktop,Mac OS X,company.sa,https//company.sa/investor/dashboard,direct,direct,/investor/dashboard,investor,dashboard,,,,2.0,no action,11/09/2022,2022.0,9.0,11.0,September,Sunday,Weekday,14:15:56,During work,104.0,800.0,1280.0,880.0,1168.0,14:15:19,14:15:56,0.01,0.62,37.0,2.0,No


In [14]:
df_visits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2950 entries, 0 to 2949
Data columns (total 40 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        2949 non-null   float64
 1   session_id        2949 non-null   object 
 2   Type              2949 non-null   object 
 3   window_id         2949 non-null   object 
 4   browser           2949 non-null   object 
 5   device_type       2949 non-null   object 
 6   os                2949 non-null   object 
 7   host              2949 non-null   object 
 8   current_url       2949 non-null   object 
 9   referrer          2949 non-null   object 
 10  referring_domain  2949 non-null   object 
 11  pathname          2949 non-null   object 
 12  path1             2949 non-null   object 
 13  path2             2949 non-null   object 
 14  path3             2949 non-null   object 
 15  path4             2949 non-null   object 
 16  path5             2949 non-null   object 


In [15]:
df_visits.describe()

Unnamed: 0.1,Unnamed: 0,number_of_pages,year,month,day,browser_version,screen_height,screen_width,viewport_height,viewport_width,duration_hours,duration_minutes,duration_seconds,total_pages
count,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0,2949.0
mean,1474.0,3.246863,2022.0,9.0,13.147508,55.475341,882.034249,711.350627,762.925599,712.461021,0.180929,10.858962,651.539844,32.491692
std,851.447297,1.094312,0.0,0.0,0.788923,43.477255,113.998378,564.687972,131.454175,557.859819,0.391241,23.474394,1408.465581,48.112335
min,0.0,2.0,2022.0,9.0,8.0,12.0,568.0,320.0,353.0,178.0,0.0,0.0,0.0,1.0
25%,737.0,2.0,2022.0,9.0,13.0,15.6,812.0,390.0,690.0,390.0,0.01,0.43,26.0,4.0
50%,1474.0,3.0,2022.0,9.0,13.0,16.0,864.0,414.0,746.0,414.0,0.05,2.75,165.0,14.0
75%,2211.0,4.0,2022.0,9.0,14.0,105.0,926.0,1005.0,819.0,980.0,0.2,12.12,727.0,43.0
max,2948.0,5.0,2022.0,9.0,15.0,106.0,1440.0,3440.0,2652.0,3440.0,7.69,461.65,27699.0,576.0


In [16]:
df_visits.describe(include= object)

Unnamed: 0,session_id,Type,window_id,browser,device_type,os,host,current_url,referrer,referring_domain,pathname,path1,path2,path3,path4,path5,event_type,date,month_name,day_name,week_label,time,day_parts,min_time,max_time,invest
count,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949.0,2949.0,2949.0,2949,2949,2949,2949,2949,2949,2949,2949,2949,2949
unique,2949,3,2949,8,3,6,3,672,423,4,639,5,21,82.0,54.0,12.0,3,8,1,7,2,2807,5,2809,2807,3
top,1831e4150303a0-04a83b401e337e-26021c51-144000-...,pageview,1831e4150327d1-0f02fd620ee793-26021c51-144000-...,Mobile Safari,Mobile,iOS,company.sa,https//company.sa/investor/opportunity/Vn18,https//company.sa/investor/investment-portfolio,direct,/investor/opportunity/Vn18,investor,opportunity,,,,no action,13/09/2022,September,Tuesday,Weekend,10:03:43,Morning,9:54:44,10:03:43,No
freq,1,2536,1,1463,2175,1630,2945,526,547,1851,531,2731,1088,882.0,1948.0,2340.0,2596,1442,2949,1442,2140,4,1471,4,4,2339


In [17]:
df_visits = df_visits.dropna()
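
The `info()` output above reports 2950 entries but 2949 non-null values in each listed column, which suggests a single empty row. A minimal sketch, on a toy frame standing in for `df_visits` (the values are made up for illustration), of how one might check what `dropna()` removes before calling it:

```python
import pandas as pd

# Toy stand-in for df_visits: the last row is entirely empty (hypothetical data).
toy_visits = pd.DataFrame({
    "session_id": ["a", "b", None],
    "total_pages": [3.0, 5.0, None],
})

# Rows where every column is missing, and rows where at least one column is missing.
fully_empty = toy_visits.isna().all(axis=1)
any_missing = toy_visits.isna().any(axis=1)
print("fully empty rows:", int(fully_empty.sum()))
print("rows with any NaN:", int(any_missing.sum()))

# dropna() with default arguments removes every row that has at least one NaN.
cleaned = toy_visits.dropna()
print("rows kept:", len(cleaned))
```

If the two counts differ on the real data, `dropna()` is discarding partially filled rows as well, which may or may not be intended.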

### 3. EDA 📊

####  3.1 Visitation distribution
##### Did the visitor make an investment, just browse, or drop out of an investment (potential investors)?

In [18]:
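# interactive visitations distribution pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order,
# whereas unique() follows order of first appearance and can mislabel slices
invest_counts = df_visits['invest'].value_counts()
fig = go.Figure(data=[go.Pie(labels=invest_counts.index, 
                             values=invest_counts.values, 
                             title_text="<b>Did the visitor make an investment?</b>")])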
fig.update_traces(marker=dict(colors=['darkblue', 'mediumturquoise', 'gold']))

fig.show()

✨ **Insights**:

*   The majority of visitors during the analysis period did not make an investment; they represent 79% ⛔
*   17% are visitors who made a successful investment ✅
*   The remaining 4% are visitors who tried to initiate an investment but dropped out (potential investors)
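
Percentages like these can also be computed directly rather than read off the chart. A minimal sketch on toy data (the column name `invest` matches the notebook; the counts are invented for illustration):

```python
import pandas as pd

# Toy stand-in for df_visits with a made-up 8/1/1 split.
toy_visits = pd.DataFrame({"invest": ["No"] * 8 + ["Yes"] * 1 + ["Maybe"] * 1})

# normalize=True returns fractions instead of raw counts, indexed by category.
shares = toy_visits["invest"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the real dataframe the same call would be `df_visits["invest"].value_counts(normalize=True)`.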






####  3.2 Sessions duration of visits (minutes) 


In [19]:
# interactive histogram using plotly 
fig = px.histogram(x=df_visits['duration_minutes'], nbins=50)
fig.update_layout(
    title="Distribution of sessions duration (minutes)",
    xaxis_title="Duration of website sessions in minutes",
    yaxis_title="Number of sessions made"
)

fig.show()

✨ **Insights**:

*   The majority of website visits last less than 5 minutes ⏲



We will check whether investors stay for a shorter or longer duration ❓

In [22]:
import plotly.express as px
fig = px.histogram(df_visits, x="invest",  nbins=50,  y="duration_minutes" ,histfunc='avg')
fig.show()

✨ **Insights**:

*  Investors have the highest average session duration, staying about 19 minutes on average, followed by potential investors at around 11 minutes and finally non-investors at around 9 minutes
* Did the potential investors return later for the investment opportunity?
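
The averages plotted above can be cross-checked numerically. A minimal sketch, assuming the `invest` and `duration_minutes` columns from the notebook and using invented toy values, that also reports the median, since session durations are typically skewed:

```python
import pandas as pd

# Toy stand-in for df_visits (values are illustrative only).
toy_visits = pd.DataFrame({
    "invest": ["Yes", "Yes", "Maybe", "No", "No", "No"],
    "duration_minutes": [20.0, 18.0, 11.0, 10.0, 9.0, 8.0],
})

# Mean, median and group size per investment outcome, longest sessions first.
summary = (
    toy_visits.groupby("invest")["duration_minutes"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```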

In [23]:
#interactive box plot 
fig = px.box(df_visits, x="invest", y="duration_minutes", title="<b>Distribution based on sessions duration and investment</b>")
fig.show()

####  3.3 Browsers visitors use when visiting the website


In [24]:
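# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
browser_counts = df_visits['browser'].value_counts()
fig = go.Figure(data=[go.Pie(labels=browser_counts.index, 
                             values=browser_counts.values, 
                             title_text="<b>Web browsers visitors are using</b>")])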

fig.show()

✨ **Insights**:

*  Mobile Safari is the most common browser, used in about half of the visits (1,463 of 2,949 in the `describe` output above)
* The second most common browser accounts for 34% of the visits
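
When a pie chart is built from a categorical column, the labels and the values have to come from the same ordering. A minimal sketch on toy data of why `unique()` (order of first appearance) should not be paired with `value_counts()` (sorted by frequency):

```python
import pandas as pd

# Toy browser column: Chrome appears first, but Mobile Safari is most frequent.
toy = pd.Series(["Chrome", "Mobile Safari", "Mobile Safari",
                 "Mobile Safari", "Chrome", "Firefox"])

labels_by_appearance = list(toy.unique())  # order of first appearance
counts = toy.value_counts()                # sorted by frequency, descending

# Pairing the two by position attaches the largest count to the wrong label.
mismatched = dict(zip(labels_by_appearance, counts.values))
# Taking labels from the index of value_counts() keeps them aligned.
aligned = dict(zip(counts.index, counts.values))

print("mismatched:", mismatched)
print("aligned:   ", aligned)
```

Passing `counts.index` as `labels` and `counts.values` as `values` to `go.Pie` avoids the mismatch.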

In [25]:
# Creating subplots
fig = make_subplots(
   rows=1, cols=3,
   specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],
   
)
#1- Investment visitations
fig.add_trace(go.Pie(
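   # index and values of value_counts() share one order, keeping labels aligned with slices
   labels=df_visits_investor['browser'].value_counts().index,
   values=df_visits_investor['browser'].value_counts().values,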
   title_text="<b>Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=1
)
#2- Potential investors visitations
fig.add_trace(go.Pie(
   labels=df_visits_Potential['browser'].value_counts().index,
   values=df_visits_Potential['browser'].value_counts().values,
   title_text="<b>Potential investors visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=2
)
#3- Non-investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_No['browser'].value_counts().index,
   values=df_visits_No['browser'].value_counts().values,
   title_text="<b>Non-investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=3
)

fig.update_layout(
    title="Web browsers visitors are using?",
)

fig.show() 

✨ **Insights**:

*  Around 47% of the investment visitations were using Mobile Safari, followed by 36% using the Chrome browser
*  More than half of the dropped-out visitations were using Chrome as their browser, followed by 32% using the Samsung Internet browser
*  Half of the non-investment visitors were using Chrome as their browser, followed by Mobile Safari users, accounting for 35% of users


Based on these insights what devices do you think visitors use?

####  3.4  Devices users are using when visiting the website

In [26]:
# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
device_counts = df_visits['device_type'].value_counts()
fig = go.Figure(data=[go.Pie(labels=device_counts.index, 
                             values=device_counts.values, 
                             title_text="<b>Devices visitors are using</b>")])

fig.show()

✨ **Insights**:

*  A high proportion of visitors browse the website on mobile, accounting for ~74% of visits (2,175 of 2,949 in the `describe` output above) 📱
* Followed by desktop users with 26% 🖥
* Tablets are the least used devices, accounting for only 7 visitations 📱


In [27]:
# Creating subplots
fig = make_subplots(
   rows=1, cols=3,
   specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],
   
)
#1- Investment visitations
fig.add_trace(go.Pie(
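   # index and values of value_counts() share one order, keeping labels aligned with slices
   labels=df_visits_investor['device_type'].value_counts().index,
   values=df_visits_investor['device_type'].value_counts().values,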
   title_text="<b>Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=1
)
#2- Dropped out from investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_Potential['device_type'].value_counts().index,
   values=df_visits_Potential['device_type'].value_counts().values,
   title_text="<b>Potential investors visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=2
)
#3- Non-investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_No['device_type'].value_counts().index,
   values=df_visits_No['device_type'].value_counts().values,
   title_text="<b>Non-investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=3
)

fig.update_layout(
    title="Devices visitors are using?",
)

fig.show() 

✨ **Insights**:

* The majority of investment and dropped-out visitations were made from mobiles 📱
* 75% of the non-investment visitations were made from desktops 🖥
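
Per-segment shares such as these can be tabulated in one step instead of being read from three separate pies. A minimal sketch using `pd.crosstab` on toy data (column names follow the notebook; the rows are invented):

```python
import pandas as pd

# Toy stand-in for df_visits (illustrative values only).
toy_visits = pd.DataFrame({
    "invest": ["Yes", "Yes", "No", "No", "No", "No"],
    "device_type": ["Mobile", "Mobile", "Desktop", "Mobile", "Mobile", "Mobile"],
})

# normalize="index" makes every row (investment outcome) sum to 100%.
device_share = pd.crosstab(
    toy_visits["invest"], toy_visits["device_type"], normalize="index"
) * 100
print(device_share.round(1))
```

The same table with `os` or `browser` in place of `device_type` would cover the other breakdowns in this notebook.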

####  3.5  The Operating system users are using when visiting the website

In [28]:
# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
os_counts = df_visits['os'].value_counts()
fig = go.Figure(data=[go.Pie(labels=os_counts.index, 
                             values=os_counts.values, 
                             title_text="<b>Operating system visitors are using</b>")])
fig.show()

✨ **Insights**:

*  iOS is the most common operating system, accounting for 55% of total visitors (1,630 of 2,949 in the `describe` output above)
*  The next two operating systems account for about 19% each

In [29]:
# Creating subplots
fig = make_subplots(
   rows=1, cols=3,
   specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],
   
)
#1- Investment visitations
fig.add_trace(go.Pie(
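   # index and values of value_counts() share one order, keeping labels aligned with slices
   labels=df_visits_investor['os'].value_counts().index,
   values=df_visits_investor['os'].value_counts().values,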
   title_text="<b>Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=1
)
#2- Dropped out from investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_Potential['os'].value_counts().index,
   values=df_visits_Potential['os'].value_counts().values,
   title_text="<b>Potential investors visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=2
)
#3- Non-investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_No['os'].value_counts().index,
   values=df_visits_No['os'].value_counts().values,
   title_text="<b>Non-investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=3
)

fig.update_layout(
    title="Operating system visitors are using?",
)

fig.show() 

✨ **Insights**:

* Around half of the investment visitations were using iOS (iPhones) as their operating system
* More than half of the potential investor visitations were using Android
* More than half of the non-investment visitations were using Windows

####  3.6  Type of window visit by users 

In [30]:
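# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
type_counts = df_visits['Type'].value_counts()
fig = go.Figure(data=[go.Pie(labels=type_counts.index, 
                             values=type_counts.values, 
                             title_text="<b>Type of window visit</b>")])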
fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.show()

✨ **Insights**:

*  The majority of user visitation actions within the website are categorized as pageview, covering 86%
*  The least common visitation action within the website was autocapture, at 4%


In [31]:
# Creating subplots
fig = make_subplots(
   rows=1, cols=3,
   specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],
   
)
#1- Investment visitations
fig.add_trace(go.Pie(
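   # index and values of value_counts() share one order, keeping labels aligned with slices
   labels=df_visits_investor['Type'].value_counts().index,
   values=df_visits_investor['Type'].value_counts().values,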
   title_text="<b>Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=1
)
#2- Dropped out from investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_Potential['Type'].value_counts().index,
   values=df_visits_Potential['Type'].value_counts().values,
   title_text="<b>Potential</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=2
)
#3- Non-investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_No['Type'].value_counts().index,
   values=df_visits_No['Type'].value_counts().values,
   title_text="<b>Non-investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=3
)

fig.update_layout(
    title="Type of window visit",
)

fig.show() 

✨ **Insights**:

* Almost all of the investment visitation actions within the website are categorized as pageview, with no pageleave action
* However, dropped-out and non-investment visitations had pageleave as an action

####  3.7  User visitation throughout the week

In [32]:
# Interactive histogram using plotly
fig = px.histogram(df_visits, x="day_name", category_orders={"day_name":["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]}, 
                   )
fig.update_layout(
    title="<b>User visitation throughout the week</b>",
                   xaxis_title="Day",
                    yaxis_title="Number of visits"
)
fig.update_layout(width=1000, height=500)
fig.show()


✨ **Insights**:

*  On Tuesday, 1442 visits happened on the website, followed by Wednesday and then Monday
* There weren’t any significant visits at the weekends 🍹
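
The daily counts behind the chart can be listed in calendar order, including days with no visits. A minimal sketch on toy data; `WEEK_ORDER` is a name introduced here for illustration, and starting the week on Sunday is an assumption taken from the plot's category order:

```python
import pandas as pd

# Assumed week order (Sunday first), covering all seven days exactly once.
WEEK_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday"]

# Toy stand-in for df_visits["day_name"].
toy_days = pd.Series(["Tuesday", "Monday", "Tuesday",
                      "Wednesday", "Tuesday", "Sunday"])

# reindex() puts the counts in calendar order and fills absent days with 0.
visits_per_day = toy_days.value_counts().reindex(WEEK_ORDER, fill_value=0)
print(visits_per_day)
```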

Do investment opportunities open mid-week or at weekends?

Are investment opportunity advertisements released mid-week or at weekends?

####  3.8  Root path distribution

In [None]:
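# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
path1_counts = df['path1'].value_counts()
fig = go.Figure(data=[go.Pie(labels=path1_counts.index, 
                             values=path1_counts.values, 
                             title_text="<b>Root path distribution</b>")])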
fig.show() 

✨ **Insights**:

* The majority of visits, 90%, were to the investor web path (i.e., users looking at their investment profiles, wallets and investment opportunities)
* Followed by the support web path (i.e., where issues are raised and previously raised issues are followed up)

####  3.9  Host path distribution

In [34]:
# Interactive histogram using plotly
fig = px.histogram(df_visits, x="host", barmode="group", title="<b>Host distribution </b>")
fig.update_layout(width=700, height=500)
fig.show()

In [33]:
df_visits["host"].value_counts() 

company.sa                     2945
prelive-company.comtech.com       3
devlocal.company.sa               1
Name: host, dtype: int64

✨ **Insights**:

*  Some testers entered the website; their visits need to be excluded ❌
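
One way to exclude tester traffic is to keep only rows from the production host. A minimal sketch on toy data, assuming that `company.sa` is the only host carrying real customer visits (`PRODUCTION_HOST` is a name introduced here for illustration):

```python
import pandas as pd

# Toy stand-in for df_visits with the three hosts seen in value_counts().
toy_visits = pd.DataFrame({
    "host": ["company.sa", "company.sa",
             "devlocal.company.sa", "prelive-company.comtech.com"],
    "invest": ["No", "Yes", "No", "No"],
})

# Assumption: every other host belongs to testers or staging environments.
PRODUCTION_HOST = "company.sa"
production_visits = toy_visits[toy_visits["host"] == PRODUCTION_HOST].reset_index(drop=True)
print(production_visits)
```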

####  3.10  Referring domain distribution

In [35]:
# Interactive histogram using plotly

fig = px.histogram(df_visits, x="referring_domain", barmode="group", title="<b>Referring domain distribution </b>")
fig.update_layout(width=700, height=500)
fig.show()

✨ **Insights**:

* The majority of referring (linking) domains were "direct", accounting for about 63% of visits (1,851 of 2,949 in the `describe` output above)
* Followed by the company domain

In [36]:

#1- Investment visitations
fig1 = px.histogram(df_visits_investor, x="referring_domain", barmode="group", title="<b>Investment visitations</b>")
fig1.update_layout(width=700, height=350)
fig1.show()

#2- Potential investors visitations
fig2 = px.histogram(df_visits_Potential, x="referring_domain", barmode="group", title="<b>Potential investors visitations</b>")
fig2.update_layout(width=700, height=350)
fig2.show()

#3- Non-investment visitations
fig3 = px.histogram(df_visits_No, x="referring_domain", barmode="group", title="<b>Non-investment visitations</b>")
fig3.update_layout(width=700, height=350)
fig3.show()

✨ **Insights**:

* Most investor visitations came from the direct path
* Potential investor visitations had a near-equal distribution between direct and the company URL

####  3.11  User visitation around the week and by the part of the day



In [37]:
fig = px.histogram(df_visits, x="day_name", color="day_parts", barmode="group",
                   category_orders={"day_name":["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]}, 
                   title="<b>User visitation by days of the week and by the part of the day</b>")
fig.update_layout(width=700, height=500)
fig.update_layout(
    title="<b>User visitation throughout the week</b>",
                   xaxis_title="Day",
                    yaxis_title="Number of visits"
)
fig.show()

✨ **Insights**:

*  Tuesday morning had the highest number of visits, at around 1000
*  Weekends had the lowest visitation; nearly no visits were recorded
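
The busiest day and part-of-day combination can be looked up directly instead of being read from the grouped bars. A minimal sketch on toy data, using the notebook's `day_name` and `day_parts` column names with invented rows:

```python
import pandas as pd

# Toy stand-in for df_visits (illustrative values only).
toy_visits = pd.DataFrame({
    "day_name": ["Tuesday", "Tuesday", "Tuesday", "Monday", "Wednesday"],
    "day_parts": ["Morning", "Morning", "Evening", "Morning", "Night"],
})

# Number of visits for each (day, part of day) pair.
counts = toy_visits.groupby(["day_name", "day_parts"]).size()

# idxmax() returns the index label, here a (day, part) tuple, of the largest count.
busiest = counts.idxmax()
print(busiest, int(counts.max()))
```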

####  3.12 Type of window action

In [38]:
# interactive pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
event_counts = df_visits['event_type'].value_counts()
fig = go.Figure(data=[go.Pie(labels=event_counts.index, 
                             values=event_counts.values, 
                             title_text="<b>Type of window action</b>")])
fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.show() 

✨ **Insights**:

*  88% of visitations had no action ⛔
*  8% were submit ▶
*  4% of visitations had click as an action 🖱


In [39]:
# Creating subplots
fig = make_subplots(
   rows=1, cols=3,
   specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],
   
)
#1- Investment visitations
fig.add_trace(go.Pie(
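   # index and values of value_counts() share one order, keeping labels aligned with slices
   labels=df_visits_investor['event_type'].value_counts().index,
   values=df_visits_investor['event_type'].value_counts().values,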
   title_text="<b>Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=1
)
#2- Potential investors visitations
fig.add_trace(go.Pie(
   labels=df_visits_Potential['event_type'].value_counts().index,
   values=df_visits_Potential['event_type'].value_counts().values,
   title_text="<b>Potential Investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=2
)
#3- Non-investment visitations
fig.add_trace(go.Pie(
   labels=df_visits_No['event_type'].value_counts().index,
   values=df_visits_No['event_type'].value_counts().values,
   title_text="<b>Non-investment visitations</b>",
   domain=dict(x=[0, 0.5]),
   name="colors1"),
   row=1, col=3
)

fig.update_layout(
    title="Type of window action",
)

fig.show() 

✨ **Insights**:

* Nearly all of the non-investment visitations had no window action
* Investment and dropped-out visitations had submit as a window action

####  3.13 Average number of pages clicked by type of visit

In [40]:
# Interactive histogram using plotly
fig = px.histogram(df_visits, x="invest",y="total_pages" ,barmode="group", histfunc='avg', title="<b>Average number of pages clicked by type of visit</b>")
fig.update_layout(width=700, height=500)
fig.show()

✨ **Insights**:

* On average, investor visitations cover around 80 pages
* Followed by visitations of users who would have invested but cancelled at the last minute, averaging 47 pages
* Non-investment visitations were the lowest, with an average of 21 pages


####  3.14 Visitation distribution (investors)
##### Who are the investors/ potential investors and how can we predict them?

In [41]:
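# interactive users distribution pie chart using plotly
# value_counts() supplies labels (index) and values in one consistent order
invest_counts = df_visits['invest'].value_counts()
fig = go.Figure(data=[go.Pie(labels=invest_counts.index, 
                             values=invest_counts.values, 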
                             pull=[0.2, 0, 0],
                             title_text="<b>Who are they and how can we predict them?</b>")])
fig.update_traces(marker=dict(colors=['darkblue', 'mediumturquoise', 'gold']))
fig.show()