# Example 1: Web bouncing

In [7]:
import pandas as pd
import numpy as np

Each row of the data is a transition from one URL to another (presumably within our own set of URLs).

Initial URLs are preceded by the code -1 (the row reads `-1,<page>`), whereas final URLs are followed by either "B" or "C". "B" (bounce) means the user left for a URL outside our control, which suggests our content was not interesting to them. "C" (close) means the user closed the browser after visiting our last page, which suggests the content was useful to them. If we owned this website, we would want as many Cs as possible.
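As an illustration of this encoding, a single hypothetical visit that lands on page 3, moves to page 7 and then closes the browser would contribute three rows to the file. The helper `session_to_rows` below is a name made up for this sketch; it assumes a session is given as the ordered list of visited page ids plus its final outcome, and that every id is written as a string, as in the raw CSV.

```python
def session_to_rows(pages, outcome):
    """Encode one hypothetical session as (previous_page, next_page) rows.

    Assumes `pages` is the ordered list of visited page ids and `outcome`
    is "B" (bounce) or "C" (close), following the encoding described above.
    """
    if outcome not in ("B", "C"):
        raise ValueError("outcome must be 'B' or 'C'")
    # -1 marks the start of the session; the outcome code marks its end
    states = ["-1"] + [str(p) for p in pages] + [outcome]
    # consecutive pairs of states are the transitions stored in the file
    return list(zip(states[:-1], states[1:]))


print(session_to_rows([3, 7], "C"))
# [('-1', '3'), ('3', '7'), ('7', 'C')]
```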

In [5]:
data_site_transitions = pd.read_csv("site_data.csv", header=None)
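data_site_transitions.columns = ['previous_page', 'next_page']

In [6]:
data_site_transitions

|       | previous_page | next_page |
|------:|--------------:|----------:|
| 0     | -1            | 8         |
| 1     | 4             | 8         |
| 2     | -1            | 2         |
| 3     | 1             | B         |
| 4     | -1            | 5         |
| ...   | ...           | ...       |
| 99995 | 8             | 2         |
| 99996 | 0             | 5         |
| 99997 | 0             | 7         |
| 99998 | 4             | 0         |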
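Because the second column mixes page ids with the letters "B" and "C", pandas cannot parse it as integers and keeps its values as strings, while the first column only holds numbers and is parsed as integers. The sketch below reproduces this on the first four rows shown above, held in memory with `io.StringIO` instead of reading `site_data.csv`. A practical consequence is that filtering the DataFrame's first column for the start state needs the integer `-1`, not the string `'-1'` that the dictionary-based cell below uses on the raw file.

```python
import io

import pandas as pd

# First four rows of the table above, as they appear in the raw CSV
sample = io.StringIO("-1,8\n4,8\n-1,2\n1,B\n")
df = pd.read_csv(sample, header=None, names=["previous_page", "next_page"])

# previous_page is parsed as integers; next_page keeps strings ("8", "B", ...)
print(df.dtypes)
print(df["next_page"].tolist())
```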


## Store transitions
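The next cell estimates the transition probabilities of a first-order Markov chain by counting, which is the maximum-likelihood estimate for such a chain. Writing $n(s, e)$ for the number of rows that go from state $s$ to state $e$, it computes

$$
\hat{P}(e \mid s) = \frac{n(s, e)}{\sum_{e'} n(s, e')}
$$

With $s = -1$ this is the initial-state distribution (the fraction of sessions that start on page $e$), and with $e = \text{B}$ it is the bounce rate of page $s$.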

In [20]:
transitions = {}  # (previous_page, next_page) -> count, later probability
row_sums = {}     # previous_page -> total number of outgoing transitions
with open("site_data.csv", 'r') as f:
    for line in f:
        s, e = line.rstrip().split(',')
        transitions[(s, e)] = transitions.get((s, e), 0.) + 1
        row_sums[s] = row_sums.get(s, 0.) + 1

# normalize counts to get the transition probabilities P(e | s)
for (s, e), v in transitions.items():
    transitions[(s, e)] = v / row_sums[s]

# initial state distribution:
# (# sequences starting at state e) / (# of initial cases)
for (s, e), v in transitions.items():
    if s == '-1':  # transitions out of the start state
        print("Initial state", e, v)

# bounce rate: probability that a visit to page s ends in a bounce
for (s, e), v in transitions.items():
    if e == 'B':  # bounce final state
        print("Bounces:", s, v)


Initial state 8 0.10152591025834719
Initial state 2 0.09507982071813466
Initial state 5 0.09779926474291183
Initial state 9 0.10384247368686106
Initial state 0 0.10298635241980159
Initial state 6 0.09800070504104345
Initial state 7 0.09971294757516241
Initial state 1 0.10348995316513068
Initial state 4 0.10243239159993957
Initial state 3 0.09513018079266758
Bounces: 1 0.125939617991374
Bounces: 2 0.12649551345962112
Bounces: 8 0.12529550827423167
Bounces: 6 0.1208153180975911
Bounces: 7 0.12371650388179314
Bounces: 3 0.12743384922616077
Bounces: 4 0.1255756067205974
Bounces: 5 0.12369559684398065
Bounces: 0 0.1279673590504451
Bounces: 9 0.13176232104396302
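The same quantities can also be obtained from the DataFrame loaded earlier with `pd.crosstab(..., normalize="index")`, which divides each row of the count table by its total. The sketch below runs it on a small made-up set of transitions (not the real site data), using string ids as in the raw file; on the real DataFrame the first column is parsed as integers, so the start row would be looked up with `-1` rather than `"-1"`.

```python
import pandas as pd

# Toy transitions, made up for illustration (not taken from site_data.csv)
df = pd.DataFrame({
    "previous_page": ["-1", "-1", "-1", "0", "0", "1", "1", "1"],
    "next_page":     ["0", "0", "1", "1", "B", "B", "C", "C"],
})

# Rows: previous state; columns: next state; each row sums to 1
probs = pd.crosstab(df["previous_page"], df["next_page"], normalize="index")

print(probs.loc["-1"])  # initial-state distribution
print(probs["B"])       # bounce rate of each page (the '-1' row is 0)
```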


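Page 9 has both the highest initial-state probability (about 0.104) and the highest bounce rate (about 0.132). The differences between pages are small, though: initial probabilities range from about 0.095 to 0.104, and bounce rates from about 0.121 to 0.132, with page 0 second at about 0.128.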
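This comparison can be checked directly against the printed numbers. The snippet below copies the values from the output above, rounded to four decimals, into two dictionaries and reports the top page and the range of each quantity.

```python
# Values copied from the printed output above, rounded to 4 decimals
initial = {"0": 0.1030, "1": 0.1035, "2": 0.0951, "3": 0.0951, "4": 0.1024,
           "5": 0.0978, "6": 0.0980, "7": 0.0997, "8": 0.1015, "9": 0.1038}
bounce = {"0": 0.1280, "1": 0.1259, "2": 0.1265, "3": 0.1274, "4": 0.1256,
          "5": 0.1237, "6": 0.1208, "7": 0.1237, "8": 0.1253, "9": 0.1318}

for name, values in [("initial probability", initial), ("bounce rate", bounce)]:
    top = max(values, key=values.get)          # page with the largest value
    low, high = min(values.values()), max(values.values())
    print(f"{name}: highest on page {top}, range {low:.4f} to {high:.4f}")
```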