## Create and Compare Structural and Functional Networks

First, create a structural network of pages that hyperlink across domains, 
`G_structural`, and a functional network of real, observed user movements (sessions) 
that traverse across domains, `G_functional`.  

Second, use exploratory data analysis to compare the source and target node pairs 
of the `G_structural` and `G_functional` NetworkX graphs:
- 001: Which pages in the structural network are users not landing on?
- 002: Which edges in the structural network are users not traversing?
- 003: Which edges are users traversing that are not in the structural network?
- 004: Which pages exist in both the functional and structural networks?

To conclude, create a Bokeh plot in an attempt to visualise the differences between 
the two networks.

**Assumptions** <br>
- Both networks are directed graphs.
- In the functional graph, only page hits are included (not event hits). Print 
pages are excluded.
- In the functional graph, all edges must have a source page and a target page (i.e.
 the following page in a user's journey). Where a target page does not exist, e.g. 
 a user leaves www.gov.uk, then the edge (source: target) does not exist.
- In the functional graph, the edge weight represents the number of sessions that 
visit the target page following the source page.
- For 003, edges where both the source and target pages are on `www.gov.uk` are 
removed. When creating `G_functional`, all `www.gov.uk` hostname page hits are 
included. When creating `G_structural`, `www.gov.uk` hostname pages are only 
included if they are hyperlinked from a page on `account.gov.uk` or 
`signin.account.gov.uk`. Therefore, it is a fairer comparison to remove 
`www.gov.uk` source: target pairs. In addition, edges that consist of the same page 
are removed, as these are likely to represent page refreshes (which we are not 
interested in).
- For the Bokeh plot, edges are weighted, with weights representing the number of 
sessions that have traversed the edge.
- The foundational network is a structural network representing the pages hyperlinked 
to one another. In the Bokeh plot, a functional network is overlaid, representing 
real user movements across the structural network. As such, real user movements not 
in the structural network are removed.
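
The edge-weight assumption above can be illustrated with a small, self-contained sketch. It is not the implementation behind `create_functional_network`: the `sessions` data, its layout and the helper name `count_session_edges` are hypothetical, and counting each session at most once per source: target pair is an assumption about what "number of sessions" means.

```python
from collections import Counter


def count_session_edges(sessions):
    """Count distinct sessions per (source, target) pair of consecutive pages."""
    weights = Counter()
    for pages in sessions.values():
        # Pair each page with the page that follows it. The last page has no
        # target, so it produces no edge. The set counts a session once per pair.
        pairs = set(zip(pages, pages[1:]))
        weights.update(pairs)
    return weights


# Hypothetical sessions: session id -> ordered page hits
sessions = {
    "s1": ["/sign-in-or-create", "/enter-email", "/enter-password"],
    "s2": ["/sign-in-or-create", "/enter-email", "/sign-in-or-create", "/enter-email"],
    "s3": ["/enter-email"],  # a single page hit produces no edge
}

edge_weights = count_session_edges(sessions)
for (source, target), weight in sorted(edge_weights.items()):
    print(f"{source} -> {target}: {weight}")
```

A mapping like `edge_weights` could then be loaded into a directed graph with NetworkX's `add_weighted_edges_from`.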

**Requirements** <br>
- You must be able to use Google Cloud Platform through code on your local machine and have the correct permissions to access the `govuk-bigquery-analytics` project. See: https://docs.data-community.publishing.service.gov.uk/analysis/google-cloud-platform/#use-gcp-through-the-command-line-on-your-local-machine

#### Import modules 

In [None]:
import os

import networkx as nx
from dotenv import load_dotenv

from src.make_data.create_networks import (
    create_functional_network,
    extract_observed_movements,
)
from src.make_visualisations.create_network_plot import create_bokeh_plot

#### Assign variables

In [None]:
# Assign folder variables
load_dotenv()
DIR_DATA_PROCESSED = os.environ.get("DIR_DATA_PROCESSED")
DIR_DATA_INTERIM = os.environ.get("DIR_DATA_INTERIM")

# Assign variables for the functional graph
start_date = "20220524"
end_date = "20220524"
seed_hosts = ["account.gov.uk", "signin.account.gov.uk"]
query_parameters = False

# Assign variables for the Bokeh plot
title = "User movements across the `account` domain"
functional_page_colour = "red"
structural_page_colour = "blue"
plot_width = 1400
plot_height = 800
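
`os.environ.get` returns `None` when a variable is missing, which would only surface later as an error when building file paths. A minimal sketch of an early check is shown below. The helper `require_env` and the variable `EXAMPLE_DIR` are hypothetical and not part of this repository.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value


# Placeholder variable set in-process purely for illustration
os.environ["EXAMPLE_DIR"] = "data/example"
print(require_env("EXAMPLE_DIR"))
```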

#### Create structural network

For the Proof of Concept, hyperlink information has been extracted manually as a 
dictionary-of-lists adjacency representation, where each key is a source page 
and each value is a list of the pages that the source page hyperlinks to. The 
dictionary of lists is then transformed into a directed NetworkX graph. 
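
As a toy illustration of this representation (the page names below are made up), the sketch shows how a dictionary of lists maps onto directed edges. It also shows two behaviours worth keeping in mind when writing the dictionary by hand: repeated targets in a list collapse into a single edge, and a key repeated in a dict literal keeps only its last value.

```python
import networkx as nx

# Toy adjacency: each key is a source page, each value lists its link targets
toy_links = {
    "/a": ["/b", "/c", "/b"],  # the repeated "/b" collapses into one edge
    "/b": ["/c"],
}

G_toy = nx.from_dict_of_lists(toy_links, create_using=nx.DiGraph)

# "/c" only appears as a target, but it still becomes a node
print(sorted(G_toy.nodes()))
print(sorted(G_toy.edges()))

# A repeated key in a dict literal silently keeps only the last value
repeated_key = {"/a": ["/b"], "/a": ["/c"]}
print(repeated_key)
```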

In [None]:
page_links = {
    "https://signin.account.gov.uk/sign-in-or-create": [
        "https://signin.account.gov.uk/enter-email-create",
        "https://signin.account.gov.uk/enter-email",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/enter-email": [
        "https://signin.account.gov.uk/enter-password",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/enter-password": [
        "https://signin.account.gov.uk/reset-password-check-email",
        "https://www.gov.uk/email/subscriptions/account/confirm",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://www.gov.uk/email/subscriptions/account/confirm": [
        "https://www.gov.uk/",
        "https://www.gov.uk/guidance/move-to-the-uk-if-youre-from-ukraine",
    ],
    "https://www.gov.uk/guidance/move-to-the-uk-if-youre-from-ukraine": [
        "https://www.gov.uk/email/manage"
    ],
    "https://signin.account.gov.uk/reset-password-check-email": [
        "https://signin.account.gov.uk/reset-password",
        "https://signin.account.gov.uk/reset-password-resend-code",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/reset-password-resend-code": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/enter-password-account-exists",
        "https://signin.account.gov.uk/reset-password-check-email",
    ],
    "https://signin.account.gov.uk/reset-password": [
        "https://signin.account.gov.uk/enter-code",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/enter-code": [
        "https://signin.account.gov.uk/contact-us",
        "https://account.gov.uk/manage-your-account",
        "https://signin.account.gov.uk/resend-code",
    ],
    "https://signin.account.gov.uk/resend-code": [
        "https://signin.account.gov.uk/enter-code",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/enter-email-create": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/check-your-email",
        "https://signin.account.gov.uk/enter-password-account-exists",
    ],
    "https://signin.account.gov.uk/enter-password-account-exists": [
        "https://signin.account.gov.uk/reset-password-check-email",
        "https://signin.account.gov.uk/enter-code",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/check-your-email": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/create-password",
        "https://signin.account.gov.uk/enter-email-create",
    ],
    "https://signin.account.gov.uk/create-password": [
        "https://signin.account.gov.uk/privacy-notice",
        "https://signin.account.gov.uk/terms-and-conditions",
        "https://signin.account.gov.uk/enter-phone-number",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/enter-phone-number": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/check-your-phone",
    ],
    "https://signin.account.gov.uk/check-your-phone": [
        "https://signin.account.gov.uk/enter-phone-number",
        "https://signin.account.gov.uk/account-created",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/account-created": [
        "https://signin.account.gov.uk/contact-us",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/email/subscriptions/account/confirm",
    ],
    "https://account.gov.uk/manage-your-account": [
        "https://account.gov.uk/enter-password",
        "https://www.gov.uk/account/home",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://www.gov.uk/account/home": [
        "https://signin.account.gov.uk/contact-us",
        "https://www.gov.uk/sign-in",
        "https://signin.account.gov.uk/signed-out",
        "https://www.gov.uk/email/manage",
        "https://account.gov.uk/manage-your-account",
    ],
    "https://account.gov.uk/enter-password": [
        # One merged entry: repeating this key in the dict literal would silently
        # keep only the last list of targets
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/manage-your-account",
        "https://account.gov.uk/change-email",
        "https://account.gov.uk/change-password",
        "https://account.gov.uk/change-phone-number",
        "https://account.gov.uk/delete-account",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://account.gov.uk/change-password": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/signed-out",
        "https://account.gov.uk/password-updated-confirmation",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/account/home",
    ],
    "https://account.gov.uk/password-updated-confirmation": [
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/manage-your-account",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://account.gov.uk/change-email": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/signed-out",
        "https://account.gov.uk/check-your-email",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/account/home",
    ],
    "https://account.gov.uk/check-your-email": [
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/change-email",
        "https://account.gov.uk/email-updated-confirmation",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://account.gov.uk/email-updated-confirmation": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/signed-out",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/account/home",
    ],
    "https://account.gov.uk/change-phone-number": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/signed-out",
        "https://account.gov.uk/check-your-phone",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/account/home",
    ],
    "https://account.gov.uk/check-your-phone": [
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/change-phone-number",
        "https://account.gov.uk/phone-number-updated-confirmation",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://account.gov.uk/phone-number-updated-confirmation": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/signed-out",
        "https://account.gov.uk/manage-your-account",
    ],
    "https://www.gov.uk/email/manage": [
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/manage-your-account",
        "https://www.gov.uk/email/manage/unsubscribe-all",
        "https://www.gov.uk/email/unsubscribe/",
        "https://www.gov.uk/email/manage/frequency/",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://www.gov.uk/email/unsubscribe/": ["https://www.gov.uk/email/manage"],
    "https://www.gov.uk/email/manage/unsubscribe-all": [
        "https://www.gov.uk/account/home",
        "https://www.gov.uk/email/manage",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://account.gov.uk/delete-account": [
        "https://www.gov.uk/account/home",
        "https://account.gov.uk/manage-your-account",
        "https://signin.account.gov.uk/signed-out",
        "https://signin.account.gov.uk/contact-us",
        "https://account.gov.uk/account-deleted-confirmation",
    ],
    "https://www.gov.uk/sign-in": [
        "https://www.gov.uk/",
        "https://www.gov.uk/sign-in-childcare-account",
        "https://www.gov.uk/check-state-pension",
        "https://www.gov.uk/report-covid19-result",
        "https://www.gov.uk/log-in-register-hmrc-online-services",
        "https://www.gov.uk/sign-in-universal-credit",
        "https://www.gov.uk/email/manage",
    ],
    "https://signin.account.gov.uk/contact-us": [
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/contact-us-further-information",
    ],
    "https://signin.account.gov.uk/contact-us-further-information": [
        "https://signin.account.gov.uk/contact-us-questions",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/contact-us-questions": [
        "https://signin.account.gov.uk/contact-us-submit-success",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/contact-us-submit-success": [
        "https://www.gov.uk/",
        "https://signin.account.gov.uk/contact-us",
    ],
    "https://signin.account.gov.uk/signed-out": [
        "https://www.gov.uk/",
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/sign-in-or-create",
    ],
    "https://www.gov.uk/email/manage/frequency/": [
        "https://www.gov.uk/account/home",
        "https://www.gov.uk/email/manage",
        "https://signin.account.gov.uk/contact-us",
        "https://signin.account.gov.uk/contact-us-questions",
    ],
    "https://www.gov.uk/email/subscriptions/single-page/new": [
        "https://www.gov.uk/sign-in",
        "https://signin.account.gov.uk/sign-in-or-create",
    ],
}

G_structural = nx.from_dict_of_lists(page_links, create_using=nx.DiGraph)

#### Create functional network

In [None]:
user_journeys_df = extract_observed_movements(
    start_date, end_date, seed_hosts, query_parameters
)
G_functional = create_functional_network(user_journeys_df)

#### Explore networks

In [None]:
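# Explore structural graph
# Summarise with graph methods (`nx.info` is not available in NetworkX 3.x)
print(f"{G_structural.number_of_nodes()} nodes, {G_structural.number_of_edges()} edges")
nx.draw(G_structural, with_labels=False)
# Print explicitly: a notebook cell only displays its last expression
print(G_structural.nodes(data=True))
print(G_structural.edges(data=True))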

In [None]:
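# Explore functional graph
# Summarise with graph methods (`nx.info` is not available in NetworkX 3.x)
print(f"{G_functional.number_of_nodes()} nodes, {G_functional.number_of_edges()} edges")
nx.draw(G_functional, with_labels=False)
# Print explicitly: a notebook cell only displays its last expression
print(G_functional.nodes(data=True))
print(G_functional.edges(data=True))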

In [None]:
# Prepare data for EDA
# Work on copies so the original graphs are left untouched
G_func = G_functional.copy()
G_struc = G_structural.copy()

functional_nodes_list = list(G_func.nodes())
structural_nodes_list = list(G_struc.nodes())

In [None]:
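# 001: Which pages in the structural network are users not landing on?
# Remove functional pages that are absent from the structural network, so that
# the edge comparisons in 002 and 003 only consider pages in the structural network
structural_nodes_set = set(structural_nodes_list)
nodes_to_remove = [x for x in functional_nodes_list if x not in structural_nodes_set]
G_func.remove_nodes_from(nodes_to_remove)

# Compare against the full list of functional pages (taken before the removal)
functional_nodes_set = set(functional_nodes_list)
structural_pages_not_visited = [
    node for node in structural_nodes_list if node not in functional_nodes_set
]
structural_pages_not_visited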

In [None]:
# 002: Which edges in the structural network are users not traversing?
# Represent each edge as a [source, target] list
functional = [[source, target] for source, target in G_func.edges()]
structural = [[source, target] for source, target in G_struc.edges()]

struc_edges_not_visited = [edge for edge in structural if edge not in functional]
struc_edges_not_visited

In [None]:
# 003: Which edges are users traversing that are not in the structural network?
# (excludes edges where both pages are on `www.gov.uk` or are the same page;
# `functional` only holds edges between pages in the structural network, because
# all other pages were removed from `G_func` in 001)
gov_uk_prefix = "https://www.gov.uk"
func_edges_not_in_struc = []

for edge in functional:
    source, target = edge
    if source.startswith(gov_uk_prefix) and target.startswith(gov_uk_prefix):
        continue
    if edge not in structural and source != target:
        func_edges_not_in_struc.append(edge)

func_edges_not_in_struc

In [None]:
# 004: Which pages exist in both the functional and structural networks?
nodes_both_networks = [node for node in G_struc.nodes if node in G_func.nodes]
# Print the count explicitly: a notebook cell only displays its last expression
print(len(nodes_both_networks))
nodes_both_networks
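
The four questions above can also be framed as set operations on nodes and edges. The sketch below uses two toy graphs with made-up page names. Unlike the cells above, it does not first remove functional pages that are absent from the structural network, and it does not apply the `www.gov.uk` or same-page filters, so treat it as an illustration of the idea rather than a drop-in replacement.

```python
import networkx as nx

# Toy structural network (hyperlinks) and functional network (observed movements)
G_s = nx.DiGraph([("/a", "/b"), ("/b", "/c"), ("/a", "/d")])
G_f = nx.DiGraph([("/a", "/b"), ("/b", "/a"), ("/b", "/x")])

s_nodes, f_nodes = set(G_s.nodes()), set(G_f.nodes())
s_edges, f_edges = set(G_s.edges()), set(G_f.edges())

pages_not_visited = s_nodes - f_nodes        # 001: structural pages never landed on
edges_not_traversed = s_edges - f_edges      # 002: hyperlinks never followed
edges_not_in_structural = f_edges - s_edges  # 003: movements without a hyperlink
pages_in_both = s_nodes & f_nodes            # 004: pages present in both networks

print(sorted(pages_not_visited))
print(sorted(edges_not_traversed))
print(sorted(edges_not_in_structural))
print(sorted(pages_in_both))
```

Sets make each membership test constant time on average, which matters more as the networks grow.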

In [None]:
# Create Bokeh plot visualising the structural and functional networks
plot = create_bokeh_plot(
    G_structural,
    G_functional,
    functional_page_colour,
    structural_page_colour,
    title,
    plot_width,
    plot_height,
)
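
The overlay described in the assumptions (user movements that are not in the structural network are removed) can be sketched on toy graphs as follows. This is not the internals of `create_bokeh_plot`. The page names are made up, and storing session counts under an edge attribute called `weight` is an assumption about how `G_functional` is built.

```python
import networkx as nx

# Toy structural network (hyperlinks)
G_s = nx.DiGraph([("/a", "/b"), ("/b", "/c")])

# Toy functional network; `weight` is assumed to hold the number of sessions
G_f = nx.DiGraph()
G_f.add_weighted_edges_from([("/a", "/b", 12), ("/b", "/a", 3), ("/a", "/x", 7)])

# Start from every structural page, then keep only the observed edges that
# also exist as hyperlinks, carrying their weights across
G_overlay = nx.DiGraph()
G_overlay.add_nodes_from(G_s.nodes())
G_overlay.add_weighted_edges_from(
    (source, target, weight)
    for source, target, weight in G_f.edges(data="weight")
    if G_s.has_edge(source, target)
)

print(list(G_overlay.edges(data=True)))
```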

#### Save data

Save data for use in further analysis.

In [None]:
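import pickle

# `nx.write_gpickle` is not available in NetworkX 3.x, so pickle the graphs directly
with open(os.path.join(DIR_DATA_PROCESSED, "G_structural.gpickle"), "wb") as f:
    pickle.dump(G_structural, f, pickle.HIGHEST_PROTOCOL)
user_journeys_df.to_pickle(os.path.join(DIR_DATA_INTERIM, "user_journeys_df.gpickle"))
with open(os.path.join(DIR_DATA_PROCESSED, "G_functional.gpickle"), "wb") as f:
    pickle.dump(G_functional, f, pickle.HIGHEST_PROTOCOL)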
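
For later analysis, the saved graphs can be read back with the standard library `pickle` module. The round trip below is a minimal sketch that uses a toy graph and a temporary directory rather than the real `DIR_DATA_PROCESSED` files. `G_example` and its file name are placeholders.

```python
import os
import pickle
import tempfile

import networkx as nx

# Toy graph standing in for `G_structural` or `G_functional`
G_example = nx.DiGraph()
G_example.add_edge("/a", "/b", weight=5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "G_example.gpickle")
    # Write the graph as a pickle file
    with open(path, "wb") as f:
        pickle.dump(G_example, f, pickle.HIGHEST_PROTOCOL)
    # Read it back; node and edge attributes are preserved
    with open(path, "rb") as f:
        G_loaded = pickle.load(f)

print(G_loaded.edges(data=True))
```

Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.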