# Step 4: Create NetworkX graph of the economic recovery whole user journey

Combine the nodes and edges extracted by `step_three_extract_nodes_and_edges.sql`
into a directed NetworkX functional graph of the economic recovery whole user journey.

OUTPUT: 
- A directed NetworkX graph `G`
  - This is saved as a pickle file, `../data/processed/functional_session_hit_directed_graph_er.gpickle`

REQUIREMENTS: 
- Run `step_one_identify_seed_pages.ipynb` to define seed0 and seed1 pages
- Run `step_two_extract_page_hits.sql` to extract page hits for sessions that visit 
  at least one seed0 or seed1 page
- Run `step_three_extract_nodes_and_edges.sql` to extract nodes and edges

## Import modules and authenticate

In [None]:
# import statements 
import os 
import networkx as nx
import pandas as pd
import pandas_gbq
from google.oauth2 import service_account

# authentication: the service account key path is read from the environment
GOOGLE_APPLICATION_CREDENTIALS = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
credentials = service_account.Credentials.from_service_account_file(GOOGLE_APPLICATION_CREDENTIALS)
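
`os.getenv` returns `None` when a variable is unset, so a missing `GOOGLE_APPLICATION_CREDENTIALS` would only surface as an error inside the credentials loader. The following is a minimal sketch of an earlier, clearer check. The helper name `require_credentials_path` is hypothetical and is not part of the original notebook.

```python
import os


def require_credentials_path(env=None):
    """Return the service account key path, or raise if it is unusable.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No key file found at {path}")
    return path
```

The returned path could then be passed to `service_account.Credentials.from_service_account_file` in place of the raw `os.getenv` result.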

## Import nodes and edges 

In [None]:
project_id = "govuk-bigquery-analytics"

# nodes
sql_nodes = """
SELECT 
    *
FROM `govuk-bigquery-analytics.wuj_network_analysis.nodes_er`
"""
nodes = pandas_gbq.read_gbq(sql_nodes, project_id=project_id, credentials=credentials)  

# edges 
sql_edges = """
SELECT 
    *
FROM `govuk-bigquery-analytics.wuj_network_analysis.edges_er`
"""
edges = pandas_gbq.read_gbq(sql_edges, project_id=project_id, credentials=credentials)
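
The graph-building cell reads specific columns from `nodes` and `edges`, so it may be worth confirming they are present before going further. A minimal sketch, assuming the column names used later in this notebook; `missing_columns` is a hypothetical helper and the toy DataFrame holds invented values standing in for the BigQuery result.

```python
import pandas as pd

# Columns that the graph-building cell reads from each table.
EDGE_COLUMNS = ["sourcePagePath", "destinationPagePath", "edgeWeight"]
NODE_COLUMNS = [
    "sourcePagePath", "documentType", "topLevelTaxons", "bottomLevelTaxons",
    "sourcePageSessionHitsAll", "sourcePageSessionHitsEntranceOnly",
    "sourcePageSessionHitsExitOnly", "sourcePageSessionHitsEntranceAndExit",
]


def missing_columns(df, expected):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# Toy stand-in for the real edges table (invented values, for illustration only).
toy_edges = pd.DataFrame(
    {"sourcePagePath": ["/a"], "destinationPagePath": ["/b"], "edgeWeight": [3]}
)

# An empty list means every expected column is present.
print(missing_columns(toy_edges, EDGE_COLUMNS))
```

In the notebook itself the same check could be applied as `missing_columns(edges, EDGE_COLUMNS)` and `missing_columns(nodes, NODE_COLUMNS)`.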

## Create NetworkX graph

In [None]:
# add nodes, edges, and edge weight
G = nx.from_pandas_edgelist(
    edges,
    "sourcePagePath",
    "destinationPagePath",
    ["edgeWeight"],
    create_using=nx.DiGraph(),
)

# set node attributes from the nodes table, keyed by sourcePagePath
# (maps each column in `nodes` to the attribute name it is stored under)
node_attributes = {
    "documentType": "documentType",
    "topLevelTaxons": "topLevelTaxons",
    "bottomLevelTaxons": "bottomLevelTaxons",
    "sourcePageSessionHitsAll": "sessionHitsAll",
    "sourcePageSessionHitsEntranceOnly": "entranceHit",
    "sourcePageSessionHitsExitOnly": "exitHit",
    "sourcePageSessionHitsEntranceAndExit": "entranceAndExitHit",
}
for column, attribute in node_attributes.items():
    nx.set_node_attributes(G, pd.Series(nodes[column].values, index=nodes.sourcePagePath).to_dict(), attribute)

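# save functional graph
# (pickled directly, because nx.write_gpickle is not available in NetworkX 3.0+)
import pickle

with open("../data/processed/functional_session_hit_directed_graph_er.gpickle", "wb") as f:
    pickle.dump(G, f, pickle.HIGHEST_PROTOCOL)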
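
The steps in this section can be tried end to end on toy data. The sketch below builds a small directed graph from an edge list, attaches one node attribute, and round-trips the graph through pickle. All values are invented for illustration, a temporary directory stands in for `../data/processed`, and the `toy_` names are not part of the original notebook. It also shows that a page appearing only as a destination receives no attributes, because attributes are keyed by `sourcePagePath`.

```python
import os
import pickle
import tempfile

import networkx as nx
import pandas as pd

# Toy stand-ins for the BigQuery tables (invented values, for illustration only).
toy_edges = pd.DataFrame({
    "sourcePagePath": ["/a", "/a", "/b"],
    "destinationPagePath": ["/b", "/c", "/c"],
    "edgeWeight": [5, 2, 1],
})
toy_nodes = pd.DataFrame({
    "sourcePagePath": ["/a", "/b"],
    "documentType": ["guide", "answer"],
})

# Directed graph with edgeWeight stored on each edge.
toy_G = nx.from_pandas_edgelist(
    toy_edges, "sourcePagePath", "destinationPagePath", ["edgeWeight"],
    create_using=nx.DiGraph(),
)
nx.set_node_attributes(
    toy_G, toy_nodes.set_index("sourcePagePath")["documentType"].to_dict(), "documentType"
)

# "/c" only ever appears as a destination, so it has no row in toy_nodes
# and therefore no documentType attribute.
nodes_without_type = [n for n, data in toy_G.nodes(data=True) if "documentType" not in data]

# Round trip through pickle, using a temporary file instead of ../data/processed.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_graph.gpickle")
    with open(path, "wb") as f:
        pickle.dump(toy_G, f, pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        loaded_G = pickle.load(f)
```

The saved economic recovery graph can be read back in the same way, by opening the `.gpickle` file in `"rb"` mode and calling `pickle.load`.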