# What is a Jupyter notebook?

A Jupyter notebook is a way for you to write and execute Python in your web browser. It lets you interleave text, code, and the outputs of code, which makes notebooks useful as a kind of interactive textbook.

## Getting started

The document that you are reading is not a static web page, but an interactive environment called a "Jupyter notebook" that lets you write and execute code.

For example, below is a code cell with a short Python script that computes a value, stores it in a variable, and displays the result.

In [None]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

To execute the code in the above cell, select it with a click and then either press the play button to the left of the code, or use the keyboard shortcut 'Command/Ctrl+Enter'. To edit the code, just click the cell and start editing.

Variables that you define in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

Jupyter notebooks allow you to combine executable code and rich text in a single document, along with images, HTML and more. To find out more about the Jupyter project, see [jupyter.org](https://www.jupyter.org).

## Data science

With Jupyter you can use popular Python libraries to analyse and visualise data. The code cell below imports several libraries that solve problems other people have already solved. Some are imported under shorter names, such as `pandas` as `pd`.

In [None]:
# Based on notesbooks/2021_09_explore_govuk_structural_network.R
import os

import igraph as ig
import matplotlib.pyplot as plt
import pandas as pd
from igraph import Graph, VertexSeq

In [None]:
# Read the path of the raw data directory from an environment variable.
DIR_DATA_RAW = os.getenv("DIR_DATA_RAW")
print(DIR_DATA_RAW)
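
If `DIR_DATA_RAW` is not set, `os.getenv` returns `None`, and building the file path from it fails. Below is a minimal sketch of a more defensive lookup. The helper `get_data_dir` and the fallback directory `data/raw` are hypothetical, introduced only for illustration, and are not part of the original project.

```python
import os
from pathlib import Path


def get_data_dir(var_name="DIR_DATA_RAW", default="data/raw"):
    """Return the data directory named by an environment variable.

    `default` is a hypothetical fallback, used only when the
    variable is not set.
    """
    return Path(os.environ.get(var_name, default))


data_dir = get_data_dir()
# Path objects can be joined with "/" regardless of operating system.
csv_path = data_dir / "structural_network_adjacency_list_20190301.csv"
print(csv_path)
```

Whether a silent fallback is appropriate depends on the project. Raising an error with a clear message when the variable is missing is an equally reasonable choice.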

In [None]:
# Load the edgelist dataset into a pandas data frame
edges = pd.read_csv(
    os.path.join(DIR_DATA_RAW, "structural_network_adjacency_list_20190301.csv")
)

In [None]:
# Filter for pages whose URLs contain the word 'brexit'
search_string = "brexit"
brexit_edges = edges[
    (edges.source_base_path.str.contains(search_string))
    & (edges.sink_base_path.str.contains(search_string))
]
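
To see what this filter does, here is a small sketch on a made-up data frame. The rows are invented for illustration and only the column names come from the dataset above. An edge is kept only when both its source path and its sink path contain the search string. Passing `regex=False` and `na=False` is an optional precaution assumed here, so that the string is matched literally and missing paths count as non-matches.

```python
import pandas as pd

# Invented example rows; the last one has a missing sink path.
toy_edges = pd.DataFrame(
    {
        "source_base_path": ["/brexit", "/brexit", "/tax", "/brexit-checker"],
        "sink_base_path": ["/brexit-checker", "/tax", "/brexit", None],
    }
)

search_string = "brexit"

# One boolean per row: True only when both ends of the edge match.
both_match = toy_edges.source_base_path.str.contains(
    search_string, regex=False, na=False
) & toy_edges.sink_base_path.str.contains(search_string, regex=False, na=False)

toy_brexit_edges = toy_edges[both_match]
print(toy_brexit_edges)
```

Only the first row survives, because it is the only edge whose two ends both mention the search string.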

In [None]:
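# Construct a graph object from the edges.  The first two
# columns are used as the source and target of each edge.
# Assumption: in recent python-igraph releases, use_vids=False
# is needed because those columns hold URL paths, not integer
# vertex IDs.  Very old releases may not accept this argument.
g = Graph.DataFrame(brexit_edges, directed=True, use_vids=False)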

In [None]:
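# The graph has one big component, and many small ones
# that are disconnected from the big one.
# Keep only the largest component.
# connected_components() is the current name for the
# older, deprecated clusters() method.
components = g.connected_components(mode="weak")
brexit = components.giant()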
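
To make "weakly connected component" concrete, the following sketch finds components in plain Python on a made-up edge list. It illustrates the idea only and is not how igraph implements it: edge direction is ignored, nodes that can reach one another are grouped together, and the largest group is kept. The function name `weak_components` is hypothetical.

```python
from collections import defaultdict, deque


def weak_components(edge_list):
    """Group nodes into weakly connected components."""
    # Treat every edge as undirected by recording both directions.
    neighbours = defaultdict(set)
    for source, sink in edge_list:
        neighbours[source].add(sink)
        neighbours[sink].add(source)

    seen = set()
    found = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        found.append(component)
    return found


# Invented example: a, b and c are linked; d and e form a separate pair.
toy_edge_list = [("a", "b"), ("c", "b"), ("d", "e")]
toy_components = weak_components(toy_edge_list)
giant = max(toy_components, key=len)
print(sorted(giant))
```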

In [None]:
# Detect communities within the graph.  The spinglass
# algorithm allows for a maximum number of communities
# to be set.  It might detect fewer than this, but it
# won't detect more.  Every node (every page) will be
# assigned to exactly one community.
communities = brexit.community_spinglass(spins=5)

In [None]:
# Visualise the graph.  Colour each node (each page)
# according to the community that it belongs to.
pal = ig.drawing.colors.ClusterColoringPalette(len(communities))
brexit.vs["color"] = pal.get_many(communities.membership)
ig.plot(brexit)

In [None]:
# Calculate the degree of each node (each page).  The
# degree is the number of edges into and out of the
# node.
degrees = brexit.degree()
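
As a sketch of what degree means, the snippet below counts it by hand for a made-up directed edge list. Each edge adds one to the degree of its source and one to the degree of its sink. The node names are invented for illustration.

```python
from collections import Counter

# Invented example edges as (source, sink) pairs.
toy_edge_list = [("a", "b"), ("c", "b"), ("b", "a")]

degree = Counter()
for source, sink in toy_edge_list:
    degree[source] += 1  # outgoing edge
    degree[sink] += 1    # incoming edge

print(dict(degree))
```

Node `b` ends up with the highest degree because it takes part in all three edges.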

In [None]:
# Visualise the graph again, this time labelling each
# node with its degree.
ig.plot(brexit, vertex_label=degrees)

In [None]:
# Visualise the graph again, this time labelling each
# node with its degree, and sizing each node by its
# degree too.  This reveals a handful of nodes of
# high degree.
ig.plot(brexit, vertex_label=degrees, vertex_size=degrees)