# Social Network Analysis 2

COST Action Training School in Computational Opinion Analysis – COpA

Johannes B. Gruber \| GESIS

# Example: Divided They Blog

## Packages and setup

In [None]:
if (Sys.getenv("COLAB_RELEASE_TAG") != "") {
  download.file("https://github.com/eddelbuettel/r2u/raw/master/inst/scripts/add_cranapt_focal.sh",
                "add_cranapt_focal.sh")
  Sys.chmod("add_cranapt_focal.sh", "0755")
  system("./add_cranapt_focal.sh")
}
# install missing packages
required <- c("igraph", "tidyverse", "tidygraph", "ggraph", "atrrr", "curl", "gifski")
missing <- setdiff(required, installed.packages()[,"Package"])
install.packages(missing, Ncpus = 4)
library(tidyverse); theme_set(theme_minimal())
library(igraph)
library(tidygraph)
library(ggraph)
library(atrrr)

## The Blogosphere data

In [None]:
graph_file <- "data/polblogs.zip"
dir.create(dirname(graph_file), showWarnings = FALSE)
if (!file.exists(graph_file)) {
  curl::curl_download(
    "https://public.websites.umich.edu/~mejn/netdata/polblogs.zip",
    graph_file
  )
}

# have a quick look at the data description
unz(graph_file, "polblogs.txt") |> 
  readLines() |> 
  cat(sep = "\n")

## graph data structure in `R`

Let’s first look at the `igraph` graph class:

In [None]:
polblogs_igraph <- igraph::read_graph(unz(graph_file, "polblogs.gml"), format = "gml") 
class(polblogs_igraph)
polblogs_igraph

## graph data structure in `R`

We can convert this to a `tidygraph` object:

In [None]:
polblogs_tbl <- as_tbl_graph(polblogs_igraph)
class(polblogs_tbl)
polblogs_tbl

## graph data structure in `R`/`Python`

Both `igraph` and `tbl_graph` objects essentially consist of two linked
tables containing <span class="kn-pink">nodes</span> (aka vertices) and
<span class="kn-pink">edges</span>.

In [None]:
polblogs_tbl |> 
  activate("nodes") |> 
  as_tibble()

In [None]:
polblogs_tbl |> 
  activate("edges") |> 
  as_tibble()
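
To make the two-table idea concrete, here is a minimal sketch with a made-up three-node graph (the names `toy_nodes`, `toy_edges` and `toy_graph` are hypothetical and not part of the polblogs data). Note how the character names in the edge list end up as row positions in the node table:

```r
library(tidygraph)

# hypothetical toy data, not part of the polblogs dataset
toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(from = c("a", "a", "b"), to = c("b", "c", "c"))

# tbl_graph() links both tables; edges are matched to nodes via the name column
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# the node table
toy_graph |> 
  activate("nodes") |> 
  as_tibble()

# the edge table: from and to are now integer row positions in the node table
toy_graph |> 
  activate("edges") |> 
  as_tibble()
```

This is also why filtering nodes changes the edge table: an edge can only exist if both of its endpoints are still rows in the node table.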

## working with graph data structures in `R`

We can look up the variables (such as political leaning) for any given node
by filtering. For example, let’s see the node with ID 30:

In [None]:
polblogs_tbl |> 
  filter(id == 30)

## working with graph data structures in `R`

If we want to see how many left and right blogs are in the data, we can
use `count()`, but only after converting the node table to a tibble (a
`data.frame`) with `as_tibble()`:

In [None]:
polblogs_tbl |>
  activate("nodes") |>
  as_tibble() |> 
  count(ideology = value)

## working with graph data structures in `R`

The `value` variable is named terribly and stored in a strange format.
Let’s change that using some more tidyverse functions that work
out-of-the-box with `tidygraph` graphs:

In [None]:
polblogs_tbl_new <- polblogs_tbl |> 
  activate("nodes") |> 
  mutate(ideology = recode_factor(value, 
                                  "0" = "left",
                                  "1" = "right"))
polblogs_tbl_new

It’s not necessary to store ideology as a factor, but generally good
practice in `R`. Whenever you have a variable with just a few repeating
character values, a factor is more efficient.
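
A rough way to see the efficiency argument, assuming a made-up vector of 10,000 labels (`ideology_chr` and `ideology_fct` are hypothetical names): a factor stores each value as a small integer code plus one shared set of levels.

```r
# hypothetical data: 10,000 repeated labels
ideology_chr <- rep(c("left", "right"), times = 5000)
ideology_fct <- factor(ideology_chr)

# a factor is an integer vector with a levels attribute
typeof(ideology_fct)
levels(ideology_fct)

# compare the memory footprint of both representations
format(object.size(ideology_chr), units = "Kb")
format(object.size(ideology_fct), units = "Kb")
```

For a graph with about 1,500 nodes the saving is negligible, so the main benefit here is the fixed set of levels, which plotting functions can map to a discrete colour scale.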

## first insights

We can answer some initial questions about the data:

-   how many left and right blogs are there?

In [None]:
polblogs_tbl_new |> 
  as_tibble() |> 
  count(ideology)

## first insights

We can answer some initial questions about the data:

-   how many connections (edges) do nodes have to other nodes?

In [None]:
polblogs_tbl_new |> 
  activate(edges) |> 
  as_tibble() |> 
  count(from, sort = TRUE)

Let’s have a closer look at the top node, the blog with the most outgoing links:

In [None]:
polblogs_tbl_new |> 
  filter(id == 855) |> 
  as_tibble()

In [None]:
polblogs_tbl_new |> 
  activate(edges) |> 
  as_tibble() |> 
  count(to, sort = TRUE)

Let’s have a closer look at the top node, the blog that receives the most links:

In [None]:
polblogs_tbl_new |> 
  filter(id == 155) |> 
  as_tibble()

## first insights

We can answer some initial questions about the data:

-   do left and right blogs reference each other?

In [None]:
polblogs_tbl_new |> 
  activate(edges) |> 
  mutate(from_ideo = .N()$ideology[from],
         to_ideo = .N()$ideology[to]) |> 
  as_tibble() |> 
  count(from_ideo, to_ideo)
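
The `.N()` helper used above gives access to the node table while the edges are active. Indexing a node column with `from` or `to` then looks up the attribute of each edge’s endpoints. A minimal sketch on a made-up graph (`toy` and its values are hypothetical):

```r
library(tidygraph)
library(dplyr)

# hypothetical toy graph: nodes 1 and 2 are "left", node 3 is "right"
toy <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c"),
                     ideology = c("left", "left", "right")),
  edges = data.frame(from = c(1, 1, 2), to = c(2, 3, 3)),
  directed = TRUE
)

toy_edge_ideo <- toy |> 
  activate(edges) |> 
  # .N() returns the node table; from/to are row positions in that table
  mutate(from_ideo = .N()$ideology[from],
         to_ideo = .N()$ideology[to]) |> 
  as_tibble()
toy_edge_ideo
```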

## visualising graphs in `R`

-   `ggraph` inherits the idea of a grammar of graphics from `ggplot2`
-   hence, we build up plots in layers with visual aesthetics mapped to
    data
-   the difference is that we map data from both the nodes and the edges
    table

<figure>
<img src="https://ggraph.data-imaginist.com/reference/figures/logo.png"
alt="ggraph" />
<figcaption aria-hidden="true">ggraph</figcaption>
</figure>

In [None]:
ggraph(graph = polblogs_tbl_new, layout = "kk") + 
  geom_edge_link() + 
  geom_node_point(aes(colour = ideology))

## Recreate the plot from Adamic and Glance (2005)

In [None]:
polblogs_tbl_new |> 
  # you can sample the graph to make plotting quicker (but incomplete)
  # sample_frac(0.1) |> 
  # the size of the bubbles is influenced by the number of blogs that link to it
  mutate(referenced = centrality_degree(mode = "in", loops = FALSE)) |> 
  # remove isolated nodes
  activate(nodes) |>
  filter(!node_is_isolated()) |>
  # the colour of edges is influenced by whether the connection is left, right 
  # or bipartisan
  activate(edges) |> 
  mutate(col = case_when(
    .N()$ideology[from] == "left" & .N()$ideology[to] == "left" ~ "#2F357E",
    .N()$ideology[from] == "right" & .N()$ideology[to] == "right" ~ "#D72F32",
    .N()$ideology[from] == "left" & .N()$ideology[to] == "right" ~ "#f4c23b",
    .N()$ideology[from] == "right" & .N()$ideology[to] == "left" ~ "#f4c23b"
  )) |> 
  # the stress majorization layout in ggraph is the closest match to the
  # force-directed layout of the original plot
  ggraph(layout = "stress") +
  geom_edge_link(aes(colour = col),
                 arrow = arrow(length = unit(2, "mm"), type = "closed")) +
  # we map the number of references to the size
  geom_node_point(aes(fill = ideology, size = referenced),
                  # black border and a different shape creates bubbles
                  colour = "black", pch = 21) +
  scale_fill_manual(values = c(left = "#2F357E", right = "#D72F32")) +
  scale_edge_colour_identity() + 
  theme_graph()

## volatile layouts

One thing that always makes me cautious about interpreting network plots
is how volatile the placement of nodes in the plot is and how much it
can trick you into finding a pattern where none exists. So let’s look at
an experiment:

Prepare data for plotting:

In [None]:
set.seed(1)
plot_data <- polblogs_tbl_new |> 
  # you can sample the graph to make plotting quicker (but incomplete)
  sample_frac(0.25) |> 
  mutate(referenced = centrality_degree(mode = "in", loops = FALSE)) |> 
  activate(nodes) |>
  filter(!node_is_isolated()) |>
  activate(edges) |> 
  mutate(col = case_when(
    .N()$ideology[from] == "left" & .N()$ideology[to] == "left" ~ "#2F357E",
    .N()$ideology[from] == "right" & .N()$ideology[to] == "right" ~ "#D72F32",
    .N()$ideology[from] == "left" & .N()$ideology[to] == "right" ~ "#f4c23b",
    .N()$ideology[from] == "right" & .N()$ideology[to] == "left" ~ "#f4c23b"
  ))

Get all available layouts:

In [None]:
layouts <- c(
  "auto",       # Automatic layout
  "circle",     # Circular layout
  "dh",         # Davidson-Harel layout
  "drl",        # Distributed Recursive Layout
  "fr",         # Fruchterman-Reingold layout
  "gem",        # GEM layout
  "graphopt",   # Graphopt layout
  "grid",       # Grid layout
  "kk",         # Kamada-Kawai layout
  "lgl",        # Large Graph Layout
  "linear",     # Linear layout
  "mds",        # Multidimensional Scaling layout
  "randomly",   # Random layout
  "sphere",     # Spherical layout
  "star",       # Star layout
  "stress",     # Stress majorization layout
  "sugiyama",   # Sugiyama layout (for layered graphs)
  "tree"        # Tree layout
)
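
Before plotting, it can help to remember that a layout is nothing more than a set of x/y coordinates computed by an algorithm. A minimal sketch, assuming a made-up ring graph (`toy_ring` is hypothetical): `create_layout()` from `ggraph` returns these coordinates, and two layouts place the very same nodes in different spots.

```r
library(tidygraph)
library(ggraph)

# hypothetical toy graph: 10 nodes connected in a ring
toy_ring <- create_ring(10)

# compute node positions with two different layout algorithms
layout_circle <- create_layout(toy_ring, layout = "circle")
layout_grid <- create_layout(toy_ring, layout = "grid")

# same nodes, same edges, different coordinates
data.frame(
  node = seq_len(10),
  x_circle = round(layout_circle$x, 2),
  y_circle = round(layout_circle$y, 2),
  x_grid = layout_grid$x,
  y_grid = layout_grid$y
)
```

Nothing about the graph itself changes between the two calls, which is why apparent clusters in a plot should be checked against the data rather than read off the picture.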

## volatile layouts

In [None]:
dir.create("media/layouts", recursive = TRUE, showWarnings = FALSE)
for (layout in layouts) {
  
  # plot status message in interactive sessions
  if (interactive()) {
    message("plotting using layout ", layout)
  }
  
  plot_f <- paste0("media/layouts/network_", layout, ".png")
  
  if (!file.exists(plot_f)) {
    p <- plot_data |> 
      ggraph(layout = layout) +
      geom_edge_link(aes(colour = col),
                     arrow = arrow(length = unit(2, "mm"), type = "closed")) +
      # we map the number of references to the size
      geom_node_point(aes(fill = ideology, size = referenced),
                      # black border and a different shape creates bubbles
                      colour = "black", pch = 21) +
      scale_fill_manual(values = c(left = "#2F357E", right = "#D72F32")) +
      scale_edge_colour_identity() + 
      theme_graph() +
      labs(caption = paste0("Layout: ", layout))
    ggsave(filename = plot_f, plot = p, width = 7, height = 7)
  }
  
}
gif_file <- list.files("media/layouts/", full.names = TRUE) |> 
  gifski::gifski(gif_file = "media/layouts.gif")
knitr::include_graphics(gif_file)

## community detection

In the previous figure we used the political orientation of blogs
manually assigned by Adamic and Glance (2005). Usually, we want to
detect communities from the network structure itself. We learned about
the <span class="kn-pink">Louvain</span> and
<span class="kn-pink">Leiden</span> algorithms (and about their
downsides). So let’s start with these.

In [None]:
polblogs_tbl_new_grouped <- polblogs_tbl_new |> 
  activate(nodes) |>
  # to_undirected is a tidygraph morpher, so it needs to be applied via convert()
  convert(to_undirected) |> 
  mutate(group_louvain = group_louvain(),
         group_leiden = group_leiden())

polblogs_tbl_new_grouped |> 
  as_tibble()
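
To get a feeling for what such an algorithm returns, here is a minimal sketch on a made-up graph where the answer is obvious: two fully connected groups of five nodes, joined by a single bridge (`toy_cliques` is hypothetical, and the helper functions from `igraph` are only used to build the example).

```r
library(tidygraph)
library(dplyr)

set.seed(1)
toy_cliques <- igraph::make_full_graph(5) |> 
  # add a second clique of five nodes (these become nodes 6 to 10)
  igraph::disjoint_union(igraph::make_full_graph(5)) |> 
  # a single bridge between the two cliques
  igraph::add_edges(c(5, 6)) |> 
  as_tbl_graph() |> 
  # group_louvain() returns one integer group label per node
  mutate(community = group_louvain())

toy_communities <- toy_cliques |> 
  as_tibble() |> 
  count(community)
toy_communities
```

On real data the structure is rarely this clear-cut, so it is worth comparing the detected groups to what you know about the nodes.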

## community detection

In [None]:
polblogs_tbl_new_grouped |> 
  activate(nodes) |>
  mutate(group_louvain = factor(group_louvain()),
         group_leiden = factor(group_leiden())) |> 
  # the size of the bubbles is influenced by the number of blogs that link to it
  mutate(referenced = centrality_degree(mode = "in", loops = FALSE)) |> 
  # remove isolated nodes
  filter(!node_is_isolated()) |>
  ggraph(layout = "stress") +
  geom_edge_link(colour = "gray",
                 arrow = arrow(length = unit(2, "mm"), type = "closed")) +
  # we map the number of references to the size
  geom_node_point(aes(fill = group_louvain, size = referenced),
                  # black border and a different shape creates bubbles
                  colour = "black", pch = 21) +
  theme_graph() +
  labs(title = "Blogosphere grouped by Louvain")

## community detection

In [None]:
polblogs_tbl_new_grouped |> 
  activate(nodes) |>
  mutate(group_louvain = factor(group_louvain()),
         group_leiden = factor(group_leiden())) |> 
  # the size of the bubbles is influenced by the number of blogs that link to it
  mutate(referenced = centrality_degree(mode = "in", loops = FALSE)) |> 
  # remove isolated nodes
  filter(!node_is_isolated()) |>
  ggraph(layout = "stress") +
  geom_edge_link(colour = "gray",
                 arrow = arrow(length = unit(2, "mm"), type = "closed")) +
  # we map the number of references to the size
  geom_node_point(aes(fill = group_leiden, size = referenced),
                  # black border and a different shape creates bubbles
                  colour = "black", pch = 21, show.legend = FALSE) +
  theme_graph() +
  labs(title = "Blogosphere grouped by Leiden")

# Bluesky: What is my Bluesky network?

Before we start working with Bluesky, you should authenticate your session with an app password.
To obtain the password, visit <https://bsky.app/settings/app-passwords>:

In [None]:
auth("jbgruber.bsky.social", "your_password")

When you are new to Bluesky you should start by searching a few names
that you know. There are also starter packs like
<https://bsky.app/starter-pack/sof14g1l.bsky.social/3lbc4bqetfp22> which
you can follow. But what then? Given the idea of homophily, you might
want to check who the people you follow, follow themselves. So let’s get
started with that (replace my handle with yours below if you like):

In [None]:
my_follows <- get_follows(actor = "jbgruber.bsky.social", limit = Inf)
my_follows |>
  glimpse()

Next, we want to see who they follow:

In [None]:
their_follows <- my_follows$actor_handle |> 
  # the 50 accounts at the bottom of the list are the ones I followed first
  tail(50) |> 
  map(function(handle) {
  # for demo purposes I add a limit since some follow 1000s
  tibble(
    from = handle,
    to =  get_follows(handle, limit = 1000L, verbose = FALSE)$actor_handle
  )
}, .progress = TRUE) |> 
  bind_rows()
saveRDS(their_follows, "data/their_follows.rds")

In [None]:
if (!file.exists("data/their_follows.rds")) {
  curl::curl_download(
    "https://www.dropbox.com/scl/fi/jc13aic7daa8z6ullv491/their_follows.rds?rlkey=7yp63480tk613ofm83j7sm1ei&st=9bqmj47b&dl=1",
    "data/their_follows.rds"
  )
}
their_follows <- readRDS("data/their_follows.rds")

As a first step, we can simply count which accounts show up most
often here:

In [None]:
their_follows |> 
  count(to) |> 
  filter(to != "handle.invalid") |> 
  slice_max(order_by = n, n = 15) |> 
  mutate(to = fct_reorder(to, n)) |> 
  ggplot(aes(x = n, y = to)) +
  geom_col()

But we can also use network analysis to find influential accounts:

In [None]:
follows_graph <- as_tbl_graph(their_follows, directed = TRUE) |> 
  activate(nodes) |>
  mutate(
    degree = centrality_degree(),
    closeness = centrality_closeness(),
    betweeness = centrality_betweenness(),
    eigenvector = centrality_eigen()
  )

Let’s look at **degree** centrality (simply counts the number of
neighbors a node has; in a directed graph, `centrality_degree()` counts
outgoing edges by default):

In [None]:
follows_graph |> 
  as_tibble() |> 
  arrange(degree) |> 
  slice_max(degree, n = 10)

**closeness** (based on the shortest path distances between nodes: the
most central node has the smallest total distance to all other nodes)

In [None]:
follows_graph |> 
  as_tibble() |> 
  arrange(closeness) |> 
  slice_max(closeness, n = 10)

**betweenness** (number of shortest paths that pass through a node,
divided by the total number of shortest paths)

In [None]:
central_accounts <- follows_graph |> 
  as_tibble() |> 
  arrange(betweeness) |> 
  slice_max(betweeness, n = 10)
central_accounts

and **eigenvector centrality** (extends the idea of degree by assuming
that a node is central if it is connected to other central nodes)

In [None]:
follows_graph |> 
  as_tibble() |> 
  arrange(eigenvector) |> 
  slice_max(eigenvector, n = 10)
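
The four measures are easier to compare on a graph small enough to check by hand. A minimal sketch, assuming a made-up undirected path of five nodes (1 - 2 - 3 - 4 - 5; `toy_path` is hypothetical):

```r
library(tidygraph)

# hypothetical toy graph: five nodes in a line
toy_path <- create_path(5, directed = FALSE) |> 
  mutate(
    degree = centrality_degree(),
    closeness = centrality_closeness(),
    betweenness = centrality_betweenness(),
    eigenvector = centrality_eigen()
  ) |> 
  as_tibble()
toy_path
```

The two end nodes have one neighbor each and lie on no shortest path between other nodes, while the middle node sits on the most shortest paths and has the smallest total distance to everyone else.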

We can also visualise this network and highlight some of the accounts in
it:

In [None]:
follows_graph |> 
  mutate(central_account = name %in% central_accounts$name) |> 
  slice_max(eigenvector, n = 500) |> 
  ggraph(layout = "mds") +
  geom_edge_link(alpha = 0.7) +
  geom_node_point(aes(size = betweeness, color = central_account)) +
  scale_color_manual(values = c("TRUE" = "firebrick", "FALSE" = "lightblue")) +
  geom_node_label(aes(label = ifelse(central_account, name, NA)), vjust = 1, hjust = 1) +
  theme_graph() +
  coord_equal(clip = "off")

For a different approach to find people to follow, you can check out
<https://www.johannesbgruber.eu/post/2024-11-24-bluesky-rising/>.

# Bluesky: Checking out the #rstats network

First let’s get some data. The code below searches for posts that
mention the hashtag #rstats, which is widely used for all things R:

In [None]:
rstats_content <- search_post("#rstats", since = "2025-04-01", limit = Inf)
saveRDS(rstats_content, "data/rstats_content.rds")

In [None]:
if (!file.exists("data/rstats_content.rds")) {
  curl::curl_download(
    "https://www.dropbox.com/scl/fi/jc13aic7daa8z6ullv491/their_follows.rds?rlkey=7yp63480tk613ofm83j7sm1ei&st=7hsyjz50&dl=1",
    "data/rstats_content.rds"
  )
}
rstats_content <- readRDS("data/rstats_content.rds")

## Semantic Network/Co-hashtag Network

The first network we can build from this data is a co-occurrence network.
We check which hashtags are used together in these posts. In other
words: the hashtags are nodes, and an edge connects two hashtags that
were used in the same post. As a first step, let’s get some info on the most
popular hashtags:

In [None]:
rstats_tags <- rstats_content |> 
  unnest_longer(tags) |> 
  mutate(hashtag = tolower(tags))

rstats_tags_count <- rstats_tags |> 
  count(hashtag)

rstats_tags_count |> 
  slice_max(order_by = n, n = 10) |> 
  mutate(hashtag = fct_reorder(hashtag, n)) |> 
  ggplot(aes(x = n, y = hashtag)) +
  geom_col()

Next, we find co-occurrences of hashtags. You could also do this with
words, but the result is often less meaningful.

In [None]:
rstats_tags_cooc <- rstats_tags |> 
  # for each post (or rather for each unique post ID), we get all possible 
  # combinations of hashtags
  group_by(cid) |> 
  group_map(function(df, ...) expand_grid(from = df$hashtag, to = df$hashtag)) |> 
  bind_rows() |> 
  # we filter out cases where a hashtag 'co-occurs' with itself
  filter(from != to)
rstats_tags_cooc
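
To see what the `expand_grid()` step does, here is a minimal sketch with two made-up posts (`toy_tags` and its values are hypothetical). The first post has three hashtags, the second has two:

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: one row per post and hashtag
toy_tags <- tibble(
  cid = c("p1", "p1", "p1", "p2", "p2"),
  hashtag = c("rstats", "dataviz", "ggplot2", "rstats", "python")
)

toy_cooc <- toy_tags |> 
  group_by(cid) |> 
  # all combinations of hashtags within each post
  group_map(function(df, ...) expand_grid(from = df$hashtag, to = df$hashtag)) |> 
  bind_rows() |> 
  # drop combinations of a hashtag with itself
  filter(from != to)
toy_cooc
```

Note that every pair appears in both orders (for example rstats/dataviz and dataviz/rstats). That is harmless for drawing an undirected network, but if you want each pair only once, filtering with `from < to` instead of `from != to` is one option.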

Now you should notice this looks like a network already. Let’s make it
one:

In [None]:
rstats_tags_network <- tbl_graph(edges = rstats_tags_cooc, directed = FALSE) |> 
  activate(nodes) |> 
  left_join(rstats_tags_count, by = c("name" = "hashtag"))
rstats_tags_network
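
The `left_join()` above works because `tbl_graph()` stores the hashtags from the edge list in a node column called `name`. A minimal sketch with made-up values (`toy_tag_edges`, `toy_tag_counts` and `toy_tag_net` are hypothetical):

```r
library(tidygraph)

# hypothetical edge list and counts
toy_tag_edges <- data.frame(from = c("rstats", "rstats"),
                            to = c("dataviz", "python"))
toy_tag_counts <- data.frame(hashtag = c("rstats", "dataviz", "python"),
                             n = c(10, 4, 2))

toy_tag_net <- tbl_graph(edges = toy_tag_edges, directed = FALSE) |> 
  activate(nodes) |> 
  # nodes are created from the edge list and identified by the name column
  left_join(toy_tag_counts, by = c("name" = "hashtag"))

toy_tag_nodes <- toy_tag_net |> 
  as_tibble()
toy_tag_nodes
```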

We can try a few things to turn this into a nice plot. Let’s get a
baseline first:

In [None]:
rstats_tags_network |> 
  activate(nodes) |> 
  # to make it easier to look at, let's limit ourselves to top hashtags
  slice_max(order_by = n, n = 50) |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  # instead of nodes, I use labels directly here
  geom_node_label(aes(label = name)) +
  theme_graph()

We can use color to show the centrality of nodes. You can play around
with which measure produces the most interesting visual.

In [None]:
rstats_tags_network |> 
  activate(nodes) |> 
  mutate(
    degree = centrality_degree(),
    closeness = centrality_closeness(),
    betweeness = centrality_betweenness(),
    eigenvector = centrality_eigen()
  ) |> 
  slice_max(order_by = n, n = 50) |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  geom_node_label(aes(label = name, fill = eigenvector)) +
  scale_fill_continuous(low = "lightblue", high = "firebrick") +
  theme_graph()

We can also see if we can find some communities in this data. As above,
we are using the Louvain and Leiden algorithms:

In [None]:
rstats_tags_network_communities <- rstats_tags_network |> 
  activate(nodes) |> 
  mutate(group_louvain = as.factor(group_louvain()),
         group_leiden = as.factor(group_leiden()))
rstats_tags_network_communities |> 
  as_tibble()

Here is the network grouped by Louvain:

In [None]:
rstats_tags_network_communities |> 
  slice_max(order_by = n, n = 100) |> 
  ggraph(layout = "mds") +
  geom_edge_link() +
  geom_node_label(aes(label = name, fill = group_louvain), show.legend = FALSE) +
  theme_graph() +
  coord_equal(clip = "off")

And here it is grouped by the Leiden algorithm:

In [None]:
rstats_tags_network_communities |> 
  slice_max(order_by = n, n = 100) |> 
  ggraph(layout = "mds") +
  geom_edge_link() +
  geom_node_label(aes(label = name, fill = group_leiden), show.legend = FALSE) +
  theme_graph() +
  coord_equal(clip = "off")

``` python
plt.figure(figsize=(15, 12))
pos = nx.spring_layout(hashtag_subgraph, k=2, iterations=50)

# Color nodes by Leiden communities
community_colors = plt.cm.Set1(np.linspace(0, 1, len(leiden_communities)))
node_colors = [community_colors[leiden_dict.get(node, 0)] for node in hashtag_subgraph.nodes()]

nx.draw_networkx_nodes(hashtag_subgraph, pos, node_color=node_colors, node_size=300)
nx.draw_networkx_edges(hashtag_subgraph, pos, alpha=0.3, width=0.5)
nx.draw_networkx_labels(hashtag_subgraph, pos, font_size=8)

plt.title('Hashtag Network - Leiden Communities', fontsize=16)
plt.axis('off')
plt.tight_layout()
plt.show()
```

## Follower Network

Within the #rstats content, we can also check the community that posts
this content. We first get the user handles from the data:

In [None]:
rstats_users <- rstats_content |> 
  distinct(author_handle)

Now we query who follows these users who post about #rstats:

In [None]:
follower_data <- rstats_users |> 
  mutate(followed_by = map(author_handle, function(a) {
    # not sure why, but some handles errored when I ran this
    try(get_followers(a, verbose = FALSE)$actor_handle, silent = TRUE)
  }, .progress = TRUE))
saveRDS(follower_data, "data/follower_data.rds")

In [None]:
if (!file.exists("data/follower_data.rds")) {
  curl::curl_download(
    "https://www.dropbox.com/scl/fi/j9t1uejgs6uqdurfo6fdb/follower_data.rds?rlkey=uueaembzfp0xznxmv3art3zg4&st=u96f11ae&dl=1",
    "data/follower_data.rds"
  )
}
follower_data <- readRDS("data/follower_data.rds")

Who follows the most accounts contributing to #rstats?

In [None]:
follower_data |> 
  filter(map_lgl(followed_by, function(f) !is(f, "try-error"))) |> 
  unnest_longer(followed_by) |> 
  count(followed_by) |> 
  slice_max(order_by = n, n = 10) |> 
  mutate(followed_by = fct_reorder(followed_by, n)) |> 
  ggplot(aes(x = n, y = followed_by)) +
  geom_col()
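
The filter on `"try-error"` is what keeps failed requests out of the result. A minimal sketch of the same pattern without any API calls, using a made-up function that fails for one input (`flaky_lookup` is hypothetical; `inherits()` is the base R way to test for the class):

```r
library(purrr)

# hypothetical stand-in for an API call that sometimes fails
flaky_lookup <- function(handle) {
  if (handle == "bad") stop("lookup failed")
  toupper(handle)
}

# try() returns an object of class "try-error" instead of stopping the loop
results <- map(c("a", "bad", "c"), function(h) try(flaky_lookup(h), silent = TRUE))

# keep only the successful results
succeeded <- map_lgl(results, function(r) !inherits(r, "try-error"))
succeeded
results[succeeded]
```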

We can turn this data into a directed network where the nodes are users
and each edge links an account to one of its followers (so edges point
from the followed account to the follower):

In [None]:
follower_network <- follower_data |> 
  filter(map_lgl(followed_by, function(res) !is(res, "try-error"))) |> 
  unnest_longer(followed_by, values_to = "to") |> 
  rename(from = author_handle) |> 
  tbl_graph(edges = _, directed = TRUE)

In [None]:
follower_network |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  geom_node_point() +
  theme_graph()

This is a mess, since there are so many nodes in the network now. We can
again look for the most central nodes (aka users) in this network:

In [None]:
follower_network_central <- follower_network |> 
  mutate(
    degree = centrality_degree(),
    closeness = centrality_closeness(),
    betweeness = centrality_betweenness(),
    eigenvector = centrality_eigen()
  )

follower_network_central |>
  arrange(-degree) |> 
  as_tibble()

Now let’s try only the most central accounts:

In [None]:
most_central <- follower_network_central |> 
  slice_max(eigenvector, n = 5) |> 
  activate(nodes) |> 
  as_tibble()

follower_network_central |> 
  slice_max(eigenvector, n = 100) |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  geom_node_point(color = "firebrick") +
  geom_node_label(aes(label = ifelse(name %in% most_central$name,
                                     name,
                                     NA))) +
  theme_graph()

## Mention Network

We can also construct a network with users as nodes and mentions as the
edges. For this, let’s extract mentions first:

In [None]:
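rstats_mentions <- rstats_content |> 
  filter(str_detect(text, "@")) |> 
  # Bluesky handles contain dots and hyphens, which "\\w+" alone would cut off;
  # the lookbehind drops the "@" so mentions match the format of author_handle
  mutate(mentions = str_extract_all(text, "(?<=@)[\\w-]+(?:\\.[\\w-]+)+")) |> 
  select(from = author_handle, to = mentions) |> 
  unnest_longer(to)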
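
Bluesky handles contain dots (and sometimes hyphens), so it is worth checking what a mention pattern actually captures. A minimal sketch on a made-up post (both handles are hypothetical):

```r
library(stringr)

# hypothetical post text
toy_text <- "Thanks @alice.example.com and @bob-smith.bsky.social for the #rstats tips."

# "\\w" matches letters, digits and underscores, so the match stops at the
# first dot or hyphen
mentions_short <- str_extract_all(toy_text, "@\\w+")[[1]]
mentions_short

# allowing dots and hyphens captures the full handle; the lookbehind (?<=@)
# leaves out the "@" itself
mentions_full <- str_extract_all(toy_text, "(?<=@)[\\w-]+(?:\\.[\\w-]+)+")[[1]]
mentions_full
```

The second pattern would also pick up the domain part of an email address, so a quick look at the extracted values is still a good idea.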

Let’s see who is mentioned most:

In [None]:
rstats_mentions |> 
  count(to) |>
  slice_max(order_by = n, n = 10) |> 
  mutate(to = fct_reorder(to, n)) |> 
  ggplot(aes(x = n, y = to)) +
  geom_col()

The network follows naturally at this point:

In [None]:
rstats_mentions_network <- tbl_graph(edges = rstats_mentions, directed = TRUE)
rstats_mentions_network |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  geom_node_point() +
  theme_graph() +
  labs(title = "mention network in #rstats posts")

## Share/repost Network

Another way to construct a network from this data is by looking at
reposts. Here the nodes are users and an edge links the author of a post
to an account that reposted it:

In [None]:
repost_data <- rstats_content |> 
  filter(repost_count > 0) |> 
  mutate(
    reposts = map(uri, function(u) {
      get_reposts(u, limit = Inf, verbose = FALSE)
    }, .progress = TRUE),
    reposted_by = map(reposts, "actor_handle")
  )
saveRDS(repost_data, "data/repost_data.rds")

In [None]:
if (!file.exists("data/repost_data.rds")) {
  curl::curl_download(
    "https://www.dropbox.com/scl/fi/gt93b4tgfyez3m4imtwdz/repost_data.rds?rlkey=m07wa09c41x8ag4l82o1d1c4e&st=ys107vd1&dl=1",
    "data/repost_data.rds"
  )
}
repost_data <- readRDS("data/repost_data.rds")

In [None]:
repost_network <- repost_data |> 
  unnest_longer(reposted_by, values_to = "to") |> 
  rename(from = author_handle) |> 
  tbl_graph(edges = _, directed = TRUE)

In [None]:
repost_network |> 
  ggraph(layout = "mds") +
  geom_edge_link() +
  geom_node_point(color = "firebrick") +
  theme_graph()

Let’s look at centrality again:

In [None]:
repost_network_central <- repost_network |> 
  mutate(
    degree = centrality_degree(),
    closeness = centrality_closeness(),
    betweeness = centrality_betweenness(),
    eigenvector = centrality_eigen()
  )

repost_network_central |>
  arrange(-degree) |> 
  as_tibble()

Let’s once again see which are the most central accounts:

In [None]:
most_central <- repost_network_central |> 
  slice_max(degree, n = 5) |> 
  activate(nodes) |> 
  as_tibble()

repost_network_central |> 
  slice_max(degree, n = 100) |> 
  ggraph(layout = "stress") +
  geom_edge_link() +
  geom_node_point(color = "firebrick") +
  geom_node_label(aes(label = ifelse(name %in% most_central$name,
                                     name,
                                     NA))) +
  theme_graph()

Let’s see if we can find any sensible communities in this network:

In [None]:
repost_network_central |> 
  activate(nodes) |>
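  # to_undirected is a tidygraph morpher, so it needs to be applied via convert()
  convert(to_undirected) |> 
  # store the groups as factors so they map to discrete colours
  mutate(group_louvain = factor(group_louvain()),
         group_leiden = factor(group_leiden())) |> 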
  ggraph(layout = "mds") +
  geom_edge_link() +
  geom_node_point(aes(color = group_louvain)) +
  geom_node_label(aes(label = ifelse(name %in% most_central$name,
                                     name,
                                     NA))) +
  theme_graph()