# Introduction

In the previous sessions, you learned how to construct scientometric networks in Python, which can be quite challenging. VOSViewer takes care of much of the work involved in creating such networks. You can therefore create networks in VOSViewer, export them, and analyse them further in Python. That is the approach we take here.

## VOSViewer

You have previously constructed scientometric networks using VOSViewer. You can import the resulting network for further analysis in `igraph`. In order to import the file in `igraph` you need to have saved both the `map` file and the `network` file in VOSViewer. See the manual of VOSViewer for more explanation. As in the previous Python notebook, we have prepared some files for you, in this case the author collaboration network from the Web of Science files that we analysed previously.

We first import the necessary packages, which you will probably recognize from the previous Python notebook.

In [None]:
import pandas as pd
import igraph as ig

Now let us read the map and network file from VOSViewer.

In [None]:
map_df = pd.read_csv('data-files/vosviewer/vosviewer_map.txt', sep='\t')
# The network file from VOSViewer has no header, so we set it manually
network_df = pd.read_csv('data-files/vosviewer/vosviewer_network.txt', sep='\t', header=None,
                         names=['idA', 'idB', 'weight'])

Now we have loaded the data, so we can simply construct a network as before.

In [None]:
G_vosviewer = ig.Graph.DictList(
      vertices=map_df.to_dict('records'),
      edges=network_df.to_dict('records'),
      vertex_name_attr='id',
      edge_foreign_keys=('idA', 'idB'),
      directed=False
      )

The layout and clustering are also stored by VOSViewer, and we can use them to display the same visualization in `igraph`.

In [None]:
layout = ig.Layout(coords=list(zip(G_vosviewer.vs['x'], G_vosviewer.vs['y'])))
clustering = ig.VertexClustering.FromAttribute(G_vosviewer, 'cluster')

ig.plot(clustering, layout=layout, vertex_size=4, vertex_frame_width=0, vertex_label=None)

## Community detection

A common phenomenon in many networks is the presence of group structure, where nodes within the same group are densely connected to each other. Such a structure is sometimes called a *modular* structure, and a frequently used measure of group structure is known as *modularity*. You have already encountered this functionality briefly in VOSViewer, which provides clusters. Here we will explore it in more depth.
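
To make the idea of modularity more concrete, here is a minimal, dependency-free sketch that computes it for a toy network of two triangles joined by a single link. The helper `modularity` and the toy edge list are illustrative assumptions, not part of `igraph` or `leidenalg`; the sketch uses the standard definition, in which each community contributes its share of internal link weight minus the share expected from the total link strength of its nodes.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Modularity of an undirected, weighted graph for a given membership list.

    Hypothetical helper for illustration; edges are (source, target, weight).
    """
    total_weight = sum(w for _, _, w in edges)   # total link weight m
    internal = defaultdict(float)                # link weight inside each community
    strength = defaultdict(float)                # summed node strength per community
    for u, v, w in edges:
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    # Sum over communities: observed internal share minus expected share
    return sum(internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
               for c in strength)

# Toy data: two triangles (0-1-2 and 3-4-5) joined by the link 2-3
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]

q_split = modularity(edges, [0, 0, 0, 1, 1, 1])   # each triangle is a community
q_merged = modularity(edges, [0, 0, 0, 0, 0, 0])  # all nodes in one community
print(q_split, q_merged)
```

Splitting the network into its two triangles gives a higher modularity than keeping all nodes together, which matches the intuition that the triangles are the densely connected groups.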

First, we will import a package called `leidenalg`, which implements the *Leiden algorithm* that we will use for community detection. It is built on top of `igraph`, so it integrates easily with all the existing methods of `igraph`.

In [None]:
import leidenalg

Now let us detect communities in the collaboration network from VOSViewer, using the weight of the edges. Because the algorithm is stochastic, it may yield somewhat different results every time you run it. To prevent that from happening, and to always get the same result, we will set the random seed to 0. The result is a `VertexClustering`, which we already briefly encountered when using the clustering results from VOSViewer.

We will first detect communities using *modularity*.

In [None]:
optimiser = leidenalg.Optimiser()
optimiser.set_rng_seed(0)
communities = leidenalg.ModularityVertexPartition(G_vosviewer, weights='weight')
optimiser.optimise_partition(communities)

The length of the `communities` variable indicates the number of communities.

In [None]:
len(communities)

When accessing the `communities` variable as a list, each element corresponds to the set of nodes in that community.

In [None]:
communities[30]

Hence, node `548`, node `1052`, etc. belong to community `30`. Another way to look at the communities is by looking at the `membership` of the `VertexClustering`.

In [None]:
communities.membership[:10]

Hence, node `0` belongs to community `7`, node `1` belongs to community `9`, node `2` belongs to community `4`, et cetera.
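
The two views contain the same information. As a small illustration, the sketch below converts a made-up membership list into a list of communities using plain Python; the variable names and the toy membership are assumptions for this example only.

```python
# Toy membership list: entry i is the community of node i (made-up data)
membership = [0, 0, 1, 0, 1, 2]

# Build one (initially empty) list of nodes per community
n_communities = max(membership) + 1
communities_as_lists = [[] for _ in range(n_communities)]

# Place every node in the list of the community it belongs to
for node, community in enumerate(membership):
    communities_as_lists[community].append(node)

print(communities_as_lists)
```

Indexing `communities_as_lists[0]` then plays the same role as indexing the `VertexClustering` above, while `membership` corresponds to its `membership` attribute.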

Let us take a closer look at the largest community.

In [None]:
H = communities.giant()
print(H.summary())

We could again detect communities using modularity in the largest community.

In [None]:
optimiser.set_rng_seed(0)
subcommunities = leidenalg.ModularityVertexPartition(H, weights='weight')
optimiser.optimise_partition(subcommunities)
ig.plot(subcommunities, vertex_size=5, vertex_label=None)

In general, modularity will continue to find subcommunities in this way. An alternative approach, the Constant Potts Model (CPM), does not suffer from that problem.

Let us detect communities using CPM. We do have to specify a parameter, called the `resolution_parameter`. As its name suggests, it specifies the resolution of the communities we would like to find. At a higher resolution we will tend to find smaller communities, while at a lower resolution we find larger communities. Let us use the resolution parameter 0.1.

In [None]:
optimiser.set_rng_seed(0)
communities = leidenalg.CPMVertexPartition(G_vosviewer,
                                     weights='weight',
                                     resolution_parameter=0.1)
optimiser.optimise_partition(communities)
communities.giant().vcount()
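
To see how the resolution parameter influences the size of the communities, the following sketch evaluates a CPM-style quality function by hand on a toy network. It assumes one common way of writing CPM for an undirected network with unit node sizes: each community contributes its internal link weight minus the resolution times the number of node pairs it contains. The helper `cpm_quality` and the toy data are hypothetical and only serve as illustration.

```python
def cpm_quality(edges, membership, resolution):
    """CPM-style quality: internal weight minus resolution * number of node pairs.

    Hypothetical helper for illustration; edges are (source, target, weight).
    """
    size = {}       # number of nodes per community
    internal = {}   # link weight inside each community
    for community in membership:
        size[community] = size.get(community, 0) + 1
    for u, v, w in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] = internal.get(membership[u], 0) + w
    return sum(internal.get(c, 0) - resolution * n * (n - 1) / 2
               for c, n in size.items())

# Toy data: two triangles (0-1-2 and 3-4-5) joined by the link 2-3
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
split = [0, 0, 0, 1, 1, 1]    # each triangle is a community
merged = [0, 0, 0, 0, 0, 0]   # all nodes in one community

for resolution in (0.1, 0.5):
    print(resolution,
          cpm_quality(edges, split, resolution),
          cpm_quality(edges, merged, resolution))
```

At the lower resolution the single large community scores slightly better, while at the higher resolution the two triangles score better. In this formulation the resolution acts as a density threshold: a group is only worth keeping together if it is denser than the resolution.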

<div class="alert alert-info">
Detect subcommunities in the largest community using CPM, using the same <code>resolution_parameter</code>. How many subcommunities do you find? How does that compare to modularity?
</div>

<div class="alert alert-info">
Try to find more subcommunities by specifying a higher <code>resolution_parameter</code>.
</div>

Modularity adapts itself to the network. In a sense that is convenient, because you then do not have to specify any parameters. On the other hand, it makes the definition of what a "community" is less clear.

CPM does not adapt itself to the network, and maintains the same definition across different networks. That is convenient, because it brings more clarity to what we mean by a "community". Whenever you try to find subcommunities using the same `resolution_parameter`, CPM should not find any subcommunities. In practice, it may happen that CPM still finds some subcommunities, in which case the original communities were actually not the best possible. The Leiden algorithm can be run for multiple iterations, and with each iteration, the chances are smaller that CPM would find such subcommunities. Modularity will always find subcommunities, independent of the number of iterations.

<div class="alert alert-info">
Try to optimise the partition with more iterations, as indicated below (<code>n_iterations=10</code>). Note that the function returns how much further it managed to improve the quality function, so if it returns <code>0.0</code>, it could not find any further improvement. Execute the cell repeatedly. Does it return <code>0.0</code> after some time?
</div>

In [None]:
optimiser.optimise_partition(communities, n_iterations=10)

Let us compare the communities that we detected in Python with the clustering results from VOSViewer.

We can summarize the overall similarity between the two partitions using the Normalised Mutual Information (NMI). The NMI varies between 0 and 1, and equals 1 if both partitions are identical.

In [None]:
communities.compare_to(clustering, method='nmi')
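
For intuition about what the NMI measures, here is a minimal pure-Python sketch. It assumes the normalisation in which twice the mutual information is divided by the sum of the two entropies; the helper names `entropy` and `nmi` and the toy membership lists are illustrative assumptions, and the sketch is not meant to replace `compare_to`.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a membership list."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalised mutual information between two membership lists."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # how often each pair of groups co-occurs
    mutual_information = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items())
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions place all nodes in a single group
    return 2 * mutual_information / (h_a + h_b)

# Same grouping with different labels, and a grouping that is unrelated to it
same_groups = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_groups, unrelated)
```

Note that the labels themselves do not matter: two partitions that group the nodes in the same way are identical for the NMI, even if the community numbers differ.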

There are some differences between the clustering from VOSViewer and the communities we detected in Python. These depend strongly on the resolution parameter used for each result. One other important difference is that VOSViewer uses *normalized* weights by default. It divides the weight of a link by the expected weight, assuming that the total link weight of each node remains the same. This normalization is sometimes referred to as the *association strength*. We also perform this normalization here.

In [None]:
total_weight = sum(G_vosviewer.es['weight'])
strength = G_vosviewer.vs['weight<Total link strength>']
# Divide each link weight by its expected weight: s_i * s_j / (2 * total weight)
G_vosviewer.es['weight_normalized'] = [
    e['weight'] / (strength[e.source] * strength[e.target] / (2 * total_weight))
    for e in G_vosviewer.es]
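
As a worked example of this normalization, the sketch below applies it to a made-up weighted triangle using plain Python. Here the total link strength of each node is computed from the toy edges themselves, whereas the cell above takes it from the VOSViewer map file; all names and numbers are assumptions for illustration.

```python
# Toy data: a weighted triangle, keyed by (source, target)
edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 1.0}

total_weight = sum(edges.values())

# Total link strength of each node: the summed weight of its links
strength = {}
for (u, v), w in edges.items():
    strength[u] = strength.get(u, 0.0) + w
    strength[v] = strength.get(v, 0.0) + w

# Observed weight divided by the expected weight s_u * s_v / (2 * total weight)
normalized = {
    (u, v): w / (strength[u] * strength[v] / (2 * total_weight))
    for (u, v), w in edges.items()
}
print(normalized)
```

A normalized weight above 1 means that the link is stronger than expected given how strongly connected its two end points are overall, and a value below 1 means it is weaker than expected.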

By default, VOSViewer uses a resolution of `1` for these normalized weights. If we now detect communities using these weights, you will see that the results align more closely with the VOSViewer results.

In [None]:
communities = leidenalg.find_partition(G_vosviewer, leidenalg.CPMVertexPartition, 
                                       weights='weight_normalized', resolution_parameter=1,
                                       n_iterations=10)

communities.compare_to(clustering, method='nmi')

In [None]:
member_df = pd.DataFrame({'python': communities.membership, 'vosviewer': clustering.membership})

In [None]:
# Cross-tabulate both memberships: rows are Python communities, columns are VOSViewer clusters
t = pd.crosstab(member_df['python'], member_df['vosviewer'])
t
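
One way to read such a cross-tabulation is to look up, for every community detected in Python, the VOSViewer cluster with which it shares the most nodes. The sketch below does this for two made-up membership lists using only the standard library; the variable names and data are assumptions for illustration.

```python
from collections import Counter

# Toy membership lists for the same six nodes (made-up data)
python_membership = [0, 0, 0, 1, 1, 2]
vosviewer_membership = [0, 0, 1, 1, 1, 1]

# Count how many nodes fall in each (python, vosviewer) pair of groups
overlap = Counter(zip(python_membership, vosviewer_membership))

# For every Python community, keep the VOSViewer cluster with the largest overlap
best_match = {}
for (py, vos), count in overlap.items():
    if py not in best_match or count > overlap[(py, best_match[py])]:
        best_match[py] = vos

print(overlap)
print(best_match)
```

If most of the nodes of each community end up in a single cluster, the two partitions largely agree; if the nodes are spread over many clusters, they differ substantially.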

Now let us explore community detection a bit further.

<div class="alert alert-info">
    Vary the <code>resolution_parameter</code> when detecting communities using the CPM method. What <code>resolution_parameter</code> seems reasonable to you, and why?
</div>

<div class="alert alert-info">
    Try to find a <code>resolution_parameter</code> such that the network separates into two large communities (and some remaining small communities). What is the cause of these two large communities? (Hint: examine the author names.)
</div>

<div class="alert alert-info">
Compare the co-authorship network that we created previously in Python to the network created in VOSViewer. What are the differences?
</div>

# Own analysis

<div class="alert alert-info">
Load your own data in VOSViewer and create a co-citation network of journals.
</div>

<div class="alert alert-info">
Detect communities in the journal co-citation network. What do you think the different communities mean?
</div>