Prisoner's Dilemma Network Analysis
---------------------------

Academic articles can have anywhere from 1 to more than 20 authors. Some fields tend to
have more collaborations than others. In this section the connectivity of the
authors within the prisoner's dilemma field is examined. Across the \uniquetitles
articles within the data set, the total number of unique authors is \authors.

Note that the authors' names had to be cleaned before this analysis could be carried out.
Journals use different conventions for writing an author's name. For this reason
the Levenshtein distance was used to calculate the difference between name
entries. The entries flagged by the Levenshtein distance were checked manually
before being replaced.
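
As an illustration of the cleaning step, the following is a minimal sketch of how near-duplicate names could be flagged with the Levenshtein distance. The function names, the threshold of 2 and the example names are assumptions made for this sketch; the notebook does not show the implementation that was actually used.

```python
import itertools


def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between strings a and b."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def flag_similar_names(names, threshold=2):
    """Return the pairs of names whose distance is at most `threshold`.

    The threshold is a hypothetical choice; flagged pairs are only
    candidates and still need a manual check before being merged.
    """
    return [(first, second)
            for first, second in itertools.combinations(names, 2)
            if levenshtein(first, second) <= threshold]


# Hypothetical name entries, not taken from the data set
flagged = flag_similar_names(["a. smith", "a smith", "b. jones"])
```

Here `flagged` contains only the pair `("a. smith", "a smith")`, which differs by a single character.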

The authors are represented as a network with a set of vertices \(V\) and a set of edges \(E\). The
\authors vertices represent the unique authors. Two vertices are connected
by an edge if and only if the two authors have written together. Weights have been
applied to both the vertices and the edges. A vertex's weight corresponds to
the number of papers the author has within the data set, and an edge's weight
to the number of times the two authors wrote together.
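
The two kinds of weight can be sketched on a toy example using only the standard library. The list of papers and the author names below are hypothetical; the notebook itself builds the weights from the data frame.

```python
import collections
import itertools

# Hypothetical toy data: each paper is given as the list of its authors
papers = [
    ["author a", "author b"],
    ["author a", "author b", "author c"],
    ["author d"],
]

vertex_weights = collections.Counter()  # number of papers per author
edge_weights = collections.Counter()    # number of joint papers per pair

for paper_authors in papers:
    # sort the names so that a pair is always stored in the same order
    unique_authors = sorted(set(paper_authors))
    vertex_weights.update(unique_authors)
    edge_weights.update(itertools.combinations(unique_authors, 2))
```

In this sketch `author a` and `author b` each have a vertex weight of 2 and share an edge of weight 2, while `author d` has no edges and would appear as an isolated vertex.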

In [20]:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
import itertools
import collections
import random 

%matplotlib inline

In [21]:
import numpy as np

In [22]:
import matplotlib.patches as mpatches
import matplotlib.lines as lines

legend_properties = {'weight':'bold'}

Preparing the data
------------------

In [23]:
df = pd.read_json('data_November_2018.json')

In [24]:
# names to lower case
df.author = df.author.str.lower()

In [25]:
authors = len(df.author.unique())
authors

6192

In [7]:
# file = open("/home/nightwing/rsc/Literature-Article/assets/authors.txt", 'w')
# file.write('{}'.format(authors))
# file.close()

In [26]:
# drop Arxiv
df = df[~(df['provenance']=='arXiv')]

In [27]:
authors = len(df.author.unique())
authors

3222

Co-authors
----------

In [28]:
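pairs = []
for _, d in df.groupby('unique_key'):
    # sort the names so that a pair of authors is always stored in the same order
    pairs.extend(itertools.combinations(sorted(d['author'].unique()), 2))
# count the number of papers each pair of authors wrote together
co_authors = collections.Counter(pairs)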

The Python library [networkx](https://networkx.github.io/) is used throughout
the notebook to create and analyse the network.

In [29]:
authors_num_papers = df.groupby(['author', 'unique_key']).size().reset_index().groupby('author').count()
authors_num_papers = authors_num_papers.drop(0, axis=1)

In [30]:
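G = nx.Graph()
# one vertex per unique author, weighted by the author's number of papers
for name, w in authors_num_papers['unique_key'].items():
    G.add_node(name, weight=int(w))
# one edge per pair of co-authors, weighted by their number of joint papers
for (author_a, author_b), w in co_authors.items():
    G.add_edge(author_a, author_b, weight=w)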

In [32]:
# file = open("/home/nightwing/rsc/Literature-Article/assets/prisoners_edges.txt", 'w')
# file.write('{}'.format(len(G.edges())))
# file.close()

In [33]:
number_of_author = len(df.author.unique())
number_of_author 

3222

In [34]:
nx.write_gml(G, "../data/prisoners_network.gml")

Illustrating co-authors network
------------------------

In [18]:
fig = plt.figure(figsize=(60, 40))

pos = nx.spring_layout(G)
nodes = nx.draw_networkx_nodes(G, pos, linewidths=2, node_color='orange')
nodes.set_edgecolor('black')
nx.draw_networkx_edges(G, pos)

limits = plt.axis('off')
# plt.savefig("/home/nightwing/rsc/Literature-Article/assets/images/co-authors-network.pdf", format='pdf', bbox_inches='tight')

Analysing co-authors network
-----------------------------

**Connected components.**

In [38]:
# nx.isolates returns an iterator, so it is turned into a list before counting
isolated_authors = len(list(nx.isolates(G)))

In [39]:
with open("/home/nightwing/rsc/Literature-Article/assets/prisoners_isolated.txt", 'w') as file:
    file.write(str(isolated_authors))

In [40]:
number_connected_components = nx.number_connected_components(G)

In [41]:
with open("/home/nightwing/rsc/Literature-Article/assets/prisoners_connected_components.txt", 'w') as file:
    file.write(str(number_connected_components))

**Clustering Coefficient**

In [42]:
clustering_ceoff = round(nx.average_clustering(G), 3)

In [25]:
with open("/home/nightwing/rsc/Literature-Article/assets/prisoners_clustering.txt", 'w') as file:
    file.write(str(clustering_ceoff))

**Centrality**

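In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph
([Wikipedia](https://en.wikipedia.org/wiki/Centrality)). Two indicators are computed here: betweenness centrality and closeness centrality.

The names of the most central authors could also be illustrated on the graph.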
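
To make the idea of a centrality indicator concrete, here is a minimal sketch of closeness centrality on a toy network, written with the standard library only. It assumes a connected, unweighted graph stored as a hypothetical adjacency dictionary; the notebook relies on the networkx implementation, which may treat disconnected graphs differently.

```python
import collections


def closeness(adjacency, source):
    """Closeness of `source`: (reachable vertices) / (sum of distances to them).

    Assumes an unweighted graph given as {vertex: [neighbours]}.
    """
    # breadth-first search gives the shortest path length to every vertex
    distances = {source: 0}
    queue = collections.deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total > 0 else 0.0


# Hypothetical toy network: a path a - b - c
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

# rank the vertices from most to least central
ranking = sorted(adjacency, key=lambda v: closeness(adjacency, v), reverse=True)
```

The middle vertex `b` is ranked first because it is one step away from every other vertex, whereas `a` and `c` are two steps away from each other.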

In [26]:
betweeness = sorted(nx.betweenness_centrality(G, normalized=True).items(), 
                    key=lambda x:x[1], reverse=True)

In [27]:
dist = [b[1] for b in betweeness]

In [23]:
with open("pd_bc_dist.tex", 'w') as file:
    file.write(str(dist))

In [29]:
betweeness = pd.DataFrame(betweeness[0:5], columns=['Author name', 'Betweenness'])
betweeness['Author name'] = [name.title() for name in betweeness['Author name']]
betweeness.index += 1

In [32]:
closeness_rank = sorted(nx.closeness_centrality(G, normalized=True).items(), 
                   key=lambda x:x[1], reverse=True)

In [33]:
dist = [b[1] for b in closeness_rank]

In [34]:
with open("pd_cc_dist.tex", 'w') as file:
    file.write(str(dist))

In [36]:
closeness_rank = pd.DataFrame(closeness_rank[0:5], columns=['Author name', 'Closeness'])
closeness_rank['Author name'] = [name.title() for name in closeness_rank['Author name']]
closeness_rank.index += 1

In [37]:
for centrality, label in zip([betweeness, closeness_rank],
                             ['betweness', 'closeness']):
    path = "/home/nightwing/rsc/Literature-Article/assets/prisoners_centrality_{}.tex".format(label)
    # both objects are already data frames, so they are exported directly
    with open(path, 'w') as file:
        file.write(centrality.to_latex())