# Network Visualisation

## In this notebook:

1. We will create a basic network using networkx
2. We will learn how to manipulate aspects of that graph
3. We will use Twitter retweet data to make a graph 
4. There is an optional extra part where we can use Gephi to work with larger datasets


# Part 1: Basic network graphs

Networks can be used to show connections. This is a great way to visualise connections in social media and helps us to understand influence -- who is listening to whom.

## Questions & Objectives

* We are going to download the libraries required
* We will make an initial graph
* We will learn how to manipulate aspects of that graph
* We will try different projections of that graph

### 1. Firstly we will download the libraries 

In [None]:
# Uncomment these sections the first time you run this cell. This will update the libraries.
# If you don't do this you will get an error when we use the Twitter data to make larger graphs later down
# Once installed restart the kernel and comment out this section again.

#!pip install --upgrade networkx
#!pip install --upgrade scipy networkx

# Import the libraries we will use today
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib

### 2. We are going to draw an initial graph with a few nodes to learn the basics.

In [None]:
# networkx uses matplotlib (plt) to create the visualisation
# Here we set the visualisation size

fig, ax = plt.subplots(figsize=(15,8))

# We then set up the nodes in the graph (the circles) and the edges (the lines)
relationships = pd.DataFrame({'from': ['Luke','Clare', 'Clare', 'Clare', 'Laura'], 
                              'to':   ['Robin','Laura', 'Robin', 'Luke', 'Robin']})

# We then get networkx to create the graph
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# We then draw the graph -- we set it to have labels, green nodes and we set the node size
nx.draw(G, with_labels=True, node_color='green', node_size=2000)


### 🖇🐛 Minitask: Change the network graph

Try the above again and add yourself into the network. Change the sizes and colours of the nodes.
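
One possible way to approach this, as a minimal sketch: besides adding rows to the DataFrame, a graph that already exists can be extended with `G.add_edge`. The name `'Sam'` is a hypothetical placeholder for your own name, and the drawing call is left out so the cell only checks what the graph contains.

```python
import networkx as nx
import pandas as pd

# Same relationships as in the cell above
relationships = pd.DataFrame({'from': ['Luke', 'Clare', 'Clare', 'Clare', 'Laura'],
                              'to':   ['Robin', 'Laura', 'Robin', 'Luke', 'Robin']})
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# 'Sam' is a hypothetical new person; replace it with your own name.
# Adding an edge also adds any node that is not in the graph yet.
G.add_edge('Sam', 'Clare')
G.add_edge('Sam', 'Robin')

print(sorted(G.nodes()))
print(G.number_of_nodes(), G.number_of_edges())
```

The extended graph can then be drawn with the same `nx.draw` call as before.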

### 3. We will use arrays to set colour and size  

In [None]:
# Here we set a new plot with overall figure size
fig, ax = plt.subplots(figsize=(15,8))

# We set up the nodes and edges
relationships = pd.DataFrame({'from': ['Clare','Clare','Clare'], 
                              'to':   ['Luke','Laura','Robin']})
# We create the graph
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# Here we set an array so we can change the colours and sizes more easily
edge_colors = ['blue', 'red', 'green']
node_sizes = [10000, 2000, 3000, 4000]
node_color =['red','green','yellow', 'blue']

# we draw the graph and reference the arrays
nx.draw(G, with_labels=True, node_color=node_color, node_size=node_sizes, width=3, edge_color=edge_colors)

### 🖇🐛 Minitask: 

Try changing the colours of the nodes and edges. Add in more nodes.
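
When more nodes are added, the lists of colours and sizes need one entry per node, and networkx applies them in the order of `G.nodes()`. A sketch of one way to keep them aligned is to store the styling in dictionaries keyed by name and build the lists from the graph. The dictionary names below are assumptions made for this illustration.

```python
import networkx as nx
import pandas as pd

relationships = pd.DataFrame({'from': ['Clare', 'Clare', 'Clare'],
                              'to':   ['Luke', 'Laura', 'Robin']})
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# Hypothetical per-person styling, keyed by node name
colour_by_name = {'Clare': 'red', 'Luke': 'green', 'Laura': 'yellow', 'Robin': 'blue'}
size_by_name = {'Clare': 10000, 'Luke': 2000, 'Laura': 3000, 'Robin': 4000}

# Build the lists in the same order networkx stores the nodes
node_color = [colour_by_name[name] for name in G.nodes()]
node_sizes = [size_by_name[name] for name in G.nodes()]

print(list(zip(G.nodes(), node_color, node_sizes)))
```

These lists can be passed to `nx.draw` through `node_color` and `node_size` exactly as in the cell above.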


### 4. We will now group different nodes together so that they have a type.

In [None]:
# Here we set a new plot with overall figure size
f = plt.figure(figsize=(15,8))
# We can set the graphs to have a 'tight' layout
f.tight_layout()

# Specify data and attributes as before
relationships = pd.DataFrame({'from': ['CLARE', 'CLARE', 'CLARE', 'ROBIN', 'ROBIN', 'ROBIN', 'ROBIN', 'LUKE', 'LUKE', 
                                       'LUKE', 'LAURA', 'LAURA', 'LAURA', 'LAURA'], 
                              'to': ['GERRY', 'LAURA', 'ADAM', 'LUKE', 'LAURA', 'ADAM', 'GERRY', 'GERRY', 'STUART', 'LAURA',
                                     'LUKE', 'GERRY', 'STUART', 'ADAM']})

# Create DF for node characteristics
carac = pd.DataFrame({'ID':['CLARE','ROBIN','LUKE','LAURA','GERRY','STUART','ADAM'], 
                      'type':['1','1', '1', '2', '2', '3','3']})

# Create graph object
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# Make types into categories
carac= carac.set_index('ID')
carac=carac.reindex(G.nodes())

carac['type']=pd.Categorical(carac['type'])
carac['type'].cat.codes

# Set node colors
cmap = matplotlib.colors.ListedColormap(['blue', 'green', 'orange'])

# Set node sizes -- nodes of type 1 are drawn smaller (2000) than the other types (4000)
node_sizes = [4000 if entry != '1' else 2000 for entry in carac.type]

# draw the graph
nx.draw(G, with_labels=True, node_color=carac['type'].cat.codes, cmap=cmap, 
        node_size = node_sizes, edgecolors='gray')
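
The step that turns the `type` column into categories is what lets the colormap pick a colour: each category gets an integer code, and those codes are what `node_color` receives. A small sketch of that conversion on its own, using hypothetical node names `A` to `E` rather than the people above:

```python
import pandas as pd

# Hypothetical nodes and their types
types = pd.Series(['1', '1', '2', '3', '2'], index=['A', 'B', 'C', 'D', 'E'])

as_category = pd.Categorical(types)

# The distinct categories, in sorted order
print(list(as_category.categories))
# One integer code per node, pointing into the categories list
print(list(as_category.codes))
```

With three categories and a three-colour `ListedColormap`, code 0 is drawn in the first colour, code 1 in the second and code 2 in the third.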



### 5. We will now try different ways of drawing the graphs. These are called projections, or layouts in networkx.

In [None]:
# Here we set a new plot with overall figure size
f = plt.figure(figsize=(20,20))
# We can set the graphs to have a 'tight' layout
f.tight_layout()

# Create different layouts

# Subplot 1
plt.subplot(2, 2, 1)
nx.draw(G, with_labels=True, node_color=carac['type'].cat.codes, cmap=cmap, 
        node_size = node_sizes, edgecolors='gray')
plt.title('Spring Layout (Default)', fontsize=18)

# Subplot 2
plt.subplot(2, 2, 2)
nx.draw_random(G, with_labels=True, node_color=carac['type'].cat.codes, cmap=cmap, 
               node_size = node_sizes, edgecolors='gray')
plt.title('Random Layout', fontsize=18)

# Subplot 3
plt.subplot(2, 2, 3)
nx.draw_shell(G, with_labels=True, node_color=carac['type'].cat.codes, cmap=cmap, 
            node_size = node_sizes, edgecolors='gray')
plt.title('Shell Layout', fontsize=18)

# Subplot 4
plt.subplot(2, 2, 4)
nx.draw_spectral(G, with_labels=True, node_color=carac['type'].cat.codes, cmap=cmap, 
            node_size = node_sizes, edgecolors='gray')
plt.title('Spectral Layout', fontsize=18)
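
The helper functions above (`nx.draw_random`, `nx.draw_shell`, `nx.draw_spectral`) each compute node positions and draw in one step. As a sketch, the positions can also be computed separately and handed to `nx.draw` through its `pos` argument, which is useful when the same arrangement should be reused. The small graph and the seed value here are assumptions for illustration.

```python
import networkx as nx

# A small hypothetical graph using some of the names from above
G = nx.Graph([('CLARE', 'GERRY'), ('CLARE', 'LAURA'),
              ('ROBIN', 'LUKE'), ('LUKE', 'STUART')])

# A layout function returns a dictionary: node -> (x, y) position.
# Fixing the seed makes the spring layout repeatable between runs.
pos_spring = nx.spring_layout(G, seed=42)
pos_shell = nx.shell_layout(G)

for node, (x, y) in pos_spring.items():
    print(node, round(float(x), 2), round(float(y), 2))
```

Either dictionary can then be used as `nx.draw(G, pos=pos_spring, with_labels=True)`.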

# Part 2: Working with Twitter Data

We are now going to look at using some real social media data from Twitter.

## Questions & Objectives

* We are going to read in some historic Twitter data. This data was gathered on 23rd June 2016. It is from the three accounts that retweeted most on the topic of Brexit on that day. The file gives you the tweet ID, the person who retweeted, and the person who originally tweeted.
* We will hold the data in a pandas DataFrame, which will allow us to manage this large data set more easily.
* We will look at the data and see what it contains.
* We will map this data as a network.


### 1. Read in the file and take a look at it.

In [None]:
#Here we use Pandas so we can use its DataFrame format to hold the data.
df = pd.read_csv('network_dict_tiny.csv')
df.head()

### 2. We can explore different aspects of the data

In [None]:
df['Retweeter'].count()

In [None]:
df['Tweeter'].count()


In [None]:
df['Retweeter'].nunique()

In [None]:
df['Tweeter'].nunique()
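
The difference between the two calls above is easier to see on a tiny example: `count()` gives the number of non-missing rows, while `nunique()` gives the number of distinct accounts. The sketch below uses made-up account names, not the real data, and also shows `value_counts()` as one way to see which accounts appear most often.

```python
import pandas as pd

# Hypothetical retweet records, not the real Brexit data
sample = pd.DataFrame({'Retweeter': ['a', 'a', 'a', 'b'],
                       'Tweeter':   ['x', 'y', 'x', 'x']})

n_rows = sample['Retweeter'].count()        # number of non-missing rows
n_accounts = sample['Retweeter'].nunique()  # number of distinct retweeters
top = sample['Retweeter'].value_counts()    # retweets per account, largest first

print(n_rows, n_accounts)
print(top)
```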

### 🖇🐛Minitask - Thinking task

Think about why the data has nodes that re-tweet lots of data. What are these Twitter accounts trying to do? Who do you think they may be?


### 2. Let's draw a network graph of the data like we did before


In [None]:
# Specify data and attributes by reading from the data frame
relationships = pd.DataFrame({'from': df['Retweeter'].tolist(), 
                              'to': df['Tweeter'].tolist()})

f = plt.figure(figsize=(15,8))
f.tight_layout()
# Create graph object
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())
nx.draw(G)


### 🖇🐛Minitask - Thinking task

What do you think this structure tells us about the data?
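
One way to back up what you see in the picture is to count how many connections each node has, known as its degree. This is a sketch on hypothetical edges (the account names are invented); with the real data the same calls could be run on the `G` built above.

```python
import networkx as nx

# Hypothetical retweet edges: (retweeter, original tweeter)
edges = [('bot1', 'u1'), ('bot1', 'u2'), ('bot1', 'u3'),
         ('bot2', 'u3'), ('bot2', 'u4'), ('bot2', 'u5')]
G = nx.Graph(edges)

# Degree = number of edges attached to each node
degree = dict(G.degree())

# The two best-connected nodes
hubs = sorted(degree, key=degree.get, reverse=True)[:2]
print(hubs)
```

A few nodes with a very high degree surrounded by many nodes of degree one would correspond to the star-like clusters in the drawing.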

### 3. Let's add more information and see what it can tell us

In [None]:
f = plt.figure(figsize=(10,10))
f.tight_layout()
nx.draw(G, with_labels=True, node_color='green', node_size=900)

### 🖇🐛Minitask - Can you make the network more readable?

The graph above is not very useable. Can you manipulate the code to make it better?
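
As a hint, and only one of several possible approaches: labels can be limited to the best-connected nodes and node sizes can be scaled by degree. The sketch below prepares both on a small hypothetical graph; the threshold `min_degree` and the scaling factor are assumed values to experiment with.

```python
import networkx as nx

# Hypothetical graph with one well-connected node
edges = [('hub', 'a'), ('hub', 'b'), ('hub', 'c'), ('a', 'b'), ('c', 'd')]
G = nx.Graph(edges)

# Only label nodes with at least this many connections (assumed threshold)
min_degree = 3
labels = {node: node for node, deg in G.degree() if deg >= min_degree}

# Scale node size by degree so hubs stand out (assumed scaling factor)
node_sizes = [200 * G.degree(node) for node in G.nodes()]

print(labels)
print(node_sizes)
```

These can be passed to the drawing call, for example `nx.draw(G, labels=labels, with_labels=True, node_size=node_sizes, font_size=8)`.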

### 4. Let's see if adding types to the data makes it more readable

In [None]:
# We are going to add in three types:
# nodes that are tweeters, nodes that are retweeters, and nodes that are both tweeters and retweeters.
# We are going to set up two lists, one for retweeters and one for tweeters
retweeters = df['Retweeter'].tolist()
tweeters = df['Tweeter'].tolist()

# here we set up lists to hold the ids and types
ids = []
types = []

# We cycle through the list of retweeters; if they are not already in the id list we add them.
# Before we add the type we check if the account is also in the tweeter list
for each_retweeter in retweeters:
    if each_retweeter not in ids:
        if each_retweeter not in tweeters:
            types.append('retweeter')
        else:
            types.append('both')
        ids.append(each_retweeter)
for each_tweeter in tweeters:
    if each_tweeter not in ids:
        if each_tweeter not in retweeters:
            types.append('tweeter')
            ids.append(each_tweeter)       

In [None]:
print(ids)

In [None]:
print(types)
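
The loops above check membership in lists, which gets slow as the data grows. A sketch of an alternative that gives the same three types uses sets and a dictionary. This is not the approach taken in this notebook, the sample rows are hypothetical, and the names `sample` and `node_type` are introduced only for this illustration.

```python
import pandas as pd

# Hypothetical rows with the same column names as the real file
sample = pd.DataFrame({'Retweeter': ['a', 'a', 'b'],
                       'Tweeter':   ['x', 'b', 'x']})

retweeter_set = set(sample['Retweeter'])
tweeter_set = set(sample['Tweeter'])

# Classify every account that appears in either column
node_type = {}
for account in retweeter_set | tweeter_set:
    if account in retweeter_set and account in tweeter_set:
        node_type[account] = 'both'
    elif account in retweeter_set:
        node_type[account] = 'retweeter'
    else:
        node_type[account] = 'tweeter'

print(node_type)
```

The `ids` and `types` lists used in the next step could then be taken from `list(node_type.keys())` and `list(node_type.values())`.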

### 5. Let's make the network graph

In [None]:
# Create DF for node characteristics
carac = pd.DataFrame({'ID':ids, 
                      'type':types})

# Create graph object
G = nx.from_pandas_edgelist(relationships, 'from', 'to', create_using=nx.Graph())

# Make types into categories
carac= carac.set_index('ID')
carac=carac.reindex(G.nodes())
carac['type']=pd.Categorical(carac['type'])
carac['type'].cat.codes


# Set node sizes and the node colours
node_sizes = [10000 if entry != 'tweeter' else 200 for entry in carac.type]
node_color= ["blue" if entry =="tweeter" else "red" if entry =="both" else "orange" for entry in carac.type]

f = plt.figure(figsize=(20,20))

# node_color already holds colour names, so no colormap is needed here
nx.draw(G, with_labels=True, node_color=node_color,
        node_size = node_sizes, edgecolors='black')

### 🖇🐛Minitask - Can you do any better? I couldn't in the time I had but maybe you can?

# Part 3: Optional extra task

Network graphs drawn with networkx are limited by the presentation tools offered by matplotlib.

We can export the graph and use a more sophisticated visualisation tool. We will need to do this outside Notable.

1. Download the GML file -- this is the graph and can be imported into another tool
2. Download Gephi from https://gephi.org/
3. Download the file from Notable onto your own machine and upload it into Gephi
4. Please ask if you get stuck
5. I have also included further data sets to play with -- network_dict_shortest.csv contains the top 10 retweeters and network_dict_short.csv contains the top 100. Good luck!  

In [None]:
# Use this to save the graph structure as a GML file
nx.write_gml(G,"tweets.gml")

# save it on your local machine and upload it into Gephi
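
Before moving the file into Gephi it can be worth checking that it was written correctly. A minimal sketch, assuming a small stand-in graph and a temporary folder rather than the real `tweets.gml`, is to read the file back with `nx.read_gml` and compare the counts.

```python
import os
import tempfile

import networkx as nx

# Hypothetical stand-in for the retweet graph
G = nx.Graph([('a', 'x'), ('a', 'y'), ('b', 'x')])

# Write to a temporary folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tweets.gml')
    nx.write_gml(G, path)
    reloaded = nx.read_gml(path)

# The reloaded graph should have the same nodes and edges
print(reloaded.number_of_nodes(), reloaded.number_of_edges())
```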