
# **Covid-19 and Vaccine Information Spread in Indonesia**

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, Hubei, China, and has resulted in an ongoing pandemic. The first confirmed case has been traced back to 17 November 2019 in Hubei. As of 6 August 2020, more than 18.7 million cases have been reported across 188 countries and territories, resulting in more than 706,000 deaths. More than 11.3 million people have recovered.

At the time of writing, no vaccine for this disease had been found. Recently, however, many parties have started spreading rumors about supposed vaccine discoveries.

Many people share opinions and information about the COVID-19 vaccine on social media, including Twitter.

## **A. Collecting Twitter Data**

To collect Twitter data in Python, we can use Tweepy, a widely used Python package for accessing the Twitter API. You can read the full documentation [HERE](https://tweepy.readthedocs.io/en/latest/). In this practice, we will collect tweets that match a specific keyword and save them as a .CSV file.

**Install & Import Libraries**

In [None]:
# Install Library
!pip install tweepy

In [None]:
# Import Libraries
import tweepy
import pandas as pd
import numpy as np
import sys
import csv

**Set API Key**

In [None]:
# Fill in your own API keys (placeholders shown; never publish real credentials)
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

# Authenticate with OAuth 1.0a (user context)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
api

**Set Tweet Requirements**

In [None]:
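# Set the keyword to search for
keyword = 'covid'

# Number of tweets per request (the standard search endpoint returns at most 100)
number_of_tweets = 100

# 'extended' mode returns the untruncated tweet text in full_text
tweet_mode = 'extended'

# Restrict results to Indonesian-language tweets
language = 'id'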

**Get the Tweets**

In [None]:
# Find tweets by keyword (Tweepy v4 renamed API.search to API.search_tweets)
crawling_keyword = api.search_tweets(q=keyword, count=number_of_tweets, tweet_mode=tweet_mode, lang=language)

# Show Tweets
crawling_keyword

**Transform Tweets to Dataframe**

In [None]:
# Create Tweets Data Frame
df_crawling_keyword = pd.DataFrame({
                      'time' : [tweet.created_at for tweet in crawling_keyword],
                      'description' : [tweet.user.description for tweet in crawling_keyword],
                      'usertweets' : [tweet.user.statuses_count for tweet in crawling_keyword],
                      'source' : [tweet.user.screen_name for tweet in crawling_keyword],
                      'target' : [tweet.in_reply_to_screen_name for tweet in crawling_keyword],
                      'verified' : [tweet.user.verified for tweet in crawling_keyword],
                      'text' : [tweet.full_text for tweet in crawling_keyword],
                      'hashtags' : [tweet.entities['hashtags'] for tweet in crawling_keyword],
                      'location' : [tweet.user.location for tweet in crawling_keyword],
                      'following' : [tweet.user.friends_count for tweet in crawling_keyword],
                      'followers' : [tweet.user.followers_count for tweet in crawling_keyword],
                      'retweets' : [tweet.retweet_count for tweet in crawling_keyword],
                      })

df_crawling_keyword
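Section B below loads a ready-made edge list with `source` and `target` columns. Below is one plausible way to derive such a list from crawled tweets, written in plain Python. It assumes that a reply becomes an edge from the replying account (`source`) to the account it replies to (`target`). It is not necessarily how the provided edge list files were built. The helper name `reply_edges` and the choice to weight edges by reply count are illustrative assumptions.

```python
from collections import Counter


def reply_edges(rows):
    """Hypothetical helper: count (source, target) reply pairs.

    `rows` is an iterable of dicts with 'source' (the tweeting account) and
    'target' (the account replied to), mirroring the columns of
    df_crawling_keyword. Non-replies have no string target (None in memory,
    NaN after a CSV round trip) and are skipped, as are self-replies (threads).
    """
    counts = Counter(
        (row["source"], row["target"])
        for row in rows
        if isinstance(row["target"], str) and row["target"] != row["source"]
    )
    # (source, target, weight) triples, most frequent pairs first
    return [(s, t, w) for (s, t), w in counts.most_common()]


# Made-up rows for illustration (not real Twitter data)
rows = [
    {"source": "ani", "target": "budi"},
    {"source": "ani", "target": "budi"},
    {"source": "citra", "target": None},   # not a reply
    {"source": "budi", "target": "budi"},  # reply to self (thread)
    {"source": "dedi", "target": "ani"},
]
print(reply_edges(rows))  # [('ani', 'budi', 2), ('dedi', 'ani', 1)]
```

With pandas, `df_crawling_keyword.to_dict('records')` produces rows in this shape.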

**Save Tweets**

In [None]:
# Save as .CSV
df_crawling_keyword.to_csv('crawling_keyword.csv')

## **B. Social Network Analysis: Covid-19**

Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defined as a graph in which nodes and/or edges have attributes (e.g. names).

Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
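To make the node and edge vocabulary concrete, the sketch below stores a tiny network as an adjacency dictionary in plain Python. Like `nx.from_pandas_edgelist` with its default `nx.Graph`, it treats ties as undirected, so repeated or reversed pairs collapse into a single edge. Self-loops are dropped here for simplicity, although `nx.Graph` would keep them. The function name `build_adjacency` and the toy edges are hypothetical.

```python
def build_adjacency(edges):
    """Hypothetical helper: undirected adjacency sets from (source, target) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:  # skip self-loops in this sketch
            adj[u].add(v)
            adj[v].add(u)
    return adj


edges = [("ani", "budi"), ("budi", "ani"), ("ani", "budi"), ("citra", "ani")]
adj = build_adjacency(edges)
n_nodes = len(adj)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
print(n_nodes, n_edges)  # 3 2
```

The three reply records between `ani` and `budi` become one undirected tie, which is why the reply direction is not visible in the undirected network.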

In this practice we will use NetworkX, a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Its full documentation is available on the NetworkX website.

**Install & Import Libraries**

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
import seaborn as sns

### **Network Construction**

Here we construct a social network based on conversations about covid on Twitter.

**Import Edge List Data**

In [None]:
# Import Data
df_tweets = pd.read_csv('https://raw.githubusercontent.com/andrybrew/sma-health/master/data/edgelist_covid.csv', sep=',')

# Show Data
df_tweets

**Visualize the Network**

In [None]:
# Construct a network (an undirected nx.Graph by default, so reply direction is not kept)
G1 = nx.from_pandas_edgelist(df_tweets, source='source', target='target')

# Visualize the network (no arrow options, since G1 is undirected)
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G1))

### **Network Metrics and Measurement**

**Centrality Measurement**

In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super-spreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.
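The following sketch shows what `nx.degree_centrality` and `nx.closeness_centrality` compute on an unweighted, undirected graph, using plain Python.

- **Degree centrality** is a node's degree divided by n - 1, the largest possible degree.
- **Closeness** follows NetworkX's default `wf_improved=True` formula, ((r - 1) / S) * ((r - 1) / (n - 1)). Here r is the number of nodes reachable from the node (itself included) and S is the sum of shortest-path distances to them. This scaling stops nodes in small components from scoring artificially high.

The toy graph and function names are illustrative.

```python
from collections import deque

# Toy undirected graph: a path a-b-c plus a separate pair d-e
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
}


def degree_centrality(adj):
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    n = len(adj)
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        r = len(dist)  # reachable nodes, including v itself
        if total > 0 and n > 1:
            scores[v] = (r - 1) / total * (r - 1) / (n - 1)
        else:
            scores[v] = 0.0  # isolated node
    return scores


print(degree_centrality(adj)["b"], closeness_centrality(adj)["b"])  # 0.5 0.5
```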

In [None]:
# Degree Centrality
degree = nx.degree_centrality(G1)

# Top 10 nodes by degree centrality (same ranking as raw degree)
sorted(degree.items(), key=lambda x: x[1], reverse=True)[0:10]

In [None]:
# Betweenness Centrality
betweenness = nx.betweenness_centrality(G1)

# Sorted from the Highest
sorted(nx.betweenness_centrality(G1, normalized=True).items(), key=lambda x:x[1], reverse=True)[0:10]
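Betweenness centrality measures, for every pair of other nodes, the fraction of their shortest paths that pass through a given node. NetworkX's implementation is based on Brandes' algorithm: one breadth-first search per source node, followed by a backward pass that accumulates dependencies. The plain-Python sketch below follows that idea for an unweighted, undirected graph. It applies the normalization NetworkX uses for undirected graphs with `normalized=True`, which divides by (n - 1)(n - 2). The toy path graph is illustrative.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness for an unweighted, undirected graph (adjacency sets)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma) and predecessors
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies from the farthest nodes inward
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    if n > 2:
        # Every unordered pair was counted from both endpoints, so dividing by
        # (n - 1)(n - 2) equals dividing by the (n - 1)(n - 2) / 2 pairs
        scale = 1 / ((n - 1) * (n - 2))
        bc = {v: c * scale for v, c in bc.items()}
    return bc


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness_centrality(path))  # a and d: 0.0, b and c: about 0.667
```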

In [None]:
# Closeness Centrality
closeness = nx.closeness_centrality(G1)

# Sorted from the Highest
sorted(nx.closeness_centrality(G1).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
# Eigenvector Centrality
eigenvector = nx.eigenvector_centrality_numpy(G1)

# Sorted from the Highest
sorted(nx.eigenvector_centrality_numpy(G1).items(), key=lambda x:x[1], reverse=True)[0:10]
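Eigenvector centrality scores a node by the scores of its neighbours, which corresponds to the leading eigenvector of the adjacency matrix A. `nx.eigenvector_centrality_numpy` obtains this vector with a sparse eigensolver. The plain-Python power-iteration sketch below illustrates the same quantity but is not NetworkX's code.

- It iterates with A + I, as NetworkX's iterative `nx.eigenvector_centrality` does, so that bipartite graphs such as stars do not oscillate.
- It normalizes the result to unit Euclidean length.
- It assumes a connected graph; on a disconnected graph the leading eigenvector may not be unique.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration with (A + I) on an undirected graph given as adjacency sets."""
    nodes = list(adj)
    x = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        # y = (A + I) x: each node keeps its own score and adds its neighbours' scores
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        if sum(abs(y[v] - x[v]) for v in nodes) < tol * len(nodes):
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
scores = eigenvector_centrality(star)
print(round(scores["hub"], 4), round(scores["x"], 4))  # 0.7071 0.4082
```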

***Visualize Centrality Score with Scatter Plot***

In [None]:
# Convert Centralities to Data Frame
df_degree = pd.Series(degree).to_frame('degree_centrality')
df_betweenness = pd.Series(betweenness).to_frame('betweenness_centrality')
df_closeness = pd.Series(closeness).to_frame('closeness_centrality')
df_eigenvector = pd.Series(eigenvector).to_frame('eigenvector_centrality')

# Join Centralities Data Frame
df_centrality = pd.concat([df_degree, df_betweenness, df_closeness, df_eigenvector], axis = 1)
df_centrality['username'] = df_centrality.index
df_centrality = df_centrality.reset_index(drop = True)
df_centrality = df_centrality.sort_values(by=['degree_centrality'], ascending = False)
df_centrality = df_centrality.melt('username', var_name='cols',  value_name='centrality')
df_centrality

# Visualize Scatter Plot
plt.figure(figsize=(20,9))
sns.scatterplot(x='username', y='centrality', hue='cols', data=df_centrality)

***Visualize Network based on Centrality Measurement***

In [None]:
# Set Degree Dictionary
d = dict(degree)

# Visualize the network, scaling node size by degree centrality
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', nodelist=list(d.keys()),
        node_size=[v * 60000 for v in d.values()],
        edge_color='r',
        font_size=10,
        pos=nx.kamada_kawai_layout(G1))

**Network Topology Measurement**

The configuration, or topology, of a network is key to determining its performance. Network topology is the way a network is arranged, including the physical or logical description of how links and nodes are set up to relate to each other.
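The topology metrics below follow simple definitions.

- For an undirected graph with n nodes and m edges, `nx.density` returns 2m / (n(n - 1)), the fraction of all possible ties that are present.
- `nx.number_connected_components` counts the groups of nodes that can reach each other.

Here is a plain-Python sketch of both on an illustrative toy graph:

```python
from collections import deque


def density(adj):
    n = len(adj)
    if n <= 1:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    return 2 * m / (n * (n - 1))


def number_connected_components(adj):
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1  # an unvisited node starts a new component
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components


# A triangle a-b-c plus a separate pair d-e
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"e"}, "e": {"d"}}
print(density(adj), number_connected_components(adj))  # 0.4 2
```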

In [None]:
# Show Number of Nodes
nx.number_of_nodes(G1)

In [None]:
# Show Number of Edges
nx.number_of_edges(G1)

In [None]:
# Show Graph Density
nx.density(G1)

In [None]:
# Show Number of Connected Components
nx.number_connected_components(G1)

### **Community Detection**

Community detection is a fundamental problem in social network analysis. Roughly speaking, it consists of dividing social actors (modelled as nodes in a social graph) with certain social connections (modelled as edges in the social graph) into densely knit, closely related groups, each well separated from the other groups.

**Modularity Community**

In [None]:
# Import Module
from networkx.algorithms.community import greedy_modularity_communities

# Modularity Community Detection
communities_m = sorted(greedy_modularity_communities(G1), key=len, reverse=True)
communities_m
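`greedy_modularity_communities` implements Clauset-Newman-Moore greedy modularity maximization. It starts with every node in its own community and keeps merging the pair of communities that increases modularity the most. For an undirected graph with m edges, the modularity of a partition is Q = sum over communities c of [L_c / m - (d_c / (2m))^2]. Here L_c is the number of edges inside c and d_c is the total degree of its nodes. The plain-Python sketch below evaluates Q for a given partition. The toy graph of two triangles joined by a bridge is illustrative.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected graph given as adjacency sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for group in communities:
        members = set(group)
        # Edges with both endpoints in the group (each seen twice, hence / 2)
        internal = sum(1 for v in members for w in adj[v] if w in members) / 2
        total_degree = sum(len(adj[v]) for v in members)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Two triangles {a, b, c} and {d, e, f} joined by the bridge c-d
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(round(modularity(adj, [{"a", "b", "c"}, {"d", "e", "f"}]), 4))  # 0.3571
print(modularity(adj, [set(adj)]))  # 0.0 (everything in one community)
```

Splitting along the bridge scores well above zero, whereas putting every node in a single community always gives Q = 0.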

In [None]:
# Set Node Community Function
def set_node_community(G, communities):
    '''Store each node's community index (starting at 1) as a node attribute.'''
    for c, v_c in enumerate(communities):
        for v in v_c:
            # Add 1 to save 0 for external edges
            G.nodes[v]['community'] = c + 1

In [None]:
# Set Colour Function
def get_color(i, r_off=1, g_off=1, b_off=1):
     '''Assign a color to a vertex.'''
     n = 16
     low, high = 0.1, 0.9
     span = high - low
     r = low + span * (((i + r_off) * 3) % n) / (n - 1)
     g = low + span * (((i + g_off) * 5) % n) / (n - 1)
     b = low + span * (((i + b_off) * 7) % n) / (n - 1)
     return (r, g, b) 
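In `get_color`, the multipliers 3, 5 and 7 share no common factor with n = 16. As a result, each colour channel cycles through 16 distinct levels as the community index grows. Communities 1 to 16 therefore receive 16 different colours, but community 17 reuses the colour of community 1, and so on. The self-contained check below copies the function's arithmetic to show this. If `greedy_modularity_communities` returns more than 16 communities, some distinct communities will share a colour in the plot.

```python
def get_color(i, r_off=1, g_off=1, b_off=1):
    """Same arithmetic as the notebook's get_color (without the unused r0, g0, b0)."""
    n = 16
    low, high = 0.1, 0.9
    span = high - low
    r = low + span * (((i + r_off) * 3) % n) / (n - 1)
    g = low + span * (((i + g_off) * 5) % n) / (n - 1)
    b = low + span * (((i + b_off) * 7) % n) / (n - 1)
    return (r, g, b)


# Community labels start at 1 (set_node_community adds 1 to the index)
palette = [get_color(i) for i in range(1, 17)]
print(len(set(palette)))               # 16
print(get_color(17) == get_color(1))   # True: the palette repeats every 16 communities
```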

In [None]:
# Set node communities (the function updates G1 in place and returns None)
set_node_community(G1, communities_m)

# Set node color
node_color = [get_color(G1.nodes[v]['community']) for v in G1.nodes]

# Visualize the network, colouring nodes by community
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color=node_color, node_size=1200,
        edge_color='r',
        font_size=10,
        pos=nx.kamada_kawai_layout(G1))



**Other Ways to Visualize the Network**

In [None]:
! pip install nxviz

Arcplot

In [None]:
# Import Module
from nxviz import ArcPlot

# Visualize the Network
arcplot = ArcPlot(G1, node_color='community', node_order='community', node_grouping='community',
                  figsize=(20, 20), group_label_position="middle", nodeprops={"radius": 2},
                  fontsize=12, fontfamily="fantasy")
arcplot.draw()
plt.show()

Circosplot

In [None]:
# Import Module
from nxviz import CircosPlot

# Visualize the Network
circosplot = CircosPlot(G1,
                        node_color='community', node_order='community', node_grouping='community',
                        figsize=(20, 20), group_label_position="middle", nodeprops={"radius": 2},
                        fontsize=12, fontfamily="fantasy")
circosplot.draw()
plt.show()

## **C. Social Network Analysis: Vaccine (Vaksin)**

### **Network Construction**

Here we construct a social network based on conversations about the COVID-19 vaccine (vaksin) on Twitter.

**Import Edge List Data**

In [None]:
# Import Data
df_tweets = pd.read_csv('https://raw.githubusercontent.com/sma-health/data/master/edgelist_vaksin.csv', sep=',')

# Show Data
df_tweets

**Visualize the Network**

In [None]:
# Construct a network (an undirected nx.Graph by default, so reply direction is not kept)
G1 = nx.from_pandas_edgelist(df_tweets, source='source', target='target')

# Visualize the network (no arrow options, since G1 is undirected)
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', node_size=1200,
        edge_color='r',
        font_size=9,
        pos=nx.kamada_kawai_layout(G1))

### **Network Metrics and Measurement**

**Centrality Measurement**

In [None]:
# Degree Centrality
degree = nx.degree_centrality(G1)

# Top 10 nodes by degree centrality (same ranking as raw degree)
sorted(degree.items(), key=lambda x: x[1], reverse=True)[0:10]

In [None]:
# Betweenness Centrality
betweenness = nx.betweenness_centrality(G1)

# Sorted from the Highest
sorted(nx.betweenness_centrality(G1, normalized=True).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
# Closeness Centrality
closeness = nx.closeness_centrality(G1)

# Sorted from the Highest
sorted(nx.closeness_centrality(G1).items(), key=lambda x:x[1], reverse=True)[0:10]

In [None]:
# Eigenvector Centrality
eigenvector = nx.eigenvector_centrality_numpy(G1)

# Sorted from the Highest
sorted(nx.eigenvector_centrality_numpy(G1).items(), key=lambda x:x[1], reverse=True)[0:10]

***Visualize Centrality Score with Scatter Plot***

In [None]:
# Convert Centralities to Data Frame
df_degree = pd.Series(degree).to_frame('degree_centrality')
df_betweenness = pd.Series(betweenness).to_frame('betweenness_centrality')
df_closeness = pd.Series(closeness).to_frame('closeness_centrality')
df_eigenvector = pd.Series(eigenvector).to_frame('eigenvector_centrality')

# Join Centralities Data Frame
df_centrality = pd.concat([df_degree, df_betweenness, df_closeness, df_eigenvector], axis = 1)
df_centrality['username'] = df_centrality.index
df_centrality = df_centrality.reset_index(drop = True)
df_centrality = df_centrality.sort_values(by=['degree_centrality'], ascending = False)
df_centrality = df_centrality.melt('username', var_name='cols',  value_name='centrality')
df_centrality

# Visualize Scatter Plot
plt.figure(figsize=(20,9))
sns.scatterplot(x='username', y='centrality', hue='cols', data=df_centrality)

***Visualize Network based on Centrality Measurement***

In [None]:
# Set Degree Dictionary
d = dict(degree)

# Visualize the network, scaling node size by degree centrality
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color='skyblue', nodelist=list(d.keys()),
        node_size=[v * 60000 for v in d.values()],
        edge_color='r',
        font_size=10,
        pos=nx.kamada_kawai_layout(G1))

**Network Topology Measurement**

In [None]:
# Show Number of Nodes
nx.number_of_nodes(G1)

In [None]:
# Show Number of Edges
nx.number_of_edges(G1)

In [None]:
# Show Graph Density
nx.density(G1)

In [None]:
# Show Number of Connected Components
nx.number_connected_components(G1)

### **Community Detection**

**Modularity Community**

In [None]:
# Import Module
from networkx.algorithms.community import greedy_modularity_communities

# Modularity Community Detection
communities_m = sorted(greedy_modularity_communities(G1), key=len, reverse=True)
communities_m

In [None]:
# Set Node Community Function
def set_node_community(G, communities):
    '''Store each node's community index (starting at 1) as a node attribute.'''
    for c, v_c in enumerate(communities):
        for v in v_c:
            # Add 1 to save 0 for external edges
            G.nodes[v]['community'] = c + 1

In [None]:
# Set Colour Function
def get_color(i, r_off=1, g_off=1, b_off=1):
     '''Assign a color to a vertex.'''
     n = 16
     low, high = 0.1, 0.9
     span = high - low
     r = low + span * (((i + r_off) * 3) % n) / (n - 1)
     g = low + span * (((i + g_off) * 5) % n) / (n - 1)
     b = low + span * (((i + b_off) * 7) % n) / (n - 1)
     return (r, g, b) 

In [None]:
# Set node communities (the function updates G1 in place and returns None)
set_node_community(G1, communities_m)

# Set node color
node_color = [get_color(G1.nodes[v]['community']) for v in G1.nodes]

# Visualize the network, colouring nodes by community
plt.figure(figsize=(50,50))
nx.draw(G1, with_labels=True,
        node_color=node_color, node_size=1200,
        edge_color='r',
        font_size=10,
        pos=nx.kamada_kawai_layout(G1))



**Other Ways to Visualize the Network**

Arcplot

In [None]:
# Import Module
from nxviz import ArcPlot

# Visualize the Network
arcplot = ArcPlot(G1, node_color='community', node_order='community', node_grouping='community',
                  figsize=(20, 20), group_label_position="middle", nodeprops={"radius": 2},
                  fontsize=12, fontfamily="fantasy")
arcplot.draw()
plt.show()

Circosplot

In [None]:
# Import Module
from nxviz import CircosPlot

# Visualize the Network
circosplot = CircosPlot(G1,
                        node_color='community', node_order='community', node_grouping='community',
                        figsize=(20, 20), group_label_position="middle", nodeprops={"radius": 2},
                        fontsize=12, fontfamily="fantasy")
circosplot.draw()
plt.show()