# Instructions
In this first assignment, you are required to apply the basic functions of the igraph package. The following dataset will be used:

* Actors dataset (undirected graph): for the 2005 Graph Drawing conference, a dataset derived from the IMDB movie database was provided. We will use a reduced version of this dataset, which contains all actor-actor collaboration edges where the two actors co-starred in at least 2 movies between 1995 and 2004.


You have to complete the code chunks in this document, but also analyze the results, extract insights and answer the short questions. Fill in the attached CSV with your answers: sometimes just the number is enough, other times a short sentence or paragraph. Remember to change the header to your email.

In your submission, please upload both this document in HTML format and the CSV with the solutions.


# Loading data

In this section, the goal is to load the given datasets, build the graph, and analyze basic metrics. Include any edge or node attributes you consider relevant.

Describe the values provided by the summary function on the graph object.

**1) How many nodes are there?**

**2) How many edges are there?**
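A minimal NetworkX sketch for questions 1 and 2, assuming the column names shown in the outputs below (`from`, `to`, `weight` in the edge file; `id`, `name`, `movies_95_04`, `main_genre` in the key file). The helper `build_actor_graph` and the toy DataFrames are hypothetical stand-ins so the snippet runs on its own; with the real files you would pass `df_edges` and `df_key`. Adding every actor from the key file as a node matters, because an actor without a qualifying collaboration would otherwise not appear in the graph at all.

```python
import networkx as nx
import pandas as pd


def build_actor_graph(df_edges: pd.DataFrame, df_key: pd.DataFrame) -> nx.Graph:
    """Undirected actor graph: weighted co-starring edges plus every actor from the key file."""
    g = nx.from_pandas_edgelist(df_edges, source="from", target="to", edge_attr="weight")
    # add all actors (including any without a qualifying collaboration) with their attributes
    for row in df_key.itertuples(index=False):
        g.add_node(row.id, name=row.name, movies=row.movies_95_04, main_genre=row.main_genre)
    return g


# hypothetical toy data with the same column names as the real TSV files
toy_edges = pd.DataFrame({"from": [1, 1, 2], "to": [2, 3, 3], "weight": [6, 3, 2]})
toy_key = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "movies_95_04": [12, 16, 33, 20],
    "main_genre": ["Thriller", "Drama", "Drama", "Comedy"],
})

g = build_actor_graph(toy_edges, toy_key)
print(g.number_of_nodes(), g.number_of_edges())  # 4 3
print(nx.number_of_isolates(g))                   # 1 (actor 4 has no edges)
```

On the real data, `nx.number_of_isolates` is a quick way to check whether the node count from the key file differs from the number of actors that actually appear in the edge list.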

In [1]:
import networkx as nx
import pandas as pd
import numpy as np

In [2]:
#import tsv using pandas
df_edges = pd.read_csv('imdb_actor_edges.tsv', sep='\t')

In [3]:
#show head of df_edges
df_edges.head()

| Unnamed: 0 | from | to | weight |
|---|---|---|---|
| 0 | 17776 | 17778 | 6 |
| 1 | 5578 | 9770 | 3 |
| 2 | 5578 | 929 | 2 |
| 3 | 5578 | 9982 | 2 |
| 4 | 1835 | 6278 | 2 |


In [17]:
df_edges.dtypes

from      int64
to        int64
weight    int64
dtype: object

In [4]:
df_key = pd.read_csv('imdb_actors_key.tsv', sep='\t', encoding='latin-1')

In [5]:
df_key.head()

| Unnamed: 0 | id | name | movies_95_04 | main_genre | genres |
|---|---|---|---|---|---|
| 0 | 15629 | Rudder, Michael (I) | 12 | Thriller | Action:1,Comedy:1,Drama:1,Fantasy:1,Horror:1,N... |
| 1 | 5026 | Morgan, Debbi | 16 | Drama | Comedy:2,Documentary:1,Drama:6,Horror:2,NULL:3... |
| 2 | 11252 | Bellows, Gil | 33 | Drama | Comedy:6,Documentary:1,Drama:7,Family:1,Fantas... |
| 3 | 5150 | Dray, Albert | 20 | Comedy | Comedy:6,Crime:1,Documentary:1,Drama:4,NULL:5,... |
| 4 | 4057 | Daly, Shane (I) | 18 | Drama | Comedy:2,Crime:1,Drama:7,Horror:1,Music:1,Musi... |


In [6]:
#creating a graph object
G = nx.Graph()


In [11]:
#adding nodes to the graph (one node per actor id in the key file)
for key in df_key['id']:
    G.add_node(key)


In [12]:
#list nodes in G
G.nodes()

NodeView((15629, 5026, 11252, 5150, 4057, 12373, 3453, 9878, 4988, 13032, 13060, 3541, 9373, 4120, 14308, 13517, 12059, 3932, 577, 1362, 9764, 6511, 10932, 13040, 2416, 9554, 11701, 11054, 13485, 10043, 14551, 893, 9573, 647, 1867, 16112, 12091, 1460, 11180, 7320, 15151, 6151, 2645, 17284, 4152, 15006, 9570, 5314, 560, 15399, 2101, 9998, 17796, 14912, 10650, 7204, 5434, 4433, 16012, 9599, 15167, 6945, 2875, 817, 14305, 11818, 13947, 538, 17479, 1166, 397, 17872, 10103, 17326, 14786, 10672, 10577, 16156, 10489, 2958, 10840, 5850, 12291, 1615, 6812, 17022, 6434, 11935, 10738, 7242, 5622, 10965, 1527, 17528, 918, 14914, 15513, 9652, 368, 13696, 1245, 4301, 12830, 11700, 1640, 13909, 16662, 3328, 11318, 3860, 4801, 10388, 10740, 11487, 11616, 2040, 3320, 5214, 4248, 7956, 1917, 5298, 17606, 458, 17632, 8412, 4158, 4263, 6682, 10457, 7175, 6292, 2441, 8820, 4626, 12386, 8006, 1965, 14370, 1585, 9629, 1794, 6192, 15785, 3683, 6836, 15989, 299, 5242, 6368, 9656, 15946, 16827, 4440, 14708, 294

In [16]:
#adding edges to the graph, keeping the number of shared movies as the edge weight
for i, row in df_edges.iterrows():
    source = row['from']
    target = row['to']
    G.add_edge(source, target, weight=row['weight'])


In [20]:
print(nx.is_connected(G))

False


In [18]:
#the graph is not connected, so list its connected components
connected_components = list(nx.connected_components(G))

In [21]:
#export graph to gexf
nx.write_gexf(G, 'imdb_actor_edges.gexf')

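In [52]:
#build a weighted graph directly from the edge list (keeps 'weight' as an edge attribute)
edges = nx.from_pandas_edgelist(df_edges, 'from', 'to', 'weight')

#node attributes from df_key, keyed by actor id
#(from_pandas_edgelist would wrongly turn each id-name pair into an edge)
nodes = df_key.set_index('id')[['name', 'movies_95_04', 'main_genre', 'genres']].to_dict('index')

In [53]:
#make graph g1 from the actor nodes (with attributes) and the weighted edges
g1 = nx.Graph()
g1.add_nodes_from(nodes.items())
g1.add_edges_from(edges.edges(data=True))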

 

In [55]:
#note: re-adding g1's own edges is a no-op; the graph is unchanged
g1.add_edges_from(g1.edges())

In [56]:
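import matplotlib.pyplot as plt

#~17.5k labelled nodes are unreadable, so draw small unlabelled nodes; passing ax explicitly
#avoids the "'_AxesStack' object is not callable" TypeError that older NetworkX releases raise
#with newer matplotlib (upgrading NetworkX also fixes it)
fig, ax = plt.subplots(figsize=(10, 10))
nx.draw(g1, ax=ax, node_size=5, width=0.1, with_labels=False)
plt.show()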

In [13]:
#show graph info (nx.info was deprecated and removed in NetworkX 3.0; printing the graph gives the same summary)
print(G)


Graph with 17577 nodes and 287074 edges





In [14]:

#show graph density
print(nx.density(G))


0.001858484997760941


In [15]:

#show graph diameter
print(nx.diameter(G))


NetworkXError: Found infinite path length because the graph is not connected

In [16]:

#show graph average shortest path length
print(nx.average_shortest_path_length(G))
 


NetworkXError: Graph is not connected.

In [None]:
#show graph average clustering coefficient
print(nx.average_clustering(G))


In [None]:

#show graph average degree
print(np.mean(list(dict(G.degree()).values())))


In [None]:

#show graph average degree centrality
print(np.mean(list(nx.degree_centrality(G).values())))


In [None]:

#show graph average closeness centrality
print(np.mean(list(nx.closeness_centrality(G).values())))


In [None]:

#show graph average betweenness centrality
print(np.mean(list(nx.betweenness_centrality(G).values())))


In [22]:

#show graph average eigenvector centrality
print(np.mean(list(nx.eigenvector_centrality(G).values())))


PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations')
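The power iteration above stops after the default 100 iterations. A hedged sketch of the most direct workaround in NetworkX is to raise `max_iter` (and, if needed, loosen `tol`); `nx.eigenvector_centrality_numpy`, which uses a SciPy sparse eigensolver instead of power iteration, is an alternative. On a disconnected graph the scores concentrate on the component with the largest leading eigenvalue and are close to zero elsewhere, so comparing actors across components is not meaningful. The helper name and toy graph are hypothetical.

```python
import networkx as nx
import numpy as np


def mean_eigenvector_centrality(g: nx.Graph, max_iter: int = 1000, tol: float = 1e-6) -> float:
    """Average eigenvector centrality, allowing more power-iteration steps than the default 100."""
    scores = nx.eigenvector_centrality(g, max_iter=max_iter, tol=tol)
    return float(np.mean(list(scores.values())))


# hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3
toy = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])
print(round(mean_eigenvector_centrality(toy), 3))
```

On the actor graph this would be called as `mean_eigenvector_centrality(G)`.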

In [None]:

#show graph average pagerank
print(np.mean(list(nx.pagerank(G).values())))


In [None]:

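#show graph average katz centrality (converges only if alpha < 1/lambda_max <= 1/max degree; the default alpha=0.1 is too large for this graph)
alpha = 0.9 / max(d for _, d in G.degree())
print(np.mean(list(nx.katz_centrality(G, alpha=alpha).values())))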


In [None]:

#show graph average harmonic centrality
print(np.mean(list(nx.harmonic_centrality(G).values())))


In [None]:

#show graph average subgraph centrality (nx.communicability_centrality was renamed subgraph_centrality in NetworkX 2.0)
print(np.mean(list(nx.subgraph_centrality(G).values())))


# Degree distribution

Analyse the degree distribution. Compute the total degree distribution.
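A minimal sketch of the total degree distribution P(k), the fraction of nodes with degree k. In an undirected graph the total degree is simply the degree. The helper `degree_distribution` and the toy graph are hypothetical; on the real data you would pass `G`. Collaboration networks are often heavy-tailed, so plotting P(k) on log-log axes is usually more informative than a linear histogram; the plot is left out to keep the sketch short.

```python
from collections import Counter

import networkx as nx


def degree_distribution(g: nx.Graph) -> dict[int, float]:
    """Return P(k): the fraction of nodes with each degree k, sorted by k."""
    counts = Counter(d for _, d in g.degree())
    n = g.number_of_nodes()
    return {k: counts[k] / n for k in sorted(counts)}


# hypothetical toy graph: a star with 4 leaves plus one isolated node
toy = nx.star_graph(4)  # node 0 has degree 4, nodes 1..4 have degree 1
toy.add_node(5)         # isolated node, degree 0
dist = degree_distribution(toy)
degrees = [d for _, d in toy.degree()]
print(dist)
print(max(degrees), min(degrees))  # 4 0
```

The maximum and minimum degree (questions 4 and 5) come from the same degree list.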

**3) What does this distribution look like?**



**4) What is the maximum degree?**



**5) What is the minimum degree?**

# Network Diameter and Average Path Length

igraph provides functions to calculate the diameter and the average path length. Consider whether you should take the weights, the edge directions, etc. into account.
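Because the actor graph is disconnected, `nx.diameter` and `nx.average_shortest_path_length` raise errors on the full graph, as in the outputs above. One common choice, sketched here, is to report both on the largest connected component. The `weight` attribute counts shared movies, so a larger weight means a stronger tie, not a longer edge; for weighted paths it has to be turned into a distance first, and the `1 / weight` transform below is only one possible assumption. The helper name and toy graph are hypothetical, and exact all-pairs computations on a component with thousands of nodes can take a long time.

```python
import networkx as nx


def lcc_path_metrics(g: nx.Graph, weighted: bool = False) -> tuple[float, float]:
    """Diameter and average shortest path length of the largest connected component."""
    lcc = g.subgraph(max(nx.connected_components(g), key=len)).copy()
    if not weighted:
        return nx.diameter(lcc), nx.average_shortest_path_length(lcc)
    # assumption: more shared movies means closer actors, so 1/weight is used as edge length
    for _, _, data in lcc.edges(data=True):
        data["distance"] = 1.0 / data["weight"]
    # weighted diameter = longest of all weighted shortest paths (Dijkstra from every node)
    lengths = dict(nx.all_pairs_dijkstra_path_length(lcc, weight="distance"))
    diameter = max(max(d.values()) for d in lengths.values())
    return diameter, nx.average_shortest_path_length(lcc, weight="distance")


# hypothetical toy graph: a 4-node path (weights 2) plus a separate pair
toy = nx.Graph()
toy.add_weighted_edges_from([(0, 1, 2), (1, 2, 2), (2, 3, 2), (4, 5, 1)])
print(lcc_path_metrics(toy))                 # (3, 1.6666666666666667)
print(lcc_path_metrics(toy, weighted=True))  # (1.5, 0.8333333333333334)
```

The `.copy()` keeps the added `distance` attribute out of the original graph.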

**6) What is the diameter of the graph?**



**7) What is the avg path length of the graph?**

# Node importance: Centrality measures

(Optional but recommended): Obtain the distribution of the number of movies made by an actor and of the number of genres in which an actor starred. Analyzing and discussing these distributions may be useful for the following exercises.
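A sketch of how the number of genres could be derived from the `genres` column, assuming the `Genre:count,Genre:count,...` format visible in the key file output. Whether `NULL` (apparently movies without a genre) should count as a genre is a judgment call; this sketch skips it by default, which is an assumption. The helper `count_genres` and the toy rows are hypothetical.

```python
import pandas as pd


def count_genres(genres: object, skip_null: bool = True) -> int:
    """Count the genres in a 'Genre:count,Genre:count,...' string from df_key['genres']."""
    if not isinstance(genres, str):
        return 0  # missing value
    names = [item.split(":")[0] for item in genres.split(",") if item]
    # assumption: 'NULL' marks movies without a genre, so it is not counted by default
    return sum(1 for name in names if not (skip_null and name == "NULL"))


# hypothetical rows in the same format as df_key
toy_key = pd.DataFrame({
    "movies_95_04": [12, 16, 33],
    "genres": ["Action:1,Comedy:1,Drama:1", "Comedy:2,Drama:6,NULL:3", "Drama:7"],
})
toy_key["n_genres"] = toy_key["genres"].apply(count_genres)
print(toy_key["n_genres"].tolist())  # [3, 2, 1]
# distributions: value counts (or histograms) of both columns
print(toy_key["n_genres"].value_counts().sort_index())
print(toy_key["movies_95_04"].describe())
```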

Obtain three vectors with the degree, betweenness and closeness of each vertex of the actors' graph.
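A minimal sketch that collects the three vectors in one pandas DataFrame indexed by node id, which makes the top-20 lists below a matter of sorting. Note that `nx.betweenness_centrality` is normalized by default and `nx.closeness_centrality` applies the Wasserman-Faust scaling by default, which only changes values on disconnected graphs. Both are exact computations and can be slow on a graph of about 17.5k nodes and 287k edges. The helper name and toy graph are hypothetical.

```python
import networkx as nx
import pandas as pd


def centrality_vectors(g: nx.Graph) -> pd.DataFrame:
    """One row per node with its degree, betweenness and closeness centrality."""
    return pd.DataFrame({
        "degree": dict(g.degree()),                   # raw degree (number of co-stars)
        "betweenness": nx.betweenness_centrality(g),  # normalized by default
        "closeness": nx.closeness_centrality(g),      # Wasserman-Faust scaled by default
    })


# hypothetical toy graph: the path 0-1-2
vectors = centrality_vectors(nx.path_graph(3))
print(vectors)
```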

Obtain the list of the 20 actors with the largest degree centrality. It can be useful to show a list with the degree, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
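A sketch of the top-20 table, assuming the `id`, `name`, `movies_95_04`, `main_genre` and `genres` columns of the key file. It works with any score dictionary; raw degree and `nx.degree_centrality` (degree divided by n - 1) give the same ranking. Counting genres as the entries of `genres` while ignoring `NULL` is an assumption, and the helper `top_k_actors` and the toy data are hypothetical.

```python
import networkx as nx
import pandas as pd


def top_k_actors(scores: dict, df_key: pd.DataFrame, k: int = 20, label: str = "score") -> pd.DataFrame:
    """Top-k actors by a centrality score, joined with name, movies, main genre and genre count."""
    top = pd.Series(scores, name=label).nlargest(k)
    info = df_key.set_index("id")[["name", "movies_95_04", "main_genre", "genres"]].copy()
    # number of genres = entries in 'Genre:count,...', ignoring 'NULL' (an assumption)
    info["n_genres"] = info["genres"].fillna("").apply(
        lambda s: sum(1 for item in s.split(",") if item and not item.startswith("NULL:"))
    )
    return top.to_frame().join(info.drop(columns="genres"))


# hypothetical toy data: a star whose hub (id 0) co-starred with everyone
toy = nx.star_graph(3)
toy_key = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "name": ["Hub", "B", "C", "D"],
    "movies_95_04": [40, 5, 6, 7],
    "main_genre": ["Drama", "Drama", "Comedy", "Drama"],
    "genres": ["Drama:30,Comedy:9,NULL:1", "Drama:5", "Comedy:6", "Drama:7"],
})
print(top_k_actors(dict(toy.degree()), toy_key, k=2, label="degree"))
```

On the real data, `top_k_actors(dict(G.degree()), df_key, label="degree")` would give the requested list.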

**8) Who is the actor with highest degree centrality?**



**9) How do you explain the high degree of the actors in the top-20 list?**

Obtain the list of the 20 actors with the largest betweenness centrality. Show a list with the betweenness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
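Exact betweenness on a graph of this size can take a long time in pure Python. `nx.betweenness_centrality` accepts `k`, the number of sampled source nodes, which gives an approximation, and a fixed `seed` makes it reproducible. The sketch below wraps both modes; the helper name, the toy graph and the sample size are hypothetical, and on the real graph the sample size (for example 1000 sources) is an arbitrary choice to tune, with exact values preferable when time allows.

```python
from typing import Optional

import networkx as nx
import pandas as pd


def top_betweenness(g: nx.Graph, k_top: int = 20, sample: Optional[int] = None, seed: int = 42) -> pd.Series:
    """Top nodes by betweenness; `sample` source nodes give an approximation (None = exact)."""
    scores = nx.betweenness_centrality(g, k=sample, seed=seed)
    return pd.Series(scores, name="betweenness").nlargest(k_top)


# hypothetical toy graph: two 5-cliques joined only through node 10
toy = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))  # nodes 0-4 and 5-9
toy.add_edges_from([(4, 10), (10, 5)])
print(top_betweenness(toy, k_top=3))
print(top_betweenness(toy, k_top=3, sample=6))  # approximate: 6 sampled sources
```

The bridge node 10 lies on every shortest path between the two cliques, which is the kind of brokerage position that drives high betweenness.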

**10) Who is the actor with the highest betweenness?**



**11) How do you explain the high betweenness of the top-20 list?**

Obtain the list of the 20 actors with the largest closeness centrality. Show a list with the closeness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
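Closeness is defined through distances, so disconnected components need care. By default (`wf_improved=True`) NetworkX multiplies each node's closeness by the fraction of the graph it can reach (the Wasserman and Faust variant), so actors in small isolated groups are not ranked artificially high. The sketch contrasts this with the unscaled score on a hypothetical toy graph; harmonic centrality, already listed above, is another option for disconnected graphs.

```python
import networkx as nx

# hypothetical toy graph: an isolated pair {0, 1} and a 4-node path 2-3-4-5
toy = nx.Graph([(0, 1), (2, 3), (3, 4), (4, 5)])

raw = nx.closeness_centrality(toy, wf_improved=False)  # ignores how much of the graph is reachable
scaled = nx.closeness_centrality(toy)                   # default: wf_improved=True

# the raw score ranks the isolated pair first; the scaled score prefers the path's centre
print(max(raw, key=raw.get), max(scaled, key=scaled.get))  # 0 3
```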

**12) Who is the actor with highest closeness centrality?**



**13) How do you explain the high closeness of the top-20 list?**