# Network Statistics of MovieLens 100K Dataset
This notebook reports basic network statistics for the MovieLens 100K dataset, modeled as a bipartite graph of users and movies.

In [24]:
import networkx as nx
from networkx.algorithms import bipartite

G = nx.Graph()

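# Each line of u.data starts with: user id, movie id, rating.
with open("data/raw/u.data") as f:
    for line in f:
        # Keep only the first three fields in case extra columns (e.g. a timestamp) follow.
        source, target, weight = line.split()[:3]
        # Suffix the IDs so a user and a movie that share a number stay distinct nodes.
        G.add_node(source + "a", bipartite=0, id=source)
        G.add_node(target + "b", bipartite=1, id=target)
        G.add_edge(source + "a", target + "b", weight=int(weight))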

# print(G)

## Number of Nodes and Edges
<!-- 4 by 2 table -->
| Statistic                 | Value  |
|---------------------------|--------|
| Number of Nodes           | 2625   |
| Left Partition (Users)    | 943    |
| Right Partition (Movies)  | 1682   |
| Number of Edges (Ratings) | 100000 |

The network contains 2625 nodes and 100000 edges. Because the network is bipartite, each partition is also of interest on its own: the left partition contains 943 nodes and the right partition contains 1682 nodes. In other words, 943 users rated 1682 movies, for a total of 100000 ratings.
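
The node count only equals 943 + 1682 if user and movie IDs never collide, which is why the loading cell appends `"a"` and `"b"` to them. The sketch below illustrates this with plain Python sets; the sample ratings are hypothetical and are not taken from `u.data`.

```python
# Hypothetical sample ratings as (user_id, movie_id, rating); not taken from u.data.
sample_ratings = [("1", "1", 5), ("1", "2", 3), ("2", "1", 4), ("3", "2", 1)]

# Without suffixes, user "1" and movie "1" collapse into a single node.
naive_nodes = {u for u, _, _ in sample_ratings} | {m for _, m, _ in sample_ratings}

# With suffixes, the two partitions stay disjoint.
user_nodes = {u + "a" for u, _, _ in sample_ratings}
movie_nodes = {m + "b" for _, m, _ in sample_ratings}
tagged_nodes = user_nodes | movie_nodes

print(len(naive_nodes))   # 3
print(len(tagged_nodes))  # 5
print(len(tagged_nodes) == len(user_nodes) + len(movie_nodes))  # True
```

With the suffixes in place, the total node count is the sum of the two partition sizes, as in the table above.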


In [25]:
# Users are the nodes tagged bipartite=0; every other node is a movie.
users = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
movies = G.nodes - users

print(G)
print("Number of users:", len(users))
print("Number of movies:", len(movies))


Graph with 2625 nodes and 100000 edges
Number of users: 943
Number of movies: 1682


## Node Degree
<!-- 6 by 2 table -->
| Statistic            | Value  |
|----------------------|--------|
| Max Degree           | 737    |
| Max Left Degree      | 737    |
| Max Right Degree     | 583    |
| Average Degree       | 76.19  |
| Average Left Degree  | 106.04 |
| Average Right Degree | 59.45  |

Across all nodes, the maximum degree is 737 and the average degree is 76.19.

**Left partition (Users)**: the maximum degree is 737 and the average degree is 106.04. The most active user rated 737 movies, and each user rated about 106 movies on average.

**Right partition (Movies)**: the maximum degree is 583 and the average degree is 59.45. The most-rated movie has 583 ratings, and each movie has about 59 ratings on average.
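
As a sanity check, the three averages can be recomputed from the node and edge counts alone. This is a minimal sketch that assumes only the counts reported in this notebook; the variable names are illustrative. Every rating is one edge that adds one to the degree of exactly one user and one movie, so the overall average is `2 * E / N` and each partition average is `E` divided by the partition size.

```python
# Counts reported earlier in this notebook.
num_users = 943
num_movies = 1682
num_ratings = 100_000

num_nodes = num_users + num_movies

# Each edge contributes 1 to one user's degree and 1 to one movie's degree.
avg_degree = 2 * num_ratings / num_nodes
avg_user_degree = num_ratings / num_users
avg_movie_degree = num_ratings / num_movies

print(round(avg_degree, 2))        # 76.19
print(round(avg_user_degree, 2))   # 106.04
print(round(avg_movie_degree, 2))  # 59.45
```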



In [26]:
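degrees = G.degree
# bipartite.degrees returns the degrees of the opposite set (users) first,
# then the degrees of the nodes passed in (movies).
user_degrees, movie_degrees = bipartite.degrees(G, movies)

def avg(degree_array):
    # Mean degree over (node, degree) pairs.
    return sum(deg for _, deg in degree_array) / len(degree_array)

def max_degree(degree_array):
    # Largest degree over (node, degree) pairs.
    return max(deg for _, deg in degree_array)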

print("Average degree:", avg(degrees))
print("Average user degree:", avg(user_degrees))
print("Average movie degree:", avg(movie_degrees))

print("Max degree:", max_degree(degrees))
print("Max user degree:", max_degree(user_degrees))
print("Max movie degree:", max_degree(movie_degrees))


Average degree: 76.19047619047619
Average user degree: 106.04453870625663
Average movie degree: 59.45303210463734
Max degree: 737
Max user degree: 737
Max movie degree: 583


## Clustering

## Degree Distribution