# Week 2 Part 1 Assignment: Graph Visualization

Team:
Gabriel Castellanos, Beshkia Kvarnstrom


This week's assignment is to:

1. Load a graph database of your choosing from a text file or other source. If you take a large network dataset from the web (such as from https://snap.stanford.edu/data/), please feel free at this point to load just a small subset of the nodes and edges.

2. Perform basic analysis on the graph, including the graph’s diameter, and at least one other metric of your choosing. You may either code the functions by hand (to build your intuition and insight), or use functions in an existing package.

3. Use a visualization tool of your choice (Neo4j, Gephi, etc.) to display information.  

4. Please record a short video (~ 5 minutes), and submit a link to the video as part of your homework submission.   

You may work in a small group on this project. Parts 1 and 2 should be posted to GitHub and submitted in your assignment link by end of day September 12th. Parts 3 and 4 should be in your video presentation. We may display some of the results in our Meet-up on September 13th.
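
Item 2 above allows the metrics to be coded by hand. As a minimal sketch of what that could look like for the diameter, the block below runs a breadth-first search from every node of a small, connected, unweighted graph and keeps the largest distance found. The adjacency-dict representation, the toy graph, and the helper names `bfs_distances` and `diameter` are illustrative assumptions; they are not part of the assignment or of any library.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop counts from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter(adjacency):
    """Longest shortest path in a connected, undirected, unweighted graph."""
    longest = 0
    for source in adjacency:
        distances = bfs_distances(adjacency, source)
        # If BFS did not reach every node, the graph is disconnected.
        if len(distances) != len(adjacency):
            raise ValueError("graph is not connected; diameter is undefined")
        longest = max(longest, max(distances.values()))
    return longest


# Hypothetical toy graph: a path a - b - c - d, stored as an adjacency dict.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
print("Diameter of the toy graph:", diameter(toy_graph))
```

This approach runs one BFS per node, so its cost grows roughly with the number of nodes times the number of edges. That is acceptable for a subset of about a thousand nodes but may be slow on the full dataset.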

For this assignment we are using the 'Social circles: Facebook' dataset.

This dataset consists of 'circles' (or 'friends lists') from Facebook. The data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.

The dataset was taken from: https://snap.stanford.edu/data/ego-Facebook.html
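
The loading step in this notebook relies on the file being a plain edge list: one edge per line, written as two node IDs separated by whitespace. As a rough illustration of that format, the sketch below parses a few in-memory lines with `nx.parse_edgelist`. The three sample lines are made up for the example and should not be read as rows quoted from the dataset.

```python
import networkx as nx

# Hypothetical three-line sample in edge-list form (not quoted from the dataset).
sample_lines = ["0 1", "0 2", "1 2"]

# nodetype=int converts the node labels from strings to integers;
# without it, NetworkX keeps the labels as strings such as "0".
sample_graph = nx.parse_edgelist(sample_lines, nodetype=int, create_using=nx.Graph())

print("Nodes:", sample_graph.number_of_nodes())
print("Edges:", sample_graph.number_of_edges())
```

Parsing from a list of lines like this avoids writing a temporary file, which may be convenient when the text has already been downloaded into memory.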

In [None]:
import requests
import networkx as nx
import matplotlib.pyplot as plt


In [None]:
# Load the graph database from the text file
url = "https://raw.githubusercontent.com/BeshkiaKvarnstrom/DATA-620-Web-Analytics-Assignments/main/Data/facebook_combined.txt"

# Send an HTTP GET request to the GitHub raw file URL
response = requests.get(url)
    

In [None]:
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Access the content of the response
    file_content = response.text

    # Write the file content to a local file
    with open("facebook_combined.txt", "w") as file:
        file.write(file_content)

    # Read the local file as a graph
    graph = nx.read_edgelist("facebook_combined.txt", create_using=nx.Graph())

    # Limit the size of the graph: keep the first max_nodes nodes, then the
    # first max_edges edges among them (nodes left without a kept edge are dropped)
    max_nodes = 1000  # Maximum number of nodes to keep
    max_edges = 3000  # Maximum number of edges to keep
    graph = graph.subgraph(list(graph.nodes())[:max_nodes])
    graph = graph.edge_subgraph(list(graph.edges())[:max_edges])

    # Check if the graph is empty
    if graph.number_of_nodes() == 0 or graph.number_of_edges() == 0:
        print("Graph is empty. Cannot perform calculations.")
    else:
        # Print the number of nodes in the graph
        print("The Number of nodes is:", graph.number_of_nodes())

        # Print the number of edges in the graph
        print("The Number of edges is:", graph.number_of_edges())

        # Calculate a second metric: the average clustering coefficient.
        # nx.average_clustering returns 0 for a graph without triangles, so no
        # exception handling is needed once the graph is known to be non-empty.
        clustering_coefficient = nx.average_clustering(graph)
        print("The Average clustering coefficient:", clustering_coefficient)

        # Calculate the average shortest path length
        try:
            average_shortest_path_length = nx.average_shortest_path_length(graph)
            print(
                "The average shortest path length of the graph is:",
                average_shortest_path_length,
            )
        except nx.NetworkXError:
            # Raised by NetworkX when the graph is not connected
            print("Graph is not connected. Average shortest path length is undefined.")

        # Check if the graph is connected
        if nx.is_connected(graph):
            # Calculate the diameter of the graph
            diameter = nx.diameter(graph)
            print("The diameter of the Graph is:", diameter)
        else:
            print("Graph is not connected. Diameter calculation is undefined.")

            # Calculate the diameter of the largest connected component
            largest_component = max(nx.connected_components(graph), key=len)
            subgraph = graph.subgraph(largest_component)
            diameter = nx.diameter(subgraph)
            print("The diameter of the largest connected component is:", diameter)

        # Rank nodes by degree centrality (degree divided by n - 1) and keep the top five
        degree_centrality = nx.degree_centrality(graph)
        highest_degree_nodes = sorted(
            degree_centrality, key=degree_centrality.get, reverse=True
        )[:5]

        print("Nodes with highest degree centrality:")
        for node in highest_degree_nodes:
            print("Node:", node, "Degree Centrality:", degree_centrality[node])


        # Visualize the graph
        plt.figure(figsize=(10, 8))
        nx.draw_networkx(
            graph,
            with_labels=True,
            node_color="#26004d",
            node_size=600,
            font_weight="bold",
            edge_color="#e6ccff",
            alpha=0.5,
        )
        plt.show(block=True)  # Keep the figure window open

else:
    print("Failed to retrieve the Graph Database file. Status code:", response.status_code)
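
The degree centrality values printed by the code above are normalised: NetworkX divides each node's degree by n - 1, the largest degree a node can have in a simple graph with n nodes. A minimal sketch of the same calculation, written by hand on a hypothetical toy graph, is shown below. The function name `degree_centrality_by_hand` and the star-shaped example are assumptions made for illustration only.

```python
def degree_centrality_by_hand(adjacency):
    """Degree of each node divided by (n - 1), for an undirected simple graph."""
    n = len(adjacency)
    if n <= 1:
        # With zero or one node there is nothing to normalise by;
        # this mirrors the NetworkX convention of returning 1 in that case.
        return {node: 1.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Hypothetical star graph: "hub" is connected to three leaves.
star = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

centrality = degree_centrality_by_hand(star)

# Sort nodes from most to least central, as in the notebook's ranking step.
for node in sorted(centrality, key=centrality.get, reverse=True):
    print("Node:", node, "Degree Centrality:", round(centrality[node], 3))
```

In this toy graph the hub touches every other node, so its centrality is 1.0, while each leaf has one neighbour out of three possible and scores one third.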