# Explanation for each piece of code of WEEK 2 team AA

In [1]:
import random
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Creating the Graph:

This line uses the from_pandas_edgelist function from NetworkX to create an undirected graph (G) from the edge DataFrame (edges_df), which must already be loaded at this point.
The source and target parameters name the DataFrame columns that hold the two endpoints of each edge; here those columns are '# source' and 'target'.

In [None]:
G = nx.from_pandas_edgelist(edges_df, source='# source', target='target')
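The notebook does not show how edges_df is loaded, so the following is only a minimal sketch of the same call on a hypothetical in-memory DataFrame. The names toy_edges_df and G_toy and the edge values are assumptions made for illustration; only the column names are taken from the cell above.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy edge list; the real edges_df is loaded elsewhere.
toy_edges_df = pd.DataFrame({
    "# source": [0, 0, 1, 3],
    "target":   [1, 2, 2, 4],
})

# Same call as in the cell above, applied to the toy data.
# Each row becomes one undirected edge between its two endpoints.
G_toy = nx.from_pandas_edgelist(toy_edges_df, source="# source", target="target")

print(G_toy.number_of_nodes(), G_toy.number_of_edges())
```

Nodes are created implicitly from the values found in the two columns, so no separate node list is needed.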

# Extracting the Largest Connected Component:

In this block, the code identifies the largest connected component in the graph (G) using the connected_components function from NetworkX. The max function is used to find the largest connected component based on the number of nodes (key=len). Then, a subgraph (G_main) is created, containing only the nodes and edges of the largest connected component.


In [None]:
largest_cc = max(nx.connected_components(G), key=len)
G_main = G.subgraph(largest_cc)
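As a small illustration of this step, the sketch below applies the same two lines to a hypothetical toy graph made of a triangle and a separate edge. The graph and the names G_toy and G_toy_main are assumptions, not part of the original notebook.

```python
import networkx as nx

# Hypothetical toy graph with two components: a triangle and a single edge.
G_toy = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])

# connected_components yields one set of nodes per component.
components = list(nx.connected_components(G_toy))
largest_cc = max(components, key=len)

# G.subgraph returns a read-only view; .copy() makes an independent graph
# that can be modified without touching the original.
G_toy_main = G_toy.subgraph(largest_cc).copy()

print(sorted(G_toy_main.nodes()))
```

Whether the extra .copy() is worthwhile depends on whether the component is modified later; for read-only metric computation the view is enough.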


# Custom Functions for Clustering and Transitivity:

This block defines three custom functions:
1- clustering(G, node): Calculates the clustering coefficient for a given node in the graph.
2- average_clustering(G): Computes the average clustering coefficient for the entire graph.
3- custom_transitivity(G): Computes the transitivity of the graph using basic functions of NetworkX.

Each of them is explained in detail in the following blocks.

In [None]:
def clustering(G, node):
    ...  # implementation shown under Function 1

def average_clustering(G):
    ...  # implementation shown under Function 2

def custom_transitivity(G):
    ...  # implementation shown under Function 3


# Function 1: clustering(G, node):

- G: The input parameter is the graph.
- node: The node for which the clustering coefficient is calculated.

1- k = G.degree[node]: Calculates the degree of the node, i.e., the number of edges incident to the node.

2- if k == 0 or k == 1: Checks whether the node has zero or one neighbor. In such cases, the clustering coefficient is defined as 0, because no triangle can pass through the node.

3- Otherwise:
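. List_nodes = list(G.nodes()): Creates a list of all nodes in the graph.
. i = List_nodes.index(node): Finds the index of the current node in the list, which is also its row and column in the adjacency matrix.
. A = nx.adjacency_matrix(G): Computes the adjacency matrix of the graph, with rows and columns in the order of G.nodes().
. A3 = A @ A @ A: Computes the cube of the adjacency matrix as a matrix product. Writing A**3 is not safe here: recent NetworkX versions return a SciPy sparse array, on which ** is an element-wise power rather than a matrix power.
. triangle = A3[i, i] / 2: Extracts the number of triangles that contain the current node. Entry (i, i) of the cubed matrix counts the closed walks of length 3 that start at the node, and each triangle is walked in two directions, hence the division by 2.
. den = k * (k - 1) / 2: Calculates the denominator of the clustering coefficient formula, i.e., the number of pairs of neighbors of the node.
. return triangle / den: Computes and returns the clustering coefficient.

In [None]:
def clustering(G, node):
    k = G.degree[node]
    # With fewer than two neighbors no triangle can pass through the node.
    if k == 0 or k == 1:
        return 0
    else:
        List_nodes = list(G.nodes())
        i = List_nodes.index(node)
        A = nx.adjacency_matrix(G)
        # Matrix cube; A**3 would be element-wise on SciPy sparse arrays.
        A3 = A @ A @ A
        # Closed walks of length 3 count each triangle twice.
        triangle = A3[i, i] / 2
        # Number of pairs of neighbors of the node.
        den = k * (k - 1) / 2
        return triangle / den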
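To see why the diagonal of the cubed adjacency matrix counts triangles, the following sketch checks it against nx.triangles on a hypothetical four-node graph. It uses a dense matrix from nx.to_numpy_array for simplicity, which is an assumption made for the example and not what the function above does.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with an extra edge 2-3.
G_toy = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
nodes = list(G_toy.nodes())

# Dense adjacency matrix with rows and columns in the order of `nodes`.
A = nx.to_numpy_array(G_toy, nodelist=nodes)
A3 = np.linalg.matrix_power(A, 3)

# (A^3)[i, i] counts closed walks of length 3 from node i;
# every triangle through i is walked in two directions.
triangles_from_A3 = {n: int(round(A3[i, i] / 2)) for i, n in enumerate(nodes)}

print(triangles_from_A3)
print(nx.triangles(G_toy))
```

Node 2 has degree 3 and lies on one triangle, so its clustering coefficient is 1 / (3 * 2 / 2) = 1/3, which agrees with nx.clustering(G_toy, 2).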


# Function 2: average_clustering(G):

- G: The input parameter is the graph 

1- N = G.number_of_nodes(): Gets the total number of nodes in the graph.

2- Temp_sum = sum(clustering(G, i) for i in G.nodes()): Calculates the sum of clustering coefficients for all nodes using the previously defined clustering function.

3- return Temp_sum / N: Computes and returns the average clustering coefficient by dividing the sum by the total number of nodes.


In [None]:
def average_clustering(G):
    N = G.number_of_nodes()
    Temp_sum = sum(clustering(G, i) for i in G.nodes())
    return Temp_sum / N
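Because clustering(G, i) rebuilds and cubes the adjacency matrix for every node, the loop above can become slow on larger graphs. One possible alternative, sketched below, computes the cube once and reads all diagonal entries from it. The function name average_clustering_vectorized is hypothetical, and the sketch assumes an undirected graph without self-loops that is small enough for a dense matrix.

```python
import networkx as nx
import numpy as np

def average_clustering_vectorized(G):
    """Hypothetical helper: average clustering from a single A^3 computation."""
    nodes = list(G.nodes())
    if not nodes:
        return 0.0
    # weight=None makes every edge count as 1, so row sums are degrees.
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)
    k = A.sum(axis=1)
    # Triangles through each node: half the closed walks of length 3.
    triangles = np.diag(A @ A @ A) / 2
    # Number of neighbor pairs per node (0 when the degree is below 2).
    pairs = k * (k - 1) / 2
    # Nodes with no neighbor pairs get coefficient 0, as in clustering().
    coeffs = np.divide(triangles, pairs,
                       out=np.zeros_like(triangles), where=pairs > 0)
    return float(coeffs.mean())

G_demo = nx.karate_club_graph()
print(average_clustering_vectorized(G_demo), nx.average_clustering(G_demo))
```

The dense matrix needs memory proportional to the square of the number of nodes, so for very large graphs a sparse product or NetworkX's built-in function is likely the better choice.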


# Function 3: custom_transitivity(G):

- G: The input parameter is the graph as usual.

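1- triangles = sum(nx.triangles(G, node) for node in G.nodes()) / 3: 
Computes the number of triangles in the graph using NetworkX's triangles function. Each triangle is counted once at each of its three nodes, so the sum is divided by 3.

2- triplets = sum(d * (d - 1) for _, d in G.degree()) / 2: 
Calculates the number of connected triplets in the graph. A triplet is a set of three nodes in which one central node is connected to the other two; the two outer nodes may or may not be connected to each other. A node of degree d is the center of d * (d - 1) / 2 triplets, which is where the division by 2 comes from.

3- return 3 * triangles / triplets if triplets != 0 else 0: 
Computes and returns the transitivity, defined as 3 times the number of triangles divided by the number of triplets. The factor 3 is needed because every triangle contains three triplets, one centered at each of its nodes; without it the result would be one third of the value returned by nx.transitivity. If triplets is zero, the function returns 0 to avoid division by zero.

These custom functions provide alternatives to NetworkX's built-in functions for calculating clustering and transitivity. The clustering function computes the clustering coefficient for a given node, the average_clustering function calculates the average clustering coefficient for the entire graph, and the custom_transitivity function computes the transitivity of the graph from triangle counts and node degrees.

In [None]:
def custom_transitivity(G):
    # Each triangle is counted once per node, so divide the total by 3.
    triangles = sum(nx.triangles(G, node) for node in G.nodes()) / 3
    # A node of degree d is the center of d * (d - 1) / 2 connected triplets.
    triplets = sum(d * (d - 1) for _, d in G.degree()) / 2
    # Every triangle contains three triplets, hence the factor 3.
    return 3 * triangles / triplets if triplets != 0 else 0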
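A worked example may help to check the counts by hand. The sketch below uses a hypothetical four-node graph (a triangle 0-1-2 plus the edge 2-3) and the standard definition of transitivity, 3 * triangles / triplets; the variable names are assumptions made for illustration.

```python
import networkx as nx

# Hypothetical toy graph: a triangle 0-1-2 with an extra edge 2-3.
G_toy = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])

# nx.triangles(G) returns {node: triangles through node};
# the sum counts every triangle three times.
n_triangles = sum(nx.triangles(G_toy).values()) // 3

# Degrees are 2, 2, 3, 1, giving 1 + 1 + 3 + 0 connected triplets.
n_triplets = sum(d * (d - 1) // 2 for _, d in G_toy.degree())

# Each triangle closes three triplets, hence the factor 3.
toy_transitivity = 3 * n_triangles / n_triplets

print(n_triangles, n_triplets, toy_transitivity)  # 1 5 0.6
print(nx.transitivity(G_toy))
```

Of the five triplets, three are closed by the single triangle, so the transitivity is 3/5, the same value that nx.transitivity reports for this graph.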


# Computing Metrics - Average Clustering and Transitivity:

This block calculates the average clustering and transitivity of the largest connected component (G_main) using both NetworkX's built-in functions (nx.average_clustering and nx.transitivity) and the custom functions defined above. The results are stored in variables for later printing.


In [None]:
avg_clustering_nx = nx.average_clustering(G_main)
avg_clustering_custom = average_clustering(G_main)
transitivity_nx = nx.transitivity(G_main)
transitivity_custom = custom_transitivity(G_main)


# Printing Results:

Here, the code prints the computed values for average clustering and transitivity, both from NetworkX and the custom functions. This allows a direct comparison: if the custom implementations are correct, the two values of each pair should agree up to floating-point rounding.

In [None]:
print(f"Average Clustering (NetworkX): {avg_clustering_nx}")
print(f"Average Clustering (Custom): {avg_clustering_custom}")
print(f"Transitivity (NetworkX): {transitivity_nx}")
print(f"Transitivity (Custom): {transitivity_custom}")
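Comparing printed floats by eye is error-prone, so one option is to compare them with a tolerance. The sketch below is only an illustration: check_match is a hypothetical helper, and the demo compares nx.average_clustering with the mean of the per-node values from nx.clustering on the karate club graph rather than with the notebook's own variables.

```python
import math
import networkx as nx

def check_match(label, reference, candidate, rel_tol=1e-9):
    """Hypothetical helper: report whether two metric values agree."""
    ok = math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=1e-12)
    status = "match" if ok else "MISMATCH"
    print(f"{label}: {status} ({reference} vs {candidate})")
    return ok

# Demo on a built-in example graph.
G_demo = nx.karate_club_graph()
per_node = nx.clustering(G_demo)  # {node: clustering coefficient}
mean_of_per_node = sum(per_node.values()) / len(per_node)

all_ok = check_match("Average Clustering",
                     nx.average_clustering(G_demo), mean_of_per_node)
```

In the notebook itself, the same helper could be called as check_match("Average Clustering", avg_clustering_nx, avg_clustering_custom) and check_match("Transitivity", transitivity_nx, transitivity_custom).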

# Visualization (Optional):

This optional block visualizes the largest connected component of the graph using NetworkX's draw function and matplotlib. It generates a plot with node labels and displays it using plt.show().

In summary, the blocks go from graph creation and extraction of the largest connected component to metric computation and visualization. The custom functions reimplement metrics that NetworkX already provides, so their main value is as a cross-check of the built-in results and as a way to see how the metrics are computed.

In [None]:
nx.draw(G_main, with_labels=True)
plt.show()
