# What are the most important nodes in the network?

Our assumption is that the important nodes in this kind of network are those with a high degree, since they act as hubs for the entire network. We expect big metropolitan cities and urban centres, where people migrate for employment opportunities, to be the key nodes in our network.
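In graph terms, a station's degree is the number of distinct stations it has a direct route to, so ranking stations by degree amounts to counting neighbours and sorting. The sketch below shows that idea with only the standard library. The route list and station names (`Hub`, `A`, `B`, ...) are hypothetical toy data, and `degrees` and `top_k_by_degree` are illustrative helpers, not functions from this notebook.

```python
import heapq
from collections import defaultdict

# Hypothetical toy routes: each pair is a direct connection between two stations
routes = [
    ("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "D"),
    ("A", "B"), ("C", "E"),
]


def degrees(edges):
    """Degree of each station = number of distinct stations it is directly linked to."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {station: len(neighbours) for station, neighbours in adjacency.items()}


def top_k_by_degree(edges, k):
    """Return the k stations with the highest degree, highest first."""
    deg = degrees(edges)
    # heapq.nlargest gives the same result as sorted(..., key=..., reverse=True)[:k]
    return heapq.nlargest(k, deg, key=deg.get)


print(degrees(routes))
print(top_k_by_degree(routes, 1))
```

With a NetworkX graph, `dict(G.degree())` produces the same station-to-degree mapping, which is what the next cell builds before sorting.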

In [None]:
node_degrees = dict(G.degree())

# Sort nodes based on degree in descending order and get top 10
top_10_nodes = sorted(node_degrees, key=node_degrees.get, reverse=True)[:10]

# Print the top 10 nodes along with their positions, names, and degrees
print("Top 10 nodes with highest degree:")
for node in top_10_nodes:
    pos = G.nodes[node]['pos']
    label = G.nodes[node]['label']
    degree = node_degrees[node]
    print("Node:", label)
    print("Position:", pos)
    print("Degree:", degree)
    print()

In [None]:
[G.nodes[node]['label'] for node in top_10_nodes]

In [None]:
import cartopy.crs as ccrs

# Plot the map of India
plt.figure(figsize=(10, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([68, 98, 6, 30])
ax.coastlines()

# Manual per-node offsets (in degrees of lon/lat), in the order of top_10_nodes,
# so that markers and labels of nearby stations do not overlap
text_offsets = [[-0.8,0.6], [-1, -1], [0.4,0.4], [0.5, 0.5], [0.5, 0.4], [0.4, 0.4], [0.4, -0.4], [0.4, 0.4], [0, 0.4], [-1.8, 0.6]]
pos_offsets = [[0,0], [-0.5, -0.5], [0,0], [0.2, 0.2], [0,0], [0,0], [0,0], [0,0], [0,0], [0,0]]

# Plot nodes on the map
for i, node in enumerate(top_10_nodes):
    pos = G.nodes[node]['pos']
    label = G.nodes[node]['label']
    ax.plot(pos[0]+pos_offsets[i][0], pos[1]+pos_offsets[i][1], 'ro')
    ax.text(pos[0]+text_offsets[i][0], pos[1]+text_offsets[i][1], label, fontsize=10, ha='left', va='center')

plt.title('Top 10 Nodes with Highest Degree in India')
plt.show()

The composition of the 10 highest-degree nodes offers a glimpse into the pivotal role played by a handful of railway stations across India. Metropolitan giants naturally claim most of the positions: Kolkata (Howrah Junction), Delhi (New Delhi, Delhi and H Nizamuddin), Mumbai (Lokmanya Tilak and Mumbai CST) and Chennai (Chennai Central), each serving as a bustling hub that facilitates the movement of millions of people daily. Chennai Central in particular is a significant gateway to the southern region. Ahmedabad Junction's inclusion emphasises its status as a vital nexus in Western India's economic landscape. Secunderabad Junction in Hyderabad and Yesvantpur Junction in Bangalore complete the list, highlighting the role of these two cities as crucial links in the South's transportation network.

# Which places are these important nodes connected to?

We notice from the above plot that some cities have more than one station, and in some of them all of those stations are high-degree nodes of our network. Let's look at the neighbours of these important nodes to see what they tell us about how the load is distributed across our railways.
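One way to make "distribution of load" concrete (an illustrative addition, not something this notebook computes) is to measure how much two stations in the same city overlap in the places they connect to. The Jaccard similarity of their neighbour sets is |A ∩ B| / |A ∪ B|: a value near 0 means the stations serve largely different destinations, so the load is split, while a value near 1 means they largely duplicate each other. The neighbour sets below are hypothetical.

```python
def jaccard(neighbours_a, neighbours_b):
    """Jaccard similarity |A & B| / |A | B| of two neighbour sets (0.0 if both are empty)."""
    a, b = set(neighbours_a), set(neighbours_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical neighbour sets of two stations in the same city
station_1 = {"P", "Q", "R", "S"}
station_2 = {"R", "S", "T", "U"}

overlap = jaccard(station_1, station_2)
print(f"Neighbour overlap: {overlap:.2f}")
```

On the notebook's graph, the same score for a pair of stations `u`, `v` can be obtained with `nx.jaccard_coefficient(G, [(u, v)])`, which yields `(u, v, score)` tuples.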

In [None]:
import cartopy.feature as cfeature

# Create a 3x4 subplot grid (10 panels are used, the last 2 are removed below)
fig, axs = plt.subplots(3, 4, figsize=(15, 13), subplot_kw={'projection': ccrs.PlateCarree()})

# Node positions as (longitude, latitude); they do not change inside the loop
pos = nx.get_node_attributes(G, 'pos')

# Plot connections for each top 10 node and its neighbours
for i, node in enumerate(top_10_nodes):
    neighbors = list(G.neighbors(node))
    ax = axs[i // 4, i % 4]
    ax.set_title(G.nodes[node]['label'])

    # Plot India map in the background
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.BORDERS, linestyle=':')
    ax.set_extent([65, 100, 5, 40])

    # Hub in red; neighbours in blue, sized by their own degree
    nx.draw_networkx_nodes(G, pos, nodelist=[node], node_color='red', node_size=50, ax=ax)
    nx.draw_networkx_nodes(G, pos, nodelist=neighbors, node_color='blue',
                           node_size=[node_degrees[n] * 2 for n in neighbors], ax=ax)
    nx.draw_networkx_edges(G, pos, edgelist=[(node, n) for n in neighbors], width=1, alpha=0.7, ax=ax)

# Remove the two unused subplots
for i in range(10, 12):
    fig.delaxes(axs.flatten()[i])

plt.tight_layout()
plt.show()

In [None]:
import cartopy.feature as cfeature

# Reset H Nizamuddin (top_10_nodes[-3]) to its original, wrongly placed position so that
# panels 1-10 reproduce the plot above even if this cell is re-run after the fix below
G.nodes[top_10_nodes[-3]]["pos"] = (81.087883, 25.942377)


def plot_neighbourhood(ax, node, title):
    """Draw `node` (red) and its neighbours (blue, sized by degree) on a map of India."""
    pos = nx.get_node_attributes(G, 'pos')  # re-read, since positions change in this cell
    neighbors = list(G.neighbors(node))
    ax.set_title(title)
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.BORDERS, linestyle=':')
    ax.set_extent([65, 100, 5, 40])
    nx.draw_networkx_nodes(G, pos, nodelist=[node], node_color='red', node_size=50, ax=ax)
    nx.draw_networkx_nodes(G, pos, nodelist=neighbors, node_color='blue',
                           node_size=[node_degrees[n] * 2 for n in neighbors], ax=ax)
    nx.draw_networkx_edges(G, pos, edgelist=[(node, n) for n in neighbors], width=1, alpha=0.7, ax=ax)


# Create a 3x4 subplot grid
fig, axs = plt.subplots(3, 4, figsize=(15, 13), subplot_kw={'projection': ccrs.PlateCarree()})

# Panels 1-10: each top 10 node, with H Nizamuddin still at the wrong position
for i, node in enumerate(top_10_nodes):
    plot_neighbourhood(axs[i // 4, i % 4], node, G.nodes[node]['label'])

# Move H Nizamuddin to its position in Delhi (longitude, latitude) and redraw it in panel 11
G.nodes[top_10_nodes[-3]]["pos"] = (77.1647, 28.5861)
plot_neighbourhood(axs[2, 2], top_10_nodes[-3], "H Nizamuddin (fixed)")

# Remove the last, unused subplot
fig.delaxes(axs.flatten()[11])

plt.tight_layout()
plt.show()

Our hypothesis was on the right track. Cities with multiple stations appear to use them to share the load of handling many requests (here, railway routes).

In the case of Mumbai, there are two stations: Lokmanya Tilak and Mumbai CST. The plots suggest that most long-distance routes pass through Lokmanya Tilak, whereas Mumbai CST, which does have a few long-distance routes, mostly serves relatively nearby cities.

Delhi has three stations: New Delhi, Delhi and H Nizamuddin. The picture was initially unclear because H Nizamuddin was placed at the wrong position, but once its location is corrected the load distribution for trains passing through this metropolitan centre becomes evident. H Nizamuddin mostly handles the North-South corridor, extending down to the southern tip of the country, while Delhi station serves the East-West corridor, spanning from Gujarat to Arunachal Pradesh. New Delhi's reach is more spread out and not focused on any one region.

In southern India, Secunderabad Junction has relatively few connections to distant nodes, suggesting that it mainly handles intra-state and neighbouring-state routes. Chennai Central and Yesvantpur Junction (Bangalore), on the other hand, have many routes extending to different corners of the country, consistent with these cities' role as important hubs of our economy.

Unlike the other metropolitan centres, Kolkata has only one station among the top 10 nodes, Howrah Junction. This suggests that most rail routes from other states into West Bengal go through this station. Since a single station caters to so many routes, it is no surprise that Howrah has a markedly higher degree than any other node in our network.
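The observation that some stations mostly serve nearby cities while others serve distant ones could be quantified by the average great-circle distance from a station to its direct neighbours. The sketch below uses the haversine formula with a mean Earth radius of 6371 km on (longitude, latitude) pairs, the same order as the notebook's `pos` attribute. The station names, coordinates and links are hypothetical, and `mean_neighbour_distance` is an illustrative helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_neighbour_distance(station, adjacency, positions):
    """Average distance (km) from a station to each station it is directly linked to."""
    lon, lat = positions[station]
    distances = [haversine_km(lon, lat, *positions[n]) for n in adjacency[station]]
    return sum(distances) / len(distances)


# Hypothetical positions (longitude, latitude) and links for two stations in one city
positions = {
    "Long-haul": (73.0, 19.0), "Local": (73.1, 19.1),
    "Far1": (88.0, 22.5), "Far2": (77.0, 28.5),
    "Near1": (73.5, 19.5), "Near2": (74.0, 18.5),
}
adjacency = {
    "Long-haul": ["Far1", "Far2", "Near1"],
    "Local": ["Near1", "Near2"],
}

for station in adjacency:
    print(station, round(mean_neighbour_distance(station, adjacency, positions)), "km")
```

A higher mean distance for one station of a city than for another would support the "long-distance versus nearby" split described above.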

# Sparsely Connected Stations

Having looked at the most prominent stations in the country, let's now take a glance at some sparsely connected stations (those with degree 1) and the stations they are connected to. We will check whether these neighbours form hubs of their own.
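A simple way to look for such hubs is to count, for every station, how many degree-1 stations hang off it: a station with many of these dead-end neighbours acts as a local hub for them. The helper and toy edge list below are hypothetical, standard-library-only illustrations; on the notebook's graph the same count could be derived from `nodes_with_degree_1` and `G.neighbors`.

```python
from collections import Counter, defaultdict


def leaf_attachment_counts(edges):
    """For each station, count the degree-1 stations directly attached to it."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    counts = Counter()
    for station, neighbours in adjacency.items():
        if len(neighbours) == 1:        # station is a leaf (degree 1)
            (neighbour,) = neighbours   # unpack its single neighbour
            counts[neighbour] += 1
    return counts


# Hypothetical toy network: X has three dead-end stations attached, Y has one
edges = [
    ("X", "l1"), ("X", "l2"), ("X", "l3"),
    ("X", "Y"), ("Y", "Z"), ("Z", "X"), ("Y", "l4"),
]

counts = leaf_attachment_counts(edges)
print(counts.most_common())
```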

In [None]:
# Define the colour bins for neighbours based on their degree
degree_ranges = [
    (2, 5),
    (6, 10),
    (11, 15),
    (16, 20),
    (21, 25),
    (26, 30),
    (31, 35)
]

# One colour per bin, sampled evenly from the RdYlBu colormap
colors = plt.cm.RdYlBu(np.linspace(0, 1, len(degree_ranges)))

# Map every degree in a bin to that bin's colour
color_map = {}
for i, (start, end) in enumerate(degree_ranges):
    for degree in range(start, end + 1):
        color_map[degree] = colors[i]

# Degree-1 stations (recomputed here so the cell is self-contained)
nodes_with_degree_1 = [node for node, degree in G.degree() if degree == 1]

# Plot the map of India
plt.figure(figsize=(10, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([68, 98, 6, 34])
ax.coastlines()

# Plot degree-1 nodes in green
for node in nodes_with_degree_1:
    pos = G.nodes[node]['pos']
    ax.plot(pos[0], pos[1], 'go', markersize=5)

# Plot neighbours of degree-1 nodes, coloured by the neighbour's degree
for node in nodes_with_degree_1:
    pos = G.nodes[node]['pos']
    for neighbor in G.neighbors(node):
        pos_neighbor = G.nodes[neighbor]['pos']
        if neighbor not in nodes_with_degree_1:
            # Degrees above 35 fall outside every bin and are drawn in black
            color = color_map.get(G.degree(neighbor), 'black')
            ax.plot(pos_neighbor[0], pos_neighbor[1], 'o', color=color, markersize=5)

        # Draw an edge between the degree-1 node and its neighbour
        ax.plot([pos[0], pos_neighbor[0]], [pos[1], pos_neighbor[1]], 'k-', alpha=0.1)

# Create custom legend, one entry per degree bin
legend_handles = [mpatches.Patch(color='green', label='Degree-1 Nodes')]
for color, (start, end) in zip(colors, degree_ranges):
    legend_handles.append(mpatches.Patch(color=color, label=f'Degree {start}-{end} Nodes'))
legend_handles.append(mpatches.Patch(color='black', label='Degree > 35 Nodes'))

# Add legend to the plot
plt.legend(handles=legend_handles, loc='lower right')

plt.title('Nodes with Degree 1 and their Neighbors in India')
plt.show()
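The colour lookup above assigns every degree from 2 to 35 to one of seven bins by pre-filling a dictionary. An alternative sketch (not what the notebook does) computes the bin index directly with `bisect`, using the upper bound of each bin; degrees outside 2 to 35 return `None`, which a caller could map to the fallback colour. `degree_bin` and the constant names are hypothetical.

```python
import bisect

# Upper bounds of the bins 2-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35
BIN_UPPER_BOUNDS = [5, 10, 15, 20, 25, 30, 35]
MIN_DEGREE = 2


def degree_bin(degree):
    """Index (0-6) of the bin containing `degree`, or None if it lies outside 2..35."""
    if degree < MIN_DEGREE or degree > BIN_UPPER_BOUNDS[-1]:
        return None
    # bisect_left returns the first bin whose upper bound is >= degree
    return bisect.bisect_left(BIN_UPPER_BOUNDS, degree)


for d in (1, 2, 5, 6, 35, 36):
    print(d, degree_bin(d))
```

`colors[degree_bin(d)]` would then replace the dictionary lookup, with a `None` result handled separately.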

In [None]:
# Get the connected components
connected_components = list(nx.connected_components(G))

# Node positions as (longitude, latitude), looked up once for all components
all_pos = nx.get_node_attributes(G, 'pos')

# Create a 9x3 subplot grid (27 panels, so at most 27 components) and flatten it
fig, axs = plt.subplots(9, 3, figsize=(20, 30), subplot_kw={'projection': ccrs.PlateCarree()})
axs = axs.flatten()

# Manually tuned map margins (in degrees), one per component
margins = [4, 0, 3.2, 2.8, 2.5, 4, 4, 2.8, 3.2, 2.5, 2.5, 2.5, 2, 4, 3.2, 4, 4, 3.3, 3.2, 2.5, 2, 3.5, 2.8, 2, 4, 4, 3.1]

# State boundaries from Natural Earth, shared by all subplots
states_provinces = cfeature.NaturalEarthFeature(category='cultural', name='admin_1_states_provinces_lines', scale='10m', facecolor='none')

# Plot each connected component separately
for i, component in enumerate(connected_components):
    ax = axs[i]
    subgraph = G.subgraph(component)
    if len(component) <= 10:
        # Small component: label every station and draw larger nodes with dark edges
        for node in component:
            lon, lat = all_pos[node]
            ax.text(lon, lat, G.nodes[node]['label'], fontsize=8, ha='center', transform=ccrs.PlateCarree())
        nx.draw_networkx_nodes(subgraph, pos=all_pos, ax=ax, node_color='skyblue', node_size=50, alpha=0.8)
        nx.draw_networkx_edges(subgraph, pos=all_pos, ax=ax, edge_color='black', alpha=0.8)

        # Zoom the map to the component's bounding box plus the manual margin
        lons = [all_pos[node][0] for node in component]
        lats = [all_pos[node][1] for node in component]
        margin = margins[i]
        ax.set_extent([min(lons) - margin, max(lons) + margin, min(lats) - margin, max(lats) + margin], crs=ccrs.PlateCarree())

    else:
        # Large component (the giant component): draw it unlabelled with small nodes and grey edges
        nx.draw(subgraph, all_pos, ax=ax, with_labels=False, node_color='skyblue', node_size=10, edge_color='gray')

    # Plot Indian map and state boundaries in each subplot
    ax.add_feature(cfeature.BORDERS, linestyle=':', alpha=0.5)
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(states_provinces, edgecolor='gray', linestyle='--')

    ax.set_title(f"Connected Component {i+1}")

# Hide empty subplots
for ax in axs[len(connected_components):]:
    ax.axis('off')

plt.tight_layout()
plt.show()

The plot above makes it clear that, as one would expect, there is only one giant connected component in the railway network. The remaining components are either isolated stations through which no train passes, or small routes running between a handful of stations.

As for the latter, one reason could be that the component lies in a region that is somewhat cut off or hard to reach from the main network. In places such as a high plateau or a mountain range it would be hard to lay tracks connecting them to the rest of the network, but there could still be railways within the region (like the toy trains we have heard of). Another possible reason is that, for sparsely populated areas, connecting to more populated centres may not offer a good cost-benefit trade-off. In such areas other modes of travel, such as buses, were probably found to be more economical, so these areas were left disconnected from the main network. There could, however, be enough to-and-fro travel between a pair of such towns (like Jamner and Pachora) that a train running between them makes travel easier for their residents.
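Connected components like the ones plotted above can be found with a breadth-first search started from every not-yet-visited station; the giant component is simply the largest one. The standard-library sketch below is a hypothetical illustration of that technique (the notebook itself calls `nx.connected_components`). Passing the node list explicitly makes isolated stations show up as single-node components; the toy nodes and edges are made up.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Breadth-first search from each unvisited node; returns a list of node sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for nbr in adjacency[current]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


# Hypothetical toy network: one larger route, one two-station route, one isolated station
nodes = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

components = connected_components(nodes, edges)
giant = max(components, key=len)
print(sorted(len(c) for c in components), sorted(giant))
```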

In [None]:
# Number of nodes in the giant connected component (GCC)
nodes = list(GCC.nodes())
num_nodes = len(nodes)
print(num_nodes)

# Map each node to its row/column index once, instead of calling list.index() per edge
index = {node: idx for idx, node in enumerate(nodes)}

# Initialise the distance matrix: 0 on the diagonal, 1 for each edge (unweighted), inf elsewhere
dist_matrix = np.full((num_nodes, num_nodes), np.inf)
np.fill_diagonal(dist_matrix, 0)
for u, v in GCC.edges():
    dist_matrix[index[u], index[v]] = 1
    dist_matrix[index[v], index[u]] = 1

# Floyd-Warshall: for each intermediate node k, relax every pair (i, j) through k.
# The i/j loops are vectorised with NumPy broadcasting; the k loop must stay sequential.
for k in range(num_nodes):
    dist_matrix = np.minimum(dist_matrix, dist_matrix[:, k, None] + dist_matrix[None, k, :])

# Label rows and columns with station names (row i / column i correspond to nodes[i])
labels = [GCC.nodes[node]['label'] for node in nodes]
df_shortest_paths = pd.DataFrame(dist_matrix, index=labels, columns=labels)

# Display the DataFrame
print("All pairs shortest paths:")
print(df_shortest_paths)
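Because every edge in this graph has unit length, all-pairs shortest paths can also be computed by running one breadth-first search per node. That takes O(V·(V+E)) time instead of Floyd-Warshall's O(V³), which is much cheaper on a sparse network like this one. A minimal standard-library sketch on a hypothetical toy adjacency list (the helper names are illustrative):

```python
from collections import deque


def bfs_hops(adjacency, source):
    """Number of edges on the shortest path from `source` to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in hops:           # first visit in BFS is via a shortest path
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def all_pairs_hops(adjacency):
    """One BFS per node: O(V * (V + E)) for an unweighted graph."""
    return {source: bfs_hops(adjacency, source) for source in adjacency}


# Hypothetical toy network: path A-B-C-D with a branch C-E
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}

hops = all_pairs_hops(adjacency)
print(hops["A"])
```

For the notebook's graph, `dict(nx.all_pairs_shortest_path_length(GCC))` returns the same kind of nested dictionary using BFS, and `nx.floyd_warshall_numpy(GCC)` returns the distance matrix as a NumPy array.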