<a href="https://colab.research.google.com/github/1070rahul/1070rahul/blob/main/Delhi_Metro_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Metro network analysis involves examining the network of metro systems to understand their structure, efficiency, and effectiveness. It typically includes analyzing routes, stations, traffic, connectivity, and other operational aspects.

# Delhi Metro Network Analysis

Analyzing the metro network in a city like Delhi helps improve urban transportation infrastructure, leading to better city planning and enhanced commuter experiences. Below is the process we can follow for the task of Metro Network Analysis of Delhi:

1. Determine what you want to achieve. It could be optimizing routes, reducing congestion, improving passenger flow, or understanding travel patterns.
2. Collect data on metro lines, stations, connections, and transit schedules.
3. Clean the data for inconsistencies, missing values, or errors.
4. Create visual representations of the network, such as route maps, passenger flow charts, or heat maps of station congestion.
5. Analyze how effectively the network handles passenger traffic and meets operational targets.

# Import desired libraries


In [2]:
import pandas as pd
import numpy as np
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = 'plotly_white'

# Load the dataset

In [4]:
metro_data = pd.read_csv('/content/Delhi-Metro-Network.csv')

In [20]:
metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326


In [6]:
metro_data.shape

(285, 8)

In [7]:
metro_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Station ID                285 non-null    int64  
 1   Station Name              285 non-null    object 
 2   Distance from Start (km)  285 non-null    float64
 3   Line                      285 non-null    object 
 4   Opening Date              285 non-null    object 
 5   Station Layout            285 non-null    object 
 6   Latitude                  285 non-null    float64
 7   Longitude                 285 non-null    float64
dtypes: float64(3), int64(1), object(4)
memory usage: 17.9+ KB


In [8]:
metro_data.dtypes

Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object

The Opening Date column is stored with the object (string) dtype. It should be converted to a datetime dtype.

In [9]:
metro_data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Station ID,285.0,143.0,82.416625,1.0,72.0,143.0,214.0,285.0
Distance from Start (km),285.0,19.218947,14.002862,0.0,7.3,17.4,28.8,52.7
Latitude,285.0,28.595428,0.091316,27.920862,28.545828,28.613453,28.66636,28.878965
Longitude,285.0,77.029315,2.8754,28.698807,77.10713,77.20722,77.281165,77.554479
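
The summary above reports a minimum Longitude of 28.698807, while every other quantile sits near 77 degrees, which suggests at least one mis-entered coordinate. A minimal sketch of a bounding-box check follows. The box limits (`LAT_RANGE`, `LON_RANGE`) are an illustrative assumption for the Delhi NCR region, and the sample rows (including "Example Station") are hypothetical rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the last one has a longitude that looks like a latitude.
sample = pd.DataFrame({
    "Station Name": ["Jhil Mil", "Dwarka Mor", "Example Station"],
    "Latitude": [28.67579, 28.61932, 28.70000],
    "Longitude": [77.31239, 77.03326, 28.698807],
})

# Assumed bounding box around Delhi NCR (illustrative, not authoritative).
LAT_RANGE = (27.5, 29.5)
LON_RANGE = (76.5, 78.0)

# A row is plausible only if both coordinates fall inside the box.
in_box = (
    sample["Latitude"].between(*LAT_RANGE)
    & sample["Longitude"].between(*LON_RANGE)
)

# Rows outside the box are candidates for manual review.
suspicious = sample[~in_box]
print(suspicious)
```

Running the same filter on `metro_data` would list the rows worth inspecting before they are plotted on a map.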


Next, let's check whether the dataset has any null values.

In [10]:
# checking the null values
metro_data.isnull().sum()

Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64

In [13]:
# converting Opening date to a datetime format

metro_data['Opening Date'] = pd.to_datetime(metro_data['Opening Date'])
metro_data.dtypes

Station ID                           int64
Station Name                        object
Distance from Start (km)           float64
Line                                object
Opening Date                datetime64[ns]
Station Layout                      object
Latitude                           float64
Longitude                          float64
dtype: object
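
The conversion above succeeds because every date in this dataset is well formed. A hedged sketch of a more defensive variant is given here, assuming the dates follow the `YYYY-MM-DD` pattern seen in the preview. The `raw_dates` values are hypothetical, and the last one is deliberately malformed.

```python
import pandas as pd

# Hypothetical values; the last entry is deliberately malformed.
raw_dates = pd.Series(["2008-04-06", "2018-10-31", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Count how many entries failed to parse so they can be inspected.
n_failed = int(parsed.isna().sum())
print(parsed)
print("unparseable entries:", n_failed)
```

Counting the `NaT` values after conversion makes silent parsing failures visible before any date-based analysis.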

# Geospatial Analysis

Let's start by visualizing the locations of the metro stations on a map. This gives us insight into the geographical distribution of the stations across Delhi. We will use the latitude and longitude data to plot each station.

For this, I’ll create a map with markers for each metro station. Each marker will represent a station, and we’ll be able to analyze aspects like station density and geographic spread. Let’s proceed with this visualization:

In [14]:
metro_data.columns

Index(['Station ID', 'Station Name', 'Distance from Start (km)', 'Line',
       'Opening Date', 'Station Layout', 'Latitude', 'Longitude'],
      dtype='object')

In [17]:
metro_data.Line.unique()

array(['Red line', 'Pink line', 'Rapid Metro', 'Magenta line',
       'Blue line', 'Aqua line', 'Voilet line', 'Yellow line',
       'Green line', 'Gray line', 'Orange line', 'Green line branch',
       'Blue line branch'], dtype=object)

In [18]:
# defining color scheme for the metro lines

# 'Voilet line' is spelled as it appears in the dataset
line_colors = {
    'Red line': 'red', 'Pink line': 'pink', 'Rapid Metro': 'cadetblue',
    'Magenta line': 'darkblue', 'Blue line': 'blue',
    'Aqua line': 'black', 'Voilet line': 'purple', 'Yellow line': 'beige',
    'Green line': 'green', 'Gray line': 'lightgray', 'Orange line': 'orange',
    'Green line branch': 'lightgreen',
    'Blue line branch': 'lightblue'
}

In [22]:
# center the map on Delhi
map_with_tooltip = folium.Map(location = [28.7041, 77.1025], zoom_start = 11)

# adding a colored marker for each metro station, with the station and line name in the popup

for index, row in metro_data.iterrows():
  line = row['Line']
  color = line_colors.get(line, 'black') # default color is black if line not found in the dict.
  folium.Marker(
      location = [row['Latitude'], row['Longitude']],
      popup = f"{row['Station Name']}, {line}",
      icon = folium.Icon(color=color)
  ).add_to(map_with_tooltip)

# Displaying the updated map
map_with_tooltip

The map above shows the geographical distribution of Delhi Metro stations. Each marker represents a metro station. Clicking a marker shows the station name and the metro line it belongs to. This map provides a visual understanding of how the metro stations are spread across Delhi.
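
`folium.Icon` accepts only a fixed set of color names and emits a warning for anything else, so a misspelled entry in a color mapping is easy to miss. The sketch below checks a mapping before the map is built. The helper `invalid_colors` and the `example_colors` mapping are hypothetical names introduced for illustration.

```python
# Color names accepted by folium.Icon, as listed in folium's own warning message.
ALLOWED_ICON_COLORS = {
    "red", "blue", "green", "purple", "orange", "darkred", "lightred",
    "beige", "darkblue", "darkgreen", "cadetblue", "darkpurple", "white",
    "pink", "lightblue", "lightgreen", "gray", "black", "lightgray",
}


def invalid_colors(color_map):
    """Return the {line: color} entries whose color folium.Icon would reject."""
    return {
        line: color
        for line, color in color_map.items()
        if color not in ALLOWED_ICON_COLORS
    }


# Hypothetical mapping with one deliberate misspelling.
example_colors = {"Red line": "red", "Orange line": "oragne"}
bad = invalid_colors(example_colors)
print(bad)
```

An empty result means every line maps to a color the marker icons can render.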

# Temporal Analysis

Let's analyze the growth of the Delhi Metro network over time. I'll look at how many stations were opened each year and visualize this growth.
Let's start by extracting the year from the Opening Date, then count the number of stations opened each year. Finally, I will plot the result.

In [23]:
metro_data.columns

Index(['Station ID', 'Station Name', 'Distance from Start (km)', 'Line',
       'Opening Date', 'Station Layout', 'Latitude', 'Longitude'],
      dtype='object')

In [24]:
# create a new column Opening Year from Opening Date
metro_data['Opening Year'] = metro_data['Opening Date'].dt.year

In [25]:
metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude,Opening Year
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239,2008
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756,2018
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935,2013
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849,2017
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326,2005


In [35]:
metro_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Station ID                285 non-null    int64         
 1   Station Name              285 non-null    object        
 2   Distance from Start (km)  285 non-null    float64       
 3   Line                      285 non-null    object        
 4   Opening Date              285 non-null    datetime64[ns]
 5   Station Layout            285 non-null    object        
 6   Latitude                  285 non-null    float64       
 7   Longitude                 285 non-null    float64       
 8   Opening Year              285 non-null    int64         
dtypes: datetime64[ns](1), float64(3), int64(2), object(3)
memory usage: 20.2+ KB


In [29]:
# counting the number of stations opened each year
stations_per_year = metro_data['Opening Year'].value_counts().sort_index()
stations_per_year

2002     6
2003     4
2004    11
2005    28
2006     9
2008     3
2009    17
2010    54
2011    13
2013     5
2014     3
2015    13
2017    18
2018    64
2019    37
Name: Opening Year, dtype: int64

In [30]:
# create a dataframe.
stations_per_year_df = stations_per_year.reset_index()
stations_per_year_df

Unnamed: 0,index,Opening Year
0,2002,6
1,2003,4
2,2004,11
3,2005,28
4,2006,9
5,2008,3
6,2009,17
7,2010,54
8,2011,13
9,2013,5


In [32]:
# Giving names to the columns  in stations_per_year_df
stations_per_year_df.columns = ['Year', 'Number of Stations']
stations_per_year_df

Unnamed: 0,Year,Number of Stations
0,2002,6
1,2003,4
2,2004,11
3,2005,28
4,2006,9
5,2008,3
6,2009,17
7,2010,54
8,2011,13
9,2013,5


In [40]:
# create a bar plot

fig = px.bar(stations_per_year_df, x = 'Year', y = 'Number of Stations',
             title = "Number of Stations Opened Each year in Delhi",
             labels = {'Year': 'Year', 'Number of Stations':'Number of Stations Opened'})

fig.update_layout(xaxis_tickangle = -45, xaxis = dict(tickmode = 'linear'),
                  yaxis = dict(title = 'Number of Stations Opened'),
                  xaxis_title = "Year"
                  )
fig.show()


## Observations:
1. Some years show a large number of new station openings (54 in 2010 and 64 in 2018), indicating phases of rapid network expansion.
2. 2018 was the year with the most stations opened (64).
3. Some years have few or no new stations: 2007, 2012, and 2016 do not appear in the data at all.
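
Years without any openings are absent from `value_counts()`, so they are easy to overlook in the table. A small sketch, using the yearly counts printed earlier, fills the gaps with zeros and adds a running total. The names `full`, `years_without_openings`, and `cumulative_stations` are introduced here for illustration.

```python
import pandas as pd

# Yearly counts copied from the stations_per_year output above.
stations_per_year = pd.Series(
    [6, 4, 11, 28, 9, 3, 17, 54, 13, 5, 3, 13, 18, 64, 37],
    index=[2002, 2003, 2004, 2005, 2006, 2008, 2009, 2010,
           2011, 2013, 2014, 2015, 2017, 2018, 2019],
)

# Reindex over every year in the range so gap years appear with a count of 0.
all_years = range(stations_per_year.index.min(), stations_per_year.index.max() + 1)
full = stations_per_year.reindex(all_years, fill_value=0)

# Years in which no station opened, and the running total of stations.
years_without_openings = full[full == 0].index.tolist()
cumulative_stations = full.cumsum()

print(years_without_openings)
print(cumulative_stations)
```

Plotting `cumulative_stations` instead of the yearly counts would show the overall size of the network at each point in time.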

# Line Analysis

Let's now analyze the metro lines in terms of the number of stations they have and the average distance between adjacent stations.

In [42]:
stations_per_line = metro_data['Line'].value_counts().sort_index()
stations_per_line

Aqua line            21
Blue line            49
Blue line branch      8
Gray line             3
Green line           21
Green line branch     3
Magenta line         25
Orange line           6
Pink line            38
Rapid Metro          11
Red line             29
Voilet line          34
Yellow line          37
Name: Line, dtype: int64

In [45]:
metro_data.columns

Index(['Station ID', 'Station Name', 'Distance from Start (km)', 'Line',
       'Opening Date', 'Station Layout', 'Latitude', 'Longitude',
       'Opening Year'],
      dtype='object')

In [47]:
# calculating the total distance of each metro line(max distance from start)
total_distance_per_line = metro_data.groupby('Line')['Distance from Start (km)'].max()
total_distance_per_line


Line
Aqua line            27.1
Blue line            52.7
Blue line branch      8.1
Gray line             3.9
Green line           24.8
Green line branch     2.1
Magenta line         33.1
Orange line          20.8
Pink line            52.6
Rapid Metro          10.0
Red line             32.7
Voilet line          43.5
Yellow line          45.7
Name: Distance from Start (km), dtype: float64

In [49]:
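# average distance between adjacent stations:
# total line length / (number of stations - 1), since n stations have n - 1 gaps
avg_distance_per_line = total_distance_per_line/(stations_per_line -1)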

line_analysis = pd.DataFrame({
    'Line': stations_per_line.index,
    'Number of Stations': stations_per_line.values,
    'Average Distance Between Stations(km)': avg_distance_per_line
})
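
The formula above treats the largest Distance from Start as the line length, which matches the mean gap only when each line's first station sits at 0 km. As a rough cross-check, the gaps can also be computed directly with `diff()`. This sketch uses hypothetical toy lines "A" and "B", not values from the dataset.

```python
import pandas as pd

# Hypothetical distances (km) for two made-up lines, not taken from the dataset.
toy = pd.DataFrame({
    "Line": ["A", "A", "A", "A", "B", "B", "B"],
    "Distance from Start (km)": [0.0, 1.2, 2.0, 4.5, 0.0, 1.0, 3.0],
})

# Order stations along each line before taking differences.
ordered = toy.sort_values(["Line", "Distance from Start (km)"])

# Gap to the previous station on the same line (NaN for each line's first station).
gaps = ordered.groupby("Line")["Distance from Start (km)"].diff()
mean_gap = gaps.groupby(ordered["Line"]).mean()

# Shortcut used in the notebook: max distance / (number of stations - 1).
grouped = ordered.groupby("Line")["Distance from Start (km)"]
shortcut = grouped.max() / (grouped.count() - 1)

print(mean_gap)
print(shortcut)
```

The two results agree here because both toy lines start at 0 km. If a line's first recorded station were at a nonzero distance, the shortcut would overstate the average gap.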

In [50]:
line_analysis.head()

Line (index),Line,Number of Stations,Average Distance Between Stations(km)
Aqua line,Aqua line,21,1.355
Blue line,Blue line,49,1.097917
Blue line branch,Blue line branch,8,1.157143
Gray line,Gray line,3,1.95
Green line,Green line,21,1.24


In [52]:
# sorting the dataframe by the number of stations
line_analysis = line_analysis.sort_values(by = 'Number of Stations', ascending = False)
line_analysis

Line (index),Line,Number of Stations,Average Distance Between Stations(km)
Blue line,Blue line,49,1.097917
Pink line,Pink line,38,1.421622
Yellow line,Yellow line,37,1.269444
Voilet line,Voilet line,34,1.318182
Red line,Red line,29,1.167857
Magenta line,Magenta line,25,1.379167
Aqua line,Aqua line,21,1.355
Green line,Green line,21,1.24
Rapid Metro,Rapid Metro,11,1.0
Blue line branch,Blue line branch,8,1.157143


In [53]:
line_analysis.reset_index(drop=True, inplace=True)
line_analysis

Unnamed: 0,Line,Number of Stations,Average Distance Between Stations(km)
0,Blue line,49,1.097917
1,Pink line,38,1.421622
2,Yellow line,37,1.269444
3,Voilet line,34,1.318182
4,Red line,29,1.167857
5,Magenta line,25,1.379167
6,Aqua line,21,1.355
7,Green line,21,1.24
8,Rapid Metro,11,1.0
9,Blue line branch,8,1.157143


## Observations:
1. The Blue line has the highest number of stations (49), with an average distance of about 1.1 km between them.
2. The Green line branch and the Gray line have the fewest stations (3 each).


To understand this better, let's visualize it.
I will create two plots: one for the number of stations per line and another for the average distance between stations. Together they provide a comparative view of the metro lines:

In [64]:
# creating subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Number of Stations Per Metro Line',
                                                    'Average Distance Between Stations Per Metro Line'),
                    horizontal_spacing=0.2)

# plot for Number of Stations per Line
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Number of Stations'],
           orientation='h', name='Number of Stations', marker_color='crimson'),
    row=1, col=1
)

# plot for Average Distance Between Stations
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Average Distance Between Stations(km)'],
           orientation='h', name='Average Distance (km)', marker_color='navy'),
    row=1, col=2
)

# update xaxis properties
fig.update_xaxes(title_text="Number of Stations", row=1, col=1)
fig.update_xaxes(title_text="Average Distance Between Stations (km)", row=1, col=2)

# update yaxis properties
fig.update_yaxes(title_text="Metro Line", row=1, col=1)
fig.update_yaxes(title_text="", row=1, col=2)

# update layout
fig.update_layout(height=600, width=1200, title_text="Metro Line Analysis", template="plotly_white")


# Station Layout Analysis


Let's explore the station layouts. I will analyze the distribution of these layouts across the network.

In [65]:
metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude,Opening Year
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239,2008
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756,2018
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935,2013
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849,2017
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326,2005


In [66]:
layout_counts = metro_data['Station Layout'].value_counts().sort_index()

In [67]:
layout_counts

At-Grade         3
Elevated       214
Underground     68
Name: Station Layout, dtype: int64

In [70]:
# creating the bar plot using plotly
fig = px.bar(x=layout_counts.index, y=layout_counts.values,
             labels = {'x':'Station Layout','y':'Number of Stations'},
             title = 'Distribution of Metro Layouts',
             color = layout_counts.index,
            )

fig.update_layout(xaxis_title = "Station Layout",
                  yaxis_title = "Number of Stations",
                  coloraxis_showscale = False,
                  template="plotly_white")
fig.show()

## Observations:
1. The majority of the stations are Elevated (214 of 285, about 75%), followed by Underground (68) and At-Grade (3).
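
To put the counts on a common scale, the shares can be expressed as percentages. A minimal sketch, rebuilding `layout_counts` from the values printed above (the name `layout_share` is introduced here for illustration):

```python
import pandas as pd

# Counts copied from the layout_counts output above.
layout_counts = pd.Series({"At-Grade": 3, "Elevated": 214, "Underground": 68})

# Share of each layout as a percentage of all stations, rounded to one decimal.
layout_share = (layout_counts / layout_counts.sum() * 100).round(1)
print(layout_share)
```

On the full dataframe, `metro_data['Station Layout'].value_counts(normalize=True)` should give the same proportions without rebuilding the counts by hand.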