
# **Challenge: Metro Network Analysis: Case Study**



**Tasks:**



1. Map the stations to visualize the coverage and distribution of the metro network across Delhi.
2. Examine characteristics of different metro lines, including station count and average distances between stations.
3. Analyze the types of station layouts and their distribution across the network.
4. Draw statistical correlations and insights, such as the relationship between station layout and distance from the city centre.







**Importing dataset and necessary libraries**

In [2]:
from google.colab import files
uploaded=files.upload()

Saving Delhi-Metro-Network.csv to Delhi-Metro-Network (1).csv


In [3]:
import pandas as pd

In [4]:
metro_data=pd.read_csv("Delhi-Metro-Network.csv")

In [5]:
#Checking dataset

metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326


The dataset contains the following information:

Station Information: Names and IDs of metro stations.

Geographical Coordinates: Latitude and longitude of each station.

Line Information: The specific metro line each station belongs to.

Distance Data: The distance of each station from the start of its line.

Station Layout: Type of station layout (e.g., Elevated, Underground, At-Grade).

Opening Date: Date of inauguration of each station.

In [6]:
#Checking for null values

null_values_count=metro_data.isnull().sum()
null_values_count

Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64

In [7]:
# Checking datatype of each column

metro_data.dtypes

Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object

In [8]:
# Converting the 'Opening Date' column to the datetime datatype

metro_data['Opening Date']=pd.to_datetime(metro_data['Opening Date'])

In [9]:
metro_data.dtypes

Station ID                           int64
Station Name                        object
Distance from Start (km)           float64
Line                                object
Opening Date                datetime64[ns]
Station Layout                      object
Latitude                           float64
Longitude                          float64
dtype: object
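
The conversion above can be sketched on a toy frame, assuming ISO-formatted date strings like those in the dataset preview. The `format` and `errors='coerce'` arguments are optional additions that the original cell does not use; with `errors='coerce'`, unparseable values become `NaT` instead of raising an error. The malformed value below is hypothetical.

```python
import pandas as pd

# Toy sample; the last value is deliberately malformed (hypothetical).
sample = pd.DataFrame({"Opening Date": ["2008-04-06", "2018-10-31", "not a date"]})

# Parse ISO dates; invalid strings become NaT rather than raising an error.
sample["Opening Date"] = pd.to_datetime(
    sample["Opening Date"], format="%Y-%m-%d", errors="coerce"
)

# Count the rows that could not be parsed.
bad_rows = int(sample["Opening Date"].isna().sum())
print(sample.dtypes)
print("Unparseable dates:", bad_rows)
```

Counting `NaT` values after the conversion is a cheap way to confirm that every date was parsed before deriving a year column from it.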

In [10]:
# Checking unique values

metro_data['Line'].unique()

array(['Red line', 'Pink line', 'Rapid Metro', 'Magenta line',
       'Blue line', 'Aqua line', 'Voilet line', 'Yellow line',
       'Green line', 'Gray line', 'Orange line', 'Green line branch',
       'Blue line branch'], dtype=object)
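
The label 'Voilet line' in the output above appears to be a misspelling of 'Violet line' in the source data. The rest of this notebook keeps the original spelling so that lookups such as the colour mapping keep working. If a corrected label were wanted, a minimal sketch on a toy Series could look like this; the `label_fixes` mapping is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Toy sample of the 'Line' column (values copied from the output above).
lines = pd.Series(["Red line", "Voilet line", "Blue line", "Voilet line"], name="Line")

# Hypothetical correction map; extend it if other labels need fixing.
label_fixes = {"Voilet line": "Violet line"}

# replace() swaps whole values that match a key and leaves the rest untouched.
cleaned = lines.replace(label_fixes)
print(cleaned.unique())
```

Any dictionary keyed on the old spelling would need the same correction if this were applied to the real column.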

**Analysis & Visualisation**

In [14]:
# Importing folium library

import folium

In [15]:
delhi_map=folium.Map(location=[28.7041, 77.1025], zoom_start=11)

In [16]:
# Assigning color code for each metro line

line_colors = {
    'Red line': 'red',
    'Blue line': 'blue',
    'Yellow line': 'beige',
    'Green line': 'green',
    'Voilet line': 'purple',
    'Pink line': 'pink',
    'Magenta line': 'darkred',
    'Orange line': 'orange',
    'Rapid Metro': 'cadetblue',
    'Aqua line': 'black',
    'Green line branch': 'lightgreen',
    'Blue line branch': 'lightblue',
    'Gray line': 'lightgray'
}

In [17]:
# Visualising spread of metro lines on map

for index, row in metro_data.iterrows():
    line=row['Line']
    color=line_colors.get(line,'black')
    folium.Marker(
        location=[row['Latitude'],row['Longitude']],
        popup=f"{row['Station Name']}",
        tooltip=f"{row['Station Name']}, {line}",
        icon=folium.Icon(color=color)
    ).add_to(delhi_map)

In [18]:
delhi_map

In [19]:
metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326


In [20]:
# Let's analyse the growth in the number of stations per year

metro_data['Opening_year']=metro_data['Opening Date'].dt.year

In [21]:
metro_data.head()

Unnamed: 0,Station ID,Station Name,Distance from Start (km),Line,Opening Date,Station Layout,Latitude,Longitude,Opening_year
0,1,Jhil Mil,10.3,Red line,2008-04-06,Elevated,28.67579,77.31239,2008
1,2,Welcome [Conn: Red],46.8,Pink line,2018-10-31,Elevated,28.6718,77.27756,2018
2,3,DLF Phase 3,10.0,Rapid Metro,2013-11-14,Elevated,28.4936,77.0935,2013
3,4,Okhla NSIC,23.8,Magenta line,2017-12-25,Elevated,28.554483,77.264849,2017
4,5,Dwarka Mor,10.2,Blue line,2005-12-30,Elevated,28.61932,77.03326,2005


In [22]:
stations_per_year= metro_data['Opening_year'].value_counts().sort_index()
stations_per_year

2002     6
2003     4
2004    11
2005    28
2006     9
2008     3
2009    17
2010    54
2011    13
2013     5
2014     3
2015    13
2017    18
2018    64
2019    37
Name: Opening_year, dtype: int64

In [23]:
stations_per_year_df=stations_per_year.reset_index()
stations_per_year_df

Unnamed: 0,index,Opening_year
0,2002,6
1,2003,4
2,2004,11
3,2005,28
4,2006,9
5,2008,3
6,2009,17
7,2010,54
8,2011,13
9,2013,5


In [24]:
stations_per_year_df.columns=['Opening Year','Number of Stations']
stations_per_year_df

Unnamed: 0,Opening Year,Number of Stations
0,2002,6
1,2003,4
2,2004,11
3,2005,28
4,2006,9
5,2008,3
6,2009,17
7,2010,54
8,2011,13
9,2013,5


In [25]:
import plotly.express as px

In [48]:
fig=px.bar(stations_per_year_df,x="Opening Year", y="Number of Stations",
           title="Number of Metro stations opened each year in Delhi")

In [49]:
fig.update_layout(xaxis=dict(tickmode='linear'), yaxis_title='Number of Stations Opened',
                  xaxis_title="Year",width=800,height=400)
fig.show()

**Observations:**


1. Openings are concentrated in a few years: 2010 (54 stations) and 2018 (64 stations) stand out, with 2005 (28) and 2019 (37) also well above the rest, indicating phases of rapid network expansion.
2. Several years have few or no new stations (none in 2007, 2012 or 2016), which could be due to factors such as planning, funding, or construction challenges.
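
Years without any openings do not appear in `value_counts()` at all, so they are easy to miss. A minimal sketch, using the yearly counts copied from the output above, reindexes the Series over the full year range so that the gaps show up as zeros. The names `full_counts`, `gap_years` and `peak_year` are illustrative only.

```python
import pandas as pd

# Yearly counts copied from the stations_per_year output above.
stations_per_year = pd.Series(
    {2002: 6, 2003: 4, 2004: 11, 2005: 28, 2006: 9, 2008: 3, 2009: 17,
     2010: 54, 2011: 13, 2013: 5, 2014: 3, 2015: 13, 2017: 18, 2018: 64, 2019: 37}
)

# Reindex over every year in the range so missing years appear as 0.
first_year = int(stations_per_year.index.min())
last_year = int(stations_per_year.index.max())
full_counts = stations_per_year.reindex(range(first_year, last_year + 1), fill_value=0)

gap_years = full_counts[full_counts == 0].index.tolist()
peak_year = int(full_counts.idxmax())
print("Years with no openings:", gap_years)
print("Peak year:", peak_year, "with", int(full_counts.max()), "stations")
```

Plotting `full_counts` instead of the raw counts would also give the bar chart an explicit empty slot for each gap year.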





In [28]:
# Let's analyze the various metro lines in terms of the number of stations they have and the average distance between stations.

stations_per_line=metro_data['Line'].value_counts()
stations_per_line

Blue line            49
Pink line            38
Yellow line          37
Voilet line          34
Red line             29
Magenta line         25
Aqua line            21
Green line           21
Rapid Metro          11
Blue line branch      8
Orange line           6
Gray line             3
Green line branch     3
Name: Line, dtype: int64

In [29]:
total_distance_per_line= metro_data.groupby('Line')['Distance from Start (km)'].max()
total_distance_per_line

Line
Aqua line            27.1
Blue line            52.7
Blue line branch      8.1
Gray line             3.9
Green line           24.8
Green line branch     2.1
Magenta line         33.1
Orange line          20.8
Pink line            52.6
Rapid Metro          10.0
Red line             32.7
Voilet line          43.5
Yellow line          45.7
Name: Distance from Start (km), dtype: float64

In [30]:
avg_distance_per_line=total_distance_per_line/(stations_per_line-1)
avg_distance_per_line

Aqua line            1.355000
Blue line            1.097917
Blue line branch     1.157143
Gray line            1.950000
Green line           1.240000
Green line branch    1.050000
Magenta line         1.379167
Orange line          4.160000
Pink line            1.421622
Rapid Metro          1.000000
Red line             1.167857
Voilet line          1.318182
Yellow line          1.269444
dtype: float64
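
The division above works because pandas aligns two Series on their index labels rather than on position. The sketch below reproduces the calculation for three lines, with values copied from the outputs above and deliberately listed in different orders. It treats the largest 'Distance from Start (km)' as the line length, which assumes the first station sits at roughly 0 km. A line with n stations has n - 1 gaps between consecutive stations.

```python
import pandas as pd

# Values copied from the outputs above, deliberately in different orders.
stations = pd.Series({"Blue line": 49, "Orange line": 6, "Aqua line": 21})
max_distance_km = pd.Series({"Aqua line": 27.1, "Blue line": 52.7, "Orange line": 20.8})

# Division aligns on the index labels, not on position.
# A line with n stations has n - 1 gaps between consecutive stations.
avg_gap_km = max_distance_km / (stations - 1)

print(avg_gap_km.round(3))
print("Largest average spacing:", avg_gap_km.idxmax())
```

Label alignment is lost as soon as one of the Series is converted with `.values`, so combining results positionally needs both inputs in the same order.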

In [31]:
avg_dist_analysis=pd.DataFrame({'Line': stations_per_line.index,'Number of Stations': stations_per_line.values,
                               # Reorder the averages to match stations_per_line so each value stays with its own line
                               'Avg Distance (Km)': avg_distance_per_line.reindex(stations_per_line.index)})

avg_dist_analysis

Unnamed: 0,Line,Number of Stations,Avg Distance (Km)
Blue line,Blue line,49,1.097917
Pink line,Pink line,38,1.421622
Yellow line,Yellow line,37,1.269444
Voilet line,Voilet line,34,1.318182
Red line,Red line,29,1.167857
Magenta line,Magenta line,25,1.379167
Aqua line,Aqua line,21,1.355
Green line,Green line,21,1.24
Rapid Metro,Rapid Metro,11,1.0
Blue line branch,Blue line branch,8,1.157143


In [42]:
avg_dist_analysis.reset_index(drop=True, inplace=True)
print(avg_dist_analysis)

                 Line  Number of Stations  Avg Distance (Km)
0           Blue line                  49           1.097917
1           Pink line                  38           1.421622
2         Yellow line                  37           1.269444
3         Voilet line                  34           1.318182
4            Red line                  29           1.167857
5        Magenta line                  25           1.379167
6           Aqua line                  21           1.355000
7          Green line                  21           1.240000
8         Rapid Metro                  11           1.000000
9    Blue line branch                   8           1.157143
10        Orange line                   6           4.160000
11          Gray line                   3           1.950000
12  Green line branch                   3           1.050000


In [43]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [44]:
fig=make_subplots(rows=1, cols=2,
                  subplot_titles=("Number of Stations Per Metro Line", "Average Distance Between Stations Per Metro Line"),
                  horizontal_spacing=0.3)

In [45]:
fig.add_trace(go.Bar(y=avg_dist_analysis['Line'],x=avg_dist_analysis['Number of Stations'], orientation='h', name='Number of Stations',
                     marker_color='crimson'), row=1, col=1)



fig.add_trace(go.Bar(y=avg_dist_analysis['Line'],x=avg_dist_analysis['Avg Distance (Km)'], orientation='h', name='Average Distance (km)',
                     marker_color='navy'), row=1, col=2)


fig.update_xaxes(title='Number of Stations', row=1, col=1)
fig.update_xaxes(title='Average Distance Between Stations (km)', row=1, col=2)

fig.update_yaxes(title='Metro Line', row=1, col=1)
fig.update_yaxes(title=' ', row=1, col=2)


fig.update_layout(height=600, width=1000, title="Delhi Metro Line Analysis", template='plotly_white')

fig.show()


**Observations:**



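1. The Blue line has the most stations (49), while the Gray line and the Green line branch have the fewest (3 each).
2. The Orange line has the largest average distance between stations (about 4.16 km); every other line averages between 1.0 and 2.0 km.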



In [46]:
# Let's analyse the distribution of different station layouts

layout_count=metro_data['Station Layout'].value_counts()
layout_count

Elevated       214
Underground     68
At-Grade         3
Name: Station Layout, dtype: int64
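
Raw counts are easier to compare as shares of the whole network. A minimal sketch, using the counts copied from the output above, converts them to percentages; `layout_share_pct` is an illustrative name that does not appear in the original notebook.

```python
import pandas as pd

# Counts copied from the layout_count output above.
layout_count = pd.Series({"Elevated": 214, "Underground": 68, "At-Grade": 3})

# Share of each layout as a percentage of all stations, rounded to one decimal.
layout_share_pct = (layout_count / layout_count.sum() * 100).round(1)
print(layout_share_pct)
```

On the full dataset, `metro_data['Station Layout'].value_counts(normalize=True)` gives the same proportions directly.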

In [50]:
fig=px.bar(x=layout_count.index, y=layout_count.values,
           labels={'x': 'Station Layout', 'y': 'Number of Stations', 'color': 'Station Layout'},
           title='Distribution of Station Layouts', color=layout_count.index,
           color_discrete_sequence=px.colors.qualitative.Pastel,  # layout is categorical, so use a discrete palette
           template='plotly_white', width=800, height=400)


fig.show()


**Observations:**



1. Elevated Stations: The majority of the stations are Elevated (214 of 285, about 75%). It is a common design choice in urban areas to save space and reduce land acquisition issues.
2. Underground Stations: Underground stations are far fewer than elevated ones (68, about 24%).
3. At-Grade Stations: There are only 3 At-Grade (ground level) stations, about 1% of the network, so this layout is rare.
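
Task 4 asks how station layout relates to distance from the city centre, which first needs a distance for each station. The sketch below uses the haversine formula on a few coordinates copied from the dataset preview. It assumes the map centre used earlier, (28.7041, 77.1025), as the city-centre reference point; that choice and the helper name `haversine_km` are assumptions, not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed city-centre reference point: the coordinates used to centre the map above.
CITY_CENTRE = (28.7041, 77.1025)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# Coordinates copied from the first rows of the dataset preview.
stations = {
    "Jhil Mil": (28.67579, 77.31239),
    "DLF Phase 3": (28.4936, 77.0935),
    "Dwarka Mor": (28.61932, 77.03326),
}

# Distance of each sample station from the assumed city centre.
distance_from_centre = {
    name: haversine_km(*CITY_CENTRE, lat, lon) for name, (lat, lon) in stations.items()
}
for name, km in distance_from_centre.items():
    print(f"{name}: {km:.1f} km")
```

Applied to every row of `metro_data`, the resulting distances could be grouped by 'Station Layout' to compare how far from the centre each layout type tends to sit.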





