## Overview of the project
This project analyzes data from the Delhi Metro Rail Corporation(DMRC). It includes:
1. Data processing and cleaning
2. Exploratory data analysis
3. Visualizations
4. Insights into metro expansion trends.

## Objectives
1. Analyze the distribution and growth of metro stations over time.
2. Visualize metro station locations and lines on an interactive map
3. Provide actionable insights into metro development patterns

In [22]:
import pandas as pd # for data manipulation
import folium #for map visualization
import plotly.express as px # for visualization
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default="plotly_white"
metro_data = pd.read_csv("C:/Users/arpit/Desktop/Projects/DMRC_Analysis/Dataset/Delhi-Metro-Network.csv")
print(metro_data.head())

   Station ID         Station Name  Distance from Start (km)          Line  \
0           1             Jhil Mil                      10.3      Red line   
1           2  Welcome [Conn: Red]                      46.8     Pink line   
2           3          DLF Phase 3                      10.0   Rapid Metro   
3           4           Okhla NSIC                      23.8  Magenta line   
4           5           Dwarka Mor                      10.2     Blue line   

  Opening Date Station Layout   Latitude  Longitude  
0   2008-04-06       Elevated  28.675790  77.312390  
1   2018-10-31       Elevated  28.671800  77.277560  
2   2013-11-14       Elevated  28.493600  77.093500  
3   2017-12-25       Elevated  28.554483  77.264849  
4   2005-12-30       Elevated  28.619320  77.033260  


In [23]:
# checking for missing values
missing_values=metro_data.isnull().sum()
missing_values

Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64

In [24]:
# checking data types:
data_types=metro_data.dtypes
data_types

Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object

In [25]:
#converting opening_date to datetime format
metro_data['Opening Date']=pd.to_datetime(metro_data['Opening Date'])

## Geospatial Analysis:
Visualizing the locations of metro station on a map. It will provide the geographical distribution of the stations across Delhi using plot markers for each station.
The latitude and logitude information will be used for plotting each station.
Color code markers will show the distribution in map.

In [26]:
#Defining colors for metro lines:
line_colors={
    'Red line':'red',
    'Blue line':'blue',
    'Yellow line':'beige',
    'Green line':'green',
    'Violet line':'purple',
    'Pink line':'pink',
    'Magenta line': 'darkred',
    'Orange line':'orange',
    'Rapid metro':'cadetblue',
    'Aqua line':'black',
    'Green line branch':'lightgreen',
    'Blue line branch':'lightblue',
    'Gray line': 'lightgray'
}
MAP_LOCATION=[28.7041, 77.1025]
ZOOM_START=11
delhi_map_with_line_tooltip = folium.Map(location=MAP_LOCATION, zoom_start=ZOOM_START)

# adding colored markers for each metro station with line name in tooltip
for index, row in metro_data.iterrows():
    line = row['Line']
    color = line_colors.get(line, 'black')  # Default color is black if line not found in the dictionary
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Station Name']}",
        tooltip=f"{row['Station Name']}, {line}",
        icon=folium.Icon(color=color)
    ).add_to(delhi_map_with_line_tooltip)

# Displaying the updated map
delhi_map_with_line_tooltip

## Temporal Analysis:
Analysis of Delhi metro over time.
Observing how many stations opened each year and network over time.
This will display how DMRC developed to cater the needs of pulic transportation.

In [27]:
metro_data['Opening Year']=metro_data['Opening Date'].dt.year

#couting number of stations per year
#stations_per_year=metro_data['Opening Year'].value_counts().sort_index()
stations_per_year=metro_data.groupby('Opening Year').size()
stations_per_year_df=stations_per_year.reset_index()
stations_per_year_df.columns=['Year', 'Number of Stations']
fig = px.bar(stations_per_year_df, x='Year', y='Number of Stations',
             title="Number of Metro Stations Opened Each Year in Delhi",
             labels={'Year': 'Year', 'Number of Stations': 'Number of Stations Opened'})

fig.update_layout(xaxis_tickangle=-45, xaxis=dict(tickmode='linear'),
                  yaxis=dict(title='Number of Stations Opened'),
                  xaxis_title="Year")

fig.show()

## Line Analysis
Analysis of various metro lines in terms of stations they have and average distance between the stations

In [28]:
stations_per_line=metro_data['Line'].value_counts()
#calculating total distance of each metro line:
total_distance_per_line = metro_data.groupby('Line')['Distance from Start (km)'].max()

avg_distance_per_line = total_distance_per_line / (stations_per_line - 1)

line_analysis = pd.DataFrame({
    'Line': stations_per_line.index,
    'Number of Stations': stations_per_line.values,
    'Average Distance Between Stations (km)': avg_distance_per_line
})

# sorting the DataFrame by the number of stations
line_analysis = line_analysis.sort_values(by='Number of Stations', ascending=False)

line_analysis.reset_index(drop=True, inplace=True)
print(line_analysis)

                 Line  Number of Stations  \
0           Blue line                  49   
1           Pink line                  38   
2         Yellow line                  37   
3         Voilet line                  34   
4            Red line                  29   
5        Magenta line                  25   
6           Aqua line                  21   
7          Green line                  21   
8         Rapid Metro                  11   
9    Blue line branch                   8   
10        Orange line                   6   
11          Gray line                   3   
12  Green line branch                   3   

    Average Distance Between Stations (km)  
0                                 1.355000  
1                                 1.097917  
2                                 1.157143  
3                                 1.950000  
4                                 1.240000  
5                                 1.050000  
6                                 1.379167  
7        

In [29]:
# creating subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Number of Stations Per Metro Line',
                                                    'Average Distance Between Stations Per Metro Line'),
                    horizontal_spacing=0.2)

# plot for Number of Stations per Line
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Number of Stations'],
           orientation='h', name='Number of Stations', marker_color='crimson'),
    row=1, col=1
)

# plot for Average Distance Between Stations
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Average Distance Between Stations (km)'],
           orientation='h', name='Average Distance (km)', marker_color='navy'),
    row=1, col=2
)

# update xaxis properties
fig.update_xaxes(title_text="Number of Stations", row=1, col=1)
fig.update_xaxes(title_text="Average Distance Between Stations (km)", row=1, col=2)

# update yaxis properties
fig.update_yaxes(title_text="Metro Line", row=1, col=1)
fig.update_yaxes(title_text="", row=1, col=2)

# update layout
fig.update_layout(height=600, width=1200, title_text="Metro Line Analysis", template="plotly_white")

fig.show()

## Station layout analysis:
Analyzing the distribution of stations on different levels:
elevated, ground level or underground

In [30]:
layout_counts=metro_data['Station Layout'].value_counts()
# creating a bar using plotly:
fig = px.bar(x=layout_counts.index, y=layout_counts.values,
             labels={'x': 'Station Layout', 'y': 'Number of Stations'},
             title='Distribution of Delhi Metro Station Layouts',
             color=layout_counts.index,
             color_continuous_scale='pastel')

# creating layout:
fig.update_layout(xaxis_title="Station Layout",
                  yaxis_title="Number of Stations",
                  coloraxis_showscale=False,
                  template="plotly_white")

fig.show()

## Insights
Key takeaways from the analysis:
-  STATION GROWTH TREND: Most stations were opened in the years 2018,2010, and 2005.
-  METRO LINE DEVELOPMENT: Blue line has the highest number of metro stations whereas green line branch has the least number of metro stations.
-  GEOGRAPHIC SPREAD: The metro is spread all over Delhi covering most important parts of the capital city with easy access of airport as well. The lines intersect with each other aiding in line change protocol for the travellers


## Conclusion
 This project provides insights into developemnt of the Delhi Metro system. It leverages advanced visualizations to communicate findings effectively, making it useful tool for extensive use case by city planners and policy makers.