# Bangalore traffic prediction and analysis

## *Project Objectives:*

- To analyze traffic patterns in Bangalore using real-world traffic data.
- To predict traffic congestion levels (Low / Medium / High) based on demand, capacity, disruptions, and temporal factors.
- To identify and explain the key factors that contribute to traffic congestion.
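
The `Congestion Level` column in the data is a percentage (0 to 100), while the prediction target is a three-way class. A minimal sketch of one way to derive the class is shown here. The cut points (40 and 70) and the helper name `to_congestion_class` are assumptions made for illustration; the notebook does not define them at this point.

```python
import pandas as pd

# Hypothetical cut points: the source does not state where Low/Medium/High begin.
BINS = [0, 40, 70, 100]
LABELS = ["Low", "Medium", "High"]


def to_congestion_class(levels: pd.Series) -> pd.Series:
    """Map a 0-100 congestion percentage to an ordered Low/Medium/High label."""
    # include_lowest=True keeps an exact 0 inside the first bin.
    return pd.cut(levels, bins=BINS, labels=LABELS, include_lowest=True)


# A few Congestion Level values taken from the rows displayed in the notebook.
sample = pd.Series([28.347994, 35.871483, 72.639152, 100.0])
classes = to_congestion_class(sample)
print(classes.tolist())
```

Because `pd.cut` returns an ordered categorical, the labels keep their Low < Medium < High ordering, which can be convenient for later modelling.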

## (A) Data loading


In [1]:
import pandas as pd
dataset = pd.read_csv('bangalore_traffic_daily.csv')
dataset

Unnamed: 0,Date,Area Name,Road/Intersection Name,Traffic Volume,Average Speed,Travel Time Index,Congestion Level,Road Capacity Utilization,Incident Reports,Environmental Impact,Public Transport Usage,Traffic Signal Compliance,Parking Usage,Pedestrian and Cyclist Count,Weather Conditions,Roadwork and Construction Activity
0,2022-01-01,Indiranagar,100 Feet Road,50590,50.230299,1.500000,100.000000,100.000000,0,151.180,70.632330,84.044600,85.403629,111,Clear,No
1,2022-01-01,Indiranagar,CMH Road,30825,29.377125,1.500000,100.000000,100.000000,1,111.650,41.924899,91.407038,59.983689,100,Clear,No
2,2022-01-01,Whitefield,Marathahalli Bridge,7399,54.474398,1.039069,28.347994,36.396525,0,64.798,44.662384,61.375541,95.466020,189,Clear,No
3,2022-01-01,Koramangala,Sony World Junction,60874,43.817610,1.500000,100.000000,100.000000,1,171.748,32.773123,75.547092,63.567452,111,Clear,No
4,2022-01-01,Koramangala,Sarjapur Road,57292,41.116763,1.500000,100.000000,100.000000,3,164.584,35.092601,64.634762,93.155171,104,Clear,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8931,2024-08-09,Electronic City,Hosur Road,11387,23.440276,1.262384,35.871483,57.354487,1,72.774,21.523289,83.530352,97.898279,211,Fog,No
8932,2024-08-09,M.G. Road,Trinity Circle,36477,45.168429,1.500000,100.000000,100.000000,3,122.954,29.822312,60.738488,60.355967,95,Clear,No
8933,2024-08-09,M.G. Road,Anil Kumble Circle,42822,22.028609,1.500000,100.000000,100.000000,1,135.644,43.185905,85.321627,61.333731,110,Clear,No
8934,2024-08-09,Jayanagar,South End Circle,20540,52.254798,1.020520,72.639152,97.845527,2,91.080,44.416043,89.586947,79.197198,94,Clear,No


### Dataset description:
This dataset contains daily traffic observations from Bangalore, recorded across major roads and intersections between January 2022 and August 2024.
Each record represents traffic conditions for a specific road on a given day and includes information related to traffic demand, road capacity, disruptions, environmental conditions, and commuter behavior.

## (B) Data cleaning and validation

In [5]:
dataset.shape


(8936, 16)

In [6]:
dataset.dtypes


Date                                   object
Area Name                              object
Road/Intersection Name                 object
Traffic Volume                          int64
Average Speed                         float64
Travel Time Index                     float64
Congestion Level                      float64
Road Capacity Utilization             float64
Incident Reports                        int64
Environmental Impact                  float64
Public Transport Usage                float64
Traffic Signal Compliance             float64
Parking Usage                         float64
Pedestrian and Cyclist Count            int64
Weather Conditions                     object
Roadwork and Construction Activity     object
dtype: object

#### 1. Converting the Date feature from object to datetime type

In [7]:
dataset['Date'] = pd.to_datetime(dataset['Date'])
dataset.dtypes

Date                                  datetime64[ns]
Area Name                                     object
Road/Intersection Name                        object
Traffic Volume                                 int64
Average Speed                                float64
Travel Time Index                            float64
Congestion Level                             float64
Road Capacity Utilization                    float64
Incident Reports                               int64
Environmental Impact                         float64
Public Transport Usage                       float64
Traffic Signal Compliance                    float64
Parking Usage                                float64
Pedestrian and Cyclist Count                   int64
Weather Conditions                            object
Roadwork and Construction Activity            object
dtype: object
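
The conversion above raises an error on the first unparseable value. When the goal is validation, it can help to count bad values instead. The sketch below runs on a small made-up Series (the string `"not a date"` is a deliberate bad value, not taken from the dataset) and assumes the `YYYY-MM-DD` format seen in the displayed rows.

```python
import pandas as pd

# Toy input: two dates in the format shown in the notebook, plus one invalid string.
raw = pd.Series(["2022-01-01", "2024-08-09", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Number of values that failed to parse.
n_bad = int(parsed.isna().sum())
print(n_bad)
```

If the count is zero on the real column, the plain `pd.to_datetime(dataset['Date'])` call is safe as written.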

#### 2. Checking for inaccurate date ranges:

In [9]:
dataset['Date'].describe()

count                             8936
mean     2023-04-22 05:25:11.548791552
min                2022-01-01 00:00:00
25%                2022-08-26 00:00:00
50%                2023-04-24 00:00:00
75%                2023-12-17 06:00:00
max                2024-08-09 00:00:00
Name: Date, dtype: object
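
`describe()` confirms that the minimum and maximum dates fall inside the expected period, but it does not show whether any calendar days are missing in between. A minimal sketch of a gap check follows, using a toy Series of dates rather than the real column.

```python
import pandas as pd

# Toy dates: 2022-01-03 is intentionally absent, and 2022-01-04 appears twice.
dates = pd.Series(
    pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04", "2022-01-04"])
)

# Every calendar day between the first and last observation.
expected = pd.date_range(dates.min(), dates.max(), freq="D")

# Days in the expected range that never appear in the data.
missing_days = expected.difference(pd.DatetimeIndex(dates.unique()))
print(missing_days)
```

On the notebook's data, the same idea would apply with `dataset['Date']` in place of `dates`.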

#### 3. Checking for exact duplicate rows:

In [10]:
dataset.duplicated().sum()


np.int64(0)

#### 4. Checking for missing values in all columns:

In [8]:
dataset.isna().sum()


Date                                  0
Area Name                             0
Road/Intersection Name                0
Traffic Volume                        0
Average Speed                         0
Travel Time Index                     0
Congestion Level                      0
Road Capacity Utilization             0
Incident Reports                      0
Environmental Impact                  0
Public Transport Usage                0
Traffic Signal Compliance             0
Parking Usage                         0
Pedestrian and Cyclist Count          0
Weather Conditions                    0
Roadwork and Construction Activity    0
dtype: int64

#### 5. Checking for duplicate entries for the same road on the same day:

In [12]:
dataset.duplicated(subset=['Date', 'Road/Intersection Name']).sum()


np.int64(0)
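
The count above is zero, so there is nothing to inspect here. If it were not, the conflicting rows could be listed with `keep=False`, which flags every member of a duplicate group rather than only the repeats. The sketch below uses a toy frame; the second `CMH Road` row and its volume are invented purely to trigger the check.

```python
import pandas as pd

# Toy frame: two rows share the same (Date, Road/Intersection Name) key.
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-01"]),
        "Road/Intersection Name": ["CMH Road", "CMH Road", "Hosur Road"],
        "Traffic Volume": [30825, 30900, 11387],  # 30900 is a made-up value
    }
)

key = ["Date", "Road/Intersection Name"]

# keep=False marks all rows of each duplicated key, so both versions are visible.
dupes = toy[toy.duplicated(subset=key, keep=False)]
print(dupes)
```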

#### 6. Checking the number of unique values for object-type features:

In [11]:
dataset.select_dtypes(include='object').nunique()


Area Name                              8
Road/Intersection Name                16
Weather Conditions                     5
Roadwork and Construction Activity     2
dtype: int64
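
A low unique count is reassuring, but labels that differ only in case or surrounding whitespace would still be counted separately. A small sketch of how such variants could be detected is given here, on a toy Series; the misspelt `"clear "` entry is invented for the example and is not known to occur in the dataset.

```python
import pandas as pd

# Toy labels: "clear " differs from "Clear" only in case and a trailing space.
weather = pd.Series(["Clear", "clear ", "Fog", "Clear"])

# Strip whitespace and unify capitalisation before counting.
normalized = weather.str.strip().str.title()

# If the two counts differ, some labels are variants of each other.
print(weather.nunique(), normalized.nunique())
```

Comparing the two counts on each object column is a quick way to confirm that the 8 areas, 16 roads, 5 weather conditions, and 2 roadwork flags are genuinely distinct categories.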

#### 7. Checking for unusual values of parameters:

In [13]:
(dataset['Traffic Volume'] > 100000).sum()


np.int64(0)

In [14]:
(dataset['Road Capacity Utilization'] > 100).sum()


np.int64(0)

In [15]:
(dataset['Average Speed'] > 120).sum()

np.int64(0)

#### 8. Checking for negative values in numeric parameters:

In [16]:
(dataset.select_dtypes(include='number') < 0).sum()


Traffic Volume                  0
Average Speed                   0
Travel Time Index               0
Congestion Level                0
Road Capacity Utilization       0
Incident Reports                0
Environmental Impact            0
Public Transport Usage          0
Traffic Signal Compliance       0
Parking Usage                   0
Pedestrian and Cyclist Count    0
dtype: int64
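
The upper-bound checks in step 7 and the negative-value check in step 8 can be expressed as one table of allowed ranges. The sketch below reuses the limits from the cells above (100000, 100, and 120, with 0 as the lower bound); the helper name `count_out_of_range` and the third row of the toy frame are assumptions made for illustration.

```python
import pandas as pd

# Allowed (min, max) per column, taken from the individual checks above.
BOUNDS = {
    "Traffic Volume": (0, 100_000),
    "Road Capacity Utilization": (0, 100),
    "Average Speed": (0, 120),
}


def count_out_of_range(df: pd.DataFrame, bounds: dict) -> pd.Series:
    """Count values outside [min, max] per column (missing values also count)."""
    return pd.Series(
        {
            col: int((~df[col].between(low, high)).sum())
            for col, (low, high) in bounds.items()
        }
    )


# Two rows from the displayed data plus one deliberately invalid, made-up row.
toy = pd.DataFrame(
    {
        "Traffic Volume": [50590, 7399, -5],
        "Road Capacity Utilization": [100.0, 36.396525, 120.0],
        "Average Speed": [50.230299, 54.474398, 130.0],
    }
)

violations = count_out_of_range(toy, BOUNDS)
print(violations)
```

`Series.between` is inclusive on both ends by default, so a utilization of exactly 100 passes, which matches the `> 100` comparison used in the notebook.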