# **Data Visualization Notebook**

In this notebook, we will go through the Worldwide Traffic Congestion dataset, which contains city names along with each city's rank and its maximum and average TCI (Traffic Congestion Index), and depict the data using visuals.

This dataset is contributed by @KOUSTUBHK

TCI is calculated only for the center of the tracked location: the city image is split into 9 equal rectangles forming a 3x3 grid, and only the central rectangle is taken into consideration when calculating TCI.
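As a rough illustration of the 3x3 split, the sketch below crops the central rectangle from an image held as a NumPy array. The function name `central_cell` and the array layout (height, width, optional channels) are assumptions made for this example; the web app's actual code is not part of this notebook.

```python
import numpy as np

def central_cell(image):
    """Return the middle rectangle of a 3x3 grid laid over `image`.

    `image` is assumed to be an array shaped (height, width, ...).
    """
    height, width = image.shape[:2]
    top, bottom = height // 3, 2 * height // 3
    left, right = width // 3, 2 * width // 3
    return image[top:bottom, left:right]

# Hypothetical 9x9 single-channel "image", for demonstration only
demo = np.arange(81).reshape(9, 9)
center = central_cell(demo)
print(center.shape)  # (3, 3)
```

Integer division keeps the slice bounds valid even when the image size is not a multiple of 3; in that case the central cell may be a pixel larger or smaller than its neighbours.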

Every 20 minutes, the web app saves an image for each tracked location, containing the traffic data reported by Google Maps. After a couple of minutes, the images are analyzed, and the percentages of the 4 traffic colors are calculated.

Let's call these percentages:

green → P0

orange → P1

red → P2

dark red → P3

By definition, these percentages sum to 100:

P0 + P1 + P2 + P3 = 100

Based on these percentages, the TCI (Traffic Congestion Index) is calculated:

TCI = (0 * P0) + (1 * P1) + (2 * P2) + (3 * P3)
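To make the formula concrete, here is a minimal sketch of it in Python. The function name `traffic_congestion_index` and the sample percentages are made up for illustration; they are not taken from the dataset or from the web app.

```python
def traffic_congestion_index(p0, p1, p2, p3):
    """Compute TCI from the four colour percentages.

    p0 = green, p1 = orange, p2 = red, p3 = dark red.
    """
    total = p0 + p1 + p2 + p3
    if abs(total - 100) > 1e-6:
        raise ValueError(f"percentages must sum to 100, got {total}")
    return 0 * p0 + 1 * p1 + 2 * p2 + 3 * p3

# Made-up percentages, for illustration only
example_tci = traffic_congestion_index(70, 20, 7, 3)
print(example_tci)  # 43
```

Because the weights run from 0 to 3 and the percentages sum to 100, TCI ranges from 0 (all green) to 300 (all dark red).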

In [None]:
#Imports 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

#Get the dataset files
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
#Let's get the data
df = pd.read_csv('/kaggle/input/worldwide-traffic-congestion-ranking/TrafficIndex_19Jun2022-26Jun2022.csv')

In [None]:
#Printing first 10 lines of csv file
df.head(10)

In [None]:
#Get all info about dataset (info is a method, so it needs parentheses)
df.info()

In [None]:
#check for any null values
df.isnull().sum()

In [None]:
#Imports for plotting/ Visuals

import matplotlib.pyplot as plt
import seaborn as sns

df['City'].value_counts().head(20).plot(kind = 'pie', figsize = (6,6))


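Pie chart of the first 20 entries of `value_counts()` on the City column. Each slice is labelled with a city name, and its size reflects how many rows that city has in the dataset.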

In [None]:
df['AverageTCI'].value_counts().head(20).plot(kind = 'pie', figsize = (6,6))

In [None]:
df['MaxTCI'].value_counts().head(20).plot(kind = 'pie', figsize = (6,6))

In [None]:
#City Vs MaxTCI

import seaborn as sns
plt.figure(figsize = (25,8))
sns.barplot(x = 'City', y = 'MaxTCI', data = df)

The bar plot above shows MaxTCI for every city, but with 68 cities in total the x-axis becomes too crowded to read.
One solution is shown below.

In [None]:
#City vs MaxTCI

cities_with_same_TCI = df.groupby(['City'])['MaxTCI'].sum().reset_index()
cities_with_same_TCI = cities_with_same_TCI.sort_values('MaxTCI', ascending = False)
plt.figure(figsize = (25,8))
sns.barplot(x = 'City', y = 'MaxTCI', data = cities_with_same_TCI.head(20))

Plotting the 20 cities with the highest MaxTCI values for a clearer picture.
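The same top-N selection can also be written with `DataFrame.nlargest`. The sketch below uses a small made-up table (the city names and values are placeholders, not taken from the dataset) so that it runs without the Kaggle file.

```python
import pandas as pd

# Placeholder data, not from the real dataset
toy = pd.DataFrame({
    "City": ["A", "B", "C", "D", "A"],
    "MaxTCI": [120.0, 80.0, 150.0, 60.0, 130.0],
})

# One value per city (here the largest MaxTCI seen), then keep the top 3
per_city = toy.groupby("City", as_index=False)["MaxTCI"].max()
top_cities = per_city.nlargest(3, "MaxTCI")
print(top_cities["City"].tolist())  # ['C', 'A', 'B']
```

Note that this sketch aggregates with `max()` rather than the `sum()` used in the cell above. When every city has exactly one row the two give the same result; when a city appears several times, `max()` keeps the value on the original TCI scale.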

In [None]:
#City Vs AverageTCI

cities_with_same_TCI = df.groupby(['City'])['AverageTCI'].sum().reset_index()
cities_with_same_TCI = cities_with_same_TCI.sort_values('AverageTCI', ascending = False)
plt.figure(figsize = (25,8))
sns.barplot(x = 'City', y = 'AverageTCI', data = cities_with_same_TCI.head(20))

Plotting the 20 cities with the highest AverageTCI values.

In [None]:
#Rank Vs City plot

plt.figure(figsize = (25,8))
sns.barplot(x = 'City', y = 'Rank', data = df.head(20))

Finally, we plot the rank of each of the first 20 cities in the dataset against its city name.

Thank You :)

Please leave your feedback and suggestions and help me improve.