## Dataset Description

| Column Name | Description | Data Type |
| :---------- | :---------- | :-------- |
| City | Name of the city | Object |
| Datetime | Date and time of the data entry | Object |
| TrafficIndexLive | Real-time traffic index | Int |
| JamsCount | Number of traffic jams | Int |
| JamsDelay | Total delay caused by traffic jams (in minutes) | Float |
| JamsLength | Total length of traffic jams (in kilometers) | Float |
| TrafficIndexWeekAgo | Traffic index one week ago | Int |
| TravelTimeHistoric | Historical average travel time (in minutes) | Float |
| TravelTimeLive | Real-time travel time (in minutes) | Float |
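
Because `Datetime` is loaded as a generic object (string) column, time-based analysis needs an explicit conversion first. The following is a minimal sketch, assuming the timestamps are in a format that `pd.to_datetime` can parse. The two-row `sample` frame, its values, and the derived `Hour` column are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the documented columns;
# every value here is made up for illustration only.
sample = pd.DataFrame({
    "City": ["Dubai", "Cairo"],
    "Datetime": ["2023-07-07 08:01:30", "2023-07-07 09:01:30"],
    "TrafficIndexLive": [12, 30],
    "JamsCount": [20, 45],
    "JamsDelay": [70.5, 180.2],
    "JamsLength": [10.1, 25.7],
    "TrafficIndexWeekAgo": [11, 28],
    "TravelTimeHistoric": [60.3, 75.9],
    "TravelTimeLive": [59.8, 80.4],
})

# Convert the string timestamps to datetime64 so time parts can be extracted.
sample["Datetime"] = pd.to_datetime(sample["Datetime"])

# Derive the hour of day as an example of a time-based feature (assumed name).
sample["Hour"] = sample["Datetime"].dt.hour

print(sample.dtypes)
```

The same two lines could be applied to `df` after loading, provided the real timestamp format parses cleanly.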


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

In [None]:
df = pd.read_csv('Dataset.csv')

In [None]:
df.head()

## 1. Exploratory Data Analysis (EDA)

In [None]:
display(df.describe())

In [None]:
display(df.shape)

In [None]:
display("JamsCount: mean,std")
display(df['JamsCount'].mean())
display(df['JamsCount'].std())


In [None]:
display("JamsDelay: mean,std")
display(df['JamsDelay'].mean())
display(df['JamsDelay'].std())


In [None]:
display("JamsLength: mean,std")
display(df['JamsLength'].mean())
display(df['JamsLength'].std())


In [None]:
display("TravelTimeHistoric: mean,std")
display(df['TravelTimeHistoric'].mean())
display(df['TravelTimeHistoric'].std())

In [None]:
display(df.groupby('City').agg(
    Average_Jams_Count = ('JamsCount', 'mean'),
    Average_Jams_Delay = ('JamsDelay', 'mean'),
    Average_Jams_Length = ('JamsLength', 'mean'),
    Average_Travel_Time_Historic = ('TravelTimeHistoric', 'mean')
))

### Visualize the distribution of key variables (e.g., Traffic_Index, Date).

In [None]:
plt.figure(figsize=(5,5))
plt.hist(df['JamsCount'])
plt.title('Distribution of JamsCount')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['JamsDelay'])
plt.title('Distribution of JamsDelay')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['JamsLength'])
plt.title('Distribution of JamsLength')
plt.show()

In [None]:
plt.figure(figsize=(5,5))
plt.hist(df['TravelTimeHistoric'])
plt.title('Distribution of TravelTimeHistoric')
plt.show()

In [None]:
plt.figure(figsize=(5,5))
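# Plot the number of rows per city; a histogram of the unique names alone would show a count of 1 for every city
plt.bar(df['City'].value_counts().index, df['City'].value_counts().values)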
plt.title('Distribution of City')
plt.xticks(rotation=90)
plt.show()

### Explore relationships between variables (e.g., Traffic_Index vs. Weather_Condition).


In [None]:
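# Label-encode the object columns (City, Datetime) so they can enter the correlation matrix.
# LabelEncoder assigns integer codes in sorted label order, so correlations involving City are only indicative.
Encoded_df = df.copy()

for col in Encoded_df.select_dtypes(include='object').columns:
    le = LabelEncoder()
    Encoded_df[col] = le.fit_transform(Encoded_df[col])

display(Encoded_df.corr())

# Correlation restricted to the original numeric columns
display(df.select_dtypes(exclude='object').corr())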

## 2. Data Visualization

* Ensure the visualizations are clear and informative.

### Create visualizations to illustrate the findings from the EDA.


In [None]:
plt.figure(figsize=(5,5))
plt.hist(df['TrafficIndexLive'])
plt.title('Distribution of TrafficIndexLive')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['JamsCount'])
plt.title('Distribution of JamsCount')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['JamsDelay'])
plt.title('Distribution of JamsDelay')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['JamsLength'])
plt.title('Distribution of JamsLength')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['TrafficIndexWeekAgo'])
plt.title('Distribution of TrafficIndexWeekAgo')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['TravelTimeHistoric'])
plt.title('Distribution of TravelTimeHistoric')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.hist(df['TravelTimeLive'])
plt.title('Distribution of TravelTimeLive')
plt.show()

In [None]:

plt.figure(figsize=(5, 5))
plt.barh(df['City'].value_counts().index, df['City'].value_counts().values)
plt.title('Number of records per City')
plt.show()

In [None]:

plt.figure(figsize=(20, 20))
sns.heatmap(df.select_dtypes(exclude=['object']).corr(), annot=True)
plt.show()

### Use appropriate plots such as histograms, bar charts, pie charts, scatter plots, and heatmaps.

In [None]:
plt.figure(figsize=(5,5))
plt.scatter(df['JamsCount'], df['JamsDelay'])
plt.xlabel('JamsCount')
plt.ylabel('JamsDelay')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.scatter(df['JamsCount'], df['TrafficIndexLive'])
plt.xlabel('JamsCount')
plt.ylabel('TrafficIndexLive')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.scatter(df['JamsCount'], df['JamsLength'])
plt.xlabel('JamsCount')
plt.ylabel('JamsLength')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
plt.scatter(df['JamsCount'], df['TrafficIndexWeekAgo'])
plt.xlabel('JamsCount')
plt.ylabel('TrafficIndexWeekAgo')
plt.show()

In [None]:

plt.figure(figsize=(5,5))
sns.violinplot(df['JamsCount'])
plt.show()

In [None]:

plt.figure(figsize=(5,5))
sns.violinplot(df['JamsDelay'])
plt.show()

In [None]:

plt.figure(figsize=(5,5))
sns.violinplot(df['JamsLength'])
plt.show()

In [None]:

plt.figure(figsize=(5,5))
sns.violinplot(df['TrafficIndexWeekAgo'])
plt.show()

In [None]:

plt.figure(figsize=(5,5))
sns.barplot(data=df, x='City', y='JamsCount')
plt.xticks(rotation=90)
plt.show()

In [None]:

df.plot.hexbin(
    x='JamsCount',
    y='JamsDelay',
    gridsize=20,
)
plt.show()

In [None]:

df.plot.hexbin(
    x='JamsCount',
    y='JamsLength',
    gridsize=20,
)
plt.show()

In [None]:
sns.pairplot(df)

## 3. Insights and Conclusions

## Summary of Findings
In this analysis of traffic data, we examined various metrics related to traffic congestion in multiple cities. The key findings from our exploratory data analysis (EDA) are:

### **Traffic Congestion Metrics**:
- Dubai has the highest traffic congestion, with the highest jam counts, the longest delays, and the longest jam lengths.
- The distributions of JamsCount, JamsDelay, and JamsLength are right-skewed: most observations sit at low values, with a long tail toward high values.
- TravelTimeHistoric is approximately normally distributed when adjusted to start from zero.
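
Skew direction can be checked numerically instead of being read off the histograms: pandas' `Series.skew()` returns a positive value for a long right tail and a negative value for a long left tail. The following is a minimal sketch on made-up numbers; the helper name `skew_direction`, the toy series, and the 0.5 cutoff are assumptions for illustration, not values from the dataset.

```python
import pandas as pd


def skew_direction(series: pd.Series, tol: float = 0.5) -> str:
    """Classify a distribution by the sign of its sample skewness.

    `tol` is an arbitrary cutoff (assumed here) below which the
    distribution is treated as roughly symmetric.
    """
    skew = series.skew()
    if skew > tol:
        return "right-skewed"
    if skew < -tol:
        return "left-skewed"
    return "roughly symmetric"


# Toy data: mostly small values with one large outlier on the right.
toy_counts = pd.Series([1, 1, 2, 2, 3, 4, 40])
print(skew_direction(toy_counts))

# Evenly spaced values have zero skewness.
print(skew_direction(pd.Series([1, 2, 3, 4, 5])))
```

Applied to the notebook's data, the call would look like `skew_direction(df['JamsCount'])`.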

### **Correlations**:
- Most numerical columns are strongly positively correlated: as one increases, the others tend to increase as well.
- TravelTimeHistoric does not correlate strongly with the other variables.
- After label encoding, Datetime and City show no strong correlation with the other columns.
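
Reading strong relationships off a full correlation matrix is error-prone, so it can help to list only the column pairs above a chosen cutoff. The following is a minimal sketch; the helper `strong_pairs`, the 0.8 threshold, and the toy frame are assumptions for illustration and do not reproduce the dataset's actual correlations.

```python
from itertools import combinations

import pandas as pd


def strong_pairs(frame: pd.DataFrame, threshold: float = 0.8):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = frame.select_dtypes(include="number").corr()
    pairs = []
    # Visit each unordered pair of columns exactly once.
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        if abs(r) >= threshold:
            pairs.append((a, b, float(r)))
    # Strongest relationships first.
    return sorted(pairs, key=lambda item: abs(item[2]), reverse=True)


# Toy frame with made-up values: JamsDelay is an exact multiple of JamsCount,
# while TravelTimeHistoric is only weakly related to either.
toy = pd.DataFrame({
    "JamsCount": [1, 2, 3, 4, 5],
    "JamsDelay": [2.0, 4.0, 6.0, 8.0, 10.0],
    "TravelTimeHistoric": [5.0, 1.0, 4.0, 2.0, 3.0],
})

result = strong_pairs(toy)
for a, b, r in result:
    print(f"{a} vs {b}: r = {r:.2f}")
```

On the notebook's data the call would be `strong_pairs(df)`, or `strong_pairs(Encoded_df)` to include the label-encoded columns.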

### **City-Specific Insights**:
- Dubai, Cairo, Kuwait, Riyadh, and Doha are the cities with the highest traffic congestion.
- Traffic jams generally start small and grow larger, indicating that early intervention could mitigate the severity of traffic jams.
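
A city ranking like the one above can be derived by sorting the per-city averages. The following is a minimal sketch on made-up numbers; the toy frame and the choice of `JamsCount` as the sort key are assumptions for illustration, not the dataset's actual values.

```python
import pandas as pd

# Toy frame with made-up values for three cities.
toy = pd.DataFrame({
    "City": ["Dubai", "Dubai", "Cairo", "Cairo", "Doha", "Doha"],
    "JamsCount": [90, 110, 60, 80, 20, 30],
    "JamsDelay": [400.0, 500.0, 250.0, 300.0, 60.0, 90.0],
})

# Average each congestion metric per city, then rank by mean jam count.
ranking = (
    toy.groupby("City")[["JamsCount", "JamsDelay"]]
    .mean()
    .sort_values("JamsCount", ascending=False)
)

print(ranking)
```

Sorting by a different column, such as `JamsDelay`, may produce a different order, so the metric used for the ranking is worth stating alongside the result.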

## Decisions and Recommendations
Based on the insights gained from the analysis, the following decisions and recommendations are proposed:

### **Focused Intervention in Dubai**:
- Develop more efficient road networks in Dubai to reduce the frequency and severity of traffic jams.
- Implement early detection and response systems to address potential jams before they escalate.

### **Traffic Management Strategies**:
- Place traffic officers at locations where jams commonly begin to manage and mitigate congestion effectively.
- Prioritize intervention in cities with the highest congestion rates to improve overall traffic flow.