## Objective
* The objective of this exercise is to explore and visualize the Traffic Index dataset to gain insights and understand the patterns in the data.

## Dataset Description

| Column Name | Description | Data Type |
| :---------------- | :------: | ----: |
| City | Name of the city | Object |
| Datetime | Date and time of the data entry | Object |
| TrafficIndexLive | Real-time traffic index | Int |
| JamsCount | Number of traffic jams | Int |
| JamsDelay | Total delay caused by traffic jams (in minutes) | Float |
| JamsLength | Total length of traffic jams (in kilometers) | Float |
| TrafficIndexWeekAgo | Traffic index one week ago | Int |
| TravelTimeHistoric | Historical average travel time (in minutes) | Float |
| TravelTimeLive | Real-time travel time (in minutes) | Float |
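
The table lists `Datetime` as an object (string) column, which limits time-based grouping. Below is a minimal sketch of converting it to a real timestamp, assuming the `YYYY-MM-DD HH:MM:SS` format seen in the notebook output. The frame `sample` and its values are made up for illustration and are not rows from the dataset.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical values, not from the CSV).
sample = pd.DataFrame({
    "City": ["Abudhabi", "Abudhabi", "Riyadh"],
    "Datetime": ["2023-07-07 08:01:30", "2023-07-07 09:01:30", "2023-12-15 03:01:30"],
    "TrafficIndexLive": [6, 9, 2],
})

# Parse the string column into datetime64 so time-based grouping works.
sample["Datetime"] = pd.to_datetime(sample["Datetime"], format="%Y-%m-%d %H:%M:%S")

# Derive calendar features that are handy for later grouping.
sample["Hour"] = sample["Datetime"].dt.hour
sample["DayName"] = sample["Datetime"].dt.day_name()

print(sample.dtypes)
print(sample[["City", "Hour", "DayName"]])
```

The same two steps could be applied to `df` right after loading it, if the real column follows the assumed format.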


# Tasks

## 1. Exploratory Data Analysis (EDA)

### Perform summary statistics on the dataset.

In [19]:
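import pandas as pd

# Load the Traffic Index dataset (Colab path).
df = pd.read_csv('/content/Task (2) Dataset.csv')

# Summary statistics for the numeric columns.
# A bare df.head() before this line would not display anything,
# because only the last expression of a cell is shown.
print(df.describe())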

       TrafficIndexLive     JamsCount     JamsDelay    JamsLength  \
count      66639.000000  66639.000000  66639.000000  66639.000000   
mean          14.043113     74.278531    288.353877     49.316135   
std           13.488906    107.452022    470.013224     85.352525   
min            0.000000      0.000000      0.000000      0.000000   
25%            3.000000      9.000000     27.700000      3.000000   
50%           10.000000     29.000000     95.700000     12.200000   
75%           21.000000     95.000000    336.600000     53.500000   
max          138.000000   1359.000000   9989.400000   1173.900000   

       TrafficIndexWeekAgo  TravelTimeHistoric  TravelTimeLive  
count         66639.000000        62772.000000    62772.000000  
mean             13.981737           70.706601       70.048451  
std              13.454922           10.588384       11.966725  
min               0.000000           49.381346       46.723235  
25%               3.000000           63.142591       
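
The `count` row above shows 62,772 values for `TravelTimeHistoric` and `TravelTimeLive` against 66,639 for the other columns, so 3,867 travel-time entries are missing. Below is a minimal sketch of how those gaps could be counted per column and per city. The frame `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only (hypothetical values, not from the CSV).
toy = pd.DataFrame({
    "City": ["Cairo", "Cairo", "Doha", "Doha"],
    "JamsCount": [10, 12, 3, 4],
    "TravelTimeHistoric": [88.0, np.nan, 76.0, 75.5],
    "TravelTimeLive": [85.0, np.nan, 76.5, np.nan],
})

# Missing values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Missing live travel-time rows per city, to see whether gaps cluster in one city.
missing_by_city = toy["TravelTimeLive"].isna().groupby(toy["City"]).sum()
print(missing_by_city)
```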

### Identify and analyze patterns in the data.

In [52]:
# Aggregations shared by both groupings.
agg_spec = dict(
    total_jam_count=('JamsCount', 'sum'),
    average_jam_delay=('JamsDelay', 'mean'),
    average_jam_length=('JamsLength', 'mean'),
    total_jam_delay=('JamsDelay', 'sum'),
    total_jam_length=('JamsLength', 'sum'),
    avg_travel_time=('TravelTimeHistoric', 'mean'),
    real_time_travel_time=('TravelTimeLive', 'mean'),
)

# Per city and timestamp.
time_grouped_cars = df.groupby(['City', 'Datetime']).agg(**agg_spec)

# Per city.
grouped_cars = df.groupby('City').agg(**agg_spec)

# Only the last expression of a cell is displayed, so show the per-city table.
grouped_cars

| City | total_jam_count | average_jam_delay | average_jam_length | total_jam_delay | total_jam_length | avg_travel_time | real_time_travel_time |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Abudhabi | 115421 | 107.800285 | 14.308282 | 416540.3 | 55287.2 | 59.280597 | 58.192029 |
| Al-ain | 26421 | 22.606573 | 2.251863 | 87351.8 | 8701.2 | 63.248751 | 62.218752 |
| Cairo | 567476 | 269.112911 | 39.861312 | 2338591.2 | 346394.8 | 88.945268 | 85.365045 |
| Dammam | 130124 | 123.072516 | 19.997645 | 475552.2 | 77270.9 | 66.714105 | 66.137676 |
| Doha | 364277 | 127.612057 | 14.084261 | 1109204.0 | 122420.4 | 76.245852 | 76.391722 |
| Dubai | 1609530 | 717.264856 | 111.623326 | 6233031.6 | 970006.7 | 65.09427 | 67.113816 |
| Jeddah | 188733 | 182.129322 | 35.801786 | 703747.7 | 138338.1 | 70.939753 | 70.635136 |
| Kuwait | 592523 | 247.57861 | 37.865309 | 2151705.7 | 329087.4 | 63.883796 | 63.318229 |
| Mecca | 46079 | 42.870885 | 5.88882 | 165653.1 | 22754.4 | 64.55978 | 61.185436 |
| Medina | 41080 | 36.790761 | 3.803778 | 142159.5 | 14697.8 | 71.988467 | 68.799035 |
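
The per-city totals do not show how traffic changes over the day. Below is a minimal sketch of one way to look for time-of-day patterns: average `TrafficIndexLive` per city and hour, then pick each city's peak hour. It assumes `Datetime` has already been parsed as a timestamp. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical values, not from the CSV).
toy = pd.DataFrame({
    "City": ["Riyadh"] * 4 + ["Dubai"] * 4,
    "Datetime": pd.to_datetime([
        "2023-12-14 08:01:30", "2023-12-14 15:01:30",
        "2023-12-15 08:01:30", "2023-12-15 15:01:30",
    ] * 2),
    "TrafficIndexLive": [20, 40, 10, 30, 25, 35, 15, 45],
})

# Mean live traffic index per city and hour of day.
hourly = (
    toy.assign(Hour=toy["Datetime"].dt.hour)
       .groupby(["City", "Hour"])["TrafficIndexLive"]
       .mean()
       .unstack("Hour")  # rows: city, columns: hour
)
print(hourly)

# Hour with the highest mean index for each city.
peak_hour = hourly.idxmax(axis=1)
print(peak_hour)
```

Run on the real `df`, a table like `hourly` would give direct evidence for any claim about a city's busiest hours.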


### Visualize the distribution of key variables (e.g., Traffic_Index, Date).

In [38]:
import plotly.express as px
fig = px.pie(df, values='JamsCount', names='City', title='Jam counts depending on city')
fig.show()
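
The pie chart shows each city's share of the total jam count. Below is a minimal sketch of computing the same shares as numbers, so that statements such as "these cities make up about 30%" can be checked. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical values, not from the CSV).
toy = pd.DataFrame({
    "City": ["Dubai", "Dubai", "Cairo", "Doha"],
    "JamsCount": [50, 30, 15, 5],
})

# Sum jams per city, then express each city as a percentage of the total.
jams_by_city = toy.groupby("City")["JamsCount"].sum()
share_pct = (jams_by_city / jams_by_city.sum() * 100).sort_values(ascending=False)
print(share_pct.round(1))
```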

### Explore relationships between variables (e.g., Traffic_Index vs. Weather_Condition).


In [43]:
df.groupby(['City','Datetime'])['TravelTimeLive'].mean()

City      Datetime           
Abudhabi  2023-07-07 08:01:30    54.803617
          2023-07-07 09:01:30    56.118629
          2023-07-07 10:46:30    55.518834
          2023-07-07 11:16:30    56.413917
          2023-07-07 12:01:30    56.059246
                                   ...    
Riyadh    2023-12-15 03:01:30    63.245473
          2023-12-15 04:31:30    60.012955
          2023-12-15 05:01:30    57.561438
          2023-12-15 06:01:30    55.463218
          2023-12-15 07:01:30    54.886055
Name: TravelTimeLive, Length: 66639, dtype: float64
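
Because each `(City, Datetime)` pair occurs once, the grouping above returns the original values rather than a relationship between variables. Below is a minimal sketch of one alternative, a Pearson correlation matrix over the numeric columns. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical values, not from the CSV).
toy = pd.DataFrame({
    "TrafficIndexLive": [5, 10, 20, 40],
    "JamsCount": [8, 20, 45, 90],
    "JamsDelay": [20.0, 55.0, 130.0, 300.0],
    "TravelTimeLive": [55.0, 58.0, 63.0, 72.0],
})

# Pairwise Pearson correlation between the numeric columns.
corr = toy.corr()
print(corr.round(2))

# Rank the other variables by how strongly they correlate with live travel time.
ranked = corr["TravelTimeLive"].drop("TravelTimeLive").sort_values(ascending=False)
print(ranked)
```

On the real data, the non-numeric `City` and `Datetime` columns would need to be excluded before calling `corr()`.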

## 2. Data Visualization

* Ensure the visualizations are clear and informative.

### Create visualizations to illustrate the findings from the EDA.


In [54]:
# Plot the average jam length per city; the title now matches the plotted column.
fig2 = px.line(grouped_cars, x=grouped_cars.index, y='average_jam_length', title='Average jam length by city', hover_data=['average_jam_delay'], markers=True)
fig2.show()

### Use appropriate plots such as histograms, bar charts, pie charts, scatter plots, and heatmaps.

In [55]:
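# size_max is dropped because it has no effect without a size column.
fig3 = px.scatter(df, x='JamsLength', y='TravelTimeLive', color='City', title='Live travel time vs. jam length by city')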
fig3.show()

## 3. Insights and Conclusions

* <h3>Summarize the key insights gained from the data analysis.</h3>
* <h3>Draw conclusions based on the patterns observed in the data.</h3>

In [None]:
# Key insights:

# General insights:
# 1. The historical average travel time in Cairo is about 88.95 minutes (85.37 minutes live)

# Figure 1:
# 1. Dubai has the highest total jam count
# 2. Al Ain has the lowest total jam count
# 3. Kuwait, Cairo, and Doha together make up ≈ 30% of the total jam count

# Figure 2:
# 1. Riyadh has the highest average jam length, at 138.22 km
# 2. Mecca, Medina, and Al Ain have the lowest jam length and delay values
# 3. Although Dubai has the highest jam count, it has only the second-highest jam length and delay

# Figure 3:
# 1. Medina has the densest cluster of scatter points
# 2. Cairo shows the steepest curve, rising from 0 to its maximum value
# 3. Riyadh and Dubai have the most points that fall outside any cluster

# Conclusion:
# Riyadh and Dubai have the highest jam counts and delays, which I attribute mainly to both cities being popular tourist destinations
# The busiest time in Riyadh appears to be around 3-4 pm, when employees and students are leaving
# The other cities are less popular tourist destinations, so their jams are more likely caused by commuting employees or school traffic, as seen in Al Ain
# Mecca and Medina rank low because the monitored period did not include major religious events; during religious holidays or months I would expect jam delay, jam count, and travel time to be much higher
