# Team: The Elites

In [None]:
import pandas as pd

# Load the dataset to examine its structure and contents
file_path = 'unnati_phase1_data_revised.csv'
df = pd.read_csv(file_path)

# Print column types and non-null counts, then show the first few rows
df.info()
df.head()


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the aesthetic style of the plots
sns.set(style="whitegrid")

# Initialize the figure
plt.figure(figsize=(20, 15))

# Create subplots
plt.subplot(2, 2, 1)
sns.countplot(data=df, x='Alert', order=df['Alert'].value_counts().index)
plt.title('Frequency of Different Types of Alerts')

plt.subplot(2, 2, 2)
sns.countplot(data=df, x='Vehicle', order=df['Vehicle'].value_counts().index)
plt.title('Frequency of Alerts by Vehicle')
plt.xticks(rotation=45)

plt.subplot(2, 2, 3)
sns.histplot(df['Speed'], bins=30, kde=True)
plt.title('Distribution of Speed During Events')

plt.subplot(2, 2, 4)
df['Date'] = pd.to_datetime(df['Date'])
df['Day_of_Week'] = df['Date'].dt.day_name()
sns.countplot(data=df, x='Day_of_Week', order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.title('Frequency of Alerts by Day of the Week')

plt.tight_layout()
plt.show()


In [None]:
# Filter to Vehicle ID 1995 and count the number of events for each type of alert
vehicle_1995_grouped = df[df['Vehicle'] == 1995].groupby('Alert').size()
vehicle_1995_grouped

### Frequency of Different Types of Alerts
- **cas_ldw (Lane Departure Warning):** Most frequent alert.
- **cas_fcw (Forward Collision Warning):** Second most frequent.
- **cas_hmw (Headway Monitoring and Warning)** and **cas_pcw (Pedestrian Collision Warning)** also present.

### Frequency of Alerts by Vehicle
- Some vehicles generate alerts more frequently.
- Investigate if certain vehicles are prone to incidents.

### Distribution of Speed During Events
- Event speeds appear roughly normally distributed (the histogram is approximately bell-shaped).
- Most events occur at 40-60 km/h.
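
The normality reading above comes from eyeballing the histogram. A minimal sketch of a numeric check is shown below, assuming the `Speed` column is numeric. The helper name `speed_shape_summary` and the toy values are illustrative only and are not part of the dataset.

```python
import pandas as pd

def speed_shape_summary(speeds: pd.Series) -> dict:
    """Return simple shape statistics for a speed column."""
    speeds = speeds.dropna()
    return {
        "mean": speeds.mean(),
        "median": speeds.median(),
        "skew": speeds.skew(),             # close to 0 for a symmetric distribution
        "excess_kurtosis": speeds.kurt(),  # close to 0 for a normal distribution
    }

# Toy values for illustration only; in the notebook, pass df['Speed'] instead.
toy_speeds = pd.Series([30, 40, 45, 50, 50, 55, 60, 70])
summary = speed_shape_summary(toy_speeds)
print(summary)
```

A mean close to the median and a skew near zero would be consistent with the bell-shaped reading; large values would suggest describing the distribution differently.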

### Frequency of Alerts by Day of the Week
- Alert frequency is consistent throughout the week.
- Slight increase on Fridays and Saturdays.
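
Raw weekday counts can be skewed if some weekdays occur more often than others in the observation window. One way to check the weekday pattern, sketched here under the assumption that `Date` parses with `pd.to_datetime`, is to average the alerts per calendar day for each weekday. The function name and the toy dates are hypothetical.

```python
import pandas as pd

def mean_alerts_per_weekday(dates: pd.Series) -> pd.Series:
    """Average number of alerts per calendar day, grouped by weekday name."""
    days = pd.to_datetime(dates).dt.normalize()
    # Count alerts on each calendar date
    daily = days.groupby(days).size().rename_axis("date").reset_index(name="count")
    # Label each date with its weekday, then average the daily counts
    daily["weekday"] = daily["date"].dt.day_name()
    return daily.groupby("weekday")["count"].mean()

# Toy dates for illustration only; in the notebook, pass df['Date'] instead.
toy_dates = pd.Series(["2023-01-02"] * 2 + ["2023-01-09"] * 4 + ["2023-01-03"])
weekday_means = mean_alerts_per_weekday(toy_dates)
print(weekday_means)
```

Note that dates with no alerts at all do not appear in the data, so they are not counted as zero-alert days in this sketch.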


## 1. Temporal Analysis

First, we'll explore how the frequency of alerts changes over time, both by hour of the day and by date.

### 1.1 Distribution of Alerts by Hour of the Day

We'll begin by analyzing the variation in alert frequency throughout different hours.


In [None]:
# Parse the 'Time' column once, then keep the time of day and extract the hour
parsed_time = pd.to_datetime(df['Time'])
df['Time'] = parsed_time.dt.time
df['Hour'] = parsed_time.dt.hour

# Initialize the figure
plt.figure(figsize=(12, 6))

# Create the plot
sns.countplot(data=df, x='Hour', hue='Alert')
plt.title('Distribution of Alerts by Hour of the Day')
plt.xlabel('Hour of the Day')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()

#### Insights from Distribution of Alerts by Hour of the Day:
- Alert frequency peaks during the morning (around 6-8 AM) and late afternoon (around 4-6 PM), potentially coinciding with rush hours.
- The most frequent alert type throughout the day is cas_ldw (Lane Departure Warning).
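
The peak hours above are read off the chart. A small sketch of how they could be confirmed numerically follows, assuming an integer `Hour` column like the one created in this section. `top_alert_hours` and the toy values are illustrative names and data, not part of the original analysis.

```python
import pandas as pd

def top_alert_hours(hours: pd.Series, n: int = 3) -> pd.Series:
    """Return the n hours of the day with the most alerts."""
    # Count alerts per hour and include hours with no alerts as zero
    counts = hours.value_counts().reindex(range(24), fill_value=0)
    return counts.nlargest(n)

# Toy hours for illustration only; in the notebook, pass df['Hour'] instead.
toy_hours = pd.Series([7, 7, 7, 17, 17, 8])
peaks = top_alert_hours(toy_hours, n=2)
print(peaks)
```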

### 1.2 Trends in the Number of Alerts Over Time
Next, we'll examine how the number of alerts has changed over the two-month period.


In [None]:
# Group data by date and count the number of alerts for each date
alerts_by_date = df.groupby('Date').size().reset_index(name='Count')

# Initialize the figure
plt.figure(figsize=(15, 6))

# Create the plot
sns.lineplot(data=alerts_by_date, x='Date', y='Count')
plt.title('Trends in the Number of Alerts Over Time')
plt.xlabel('Date')
plt.ylabel('Frequency of Alerts')

plt.tight_layout()
plt.show()
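
Daily counts can be noisy, which makes a trend hard to read from the line plot alone. A possible refinement, sketched here with a hypothetical helper and toy dates, is to fill in days without alerts as zero and add a rolling mean. The 7-day default window is an assumption, chosen to smooth out weekly patterns.

```python
import pandas as pd

def daily_counts_with_rolling_mean(dates: pd.Series, window: int = 7) -> pd.DataFrame:
    """Daily alert counts on a gap-free calendar plus a rolling mean."""
    days = pd.to_datetime(dates).dt.normalize()
    counts = days.value_counts().sort_index()
    # Reindex onto every calendar day so days without alerts count as zero
    full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
    counts = counts.reindex(full_range, fill_value=0)
    rolling = counts.rolling(window, min_periods=1).mean()
    return pd.DataFrame({"Count": counts, "Rolling_Mean": rolling})

# Toy dates for illustration only; in the notebook, pass df['Date'] instead.
toy_dates = pd.Series(["2023-01-01"] * 2 + ["2023-01-03"] * 4)
trend = daily_counts_with_rolling_mean(toy_dates, window=2)
print(trend)
```

The resulting `Rolling_Mean` column can be passed to `sns.lineplot` alongside the raw counts.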

## 2. Spatial Analysis

Next, we turn to the spatial distribution of the alerts. The goal is to plot their geographical coordinates and identify potential "hotspots", that is, areas where alerts cluster and which may be prone to accidents.

### 2.1 Plotting Geographical Coordinates of Alerts

Our next step involves visualizing the locations where these alerts are most prevalent.


In [None]:
%pip install folium
import folium

# Map each alert type to a marker color; any other type (cas_hmw) falls back to purple
ALERT_COLORS = {"cas_fcw": "red", "cas_pcw": "blue", "cas_ldw": "green"}

def get_color(alert_type):
    return ALERT_COLORS.get(alert_type, "purple")

# Create a base map
m = folium.Map(location=[df['Lat'].mean(), df['Long'].mean()], zoom_start=10)

# Add points to the map
for idx, row in df.iterrows():
    folium.CircleMarker(
        location=(row['Lat'], row['Long']),
        radius=2,
        color=get_color(row['Alert']),
        fill=True,
        fill_opacity=0.6
    ).add_to(m)

m


#### Insights from Geographical Distribution of Alerts:
- Alert concentration in specific areas suggests possible "hotspots" prone to incidents.
- Alert types vary across locations; cas_ldw (Lane Departure Warning) remains prevalent.
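
Hotspots on the map are judged visually. One simple way to quantify them, offered as a sketch rather than the method used here, is to bin coordinates into a square grid and count alerts per cell. The `Lat` and `Long` column names come from the notebook. The helper name, the 0.01-degree cell size (roughly 1 km north to south), and the toy coordinates are assumptions.

```python
import pandas as pd

def top_grid_cells(data: pd.DataFrame, cell_size: float = 0.01, n: int = 5) -> pd.DataFrame:
    """Count alerts in square lat/long cells and return the n busiest cells."""
    cells = pd.DataFrame({
        # Integer cell indices avoid comparing rounded floating-point coordinates
        "lat_cell": (data["Lat"] // cell_size).astype(int),
        "long_cell": (data["Long"] // cell_size).astype(int),
    })
    counts = cells.groupby(["lat_cell", "long_cell"]).size().reset_index(name="Count")
    return counts.sort_values("Count", ascending=False).head(n)

# Toy coordinates for illustration only; in the notebook, pass df instead.
toy_points = pd.DataFrame({
    "Lat": [12.975, 12.976, 13.105],
    "Long": [77.593, 77.594, 77.605],
})
hot_cells = top_grid_cells(toy_points)
print(hot_cells)
```

Multiplying a cell index by `cell_size` gives the south-west corner of that cell, which can then be drawn on the folium map.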

## 3. Vehicle-Based Analysis

Next, we check whether specific vehicles tend to generate particular types of alerts.

### 3.1 Frequency of Alerts by Vehicle
We examine how often each vehicle ID triggers each type of alert.


In [None]:
# Initialize the figure
plt.figure(figsize=(15, 8))

# Create the plot
sns.countplot(data=df, x='Vehicle', hue='Alert', order=df['Vehicle'].value_counts().index)
plt.title('Frequency of Alerts by Vehicle')
plt.xlabel('Vehicle ID')
plt.ylabel('Frequency')
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()


#### Insights from Frequency of Alerts by Vehicle:
- Certain vehicles show higher frequencies of specific alerts.
- Vehicle ID 2846 generates numerous cas_ldw (Lane Departure Warning) alerts.
- Alert distribution varies by vehicle, which may reflect factors such as vehicle condition or driving behavior.
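
Because vehicles differ in how many alerts they generate overall, comparing raw counts can hide differences in alert mix. A minimal sketch using `pd.crosstab` with row normalization is shown below. The function name, vehicle IDs, and rows in the toy frame are made up for illustration.

```python
import pandas as pd

def alert_mix_by_vehicle(data: pd.DataFrame) -> pd.DataFrame:
    """Share of each alert type within each vehicle's alerts (rows sum to 1)."""
    return pd.crosstab(data["Vehicle"], data["Alert"], normalize="index")

# Toy rows for illustration only; in the notebook, pass df instead.
toy_alerts = pd.DataFrame({
    "Vehicle": [101, 101, 101, 101, 102, 102],
    "Alert": ["cas_ldw", "cas_ldw", "cas_ldw", "cas_fcw", "cas_fcw", "cas_hmw"],
})
mix = alert_mix_by_vehicle(toy_alerts)
print(mix)
```

A vehicle whose row is dominated by a single alert type stands out here even if its total alert count is low.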

## 4. Speed Analysis

Lastly, we'll investigate the connection between speed and the types of alerts generated.

### 4.1 Speed Distribution by Alert Type
We compare the distribution of speed across alert types using a box plot.


In [None]:
# Initialize the figure
plt.figure(figsize=(15, 8))

# Create the plot
sns.boxplot(data=df, x='Alert', y='Speed')
plt.title('Speed Distribution by Alert Type')
plt.xlabel('Type of Alert')
plt.ylabel('Speed (km/h)')

plt.tight_layout()
plt.show()


#### Insights from Speed Distribution by Alert Type:
- The median speed is highest for cas_fcw (Forward Collision Warning) and lowest for cas_pcw (Pedestrian Collision Warning).
- Each alert type has a wide range of associated speeds.
- cas_hmw (Headway Monitoring and Warning) exhibits the most variability in speed.
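
The medians and spreads above are read from the box plot. They could be tabulated directly, as in this sketch, which assumes the notebook's `Alert` and `Speed` columns. The helper name and toy speeds are hypothetical, and the interquartile range is one of several reasonable measures of variability.

```python
import pandas as pd

def speed_spread_by_alert(data: pd.DataFrame) -> pd.DataFrame:
    """Median, interquartile range, and standard deviation of speed per alert type."""
    grouped = data.groupby("Alert")["Speed"]
    summary = pd.DataFrame({
        "median": grouped.median(),
        "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
        "std": grouped.std(),
    })
    # Most variable alert types first
    return summary.sort_values("iqr", ascending=False)

# Toy rows for illustration only; in the notebook, pass df instead.
toy_speeds = pd.DataFrame({
    "Alert": ["cas_fcw"] * 3 + ["cas_hmw"] * 3,
    "Speed": [40, 50, 60, 10, 50, 90],
})
spread = speed_spread_by_alert(toy_speeds)
print(spread)
```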


### Summary and Next Steps:
#### Key Observations

- Alerts peak during commuting hours (6-8 AM and 4-6 PM), which calls for increased vigilance and possibly traffic management at those times.

- Alert hotspots in certain areas may indicate higher accident risk; consider improved road infrastructure and enforcement there.

- Vehicle-specific alert patterns may point to maintenance needs or to driving behavior that should be adjusted.

- The relationship between speed and alert type suggests tailoring speed limits and warnings to the speeds at which each alert type occurs.

- Frequent cas_ldw alerts point to lane drifting or frequent lane changes, and to a need for better lane discipline.

#### Recommendations

- Enhance traffic management in peak hours and hotspots to prevent collisions.

- Monitor and take corrective actions for vehicles frequently triggering alerts.

- Upgrade road infrastructure in hotspot areas to mitigate risks.

- Increase public awareness about speed limits and lane discipline.

- Integrate data with accidents, weather, and road types for comprehensive analysis.

- Develop a real-time alert system for high-risk areas or conditions.
