In [1]:
# !pip install numpy
# !pip install pandas
# !pip install matplotlib
# !pip install seaborn


This Jupyter Notebook analyzes radio communication event data from the file "Radio Event 1300003.csv". The dataset logs events from a radio or dispatch system, capturing details like event timestamps, call types, durations, operator IDs, and statuses. The goal is to provide insights into system usage, efficiency, and performance through key performance indicators (KPIs) and visualizations.
## Objectives
- Load and validate the data into a pandas DataFrame (`df`).
- Calculate KPIs to summarize system activity.
- Create visualizations to explore trends and patterns.
- Provide detailed explanations for each step to aid understanding.
## Dataset Description
The data contains columns such as:
- `evid`: Unique event ID.
- `dt`: Timestamp of the event (e.g., `2025-05-19 11:31:24.617`).
- `opid`: Operator ID.
- `grpname`: Call type (e.g., "All Call" or individual call).
- `duration`: Call duration in seconds.
- `statusid`: Event status (e.g., 1, 3, 5, possibly indicating initiated, failed, or completed).
- Other metadata like `source_name`, `target_name`, and `channel_id`.

This analysis is useful for system administrators, dispatch managers, or analysts to optimize resource allocation, identify bottlenecks, and ensure reliable communication.
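
As a minimal sketch of the "validate" objective, the check below confirms that the columns named in the dataset description are present before any KPI is computed. The helper `missing_columns`, the `REQUIRED_COLUMNS` list, and the one-row sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Columns named in the dataset description; adjust to whatever the analysis needs.
REQUIRED_COLUMNS = ["evid", "dt", "opid", "grpname", "duration", "statusid"]


def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from the frame."""
    return [name for name in required if name not in frame.columns]


# Tiny synthetic frame (not real data) that lacks 'duration' and 'statusid'.
sample = pd.DataFrame(
    {
        "evid": [1],
        "dt": ["2025-05-19 11:31:24.617"],
        "opid": [1],
        "grpname": ["All Call"],
    }
)

missing = missing_columns(sample)
print("Missing columns:", missing)
```

Running the same helper on `df` right after loading would surface a wrong separator or a renamed column early, instead of as a `KeyError` further down.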


In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set a consistent style for plots to enhance readability
sns.set_theme(style="whitegrid")


In [5]:
df = pd.read_csv("Radio Event 1300003.csv", sep=";")
print("DataFrame Loaded Successfully")
print(f"Shape: {df.shape[0]} rows, {df.shape[1]} columns")
print("Columns:", ", ".join(df.columns))
print("\nFirst Few Rows:")
print(df.head().to_string())


DataFrame Loaded Successfully
Shape: 16 rows, 33 columns
Columns: evid, dt, opid, dispid, rsname, mcssn, mcsname, msuid, msuname, grpname, typeid, direction, statusid, description, duration, notRead, recorded, media, extdispid, soundfile, trgid, trgname, grpid, loneworker, addInfo, sourceid, userName, source_name, target_name, source_uid, target_uid, channel_id, is_server_event

First Few Rows:
     evid                       dt  opid                            dispid   rsname       mcssn     mcsname  msuid  msuname   grpname  typeid  direction  statusid            description  duration  notRead  recorded  media  extdispid  soundfile  trgid  trgname  grpid  loneworker  addInfo  sourceid  userName  source_name target_name                      source_uid                     target_uid  channel_id  is_server_event
0  171244  2025-05-19 11:31:24.617     1  bc0599df84a340d6a428da8322e22082  Central  511TSD4885  VOIX TUNIS    NaN      NaN       NaN       0          0         1            Tun

In [6]:
# Parse timestamps; values that do not match the format become NaT instead of raising
df['dt'] = pd.to_datetime(df['dt'], format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
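
Because `errors='coerce'` silently turns unparseable timestamps into `NaT`, it can help to count how many values were lost in the conversion. The sketch below shows one way to do that on a small synthetic series; the sample values and the variable names `raw`, `parsed`, and `n_failed` are assumptions for illustration only.

```python
import pandas as pd

# Synthetic timestamps (not from the dataset): two valid, one malformed.
raw = pd.Series(
    ["2025-05-19 11:31:24.617", "not a date", "2025-05-19 11:32:00.000"]
)

parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce")

# A value "failed" if it was present in the input but is NaT after parsing.
failed_mask = parsed.isna() & raw.notna()
n_failed = int(failed_mask.sum())

print(f"Unparseable timestamps: {n_failed} of {len(raw)}")
print(raw[failed_mask].to_string())
```

A non-zero count here would mean the time-based KPIs are computed on fewer rows than the total event count suggests.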


### KPIs
1. **Total Events**: Total number of communication events.
2. **Average Call Duration**: Mean duration of calls (in seconds).
3. **Peak Activity Hour**: Hour with the most events, indicating busiest time.
4. **All Call Percentage**: Percentage of events that are "All Call" (broadcasts to all users).

### Why These KPIs?
- **Total Events**: Measures overall system activity.
- **Average Call Duration**: Indicates efficiency of communications.
- **Peak Activity Hour**: Helps with staffing and resource planning.
- **All Call Percentage**: Shows reliance on broadcasts vs. individual calls.
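
The four KPIs can be sketched as a single function that returns a dictionary, which makes them easy to check on a small example. The function name `compute_kpis` and the synthetic sample below are assumptions for illustration; only the column names `dt`, `duration`, and `grpname` come from the dataset.

```python
import pandas as pd


def compute_kpis(events: pd.DataFrame) -> dict:
    """Compute the four KPIs for a frame with 'dt', 'duration' and 'grpname' columns."""
    # Only rows with a valid timestamp can contribute to the peak-hour KPI.
    valid = events.dropna(subset=["dt"])
    hour_counts = valid["dt"].dt.hour.value_counts()
    total = len(events)
    return {
        "total_events": total,
        "avg_duration": events["duration"].mean(),
        "peak_hour": int(hour_counts.idxmax()) if not hour_counts.empty else None,
        # Comparing against a missing grpname yields False, so such rows count as non-broadcast.
        "all_call_pct": (events["grpname"] == "All Call").mean() * 100 if total else 0.0,
    }


# Synthetic events (not real data): two in hour 11, one in hour 12, one without a timestamp.
sample = pd.DataFrame(
    {
        "dt": pd.to_datetime(
            ["2025-05-19 11:31:24", "2025-05-19 11:45:00", "2025-05-19 12:05:10", None]
        ),
        "duration": [2, 0, 4, 2],
        "grpname": ["All Call", None, "All Call", "Group A"],
    }
)

kpis = compute_kpis(sample)
for name, value in kpis.items():
    print(f"{name}: {value}")
```

Returning a dictionary rather than printing inside the function keeps the KPI logic separate from the reporting, so the same values can feed a plot or a summary table.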

In [8]:


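# Create a subset of data with valid datetime values for time-based analysis.
# .copy() makes it an independent frame, so adding the 'hour' column below
# does not trigger pandas' SettingWithCopyWarning.
df_valid_dt = df[df['dt'].notna()].copy()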

# KPI 1: Total number of events
total_events = len(df)
print("\nKPI 1: Total Events")
print(f"Total number of communication events: {total_events}")

# KPI 2: Average call duration
avg_duration = df['duration'].mean()
print("\nKPI 2: Average Call Duration")
print(f"Average duration of calls: {avg_duration:.2f} seconds")

# KPI 3: Peak activity period (hour with most events)
if not df_valid_dt.empty:
    df_valid_dt['hour'] = df_valid_dt['dt'].dt.hour
    hour_counts = df_valid_dt['hour'].value_counts().sort_index()
    peak_hour = hour_counts.idxmax()
    peak_hour_events = hour_counts.max()
    print("\nKPI 3: Peak Activity Hour")
    print(f"Hour with most events: {peak_hour}:00")
    print(f"Number of events in peak hour: {peak_hour_events}")
    print("Events per hour:\n", hour_counts.to_string())
else:
    print("\nKPI 3: Peak Activity Hour")
    print("No valid datetime data available for peak hour analysis.")

# KPI 4: Percentage of "All Call" events
all_call_count = (df['grpname'] == 'All Call').sum()
all_call_percentage = (all_call_count / total_events) * 100 if total_events > 0 else 0
print("\nKPI 4: All Call Percentage")
print(f"Percentage of All Call events: {all_call_percentage:.2f}%")


KPI 1: Total Events
Total number of communication events: 16

KPI 2: Average Call Duration
Average duration of calls: 1.12 seconds

KPI 3: Peak Activity Hour
Hour with most events: 11:00
Number of events in peak hour: 16
Events per hour:
 hour
11    16

KPI 4: All Call Percentage
