### 🌍 Geospatial Analysis

This notebook presents a geospatial analysis of the **Power System Faults Dataset**, available on [Kaggle](https://www.kaggle.com/datasets/ziya07/power-system-faults-dataset). The dataset contains synthetic records of electrical power system faults across various geographic locations and weather conditions. It is intended for analysis, pattern recognition, and reliability prediction.

#### 📄 Dataset Description

Each row in the dataset represents a unique power system fault incident and includes information such as:

- **Fault Type** (e.g., Line Breakage, Transformer Failure)
- **Geographic Location** (Latitude & Longitude)
- **Environmental Conditions** (Temperature, Wind Speed, Weather)
- **Electrical Measurements** (Voltage, Current, Power Load)
- **Fault Duration and Downtime**
- **Maintenance and Component Health Status**

In [1]:
# 📥 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import MarkerCluster, HeatMap
from scipy.stats import skew, kurtosis
import warnings
warnings.filterwarnings('ignore')

***
### 📥 Load the dataset from URL

In [2]:
url = 'https://raw.githubusercontent.com/Dr-AlaaKhamis/ISE518/refs/heads/main/datasets/fault_data.csv'
df = pd.read_csv(url)

df.head()

| Unnamed: 0 | Fault ID | Fault Type | Fault Location (Latitude, Longitude) | Voltage (V) | Current (A) | Power Load (MW) | Temperature (C) | Wind Speed (km/h) | Weather Condition | Maintenance Status | Component Health | Duration of Fault (hrs) | Down time (hrs) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F001 | Line Breakage | (34.0522, -118.2437) | 2200 | 250 | 50 | 25 | 20 | Clear | Scheduled | Normal | 2.0 | 1.0 |
| 1 | F002 | Transformer Failure | (34.056, -118.245) | 1800 | 180 | 45 | 28 | 15 | Rainy | Completed | Faulty | 3.0 | 5.0 |
| 2 | F003 | Overheating | (34.0525, -118.244) | 2100 | 230 | 55 | 35 | 25 | Windstorm | Pending | Overheated | 4.0 | 6.0 |
| 3 | F004 | Line Breakage | (34.055, -118.242) | 2050 | 240 | 48 | 23 | 10 | Clear | Completed | Normal | 2.5 | 3.0 |
| 4 | F005 | Transformer Failure | (34.0545, -118.243) | 1900 | 190 | 50 | 30 | 18 | Snowy | Scheduled | Faulty | 3.5 | 4.0 |
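
The `Unnamed: 0` column in the preview usually appears when a CSV was written with its row index and then read back without declaring it. The sketch below illustrates this on a two-row sample copied from the preview. The leading empty header field in `sample_csv` is an assumption about how the file stores that column, and the names `sample_csv`, `default_df`, and `indexed_df` are introduced here for illustration only.

```python
import io
import pandas as pd

# Two rows copied from the preview. The empty first header field is an
# ASSUMPTION about how the index column is stored in the real file.
sample_csv = '''\
,Fault ID,Fault Type,"Fault Location (Latitude, Longitude)",Voltage (V),Current (A),Power Load (MW),Temperature (C),Wind Speed (km/h),Weather Condition,Maintenance Status,Component Health,Duration of Fault (hrs),Down time (hrs)
0,F001,Line Breakage,"(34.0522, -118.2437)",2200,250,50,25,20,Clear,Scheduled,Normal,2.0,1.0
1,F002,Transformer Failure,"(34.056, -118.245)",1800,180,45,28,15,Rainy,Completed,Faulty,3.0,5.0
'''

# Default read: pandas names the header-less first column "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(sample_csv))

# Declaring the first column as the index keeps it out of the data columns.
indexed_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(list(default_df.columns)[:2])
print(indexed_df.shape)
```

If the real file follows the same layout, passing `index_col=0` to the `pd.read_csv(url)` call would have the same effect. The rest of the notebook works either way, since no later cell uses that column.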


#### 📂 Load dataset from local folder

In [3]:
# df = pd.read_csv("data/fault_data.csv")

# df.head()

***
### Visualize data on a folium map with popup labels and bolt icons

In [4]:
# Parse lat/lon from the string column
df[['Latitude', 'Longitude']] = df['Fault Location (Latitude, Longitude)']\
    .str.replace('[()]', '', regex=True).str.split(', ', expand=True).astype(float)

# Create base map
map1 = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)

# Add markers with labels
for _, row in df.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=folium.Popup(f"""
            <b>ID:</b> {row['Fault ID']}<br>
            <b>Type:</b> {row['Fault Type']}<br>
            <b>Voltage:</b> {row['Voltage (V)']} V<br>
            <b>Current:</b> {row['Current (A)']} A<br>
            <b>Load:</b> {row['Power Load (MW)']} MW
        """, max_width=300),
        icon=folium.Icon(color="blue", icon="bolt", prefix="fa")
    ).add_to(map1)

# map1.save("folium_power_plant_map.html")
map1
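
The parsing step in the cell above splits on the exact separator `', '`, so it depends on every location string having one space after the comma. As a hedged alternative, the sketch below extracts both numbers with a regular expression that tolerates spacing differences and then flags rows outside the valid latitude/longitude ranges. The `sample` frame, the `pattern`, and the malformed `"unknown"` row are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical strings in the same style as the location column,
# including spacing variants and one malformed entry.
sample = pd.DataFrame({
    "Fault Location (Latitude, Longitude)": [
        "(34.0522, -118.2437)",
        "(34.056,-118.245)",        # no space after the comma
        "( 34.0525 , -118.244 )",   # extra spaces
        "unknown",                  # does not match, becomes NaN
    ]
})

# Named groups become the column names of the extracted frame.
pattern = (
    r"\(\s*(?P<Latitude>[-+]?\d+(?:\.\d+)?)\s*,"
    r"\s*(?P<Longitude>[-+]?\d+(?:\.\d+)?)\s*\)"
)
coords = sample["Fault Location (Latitude, Longitude)"].str.extract(pattern).astype(float)

# Rows with missing or out-of-range coordinates evaluate to False.
valid = coords["Latitude"].between(-90, 90) & coords["Longitude"].between(-180, 180)

print(coords)
print(valid)
```

Rows where `valid` is `False` could be dropped before plotting, since `folium.Marker` needs numeric coordinates.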

### Cluster Map for the Data

In [5]:
map2 = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)
marker_cluster = MarkerCluster().add_to(map2)

for _, row in df.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Fault ID']} - {row['Fault Type']}",
        icon=folium.Icon(color="green", icon="info-sign")
    ).add_to(marker_cluster)

# map2.save("folium_cluster_map.html")
map2

### Bubble Map 

Bubble radius grows with Duration of Fault (hrs); each popup also reports the duration discretized into 3 equal-width bins (Short, Medium, Long).

In [6]:
# Bin durations
df['Duration_Bin'] = pd.cut(df['Duration of Fault (hrs)'], bins=3, labels=["Short", "Medium", "Long"])

map3 = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)

for _, row in df.iterrows():
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        radius=5 + row['Duration of Fault (hrs)'],  # Bubble size
        color="crimson",
        fill=True,
        fill_opacity=0.6,
        popup=f"{row['Fault ID']} - Duration: {row['Duration of Fault (hrs)']} hrs ({row['Duration_Bin']})"
    ).add_to(map3)

# map3.save("folium_bubble_duration_map.html")
map3
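
To see how `pd.cut` with `bins=3` assigns the labels, the sketch below applies the same call to the five durations shown in the `df.head()` preview and also returns the bin edges. This is only an illustration on those five rows: the edges computed on the full dataset depend on its actual minimum and maximum duration and will likely differ.

```python
import pandas as pd

# Durations of the five rows shown in the df.head() preview.
durations = pd.Series([2.0, 3.0, 4.0, 2.5, 3.5], name="Duration of Fault (hrs)")

# bins=3 splits the observed range [min, max] into three equal-width intervals;
# retbins=True also returns the computed edges.
binned, edges = pd.cut(
    durations, bins=3, labels=["Short", "Medium", "Long"], retbins=True
)

summary = pd.DataFrame({"duration": durations, "bin": binned})
print(summary)
print(edges)
```

Because the bins are equal-width rather than equal-count, a few unusually long faults can leave most rows in the "Short" bin. `pd.qcut` is one alternative if roughly balanced groups are preferred.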

### Heat Map 

Each fault is weighted by its Down time (hrs); the values are used directly, without binning.

In [7]:
# Prepare heatmap data with weight = downtime
heat_data = [[row['Latitude'], row['Longitude'], row['Down time (hrs)']] for _, row in df.iterrows()]

map5 = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)
HeatMap(heat_data, radius=15, blur=10, max_zoom=1).add_to(map5)

# map5.save("folium_heatmap.html")
map5
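
The `[latitude, longitude, weight]` rows can also be built without `iterrows`. The sketch below does that on a small sample copied from the `df.head()` preview, drops incomplete rows, and scales the downtime by its maximum so the weights fall in (0, 1]. The scaling is an optional precaution rather than something the notebook requires, and whether it changes the rendered map may depend on the folium and Leaflet.heat versions in use. The names `sample`, `clean`, and `heat_rows` are introduced here for illustration.

```python
import pandas as pd

# Coordinates and downtime of the five rows shown in the df.head() preview.
sample = pd.DataFrame({
    "Latitude": [34.0522, 34.056, 34.0525, 34.055, 34.0545],
    "Longitude": [-118.2437, -118.245, -118.244, -118.242, -118.243],
    "Down time (hrs)": [1.0, 5.0, 6.0, 3.0, 4.0],
})

cols = ["Latitude", "Longitude", "Down time (hrs)"]

# Drop rows with missing coordinates or downtime before building the layer data.
clean = sample[cols].dropna().copy()

# Scale by the maximum so the largest downtime gets weight 1.0.
clean["weight"] = clean["Down time (hrs)"] / clean["Down time (hrs)"].max()

# Nested list of [lat, lon, weight], the same shape as heat_data above.
heat_rows = clean[["Latitude", "Longitude", "weight"]].values.tolist()
print(heat_rows)
```

A list built this way can be passed to `HeatMap` in place of `heat_data`.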

***
### 📍 **More information**
For more information, see Alaa Khamis, [Handling Geospatial Data and Mapping in Python](https://medium.com/ai4sm/handling-geospatial-data-and-mapping-in-python-5e63326a13d5), Medium, August 31, 2023.