## Bridge Structural Integrity and Collapse Prediction

### 📊 Dataset Overview

| Feature | Description |
|----------|--------------|
| `Bridge_ID` | Unique identifier for each bridge |
| `Location` | Geographic region of the bridge |
| `Age (years)` | Age of the bridge in years |
| `Material` | Main construction material (e.g., Steel, Concrete, Wood) |
| `Length (m)` | Total bridge length in meters |
| `Width (m)` | Bridge width in meters |
| `Height (m)` | Bridge height in meters |
| `Traffic_Volume (vehicles/day)` | Average number of vehicles crossing per day |
| `Weather_Conditions` | Weather type at the time of data collection (e.g., Rainy, Sunny, Snowy) |
| `Water_Flow_Rate (m³/s)` | Flow rate of water under the bridge |
| `Maintenance_History` | Maintenance date (e.g., `2010-01-01`) |
| `Stress (MPa)` | Mechanical stress measured in megapascals |
| `Strain (%)` | Strain percentage in the bridge material |
| `Tensile_Strength (MPa)` | Material tensile strength |
| `Rainfall (mm)` | Rainfall in millimeters |
| `Material_Composition` | Percentage ratio of materials used (e.g., Steel 70%, Concrete 30%) |
| `Bridge_Design` | Structural design type (e.g., Arch, Beam, Truss) |
| `Construction_Quality` | Overall quality rating of construction |
| `Temperature (°C)` | Environmental temperature |
| `Humidity (%)` | Relative humidity |
| `Collapse_Status` | Target variable: bridge condition (Standing = 0, Collapsed = 1) |

### 🎯 Problem Statement
Develop a **deep learning model** using **LSTM/GRU** networks to predict whether a bridge is **likely to collapse or remain standing** based on its material, environmental, and structural parameters.

This dataset combines **static features** (e.g., material, design, dimensions) and **dynamic features** (e.g., stress, strain, traffic, weather) that influence bridge integrity.

The target column `Collapse_Status` indicates the **final health state** of the bridge:
- `0` → Standing  
- `1` → Collapsed  

### 🧩 Modeling Goal
Build an **LSTM/GRU-based classification model** capable of:
- Learning temporal and structural dependencies  
- Predicting potential collapse risk in advance  
- Providing early warnings for maintenance planning  
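
An LSTM or GRU learns temporal dependencies only when it is fed sequences of shape `(batch, seq_len, features)`. The sketch below shows one way such sequences could be built from time-ordered rows. It assumes the rows are consecutive observations, which this notebook does not confirm; `make_windows` and the toy arrays are hypothetical names used only for illustration.

```python
import numpy as np

def make_windows(features, labels, window):
    """Stack overlapping windows of rows; each window takes the label of its last row."""
    n = len(features)
    X = np.stack([features[i:i + window] for i in range(n - window + 1)])
    y = labels[window - 1:]
    return X, y

# Toy data: 6 time steps, 2 features per step (illustrative values only)
features = np.arange(12, dtype=np.float32).reshape(6, 2)
labels = np.array([0, 0, 0, 1, 1, 1])

X_seq, y_seq = make_windows(features, labels, window=3)
print(X_seq.shape)  # (num_windows, window, num_features)
print(y_seq)
```

Labelling each window with its last row is one choice among several; shifting the label further ahead would turn this into an early-warning setup.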

### 🔧 Feature Engineering Ideas
- Extract year and month from maintenance dates (if available)  
- Convert categorical columns using label or one-hot encoding  
- Parse material composition into numeric ratios (e.g., `Steel_%`, `Concrete_%`)  
- Normalize numerical columns for stable model convergence  
- Remove uninformative columns such as `Bridge_ID` and constant-valued columns (e.g., `Construction_Quality`)  

---



In [2]:
import pandas as pd
import os 
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import torch
from torch.utils.data import DataLoader,TensorDataset
import numpy as np 
import torch.nn as nn

In [3]:
%pwd


'/home/leksman/Desktop/my git hub work/end_to_end_A_I_H_M_S/notebooks'

In [4]:
os.chdir("../")

In [5]:
%pwd


'/home/leksman/Desktop/my git hub work/end_to_end_A_I_H_M_S'

In [6]:
df = pd.read_csv("data/brigde_dataset/bridge_data.csv")

In [34]:
df.dtypes

Age (years)                        int64
Material                           int64
Length (m)                       float64
Width (m)                        float64
Height (m)                       float64
Traffic_Volume (vehicles/day)    float64
Water_Flow_Rate (m³/s)           float64
Stress (MPa)                     float64
Strain (%)                       float64
Tensile_Strength (MPa)           float64
Rainfall (mm)                    float64
Bridge_Design                      int64
Temperature (°C)                 float64
Humidity (%)                     float64
Collapse_Status                    int64
Weather_Conditions_Rainy           int64
Weather_Conditions_Snowy           int64
Weather_Conditions_Sunny           int64
Weather_Conditions_Windy           int64
Maintenance_Year                   int32
Maintenance_Month                  int32
Steel_%                            int64
Concrete_%                         int64
Wood_%                             int64
dtype: object

In [None]:
df

Unnamed: 0,Bridge_ID,Location,Age (years),Material,Length (m),Width (m),Height (m),Traffic_Volume (vehicles/day),Weather_Conditions,Water_Flow_Rate (m³/s),...,Stress (MPa),Strain (%),Tensile_Strength (MPa),Rainfall (mm),Material_Composition,Bridge_Design,Construction_Quality,Temperature (°C),Humidity (%),Collapse_Status
0,1,Region_1,26,Steel,50.000000,5.000000,10.000000,100.000000,Rainy,0.00000,...,0.000000,0.000,200.000000,0.000000,"Steel 70%, Concrete 30%",Arch,Good,-30.000000,0.000000,Standing
1,2,Region_2,47,Concrete,50.195020,5.004500,10.009001,100.990099,Sunny,0.50005,...,0.010001,0.001,200.080008,0.050005,"Concrete 80%, Wood 20%",Beam,Good,-29.991999,0.010001,Standing
2,3,Region_3,1,Wood,50.390039,5.009001,10.018002,101.980198,Snowy,1.00010,...,0.020002,0.002,200.160016,0.100010,"Concrete 80%, Wood 20%",Truss,Good,-29.983998,0.020002,Standing
3,4,Region_4,32,Steel,50.585059,5.013501,10.027003,102.970297,Cloudy,1.50015,...,0.030003,0.003,200.240024,0.150015,"Steel 70%, Concrete 30%",Arch,Good,-29.975998,0.030003,Standing
4,5,Region_5,27,Concrete,50.780078,5.018002,10.036004,103.960396,Windy,2.00020,...,0.040004,0.004,200.320032,0.200020,"Concrete 80%, Wood 20%",Beam,Good,-29.967997,0.040004,Standing
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,Region_1,13,Wood,1999.219922,49.981998,99.963996,9996.039604,Rainy,4997.99980,...,99.959996,9.996,999.679968,499.799980,"Concrete 80%, Wood 20%",Truss,Good,49.967997,99.959996,Collapsed
9996,9997,Region_2,6,Steel,1999.414941,49.986499,99.972997,9997.029703,Sunny,4998.49985,...,99.969997,9.997,999.759976,499.849985,"Steel 70%, Concrete 30%",Arch,Good,49.975998,99.969997,Collapsed
9997,9998,Region_3,44,Concrete,1999.609961,49.990999,99.981998,9998.019802,Snowy,4998.99990,...,99.979998,9.998,999.839984,499.899990,"Concrete 80%, Wood 20%",Beam,Good,49.983998,99.979998,Collapsed
9998,9999,Region_4,5,Wood,1999.804980,49.995500,99.990999,9999.009901,Cloudy,4999.49995,...,99.989999,9.999,999.919992,499.949995,"Concrete 80%, Wood 20%",Truss,Good,49.991999,99.989999,Collapsed


In [8]:
df = df.drop(["Location","Bridge_ID"],axis=1) 

In [9]:
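# Note: without parentheses this displays the bound method (and the frame's repr),
# not the column summary; df.info() is called in In [21].
df.info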

<bound method DataFrame.info of       Age (years)  Material   Length (m)  Width (m)  Height (m)  \
0              26     Steel    50.000000   5.000000   10.000000   
1              47  Concrete    50.195020   5.004500   10.009001   
2               1      Wood    50.390039   5.009001   10.018002   
3              32     Steel    50.585059   5.013501   10.027003   
4              27  Concrete    50.780078   5.018002   10.036004   
...           ...       ...          ...        ...         ...   
9995           13      Wood  1999.219922  49.981998   99.963996   
9996            6     Steel  1999.414941  49.986499   99.972997   
9997           44  Concrete  1999.609961  49.990999   99.981998   
9998            5      Wood  1999.804980  49.995500   99.990999   
9999            7     Steel  2000.000000  50.000000  100.000000   

      Traffic_Volume (vehicles/day) Weather_Conditions  \
0                        100.000000              Rainy   
1                        100.990099            

In [10]:
cat_df = df.select_dtypes(include="object")

In [11]:
columns = cat_df.columns.to_list()

In [12]:
for feature in columns:
    values = df[feature].unique()
    print(f"{feature}: {len(values)} unique value(s): {values}")

Material: 3 unique value(s): ['Steel' 'Concrete' 'Wood']
Weather_Conditions: 5 unique value(s): ['Rainy' 'Sunny' 'Snowy' 'Cloudy' 'Windy']
Maintenance_History: 10000 unique value(s): ['2010-01-01' '2010-01-02' '2010-01-03' ... '2037-05-16' '2037-05-17'
 '2037-05-18']
Material_Composition: 2 unique value(s): ['Steel 70%, Concrete 30%' 'Concrete 80%, Wood 20%']
Bridge_Design: 3 unique value(s): ['Arch' 'Beam' 'Truss']
Construction_Quality: 1 unique value(s): ['Good']
Collapse_Status: 2 unique value(s): ['Standing' 'Collapsed']


### `Material` and `Bridge_Design` are short text categories, so I use `LabelEncoder`

In [13]:
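# LabelEncoder assigns integer codes in alphabetical order of the class names
le = LabelEncoder()
df['Material'] = le.fit_transform(df['Material'])            # Concrete=0, Steel=1, Wood=2
df['Bridge_Design'] = le.fit_transform(df['Bridge_Design'])  # Arch=0, Beam=1, Truss=2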

### I one-hot encode `Weather_Conditions` because its values have no natural order and each value matters on its own

In [14]:
df = pd.get_dummies(df, columns=['Weather_Conditions'], drop_first=True,dtype=int)
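
With `drop_first=True`, `pd.get_dummies` drops the first category in sorted order, so four indicator columns encode five weather types. A minimal sketch on a toy frame (the `weather` frame is illustrative, not the real dataset) shows which category becomes the implicit baseline:

```python
import pandas as pd

# Toy frame holding the five weather values seen in the dataset
weather = pd.DataFrame(
    {"Weather_Conditions": ["Rainy", "Sunny", "Snowy", "Cloudy", "Windy"]}
)

encoded = pd.get_dummies(
    weather, columns=["Weather_Conditions"], drop_first=True, dtype=int
)

print(list(encoded.columns))
# The "Cloudy" row (index 3) is all zeros: it is the dropped baseline category
print(encoded.iloc[3].tolist())
```

This matches the preprocessed frame shown later in the notebook, which has no `Weather_Conditions_Cloudy` column.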

### I create year and month columns from `Maintenance_History` so the model can learn time-based trends from the dataset

In [15]:
df['Maintenance_History'] = pd.to_datetime(df['Maintenance_History'])
df['Maintenance_Year'] = df['Maintenance_History'].dt.year
df['Maintenance_Month'] = df['Maintenance_History'].dt.month

# drop the original column
df.drop('Maintenance_History', axis=1, inplace=True)


### I reformatted `Material_Composition` (`'Steel 70%, Concrete 30%'`, `'Concrete 80%, Wood 20%'`) into new numeric columns

In [16]:
df['Steel_%'] = df['Material_Composition'].apply(lambda x: int(x.split('Steel ')[1].split('%')[0]) if 'Steel' in x else 0)
df['Concrete_%'] = df['Material_Composition'].apply(lambda x: int(x.split('Concrete ')[1].split('%')[0]) if 'Concrete' in x else 0)
df['Wood_%'] = df['Material_Composition'].apply(lambda x: int(x.split('Wood ')[1].split('%')[0]) if 'Wood' in x else 0)

df.drop('Material_Composition', axis=1, inplace=True)
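
The three lambdas above repeat the same split logic once per material. As an alternative sketch, a single regular expression can pull every `name percent%` pair out of the string in one pass. `parse_composition` is a hypothetical helper, and the default material list is an assumption based on the values seen in this dataset.

```python
import re

def parse_composition(text, materials=("Steel", "Concrete", "Wood")):
    """Return a dict such as {'Steel_%': 70, 'Concrete_%': 30, 'Wood_%': 0}."""
    # Each match is a (material name, percentage digits) pair
    found = {name: int(pct) for name, pct in re.findall(r"([A-Za-z]+)\s+(\d+)%", text)}
    # Materials absent from the string get 0
    return {f"{m}_%": found.get(m, 0) for m in materials}

print(parse_composition("Steel 70%, Concrete 30%"))
print(parse_composition("Concrete 80%, Wood 20%"))
```

Applied row by row, the returned dicts could be expanded into the same three columns the lambdas create.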


### I dropped `Construction_Quality` because it has only one unique value

In [17]:
df.drop('Construction_Quality', axis=1, inplace=True)


### I converted the target label into 0s and 1s

In [18]:
df['Collapse_Status'] = df['Collapse_Status'].map({'Standing': 0, 'Collapsed': 1})
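
`Series.map` with a dict returns `NaN` for any value missing from the dict, so a misspelled or differently cased label would slip through silently. A small check like the one below can catch that; the toy `status` series and its lowercase entry are made up for illustration.

```python
import pandas as pd

# Toy labels; the lowercase "standing" stands in for an unexpected value
status = pd.Series(["Standing", "Collapsed", "standing"])

encoded = status.map({"Standing": 0, "Collapsed": 1})

# Rows whose label was not in the mapping come back as NaN
unmapped = status[encoded.isna()]
print(unmapped.tolist())
```

In this notebook the column has only the two expected values, so the check would come back empty.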

In [19]:
df

Unnamed: 0,Age (years),Material,Length (m),Width (m),Height (m),Traffic_Volume (vehicles/day),Water_Flow_Rate (m³/s),Stress (MPa),Strain (%),Tensile_Strength (MPa),...,Collapse_Status,Weather_Conditions_Rainy,Weather_Conditions_Snowy,Weather_Conditions_Sunny,Weather_Conditions_Windy,Maintenance_Year,Maintenance_Month,Steel_%,Concrete_%,Wood_%
0,26,1,50.000000,5.000000,10.000000,100.000000,0.00000,0.000000,0.000,200.000000,...,0,1,0,0,0,2010,1,70,30,0
1,47,0,50.195020,5.004500,10.009001,100.990099,0.50005,0.010001,0.001,200.080008,...,0,0,0,1,0,2010,1,0,80,20
2,1,2,50.390039,5.009001,10.018002,101.980198,1.00010,0.020002,0.002,200.160016,...,0,0,1,0,0,2010,1,0,80,20
3,32,1,50.585059,5.013501,10.027003,102.970297,1.50015,0.030003,0.003,200.240024,...,0,0,0,0,0,2010,1,70,30,0
4,27,0,50.780078,5.018002,10.036004,103.960396,2.00020,0.040004,0.004,200.320032,...,0,0,0,0,1,2010,1,0,80,20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,13,2,1999.219922,49.981998,99.963996,9996.039604,4997.99980,99.959996,9.996,999.679968,...,1,1,0,0,0,2037,5,0,80,20
9996,6,1,1999.414941,49.986499,99.972997,9997.029703,4998.49985,99.969997,9.997,999.759976,...,1,0,0,1,0,2037,5,70,30,0
9997,44,0,1999.609961,49.990999,99.981998,9998.019802,4998.99990,99.979998,9.998,999.839984,...,1,0,1,0,0,2037,5,0,80,20
9998,5,2,1999.804980,49.995500,99.990999,9999.009901,4999.49995,99.989999,9.999,999.919992,...,1,0,0,0,0,2037,5,0,80,20


In [20]:
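# Caution: describe() returns 8 summary rows (count, mean, std, min, ...), so this
# correlates the summary statistics rather than the data, and most columns look ~1.0.
# df.corr()["Collapse_Status"] gives each feature's correlation with the target.
df.describe().corr()['Collapse_Status']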

Age (years)                      0.999991
Material                         1.000000
Length (m)                       0.982584
Width (m)                        0.999992
Height (m)                       0.999967
Traffic_Volume (vehicles/day)    0.520125
Water_Flow_Rate (m³/s)           0.869189
Stress (MPa)                     0.999962
Strain (%)                       1.000000
Tensile_Strength (MPa)           0.996716
Rainfall (mm)                    0.999011
Bridge_Design                    1.000000
Temperature (°C)                 0.999977
Humidity (%)                     0.999962
Collapse_Status                  1.000000
Weather_Conditions_Rainy         1.000000
Weather_Conditions_Snowy         1.000000
Weather_Conditions_Sunny         1.000000
Weather_Conditions_Windy         1.000000
Maintenance_Year                 0.972082
Maintenance_Month                1.000000
Steel_%                          0.999966
Concrete_%                       0.999976
Wood_%                           0
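
The near-perfect values above come from correlating the rows of `describe()` rather than the observations themselves. A toy sketch (made-up numbers, not the bridge data) shows that the two calls can even disagree in sign:

```python
import pandas as pd

# Toy data in which the feature falls as the target rises
toy = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "target": [1, 1, 0, 0],
})

# Correlation computed on the observations
raw_corr = toy.corr()["target"]["feature"]

# Correlation computed on the 8 summary statistics returned by describe()
summary_corr = toy.describe().corr()["target"]["feature"]

print(f"corr on data:       {raw_corr:.3f}")
print(f"corr on describe(): {summary_corr:.3f}")
```

The first number is negative and the second positive, so the `describe().corr()` output says little about how a feature relates to the target.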

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 24 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age (years)                    10000 non-null  int64  
 1   Material                       10000 non-null  int64  
 2   Length (m)                     10000 non-null  float64
 3   Width (m)                      10000 non-null  float64
 4   Height (m)                     10000 non-null  float64
 5   Traffic_Volume (vehicles/day)  10000 non-null  float64
 6   Water_Flow_Rate (m³/s)         10000 non-null  float64
 7   Stress (MPa)                   10000 non-null  float64
 8   Strain (%)                     10000 non-null  float64
 9   Tensile_Strength (MPa)         10000 non-null  float64
 10  Rainfall (mm)                  10000 non-null  float64
 11  Bridge_Design                  10000 non-null  int64  
 12  Temperature (°C)               10000 non-null  

In [22]:
df.isnull().sum()

Age (years)                      0
Material                         0
Length (m)                       0
Width (m)                        0
Height (m)                       0
Traffic_Volume (vehicles/day)    0
Water_Flow_Rate (m³/s)           0
Stress (MPa)                     0
Strain (%)                       0
Tensile_Strength (MPa)           0
Rainfall (mm)                    0
Bridge_Design                    0
Temperature (°C)                 0
Humidity (%)                     0
Collapse_Status                  0
Weather_Conditions_Rainy         0
Weather_Conditions_Snowy         0
Weather_Conditions_Sunny         0
Weather_Conditions_Windy         0
Maintenance_Year                 0
Maintenance_Month                0
Steel_%                          0
Concrete_%                       0
Wood_%                           0
dtype: int64

In [24]:
X = df.drop("Collapse_Status",axis=1)
y = df["Collapse_Status"]

In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
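
If the two classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. Passing `stratify` to `train_test_split` keeps the ratio the same in both parts. The sketch below uses toy arrays with an assumed 80/20 class ratio; the real ratio in this dataset is not shown in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 80 of class 0 and 20 of class 1 (assumed ratio)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Both splits keep the 80/20 class ratio
print(y_tr.mean(), y_te.mean())
```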


In [26]:
X_train[:3000].to_csv("notebooks/transformed_data/bridged_sensor_dataset.csv")

In [27]:
scaler = StandardScaler()
scaled_x_train = torch.tensor(scaler.fit_transform(X_train)).float()
scaled_x_test = torch.tensor(scaler.transform(X_test)).float()

y_train = torch.tensor(np.array(y_train)).float()
y_test = torch.tensor(np.array(y_test)).float()

train_dataset = TensorDataset(scaled_x_train, y_train)
test_dataset = TensorDataset(scaled_x_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [28]:
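input_size = scaled_x_train.shape[1]  # 23 feature columns after preprocessing
num_hidden = 20   # hidden units per LSTM layer
num_layers = 10   # stacked LSTM layers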

In [29]:
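class LSTMNet(nn.Module):
    def __init__(self, input_size, num_hidden, num_layers):
        super().__init__()

        # LSTM layer. The attribute is named `gru` but holds an nn.LSTM;
        # the name is kept so the saved state_dict keys stay the same.
        self.gru = nn.LSTM(
            input_size=input_size,
            hidden_size=num_hidden,
            num_layers=num_layers,
            batch_first=True  # batched input is expected as (batch, seq, features)
        )

        # Linear layer mapping each hidden state to a single output
        self.output = nn.Linear(num_hidden, 1)

        # Sigmoid turns the output into a probability for BCELoss
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Run through the LSTM layer. For 3-D input, out has shape
        # (batch, seq_len, hidden_size). The DataLoader here yields 2-D tensors
        # (batch, features), which nn.LSTM treats as one unbatched sequence,
        # so out has shape (batch, hidden_size). `hidden` is the (h_n, c_n) tuple.
        out, hidden = self.gru(x)

        # Pass through the linear layer
        out = self.output(out)

        # Apply sigmoid activation for BCELoss
        out = self.sigmoid(out)

        return out, hidden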
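
Because each row of this dataset describes one bridge, a variant that treats every row as its own length-1 sequence may be closer to the intent than feeding a whole mini-batch as one sequence. The sketch below is an assumption about that intent, not the model trained in this notebook; `RowwiseLSTM` is a hypothetical name and `num_layers=2` is chosen only to keep the demo small.

```python
import torch
import torch.nn as nn

class RowwiseLSTM(nn.Module):
    """Hypothetical variant: each input row becomes a sequence of length 1."""

    def __init__(self, input_size, num_hidden, num_layers):
        super().__init__()
        self.lstm = nn.LSTM(input_size, num_hidden, num_layers, batch_first=True)
        self.output = nn.Linear(num_hidden, 1)

    def forward(self, x):
        if x.dim() == 2:
            x = x.unsqueeze(1)                  # (batch, features) -> (batch, 1, features)
        out, _ = self.lstm(x)                   # (batch, seq_len, hidden)
        logits = self.output(out[:, -1, :])     # use the last time step: (batch, 1)
        return torch.sigmoid(logits).squeeze(-1)  # probabilities, shape (batch,)

torch.manual_seed(0)
model = RowwiseLSTM(input_size=23, num_hidden=20, num_layers=2)
model.eval()

batch = torch.randn(4, 23)  # 4 rows with 23 features each (random stand-in data)
with torch.no_grad():
    probs = model(batch)
print(probs.shape)
```

With this layout the prediction for one row does not depend on which other rows share its mini-batch, which is not the case when a 2-D batch is passed to `nn.LSTM` directly.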



In [30]:
num_epochs = 20
learning_rate = 0.001

net = LSTMNet(input_size, num_hidden, num_layers)
lossFun = torch.nn.BCELoss()
optimizer = torch.optim.RMSprop(net.parameters(), lr=learning_rate)

losses = np.zeros(num_epochs)
train_acc = np.zeros(num_epochs)
test_acc = np.zeros(num_epochs)
test_loss = np.zeros(num_epochs)

for epoch in range(num_epochs):
    net.train()
    train_losses = []
    train_accuracies = []  

    for xb, yb in train_loader:
        # Forward pass; squeeze the trailing dimension so predictions match yb's shape
        y_pred, _ = net(xb)
        y_pred = y_pred.squeeze(-1)
        loss = lossFun(y_pred, yb)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_losses.append(loss.item())

        # Batch accuracy at a 0.5 decision threshold
        preds = (y_pred >= 0.5).float()
        train_accuracies.append((preds == yb).float().mean().item())

    # Evaluate on the full test set
    net.eval()
    test_losses = []
    test_accuracies = []

    with torch.no_grad():
        for xb, yb in test_loader:
            y_pred, _ = net(xb)
            y_pred = y_pred.squeeze(-1)
            test_losses.append(lossFun(y_pred, yb).item())

            preds = (y_pred >= 0.5).float()
            test_accuracies.append((preds == yb).float().mean().item())

    avg_train_loss = np.mean(train_losses)
    avg_train_acc = np.mean(train_accuracies)
    avg_test_loss = np.mean(test_losses)
    avg_test_acc = np.mean(test_accuracies)

    losses[epoch] = avg_train_loss
    train_acc[epoch] = avg_train_acc
    test_loss[epoch] = avg_test_loss
    test_acc[epoch] = avg_test_acc

    print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {avg_train_loss:.4f}, Train Acc: {avg_train_acc*100:.2f}%, "
          f"Test Loss: {avg_test_loss:.4f}, Test Acc: {avg_test_acc*100:.2f}%")

print("✅ Training complete!")

# Save the trained model
torch.save(net.state_dict(), "notebooks/trained_model/brigded_model.pth")
print("Model saved to notebooks/trained_model/brigded_model.pth")

Epoch [1/20], Train Loss: 0.5116, Train Acc: 79.84%, Test Loss: 0.4955, Test Acc: 80.56%
Epoch [2/20], Train Loss: 0.4804, Train Acc: 79.99%, Test Loss: 0.2456, Test Acc: 93.06%
Epoch [3/20], Train Loss: 0.1650, Train Acc: 95.58%, Test Loss: 0.1834, Test Acc: 92.76%
Epoch [4/20], Train Loss: 0.1454, Train Acc: 94.71%, Test Loss: 0.1382, Test Acc: 94.79%
Epoch [5/20], Train Loss: 0.1377, Train Acc: 96.36%, Test Loss: 0.1072, Test Acc: 96.97%
Epoch [6/20], Train Loss: 0.0913, Train Acc: 97.82%, Test Loss: 0.0813, Test Acc: 97.92%
Epoch [7/20], Train Loss: 0.0763, Train Acc: 98.00%, Test Loss: 0.0520, Test Acc: 98.26%
Epoch [8/20], Train Loss: 0.0633, Train Acc: 98.00%, Test Loss: 0.0562, Test Acc: 97.77%
Epoch [9/20], Train Loss: 0.0552, Train Acc: 98.30%, Test Loss: 0.0353, Test Acc: 99.11%
Epoch [10/20], Train Loss: 0.0458, Train Acc: 98.44%, Test Loss: 0.0436, Test Acc: 98.86%
Epoch [11/20], Train Loss: 0.0414, Train Acc: 98.75%, Test Loss: 0.0434, Test Acc: 98.76%
Epoch [12/20], Trai