## Process Followed for Solving the Problem

### 1. Introduction
- The perceptron model is a fundamental building block of neural networks.
- This notebook shows its capability to classify linearly separable patterns.

### 2. Dataset Download and Preparation
- Used the `kagglehub` library to download the "US Road Construction and Closures" dataset from Kaggle.
- Loaded and preprocessed the dataset using the `get_data` function:
  - Converted time columns to datetime objects.
  - Calculated the duration of each event.
  - Selected relevant features and cleaned the data.
  - Encoded the target variable and standardized the features.
  - Split the data into training and testing sets.

### 3. Neural Network Training
- Defined the `train_nn` function to build, compile, and train a neural network model:
  - Built a model with three layers using the `Sequential` API.
  - Compiled the model with appropriate loss function, optimizer, and metrics.
  - Trained the model using the training data, specifying epochs, batch size, and validation split.

### 4. Data Loading and Preparation
- Constructed the file path to the dataset CSV file.
- Called the `get_data` function to prepare the dataset for training and testing.

### 5. Model Training
- Trained the neural network model using the prepared training data.

### Conclusion
- Implemented and trained a neural network classifier on the US Road Construction and Closures dataset to predict the severity of road closures, which can help city planners make more effective decisions.
- Demonstrated the process of data preparation, model building, and training in a structured manner.

In [None]:
!pip install pandas numpy scikit-learn tensorflow keras kagglehub

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import os

## US Road Construction and Closures Dataset

### Dataset Source
This dataset is sourced from Kaggle, created by Sobhan Moosavi. It provides comprehensive information about road construction projects and closures throughout the United States.

### Dataset Features
- **Geographic information**: Location coordinates, state, county, and city data
- **Temporal details**: Start/end dates and times of construction events
- **Construction type**: Classifications of different construction activities
- **Closure information**: Reason for closure, duration, and affected road segments
- **Impact severity**: Level of traffic disruption caused by the construction
- **Road identifiers**: Names and designations of affected roads and highways
- **Contextual factors**: Associated weather conditions and other relevant circumstances

This dataset serves as a valuable resource for traffic management analysis, urban planning studies, and transportation infrastructure research.

In [3]:
import kagglehub

path = kagglehub.dataset_download("sobhanmoosavi/us-road-construction-and-closures")
print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/sobhanmoosavi/us-road-construction-and-closures?dataset_version_number=1...
100%|██████████| 760M/760M [00:41<00:00, 19.0MB/s]
Extracting files...

Path to dataset files: C:\Users\abhin\.cache\kagglehub\datasets\sobhanmoosavi\us-road-construction-and-closures\versions\1


The `get_data` function loads and preprocesses the US Road Construction and Closures dataset.

### Steps:

1. **Loading the Dataset**:
   - Reads the dataset from the provided file path.
   - Handles file not found and other exceptions.

2. **Datetime Conversion and Duration Calculation**:
   - Converts `Start_Time` and `End_Time` columns to `datetime` objects.
   - Drops rows with invalid datetime conversions.
   - Calculates the duration of each event in minutes.

3. **Feature Selection and Data Cleaning**:
   - Selects relevant feature columns (e.g., geographic, weather, duration).
   - Drops rows with missing values in selected features or the target column (`Severity`).

4. **Encoding and Standardization**:
   - Encodes the `Severity` column into integer classes.
   - Extracts and standardizes features for better neural network performance.

5. **Data Splitting**:
   - Splits the data into training and testing sets (80-20 split).

The function returns the training and testing sets for both features and targets.
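
Step 2 can also be expressed without a per-row Python function, which tends to matter on a dataset of this size. The following is a minimal sketch, assuming the timestamps come in exactly the two layouts that `get_data` handles (with and without fractional seconds). The helper name `parse_mixed_datetimes` and the sample rows are hypothetical and not part of the notebook.

```python
import pandas as pd

def parse_mixed_datetimes(series):
    """Parse timestamps with or without fractional seconds; unparseable values become NaT."""
    # Each pass parses the whole column at once; values that do not match become NaT.
    with_fraction = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce")
    without_fraction = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    # Keep whichever pass succeeded for each row.
    return with_fraction.fillna(without_fraction)

# Hypothetical sample rows, not taken from the dataset.
sample = pd.DataFrame({
    "Start_Time": ["2021-01-01 10:00:00", "2021-01-01 11:00:00.000000", "not a date"],
    "End_Time": ["2021-01-01 10:30:00", "2021-01-01 12:30:00.000000", "2021-01-01 13:00:00"],
})
sample["Start_Time"] = parse_mixed_datetimes(sample["Start_Time"])
sample["End_Time"] = parse_mixed_datetimes(sample["End_Time"])

# Drop rows where either timestamp failed to parse, then compute minutes.
sample = sample.dropna(subset=["Start_Time", "End_Time"]).copy()
sample["Duration"] = (sample["End_Time"] - sample["Start_Time"]).dt.total_seconds() / 60.0
print(sample["Duration"].tolist())
```

The two explicit formats mirror the ones tried in `get_data`, so the result should match the row-wise version for inputs in those layouts.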

In [7]:
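def get_data(path):
    try:
        # Load dataset
        df = pd.read_csv(path)
    except FileNotFoundError as e:
        # Raise instead of returning a string: callers unpack four arrays from this function
        raise FileNotFoundError(f"Dataset file not found: {path}") from e
    except Exception as e:
        raise RuntimeError(f"An error occurred while loading the dataset: {e}") from e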

    # Convert time columns to datetime and compute duration (in minutes)
    def parse_datetime(dt_str):
        # Try formats with microseconds first, then without
        for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
            try:
                return pd.to_datetime(dt_str, format=fmt)
            except ValueError:
                continue
        # Return NaT if none of the formats work
        return pd.NaT

    # Apply the custom function to the relevant columns
    df['Start_Time'] = df['Start_Time'].apply(parse_datetime)
    df['End_Time'] = df['End_Time'].apply(parse_datetime)

    # Drop rows where the datetime conversion failed
    df = df.dropna(subset=['Start_Time', 'End_Time'])

    df['Duration'] = (df['End_Time'] - df['Start_Time']).dt.total_seconds() / 60.0

    # Define the feature columns
    features = [
        'Distance(mi)',
        'Start_Lat', 'Start_Lng', 'End_Lat', 'End_Lng',
        'Temperature(F)', 'Wind_Chill(F)', 'Humidity(%)',
        'Pressure(in)', 'Visibility(mi)', 'Wind_Speed(mph)',
        'Precipitation(in)',
        'Duration'
    ]

    # Drop rows that have missing values in any of the selected features or in the target
    df = df[['Severity'] + features].dropna()

    # Encode Severity into integer classes (0, 1, 2, 3)
    # This step will automatically map the unique severity values (e.g., 1, 2, 3, 4) to 0,1,2,3.
    le = LabelEncoder()
    df['Severity_encoded'] = le.fit_transform(df['Severity'])

    # Prepare input features and target
    X = df[features].values.astype(np.float32)
    y = df['Severity_encoded'].values.astype(np.int32)

    # Standardize features for better NN performance
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

    return X_train, X_test, y_train, y_test
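
`get_data` fits the `StandardScaler` on all rows before splitting, so the test rows contribute to the mean and variance used for scaling. A common alternative is to split first and fit the scaler on the training rows only. Below is a minimal sketch on synthetic data, assuming the same 80-20 split; `split_then_scale` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def split_then_scale(X, y, test_size=0.2, random_state=42):
    """Split first, then fit the scaler on the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # statistics come from training rows only
    X_test = scaler.transform(X_test)        # test rows reuse the training statistics
    return X_train, X_test, y_train, y_test, scaler

# Synthetic stand-in for the 13 features and the encoded severity labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(1000, 13)).astype(np.float32)
y_demo = rng.integers(0, 4, size=1000).astype(np.int32)

X_tr, X_te, y_tr, y_te, fitted_scaler = split_then_scale(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Returning the fitted scaler also makes it possible to apply the same transformation to new data at prediction time.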

The `train_nn` function builds, compiles, and trains a neural network model.

### Steps:

1. **Building the Model**:
   - Uses the `Sequential` API to define the neural network architecture.
   - Adds three layers:
     - A dense layer with 32 units and ReLU activation.
     - A dense layer with 16 units and ReLU activation.
     - A dense output layer with 4 units and softmax activation.

2. **Compiling the Model**:
   - Compiles the model with:
     - `sparse_categorical_crossentropy` loss function.
     - `adam` optimizer.
     - `accuracy` as a metric.

3. **Training the Model**:
   - Trains the model using the training data (`X_train`, `y_train`).
   - Specifies:
     - 50 epochs.
     - Batch size of 8192.
     - 10% of the training data held out for validation (Keras takes the last 10% of the samples, before shuffling).
     - Verbose output during training.

The function returns the trained model.
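
To make the loss choice concrete: with integer labels and a softmax output, `sparse_categorical_crossentropy` is the mean negative log of the probability assigned to the true class. Below is a small NumPy sketch of that computation. The function names and the example logits are illustrative only, and library implementations typically add safeguards such as probability clipping that are omitted here.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability of the true class; y_true holds integer class ids."""
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(true_class_probs).mean())

# Illustrative logits for three samples and four severity classes.
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [-1.0, 3.0, 0.2, 0.1]])
labels = np.array([0, 2, 1])  # integer labels, as produced by LabelEncoder

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

Because the labels stay as integers, no one-hot encoding of `Severity` is needed, which is why the sparse variant of the loss fits this setup.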

In [8]:
def train_nn(X_train, y_train):
    # Build the neural network model
    model = Sequential([
        Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
        Dense(16, activation='relu'),
        Dense(4, activation='softmax')
    ])

    # Compile the model
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Train the model
    model.fit(X_train, y_train, epochs=50, batch_size=8192, validation_split=0.1, verbose=1)

    return model
    

The cell below builds the path to the dataset CSV file and calls `get_data` to produce the training and test splits used for model training and evaluation.

In [None]:
csv = os.path.join(path, 'US_Constructions_Dec21.csv')
X_train, X_test, y_train, y_test = get_data(csv)

Training the model

In [None]:
model = train_nn(X_train, y_train)

Epoch 1/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7292 - loss: 0.7515 - val_accuracy: 0.8870 - val_loss: 0.3967
Epoch 2/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8873 - loss: 0.3929 - val_accuracy: 0.8870 - val_loss: 0.3861
Epoch 3/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8869 - loss: 0.3847 - val_accuracy: 0.8870 - val_loss: 0.3791
Epoch 4/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8868 - loss: 0.3778 - val_accuracy: 0.8869 - val_loss: 0.3712
Epoch 5/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8868 - loss: 0.3685 - val_accuracy: 0.8874 - val_loss: 0.3589
Epoch 6/50
414/414 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8878 - loss: 0.3553 - val_accuracy: 0.8911 - val_loss: 0.3468
Epoch 7/50
414/414 ━━━━━━━

### Model Evaluation

In [11]:
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", loss)
print("Test accuracy:", accuracy)

Test loss: 0.3074662983417511
Test accuracy: 0.9007605314254761
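
Accuracy alone can be hard to interpret if the severity classes are imbalanced. The validation accuracy stays at 0.8870 for the first three epochs, which would be consistent with (but does not prove) the model initially predicting a single dominant class. Below is a hedged sketch of two checks that could follow the evaluation cell: the accuracy of always predicting the most frequent training class, and per-class recall. The helper names are hypothetical and the arrays are synthetic; with the notebook's data, `y_pred` would come from `model.predict(X_test).argmax(axis=1)`.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training class."""
    majority_class = np.bincount(y_train).argmax()
    return float((y_test == majority_class).mean())

def per_class_recall(y_true, y_pred, n_classes):
    """Fraction of each true class predicted correctly (NaN if the class is absent)."""
    recalls = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls[c] = (y_pred[mask] == c).mean()
    return recalls

# Synthetic labels with one dominant class, standing in for the encoded severity.
y_train_demo = np.array([1] * 90 + [0] * 4 + [2] * 4 + [3] * 2)
y_test_demo = np.array([1] * 9 + [2])
y_pred_demo = np.array([1] * 10)  # a model that only ever predicts class 1

baseline = majority_baseline_accuracy(y_train_demo, y_test_demo)
recalls = per_class_recall(y_test_demo, y_pred_demo, n_classes=4)
print(baseline)
print(recalls)
```

A test accuracy close to this baseline would suggest the model adds little beyond the class prior, while low recall on the rarer classes would point to the same issue from a different angle.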
