# Forecasting Railroad Track Failures Due to Extreme Temperature Fluctuations

## Project Definition & Objectives

### Objective/Thesis
<p>The objective of this project is to develop a machine learning model that predicts potential derailment locations where track failures are caused by extreme temperature fluctuations. By integrating national weather forecasts with historical derailment and track data, the model will identify high-risk areas where derailments are most likely to occur given the weather forecast. This analysis aims to enable predictive maintenance strategies, focusing on increased track inspections and repairs during and immediately after periods of extreme weather to mitigate derailment risk.</p>
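One way to turn "extreme temperature fluctuations" into a model input is to compute each day's temperature range and its change from the previous day, then flag days that cross a threshold. The sketch below assumes a daily weather table with hypothetical columns `county_fips`, `date`, `temp_max_f` and `temp_min_f`, and an illustrative 30 °F threshold. None of these come from the project's data sources yet.

```python
import pandas as pd

# Hypothetical daily weather records; the column names are assumptions,
# not fields from a specific weather API.
weather = pd.DataFrame({
    "county_fips": ["08031", "08031", "08031", "30063"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01"]),
    "temp_max_f": [55.0, 20.0, 28.0, 40.0],
    "temp_min_f": [25.0, 2.0, 15.0, 30.0],
})


def add_fluctuation_features(w: pd.DataFrame, threshold_f: float = 30.0) -> pd.DataFrame:
    """Add a daily range, a day-over-day change in max temperature, and a flag.

    threshold_f is an illustrative cut-off, not a validated engineering limit.
    """
    w = w.sort_values(["county_fips", "date"]).copy()
    # Diurnal swing: difference between the day's high and low
    w["daily_range_f"] = w["temp_max_f"] - w["temp_min_f"]
    # Absolute change in the daily high versus the previous day in the same county
    w["max_change_f"] = w.groupby("county_fips")["temp_max_f"].diff().abs()
    # Flag a day if either measure reaches the threshold (NaN compares as False)
    w["extreme"] = (w["daily_range_f"] >= threshold_f) | (w["max_change_f"] >= threshold_f)
    return w


features = add_fluctuation_features(weather)
print(features[["county_fips", "date", "daily_range_f", "max_change_f", "extreme"]])
```

The first day in each county has no previous day, so its `max_change_f` is missing and only the daily range can flag it.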

### Scope
<ul>
    <li><strong>Data Collection</strong>: Gathering historical data on derailments, environmental factors (temperature fluctuations), track conditions, and weather forecasts.</li>
    <li><strong>Model Development</strong>: Building and validating a machine learning model capable of predicting derailment locations based on extreme weather patterns. [Possible options: Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, Neural Networks.]</li>
    <li><strong>Geospatial Analysis</strong>: Applying geospatial methods to identify regions at high risk of derailment due to wide gauge (rails spreading farther apart than the allowable track gauge).</li>
    <li><strong>Real-Time Forecast Integration</strong>: Integrating national weather forecasts to provide real-time predictions of potential derailment locations.</li>
    <li><strong>Actionable Insights</strong>: Recommending areas for increased track maintenance and inspection based on predictions, with a focus on preventing accidents caused by wide gauge in extreme weather conditions.</li>
</ul>
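The geospatial step can start much simpler than full spatial modeling: attach an extreme-weather flag to each accident by county and date, then rank counties by how many accidents fell on flagged days. This is a county-level stand-in rather than a geospatial method proper, and the tables and column names below (`county_fips`, `date`, `extreme`) are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: accident records keyed by county and date, and
# weather days already flagged as extreme (e.g. by a fluctuation threshold).
accidents = pd.DataFrame({
    "county_fips": ["08031", "08031", "30063", "30063", "48201"],
    "date": pd.to_datetime(["2024-01-02", "2024-02-10", "2024-01-01", "2024-01-05", "2024-01-02"]),
})
weather = pd.DataFrame({
    "county_fips": ["08031", "08031", "30063", "30063", "48201"],
    "date": pd.to_datetime(["2024-01-02", "2024-02-10", "2024-01-01", "2024-01-05", "2024-01-02"]),
    "extreme": [True, False, True, True, False],
})

# Attach the weather flag to each accident; accidents with no matching
# weather row get NaN, which .eq(True) turns into False.
merged = accidents.merge(weather, on=["county_fips", "date"], how="left")
merged["extreme"] = merged["extreme"].eq(True)

# Count accidents on extreme days per county, highest-risk counties first
risk = (
    merged.groupby("county_fips")["extreme"]
    .sum()
    .sort_values(ascending=False)
)
print(risk)
```

Raw counts favor counties with more track mileage and traffic, so a rate (for example, accidents per extreme day or per track mile) would likely give a fairer ranking once those denominators are available.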

### Significance
<p>According to the Federal Railroad Administration (FRA), track failures are the leading cause of derailments, with over 12,000 reportable events recorded. As climate change contributes to increasingly unpredictable and extreme weather patterns, the risk of derailments due to wide gauge is likely to rise. This project is significant because it will provide a data-driven approach to mitigating that risk: with predictions of where derailments are likely to occur, rail companies can proactively focus maintenance and inspection efforts, reducing the likelihood of accidents, protecting human lives, and preserving infrastructure.</p>

## Data Preprocessing

### Import

```python
import requests
import pandas as pd
```

```python
# WARNING: this can take up to 6 minutes to download.

# Socrata (SODA) API endpoint for the FRA dataset
url = "https://data.transportation.gov/resource/85tf-25kj.json"

# Paging parameters
limit = 1000   # Number of rows to fetch per request
offset = 0     # Starting row for the next batch
all_data = []  # Accumulates every record retrieved

while True:
    # Pass $limit/$offset as query parameters; requests handles the URL encoding
    response = requests.get(
        url,
        params={"$limit": limit, "$offset": offset},
        timeout=60,  # Avoid hanging indefinitely on a stalled connection
    )

    # Stop if the request was not successful
    if response.status_code != 200:
        print(f"Failed to retrieve data. Status code: {response.status_code}")
        break

    # Parse the JSON response into a list of records
    data = response.json()

    # An empty page means every row has been retrieved
    if not data:
        break

    # Append this batch and move the offset to the next batch of rows
    all_data.extend(data)
    offset += limit

# Convert the list of records into a pandas DataFrame
df = pd.DataFrame(all_data)

# Display the number of rows fetched
print(f"Total records retrieved: {len(df)}")

# Display the first few rows of the DataFrame
print(df.head())
```

```
Total records retrieved: 220542
  reportingrailroadcode reportingrailroadname  year accidentnumber  \
0                    CO              COLORADO  1977         AWV801   
1                    CO              COLORADO  1976         AWV021   
2                    CO              COLORADO  1976         ATC101   
3                   MRL     Montana Rail Link  2010        2010066   
4                  BNSF  BNSF Railway Company  1998      MS1198107   

                                                 url accidentyear  \
0  {'url': 'https://safetydata.fra.dot.gov/Office...           77   
1  {'url': 'https://safetydata.fra.dot.gov/Office...           76   
2  {'url': 'https://safetydata.fra.dot.gov/Office...           76   
3  {'url': 'https://safetydata.fra.dot.gov/Office...           10   
4  {'url': 'https://safetydata.fra.dot.gov/Office...           98   

  accidentmonth maintenancerailroadcode maintenancerailroadname  \
0            10                      CO                COLORADO  
```
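The limit/offset loop above can be factored into a small generator so the paging logic can be tested without calling the API. In this sketch, `paginate` and its `fetch_page(limit, offset)` callback are hypothetical names; the callback stands in for the `requests.get(...).json()` call.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list], limit: int = 1000) -> Iterator[dict]:
    """Yield records page by page until an empty page is returned.

    fetch_page(limit, offset) is a hypothetical callback returning one page
    of records as a list.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if not page:  # An empty page marks the end of the dataset
            return
        yield from page
        offset += limit


# Usage with an in-memory stand-in for the API (no network needed)
fake_rows = [{"id": i} for i in range(2500)]
records = list(paginate(lambda lim, off: fake_rows[off:off + lim]))
print(len(records))  # 2500
```

Against the live endpoint, the callback could be `lambda lim, off: requests.get(url, params={"$limit": lim, "$offset": off}, timeout=60).json()`. Socrata's paging documentation recommends adding an `$order` clause (for example `:id`) so that rows are not skipped or repeated between pages.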

```python
# Print the full list of features (column names)
features = list(df.columns)
print(features)
```

```
['reportingrailroadcode', 'reportingrailroadname', 'year', 'accidentnumber', 'url', 'accidentyear', 'accidentmonth', 'maintenancerailroadcode', 'maintenancerailroadname', 'maintenanceaccidentnumber', 'maintenanceaccidentyear', 'maintenanceaccidentmonth', 'day', 'date', 'time', 'accident_type_code', 'accidenttype', 'hazmatcars', 'hazmatcarsdamaged', 'hazmatreleasedcars', 'personsevacuated', 'divisioncode', 'division', 'station', 'milepost', 'statecode', 'stateabbr', 'statename', 'countycode', 'countyname', 'district', 'temperature', 'visibility_code', 'visibility', 'weather_condition_code', 'weathercondition', 'track_type_code', 'tracktype', 'trackname', 'trackclass', 'trackdensity', 'train_direction_code', 'traindirection', 'equipment_type_code', 'equipmenttype', 'equipmentattended', 'trainnumber', 'trainspeed', 'recordedestimatedspeed', 'maximumspeed', 'grosstonnage', 'method_of_operation_code', 'firstcarinitials', 'firstcarnumber', 'firstcarposition', 'firstcarloaded', 'passengerstra
```
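The Socrata JSON endpoint returns field values as strings (the zero-padded `accidentmonth` values in the previews are one sign of this), so numeric and date columns need explicit conversion before modeling. A minimal sketch, using made-up sample rows that reuse four column names from the list above; the date string format shown is an assumption, not verified against this dataset.

```python
import pandas as pd

# Made-up rows; values arrive as strings, as they would from the JSON API
sample = pd.DataFrame({
    "date": ["2010-06-14T00:00:00.000", "not a date"],
    "temperature": ["95", ""],
    "trainspeed": ["10", "25"],
    "stateabbr": ["MT", "CO"],
})


def coerce_types(frame: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Convert the date and numeric columns; unparseable values become NaT/NaN."""
    out = frame.copy()
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


clean = coerce_types(sample, ["temperature", "trainspeed"])
print(clean.dtypes)
```

Using `errors="coerce"` keeps one malformed value from aborting the whole conversion; counting the resulting NaN/NaT values afterwards shows how much data was lost.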

```python
# primary_accident_cause_codes (from this project's local
# primaryAccidentCodesLibrary module) is a dictionary keyed by
# track-type primary accident cause codes.
from primaryAccidentCodesLibrary import primary_accident_cause_codes

# Get the list of track-type cause codes from the dictionary keys
codes_list = list(primary_accident_cause_codes.keys())

# Create a Track Accident DataFrame containing only incidents whose
# "primaryaccidentcausecode" is one of the track-type codes
track_accidents_df = df[df['primaryaccidentcausecode'].isin(codes_list)].copy()

# Display the first few rows of the new DataFrame
print(track_accidents_df.head())
```

```
   reportingrailroadcode reportingrailroadname  year accidentnumber  \
2                     CO              COLORADO  1976         ATC101   
9                    CSX    CSX Transportation  1998      049822003   
10                  BNSF  BNSF Railway Company  1998      DK0498110   
11                   CSX    CSX Transportation  1996      119607001   
12                    CR               Conrail  1996          80003   

                                                  url accidentyear  \
2   {'url': 'https://safetydata.fra.dot.gov/Office...           76   
9   {'url': 'https://safetydata.fra.dot.gov/Office...           98   
10  {'url': 'https://safetydata.fra.dot.gov/Office...           98   
11  {'url': 'https://safetydata.fra.dot.gov/Office...           96   
12  {'url': 'https://safetydata.fra.dot.gov/Office...           96   

   accidentmonth maintenancerailroadcode maintenancerailroadname  \
2             03                      CO                COLORADO   
9             04
```
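As a cross-check on the dictionary-based filter: FRA primary cause codes are grouped by their leading letter, with `T` covering track, roadbed and structures causes. Assuming that convention, a prefix match should select a similar set of records. The sample codes below are illustrative and not checked against the FRA code table.

```python
import pandas as pd

# Illustrative cause codes, including a missing value
sample = pd.DataFrame({"primaryaccidentcausecode": ["T110", "H702", "T207", None, "E4AL"]})

# na=False treats missing codes as non-track instead of propagating NaN
is_track = sample["primaryaccidentcausecode"].str.startswith("T", na=False)
track_sample = sample[is_track]
print(track_sample)
```

On the real data, `set(df.loc[df['primaryaccidentcausecode'].str.startswith('T', na=False), 'primaryaccidentcausecode']) - set(codes_list)` would list any `T` codes that the local dictionary leaves out, which is a quick way to confirm whether each omission is intentional.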

```python
track_accidents_df.shape
```

```
(91986, 163)
```
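The filtered frame keeps 91,986 of the 220,542 records (about 41.7%). Because the thesis ties track failures to temperature, a first exploratory check is whether these accidents cluster in particular months by state. A minimal sketch on made-up rows using the dataset's `stateabbr` and `accidentmonth` columns; `accidentmonth` arrives as a zero-padded string, so it is converted to an integer first.

```python
import pandas as pd

# Made-up rows using two column names from the FRA dataset
sample = pd.DataFrame({
    "stateabbr": ["MT", "MT", "CO", "CO", "CO"],
    "accidentmonth": ["01", "07", "01", "01", "12"],
})

# Count accidents per state and month; months with no accidents become 0
monthly = (
    sample.assign(month=pd.to_numeric(sample["accidentmonth"], errors="coerce"))
    .groupby(["stateabbr", "month"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```

Replacing `sample` with `track_accidents_df` gives the full state-by-month table. It counts accidents rather than rates, so months and states with more traffic will also look busier.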