# Weathering the Rails: Visual Crossing Weather API
**Author:** Nathan Schaaf<br>
**Date:** December 10, 2024<br>
**Course:** Advanced Business Analytics, The University of North Carolina at Charlotte<br>
**Professional Context:** Prepared for the U.S. Railroad Industry (with focus on safety improvements)

## How to Use This Notebook
<ol>
<li>Prerequisites:</li>
<ul>
<li>Install the required Python libraries: pandas and requests. The datetime module is part of the Python standard library and needs no installation.</li>
<li>Obtain an API key from Visual Crossing Weather. Note that the free subscription limits API calls to 1,000 data points per day.</li>
</ul>
<li>Input Data:</li>
<ul>
<li><strong>FIRST</strong>, run the fra_data_pull.ipynb notebook to create the ten-year FRA accident dataset. Then use that dataset as the input here; it contains each incident's station, state abbreviation, and date.</li>
<li>Ensure that the dataset is cleaned and formatted correctly. Dates must be in YYYY-MM-DD format, which is what the code below parses, and location information should be updated to use accurate city names or latitude/longitude coordinates.</li>
</ul>
<li>Weather Data Collection:</li>
<ul>
<li>For each incident, weather data is collected for:</li>
<ul>
<li>Day Before the Incident (prior_temp)</li>
<li>Day of the Incident (actual_temp)</li>
<li>Day After the Incident (following_temp)</li>
</ul>
<li>Due to API limits, data collection must be divided across multiple days. Update the start_index and stop_index variables to control the range of incidents processed during each run.</li>
</ul>
<li>Output:</li>
<ul>
<li>The notebook generates a CSV file (#name_the_file#&lt;start_index&gt;_&lt;stop_index&gt;.csv; the code below uses the prefix missing_rows_4_) containing the original incident data along with the collected weather data for each incident.</li>
</ul>
<li>Steps to Execute:</li>
<ul>
<li>Set your Visual Crossing API key in the api_key variable.</li>
<li>Adjust start_index and stop_index to select the range of incidents to process.</li>
<li>Run the notebook cells to retrieve weather data and save the results.</li>
<li>Repeat the process for different ranges until data is collected for all incidents.</li>
</ul>
<li>Limitations:</li>
<ul>
<li>Ensure that API usage remains within the daily limit of 1,000 data points.</li>
<li>Some records may not have retrievable weather data due to incorrect or missing location information. These records require manual verification and correction.</li>
</ul>
</ol>
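
The batching described in the steps above can be planned ahead of time. The following is a minimal sketch, not part of the original notebook: `plan_batches` is a hypothetical helper, and it assumes that each day returned by the API counts as one data point, so that one incident (prior day, incident day, following day) costs three. Check how your own subscription meters usage before relying on these numbers.

```python
def plan_batches(total_incidents, daily_limit=1000, days_per_incident=3):
    """Return a list of (start_index, stop_index) pairs, one per daily run.

    Assumption: every day requested counts as one data point against
    daily_limit, so each incident costs days_per_incident points.
    """
    if days_per_incident <= 0 or daily_limit < days_per_incident:
        raise ValueError("daily_limit must cover at least one incident")

    # Largest number of incidents that fits inside one day's budget
    batch_size = daily_limit // days_per_incident

    # Ranges are half-open, matching DataFrame.iloc[start_index:stop_index]
    return [
        (start, min(start + batch_size, total_incidents))
        for start in range(0, total_incidents, batch_size)
    ]
```

Under these assumptions, `plan_batches(len(track_accidents))` yields the values to assign to start_index and stop_index on each successive day.
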
<br>
<br>
<strong>Conclusion</strong>
<p>After all runs are complete, merge the generated files into one dataset. The end result is a single CSV file that contains the FRA safety data along with the prior-day, incident-day, and following-day temperature for each incident.</p>
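
One way to do this merge is sketched below. This is an illustration rather than the author's method: `merge_batch_files` is a hypothetical helper, the default pattern assumes the missing_rows_4_ prefix used in the code cell further down, and dropping exact duplicate rows is an assumption that suits overlapping index ranges but may not suit every dataset.

```python
import glob

import pandas as pd


def merge_batch_files(pattern="missing_rows_4_*.csv"):
    """Combine all per-run CSV files matching pattern into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")

    # read_csv treats the 'N/A' placeholder as a missing value by default
    frames = [pd.read_csv(path) for path in paths]
    merged = pd.concat(frames, ignore_index=True)

    # Assumption: rows repeated across overlapping runs are exact duplicates
    return merged.drop_duplicates().reset_index(drop=True)
```

The result can then be written out with `to_csv`, for example `merge_batch_files().to_csv("fra_weather_merged.csv", index=False)`, where the output filename is only a placeholder.
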

In [1]:
import pandas as pd
import requests
import datetime

In [None]:
# Function to get weather data from Visual Crossing API
def get_visual_crossing_historical_weather(api_key, location, start_date, end_date):
    url = f"https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/{location}/{start_date}/{end_date}"
    
    params = {
        'unitGroup': 'us',  # 'metric' for Celsius, 'us' for Fahrenheit
        'key': api_key,
        'include': 'days',  # This includes daily summary
    }
    
    response = requests.get(url, params=params)
    
    if response.status_code == 200:
        data = response.json()
        return data
    else:
        print(f"Error: {response.status_code}, {response.text}")
        return None

# Read the CSV file
track_accidents = pd.read_csv('missing_rows_3.csv')

# Initialize your API key
api_key = 'X1X2X3X4X5X6X7X8X9X10X11X'  # Replace with your Visual Crossing API key

# Define the range of incidents to process
start_index = 0    # Update this for each day's run
stop_index = 100   # Update this for each day's run

# Extract the relevant subset of incidents
incident_subset = track_accidents.iloc[start_index:stop_index]

# List to hold weather data for each incident
weather_data = []

# Iterate through each row in the subset
for index, row in incident_subset.iterrows():
    station = row['station']
    stateabbr = row['stateabbr']
    incident_date = row['date']

    # Convert the incident date to a datetime object
    incident_datetime = datetime.datetime.strptime(incident_date, '%Y-%m-%d')
    
    # Calculate the start and end dates for the weather query
    start_date = (incident_datetime - datetime.timedelta(days=1)).strftime('%Y-%m-%d')  # Day prior
    end_date = (incident_datetime + datetime.timedelta(days=1)).strftime('%Y-%m-%d')    # Following day
    
    # Construct the location as 'City,State'
    location = f"{station},{stateabbr}"
    
    # Get weather data for the specified date range
    weather = get_visual_crossing_historical_weather(api_key, location, start_date, end_date)
    
    # Store the results if available
    if weather:
        # Initialize temperatures
        prior_temp = actual_temp = following_temp = 'N/A'
        
        incident_day = incident_datetime.strftime('%Y-%m-%d')

        for day in weather['days']:
            # Match each returned day against the prior, incident, and following dates
            if day['datetime'] == start_date:
                prior_temp = day.get('temp', 'N/A')
            elif day['datetime'] == incident_day:
                actual_temp = day.get('temp', 'N/A')
            elif day['datetime'] == end_date:
                following_temp = day.get('temp', 'N/A')

        # Append the data along with original incident information
        weather_data.append({
            **row.to_dict(),  # Add original incident data
            'prior_temp': prior_temp,
            'actual_temp': actual_temp,
            'following_temp': following_temp,
        })

# Convert the results into a DataFrame for further analysis
weather_df = pd.DataFrame(weather_data)

# Save to CSV file, including the range in the filename
output_filename = f'missing_rows_4_{start_index}_{stop_index}.csv'
weather_df.to_csv(output_filename, index=False)

print(f"Processed and saved weather data for incidents {start_index} to {stop_index} into {output_filename}")

Processed and saved weather data for incidents 0 to 40 into missing_rows_4_0_40.csv
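
Because the loop above only appends an incident when the API call succeeds, incidents with unusable location information are absent from the output, and others may carry the 'N/A' placeholder. The sketch below shows one possible way to list the records that still need manual verification. `rows_needing_review` is a hypothetical helper, and matching on station, stateabbr, and date is an assumption about what identifies an incident in this dataset.

```python
import pandas as pd


def rows_needing_review(incidents, collected,
                        key_cols=("station", "stateabbr", "date"),
                        temp_cols=("prior_temp", "actual_temp", "following_temp")):
    """Return the incidents that lack a complete set of temperatures."""
    keys = list(key_cols)
    temps = list(temp_cols)

    # Coerce the 'N/A' placeholder (and any other non-numeric value) to NaN
    numeric = collected[temps].apply(pd.to_numeric, errors="coerce")

    # Key combinations for which all three temperatures were retrieved
    complete = collected.loc[numeric.notna().all(axis=1), keys].drop_duplicates()

    # A left merge with indicator marks incidents that have no complete match
    flagged = incidents.merge(complete, on=keys, how="left", indicator=True)
    return flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")
```

For a single run, `rows_needing_review(incident_subset, weather_df)` would return the rows to correct and resubmit, provided at least one incident in the run was collected so that weather_df has the expected columns.
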
