# Weather Impact on Traffic Accidents

## Problem Statement
For this project, I aim to determine whether inclement weather leads to more frequent and more severe traffic accidents than clear, sunny weather. Understanding this relationship can inform public safety campaigns, the allocation of emergency service resources, and urban planning initiatives aimed at improving road safety in poor weather.


## Step 1: Gather Data

### Dataset 1:

I use the "Motor Vehicle Collisions - Crashes" dataset because it provides historical records of motor vehicle accidents in New York City. Each record includes the crash date, time, and location, which are needed to match a collision to the weather conditions at that time and place. The injury and fatality counts make it possible to analyze both the frequency and the severity of collisions across the city.

Dataset Type: CSV

Data Wrangling Method: The data is gathered programmatically by downloading the CSV file from the NYC Open Data portal.

Dataset variables that will be used:

 - **CRASH DATE**: Date the collision occurred
 - **CRASH TIME**: Time the collision occurred
 - **LATITUDE**: Latitude coordinate of the collision
 - **LONGITUDE**: Longitude coordinate of the collision
 - **NUMBER OF PERSONS INJURED**: Total number of people injured in the collision
 - **NUMBER OF PERSONS KILLED**: Total number of people killed in the collision
 - **COLLISION_ID**: Unique identifier for the collision
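
As a rough illustration of how the variables listed above might be pulled out of the full file, the sketch below keeps only those columns and combines the crash date and time into a single timestamp. The sample rows are made up for the example, and the `MM/DD/YYYY` date format, the `HH:MM` time format, and the names `USED_COLUMNS`, `subset_df`, and `CRASH DATETIME` are assumptions rather than something confirmed by the dataset documentation.

```python
import io

import pandas as pd

# Columns named in the list above; every other column in the file is skipped.
USED_COLUMNS = [
    "CRASH DATE",
    "CRASH TIME",
    "LATITUDE",
    "LONGITUDE",
    "NUMBER OF PERSONS INJURED",
    "NUMBER OF PERSONS KILLED",
    "COLLISION_ID",
]

# Made-up sample standing in for the real CSV (assumed date/time formats).
sample_csv = io.StringIO(
    "CRASH DATE,CRASH TIME,BOROUGH,LATITUDE,LONGITUDE,"
    "NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,COLLISION_ID\n"
    "09/11/2021,14:39,BROOKLYN,40.667202,-73.8665,2,0,1001\n"
    "03/26/2022,11:45,,,,1,0,1002\n"
)

# usecols drops unused columns (here BOROUGH) while the file is being parsed.
subset_df = pd.read_csv(sample_csv, usecols=USED_COLUMNS)

# Combine date and time so each crash can later be matched to a weather reading.
subset_df["CRASH DATETIME"] = pd.to_datetime(
    subset_df["CRASH DATE"] + " " + subset_df["CRASH TIME"],
    format="%m/%d/%Y %H:%M",
)

print(subset_df[["COLLISION_ID", "CRASH DATETIME", "LATITUDE"]])
```

Rows without coordinates, like the second sample row, are kept here with `NaN` in `LATITUDE` and `LONGITUDE`; whether to drop or impute them is a separate cleaning decision.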


In [1]:
# Imports required for workbook
import pandas as pd
import numpy as np
import requests
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
# Load in first dataset

# URL for the CSV version of the dataset
csv_url = "https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD"

# Defines a filename to save the dataset to
file_name = "motor_vehicle_collisions_subset.csv"

# pd.read_csv fetches URLs through urllib, not requests, so network
# failures surface as urllib.error.URLError (HTTPError is a subclass).
from urllib.error import URLError

try:
    # Use pandas to read a limited number of rows directly from the URL.
    # The `nrows` parameter grabs only a small chunk of the data,
    # avoiding the large file size issue.
    # Note: these are the first rows in file order, not a random sample.
    print(f"Reading the first 100,000 records from {csv_url}...")
    motor_accident_df = pd.read_csv(csv_url, nrows=100000)

    # Save the smaller DataFrame to a new CSV file
    motor_accident_df.to_csv(file_name, index=False)
    print(f"Successfully created a subset and saved it to {file_name}")

    # Display the number of rows to confirm the result
    print(f"The DataFrame now contains {len(motor_accident_df)} rows.")

except URLError as e:
    print(f"Error accessing the data URL: {e}")
except Exception as e:
    print(f"An error occurred: {e}")


Reading the first 100,000 records from https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD...


In [None]:
motor_accident_df.info()