#Assignment - 3

Formulate three original hypotheses based on the raw US Accidents dataset, and
perform appropriate statistical hypothesis tests to validate each one.

#Task
Analyze the US Accidents dataset by loading and exploring it, then formulate and statistically test three hypotheses: one on the impact of weather conditions on accident severity/frequency, one on the impact of time of day on accident frequency/severity, and one on the influence of road features on accident occurrence/severity. Finally, summarize and interpret the results of these tests.

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency,ttest_ind

In [3]:
df=pd.read_csv('/content/drive/MyDrive/Internship/RoadSafety_Nov25/data/raw/US_Accidents_March23.csv')

print(df.head())
print(df.info())
print(df.describe())

    ID   Source  Severity           Start_Time             End_Time  \
0  A-1  Source2         3  2016-02-08 05:46:00  2016-02-08 11:00:00   
1  A-2  Source2         2  2016-02-08 06:07:59  2016-02-08 06:37:59   
2  A-3  Source2         2  2016-02-08 06:49:27  2016-02-08 07:19:27   
3  A-4  Source2         3  2016-02-08 07:23:34  2016-02-08 07:53:34   
4  A-5  Source2         2  2016-02-08 07:39:07  2016-02-08 08:09:07   

   Start_Lat  Start_Lng  End_Lat  End_Lng  Distance(mi)  ... Roundabout  \
0  39.865147 -84.058723      NaN      NaN          0.01  ...      False   
1  39.928059 -82.831184      NaN      NaN          0.01  ...      False   
2  39.063148 -84.032608      NaN      NaN          0.01  ...      False   
3  39.747753 -84.205582      NaN      NaN          0.01  ...      False   
4  39.627781 -84.188354      NaN      NaN          0.01  ...      False   

  Station   Stop Traffic_Calming Traffic_Signal Turning_Loop Sunrise_Sunset  \
0   False  False           False          F

In [9]:
df=df.dropna(subset=['Severity','Weather_Condition','Start_Time'])

df['Start_Time']=pd.to_datetime(df['Start_Time'],errors='coerce')
df['Hour']=df['Start_Time'].dt.hour
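
Because `errors='coerce'` silently turns unparseable timestamps into `NaT`, it can be worth counting how many rows were affected before deriving `Hour`. The sketch below uses made-up timestamps, not rows from the dataset, and the variable names are illustrative.

```python
import pandas as pd

# Toy timestamps (made up for illustration), including one unparseable value.
raw = pd.Series(["2016-02-08 05:46:00", "2016-02-08 18:07:59", "not a timestamp"])

# errors="coerce" converts anything that cannot be parsed into NaT
parsed = pd.to_datetime(raw, errors="coerce")

# Count the rows that failed to parse
n_failed = int(parsed.isna().sum())

# NaT rows produce NaN hours, so they match neither a day nor a night filter
hours = parsed.dt.hour

print("rows that failed to parse:", n_failed)
print(hours.tolist())
```

If the count is large on the real data, the timestamp format may need to be handled explicitly before the hour is extracted.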

#HYPOTHESIS – 1
Impact of Weather on Accident Severity
##Hypotheses

**H0**: Weather condition and accident severity are independent

**H1**: Weather condition and accident severity are associated (not independent)

**Test Used**: Chi-square test of independence
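
For reference, the statistic computed by a chi-square test of independence on an $r \times c$ contingency table can be written as follows, where $O_{ij}$ is the observed count in cell $(i, j)$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total (these symbols are introduced here for illustration):

$$
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)
$$

The `chi2`, `dof`, and `expected` values returned by `chi2_contingency` correspond to $\chi^2$, the degrees of freedom, and the matrix of $E_{ij}$.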

In [10]:
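# Exact-match lookup: only rows whose Weather_Condition equals a key
# character for character (case-sensitive) get a group; all others become NaN.
weather_map={
    'clear': 'clear',
    'Rain': 'Rain',
    'Snow': 'Snow',
    'Fog': 'Fog'
}

df['Weather_Group']=df['Weather_Condition'].map(weather_map)
# Keep only the rows that matched one of the groups above
df_weather=df.dropna(subset=['Weather_Group'])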

table1=pd.crosstab(df_weather['Weather_Group'],df_weather['Severity'])

chi2,p,dof,expected=chi2_contingency(table1)

print("Chi-square:",chi2)
print("P-value",p)

Chi-square: 5319.849490837078
P-value 0.0
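
Note that a dictionary passed to `.map` matches whole strings exactly, so a lowercase key such as `'clear'` would not match a capitalized `Clear` label, and labels such as `Light Rain` would be dropped. A possible alternative is to group by keyword after lowercasing. This is a minimal sketch on toy labels; `KEYWORD_GROUPS` and `group_weather` are hypothetical names, and the label strings are assumptions about what the column may contain.

```python
import pandas as pd

# Toy labels for illustration only; the real column has many more distinct values.
conditions = pd.Series(["Clear", "Light Rain", "Heavy Snow", "Fog", "Overcast", None])

# Hypothetical keyword -> group rules, checked in order.
KEYWORD_GROUPS = [("rain", "Rain"), ("snow", "Snow"), ("fog", "Fog"), ("clear", "Clear")]

def group_weather(label):
    """Return a coarse weather group, or None if no keyword matches."""
    if not isinstance(label, str):
        return None  # missing values stay missing
    lowered = label.lower()
    for keyword, group in KEYWORD_GROUPS:
        if keyword in lowered:
            return group
    return None  # unmatched labels (e.g. "Overcast") are left ungrouped

weather_group = conditions.map(group_weather)
print(weather_group.tolist())
```

Applying a grouping like this to the real data would change which rows enter the contingency table, so the chi-square value would differ from the one printed above.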


In [11]:
#Interpretation
if p<0.05:
  print("Reject H0:Weather impacts accident severity")
else:
  print("Fail to reject H0")

Reject H0:Weather impacts accident severity
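
With a very large sample, even a weak association can produce a p-value that prints as 0.0, so an effect-size measure helps judge how strong the association is. One common choice is Cramér's V. The sketch below is illustrative: `cramers_v` is a helper defined here, not a SciPy function, and the counts are made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for a contingency table (0 = no association, 1 = perfect)."""
    # correction=False: skip the Yates correction so V reaches 1 for a perfect 2x2 table
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Made-up counts for illustration: rows are weather groups, columns are severity levels.
toy = pd.DataFrame(
    [[30, 50, 20],
     [25, 55, 20]],
    index=["Rain", "Snow"],
    columns=[1, 2, 3],
)

v = cramers_v(toy)
print("Cramér's V:", round(v, 3))
```

On the notebook's data, the same helper could be called as `cramers_v(table1)`.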


#HYPOTHESIS – 2
Impact of Time of Day on Accident Severity
##Hypotheses

**H0**: Mean severity is the same for day (06:00 to 17:59) and night (18:00 to 05:59) accidents

**H1**: Mean severity differs between day and night accidents

**Test Used**: Welch's independent two-sample t-test (unequal variances)

In [13]:
day = df[(df['Hour'] >= 6) & (df['Hour'] < 18)]['Severity']
night = df[(df['Hour'] < 6) | (df['Hour'] >= 18)]['Severity']

t_stat, p_val = ttest_ind(day, night, equal_var=False)

print("T-statistic:", t_stat)
print("p-value:", p_val)

T-statistic: -56.038969057592865
p-value: 0.0
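
Since `Severity` is an ordinal score from 1 to 4, a rank-based test such as the Mann-Whitney U test is a possible cross-check on the t-test, and the raw difference in means gives a sense of how large the gap is. The sketch below runs on synthetic severities drawn with made-up probabilities, not on the dataset.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic severities on the 1-4 scale (probabilities are made up for illustration).
day = rng.choice([1, 2, 3, 4], size=500, p=[0.05, 0.75, 0.15, 0.05])
night = rng.choice([1, 2, 3, 4], size=500, p=[0.05, 0.65, 0.22, 0.08])

# Rank-based two-sided test; it does not treat severity as an interval-scaled number
u_stat, p_val = mannwhitneyu(day, night, alternative="two-sided")

# Difference in means as a simple indication of the size of the effect
mean_diff = night.mean() - day.mean()

print("U statistic:", u_stat)
print("p-value:", p_val)
print("mean severity difference (night - day):", mean_diff)
```

In the notebook, the same call would take the `day` and `night` Series built above.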


In [14]:
#Interpretation

if p_val < 0.05:
    print("Reject H0: Time of day affects accident severity")
else:
    print("Fail to reject H0")


Reject H0: Time of day affects accident severity


#HYPOTHESIS – 3
Impact of Traffic Signals on Accident Severity
##Hypotheses

**H0**: The presence of a traffic signal and accident severity are independent

**H1**: The presence of a traffic signal and accident severity are associated

**Test Used**: Chi-square test of independence

In [15]:
df['Traffic_Signal'] = df['Traffic_Signal'].astype(int)

table3 = pd.crosstab(df['Traffic_Signal'], df['Severity'])

chi2, p, dof, expected = chi2_contingency(table3)

print("Chi-square:", chi2)
print("p-value:", p)


Chi-square: 111623.51477728666
p-value: 0.0
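
The chi-square statistic shows whether the severity distribution differs between the two groups, but not in which direction. Row-normalized shares make that visible. This is a minimal sketch on a made-up toy frame; the column names mirror the notebook's, and the values are illustrative only.

```python
import pandas as pd

# Toy data for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "Traffic_Signal": [0, 0, 0, 0, 1, 1, 1, 1],
    "Severity":       [2, 2, 3, 4, 2, 2, 2, 3],
})

# normalize="index" turns each row of counts into shares that sum to 1
shares = pd.crosstab(toy["Traffic_Signal"], toy["Severity"], normalize="index")

print(shares)
```

On the real data, the equivalent call would be `pd.crosstab(df['Traffic_Signal'], df['Severity'], normalize='index')`.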


In [16]:
#Interpretation

if p < 0.05:
    print("Reject H0: Traffic signals influence severity")
else:
    print("Fail to reject H0")

Reject H0: Traffic signals influence severity
