# WorkSafe Fatalities Analysis

An analysis of the detailed [WorkSafe fatalities data](https://worksafe.govt.nz/data-and-research/ws-data/fatalities/) to determine the number of vehicle and machinery related work fatalities.

## Import Libraries

In [1]:
import sys

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

print(f'Python version: {sys.version.split()[0]}')
print(f'Numpy version: {np.__version__}')
print(f'Pandas version: {pd.__version__}')
print(f'Matplotlib version: {matplotlib.__version__}')

Python version: 3.7.1
Numpy version: 1.15.4
Pandas version: 0.23.4
Matplotlib version: 3.0.2


## 1. Load WorkSafe Data

In [2]:
DATA_PATH = '../data/worksafe_fatalities_detailed.csv'
df = pd.read_csv(DATA_PATH)

df.head()

Unnamed: 0,Year,Month,Month_Year,IndustryLvl1,IndustryLvl2,IndustryLvl3,IndustryLvl4,AFF2017,Region,Local_Government_District,Age,Age_Group,FocusArea1,FocusArea2,QuadBike,No_of_Fatalities
0,2010,1,01JAN2010,"Agriculture, Forestry and Fishing",Forestry and Logging,Forestry and Logging,Logging,Forestry and Logging,Marlborough,Marlborough,34.0,25-34,Forestry,Tree Felling,,1
1,2010,1,01JAN2010,"Agriculture, Forestry and Fishing",Forestry and Logging,Forestry and Logging,Logging,Forestry and Logging,Waikato,South Waikato,37.0,35-44,Forestry,Vehicles & Machinery,,1
2,2010,1,01JAN2010,"Transport, Postal and Warehousing",Road Transport,Road Freight Transport,Road Freight Transport,"Transport, Postal and Warehousing",Auckland,Manukau,24.0,15-24,Not a focus area,Vehicles & Machinery,,1
3,2010,2,01FEB2010,Arts and Recreation Services,Sport and Recreation Activities,Amusement and Other Recreation Activities,Amusement and Other Recreation Activities n.e.c.,Arts and Recreation Services,Manawatu-Whanganui,Whanganui,36.0,35-44,Not a focus area,Bodies of Water,,1
4,2010,2,01FEB2010,Construction,Construction Services,Land Development and Site Preparation Services,Site Preparation Services,Construction,Auckland,Waitakere,36.0,35-44,Construction,Vehicles & Machinery,,1
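The `Unnamed: 0` column in the preview looks like a row index that was written out when the CSV was saved. A minimal sketch, assuming that first column holds nothing but that index, reads it back as the index with `index_col=0`. The two-row `sample_csv` below is a trimmed stand-in for the real file, not actual WorkSafe data.

```python
import io

import pandas as pd

# Two-row stand-in for the real file (columns trimmed for brevity).
sample_csv = io.StringIO(
    ",Year,FocusArea2,No_of_Fatalities\n"
    "0,2010,Tree Felling,1\n"
    "1,2010,Vehicles & Machinery,1\n"
)

# index_col=0 treats the first column as the row index instead of data,
# so no 'Unnamed: 0' column appears in the resulting frame.
sample_df = pd.read_csv(sample_csv, index_col=0)

print(list(sample_df.columns))
```

In the notebook the equivalent call would be `pd.read_csv(DATA_PATH, index_col=0)`.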


## 2. Analyse the Number of Vehicle and Machinery Related Fatalities

In [6]:
# How many deaths per year are recorded?
df['Year'].value_counts()

2010    73
2013    57
2017    50
2016    50
2011    49
2012    47
2015    45
2014    43
2018    42
2019    10
Name: Year, dtype: int64
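`value_counts()` counts rows and orders them by frequency rather than by year. Every row in the preview has `No_of_Fatalities` equal to 1, but if any record covers more than one death (an assumption that is not verified here), summing that column gives the number of deaths rather than the number of records. The sketch below does that and returns the years in chronological order. The helper name `fatalities_per_year` and the `demo` rows are illustrative only.

```python
import pandas as pd


def fatalities_per_year(frame: pd.DataFrame) -> pd.Series:
    """Sum No_of_Fatalities for each year, in chronological order."""
    return frame.groupby('Year')['No_of_Fatalities'].sum().sort_index()


# Illustrative rows only; not taken from the WorkSafe file.
demo = pd.DataFrame({
    'Year': [2011, 2010, 2010],
    'No_of_Fatalities': [1, 1, 2],
})
yearly = fatalities_per_year(demo)
print(yearly)
```

If every record holds exactly one fatality, `fatalities_per_year(df)` should match `df['Year'].value_counts().sort_index()`.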

In [7]:
# How many records fall into each FocusArea2 category?
df['FocusArea2'].value_counts()

Vehicles & Machinery                      249
Tree Felling                               32
Fall from height                           28
Hazardous Substances                       25
Sudden Death                               25
Falling/Moving Object                      21
Bodies of Water                            16
Animal                                      9
Other                                       8
Breaking Out                                8
Energy Safety                               8
SCUBA Diving, Snorkelling, Free diving      5
Excavation                                  4
Snow-based activities                       4
Slips and trips                             4
Fire, heat, explosion (non-HSNO)            3
Fire, burns, explosion                      3
Wood Processing                             2
Log Hauling                                 2
Machine Guarding                            2
High Wire                                   2
Mountain Climbing/Mountaineering  

The category names are consistent: there are no misspelt or variant spellings of the same category (e.g. 'Vehicles & Machinery' vs 'Vehicles and Machinery').
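A visual scan of the counts can be backed by a quick programmatic check. The sketch below groups names that differ only by letter case, spacing, or `&` versus `and`; it will not catch genuine typos such as a dropped letter. The helper `find_name_variants` is a hypothetical name introduced for illustration, and the `example` list deliberately contains a variant.

```python
import re
from collections import defaultdict


def find_name_variants(names):
    """Group names that differ only by case, spacing, or '&' vs 'and'."""
    groups = defaultdict(set)
    for name in names:
        # Normalise: lower-case, spell out '&', collapse runs of whitespace.
        key = name.lower().replace('&', ' and ')
        key = re.sub(r'\s+', ' ', key).strip()
        groups[key].add(name)
    # Keep only keys that map to more than one distinct spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Illustrative input that contains a deliberate variant.
example = ['Vehicles & Machinery', 'Vehicles and Machinery', 'Tree Felling']
variants = find_name_variants(example)
print(variants)
```

In the notebook, `find_name_variants(df['FocusArea2'].dropna().unique())` returning an empty dict would support the observation above.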

In [8]:
# How do the results change when 2019 (only 10 records) is excluded?
df[df['Year'] != 2019]['FocusArea2'].value_counts()

Vehicles & Machinery                      243
Tree Felling                               31
Fall from height                           28
Hazardous Substances                       25
Sudden Death                               24
Falling/Moving Object                      20
Bodies of Water                            16
Animal                                      9
Other                                       8
Energy Safety                               8
Breaking Out                                7
SCUBA Diving, Snorkelling, Free diving      5
Excavation                                  4
Snow-based activities                       4
Slips and trips                             4
Fire, heat, explosion (non-HSNO)            3
Fire, burns, explosion                      3
Wood Processing                             2
Log Hauling                                 2
Machine Guarding                            2
High Wire                                   2
Mountain Climbing/Mountaineering  

In [11]:
# How many workplace-related deaths were recorded between 2010 and 2018?
df[df['Year'] != 2019].shape[0]

456

In [15]:
# What proportion of workplace deaths between 2010 and 2018 are vehicle and machinery related?
df_2010_2018 = df[df['Year'] != 2019]  # filter once and reuse below
vehicle_machinery_deaths = df_2010_2018['FocusArea2'].value_counts()['Vehicles & Machinery']
total_deaths = df_2010_2018.shape[0]

vehicle_machinery_deaths / total_deaths

0.5328947368421053
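The result is 243 / 456, or roughly 53% of recorded deaths between 2010 and 2018. As a sketch of how the same calculation could be reused for other categories or year ranges, the hypothetical helper `focus_area_share` below takes the mean of a boolean mask. Like the cell above, it keeps rows with a missing `FocusArea2` in the denominator. The `demo` rows are illustrative only.

```python
import pandas as pd


def focus_area_share(frame, area, exclude_years=()):
    """Return the fraction of rows whose FocusArea2 equals `area`."""
    kept = frame[~frame['Year'].isin(list(exclude_years))]
    # The mean of a boolean Series is the proportion of True values.
    return (kept['FocusArea2'] == area).mean()


# Illustrative rows only; not taken from the WorkSafe file.
demo = pd.DataFrame({
    'Year': [2018, 2018, 2018, 2018, 2019],
    'FocusArea2': ['Vehicles & Machinery', 'Tree Felling',
                   'Vehicles & Machinery', 'Animal', 'Animal'],
})
share = focus_area_share(demo, 'Vehicles & Machinery', exclude_years=[2019])
print(share)  # 2 of the 4 remaining rows
```

In the notebook, `focus_area_share(df, 'Vehicles & Machinery', exclude_years=[2019])` should reproduce the figure above.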