# Safety Prediction

> **Location:** Workshop  
> **Duration:** 2019 - 2024  
> **Purpose:** This project is designed to alert the superior to monitor specific technicians more closely, based on each technician's risk percentage and past incidents.

## Data Preparation
---
### Essential Libraries

Let us begin by importing the essential Python libraries.

> **NumPy**: library for numerical computation in Python  
> **Pandas**: library for data acquisition and preparation  

In [1]:
```python
# Basic Libraries
import numpy as np
import pandas as pd
```

In [2]:
```python
wsdp = pd.read_csv('workshop_incident.csv')
wsdp.head()
```

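|   | S/N | Unit | Date | Member | Age | Year of Tech | Concurrent Tasks | Experience | Reliability | Difficulty Level | Sleep | Temperature | Humidity | Time of Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 2/1/2019 | Koh | 41 | 2 | 1 | 5 | 1 | 1 | 7 | 27 | 83% | 1500 |
| 1 | 2 | 1 | 1/8/2019 | Lee | 21 | 1 | 1 | 1 | 1 | 4 | 7 | 27 | 83% | 930 |
| 2 | 3 | 1 | 31/10/2019 | Yong | 20 | 1 | 1 | 2 | 1 | 4 | 7 | 27 | 83% | 1655 |
| 3 | 4 | 1 | 9/3/2020 | Victor | 21 | 1 | 1 | 2 | 2 | 2 | 7 | 30 | 78% | 1100 |
| 4 | 5 | 1 | 11/6/2020 | Loh | 21 | 1 | 1 | 2 | 1 | 1 | 7 | 30 | 78% | 1140 |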


### Understand the CSV

>**S/N** - serial number of the incident  
>**Unit** - unit where the incident occurred  
>**Date** - date of the incident  
>**Member** - name of the serviceman  
>**Service States** - NSF/Regular/Contractor/NSMen  
>**Age** - age of the technician  
>**Year of Tech** - years of experience as a technician  
>**Concurrent Tasks** - whether the technician was performing another task at the same time  
>**Type of equipment** - type of vehicle  
>**Subsystem** - part of the vehicle involved  
>**Experience** - whether the technician had done the task before  
>**Sleep** - hours of sleep the technician had before the task  
>**Temperature** - temperature during the operation (in degrees Celsius)  
>**Reliability** - average score given by peers  
>**Humidity** - humidity level (in percent)  
>**Time of Day** - time the incident happened (24-hour HHMM, e.g. 930 = 09:30)  
---
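
The list above describes some columns (**Service States**, **Type of equipment**, **Subsystem**) that do not appear in the `wsdp.head()` output, while **Difficulty Level** appears in the file but is not described. A minimal sketch for catching this kind of drift between documentation and data: both lists are typed in by hand here, whereas in the notebook the second one would simply be `list(wsdp.columns)` right after `read_csv`.

```python
# Column names described in "Understand the CSV"
documented = [
    "S/N", "Unit", "Date", "Member", "Service States", "Age", "Year of Tech",
    "Concurrent Tasks", "Type of equipment", "Subsystem", "Experience",
    "Sleep", "Temperature", "Reliability", "Humidity", "Time of Day",
]

# Header of workshop_incident.csv, copied from the wsdp.head() output
actual = [
    "S/N", "Unit", "Date", "Member", "Age", "Year of Tech", "Concurrent Tasks",
    "Experience", "Reliability", "Difficulty Level", "Sleep", "Temperature",
    "Humidity", "Time of Day",
]

# Documented but absent from the file, and present but undocumented
missing_from_csv = [c for c in documented if c not in actual]
undocumented = [c for c in actual if c not in documented]

print("Documented but not in CSV:", missing_from_csv)
print("In CSV but not documented:", undocumented)
```

Keeping the lists in order-preserving comprehensions (rather than sets) makes the printed result follow the documentation order, which is easier to compare by eye.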



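### Remove unused columns

> **S/N** - a running serial number with no predictive value  
> **Concurrent Tasks** - we assume a technician performs only a single task at any time before proceeding to the next  
> **Sleep** - we assume every technician had the default minimum of 7 hours of sleep  
> **Humidity** - flagged as uninformative on the assumption that all values are the same, although the sample rows show both 83% and 78%. **Check with superior**

Only **S/N** and **Concurrent Tasks** are dropped in the cell below; **Sleep** and **Humidity** remain in the dataset for now.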
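
Whether a column really holds a single value can be checked rather than assumed: `DataFrame.nunique()` counts the distinct values per column, and a count of 1 means the column cannot help distinguish one incident from another. The sketch below uses only the five sample rows from the first `wsdp.head()` output, so it is illustrative; on the full 100-row data the same check would be `wsdp.nunique()`.

```python
import pandas as pd

# The five sample rows shown by the first wsdp.head() (not the full file)
sample = pd.DataFrame({
    "S/N": [1, 2, 3, 4, 5],
    "Concurrent Tasks": [1, 1, 1, 1, 1],
    "Sleep": [7, 7, 7, 7, 7],
    "Humidity": ["83%", "83%", "83%", "78%", "78%"],
    "Temperature": [27, 27, 27, 30, 30],
})

# Count distinct values per column; a column with only one value is constant
n_unique = sample.nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()

print(n_unique)
print("Constant columns:", constant_cols)
```

On these rows **Concurrent Tasks** and **Sleep** are constant, but **Humidity** has two distinct values, which is worth raising when checking with the superior.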

In [3]:
```python
wsdp = wsdp.drop(['S/N', 'Concurrent Tasks'], axis=1)
```

In [4]:
```python
wsdp.head()
```

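|   | Unit | Date | Member | Age | Year of Tech | Experience | Reliability | Difficulty Level | Sleep | Temperature | Humidity | Time of Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2/1/2019 | Koh | 41 | 2 | 5 | 1 | 1 | 7 | 27 | 83% | 1500 |
| 1 | 1 | 1/8/2019 | Lee | 21 | 1 | 1 | 1 | 4 | 7 | 27 | 83% | 930 |
| 2 | 1 | 31/10/2019 | Yong | 20 | 1 | 2 | 1 | 4 | 7 | 27 | 83% | 1655 |
| 3 | 1 | 9/3/2020 | Victor | 21 | 1 | 2 | 2 | 2 | 7 | 30 | 78% | 1100 |
| 4 | 1 | 11/6/2020 | Loh | 21 | 1 | 2 | 1 | 1 | 7 | 30 | 78% | 1140 |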


In [5]:
```python
wsdp.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Unit              100 non-null    int64 
 1   Date              100 non-null    object
 2   Member            100 non-null    object
 3   Age               100 non-null    int64 
 4   Year of Tech      100 non-null    int64 
 5   Experience        100 non-null    int64 
 6   Reliability       100 non-null    int64 
 7   Difficulty Level  100 non-null    int64 
 8   Sleep             100 non-null    int64 
 9   Temperature       100 non-null    int64 
 10  Humidity          100 non-null    object
 11  Time of Day       100 non-null    int64 
dtypes: int64(9), object(3)
memory usage: 9.5+ KB
```


## Data Cleaning

In [6]:
```python
## Humidity column
# Cast to string, strip the '%' sign and convert to float (e.g. '83%' -> 83.0)
wsdp['Humidity'] = wsdp['Humidity'].astype(str).str.replace('%', '', regex=False).astype(float)

## Date column
# Dates are stored day-first (D/M/YYYY), e.g. '2/1/2019' is 2 January 2019
wsdp['Date'] = pd.to_datetime(wsdp['Date'], dayfirst=True)

## Time of Day column
# Values are HHMM integers (e.g. 930 = 09:30); parse them as times on a placeholder date
wsdp['Time of Day'] = pd.to_datetime(wsdp['Time of Day'], format='%H%M', errors='coerce')
# 'Time of Day' is already datetime64, so the hour can be read directly
wsdp['Hour'] = wsdp['Time of Day'].dt.hour

wsdp.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Unit              100 non-null    int64         
 1   Date              100 non-null    datetime64[ns]
 2   Member            100 non-null    object        
 3   Age               100 non-null    int64         
 4   Year of Tech      100 non-null    int64         
 5   Experience        100 non-null    int64         
 6   Reliability       100 non-null    int64         
 7   Difficulty Level  100 non-null    int64         
 8   Sleep             100 non-null    int64         
 9   Temperature       100 non-null    int64         
 10  Humidity          100 non-null    float64       
 11  Time of Day       100 non-null    datetime64[ns]
 12  Hour              100 non-null    int32         
dtypes: datetime64[ns](2), float64(1), int32(1), int64(8), object(1)
memory usage: 9.9
```
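
The cleaning cell relies on `format='%H%M'` accepting unpadded values such as `930`, and `errors='coerce'` silently turns any time that fails to parse into `NaT`. The `info()` output above reports 100 non-null values, so nothing was coerced in this dataset, but a slightly more defensive variant could zero-pad the values and list anything that failed. A sketch, where `1500`, `930` and `1655` come from the data, while `5` and `2460` are hypothetical values added only to show the padding and the coercion; the `str.zfill(4)` padding is an optional choice, not what the notebook does.

```python
import pandas as pd

# HHMM integers in the style of the 'Time of Day' column (5 and 2460 are hypothetical)
raw = pd.Series([1500, 930, 1655, 5, 2460])

# Zero-pad to four digits so 930 -> '0930' and 5 -> '0005'
padded = raw.astype(str).str.zfill(4)
parsed = pd.to_datetime(padded, format="%H%M", errors="coerce")

# errors='coerce' turns invalid times such as 2460 into NaT instead of raising
invalid = raw[parsed.isna()]

print(parsed.dt.strftime("%H:%M").tolist())
print("Unparseable values:", invalid.tolist())
```

Printing or asserting on `invalid` right after the conversion makes a silent coercion visible before it turns into a missing `Hour`.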

In [7]:
```python
wsdp.head()
```

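|   | Unit | Date | Member | Age | Year of Tech | Experience | Reliability | Difficulty Level | Sleep | Temperature | Humidity | Time of Day | Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2019-01-02 | Koh | 41 | 2 | 5 | 1 | 1 | 7 | 27 | 83.0 | 1900-01-01 15:00:00 | 15 |
| 1 | 1 | 2019-08-01 | Lee | 21 | 1 | 1 | 1 | 4 | 7 | 27 | 83.0 | 1900-01-01 09:30:00 | 9 |
| 2 | 1 | 2019-10-31 | Yong | 20 | 1 | 2 | 1 | 4 | 7 | 27 | 83.0 | 1900-01-01 16:55:00 | 16 |
| 3 | 1 | 2020-03-09 | Victor | 21 | 1 | 2 | 2 | 2 | 7 | 30 | 78.0 | 1900-01-01 11:00:00 | 11 |
| 4 | 1 | 2020-06-11 | Loh | 21 | 1 | 2 | 1 | 1 | 7 | 30 | 78.0 | 1900-01-01 11:40:00 | 11 |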
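
`dayfirst=True` matters because many dates in the raw file are ambiguous: `2/1/2019` is 2 January 2019 when read day-first but 1 February 2019 when read month-first. A short sketch using dates from the first `head()` output; it also shows that an explicit `format='%d/%m/%Y'` gives the same result, with the difference that pandas treats `dayfirst` as a parsing hint while an explicit format is enforced.

```python
import pandas as pd

# Raw dates as they appear in workshop_incident.csv (D/M/YYYY)
raw_dates = pd.Series(["2/1/2019", "1/8/2019", "31/10/2019"])

day_first = pd.to_datetime(raw_dates, dayfirst=True)
explicit = pd.to_datetime(raw_dates, format="%d/%m/%Y")

# Without dayfirst, a single ambiguous date is read month-first
month_first = pd.to_datetime("2/1/2019")

print(day_first.dt.strftime("%Y-%m-%d").tolist())
# ['2019-01-02', '2019-08-01', '2019-10-31']
print(month_first)
```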


## Understand the cleaned dataset

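- **Unit**: integer (1, 3, 6, 9)
- **Date**: datetime (YYYY-MM-DD)
- **Member**: string (name of the serviceman)
- **Age**: integer
- **Year of Tech**: integer
- **Experience**: integer (documented as yes (1) / no (0), although the sample rows contain values up to 5)
- **Reliability**: integer (average score by peers)
- **Difficulty Level**: integer
- **Sleep**: integer (hours)
- **Temperature**: integer (degrees Celsius)
- **Humidity**: float (percentage, %)
- **Time of Day**: datetime (HH:MM:SS on the placeholder date 1900-01-01)
- **Hour**: integer (0-23, extracted from Time of Day)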
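
The schema above can be turned into a quick check that fails loudly if a cleaning step is skipped or changes a type. A sketch using two cleaned rows copied from the `head()` output; the `expected_kind` mapping follows the dtypes reported by `wsdp.info()` after cleaning, and comparing NumPy `dtype.kind` codes (rather than exact dtypes such as `int64` vs `int32`) is an assumption made to keep the check tolerant of platform differences.

```python
import pandas as pd

# Two cleaned rows copied from the wsdp.head() output (subset of columns)
cleaned = pd.DataFrame({
    "Unit": [3, 1],
    "Date": pd.to_datetime(["2019-01-02", "2019-08-01"]),
    "Member": ["Koh", "Lee"],
    "Age": [41, 21],
    "Temperature": [27, 27],
    "Humidity": [83.0, 83.0],
    "Time of Day": pd.to_datetime(["1900-01-01 15:00:00", "1900-01-01 09:30:00"]),
    "Hour": [15, 9],
})

# Expected dtype kind per column: 'i' integer, 'f' float, 'M' datetime64, 'O' object
expected_kind = {
    "Unit": "i", "Date": "M", "Member": "O", "Age": "i", "Temperature": "i",
    "Humidity": "f", "Time of Day": "M", "Hour": "i",
}

# Collect every column whose actual kind differs from the expected one
mismatches = {
    col: cleaned[col].dtype.kind
    for col, kind in expected_kind.items()
    if cleaned[col].dtype.kind != kind
}
units_ok = cleaned["Unit"].isin([1, 3, 6, 9]).all()
hours_ok = cleaned["Hour"].between(0, 23).all()

print("Dtype mismatches:", mismatches)
print("Units valid:", units_ok, "| Hours valid:", hours_ok)
```

In the notebook the same dictionary comprehension could be run against `wsdp` itself just before the export.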

## Export the cleaned dataset to a new CSV file

In [8]:
```python
wsdp.to_csv('workshop_incident_cleaned.csv', index=False)
```
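
CSV carries no type information, so the datetime columns written by `to_csv` come back as plain text unless `read_csv` is told to parse them. A minimal round-trip sketch, using an in-memory `io.StringIO` buffer in place of `workshop_incident_cleaned.csv` and only a few of the cleaned columns:

```python
import io

import pandas as pd

# A few cleaned columns copied from the wsdp.head() output
cleaned = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-02", "2019-08-01"]),
    "Member": ["Koh", "Lee"],
    "Humidity": [83.0, 83.0],
    "Time of Day": pd.to_datetime(["1900-01-01 15:00:00", "1900-01-01 09:30:00"]),
    "Hour": [15, 9],
})

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# Plain read: the date columns come back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read with parse_dates: the date columns come back as datetimes
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Date", "Time of Day"])

print(plain["Date"].dtype)
print(restored["Date"].dtype)
```

Whoever loads `workshop_incident_cleaned.csv` later would pass the same `parse_dates` list to recover the types set during cleaning.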