In [5]:
import numpy
import pandas as pd
import matplotlib.pyplot as plt

# Problem definition 
## Analyse data to find any patterns between accidents
## Analyse the frequency of accidents. 
#### - Do holidays lead to more accidents?
#### - Are bigger cities at the top of the list when it comes to the number of accidents that occurred?
## Create a heatmap to find the regions of France where accidents are more likely to happen

# *Features*
### **CARACTERISTICS :**

Num_Acc : Accident ID

jour : Day of the accident

mois : Month of the accident

an : Year of the accident

hrmn : Time of the accident in hour and minutes (hhmm)

lum : Lighting : lighting conditions in which the accident occurred

1 - Full day

2 - Twilight or dawn

3 - Night without public lighting

4 - Night with public lighting not lit

5 - Night with public lighting on

dep : Department : INSEE code (National Institute of Statistics and Economic Studies) of the department followed
by a 0 (201 Corse-du-Sud - 202 Haute-Corse)

com : Municipality: The commune number is a code given by INSEE. The code has 3 digits, right-aligned.

agg : Location :

1 - Outside built-up areas

2 - In built-up areas

int : Type of Intersection :

1 - Outside an intersection

2 - X intersection

3 - T intersection

4 - Y intersection

5 - Intersection with more than 4 branches

6 - Roundabout

7 - Square

8 - Level crossing

9 - Other intersection

atm : Atmospheric conditions:

1 - Normal

2 - Light rain

3 - Heavy rain

4 - Snow - hail

5 - Fog - smoke

6 - Strong wind - storm

7 - Dazzling weather

8 - Cloudy weather

9 - Other

col : Type of collision:

1 - Two vehicles - frontal

2 - Two vehicles - from the rear

3 - Two vehicles - by the side

4 - Three vehicles and more - in chain

5 - Three or more vehicles - multiple collisions

6 - Other collision

7 - Without collision

adr : Postal address: variable filled in for accidents occurring in built-up areas

gps : GPS coding: 1 character indicating the origin:

M = Métropole

A = Antilles (Martinique or Guadeloupe)

G = Guyane

R = Réunion

Y = Mayotte

Geographic coordinates in decimal degrees:

lat : Latitude

long : Longitude
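The coded columns described above can be turned into readable values with a small lookup table. The following is a minimal sketch on toy data, not the notebook's own method: `LUM_LABELS` and `decode_caracteristics` are hypothetical names, the labels are copied from the `lum` list above, and `hrmn` is assumed to be stored as an integer in hhmm form.

```python
import pandas as pd

# Labels copied from the `lum` code list above (hypothetical helper table).
LUM_LABELS = {
    1: "Full day",
    2: "Twilight or dawn",
    3: "Night without public lighting",
    4: "Night with public lighting not lit",
    5: "Night with public lighting on",
}


def decode_caracteristics(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a readable lighting label and hrmn split into hour and minute."""
    out = df.copy()
    # Codes that are not in the table become NaN instead of raising.
    out["lum_label"] = out["lum"].map(LUM_LABELS)
    # Assumption: hrmn is an integer such as 1445 for 14:45.
    out["hour"] = out["hrmn"] // 100
    out["minute"] = out["hrmn"] % 100
    return out


# Toy rows for illustration only.
sample = pd.DataFrame({"lum": [1, 2, 9], "hrmn": [1445, 1800, 30]})
decoded = decode_caracteristics(sample)
print(decoded)
```

The same pattern would apply to `atm`, `col` and `int`, each with its own lookup table.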


### **Places:**

Num_Acc : Accident ID

catr : Category of road:

1 - Highway

2 - National Road

3 - Departmental Road

4 - Communal Way

5 - Off public network

6 - Parking lot open to public traffic

9 - other

voie : Road Number

v1: Numeric index of the route number (example: 2 bis, 3 ter etc.)

v2: Alphanumeric letter index of the road

circ: Traffic regime:

1 - One way

2 - Bidirectional

3 - Separated carriageways

4 - With variable assignment channels

nbv: Total number of traffic lanes

vosp: Indicates the existence of a reserved lane, regardless of whether or not the accident occurs on that lane.

1 - Bike path

2 - Cycle lane

3 - Reserved lane

prof: Longitudinal profile, describing the gradient of the road at the accident site

1 - Flat

2 - Slope

3 - Hilltop

4 - Hill bottom

pr: Reference PR number (upstream marker number)

pr1: Distance in meters to the PR (relative to the upstream marker)

plan: Plan layout:

1 - Straight part

2 - Curve to the left

3 - Curve to the right

4 - S-bend

lartpc: Width of the central reservation (TPC), if there is one

larrout: Width of the roadway assigned to vehicle traffic, not including the emergency lanes,
TPCs and parking spaces

surf: surface condition

1 - normal

2 - wet

3 - puddles

4 - flooded

5 - snow

6 - mud

7 - icy

8 - grease - oil

9 - other

infra: Development - Infrastructure:

1 - Underground - tunnel

2 - Bridge - flyover

3 - Interchange or slip road

4 - Railway

5 - Developed junction

6 - Pedestrian area

7 - Toll zone

situ: Situation of the accident:

1 - On the road

2 - On emergency lane

3 - On the verge

4 - On the sidewalk

5 - On bike path

env1: school point: near a school

### **USERS:**

Num_Acc: Accident identifier.

num_veh: Identifier of the vehicle, repeated for each user occupying this vehicle (including pedestrians, who are
attached to the vehicles that hit them)

place: Seat occupied in the vehicle by the user at the time of the accident

catu: User category:

1 - Driver

2 - Passenger

3 - Pedestrian

4 - Pedestrian on rollerblades or scooter

grav: Severity of the accident: The injured users are classified into three categories of victims plus the uninjured

1 - Unscathed

2 - Killed

3 - Hospitalized wounded

4 - Light injury

sexe: Sex of the user

1 - Male

2 - Female

an_nais: Year of birth of the user

trajet: Reason for traveling at the time of the accident:

1 - Home - work

2 - Home - school

3 - Shopping

4 - Professional use

5 - Promenade - leisure

9 - Other

secu: Two characters:
the first indicates the existence of safety equipment

1 - Belt

2 - Helmet

3 - Children's device

4 - Reflective equipment

9 - Other

the second indicates whether the safety equipment was used

1 - Yes

2 - No

3 - Not determinable
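Because `secu` packs two codes into one value, it is usually easier to work with once split into its two digits. The sketch below is one way to do that, assuming the value is read as a number such as `11.0` (equipment 1, used 1); `SECU_EQUIPMENT`, `SECU_USED` and `split_secu` are hypothetical names, and the labels are copied from the lists above.

```python
import pandas as pd

# Labels copied from the two `secu` code lists above (hypothetical helper tables).
SECU_EQUIPMENT = {
    1: "Belt",
    2: "Helmet",
    3: "Children's device",
    4: "Reflective equipment",
    9: "Other",
}
SECU_USED = {1: "Yes", 2: "No", 3: "Not determinable"}


def split_secu(code):
    """Split a two-digit secu code into (equipment label, usage label)."""
    if pd.isna(code):
        return (None, None)
    code = int(code)
    # Tens digit: type of equipment; units digit: whether it was used.
    return (SECU_EQUIPMENT.get(code // 10), SECU_USED.get(code % 10))


# Toy values for illustration only.
sample = pd.Series([11.0, 21.0, 93.0, None])
parts = pd.DataFrame(
    [split_secu(c) for c in sample],
    columns=["secu_equipment", "secu_used"],
)
print(parts)
```

Codes that do not match either table come back as `None`, so unexpected values stay visible instead of being silently relabelled.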

locp: Location of the pedestrian:

On roadway:

1 - More than 50 m from the pedestrian crossing

2 - Less than 50 m from the pedestrian crossing

On pedestrian crossing:

3 - Without light signaling

4 - With light signaling

Various:

5 - On the sidewalk

6 - On the verge

7 - On refuge or emergency lane (BAU)

8 - On service road

actp: Action of the pedestrian:

Moving

0 - not specified or not applicable

1 - Same direction as the striking vehicle

2 - Opposite direction to the vehicle

Various

3 - Crossing

4 - Hidden

5 - Playing - running

6 - With animal

9 - Other

etatp: This variable is used to specify whether the injured pedestrian was alone or not

1 - Alone

2 - Accompanied

3 - In a group

### **VEHICLES:**

Num_Acc: Accident ID

num_veh: Identifier of the vehicle, repeated for each user occupying this vehicle (including pedestrians, who are
attached to the vehicles that hit them) - alphanumeric code

senc: Flow direction:

1 - PK or PR or increasing postal address number

2 - PK or PR or descending postal address number

catv: Category of vehicle:

01 - Bicycle

02 - Moped <50cm3

03 - Microcar (motorised quadricycle with bodywork) (formerly "microcar or motor tricycle")

04 - Not used since 2006 (registered scooter)

05 - Not used since 2006 (motorcycle)

06 - Not used since 2006 (side-car)

07 - VL only

08 - Category no longer used (VL + caravan)

09 - Category no longer used (VL + trailer)

10 - VU only 1,5T <= GVW <= 3,5T with or without trailer (formerly VU only 1,5T <= GVW <= 3,5T)

11 - Not used since 2006 (VU (10) + caravan)

12 - Not used since 2006 (VU (10) + trailer)

13 - PL only 3,5T

# **Loading datasets**


In [11]:

path_caracteristics = "D:/Development/Porfolio/Data-Science/France accidents Data/caracteristics.csv"
path_holidays = "D:/Development/Porfolio/Data-Science/France accidents Data/holidays.csv"
path_places = "D:/Development/Porfolio/Data-Science/France accidents Data/places.csv"
path_users = "D:/Development/Porfolio/Data-Science/France accidents Data/users.csv"
path_vehicles = "D:/Development/Porfolio/Data-Science/France accidents Data/vehicles.csv"

In [18]:
# Loading data files into separate dataframes
df_caracteristics = pd.read_csv(path_caracteristics)
df_holidays = pd.read_csv(path_holidays)
df_places = pd.read_csv(path_places)
df_users = pd.read_csv(path_users)
df_vehicles = pd.read_csv(path_vehicles)

In [19]:
# Showing top 5 rows of caracteristics dataframe
df_caracteristics.head()

Unnamed: 0,Num_Acc,an,mois,jour,hrmn,lum,agg,int,atm,col,com,adr,gps,lat,long,dep
0,201600000001,16,2,1,1445,1,2,1,8.0,3.0,5.0,"46, rue Sonneville",M,0.0,0.0,590
1,201600000002,16,3,16,1800,1,2,6,1.0,6.0,5.0,1a rue du cimetière,M,0.0,0.0,590
2,201600000003,16,7,13,1900,1,1,1,1.0,6.0,11.0,,M,0.0,0.0,590
3,201600000004,16,8,15,1930,2,2,1,7.0,3.0,477.0,52 rue victor hugo,M,0.0,0.0,590
4,201600000005,16,12,23,1100,1,2,3,1.0,3.0,11.0,rue Joliot curie,M,0.0,0.0,590
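The date of each accident is spread over four columns (`an`, `mois`, `jour`, `hrmn`), so a single timestamp column makes frequency analysis easier. This is a minimal sketch on rows copied from the output above; `accident_datetime` is a hypothetical helper, and it assumes `an` holds the last two digits of a year in the 2000s and `hrmn` is an integer in hhmm form.

```python
import pandas as pd


def accident_datetime(df: pd.DataFrame) -> pd.Series:
    """Assemble one timestamp per accident from an, mois, jour and hrmn."""
    parts = pd.DataFrame({
        # Assumption: `an` is a two-digit year in the 2000s (16 -> 2016).
        "year": 2000 + df["an"],
        "month": df["mois"],
        "day": df["jour"],
        "hour": df["hrmn"] // 100,
        "minute": df["hrmn"] % 100,
    })
    # Invalid combinations become NaT instead of raising.
    return pd.to_datetime(parts, errors="coerce")


# First two rows of the output above, reduced to the date columns.
sample = pd.DataFrame({
    "an": [16, 16],
    "mois": [2, 3],
    "jour": [1, 16],
    "hrmn": [1445, 1800],
})
sample["datetime"] = accident_datetime(sample)
print(sample)
```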


In [20]:
# Showing top 5 rows of holidays dataframe
df_holidays.head()

Unnamed: 0,ds,holiday
0,2005-01-01,New year
1,2005-03-28,Easter Monday
2,2005-05-01,Labour Day
3,2005-05-05,Ascension Thursday
4,2005-05-08,Victory in Europe Day
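The holidays table can be matched against accident dates to approach the question of whether holidays see more accidents. The sketch below uses made-up toy rows that only reuse the notebook's column names (`ds`, `an`, `mois`, `jour`, `Num_Acc`); the `day_type` column is a hypothetical addition, and `an` is again assumed to be a two-digit year in the 2000s.

```python
import pandas as pd

# Toy data, made up for illustration; only the column names follow the notebook.
holidays = pd.DataFrame({
    "ds": ["2016-01-01", "2016-03-28"],
    "holiday": ["New year", "Easter Monday"],
})
accidents = pd.DataFrame({
    "Num_Acc": [1, 2, 3, 4, 5],
    "an": [16, 16, 16, 16, 16],
    "mois": [1, 1, 3, 3, 3],
    "jour": [1, 1, 28, 16, 17],
})

# Build a calendar date per accident (assumption: 16 -> 2016).
accidents["date"] = pd.to_datetime(pd.DataFrame({
    "year": 2000 + accidents["an"],
    "month": accidents["mois"],
    "day": accidents["jour"],
}))

# Flag accidents that fall on a listed holiday.
holiday_dates = pd.to_datetime(holidays["ds"])
accidents["day_type"] = accidents["date"].isin(holiday_dates).map(
    {True: "holiday", False: "regular"}
)

# Average accidents per day, counting only days that appear in the data.
grouped = accidents.groupby("day_type")
accidents_per_day = grouped["Num_Acc"].count() / grouped["date"].nunique()
print(accidents_per_day)
```

Dividing by the number of distinct dates matters because there are far fewer holidays than regular days, so raw counts would not be comparable. Days with no accident at all are not represented here; a fuller comparison would divide by the calendar day counts instead.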


In [21]:
# Showing top 5 rows of places dataframe
df_places.head()

Unnamed: 0,Num_Acc,catr,voie,v1,v2,circ,nbv,pr,pr1,vosp,prof,plan,lartpc,larrout,surf,infra,situ,env1
0,201600000001,3.0,39,,,2.0,0.0,,,0.0,1.0,3.0,0.0,0.0,1.0,0.0,1.0,0.0
1,201600000002,3.0,39,,,1.0,0.0,,,0.0,1.0,2.0,0.0,58.0,1.0,0.0,1.0,0.0
2,201600000003,3.0,1,,,2.0,2.0,,,0.0,1.0,3.0,0.0,68.0,2.0,0.0,3.0,99.0
3,201600000004,4.0,0,,,2.0,0.0,,,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,99.0
4,201600000005,4.0,0,,,0.0,0.0,,,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,3.0


In [22]:
# Showing top 5 rows of users dataframe
df_users.head()

Unnamed: 0,Num_Acc,place,catu,grav,sexe,trajet,secu,locp,actp,etatp,an_nais,num_veh
0,201600000001,1.0,1,1,2,0.0,11.0,0.0,0.0,0.0,1983.0,B02
1,201600000001,1.0,1,3,1,9.0,21.0,0.0,0.0,0.0,2001.0,A01
2,201600000002,1.0,1,3,1,5.0,11.0,0.0,0.0,0.0,1960.0,A01
3,201600000002,2.0,2,3,1,0.0,11.0,0.0,0.0,0.0,2000.0,A01
4,201600000002,3.0,2,3,2,0.0,11.0,0.0,0.0,0.0,1962.0,A01
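Since the users table has one row per person, a per-accident severity summary needs an aggregation step. A minimal sketch, using the `Num_Acc` and `grav` values from the rows shown above; `GRAV_LABELS` is a hypothetical lookup table with labels copied from the `grav` list in the features section.

```python
import pandas as pd

# Labels copied from the `grav` code list (hypothetical helper table).
GRAV_LABELS = {
    1: "Unscathed",
    2: "Killed",
    3: "Hospitalized wounded",
    4: "Light injury",
}

# Num_Acc and grav values from the five rows shown above.
users = pd.DataFrame({
    "Num_Acc": [201600000001, 201600000001, 201600000002,
                201600000002, 201600000002],
    "grav": [1, 3, 3, 3, 3],
})

users["grav_label"] = users["grav"].map(GRAV_LABELS)

# One row per accident, one column per severity label present in the data.
severity_counts = pd.crosstab(users["Num_Acc"], users["grav_label"])
print(severity_counts)
```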


In [23]:
# Top 5 rows of vehicles dataframe 
df_vehicles.head()

Unnamed: 0,Num_Acc,senc,catv,occutc,obs,obsm,choc,manv,num_veh
0,201600000001,0.0,7,0,0.0,0.0,1.0,1.0,B02
1,201600000001,0.0,2,0,0.0,0.0,7.0,15.0,A01
2,201600000002,0.0,7,0,6.0,0.0,1.0,1.0,A01
3,201600000003,0.0,7,0,0.0,1.0,6.0,1.0,A01
4,201600000004,0.0,32,0,0.0,0.0,1.0,1.0,B02
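Users and vehicles share two columns, `Num_Acc` and `num_veh`, which together identify one vehicle within one accident. The following sketch joins the two tables on that pair, using a few values from the outputs above; treating the pair as a unique key on the vehicles side is an assumption, which `validate="many_to_one"` checks at merge time.

```python
import pandas as pd

# A few columns and rows taken from the users and vehicles outputs above.
users = pd.DataFrame({
    "Num_Acc": [201600000001, 201600000001, 201600000002],
    "num_veh": ["B02", "A01", "A01"],
    "catu": [1, 1, 1],
    "grav": [1, 3, 3],
})
vehicles = pd.DataFrame({
    "Num_Acc": [201600000001, 201600000001, 201600000002],
    "num_veh": ["B02", "A01", "A01"],
    "catv": [7, 2, 7],
})

# Attach the vehicle category to each user; raises MergeError if a
# (Num_Acc, num_veh) pair appears more than once in `vehicles`.
users_vehicles = users.merge(
    vehicles,
    on=["Num_Acc", "num_veh"],
    how="left",
    validate="many_to_one",
)
print(users_vehicles)
```

Joining on `Num_Acc` alone would pair every user with every vehicle in the same accident, so the second key is what keeps the row count equal to the number of users.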


# **Data exploration and data cleaning (deleting missing data entries from each dataframe)** 