# Project - Programming for Data Analysis
***


### References
***
**Road Safety Authority (RSA)**       
    - www.rsa.ie/en/RSA/Road-Safety/Our-Research/Deaths-injuries-on-Irish-roads  
    - www.rsa.ie/Documents 
**Irish Times** 
    - https://www.irishtimes.com/news/environment/crash-report
**Technical References**       
    - http://pandas.pydata.org/pandas-docs/stable/
    - https://docs.scipy.org/doc/numpy/reference/routines.random.html
    - https://www.bogotobogo.com/python/python_fncs_map_filter_reduce.php
    - https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/
    - http://effbot.org/zone/python-list.htm


***

**Real Scenario** <br> <br>
The summary below is based on the road accident statistics prepared by the Road Safety Authority for the year 2016.

    - There were 175 fatal collisions on Irish roads, resulting in 187 fatalities
    - This was 13% more collisions and 15% more deaths than in the previous year (2015)
    - The highest number of fatalities occurred in counties Dublin and Cork
    - The age group with the most fatalities was "66 and above"
    - The road user type with the most fatalities was "Driver"
    - The day of the week with the most fatalities was Sunday
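
As a rough cross-check of the percentages above, the 2015 baseline can be back-calculated from the 2016 figures. This is only a sketch: it assumes the 13% and 15% increases are measured relative to the 2015 totals, and because the percentages are rounded, the implied counts are approximate. The function name `implied_baseline` is illustrative and not part of the project.

```python
def implied_baseline(current: float, pct_increase: float) -> float:
    """Return the previous-year value implied by a percentage increase."""
    return current / (1 + pct_increase / 100)

# 2016 figures quoted in the summary above
collisions_2015 = implied_baseline(175, 13)
fatalities_2015 = implied_baseline(187, 15)

print(f"Implied 2015 fatal collisions: about {collisions_2015:.0f}")
print(f"Implied 2015 fatalities: about {fatalities_2015:.0f}")
```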

***
**Project** <br> <br>
This project is inspired by the real-world scenario above. The objectives of the project are listed below.

    - Generate a data set of 200 records using the Python random sampling functions
    - Each record contains 6 variables 
    
               - Irish county where the accident took place
               - Age group of the Driver [Traditionalists, Baby Boomers, Gen-X, Gen-Y, Gen-Z]
               - Type of the Vehicle [Car, Van, Bus, Lorry, bi-cycle, Jeep]
               - Road Type [Two-way single carriageway, One-way single carriageway, Dual Carriageway]
               - Weather on the particular day [Sunny, Rainy, Snow, Windy, Cloudy]
               - Number of accidents
        
    - Investigate the types of variables involved, their likely distributions, and their relationships with each other.
    - Summarise the findings.
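
One possible way to approach the "distributions" and "relationships" objectives is sketched below. It is not the project's own analysis: the 200-row sample is drawn with equal probabilities purely for illustration, the seed is arbitrary, and only two of the variables are included. `value_counts` gives the share of each category, and `pd.crosstab` gives a contingency table for a pair of variables.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)

road_types = ['Two-way single carriageway', 'One-way single carriageway', 'Dual Carriageway']
weather = ['Sunny', 'Rainy', 'Snow', 'Windy', 'Cloudy']

# Hypothetical 200-row sample with two of the categorical variables
sample = pd.DataFrame({
    'Road Type': rng.choice(road_types, size=200),
    'Weather': rng.choice(weather, size=200),
})

# Distribution of a single variable: share of rows per category
weather_share = sample['Weather'].value_counts(normalize=True)

# Relationship between two variables: contingency table of counts
road_by_weather = pd.crosstab(sample['Road Type'], sample['Weather'])

print(weather_share)
print(road_by_weather)
```

Because the two columns are sampled independently here, the contingency table should show no systematic pattern; any dependence between variables would have to be built into the sampling step.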
    
***

In [71]:
#Import Pandas library
import pandas as pd

# Variable 1 - Counties
# The Irish counties are stored in a JSON file
# Create a dataframe for the Irish counties
url = "https://raw.githubusercontent.com/SomanathanSubramaniyan/PDA-Project/master/Counties.json"
df_counties = pd.read_json(url, orient='columns')
df_counties.head()  # preview the first rows (the original slice [2:2] selects no rows)

# Variable 2 - Age group of the Driver
# Create a list for the age groups
AgeGroup = ['Traditionalists', 'Baby Boomers', 'Gen-X', 'Gen-Y', 'Gen-Z']

# Variable 3 - Type of the Vehicle
# Create a list for the different types of vehicles
VehicleType = ['Car', 'Van', 'Bus', 'Lorry', 'bi-cycle', 'Jeep']

# Variable 4 - Road type
# Create a list for the different road types
RoadType = ['Two-way single carriageway', 'One-way single carriageway', 'Dual Carriageway']

# Variable 5 - Weather
# Create a list for the different weather scenarios
Weather = ['Sunny', 'Rainy', 'Snow', 'Windy', 'Cloudy']

In [204]:
# This section of the code creates a data set of 200 records
# Create a dataframe for the variables County, Age Group, Vehicle Type, Road Type, Weather and Number of Accidents
# Use for loops to populate the 200 records
import numpy as np

### Variable 1  -- County ###
# Use a UNIFORM DISTRIBUTION to populate the County column in the dataframe
# this ensures all the counties are equally likely to appear in the data set.
# With 200 draws, nearly all 32 counties usually appear in each execution

dataset = pd.DataFrame(columns=['County','Age Group','Vechicle Type','Road Type', 'Weather','No of Accidents'])

for x in range(200):
    # randint's upper bound is exclusive, so each index 0..31 is equally likely
    # (rounding uniform(0, 31) would give indices 0 and 31 only half the weight)
    icounty = np.random.randint(0, 32)
    dataset.loc[x,'County'] = df_counties.at[icounty,0]
    
### Variable 2  -- Age Group of the Driver ###
# Use a NORMAL DISTRIBUTION to populate the Age Group column in the dataframe
# The intent is for most of the data set to fall in "Baby Boomers" or "Gen-X";
# note that a mean of 3 centres the draws on AgeGroup[3] ("Gen-Y"), so a lower mean would be needed for that

for x in range(200):
    # draw a single value (not an array of 200), round it, and clip it to a valid index 0..4;
    # out-of-range draws are moved to the first or last group
    iAG = int(np.clip(round(np.random.normal(3, 2)), 0, len(AgeGroup) - 1))
    dataset.loc[x,'Age Group'] = AgeGroup[iAG]
    



array(['Wexford', 'Carlow', 'Louth', 'Cork', 'Down', 'Sligo', 'Kildare',
       'Clare', 'Mayo', 'Longford', 'Tipperary', 'Galway', 'Meath',
       'Offaly', 'Waterford', 'Donegal', 'Cavan', 'Westmeath', 'Monaghan',
       'Leitrim', 'Laois', 'Fermanagh', 'Limerick', 'Tyrone', 'Kilkenny',
       'Dublin', 'Armagh', 'Antrim', 'Kerry', 'Roscommon', 'Derry',
       'Wicklow'], dtype=object)
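
The two sampling steps in the cell above (a uniform draw for the county index, a normal draw for the age group index) can also be written without loops. The sketch below is one possible approach, not the project's own code: it assumes NumPy's `default_rng` generator, an arbitrary seed, and placeholder county labels in place of the JSON file. It also checks empirically that rounding `uniform(0, 31)` gives the end indices 0 and 31 roughly half the weight of the other indices, which `integers(0, 32)` avoids. Clipping the normal draws keeps every index valid, but it moves out-of-range draws to the first or last group and so inflates those groups.

```python
import numpy as np

rng = np.random.default_rng(2016)  # arbitrary seed, for reproducibility only
n = 200

counties = [f"County {i}" for i in range(32)]  # placeholder labels (assumption)
age_groups = ['Traditionalists', 'Baby Boomers', 'Gen-X', 'Gen-Y', 'Gen-Z']

# Uniform: every index 0..31 is equally likely (upper bound is exclusive)
county_idx = rng.integers(0, 32, size=n)
county_col = [counties[i] for i in county_idx]

# Normal: draw, round to the nearest integer, then clip to the valid indices 0..4
raw = rng.normal(loc=3, scale=2, size=n)
age_idx = np.clip(np.rint(raw), 0, len(age_groups) - 1).astype(int)
age_col = [age_groups[i] for i in age_idx]

# Endpoint bias of round(uniform(0, 31)): index 0 only covers [0, 0.5),
# while index 1 covers [0.5, 1.5), so index 0 is drawn about half as often
big = 320_000
rounded = np.rint(rng.uniform(0, 31, size=big)).astype(int)
counts = np.bincount(rounded, minlength=32)
ratio = counts[0] / counts[1]
print(f"count(index 0) / count(index 1) = {ratio:.2f}")
```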

In [208]:
import numpy as np

np.random.normal(3,1,200)

array([ 5.30524620e+00,  2.92091125e+00,  3.95609642e+00,  3.14155354e+00,
        2.05263931e+00,  2.15805529e+00,  3.04326583e+00,  3.33843153e+00,
        2.56912728e+00,  3.69561345e+00,  3.36529323e+00,  3.64416617e+00,
        3.63019523e+00,  3.19180590e+00,  1.11780955e+00,  3.27560361e+00,
        5.87026231e-01,  3.55011523e+00,  6.92706944e-01,  3.87734157e+00,
        5.10065230e+00,  2.18285609e+00,  3.23175787e+00,  3.42404282e+00,
        2.76901629e+00,  3.84793325e+00,  2.17422699e+00,  3.62050469e-01,
        3.29260110e+00,  4.24286267e+00,  1.19847852e+00,  3.82462706e+00,
        4.23231297e+00,  2.46020939e+00,  2.87121729e+00,  3.65486019e+00,
        3.19557242e+00,  2.67658145e+00,  1.46397137e+00,  2.99052067e+00,
        1.84766150e+00,  2.28457083e+00,  3.26832350e+00,  1.49189561e+00,
        4.40905808e+00,  3.12463263e+00,  5.42685641e+00,  4.36883218e+00,
        5.30292605e+00,  4.60514308e+00,  3.15091385e+00,  2.75660618e+00,
        3.24574930e+00,  