This is part of my Capstone project for a Coursera Specialization on Python. I discussed the initial idea for the project in this thread. I then went ahead with a data set that interested me. The data set covers road accidents in India, classified according to various parameters. I picked the data on the total number of road accidents in the various Indian States and Union Territories (UTs), along with the educational qualification of the drivers responsible for the accidents. The source website provided the data as separate JSON files (one for each type of educational qualification) for the period 2009-2015. The educational qualification was divided into three categories: “Below 8th Standard” (primary school education only), “From 8 to 9 Standard” (secondary school educated) and “Above 10 Standard” (senior secondary, graduate, postgraduate etc.). I discarded records where the educational qualification was not available (unknown) or no accidents were reported (zero).

The data was four-dimensional (year of accident, number of accidents, name of the State/UT where the accident took place, and educational qualification of the driver). I chose the Google Bubble chart for visualization.

I wrote a Python script that picks up all JSON files containing accident data from a given directory and inserts the data into a local SQLite database, accidentsdb.sqlite. I had to cleanse the data in places where state names were spelled differently (for example, “Andaman and Nicobar Islands” and “Andaman & Nicobar Islands”). A second Python script then queries the database and creates the bubblechart.html file with the JavaScript needed to display the bubble chart.
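The loading step could look roughly like the sketch below. It is not the original script. The JSON layout (an `education` label plus a `records` list), the table schema, and the names `STATE_ALIASES`, `normalize_state` and `load_directory` are all assumptions made for illustration; only the database file name and the spelling example come from the description above.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical alias table: maps variant spellings to one canonical state name.
STATE_ALIASES = {
    "Andaman & Nicobar Islands": "Andaman and Nicobar Islands",
}


def normalize_state(name):
    """Collapse extra whitespace and replace known spelling variants."""
    cleaned = " ".join(name.split())
    return STATE_ALIASES.get(cleaned, cleaned)


def load_directory(json_dir, db_path="accidentsdb.sqlite"):
    """Load every *.json file in json_dir into the accidents table.

    Assumed file layout (not confirmed by the source):
    {"education": "...",
     "records": [{"state": ..., "year": ..., "accidents": ...}, ...]}
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS accidents (
               state TEXT, year INTEGER, education TEXT, accidents INTEGER,
               UNIQUE (state, year, education))"""
    )
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            payload = json.load(fh)
        for rec in payload["records"]:
            count = int(rec["accidents"])
            if count == 0:
                continue  # skip rows where no accidents were reported
            # INSERT OR REPLACE keeps re-runs from duplicating rows.
            conn.execute(
                "INSERT OR REPLACE INTO accidents VALUES (?, ?, ?, ?)",
                (normalize_state(rec["state"]), int(rec["year"]),
                 payload["education"], count),
            )
    conn.commit()
    return conn
```

Normalizing the name before the insert means both spellings end up in the same row group, so later per-state totals are not split across variants.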

Initially, I selected data for all 36 States/UTs. The resulting bubble chart looked colorful but was far too crowded to decipher anything.

[Image: bubble chart for all 36 States/UTs (bubblechart-all-states.JPG)]

I then changed the query to display data for only the 10 States/UTs with the highest number of accidents during this 7-year period. The resulting chart (see below) was better but still crowded.
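A top-N selection of this kind can be sketched as follows. This is an illustration under assumptions, not the original query: it presumes a table `accidents(state, year, education, accidents)`, and the function names `top_states` and `rows_for_states` are hypothetical.

```python
import sqlite3


def top_states(conn, n):
    """Return the n state names with the highest total accident count."""
    rows = conn.execute(
        """SELECT state FROM accidents
           GROUP BY state
           ORDER BY SUM(accidents) DESC, state
           LIMIT ?""",
        (n,),
    ).fetchall()
    return [state for (state,) in rows]


def rows_for_states(conn, states):
    """Fetch all rows for the given states, ordered for charting."""
    # One "?" placeholder per state keeps the query parameterized.
    placeholders = ",".join("?" for _ in states)
    return conn.execute(
        "SELECT state, year, education, accidents FROM accidents "
        f"WHERE state IN ({placeholders}) "
        "ORDER BY state, year, education",
        list(states),
    ).fetchall()
```

Calling `rows_for_states(conn, top_states(conn, 10))` would give the rows for the top 10, and changing the argument to 3 narrows the chart further.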

[Image: bubble chart for the top 10 States/UTs]

Finally, I created the chart with just the top 3 culprit states. The chart now looked much better (see below).

[Image: bubble chart for the top 3 states (bubblechart-top3-states.JPG)]

The chart displays the year on the x-axis and the number of accidents on the y-axis. The bubbles are colored according to the Indian state they represent. The size of a bubble represents the educational qualification of the driver: the larger the bubble, the more educated the driver.
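The mapping described above fits the Google Charts bubble chart column layout (ID, x, y, series, size). The sketch below shows one way to generate such a page from Python. It is a minimal illustration, not the original script: the numeric ranks in `EDUCATION_RANK`, the use of the education label as the bubble ID, the chart title, and the names `build_chart_rows` and `render_page` are assumptions.

```python
import json

# Assumed numeric ranks used for bubble size; the labels come from the data set.
EDUCATION_RANK = {
    "Below 8th Standard": 1,
    "From 8 to 9 Standard": 2,
    "Above 10 Standard": 3,
}

PAGE_TEMPLATE = """<html>
<head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(drawChart);
function drawChart() {
  var data = google.visualization.arrayToDataTable(__ROWS__);
  var options = {
    title: 'Road accidents by state and driver education',
    hAxis: {title: 'Year'},
    vAxis: {title: 'Number of accidents'}
  };
  var chart = new google.visualization.BubbleChart(
      document.getElementById('chart_div'));
  chart.draw(data, options);
}
</script>
</head>
<body><div id="chart_div" style="width: 900px; height: 500px;"></div></body>
</html>
"""


def build_chart_rows(records):
    """Turn (state, year, education, accidents) tuples into chart rows.

    Columns: bubble ID, x value, y value, series (sets color), size.
    """
    rows = [["ID", "Year", "Accidents", "State", "Education level"]]
    for state, year, education, accidents in records:
        rows.append(
            [education, year, accidents, state, EDUCATION_RANK[education]]
        )
    return rows


def render_page(records):
    """Return the full HTML page with the data rows embedded as JSON."""
    return PAGE_TEMPLATE.replace(
        "__ROWS__", json.dumps(build_chart_rows(records))
    )
```

Writing the returned string to bubblechart.html and opening the file in a browser should display the chart. Because the state name sits in the series column, the library assigns one color per state.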

The results of the analysis were quite interesting and in fact changed my own preconception on the subject. A common perception (perhaps) is that more educated drivers are likely to cause fewer accidents (as they might be more aware of traffic rules, more mature etc.). The chart shows that the reality is the very reverse. According to the data, people who received only primary education (below Standard 8) caused fewer accidents than those with more education. In fact, the clear pattern is: “The higher the educational qualification of the driver, the more likely s/he is to cause an accident”.

A formal account was also published on LinkedIn.