This is part of my Capstone project for a Coursera Specialization on Python. I discussed the initial idea for my project in this thread. I then went ahead with a data set that interested me from http://data.gov.in. It covers road accidents in India, classified according to various parameters. I picked the data on the total number of road accidents in the various Indian states and Union Territories (UTs), broken down by the educational qualification of the drivers responsible for the accidents. The website provided the data as separate JSON files (one for each type of educational qualification) for the period 2009-2015. The educational qualification was divided into the categories “Below 8th Standard” (primary school education only), “From 8 to 9 Standard” (secondary school educated) and “Above 10 Standard” (senior secondary, graduate, postgraduate, etc.). I discarded the records where the educational qualification was not available (unknown) or where no accidents were reported (zero).
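The merge-and-discard step described above can be sketched roughly as follows. This is only an illustration: the field names ("state", "year", "accidents"), the "Unknown" label and all the numbers are assumptions made up for the example, not the actual data.gov.in schema or figures.

```python
import json

# Hypothetical file contents: one JSON document per education category.
# Field names and numbers are illustrative assumptions, not the real data.
SAMPLE_FILES = {
    "Below 8th Standard": '[{"state": "State A", "year": 2009, "accidents": 120},'
                          ' {"state": "State B", "year": 2009, "accidents": 0}]',
    "Above 10 Standard": '[{"state": "State A", "year": 2009, "accidents": 340}]',
    "Unknown": '[{"state": "State A", "year": 2009, "accidents": 75}]',
}

# The three categories kept for the analysis.
KNOWN_CATEGORIES = ("Below 8th Standard", "From 8 to 9 Standard", "Above 10 Standard")


def load_records(files):
    """Merge per-category documents into (state, year, education, accidents) tuples."""
    records = []
    for education, text in files.items():
        if education not in KNOWN_CATEGORIES:
            continue  # educational qualification not available
        for row in json.loads(text):
            if row["accidents"] > 0:  # skip rows with no reported accidents
                records.append((row["state"], row["year"], education, row["accidents"]))
    return records


records = load_records(SAMPLE_FILES)
```

With the sample input, only the two "State A" rows with a known qualification and a non-zero count are kept.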
The data was four-dimensional (year of accident, number of accidents, name of the State/UT where the accident took place, educational qualification of the driver). I chose the Google Bubble chart for visualization.
Initially, I selected data for all 36 States/UTs, and the Bubble chart, although colorful, came out too crowded to decipher anything.
I then changed the query to display data for the 10 States/UTs that had the highest number of accidents during this 7-year period. The resulting chart (see below) was better but still crowded.
Finally, I created the chart with just the top 3 culprit states. The chart now looked much better (see below).
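Narrowing the selection to the top 10 and then the top 3 states amounts to ranking states by their total accident count. A minimal sketch, assuming the records sit in a SQLite table; the table name `road_accidents`, its columns and the sample rows are hypothetical and only serve the example:

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE road_accidents "
    "(state TEXT, year INTEGER, education TEXT, accidents INTEGER)"
)
conn.executemany(
    "INSERT INTO road_accidents VALUES (?, ?, ?, ?)",
    [
        ("State A", 2009, "Below 8th Standard", 100),
        ("State A", 2010, "Above 10 Standard", 200),
        ("State B", 2009, "Below 8th Standard", 50),
        ("State B", 2010, "Above 10 Standard", 60),
        ("State C", 2009, "Above 10 Standard", 400),
        ("State D", 2010, "From 8 to 9 Standard", 10),
    ],
)


def top_states(conn, n):
    """Return the n states with the highest accident totals over all years."""
    rows = conn.execute(
        "SELECT state FROM road_accidents "
        "GROUP BY state "
        "ORDER BY SUM(accidents) DESC, state "
        "LIMIT ?",
        (n,),
    ).fetchall()
    return [state for (state,) in rows]


print(top_states(conn, 3))
```

Changing the argument from 10 to 3 is then the only difference between the crowded chart and the readable one.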
The chart displays the year on the x-axis and the number of accidents on the y-axis. The bubbles are colored according to the Indian state they represent. The size of a bubble represents the educational qualification of the driver: the larger the bubble, the more educated the driver.
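A Google Bubble chart takes its data as rows of the form ID, x value, y value, series (color) and size, so the mapping above can be sketched as below. The numeric ranks used for bubble size are an assumption (the original size values are not stated), and the sample records are made up.

```python
import json

# Assumed ordinal ranks for bubble size; higher means more educated.
EDUCATION_RANK = {
    "Below 8th Standard": 1,
    "From 8 to 9 Standard": 2,
    "Above 10 Standard": 3,
}


def bubble_rows(records):
    """Turn (state, year, education, accidents) tuples into Bubble chart rows."""
    rows = [["ID", "Year", "Accidents", "State", "Education rank"]]
    for state, year, education, accidents in records:
        # An empty ID keeps text labels off the bubbles; the state drives the color.
        rows.append(["", year, accidents, state, EDUCATION_RANK[education]])
    return rows


# Made-up sample records for illustration only.
sample = [
    ("State A", 2009, "Below 8th Standard", 120),
    ("State A", 2009, "Above 10 Standard", 340),
]
chart_rows = bubble_rows(sample)

# The JSON text can be embedded in the page and passed to
# google.visualization.arrayToDataTable on the JavaScript side.
print(json.dumps(chart_rows))
```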
The results of the analysis were quite interesting and in fact changed my own prejudice on the subject. A common perception (perhaps) is that more educated drivers are likely to cause fewer accidents (as they might be more aware of traffic rules, more mature, etc.). The chart shows that the reality is the very reverse. The data shows that drivers who had received only primary education (below Standard 8) caused fewer accidents than those with more education. In fact, the clear pattern is: “The higher the educational qualification of the driver, the more likely s/he is to cause an accident”.
A formal account was also published on LinkedIn.