### This notebook adds new numerical attributes to the merged dataset

The attributes added are: *% SC covered*, *% ST covered*, *% General covered*, *% SC population*, *% ST population*, *% Gen population*, *Is Backward Concentrated?*, and *Total Population*.

We start by importing the `csv` package, so that `csv.writer` can write the extended dataset to a CSV file.

In [7]:
import csv

Function to detect whether a string can be converted to an integer. The implementation relies on exception handling: if `int()` raises a `ValueError` while converting the string, the function returns `False`; otherwise it returns `True`.

In [8]:
def isInteger(value):
	# `value` replaces the old parameter name `str`, which shadowed the built-in type
	try:
		int(value)
	except ValueError:
		return False
	return True

Function to find the integer value of a string. Strings that cannot be converted to an integer yield 0.

In [9]:
def intValueOf(value):
	# fall back to 0 for blank or non-numeric fields
	if isInteger(value):
		return int(value)
	else:
		return 0
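
To make the fallback behaviour concrete, the sketch below restates the two helpers so that it runs on its own and applies them to a few sample strings. The sample values are hypothetical and chosen only to exercise each branch; they are not taken from the dataset.

```python
def isInteger(value):
    """Return True if int() accepts the value, False if it raises ValueError."""
    try:
        int(value)
    except ValueError:
        return False
    return True


def intValueOf(value):
    """Return the integer value, or 0 when the conversion fails."""
    return int(value) if isInteger(value) else 0


# Hypothetical cell contents: plain, padded, negative, decimal, empty, text.
samples = ["42", " 7 ", "-3", "3.5", "", "NA"]
results = {s: intValueOf(s) for s in samples}
print(results)
```

Note that `int()` tolerates surrounding whitespace but rejects decimal strings such as `"3.5"`, so those fall back to 0 just like empty or textual fields.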

Function to find the percentage of one number (`small`) with respect to another (`big`). If either number is 0, the function returns the string `"NA"` instead of a number.

In [10]:
def percentage(big, small):
	if big != 0 and small != 0:
		return small / big * 100
	else:
		return "NA"
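
A short illustration of the three cases, assuming Python 3 (where `/` is true division). The function is repeated so the snippet is self-contained, and the input numbers are made up for the example.

```python
def percentage(big, small):
    # Same logic as the notebook cell: "NA" when either value is 0.
    if big != 0 and small != 0:
        return small / big * 100
    else:
        return "NA"


covered = percentage(200, 50)      # both non-zero: a float percentage
no_base = percentage(0, 50)        # zero denominator: "NA"
none_covered = percentage(200, 0)  # zero numerator: also "NA", not 0.0
print(covered, no_base, none_covered)
```

The last case is worth keeping in mind when reading the output file: a group with no covered population is recorded as `"NA"` rather than as 0 percent.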

Reading the merged file and storing its lines in `mergedList`. Each element is one raw line of text, including the trailing newline.

In [16]:
with open('../data/healthAndHabitatMerged.csv', 'r') as merged:
	mergedList = list(merged)
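
Because `mergedList` holds raw lines, any later split on commas assumes that no field contains a comma itself. The sketch below is not part of the notebook's pipeline; it only contrasts a plain `split(",")` with `csv.reader` on a hypothetical two-line sample, held in memory with `io.StringIO`, where one field is quoted.

```python
import csv
import io

# Hypothetical sample standing in for the merged file; the quoted
# state name contains a comma on purpose.
sample = 'State,District,SC Current\n"Example, State",Example District,120\n'

# Plain string split: the quoted field is cut in two.
naive = sample.splitlines()[1].split(",")

# csv.reader respects the quoting and returns three fields.
parsed = list(csv.reader(io.StringIO(sample)))[1]

print(len(naive), naive)
print(len(parsed), parsed)
```

If the merged file is known to contain no quoted fields, both approaches give the same rows.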

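Initializing `addedCSVList`, the list that will hold the output rows (starting with the header row); `headingRow`, a flag used to skip the first row of the input; and the dictionary `stateIndices`, which maps each state to a numeric index so that looking up a state's index takes constant time on average.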

In [12]:
headingRow = True
stateIndices = dict()
addedCSVList = []
addedCSVList.append(['State', 'District', 'SC Current', 'ST Current', 'General Current', 'SC Covered', 'ST Covered', 'General Covered', 'Total Population', 'Percentage SC', 'Percentage SC covered',  'Percentage ST', 'Percentage ST covered',  'Percentage General', 'Percentage General Covered', 'State Index', 'Backward Concentrated', 'Number of Sub Centres', 'Number of Primary Health Centres', 'Number of Community Health Centres', 'Sub Divisional Hospitals', 'District Hospitals'])
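
The role of `stateIndices` can be seen in isolation with a minimal sketch: each state receives the next free index the first time it appears and keeps that index afterwards. The variable names and the list of states below are illustrative assumptions, not values from the dataset.

```python
state_indices = {}

# Hypothetical sequence of state names as they might appear row by row.
states = ["Kerala", "Goa", "Kerala", "Assam", "Goa"]

indices = []
for state in states:
    # First occurrence: assign the next unused index (0, 1, 2, ...).
    if state not in state_indices:
        state_indices[state] = len(state_indices)
    indices.append(state_indices[state])

print(indices)
print(state_indices)
```

Using `len(state_indices)` as the new index works because the dictionary grows by exactly one entry per new state.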

The cell below calculates the new attributes. It iterates through the list of rows, splits each row on commas to get its fields, and derives the new numerical values from the existing ones (see the comments in the code for the formulae).

In [13]:
for row in mergedList:
	if headingRow:
		headingRow = False
		continue
	rowList = row.split(",")

	state = rowList[0]
	stateIndex = -1
	if state in stateIndices:
		stateIndex = stateIndices[state]
	else:
		stateIndices[state] = len(stateIndices)
		stateIndex = stateIndices[state]

	backwardConcentrated = 0

	# getting the required numerical values from the list of elements
	stCurrent = intValueOf(rowList[3])
	scCurrent = intValueOf(rowList[2])
	generalCurrent = intValueOf(rowList[4])
		
	stCovered = intValueOf(rowList[6])
	scCovered = intValueOf(rowList[5])
	generalCovered = intValueOf(rowList[7])

	# percentage covered calculation
	perStCovered = percentage(stCurrent, stCovered)
	perScCovered = percentage(scCurrent, scCovered)
	perGenCovered = percentage(generalCurrent, generalCovered)

	# total population
	totalPopulation = stCurrent + scCurrent + generalCurrent

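	# percentage of each group in the total population
	perSc = percentage(totalPopulation, scCurrent)
	perSt = percentage(totalPopulation, stCurrent)
	perGen = percentage(totalPopulation, generalCurrent)

	# is backward concentrated? (SC and ST together above 50% of the population)
	# "NA" shares count as 0; the float shares are summed without truncating them
	backwardShare = sum(p for p in (perSc, perSt) if p != "NA")
	if backwardShare > 50:
		backwardConcentrated = 1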

	# appending to the new list
	addedCSVList.append([rowList[0], rowList[1], rowList[2], rowList[3], rowList[4], rowList[5], rowList[6], rowList[7], totalPopulation, perSc, perScCovered, perSt, perStCovered, perGen, perGenCovered, stateIndex, backwardConcentrated, rowList[8], rowList[9], rowList[10], rowList[11], rowList[12].strip()])
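
As a worked example of the formulae above, the sketch below computes the derived values for a single district. The population figures are hypothetical and picked so that every percentage comes out as a round number; `percentage` is repeated so the snippet runs on its own.

```python
def percentage(big, small):
    # Same logic as the notebook helper: "NA" when either value is 0.
    if big != 0 and small != 0:
        return small / big * 100
    return "NA"


# Hypothetical district: current population per group and SC covered population.
sc_current, st_current, general_current = 200, 400, 200
sc_covered = 100

total_population = sc_current + st_current + general_current  # 800

per_sc = percentage(total_population, sc_current)
per_st = percentage(total_population, st_current)
per_gen = percentage(total_population, general_current)
per_sc_covered = percentage(sc_current, sc_covered)

# SC and ST together make up more than half of the population here.
backward_concentrated = 1 if per_sc + per_st > 50 else 0

print(total_population, per_sc, per_st, per_gen, per_sc_covered, backward_concentrated)
```

In this example the SC share is 25%, the ST share is 50%, and the general share is 25%, so the district is flagged as backward concentrated.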

Writing the rows, including the new attributes, to a new *.csv* file, `AddedAttributesMerged.csv`.

In [14]:
with open('../data/AddedAttributesMerged.csv', 'w', newline='') as merged:
	writer = csv.writer(merged)
	writer.writerows(addedCSVList)
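
To see how mixed rows of strings, floats, and `"NA"` markers are serialized, the following sketch writes two hypothetical rows to an in-memory buffer instead of a file. The column subset and the values are assumptions made for the illustration.

```python
import csv
import io

# Hypothetical header and data row with a float and an "NA" marker.
rows = [
    ["State", "District", "Percentage SC", "Percentage SC covered"],
    ["Example State", "Example District", 25.0, "NA"],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# csv.writer ends each row with "\r\n" by default.
print(text.splitlines())
```

When writing to a real file, opening it with `newline=''` (as the `csv` module documentation recommends) prevents the platform from translating those line endings a second time.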