# 00 - Optional homework

For this initial homework we will be working with a [dataset](https://github.com/fivethirtyeight/guns-data/blob/master/interactive_data.csv) available as a part of an interesting analysis of [gunshot deaths in the US](http://fivethirtyeight.com/features/gun-deaths/). The goal of this optional homework is to carefully go through the interactive visualization portrayed at the top of the aforementioned article, and use an IPython Notebook to reproduce the following claims made in the visualization:
- Nearly *two-thirds* of gun deaths are *suicides*.
- More than *85 percent* of suicide victims are *male*.
- Around *a third* of all gun deaths are *homicides*.
- Around *two-thirds* of homicide victims who are *males* in the *age-group of 15--34* are *black*.
- *Women* constitute only *15 percent* of the total *homicide* victims.

It's not necessary to generate visualizations for the results -- numbers should be more than enough to convince yourself that you 
were able to reproduce the results of that article.

You can use this opportunity, first of all, to refresh your Python skills. If you are coming from another programming language
(especially a statically typed one such as Java or C++), we recommend taking a look at this presentation:
[Code Like a Pythonista: Idiomatic Python](http://www.omahapython.org/IdiomaticPython.html) -- it will teach
you how to write nice Python code while getting you up to speed with the syntax.
Feel free to explore more advanced libraries (like [Pandas](http://pandas.pydata.org/)) if you really want, but keep in mind that you
should be able to reproduce the results with the Python Standard Library (PSL) alone.
One advantage of using only the PSL is that, once you become familiar with Pandas, you will appreciate how much more concise
and readable your code becomes :)

Credits to [Michele Catasta](https://github.com/pirroh), on whose material this version is based.

In [1]:
import csv

In [2]:
DATA = "interactive_data"
NUMERIC_COLS = ["", "Deaths", "Population", "Rate"]

def read_csv(file_name, header=True, numeric_cols=None):
    """Read file_name + '.csv' into a dict mapping each column to a list of values."""
    def num(s):
        # Prefer int; fall back to float for non-integer values.
        try:
            return int(s)
        except ValueError:
            return float(s)

    with open(file_name + '.csv', newline='') as csvfile:
        data_dict = {}
        rows = csv.reader(csvfile, delimiter=',', quotechar='"')

        if header:
            # Use the first line as column names.
            for val in next(rows):
                data_dict[val] = []
        else:
            # No header: key columns by position and keep the first line as data.
            for idx, val in enumerate(next(rows)):
                data_dict[idx] = [val]

        keys = list(data_dict.keys())

        # Remaining rows: append each value to its column.
        for row in rows:
            for idx, val in enumerate(row):
                data_dict[keys[idx]].append(val)

        # Convert the numeric columns; "NA" becomes -1.
        for col in numeric_cols or []:
            data_dict[col] = [num(val) if val != "NA" else -1 for val in data_dict[col]]

    return data_dict

def get_indices(li, value):
    indices = [idx for idx, val in enumerate(li) if val == value]

    return indices

def sum_elements_from_indices(li, indices):
    values = [li[idx] for idx in indices]
    s = sum(values)
    
    return s

In [3]:
data_dict = read_csv(DATA, numeric_cols=NUMERIC_COLS)
data_length = len(data_dict["Intent"])

## Procedure

1. Identify the categories of interest, e.g., around two-thirds of **homicide** victims who are **males** in the age-group of **15--34** are **black**.
2. Identify null values in those categories, e.g., "None selected".
3. Get the indices of the rows with those categories or values.
4. Combine the categories by taking the intersection, union, or difference of those indices as appropriate, e.g., (homicide $\cap$ male $\cap$ 15--34) $-$ Race("None selected").
5. Sum the values using the indices for the combined category of interest.
6. Compute the fraction of interest using the sums.
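
The steps above can be sketched on a small toy table. This is only an illustration: the `toy` dict and its numbers are made up, and `indices_of` is a hypothetical set-returning variant of `get_indices`, so the intersections and differences can be written with `&` and `-`.

```python
# Toy column-oriented table in the same shape that read_csv returns.
# All values are invented for illustration; they are not from the dataset.
toy = {
    "Intent": ["Homicide", "Homicide", "Homicide", "Suicide", "Homicide"],
    "Gender": ["Male", "Male", "Male", "Male", "Female"],
    "Age":    ["15 - 34", "15 - 34", "15 - 34", "15 - 34", "15 - 34"],
    "Race":   ["Black", "White", "None selected", "Black", "Black"],
    "Deaths": [60, 30, 90, 40, 10],
}

def indices_of(column, value):
    """Hypothetical helper: set of row indices where column equals value."""
    return {idx for idx, val in enumerate(column) if val == value}

# Steps 1-3: one index set per category, plus the "None selected" rows.
homicide = indices_of(toy["Intent"], "Homicide")
male = indices_of(toy["Gender"], "Male")
young = indices_of(toy["Age"], "15 - 34")
black = indices_of(toy["Race"], "Black")
race_none = indices_of(toy["Race"], "None selected")

# Step 4: (homicide AND male AND 15-34) minus rows with no race selected.
group = (homicide & male & young) - race_none
group_black = group & black

# Step 5: sum the deaths over each index set.
deaths_group = sum(toy["Deaths"][i] for i in group)
deaths_black = sum(toy["Deaths"][i] for i in group_black)

# Step 6: the fraction of interest.
fraction = deaths_black / deaths_group
print(fraction)
```

Working with sets from the start avoids the repeated `set(...)` and `list(...)` conversions; the order of the indices does not matter because they are only used for summing.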

In [4]:
# Nearly two-thirds of gun deaths are suicides.
deaths = sum(data_dict["Deaths"])

suicide_indices = get_indices(data_dict["Intent"], "Suicide")
none_death_indices = get_indices(data_dict["Intent"], "None selected")

sum_none_death = sum_elements_from_indices(data_dict["Deaths"], none_death_indices)
sum_suicide = sum_elements_from_indices(data_dict["Deaths"], suicide_indices)

sum_suicide/(deaths - sum_none_death)

0.6267518388029272

In [5]:
# More than 85 percent of suicide victims are male.
suicide_indices = get_indices(data_dict["Intent"], "Suicide")
male_indices = get_indices(data_dict["Gender"], "Male")
female_indices = get_indices(data_dict["Gender"], "Female")
inter_male = list(set(suicide_indices).intersection(set(male_indices)))
inter_female = list(set(suicide_indices).intersection(set(female_indices)))

sum_suicide_male = sum_elements_from_indices(data_dict["Deaths"], inter_male)
sum_suicide_female = sum_elements_from_indices(data_dict["Deaths"], inter_female)

sum_suicide_male/(sum_suicide_male + sum_suicide_female)

0.8624410860351643

In [6]:
# Around a third of all gun deaths are homicides.
homicide_indices = get_indices(data_dict["Intent"], "Homicide")
none_death_indices = get_indices(data_dict["Intent"], "None selected")

sum_homicide = sum_elements_from_indices(data_dict["Deaths"], homicide_indices)
sum_none_death = sum_elements_from_indices(data_dict["Deaths"], none_death_indices)

sum_homicide/(deaths - sum_none_death)

0.3490023773294294

In [7]:
# Around two-thirds of homicide victims who are males in the age-group of 15--34 are black.
homicide_indices = get_indices(data_dict["Intent"], "Homicide")
male_indices = get_indices(data_dict["Gender"], "Male")
age_indices = get_indices(data_dict["Age"], "15 - 34")
black_indices = get_indices(data_dict["Race"], "Black")
none_indices = get_indices(data_dict["Race"], "None selected")
inter = list((set(homicide_indices) & set(male_indices) & set(age_indices)) - set(none_indices))
inter_black = list(set(black_indices) & set(inter))

sum_inter = sum_elements_from_indices(data_dict["Deaths"], inter)
sum_inter_black = sum_elements_from_indices(data_dict["Deaths"], inter_black)

sum_inter_black/sum_inter

0.6612482748044778

In [8]:
# Women constitute only 15 percent of the total homicide victims.
homicides = get_indices(data_dict["Intent"], "Homicide")
none_deaths = get_indices(data_dict["Intent"], "None selected")
males = get_indices(data_dict["Gender"], "Male")
females = get_indices(data_dict["Gender"], "Female")

inter = list((set(males).union(set(females))).intersection(set(homicides)))
inter_female = list(set(inter).intersection(females))

sum_inter = sum_elements_from_indices(data_dict["Deaths"], inter)
sum_inter_female = sum_elements_from_indices(data_dict["Deaths"], inter_female)

sum_inter_female/sum_inter

0.15279495160533835
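
Since every cell above follows the same pattern (select rows, combine index sets, sum, divide), the pattern could be factored into two small helpers. The sketch below is one possible way to do it, not part of the original solution: `select` and `death_share` are hypothetical names, and the `toy` data is invented so that the example runs on its own.

```python
def select(data, **conditions):
    """Return the set of row indices whose columns equal the given values."""
    n_rows = len(data["Deaths"])
    return {
        i for i in range(n_rows)
        if all(data[col][i] == value for col, value in conditions.items())
    }

def death_share(data, part, whole):
    """Deaths in the rows of `part` that lie in `whole`, divided by deaths in `whole`."""
    total = sum(data["Deaths"][i] for i in whole)
    return sum(data["Deaths"][i] for i in part & whole) / total

# Invented numbers, in the same column-oriented shape as data_dict.
toy = {
    "Intent": ["Homicide", "Homicide", "Suicide", "Homicide"],
    "Gender": ["Male", "Female", "Male", "None selected"],
    "Deaths": [85, 15, 200, 100],
}

# Share of women among homicide victims whose gender is Male or Female.
homicides = select(toy, Intent="Homicide")
known_gender = select(toy, Gender="Male") | select(toy, Gender="Female")
share = death_share(toy, select(toy, Gender="Female"), homicides & known_gender)
print(share)
```

With the real data, the same two calls could be reused for each claim by passing `data_dict` instead of `toy` and changing the conditions.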