In [2]:
import pandas as pd
import os

In [4]:
## Reading the 'interactive_data.csv' file as a pandas dataframe
PATH_IN = './Data/' 
fname = os.path.join(PATH_IN, 'interactive_data.csv')
df = pd.read_csv(fname, index_col=0)
df.head(5)

Unnamed: 0,Intent,Gender,Age,Race,Deaths,Population,Rate
1,None selected,None selected,None selected,None selected,33599,316299978,10.6
2,None selected,None selected,None selected,White,22079,197369634,11.2
3,None selected,None selected,None selected,Black,7765,38896382,20.0
4,None selected,None selected,None selected,Hispanic,3007,54049078,5.6
5,None selected,None selected,None selected,Asian/Pacific Islander,442,16315561,2.7


1. **Setting the Input Path:** The cell begins by defining a variable `PATH_IN` with the value `'./Data/'`, the directory that holds the data files. The leading `./` makes the path relative to the current working directory, that is, the directory the notebook or script is run from.

2. **Constructing a File Path:** The next line uses the `os.path.join()` function from the `os` module to join `PATH_IN` with the string `'interactive_data.csv'`. The result is the complete path to the file "interactive_data.csv" inside the directory given by `PATH_IN`, and it is stored in the variable `fname`.

3. **Reading CSV Data into a DataFrame:** The script uses the `pd.read_csv()` function from the Pandas library to read the data from the CSV file specified by the `fname` variable. The `index_col=0` argument indicates that the first column of the CSV file should be used as the index (row labels) for the resulting DataFrame. This assumes that the first column contains unique identifiers or labels for each row.

4. **Displaying the First 5 Rows:** Finally, the cell displays the first 5 rows of the DataFrame using the `df.head(5)` method. This provides a preview of the data that has been read from the CSV file.

In summary, this code snippet reads data from a CSV file located in the './Data/' directory, stores it in a Pandas DataFrame called `df`, and then displays the first 5 rows of that DataFrame to inspect the data.
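To make the effect of `index_col=0` concrete, here is a minimal sketch that reads the same small CSV text twice, with and without the argument. The two-row sample is copied from the preview above and loaded from an in-memory string (via `io.StringIO`) rather than from the real file. The names `csv_text`, `with_index`, and `without_index` are illustrative and not part of the notebook, and the empty first header field is an assumption about how the file labels its first column.

```python
import io

import pandas as pd

# Two rows copied from the preview above; the first header field is left
# empty here, which is an assumption about the real file's layout.
csv_text = (
    ",Intent,Gender,Age,Race,Deaths,Population,Rate\n"
    "1,None selected,None selected,None selected,None selected,33599,316299978,10.6\n"
    "2,None selected,None selected,None selected,White,22079,197369634,11.2\n"
)

# With index_col=0 the first column becomes the row labels.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, pandas keeps the first column as ordinary data and
# gives it the placeholder name 'Unnamed: 0'.
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.index))
print(list(without_index.columns))
```

Passing `index_col=0` keeps the row numbers out of the data columns, so later column-wise operations do not have to skip them.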

In [6]:
## Note that there are other ways to approach this analysis; the following is one possible correct solution.
## The file 'interactive_data.csv' contains aggregations across different categorical filters.
## Specifically, 'None selected' corresponds to the case when a filter is not applied to a column.
## Thus, such cases correspond to all possible values under that column.
## For example, if all columns have the entry 'None selected', then this corresponds to an aggregation across all entries.
## Thus, the first pre-processing step is to remove rows where at least one column has the value 'None selected'.
df_filtered = df.loc[~(df=='None selected').any(axis=1)]
df_filtered.head(5)

Unnamed: 0,Intent,Gender,Age,Race,Deaths,Population,Rate
152,Suicide,Female,Under 15,White,19,15355910,0.1
153,Suicide,Female,Under 15,Black,1,4095428,0.0
154,Suicide,Female,Under 15,Hispanic,4,7330024,0.1
155,Suicide,Female,Under 15,Asian/Pacific Islander,1,1393440,0.1
156,Suicide,Female,Under 15,Other,0,1661877,0.0


The provided line of code is used to filter a Pandas DataFrame called `df` based on a condition and create a new DataFrame called `df_filtered`. Let's break down this line of code step by step:

1. `df`: This is the original DataFrame that you have previously loaded from a CSV file using `pd.read_csv()`.

2. `df.loc[...]`: This part of the code is using DataFrame indexing with the `.loc[]` accessor. It allows you to select specific rows and columns from the DataFrame.

3. `~(df=='None selected').any(axis=1)`: This is the condition that specifies which rows should be selected from the original DataFrame `df`. Let's break it down further:
   - `df=='None selected'`: This part compares each element in the DataFrame `df` to the string `'None selected'`. This comparison results in a DataFrame of the same shape as `df`, where each element is either `True` if the corresponding element in `df` is equal to `'None selected'` or `False` otherwise.
   - `.any(axis=1)`: After comparing each element to `'None selected'`, the `.any(axis=1)` part checks if any element in each row is `True`. It computes this along the rows (axis=1). So, for each row, it returns `True` if any element in that row is `'None selected'` and `False` otherwise.
   - `~`: Finally, the tilde (`~`) operator is used to negate the condition. It flips `True` values to `False` and vice versa. Essentially, it selects rows where none of the elements are equal to `'None selected'`.

4. `df_filtered`: This is the new DataFrame that stores the result of the filtering operation. It contains only the rows from the original `df` where none of the elements in the row are equal to `'None selected'`.

In summary, the line of code filters the original DataFrame `df` to create a new DataFrame `df_filtered`, which excludes rows where any element in that row is equal to the string `'None selected'`.
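The three parts of the mask can be inspected one at a time on a small example. The sketch below uses a made-up three-row DataFrame called `toy` (not the notebook's data) and splits the one-line filter into named intermediate steps. The names `is_placeholder`, `row_has_placeholder`, and `toy_filtered` are chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 contain the placeholder string.
toy = pd.DataFrame(
    {
        "Intent": ["None selected", "Suicide", "Homicide"],
        "Gender": ["None selected", "Female", "None selected"],
        "Deaths": [10, 3, 7],
    }
)

# Step 1: element-wise comparison gives a boolean DataFrame of the same shape.
is_placeholder = toy == "None selected"

# Step 2: reduce along axis=1 to get one boolean per row.
row_has_placeholder = is_placeholder.any(axis=1)

# Step 3: negate the mask and keep only rows without any placeholder.
toy_filtered = toy.loc[~row_has_placeholder]

print(row_has_placeholder.tolist())
print(toy_filtered)
```

Only the middle row survives, because it is the only one in which no column holds `'None selected'`.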

In [10]:
## Compute #deaths
all_deaths = df_filtered['Deaths'].sum()
print("Total Deaths:", all_deaths)

Total Deaths: 33595


In [11]:
## Compute #suicides
suicides = df_filtered[df_filtered['Intent'] == 'Suicide']['Deaths'].sum()
print(f'{suicides/all_deaths*100}% of gun deaths are suicides.')

62.68194671826165% of gun deaths are suicides.


The provided code calculates the percentage of gun deaths that are attributed to suicide and then prints this percentage as a formatted string. Let's break it down step by step:

1. `suicides = df_filtered[df_filtered['Intent'] == 'Suicide']['Deaths'].sum()`

   - `df_filtered[df_filtered['Intent'] == 'Suicide']`: This part of the code filters the `df_filtered` DataFrame to select only the rows where the 'Intent' column has the value 'Suicide'. This creates a new DataFrame containing only the rows related to suicide.

   - `['Deaths'].sum()`: After filtering, this part calculates the sum of the 'Deaths' column for the filtered DataFrame. In other words, it computes the total number of deaths for which the intent is 'Suicide' and stores this value in the variable `suicides`.

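2. `print(f'{suicides/all_deaths*100}% of gun deaths are suicides.')`

   - `suicides/all_deaths*100`: This divides the suicide count by `all_deaths`, the total computed in the previous cell, and multiplies by 100 to express the ratio as a percentage.

   - The f-string embeds that value, unrounded, in the printed message.

So, this code calculates the percentage of gun deaths in the filtered DataFrame `df_filtered` that are suicides and prints it as a message. With this data it reports that about 62.68% of gun deaths are suicides.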
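The cell above prints the percentage at full floating-point precision. If a shorter figure is preferred, an f-string format specifier can round it for display. This is a minimal sketch assuming one decimal place is enough. `suicide_share` and `message` are hypothetical names, and the number is the one printed by the cell above.

```python
# Value copied from the notebook output above.
suicide_share = 62.68194671826165

# ':.1f' formats the float with one digit after the decimal point.
message = f"{suicide_share:.1f}% of gun deaths are suicides."
print(message)
```

The specifier changes only how the number is displayed, not the stored value.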

In [12]:
## Compute #male-suicides
male_suicides = df_filtered[
                    (df_filtered['Intent'] == 'Suicide') & 
                    (df_filtered['Gender'] == 'Male')
                ]['Deaths'].sum()
print(f'{male_suicides/suicides*100}% of suicide victims are male.')

86.24275809668535% of suicide victims are male.


In [14]:
## Compute total homicides - Alessio :)
homicides = df_filtered[(df_filtered['Intent'] == 'Homicide')]['Deaths'].sum()
print(f'{homicides/all_deaths*100}% of gun deaths are homicides.')

34.906980205387704% of gun deaths are homicides.


In [16]:
## Compute black-males homicides in the age-group 15--34 - Alessio :))
male_age_group_homicide = df_filtered[
                    (df_filtered['Intent'] == 'Homicide') & 
                    (df_filtered['Gender'] == 'Male') &
                    (df_filtered['Age'] == '15 - 34')
                ]['Deaths'].sum()

black_male_age_group_homicide = df_filtered[
                    (df_filtered['Intent'] == 'Homicide') & 
                    (df_filtered['Gender'] == 'Male') &
                    (df_filtered['Age'] == '15 - 34') &
                    (df_filtered['Race'] == 'Black') 
                ]['Deaths'].sum()

percentage = (black_male_age_group_homicide/male_age_group_homicide)*100
print(f'{percentage}% of homicide victims who are males in the age-group 15--34 are black.')

66.12482748044778% of homicide victims who are males in the age-group 15--34 are black.


In [17]:
## Compute total female homicide victims - Alessio :)))
women_homicides = df_filtered[
                    (df_filtered['Intent'] == 'Homicide') &
                    (df_filtered['Gender'] == 'Female')
                    ]['Deaths'].sum()
print(f'{women_homicides/homicides*100}% of homicide victims are women.')

15.289502856655583% of homicide victims are women.
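The cells above repeat one pattern: sum `Deaths` over a filtered subset and divide by the sum over a broader group. One possible way to factor that out is sketched below. The helper `death_share` and its `subset`/`within` parameters are hypothetical names rather than part of the notebook, and the example runs on a made-up four-row DataFrame.

```python
import pandas as pd


def death_share(df, subset, within=None):
    """Percentage of deaths matching `subset` among rows matching `within`.

    Both arguments are dicts mapping a column name to the required value;
    `within=None` means "all rows".
    """
    def mask_for(conditions):
        # Start with all rows selected, then AND in each equality condition.
        mask = pd.Series(True, index=df.index)
        for column, value in (conditions or {}).items():
            mask &= df[column] == value
        return mask

    base = mask_for(within)
    numerator = df.loc[base & mask_for(subset), "Deaths"].sum()
    denominator = df.loc[base, "Deaths"].sum()
    return numerator / denominator * 100


# Hypothetical toy data, not the notebook's dataset.
toy = pd.DataFrame(
    {
        "Intent": ["Suicide", "Suicide", "Homicide", "Homicide"],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Deaths": [6, 2, 3, 1],
    }
)

# Share of suicides among all deaths in the toy data.
print(death_share(toy, {"Intent": "Suicide"}))

# Share of female victims among homicides in the toy data.
print(death_share(toy, {"Gender": "Female"}, within={"Intent": "Homicide"}))
```

Applied to the notebook's data, a call such as `death_share(df_filtered, {'Gender': 'Female'}, within={'Intent': 'Homicide'})` would be expected to reproduce the last cell's figure, though that has not been run here.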
