# Exercise Nine: Numbers

This week, you'll be exploring the [GSS](https://gssdataexplorer.norc.org/) dataset we worked with in the "Social Stats" exercise. Using our demo and the textbook as a guide, pick three new variables to explore. Your workflow should:

- 1 Import the current version of the file (available for download at the link above), and isolate the columns of interest based on the variables you want to include
- 2 Using the variable navigator provided by GSS, determine the years applicable and narrow your dataset accordingly.
- 3 Visualize at least two quantitative relationships or patterns: these might include connections between clear numerical values, such as age and income, or more complex visualizations based on boolean data (for example, our "yes" and "no" to reading fiction.)
- 4 Group the data using at least two different divisions to spot interesting trends, and plot at least one variance across a group (refer to our example of happiness among fiction readers as a starting point.)

For a bonus challenge, try running another analysis using an advanced method such as summary statistics or cross tabulation.

## Methodology for Determining Search Codes 

I am interested in the intersection of military service, firearm ownership, and support for legalization of marijuana.  Through various searches in the GSS Codebook, I found the following applicable codes.  

- MILWRKEV - EVER WORK FOR MILITARY OR DOD?
- MILWRKNW - CURRENTLY WORK FOR MILITARY OR DOD?
- VETAID - ANY IN HH RECEIVE MIL OR VET BENEFITS
- MEMVET - MEMBERSHIP IN VETERAN GROUP
- GUNLAW - FAVOR OR OPPOSE GUN PERMITS
- OWNGUN - HAVE GUN IN HOME
- GUNSDRNK - SHOULD CARRYING A FIREARM DRINKING ALCOHOL BE ILLEGAL
- GRASS - SHOULD MARIJUANA BE MADE LEGAL
- GRASSY - SHOULD MARIJUANA BE LEGAL-VERSION

Unfortunately, I could not find a simple yes-or-no code for military service.  The closest codes I found were MILWRKNW and MEMVET.  Both have their drawbacks.  MILWRKNW could refer to both military personnel and civilians working for the military.  MEMVET deals specifically with veterans, since veteran groups require military service, but members of such groups are a smaller subset of all veterans.  I’ve decided to use MILWRKNW as a stand-in for ‘yes or no military service,’ as this code may most widely capture the kind of individual I am interested in.

I will use OWNGUN for firearm ownership, as this is a relatively straightforward code.  If someone owns a gun, one would assume they support gun ownership.  I will use GRASS for approval of marijuana legalization.

I will also include some basic demographic codes in my searches to further parse the data.

## Years of Interest

I chose to focus on four years: 1975, 1991, 2007, 2018.  Except for 2018, these years signify important years in American military history.  1975 marked the end of the Vietnam War.  The Persian Gulf War occurred in 1991.  2007 marks a point of high intensity for the Global War on Terror, specifically with the troop surge in Iraq.  2018 is the last year of available data.  I will use this year to assess current opinions.
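Before committing to these years, it helps to check which variables were actually asked in each one, since GSS questions rotate in and out between surveys. The sketch below is a minimal illustration on an invented toy table (the `toy` frame and its values are assumptions, not GSS data); the same `groupby(...).count()` call could be run on the real extract.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GSS extract; the values are invented for illustration only.
toy = pd.DataFrame({
    "year":   [1975, 1975, 1991, 1991, 2018, 2018],
    "memvet": ["no", "yes", "no", "yes", np.nan, np.nan],
    "owngun": [np.nan, np.nan, "yes", "no", "no", "yes"],
    "grass":  ["legal", "NOT LEGAL", "legal", np.nan, "legal", np.nan],
})

# count() ignores missing values, so each cell is the number of
# respondents in that year who have an answer for that variable.
answered_by_year = toy.groupby("year")[["memvet", "owngun", "grass"]].count()
print(answered_by_year)
```

A zero in this table means the variable has no answers at all for that year, so any filter that requires it will drop the whole year.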

# Step one and two

Import the current version of the file (available for download at the link above), and isolate the columns of interest based on the variables you want to include.

Using the variable navigator provided by GSS, determine the years applicable and narrow your dataset accordingly.

In [27]:
import pandas as pd

# Load only the columns of interest from the GSS Stata file
columns = ['id', 'year', 'age', 'sex', 'race', 'memvet', 'owngun', 'grass']
df = pd.read_stata("GSS7218_R1.dta", columns=columns)

# Keep only the four years of interest
df = df.loc[df['year'].isin({1975, 1991, 2007, 2018})]
print(df)

         id  year age     sex   race memvet owngun      grass
4601      1  1975  38    male  white     no    NaN  NOT LEGAL
4602      2  1975  20  female  white     no    NaN  NOT LEGAL
4603      3  1975  61  female  white     no    NaN  NOT LEGAL
4604      4  1975  19    male  white     no    NaN      legal
4605      5  1975  28    male  white     no    NaN      legal
...     ...   ...  ..     ...    ...    ...    ...        ...
64809  2344  2018  37  female  white    NaN     no        NaN
64810  2345  2018  75  female  white    NaN     no        NaN
64811  2346  2018  67  female  white    NaN    yes      legal
64812  2347  2018  72    male  white    NaN    NaN  NOT LEGAL
64813  2348  2018  79  female  white    NaN    yes        NaN

[5355 rows x 8 columns]


In [28]:
# Drop respondents with no answer recorded for memvet
df = df.loc[df['memvet'].notnull()]
print(df)

         id  year age     sex   race memvet owngun      grass
4601      1  1975  38    male  white     no    NaN  NOT LEGAL
4602      2  1975  20  female  white     no    NaN  NOT LEGAL
4603      3  1975  61  female  white     no    NaN  NOT LEGAL
4604      4  1975  19    male  white     no    NaN      legal
4605      5  1975  28    male  white     no    NaN      legal
...     ...   ...  ..     ...    ...    ...    ...        ...
27774  1510  1991  70  female  white     no    NaN  NOT LEGAL
27777  1513  1991  35    male  white     no    NaN  NOT LEGAL
27779  1515  1991  30    male  white     no    NaN  NOT LEGAL
27780  1516  1991  70    male  white    yes    yes  NOT LEGAL
27781  1517  1991  47  female  white     no    NaN  NOT LEGAL

[2472 rows x 8 columns]


In [29]:
# Drop respondents with no answer recorded for owngun
df = df.loc[df['owngun'].notnull()]
print(df)

         id  year age     sex   race memvet   owngun      grass
26266     2  1991  32  female  white     no       no      legal
26268     4  1991  26  female  white     no       no      legal
26271     7  1991  46    male  black     no      yes      legal
26273     9  1991  57  female  black     no       no  NOT LEGAL
26279    15  1991  33  female  white     no       no        NaN
...     ...   ...  ..     ...    ...    ...      ...        ...
27761  1497  1991  56  female  white     no  refused  NOT LEGAL
27764  1500  1991  73  female  white     no      yes  NOT LEGAL
27769  1505  1991  66  female  white     no      yes      legal
27773  1509  1991  22    male  white     no      yes  NOT LEGAL
27780  1516  1991  70    male  white    yes      yes  NOT LEGAL

[486 rows x 8 columns]


In [30]:
# Note: the filtered result is assigned to `f`, not `df`, so `df` is unchanged
# and still contains rows with a missing grass value (e.g. row 26279 below).
f = df.loc[df['grass'].notnull()]
print(df)

         id  year age     sex   race memvet   owngun      grass
26266     2  1991  32  female  white     no       no      legal
26268     4  1991  26  female  white     no       no      legal
26271     7  1991  46    male  black     no      yes      legal
26273     9  1991  57  female  black     no       no  NOT LEGAL
26279    15  1991  33  female  white     no       no        NaN
...     ...   ...  ..     ...    ...    ...      ...        ...
27761  1497  1991  56  female  white     no  refused  NOT LEGAL
27764  1500  1991  73  female  white     no      yes  NOT LEGAL
27769  1505  1991  66  female  white     no      yes      legal
27773  1509  1991  22    male  white     no      yes  NOT LEGAL
27780  1516  1991  70    male  white    yes      yes  NOT LEGAL

[486 rows x 8 columns]


In [16]:
print(df)

Empty DataFrame
Columns: [id, year, age, sex, race, milwrknw, owngun, grass]
Index: []
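The step-by-step `notnull()` filters above can also be written as a single `dropna(subset=...)` call, and counting the surviving rows per year makes it obvious when a combination of variables leaves only one year (or nothing at all). This is a minimal sketch on an invented toy table; `toy`, `required`, and the values are assumptions for illustration, not GSS data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GSS extract; values are invented for illustration only.
toy = pd.DataFrame({
    "year":   [1975, 1991, 1991, 1991, 2018],
    "memvet": ["no", "no", "yes", "no", np.nan],
    "owngun": [np.nan, "yes", "no", np.nan, "yes"],
    "grass":  ["legal", "legal", np.nan, "NOT LEGAL", "legal"],
})

required = ["memvet", "owngun", "grass"]

# Keep only rows that have an answer for every required variable.
complete = toy.dropna(subset=required)

# Rows surviving per year; years with no complete rows simply disappear.
rows_per_year = complete.groupby("year").size()
print(rows_per_year)
```

If `rows_per_year` comes back empty, at least two of the required variables were never asked of the same respondents in the selected years.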


# Step 3

Visualize at least two quantitative relationships or patterns: these might include connections between clear numerical values, such as age and income, or more complex visualizations based on boolean data (for example, our "yes" and "no" to reading fiction.)

In [41]:
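# Take a random sample of 10 memvet answers from respondents who answered owngun
memvet_sample = df.loc[df['owngun'].notnull()].sample(10)['memvet']
# Encode "no"/"yes" as 0/1 so the mean is the share of "yes" answers
memvet_sample = memvet_sample.replace(['no', 'yes'], [0, 1])
print(memvet_sample)

print("Mean:", memvet_sample.mean())
print("Median:", memvet_sample.median())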

26708    1
27728    0
26953    0
26787    0
26746    0
26814    0
26392    0
27411    0
26519    0
26825    1
Name: memvet, dtype: int64
Mean: 0.2
Median: 0.0
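Because the answers are coded 0 and 1, the mean is simply the share of "yes" answers, and a sample of 10 gives only a rough estimate of it. The sketch below illustrates that equivalence on invented data (the `toy` frame is an assumption, not GSS data), using `map` for the encoding and `value_counts(normalize=True)` as a cross-check.

```python
import pandas as pd

# Toy stand-in: ten invented memvet answers, two of them "yes".
toy = pd.DataFrame({
    "memvet": ["no", "yes", "no", "no", "yes", "no", "no", "no", "no", "no"],
})

# Map the two answers to 0/1; any other response would become NaN
# and be skipped by mean().
is_member = toy["memvet"].map({"no": 0, "yes": 1})

# Mean of a 0/1 indicator = proportion of 1s.
share_members = is_member.mean()

# The same proportion, computed directly from the category counts.
shares = toy["memvet"].value_counts(normalize=True)

print("Share answering yes:", share_members)
print(shares)
```

Running the same two lines on the full filtered frame rather than a 10-row sample would give a steadier figure.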


Group the data using at least two different divisions to spot interesting trends, and plot at least one variance across a group (refer to our example of happiness among fiction readers as a starting point.)
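One way to approach this grouping step, and the cross-tabulation bonus, is sketched below on invented data. The `toy` frame, its values, and the name `favors_legal` are assumptions for illustration; on the real extract, rows with a missing `grass` answer should be dropped first, since `eq("legal")` would otherwise count them as not favoring legalization.

```python
import pandas as pd

# Toy stand-in for the filtered GSS extract; values are invented.
toy = pd.DataFrame({
    "sex":    ["male", "male", "female", "female", "male", "female", "male", "female"],
    "owngun": ["yes", "yes", "no", "yes", "no", "no", "yes", "no"],
    "grass":  ["legal", "NOT LEGAL", "legal", "legal", "legal", "legal", "NOT LEGAL", "NOT LEGAL"],
})

# 1 if the respondent favors legalization, 0 otherwise.
favors_legal = toy["grass"].eq("legal").astype(int)

# Two different divisions of the same data: by gun in home, and by sex.
by_gun = favors_legal.groupby(toy["owngun"]).mean()
by_sex = favors_legal.groupby(toy["sex"]).mean()

# Cross tabulation: each row sums to 1, giving the split of grass
# answers within each owngun group.
table = pd.crosstab(toy["owngun"], toy["grass"], normalize="index")

print(by_gun)
print(by_sex)
print(table)
```

Assuming matplotlib is installed, `by_gun.plot(kind="bar")` would turn the first grouping into a simple bar chart.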