#Socioeconomic status and health
##Overview
Socioeconomic status is both a strong predictor of health and a key factor underlying health inequities across populations. Low socioeconomic status can profoundly limit the capabilities of an individual or population, manifesting itself through deficiencies in both financial and social capital. A lack of financial capital clearly compromises the capacity to maintain good health.

Alongside socioeconomic status, race is another key factor in health disparities. The United States has historically had large disparities in health and in access to adequate healthcare between races, and current evidence indicates that these race-based disparities persist and remain a significant social health issue. They include differences by race in both the quality of care and overall insurance coverage. The Journal of the American Medical Association identifies race as a significant determinant of the quality of care, with ethnic minority groups receiving less intensive and lower-quality care.

Similarly, sexuality has become a major source of discrimination and inequity in health. Lesbian, gay, bisexual, and transgender (LGBT) populations experience a wide range of health problems related to their sexuality and gender identity. One of the most egregious inequities facing LGBT individuals is discrimination by healthcare workers or institutions.

At the same time, women in the United States have better access to healthcare than women in many other parts of the world, in part because they have higher rates of health insurance. This pattern of women reporting higher rates of insurance coverage is not confined to any single community; it holds for the general population of the US.

The surrounding environment can influence individual behavior and lead to poor health choices and, in turn, poor health outcomes. Minority populations have greater exposure to environmental hazards, including a lack of neighborhood resources, structural and community factors, and residential segregation, which together result in a cycle of disease and stress.

In conclusion, disparities in access to health care have many causes, including a lack of universal health care or health insurance coverage, the lack of a regular source of care, a lack of financial resources, and legal and structural barriers. A multitude of strategies for achieving health equity and reducing disparities can be found in the scholarly literature.

Bibliography  

House, J. S.; Landis, K. R.; Umberson, D. (1988). "Social relationships and health". Science 241 (4865): 540–545.  
Weinick, R. M.; Zuvekas, S. H.; Cohen, J. W. (2000). "Racial and ethnic differences in access to and use of health care services, 1977 to 1996". Medical Care Research and Review 57 (Suppl 1): 36–54.  
Gochman, David S. (1997). Handbook of health behavior research. Springer. pp. 145–147. ISBN 9780306454431.  
Merzel, C. (2000). "Gender differences in health care access indicators in an urban, low-income community". American Journal of Public Health 90 (6): 909–916.  
Gee, G. C.; Payne-Sturges, D. (2004). "Environmental health disparities: A framework integrating psychosocial and environmental concepts". Environmental Health Perspectives 112 (17): 1645–1653. doi:10.1289/ehp.7074. PMC 1253653. PMID 15579407.

##Data set
The data set that will be used for this assignment is the U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC). This survey is designed to determine the magnitude of alcohol use and psychiatric disorders in the U.S. population. It is a representative sample of the non-institutionalized population 18 years and older.

##Hypothesis
The goal of this assignment is to explore the association between socioeconomic conditions and health. The hypothesis is that a lack of social and financial capital compromises the capacity to maintain good health.
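
One way to take a first look at this hypothesis is to compare how often an outcome occurs across income groups. The sketch below is only an illustration under stated assumptions: it assumes `GENAXLIFE` is coded 1 for a lifetime diagnosis, it ignores the survey weights, and the helper name `prevalence_by_income_group` and the toy values are hypothetical rather than taken from NESARC.

```python
import pandas as pd

def prevalence_by_income_group(frame, income="S1Q10A", outcome="GENAXLIFE", groups=4):
    """Share of rows with outcome == 1 within each income quantile group.

    Assumes the outcome is coded 1 = present; survey weights are ignored.
    """
    # Ranking with method="first" breaks ties so qcut can form equal-sized bins
    ranks = frame[income].rank(method="first")
    bins = pd.qcut(ranks, q=groups, labels=False) + 1  # 1 = lowest income group
    has_outcome = frame[outcome].eq(1)
    return has_outcome.groupby(bins).mean()

# Toy data standing in for the survey (values are made up)
toy = pd.DataFrame({
    "S1Q10A": [0, 5000, 10000, 15000, 20000, 30000, 40000, 80000],
    "GENAXLIFE": [1, 1, 1, 0, 0, 1, 0, 0],
})
result = prevalence_by_income_group(toy)
print(result)
```

A falling share from group 1 to group 4 would be consistent with the hypothesis, although an unweighted comparison like this cannot by itself establish an association in the population.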

##Codebook
Variables related to socioeconomic status:
* S1Q6A - HIGHEST GRADE OR YEAR OF SCHOOL COMPLETED
* S1Q9B - OCCUPATION: CURRENT OR MOST RECENT JOB
* S1Q10A - TOTAL PERSONAL INCOME IN LAST 12 MONTHS

Variables related to health:
* ETOTLCA2 - AVERAGE DAILY VOLUME OF ETHANOL CONSUMED IN PAST YEAR, FROM ALL TYPES OF ALCOHOLIC BEVERAGES COMBINED
(NOTE: Users may wish to exclude outliers)
* GENAXLIFE - GENERALIZED ANXIETY DISORDER - LIFETIME (NON-HIERARCHICAL)
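
The codebook note about outliers in `ETOTLCA2` does not say how to exclude them. A minimal sketch, assuming Tukey's 1.5 × IQR fences are an acceptable rule (this choice is an assumption, not something the codebook prescribes), could look like the following. The helper name `drop_iqr_outliers` and the toy values are hypothetical.

```python
import pandas as pd

def drop_iqr_outliers(frame, column, k=1.5):
    """Return the rows whose `column` value lies within Tukey's IQR fences.

    Rows where `column` is missing or non-numeric are dropped as well.
    """
    values = pd.to_numeric(frame[column], errors="coerce")
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    keep = values.between(q1 - k * iqr, q3 + k * iqr)
    return frame[keep]

# Toy data standing in for the real survey column (values are made up)
toy = pd.DataFrame({"ETOTLCA2": [0.1, 0.2, 0.3, 0.4, 0.5, 50.0]})
trimmed = drop_iqr_outliers(toy, "ETOTLCA2")
print(trimmed)
```

Because alcohol consumption is typically right-skewed, a fence computed on the raw scale can remove many heavy drinkers; whether that is desirable depends on the research question.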

In [None]:
import pandas as pd

# Load the data
df = pd.read_csv('nesarc.csv')

In [11]:
# Variables related to socioeconomic status:
# S1Q6A - HIGHEST GRADE OR YEAR OF SCHOOL COMPLETED
# S1Q9B - OCCUPATION: CURRENT OR MOST RECENT JOB
# S1Q10A - TOTAL PERSONAL INCOME IN LAST 12 MONTHS

# Variables related to health:
# ETOTLCA2 - AVERAGE DAILY VOLUME OF ETHANOL CONSUMED IN PAST YEAR, FROM ALL TYPES OF ALCOHOLIC BEVERAGES COMBINED
# (NOTE: Users may wish to exclude outliers)
# GENAXLIFE - GENERALIZED ANXIETY DISORDER - LIFETIME (NON-HIERARCHICAL)

# Print first 10 rows of the dataset
print(df[['S1Q10A', 'GENAXLIFE']].head(10))

##Looking at the data

In [44]:
import pandas as pd

# Load the data; low_memory=False makes pandas infer each column's dtype
# from the whole file, which avoids mixed-dtype warnings on wide files
df = pd.read_csv('nesarc.csv', low_memory=False)



In [87]:
# Summarize the column dtypes and memory usage, then show (rows, columns)
df.info()
df.shape

(43093, 3008)
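
With roughly three thousand columns, it can be convenient to read only the variables listed in the codebook. The sketch below shows one way to do that with the `usecols` parameter of `pd.read_csv`; the helper name `load_subset` is hypothetical, and the in-memory CSV is a made-up stand-in for `nesarc.csv` so the example runs on its own.

```python
import io
import pandas as pd

# Columns named in the codebook section of this notebook
WANTED = ["S1Q6A", "S1Q9B", "S1Q10A", "ETOTLCA2", "GENAXLIFE"]

def load_subset(source, columns=WANTED):
    """Read only `columns` from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=columns, low_memory=False)

# Tiny in-memory stand-in for nesarc.csv (values are made up)
fake_csv = io.StringIO(
    "IDNUM,S1Q6A,S1Q9B,S1Q10A,ETOTLCA2,GENAXLIFE,PSU\n"
    "1,10,4,30000,0.5,0,4007\n"
    "2,8,7,12000,1.2,1,6045\n"
)
subset = load_subset(fake_csv)
print(subset.shape)
```

With the real file, the call would be `load_subset('nesarc.csv')`.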

In [85]:
import re

# Collect the names of all columns in the S11AQ1A item series
regex = re.compile(r'S11AQ1A.*')
cols = [name for name in df.columns if regex.match(name)]
cols[:10]

['S11AQ1A1',
 'S11AQ1A2',
 'S11AQ1A3',
 'S11AQ1A4',
 'S11AQ1A5',
 'S11AQ1A6',
 'S11AQ1A7',
 'S11AQ1A8',
 'S11AQ1A9',
 'S11AQ1A10']

In [122]:
df.S1Q10A.value_counts()[:10]

0        2462
30000    1346
20000    1207
12000    1197
25000    1078
40000    1034
15000    1029
10000    1024
45000     958
35000     945
Name: S1Q10A, dtype: int64

In [93]:
data = pd.read_csv('nesarc.csv', usecols = cols)

In [108]:
# Show the first rows where the first item is coded 9
data[data.S11AQ1A1 == 9].head()

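| | S11AQ1A1 | S11AQ1A2 | S11AQ1A3 | S11AQ1A4 | S11AQ1A5 | S11AQ1A6 | S11AQ1A7 | S11AQ1A8 | S11AQ1A9 | S11AQ1A10 | ... | S11AQ1A24 | S11AQ1A25 | S11AQ1A26 | S11AQ1A27 | S11AQ1A28 | S11AQ1A29 | S11AQ1A30 | S11AQ1A31 | S11AQ1A32 | S11AQ1A33 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | 9 | 9 | 9 | 9 | 2 | 2 | 2 | 2 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 66 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | ... | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 69 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | ... | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 80 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | ... | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 81 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | ... | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |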


In [98]:
# Count the respondents whose first item is coded 9
len(data[data.S11AQ1A1 == 9])

1162
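
Rows coded 9 would distort any count or mean computed on these items if 9 is a placeholder rather than a real answer. The sketch below assumes that 9 marks an unknown response in the S11AQ1A series (this should be checked against the codebook) and replaces it with NaN so that pandas skips it. The helper name `recode_unknown` and the toy values are hypothetical.

```python
import pandas as pd

def recode_unknown(frame, columns, unknown_code=9):
    """Return a copy of `frame` with `unknown_code` set to NaN in `columns`."""
    out = frame.copy()
    # mask() blanks out the cells where the condition is True
    out[columns] = out[columns].mask(out[columns].eq(unknown_code))
    return out

# Toy data (values are made up); S1Q10A is left alone on purpose,
# because 9 is a legitimate dollar amount there
toy = pd.DataFrame({
    "S11AQ1A1": [1, 2, 9, 9],
    "S11AQ1A2": [2, 9, 1, 2],
    "S1Q10A": [9, 0, 30000, 12000],
})
clean = recode_unknown(toy, ["S11AQ1A1", "S11AQ1A2"])
print(clean)
```

Restricting the recode to an explicit column list keeps the sentinel from being stripped out of variables where 9 is a valid value.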

In [None]:
# Move ETHRACE2A and ETOTLCA2 to the end of the column order
cols = [name for name in df.columns if name not in ('ETHRACE2A', 'ETOTLCA2')]
df = df[cols + ['ETHRACE2A', 'ETOTLCA2']]

# Re-read only the PSU and CYEAR columns
data = pd.read_csv('nesarc.csv', usecols=['PSU', 'CYEAR'])

In [None]:
# Variables related to antisocial personality disorder (behavior):
# S11AQ1A1 - OFTEN CUT CLASS, NOT GO TO CLASS OR GO TO SCHOOL AND LEAVE WITHOUT PERMISSION  
# S11AQ1A2 - EVER STAY OUT LATE AT NIGHT EVEN THOUGH PARENTS TOLD YOU TO STAY HOME  
# S11AQ1A3 - EVER HAVE TIME WHEN BULLIED OR PUSHED PEOPLE AROUND OR TRIED TO MAKE THEM AFRAID OF YOU  
# S11AQ1A4 - EVER RUN AWAY FROM HOME AT LEAST TWICE OR RUN AWAY AND STAY AWAY FOR A LONGER TIME  
# S11AQ1A5 - EVER HAVE A TIME WHEN OFTEN ABSENT FROM SCHOOL, OTHER THAN WHEN CARING FOR SOMEONE WHO WAS SICK 
# S11AQ1A6 - MORE THAN ONCE QUIT A JOB WITHOUT KNOWING WHERE WOULD FIND ANOTHER ONE
# S11AQ1A7 - MORE THAN ONCE QUIT A SCHOOL PROGRAM WITHOUT KNOWING WHAT WOULD DO NEXT
# S11AQ1A8 - TRAVEL FROM PLACE TO PLACE FOR 1+ MONTHS WITHOUT ADVANCE PLANS OR
# WITHOUT KNOWING HOW LONG WOULD BE GONE OR WHERE WOULD WORK
# S11AQ1A9 - EVER HAVE TIME LASTING 1+ MONTHS WHEN HAD NO REGULAR PLACE TO LIVE
# S11AQ1A10 - EVER HAVE TIME LASTING 1+ MONTHS WHEN LIVED WITH OTHERS BECAUSE DID NOT HAVE OWN PLACE TO LIVE
# S11AQ1A11 - EVER HAVE TIME WHEN YOU LIED A LOT, OTHER THAN TO AVOID BEING HURT
# S11AQ1A12 - EVER USE A FALSE OR MADE-UP NAME OR ALIAS
# S11AQ1A13 - EVER SCAM OR CON SOMEONE FOR MONEY, TO AVOID RESPONSIBILITY OR JUST FOR FUN
# S11AQ1A14 - EVER DO THINGS THAT COULD EASILY HAVE HURT YOU OR SOMEONE ELSE,
# LIKE SPEEDING OR DRIVING AFTER HAVING TOO MUCH TO DRINK
# S11AQ1A15 - EVER GET MORE THAN 3 TICKETS FOR RECKLESS/CARELESS DRIVING, SPEEDING, OR CAUSING AN ACCIDENT
# S11AQ1A16 - EVER HAVE DRIVERS LICENSE SUSPENDED OR REVOKED FOR MOVING VIOLATIONS
# S11AQ1A17 - EVER DESTROY/ BREAK/ VANDALIZE SOMEONE ELSE'S PROPERTY (CAR, HOME, ETC.)
# S11AQ1A18 - EVER START FIRE ON PURPOSE TO DESTROY SOMEONE ELSE'S PROPERTY OR JUST TO SEE IT BURN
# S11AQ1A19 - EVER FAIL TO PAY OFF DEBTS -- LIKE MOVING TO AVOID RENT, NOT MAKING PAYMENTS ON LOAN OR MORTGAGE,
# FAILING TO PAY ALIMONY OR CHILD SUPPORT OR FILING BANKRUPTCY
# S11AQ1A20 - EVER STEAL SOMETHING FROM SOMEONE/SOMEPLACE WHEN NO ONE WAS AROUND
# S11AQ1A21 - EVER FORGE SOMEONE ELSE'S SIGNATURE, LIKE ON A LEGAL DOCUMENT OR CHECK
# S11AQ1A22 - EVER SHOPLIFT
# S11AQ1A23 - EVER ROB OR MUG SOMEONE OR SNATCH A PURSE
# S11AQ1A24 - EVER MAKE MONEY ILLEGALLY, LIKE SELLING STOLEN PROPERTY OR SELLING DRUGS
# S11AQ1A25 - EVER DO SOMETHING YOU COULD HAVE BEEN ARRESTED FOR, REGARDLESS OF WHETHER YOU WERE CAUGHT OR NOT
# S11AQ1A26 - EVER FORCE SOMEONE TO HAVE SEX WITH YOU AGAINST THEIR WILL
# S11AQ1A27 - EVER GET INTO A LOT OF FIGHTS THAT YOU STARTED
# S11AQ1A28 - EVER GET INTO A FIGHT THAT CAME TO SWAPPING BLOWS WITH SOMEONE
# LIKE A HUSBAND, WIFE, BOYFRIEND OR GIRLFRIEND
# S11AQ1A29 - EVER USE A WEAPON LIKE A STICK, KNIFE OR GUN IN A FIGHT
# S11AQ1A30 - EVER HIT SOMEONE SO HARD THAT YOU INJURED THEM OR THEY HAD TO SEE A DOCTOR
# S11AQ1A31 - EVER HARASS, THREATEN OR BLACKMAIL SOMEONE
# S11AQ1A32 - EVER PHYSICALLY HURT ANOTHER PERSON IN ANY WAY ON PURPOSE
# S11AQ1A33 - EVER HURT AN ANIMAL OR PET ON PURPOSE
#
# Variables related to socioeconomic status:
# S1Q6A - HIGHEST GRADE OR YEAR OF SCHOOL COMPLETED
# S1Q9B - OCCUPATION: CURRENT OR MOST RECENT JOB
# S1Q11B - TOTAL FAMILY INCOME IN LAST 12 MONTHS: CATEGORY