# Demographics and Employment in the United States

In the wake of the Great Recession, which ended in 2009, there has been considerable focus on employment statistics, one of the most important metrics policymakers use to gauge the overall strength of the economy. In the United States, the government measures unemployment using the **Current Population Survey (CPS)**, which each month collects demographic and employment information from a wide range of Americans. In this exercise, we will apply the topics reviewed in the lectures, along with a few new techniques, to the September 2013 version of this rich, nationally representative dataset.

The observations in the dataset represent people surveyed in the September 2013 CPS who completed the survey. While the full dataset has 385 variables, this exercise uses a more compact version, CPSData.csv. How many interviewees are in the dataset?

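```r
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# matching the <fct> columns in the output below
CPS <- read.csv('data/CPSData.csv', stringsAsFactors = TRUE)
head(CPS)
```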

| | PeopleInHousehold | Region | State | MetroAreaCode | Age | Married | Sex | Education | Race | Hispanic | CountryOfBirthCode | Citizenship | EmploymentStatus | Industry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<fct>` | `<fct>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<fct>` |
| 1 | 1 | South | Alabama | 26620 | 85 | Widowed | Female | Associate degree | White | 0 | 57 | Citizen, Native | Retired | |
| 2 | 3 | South | Alabama | 13820 | 21 | Never Married | Male | High school | Black | 0 | 57 | Citizen, Native | Unemployed | Professional and business services |
| 3 | 3 | South | Alabama | 13820 | 37 | Never Married | Female | High school | Black | 0 | 57 | Citizen, Native | Disabled | |
| 4 | 3 | South | Alabama | 13820 | 18 | Never Married | Male | No high school diploma | Black | 0 | 57 | Citizen, Native | Not in Labor Force | |
| 5 | 3 | South | Alabama | 26620 | 52 | Widowed | Female | Associate degree | White | 0 | 57 | Citizen, Native | Employed | Professional and business services |
| 6 | 3 | South | Alabama | 26620 | 24 | Never Married | Male | Bachelor's degree | White | 0 | 57 | Citizen, Native | Employed | Educational and health services |
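Because each row of the data frame is one interviewee, `nrow()` answers the question directly; on the full data, `nrow(CPS)` should return the same 131302 observations that `str(CPS)` reports. The sketch below is self-contained: `cps_head` is a hypothetical name for a small stand-in built from the Age and Sex values of the six rows printed by `head(CPS)`, so it runs without the CSV file.

```r
# Hypothetical stand-in: Age and Sex of the six rows printed by head(CPS),
# so this snippet runs without data/CPSData.csv
cps_head <- data.frame(
  Age = c(85L, 21L, 37L, 18L, 52L, 24L),
  Sex = factor(c("Female", "Male", "Female", "Male", "Female", "Male"))
)

# One row per interviewee, so the row count is the number of interviewees
n_interviewees <- nrow(cps_head)
print(n_interviewees)  # [1] 6

# dim() reports rows and columns together
print(dim(cps_head))   # [1] 6 2
```

Replacing `cps_head` with `CPS` gives the answer for the full dataset.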


```r
str(CPS)
```

```
'data.frame':	131302 obs. of  14 variables:
 $ PeopleInHousehold : int  1 3 3 3 3 3 3 2 2 2 ...
 $ Region            : Factor w/ 4 levels "Midwest","Northeast",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ State             : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ MetroAreaCode     : int  26620 13820 13820 13820 26620 26620 26620 33660 33660 26620 ...
 $ Age               : int  85 21 37 18 52 24 26 71 43 52 ...
 $ Married           : Factor w/ 5 levels "Divorced","Married",..: 5 3 3 3 5 3 3 1 1 3 ...
 $ Sex               : Factor w/ 2 levels "Female","Male": 1 2 1 2 1 2 2 1 2 2 ...
 $ Education         : Factor w/ 8 levels "Associate degree",..: 1 4 4 6 1 2 4 4 4 2 ...
 $ Race              : Factor w/ 6 levels "American Indian",..: 6 3 3 3 6 6 6 6 6 6 ...
 $ Hispanic          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ CountryOfBirthCode: int  57 57 57 57 57 57 57 57 57 57 ...
 $ Citizenship       : Factor w/ 3 levels "Citizen, Native",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ EmploymentSt
```
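For each factor column, `str()` prints the first few levels followed by integer codes, where a code is the position of the value among the levels (sorted order when `read.csv` builds the factor). Region's leading `3`s therefore mean its third level, South, matching the `head()` output. The sketch below reproduces Married's codes `5 3 3 3 5 3` for the first six interviewees; `married` is a hypothetical name, and the five levels are the ones `summary(CPS)` lists for the full data.

```r
# Hypothetical stand-in: Married values of the six rows printed by head(CPS),
# with the five levels that summary(CPS) reports for the full data
married <- factor(
  c("Widowed", "Never Married", "Never Married",
    "Never Married", "Widowed", "Never Married"),
  levels = c("Divorced", "Married", "Never Married", "Separated", "Widowed")
)

print(levels(married))      # level labels, in code order
print(as.integer(married))  # the codes str() shows: 5 3 3 3 5 3
str(married)
```

On the full data, `levels(CPS$Region)` shows why code 3 stands for South.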

```r
summary(CPS)
```

```
 PeopleInHousehold       Region               State       MetroAreaCode  
 Min.   : 1.000    Midwest  :30684   California  :11570   Min.   :10420  
 1st Qu.: 2.000    Northeast:25939   Texas       : 7077   1st Qu.:21780  
 Median : 3.000    South    :41502   New York    : 5595   Median :34740  
 Mean   : 3.284    West     :33177   Florida     : 5149   Mean   :35075  
 3rd Qu.: 4.000                      Pennsylvania: 3930   3rd Qu.:41860  
 Max.   :15.000                      Illinois    : 3912   Max.   :79600  
                                     (Other)     :94069   NA's   :34238  
      Age                 Married          Sex       
 Min.   : 0.00   Divorced     :11151   Female:67481  
 1st Qu.:19.00   Married      :55509   Male  :63821  
 Median :39.00   Never Married:30772                 
 Mean   :38.83   Separated    : 2027                 
 3rd Qu.:57.00   Widowed      : 6505                 
 Max.   :85.00   NA's         :25338                 
```
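For a data frame, `summary()` prints at most seven rows per column (its `maxsum` argument defaults to 7), so State shows only the six most frequent states and folds the remaining 94069 interviewees into `(Other)`. A complete, frequency-sorted table comes from `table()` plus `sort()`, and the `NA's` counts come from `is.na()`. The sketch below applies these calls to hypothetical stand-ins (`married`, `metro`) copied from the six rows printed by `head(CPS)`.

```r
# Hypothetical stand-ins: Married and MetroAreaCode of the six rows printed
# by head(CPS); on the full data, use CPS$Married, CPS$MetroAreaCode, etc.
married <- factor(c("Widowed", "Never Married", "Never Married",
                    "Never Married", "Widowed", "Never Married"))
metro <- c(26620L, 13820L, 13820L, 13820L, 26620L, 26620L)

# Full frequency table, most common level first (the ordering summary()
# uses, before truncating, for factors with many levels)
married_counts <- sort(table(married), decreasing = TRUE)
print(married_counts)  # Never Married 4, Widowed 2

# table() also counts integer codes such as MetroAreaCode
metro_counts <- table(metro)
print(metro_counts)    # 13820: 3, 26620: 3

# Number of missing values (none in this excerpt)
n_missing <- sum(is.na(metro))
print(n_missing)
```

On the full data, `sort(table(CPS$State), decreasing = TRUE)` lists all 51 states, and `colSums(is.na(CPS))` gives the missing-value count for every column at once.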
                              