# [CP-113A] Analyzing the CPS ASEC 


---


### Professor: Sara Hinkey

Welcome to Analyzing the CPS ASEC! In this lab, we will introduce you to Jupyter Notebooks and the tools you will use to analyze the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS). In the main part of the notebook, you will work with the CPS ASEC dataset and learn how to read tabular datasets, interpret data visualizations, and compare various economic indicators. At the end of the notebook, you will apply what you have learned to answer a set of short-answer questions.

Estimated Time: ______

---


## Table of Contents (TBD, will be updated later)

1. 

---


## Today's lab (TBD, will be updated later)


1. Navigate the Jupyter Notebook 




# Part 1: The Jupyter Notebook <a id='section 0'></a>

Before we start the lab, here is a brief introduction to Jupyter Notebooks (like this one), where you will conduct your survey analysis.

**Jupyter notebooks** are documents that combine text, code, visualizations, and more. A notebook is composed of two types of rectangular **cells**: Markdown and code. A **Markdown cell**, such as this one, contains text. A **code cell** contains code. All of the code in this notebook is written in a programming language called **Python**. You can select any cell by clicking it once. After a cell is selected, you can navigate the notebook using the up and down arrow keys or by simply scrolling.

### 1.1 Run a cell <a id='subsection 0a'></a>
To run a code cell once it's been selected, 
- press `Shift` + `Enter`, or
- click the Run button in the toolbar at the top of the screen. 

If a code cell is running, you will see an asterisk (\*) appear in the square brackets to the left of the cell. Once the cell has finished running, a number corresponding to the order in which the cell was run will replace the asterisk and any output from the code will appear under the cell.
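
As a quick check of the steps above, here is a toy code cell you could run with `Shift` + `Enter`. The variable names are purely illustrative and are not part of the lab's dataset.

```python
# A toy code cell: compute a value and display it under the cell.
hours_per_week = 40
weeks_per_year = 52
annual_hours = hours_per_week * weeks_per_year
print(annual_hours)
```

While the cell runs, the brackets to its left show an asterisk. Once it finishes, they show a number and the printed value appears under the cell.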

### 1.2 Editing a cell <a id='subsection 0c'></a>

**Question 1.2.1** You can edit a Markdown cell by double-clicking it. Text in Markdown cells is written in [**Markdown**](https://daringfireball.net/projects/markdown/), a formatting syntax for plain text, so you may see some funky symbols when you edit a Markdown cell. Once you've made your changes, you can exit editing mode by running the cell. Edit the next cell to fix the misspelling.

This is an analysis of economic survy data.

### 1.3 Saving and loading <a id='subsection 0d'></a>


Your notebook records all of your text and code edits, as well as any graphs you generate or calculations you make. You can save the notebook in its current state by pressing `Control-S`/`Command-S`, clicking the **floppy disk icon** in the toolbar at the top of the page, or navigating to **File > Save and Checkpoint** in the menu bar.

The next time you open the notebook, it will look the same as when you last saved it.

**Note:** After loading a notebook, you will see all the outputs (graphs, computations, etc.) from your last session, but you won't be able to use any variables you assigned or functions you defined. You can get the functions and variables back by re-running the cells where they were defined. The easiest way is to **highlight the cell where you left off work, then go to Cell > Run all above** in the menu bar. You can also use this menu to run all cells in the notebook by clicking **Run all**.

**Please run the cell below to load the modules we will be using throughout this notebook.**

In [64]:
import datascience 
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

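# Alternative using the datascience package (left disabled):
# data = datascience.Table.read_table('cleanedData.csv')
# Load the raw extract; parse the 'age' column as integers.
df = pd.read_csv('originalData.csv', dtype={'age': int})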
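
To see what `pd.read_csv` with a `dtype` argument does, here is a minimal sketch on a small in-memory CSV. The three rows and the name `sample_df` are hypothetical stand-ins for `originalData.csv`, which is not reproduced here.

```python
import io
import pandas as pd

# Hypothetical three-row extract standing in for 'originalData.csv'.
sample_csv = io.StringIO(
    "year,age,hhincome\n"
    "2007,32,15000\n"
    "2007,52,72000\n"
    "2007,17,72000\n"
)

# dtype={'age': int} asks pandas to parse the 'age' column as integers.
sample_df = pd.read_csv(sample_csv, dtype={'age': int})

print(sample_df.shape)    # (number of rows, number of columns)
print(sample_df.dtypes)   # data type of each column
print(sample_df.head(2))  # first two rows
```

Checking `shape`, `dtypes`, and `head()` right after loading is a quick way to confirm that the file was read the way you expected.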

# Part 2: Understanding the Dataset

In [65]:
# Map the raw column codes to human-readable column names.
mapper = {
    'year': 'Year', 'hhwt': 'Household Weight', 'statefip': 'State',
    'met2013': 'County', 'city': 'City', 'ownershp': 'Ownership Status',
    'hhincome': 'Household Income', 'foodstmp': 'Foodstamp Recipient',
    'age': 'Age', 'race': 'Race', 'hispan': 'Hispanic',
    'educ': 'Education Status', 'empstat': 'Employment Status',
    'labforce': 'Part of Labor Force', 'occ': 'Occupation', 'ind': 'Industry',
    'uhrswork': 'Hours worked per week', 'ftotinc': 'Pre-Tax Income',
    'incwage': 'Pre-Tax Wages', 'poverty': 'Poverty Status',
    'classwkr': 'Type of Worker', 'classwkrd': 'Employment Sector',
    'inctot': 'Personal Pre-Tax Income',
}

In [67]:
df

Unnamed: 0,year,hhwt,statefip,countyfip,met2013,city,ownershp,hhincome,foodstmp,perwt,...,ind,classwkr,classwkrd,uhrswork,inctot,ftotinc,incwage,poverty,pwcounty,pwmet13
0,2007,266,California,25,"El Centro, CA",Not in identifiable city (or size group),Rented,15000,No,270,...,"Agriculture, Hunting, Forestry",Works for wages,Wage/salary at non-profit,40,15000,15000,15000,Near Poverty,25,"El Centro, CA"
1,2007,67,California,0,Not in identifiable area,Not in identifiable city (or size group),Owned or being bought (loan),72000,No,66,...,Health Care,Works for wages,"Wage/salary, private",45,70000,72000,70000,Non-Poverty,0,Not in identifiable area
2,2007,67,California,0,Not in identifiable area,Not in identifiable city (or size group),Owned or being bought (loan),72000,No,86,...,Retail Trade,Works for wages,"Wage/salary, private",20,2000,72000,2000,Non-Poverty,0,Not in identifiable area
3,2007,91,California,67,"Sacramento--Roseville--Arden-Arcade, CA",Not in identifiable city (or size group),Owned or being bought (loan),147800,No,92,...,Company Management,Works for wages,Wage/salary at non-profit,25,147800,147800,86000,Non-Poverty,67,"Sacramento--Roseville--Arden-Arcade, CA"
4,2007,193,California,1,"San Francisco-Oakland-Hayward, CA",Not in identifiable city (or size group),Rented,52000,No,193,...,Finance and Insurance,Works for wages,"Wage/salary, private",40,52000,52000,52000,Non-Poverty,1,"San Francisco-Oakland-Hayward, CA"
5,2007,92,California,37,"Los Angeles-Long Beach-Anaheim, CA","Long Beach, CA",Rented,129000,No,91,...,Finance and Insurance,Works for wages,"Wage/salary, private",50,120000,129000,120000,Non-Poverty,59,"Los Angeles-Long Beach-Anaheim, CA"
6,2007,92,California,37,"Los Angeles-Long Beach-Anaheim, CA","Long Beach, CA",Rented,129000,No,92,...,Educational Services,Works for wages,"Wage/salary, private",25,9000,129000,9000,Non-Poverty,37,"Los Angeles-Long Beach-Anaheim, CA"
7,2007,73,California,0,Not in identifiable area,Not in identifiable city (or size group),Owned or being bought (loan),38500,No,73,...,Company Management,Works for wages,"Wage/salary, private",36,38500,38500,19600,Non-Poverty,0,Not in identifiable area
8,2007,83,California,67,"Sacramento--Roseville--Arden-Arcade, CA",Not in identifiable city (or size group),Owned or being bought (loan),158000,No,81,...,Health Care,Works for wages,"Wage/salary, private",40,48000,48000,48000,Non-Poverty,67,"Sacramento--Roseville--Arden-Arcade, CA"
9,2007,83,California,67,"Sacramento--Roseville--Arden-Arcade, CA",Not in identifiable city (or size group),Owned or being bought (loan),158000,No,90,...,Health Care,Works for wages,"Wage/salary, private",40,110000,48000,110000,Non-Poverty,67,"Sacramento--Roseville--Arden-Arcade, CA"


In [68]:
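# Shorten the sector labels, e.g. 'Wage/salary, private' -> 'private'.
# regex=False treats the patterns as literal text; strip(', ') trims leftover commas and spaces.
df['classwkrd'] = df['classwkrd'].str.replace('Wage/salary', '', regex=False).str.replace(' at ', ' ', regex=False).str.strip(', ')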
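
To see what this kind of label cleanup does, here is a sketch on a toy Series. The two labels are copied from the `classwkrd` values shown in the table above. The chain is one possible variant and may differ in detail from the cell in this notebook: it passes `regex=False` so the patterns are treated as literal text, and it trims both commas and spaces.

```python
import pandas as pd

# Illustrative labels in the style of the 'classwkrd' column.
labels = pd.Series([
    "Wage/salary, private",
    "Wage/salary at non-profit",
])

cleaned = (
    labels
    .str.replace("Wage/salary", "", regex=False)  # drop the shared prefix
    .str.replace(" at ", " ", regex=False)        # drop the connecting "at"
    .str.strip(", ")                              # trim leftover commas and spaces
)

print(cleaned.tolist())  # ['private', 'non-profit']
```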

In [69]:
df.columns

Index(['year', 'hhwt', 'statefip', 'countyfip', 'met2013', 'city', 'ownershp',
       'hhincome', 'foodstmp', 'perwt', 'age', 'race', 'hispan', 'educ',
       'empstat', 'empstatd', 'labforce', 'occ', 'ind', 'classwkr',
       'classwkrd', 'uhrswork', 'inctot', 'ftotinc', 'incwage', 'poverty',
       'pwcounty', 'pwmet13'],
      dtype='object')

In [70]:
new_df = df.drop(columns = ['countyfip', 'pwmet13', 'pwcounty', 'perwt', 'empstatd']).rename(columns = mapper)

In [71]:
new_df.columns

Index(['Year', 'Household Weight', 'State', 'County', 'City',
       'Ownership Status', 'Household Income', 'Foodstamp Recipient', 'Age',
       'Race', 'Hispanic', 'Education Status', 'Employment Status',
       'Part of Labor Force', 'Occupation', 'Industry', 'Type of Worker',
       'Employment Sector', 'Hours worked per week', 'Personal Pre-Tax Income',
       'Pre-Tax Income', 'Pre-Tax Wages', 'Poverty Status'],
      dtype='object')
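
The drop-then-rename pattern used above can be sketched on a toy DataFrame. The names `toy`, `toy_mapper`, and `renamed` are hypothetical. Keys in the mapper that do not match any column are ignored by `rename` by default, and the original DataFrame is left unchanged because both methods return a new object.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2007, 2007],
    "hhincome": [15000, 72000],
    "perwt": [270, 66],
})

# 'unused' matches no column, so rename() skips it by default.
toy_mapper = {"year": "Year", "hhincome": "Household Income", "unused": "Unused"}

# drop() removes a column; rename() relabels the remaining ones.
renamed = toy.drop(columns=["perwt"]).rename(columns=toy_mapper)

print(list(renamed.columns))  # ['Year', 'Household Income']
print(list(toy.columns))      # ['year', 'hhincome', 'perwt']
```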

In [72]:
new_df.head()

Unnamed: 0,Year,Household Weight,State,County,City,Ownership Status,Household Income,Foodstamp Recipient,Age,Race,...,Part of Labor Force,Occupation,Industry,Type of Worker,Employment Sector,Hours worked per week,Personal Pre-Tax Income,Pre-Tax Income,Pre-Tax Wages,Poverty Status
0,2007,266,California,"El Centro, CA",Not in identifiable city (or size group),Rented,15000,No,32,White,...,"Yes, in the labor force","Farming, Fishing","Agriculture, Hunting, Forestry",Works for wages,non-profit,40,15000,15000,15000,Near Poverty
1,2007,67,California,Not in identifiable area,Not in identifiable city (or size group),Owned or being bought (loan),72000,No,52,White,...,"Yes, in the labor force",Healthcare Practitioners,Health Care,Works for wages,private,45,70000,72000,70000,Non-Poverty
2,2007,67,California,Not in identifiable area,Not in identifiable city (or size group),Owned or being bought (loan),72000,No,17,White,...,"Yes, in the labor force",Transportation of Materials,Retail Trade,Works for wages,private,20,2000,72000,2000,Non-Poverty
3,2007,91,California,"Sacramento--Roseville--Arden-Arcade, CA",Not in identifiable city (or size group),Owned or being bought (loan),147800,No,71,White,...,"Yes, in the labor force",Life & Social Sciences,Company Management,Works for wages,non-profit,25,147800,147800,86000,Non-Poverty
4,2007,193,California,"San Francisco-Oakland-Hayward, CA",Not in identifiable city (or size group),Rented,52000,No,34,Two or more races,...,"Yes, in the labor force",Office & Administrative Support,Finance and Insurance,Works for wages,private,40,52000,52000,52000,Non-Poverty


In [73]:
new_df.to_csv('fullData.csv', index = False, header = True)
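
The `index = False` argument matters when the file is read back later. Here is a small sketch using in-memory buffers instead of real files; the toy DataFrame and variable names are hypothetical.

```python
import io
import pandas as pd

toy = pd.DataFrame({"Year": [2007, 2007], "Household Income": [15000, 72000]})

# Default behaviour: the row index is written as an extra, unnamed column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'Year', 'Household Income']
print(cols_without_index)  # ['Year', 'Household Income']
```

Writing without the index keeps a stray `Unnamed: 0` column from appearing when the saved file is loaded again.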

# Part 3: Data Analysis

## A) Wages by industry and occupation

**Insert code and analysis, as well as short answer questions.**

## B) Median income

**Insert code and analysis, as well as short answer questions.**

## C) Education level 

**Insert code and analysis, as well as short answer questions.**

## D) Indicators of poverty

**Insert code and analysis, as well as short answer questions.**

## E) Employment

**Insert code and analysis, as well as short answer questions.**

___
### Getting extra help

Interested in getting help with learning Python or applying computational analysis? Check out [Data Peer Consulting](https://data.berkeley.edu/education/data-peer-consulting) in Moffitt Library for drop-in, one-on-one questions. For additional workshops designed for people new to computational analysis, take a look at the workshops at the [D-Lab](https://dlab.berkeley.edu) (free for Berkeley students!).

Best of luck!

------------------------------------------------------------------------------------------------------------------------

### Feedback:
Please let us know your thoughts on this notebook!

Fill out the survey at this link: https://docs.google.com/forms/d/e/1FAIpQLSfahkYSKqlEEfC6WMKlaqeIxRVj0r7T4N5lgBf9bRVwRG58wQ/viewform

------------------------------------------------------------------------------------------------------------------------
Notebook developed by: Ritvik Iyer, Carlos Calderon

Data Science Modules: http://data.berkeley.edu/education/modules