
BrooksIan/CensusSIPP


Data Science in Apache Spark

Census - SIPP Workbook

Report Building

Level: Easy

Language: Scala

Requirements:

  • HDP 2.6.x
  • Spark 2.x

Author: Ian Brooks. Follow: [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)

Census

Source File Description

The source data file is the [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html); upload it to HDFS in the /tmp directory.

Column Descriptions

Pre-Run Instructions

For HDP with Apache Zeppelin

  1. Log into Apache Ambari

  2. In Ambari, select "Files View" and upload all of the CSV files to the /tmp/ directory. For assistance, please use the following tutorial.

  3. Upload the source data file [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html) to HDFS in the /tmp directory.

  4. Upload all of the helper files to HDFS in the /tmp directory:

       a. SIPP08A.csv
       b. SIPP08B.csv
       c. SIPP08C.csv
       d. SIPP08D.csv

  5. Download the Zeppelin Note JSON file and import it into Zeppelin. For assistance, please use the following tutorial.
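Once the files are in /tmp, a notebook paragraph can load them into Spark. The following is a minimal sketch, not code taken from the workbook's Zeppelin note: it assumes each helper CSV has a header row and that Spark's schema inference is acceptable, and the names `LoadSippHelpers`, `helperPaths` and `readCsv` are hypothetical. Inside Zeppelin a `spark` session already exists, so only `helperPaths` and `readCsv` would be needed there; `main` is only for running the sketch as a standalone job with `spark-submit`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object: loads the SIPP helper files uploaded to /tmp in HDFS.
// Assumptions (not from the original notebook): every CSV has a header row,
// and Spark's inferred column types are good enough for report building.
object LoadSippHelpers {
  // HDFS directory used in the pre-run instructions.
  val HdfsDir = "/tmp"

  // Helper file names listed in step 4.
  val HelperFiles: Seq[String] =
    Seq("SIPP08A.csv", "SIPP08B.csv", "SIPP08C.csv", "SIPP08D.csv")

  // Full paths for the helper files; a path without a scheme resolves
  // against the cluster's default file system (HDFS on HDP).
  def helperPaths(dir: String = HdfsDir): Seq[String] =
    HelperFiles.map(name => s"$dir/$name")

  // Read one CSV file into a DataFrame.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")      // assumed: first line holds column names
      .option("inferSchema", "true") // let Spark guess numeric vs. string columns
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CensusSIPP-load").getOrCreate()

    // Map each file name to its DataFrame so later steps can look them up by name.
    val frames: Map[String, DataFrame] =
      HelperFiles.zip(helperPaths()).map { case (name, path) =>
        name -> readCsv(spark, path)
      }.toMap

    // Print a quick size summary of each loaded file.
    frames.foreach { case (name, df) =>
      println(s"$name: ${df.count()} rows, ${df.columns.length} columns")
    }

    spark.stop()
  }
}
```

Keeping the paths in one function makes it easy to point the same code at a different directory, for example `helperPaths("/user/zeppelin/sipp")`, if the files were uploaded elsewhere.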

For Cloudera Data Science Workbench

  1. Log into CDSW and upload the project
  2. Open a terminal in a session and run the loaddata.sh script.
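After running loaddata.sh, a quick check can confirm that the data landed in HDFS. This is a sketch under an unconfirmed assumption: that loaddata.sh places the same SIPP08A.csv to SIPP08D.csv helper files in /tmp, as in the HDP instructions. The names `CheckUpload` and `missing` are hypothetical. The check uses the Hadoop `FileSystem` API, which is normally on the classpath of a Spark session.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical post-load check: reports which expected files are absent.
object CheckUpload {
  // Assumed file list; adjust if loaddata.sh loads different files.
  val Expected: Seq[String] =
    Seq("SIPP08A.csv", "SIPP08B.csv", "SIPP08C.csv", "SIPP08D.csv")

  // Returns the expected names that do NOT exist under `dir`, in the original order.
  def missing(fs: FileSystem, dir: String, names: Seq[String]): Seq[String] =
    names.filterNot(name => fs.exists(new Path(dir, name)))

  def main(args: Array[String]): Unit = {
    // Uses the default file system from the cluster configuration (HDFS on a cluster node).
    val fs = FileSystem.get(new Configuration())
    val absent = missing(fs, "/tmp", Expected)
    if (absent.isEmpty) println("All helper files found in /tmp")
    else println(s"Missing from /tmp: ${absent.mkString(", ")}")
  }
}
```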

License

Unlike most Apache projects, which use the Apache License, this project uses The Star And Thank Author License (SATA). Please see the LICENSE file for more information.

About

Reproducing Census SIPP Reports Using Apache Spark
