GitHub - BrooksIan/CensusSIPP: Reproducing Census SIPP Reports Using Apache Spark

Data Science in Apache Spark

Census - SIPP Workbook

Report Building

Level: Easy

Language: Scala

Requirements:

[HDP 2.6.X]
Spark 2.x

Author: Ian Brooks

Follow [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)

Source File Description

Upload the source data file [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html) to HDFS in the /tmp directory.

Column Descriptions

Pre-Run Instructions

For HDP with Apache Zeppelin

Log into Apache Ambari
In Ambari, select "Files View" and upload all of the CSV files to the /tmp/ directory. For assistance, please use the following tutorial.
Upload the source data file [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html) to HDFS in the /tmp directory.
Upload all of the helper files to HDFS in the /tmp directory:

a. SIPP08A.csv

b. SIPP08B.csv

c. SIPP08C.csv

d. SIPP08D.csv

In Zeppelin, download the Zeppelin Note JSON file. For assistance, please use the following tutorial.
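If a terminal with the HDFS client is available, the helper-file upload described above can also be scripted instead of using Files View. The following is a minimal sketch, not the contents of loaddata.sh: it assumes the `hdfs` command is on the PATH and that the four helper files sit in the current directory. The `HDFS_CMD` variable and the `upload_helpers` function are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: copy the SIPP helper files into /tmp on HDFS.
# Assumption: the `hdfs` client is installed; override HDFS_CMD to use another binary.
set -eu

HDFS_CMD="${HDFS_CMD:-hdfs}"
TARGET_DIR="/tmp"

# upload_helpers (hypothetical helper): puts each helper CSV into TARGET_DIR.
upload_helpers() {
  for f in SIPP08A.csv SIPP08B.csv SIPP08C.csv SIPP08D.csv; do
    # Stop early if a local file is missing rather than uploading a partial set.
    if [ ! -f "$f" ]; then
      echo "missing local file: $f" >&2
      return 1
    fi
    # -f overwrites an existing copy in HDFS.
    "$HDFS_CMD" dfs -put -f "$f" "$TARGET_DIR/"
  done
}
```

Calling `upload_helpers` from the directory that holds the CSV files performs the upload; the result can be checked afterwards with `hdfs dfs -ls /tmp`.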

For Cloudera Data Science Workbench

Log into CDSW and upload the project
Open a terminal on a session and run the loaddata.sh script

License

Unlike other Apache projects, which use the Apache License, this project uses an advanced and modern license named the Star And Thank Author License (SATA). Please see the LICENSE file for more information.
