Level: Easy
Language: Scala
Requirements:
- [HDP 2.6.X]
- Spark 2.x
Author: Ian Brooks. Follow [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)
## Column Descriptions
- Log into Apache Ambari
- In Ambari, select "Files View" and upload all of the CSV files to the /tmp/ directory. For assistance, please use the following tutorial.
- Upload the source data file [CFS 2012 csv](https://www.census.gov/econ/cfs/pums.html) to HDFS in the /tmp directory
- Upload all of the helper files to HDFS in the /tmp directory:
a. SIPP08A.csv
b. SIPP08B.csv
c. SIPP08C.csv
d. SIPP08D.csv
- In Zeppelin, download the Zeppelin Note JSON file. For assistance, please use the following tutorial.
- Log into CDSW and upload the project
- Open a terminal on a session and run the loaddata.sh script
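
Before running the note, it can help to confirm that the helper files actually landed in /tmp on HDFS. The following is a minimal sketch, not part of the project itself: the object name `VerifyUploads` and its helper functions are hypothetical, and it assumes the first row of each helper CSV is a header, which the steps above do not state. It uses the standard Hadoop `FileSystem` API and the Spark 2.x `SparkSession` reader.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the names here are illustrative only.
object VerifyUploads {
  // Helper file names as listed in the upload step.
  val helperFiles: Seq[String] =
    Seq("SIPP08A.csv", "SIPP08B.csv", "SIPP08C.csv", "SIPP08D.csv")

  // Build the full path of each helper file under the given directory.
  def helperPaths(dir: String): Seq[String] = {
    val base = dir.stripSuffix("/")
    helperFiles.map(name => base + "/" + name)
  }

  // Return the paths that do not exist on the given file system.
  def missing(fs: FileSystem, paths: Seq[String]): Seq[String] =
    paths.filterNot(p => fs.exists(new Path(p)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("verify-uploads").getOrCreate()
    // Reuse the Hadoop configuration Spark was started with, so the
    // default file system is the cluster's HDFS.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val paths = helperPaths("/tmp")
    val absent = missing(fs, paths)

    if (absent.nonEmpty) {
      println("Missing from HDFS: " + absent.mkString(", "))
    } else {
      // Assumption: the first row of the CSV is a header row.
      val df = spark.read.option("header", "true").csv(paths.head)
      df.printSchema()
    }
    spark.stop()
  }
}
```

Inside Zeppelin or `spark-shell`, where a `spark` session already exists, the body of `main` can be pasted without the builder and `spark.stop()` lines.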
Unlike other Apache projects, which use the Apache License, this project uses The Star And Thank Author License (SATA). Please see the LICENSE file for more information.