Swift Application for Hadoop Streaming 简体中文
This project demonstrates how to build a Hadoop Map Reduce application in Swift language.
This package builds with Swift Package Manager and is part of the Perfect project. It was written to be stand-alone and so does not require PerfectLib or any other components.
Ensure you have installed and activated the latest Swift 3.0 tool chain.
We are transitioning to using JIRA for all bugs and support related issues, therefore the GitHub issues has been disabled.
If you find a mistake, bug, or any other helpful suggestion you'd like to make on the docs please head over to http://jira.perfect.org:8080/servicedesk/customer/portal/1 and raise it.
A comprehensive list of open issues can be found at http://jira.perfect.org:8080/projects/ISS/issues
This project contains two kinds of Hadoop Streaming applications: mapper and reducer. Both applications are standard console programs which read from standard input stream readLine()
and generate results into output stream print()
.
The mapper sample application is to read words one by one from the input and print them out int such a format, given the input content as Hello, world! hello!
:
hello 1
world 1
hello 1
And the objective of this sample reducer is to count every word and generate out text like:
hello 2
world 1
The combination of the both applications can provide the function of counting words in a text.
Hadoop Map Reduce is design to do these tasks for large date input, in giga bytes or tera bytes.
As standard streaming application, there is no special requirement for building these apps. Simply open the terminal console and run swift build command, as demo below:
$ cd mapper
$ swift build
$ cd ../reducer
$ swift build
Before deploying to Hadoop, you can test the both apps in such a command line (the testdata.txt is just a regular text file coded in asc-ii or UTF-8). All test examples and test scripts are available in this repo.
$ cat testdata.txt | ./mapper/.build/release/mapper | sort | ./reducer/.build/release/reducer
Equivalent to the above pipeline operations, you can try a similar command line on a Hadoop cluster:
$ mapred streaming -input /user/rockywei/input -output /user/rockywei/output -mapper /usr/local/bin/mapper -reducer /usr/local/bin/reducer
If success, you can check the output result on Hadoop Cluster:
$ hadoop fs -cat /user/rockywei/output/part-00000
Details of the map reduce command line above are explained here:
-
mapred streaming
: Submit a new map reduce application, in streaming mode, i.e., text only. -
-input /user/rockywei/input
: the data input folder on HADOOP HDFS system. Typically you should ask the hadoop administrator to help you create such as folder by using command line ofhadoop fs -mkdir
and then upload the input source text file by command linehadoop fs -put [cluster folder] /local/pathto/data.txt
. -
-output /user/rockywei/output
: the data output folder on HADOOP HDFS system. NOTE this folder should not be created, i.e, what you need to do should only create the/user/rockywei
folder and let map reduce programs to create the full path by themselves. -
-mapper /usr/local/bin/mapper
: the swift mapper app we just build. You can install it into the local file system by command ofswift build; sudo mv ./.build/release/mapper /usr/local/bin
. -
-reducer /usr/local/bin/reducer
: the swift reducer app we just build. You can install it into the local file system by command ofswift build; sudo mv ./.build/release/reducer /usr/local/bin
.
Hadoop is an eco-system for large file processing - HDFS, Map-Reduce and YARN as the most fundamental components.
As the demo above, building an app for Hadoop Streaming in Swift is easy and framework independent. However, beside building apps for Hadoop, you can even go further in Swift by the power of Perfect Hadoop - Submit the app, upload and download data, monitor all the jobs, control all nodes in cluster - all these service side activities can be manipulated in Swift language now!
For more information on the Perfect project, please visit perfect.org.