[10] Running Hadoop MapReduce with .NET Core - Introduction to Streaming

This folder contains the material for chapter 10 of a 30-article series: how to run MapReduce with .NET Core through Hadoop Streaming.

For details, see the blog post: [10] Running Hadoop MapReduce with .NET Core - Introduction to Streaming

How to use

The steps below use PowerShell commands. If you are not comfortable with the command line, you can use the corresponding GUI tools instead.

Taken together, the commands do the following:

  • Publish a .NET Core 2.0 MapReduce program
  • Start Hadoop with Docker (1 master, 1 slave)
  • Copy the .NET Core 2.0 program and the test data into Hadoop
  • Run MapReduce
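
To make the MapReduce step more concrete, the data flow that Hadoop Streaming drives can be approximated locally with a plain shell pipeline: a mapper reads lines from stdin and writes key/value lines to stdout, the framework sorts them by key, and a reducer aggregates each run of equal keys. The sketch below is only a stand-in built from `tr`, `sort`, and `awk`. The function names `mapper`, `reducer`, and `wordcount_local` are hypothetical, and the tab-separated `word<TAB>count` format is an assumption; the .NET Core Mapper and Reducer in this folder may tokenize and format differently.

```shell
#!/usr/bin/env bash
# Local stand-in for the Hadoop Streaming data flow: map -> sort -> reduce.
# Illustrative only; the real job uses the .NET Core Mapper and Reducer DLLs.

# Mapper: read text lines on stdin, emit one "word<TAB>1" line per word.
mapper() {
  tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF { print $0 "\t1" }'
}

# Reducer: input arrives sorted by key, so equal words are adjacent.
# Sum the counts of each run and emit "word<TAB>total".
reducer() {
  awk -F '\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

# Hadoop sorts mapper output by key between the two phases; `sort` plays that role here.
wordcount_local() {
  mapper | LC_ALL=C sort | reducer
}

# Small demo on inline text (no Hadoop or Docker needed).
printf 'It is a truth\nuniversally acknowledged, that it is\n' | wordcount_local
```

Because Hadoop Streaming only relies on stdin and stdout, the same kind of pipeline is a cheap way to check mapper and reducer logic before submitting a job to the cluster.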

The full command sequence is:

dotnet publish -o ${pwd}\dotnetmapreduce .\DotNetMapReduceWordCount\DotNetMapReduceWordCount.sln

docker-compose up -d

docker cp dotnetmapreduce hadoop-dotnet-master:/dotnetmapreduce

docker exec -it hadoop-dotnet-master bash

hadoop fs -mkdir -p /input
hadoop fs -copyFromLocal /dotnetmapreduce/jane_austen.txt /input
hadoop fs -ls /input


hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar \
	-files "/dotnetmapreduce" \
	-mapper "dotnet dotnetmapreduce/DotNetMapReduceWordCount.Mapper.dll" \
	-reducer  "dotnet dotnetmapreduce/DotNetMapReduceWordCount.Reducer.dll" \
	-input /input/* -output /output

hadoop fs -ls /output
hadoop fs -cat /output/part-00000

docker-compose down
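
One thing to watch when rerunning the sequence above against a cluster that is still up: a MapReduce job fails if its output directory already exists, so `/output` has to be removed before the `hadoop jar` command is run a second time. A minimal sketch of a guard for that follows. The function name `clean_output` and the `HADOOP_BIN` variable are hypothetical and exist only so the command can be swapped out when no cluster is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a previous job's HDFS output directory before a rerun.
# HADOOP_BIN is an assumption made for this sketch so the command can be
# replaced (for example with `echo`) when no cluster is available.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

clean_output() {
  local dir="$1"
  # Refuse empty or root paths so a typo cannot wipe the whole filesystem.
  case "$dir" in
    ""|"/") echo "refusing to remove '$dir'" >&2; return 1 ;;
  esac
  # -r removes recursively; -f suppresses the error when the path does not exist.
  "$HADOOP_BIN" fs -rm -r -f "$dir"
}
```

Inside the master container this would be called as `clean_output /output` just before the `hadoop jar` line.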

The steps are explained one by one below.

  1. First, clone the whole repo:
git clone https://github.com/alantsai/blog-data-science-series.git
  1. Change into this chapter's directory. If you did not rename the cloned folder, that is:
cd blog-data-science-series\chapter-10-dotnet-mapreduce
  1. Publish the .NET Core 2.0 MapReduce program:
dotnet publish -o ${pwd}\dotnetmapreduce .\DotNetMapReduceWordCount\DotNetMapReduceWordCount.sln
  1. Start Hadoop:
docker-compose up -d
  1. Copy the files into the Hadoop master container, put the test file into HDFS, and confirm that it arrived:
docker cp dotnetmapreduce hadoop-dotnet-master:/dotnetmapreduce

docker exec -it hadoop-dotnet-master bash

hadoop fs -mkdir -p /input
hadoop fs -copyFromLocal /dotnetmapreduce/jane_austen.txt /input

hadoop fs -ls /input
  1. Run MapReduce (this runs inside the Hadoop master container):
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar \
    -files "/dotnetmapreduce" \
    -mapper "dotnet dotnetmapreduce/DotNetMapReduceWordCount.Mapper.dll" \
    -reducer  "dotnet dotnetmapreduce/DotNetMapReduceWordCount.Reducer.dll" \
    -input /input/* -output /output
  1. Check the results:
hadoop fs -ls /output
hadoop fs -cat /output/part-00000
  1. When you no longer need the cluster, leave the container shell with exit, then tear down the Docker containers with:
docker-compose down
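
The result check in the steps above prints the raw reducer output, which for a whole novel is a long list. While the cluster is still running, that output can be piped through a small filter to show only the most frequent words. The sketch below assumes each output line has the form `word<TAB>count`, which is an assumption about what the Reducer in this folder writes. The function name `top_words` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the N most frequent words from word-count output.
# Assumes each input line is "word<TAB>count".
top_words() {
  local n="${1:-10}"
  # Sort by the second tab-separated field, numerically descending; break ties by word.
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Small demo on inline data (no Hadoop needed).
printf 'the\t5\nemma\t12\nof\t7\n' | top_words 2
```

Inside the master container it would be used as `hadoop fs -cat /output/part-00000 | top_words 20`.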