MapReduce

Learning Hadoop MapReduce

Let's see a WordCount example

Step 1:

Install Docker

Step 2: Start Hadoop

Follow this tutorial to start a Hadoop cluster using Docker.

Use the docker-compose.yml file from the gist.

Step 3: Install python in the nodes

Open a terminal in each node of the Hadoop cluster and install python3 on it.

To open bash in a node: docker exec -it namenode bash (replace namenode with whichever node you want to open)
To install python3: apt update && apt upgrade && apt install python3

To check the names of the nodes: docker ps, or open Docker Desktop on Windows/Mac

If I find enough energy, I'll edit the docker-compose.yml file so that this step no longer has to be done by hand.
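Until then, repeating the install on every node can be scripted. Below is a minimal sketch, not part of the original setup: the node names in `NODES` are assumptions (check yours with docker ps), and the `-y` flags are added so that apt does not wait for confirmation. By default it only prints the commands; pass `dry_run=False` to actually run them.

```python
import subprocess

# Assumption: replace these with the container names that `docker ps` shows.
NODES = ["namenode", "datanode"]


def install_command(node):
    """Build the docker exec command that installs python3 inside one node."""
    return [
        "docker", "exec", node, "bash", "-c",
        "apt update && apt upgrade -y && apt install -y python3",
    ]


def install_everywhere(nodes, dry_run=True):
    """Print (dry run) or execute the install command for every node."""
    for node in nodes:
        cmd = install_command(node)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first node where the install fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    install_everywhere(NODES)
```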

Step 4: Create `mapper.py`

For now I'm just sharing the code; again, if I find the energy I'll describe it later. Besides, Python is easy: reading the code is enough to understand what's going on.

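#!/usr/bin/python3
# -*- coding: utf-8 -*-

import sys

# Read text from stdin and emit one "<word><TAB>1" record per word.
for line in sys.stdin:
    line = line.strip()
    # split() with no arguments splits on any run of whitespace
    words = line.split()
    for word in words:
        print(word + "\t1")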
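To make the mapper's output concrete, here is a small sketch that applies the same logic to a single line. The function name `map_line` and the sample sentence are made up for illustration; the real mapper reads from stdin instead.

```python
def map_line(line):
    """Return the records the mapper would print for one input line."""
    return [word + "\t1" for word in line.strip().split()]


if __name__ == "__main__":
    # Every occurrence gets its own record; nothing is summed at this stage.
    for record in map_line("the quick brown the"):
        print(record)
```

Note that "the" appears twice in the output, each time with a count of 1. Adding them up is the reducer's job.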

Step 5: Create `reducer.py`

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import sys

current_word = None
current_count = 0

# The input must be sorted by word, so equal words arrive on consecutive lines.
for line in sys.stdin:
    line = line.strip()

    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip blank or malformed lines (no tab, or a non-numeric count)
        continue

    if current_word == word:
        current_count += count
    else:
        # A new word starts: emit the total of the previous one
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_count = count
        current_word = word

# Emit the total of the last word, unless there was no input at all
if current_word is not None:
    print(current_word + "\t" + str(current_count))
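The reducer only compares each word with the previous one, so it relies on its input being sorted. The sketch below shows the same idea with `itertools.groupby`, which also groups consecutive equal keys; it is an alternative way to express the logic, not the code used in this tutorial, and the name `reduce_sorted` is made up.

```python
from itertools import groupby


def reduce_sorted(lines):
    """Sum the counts of consecutive records that share the same word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    return [
        (word, sum(int(count) for _, count in group))
        for word, group in groupby(pairs, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    # Sorted input: each word is counted once.
    print(reduce_sorted(["a\t1", "a\t1", "b\t1"]))
    # Unsorted input: "a" shows up twice, which is why the sort step matters.
    print(reduce_sorted(["a\t1", "b\t1", "a\t1"]))
```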

Step 6: Testing locally

To test it locally, create a text file with whatever you like inside it and run:

cat input.txt | python3 mapper.py | sort -k1,1 | python3 reducer.py

I'll explain everything later when I'm in the mood.
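In the meantime, the pipeline above can be read as three stages: map, sort, reduce. Here is a rough single-process sketch of those stages, handy for checking what output to expect. The function name `word_count` and the sample text are made up, and Python's sort order may differ from your shell's `sort` for non-ASCII text.

```python
from collections import Counter


def word_count(text):
    """Map, sort and reduce in one process, mirroring the shell pipeline."""
    # Map: one (word, 1) pair per word, like mapper.py
    pairs = [(word, 1) for line in text.splitlines() for word in line.split()]
    # Sort: bring equal words next to each other, like `sort -k1,1`
    pairs.sort(key=lambda pair: pair[0])
    # Reduce: add up each run of equal words, like reducer.py
    totals = []
    for word, count in pairs:
        if totals and totals[-1][0] == word:
            totals[-1] = (word, totals[-1][1] + count)
        else:
            totals.append((word, count))
    return totals


if __name__ == "__main__":
    sample = "hello hadoop\nhello docker hello\n"
    result = word_count(sample)
    print(result)
    # Cross-check against a plain Counter over the same words
    assert dict(result) == dict(Counter(sample.split()))
```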

Step 7: Copy files to namenode

Copy your input.txt, mapper.py and reducer.py to the Hadoop namenode:

docker cp input.txt eff4f966c9ef:MapReduceTut/input.txt
docker cp mapper.py eff4f966c9ef:MapReduceTut/mapper.py
docker cp reducer.py eff4f966c9ef:MapReduceTut/reducer.py

# Replace eff4f966c9ef with your own namenode container ID (the container name, e.g. namenode, works too)
# I created a folder called MapReduceTut on the namenode beforehand to keep things organized, so I'm copying into that folder.

Step 8: Let's go to hadooooooop

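First, open a terminal in the Hadoop namenode: docker exec -it namenode bash

Then put your input file on HDFS in a new folder:

hdfs dfs -mkdir -p input
hdfs dfs -put /MapReduceTut/input.txt input/input.txt

# Why a new folder? It isn't strictly required, but it keeps the input files apart from the output folder the job creates.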

Step 9: Time to run MapReduce

Just run:

mapred streaming \
-input input/input.txt \
-output output \
-mapper "python3 mapper.py" \
-reducer "python3 reducer.py" \
-file /MapReduceTut/mapper.py \
-file /MapReduceTut/reducer.py

Insha'Allah, I'll explain this one later too.
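Until that explanation arrives, here is a small sketch that builds the same command as a list and annotates each flag. The helper name `streaming_command` and its parameters are made up for illustration; the flags themselves are the ones used above.

```python
import shlex


def streaming_command(input_path, output_path, scripts_dir="/MapReduceTut"):
    """Build the mapred streaming command from Step 9 as an argument list."""
    return [
        "mapred", "streaming",
        "-input", input_path,              # HDFS file or folder to read
        "-output", output_path,            # HDFS folder for results; must not exist yet
        "-mapper", "python3 mapper.py",    # command run for the map phase
        "-reducer", "python3 reducer.py",  # command run for the reduce phase
        "-file", scripts_dir + "/mapper.py",   # local scripts shipped with the job
        "-file", scripts_dir + "/reducer.py",
    ]


if __name__ == "__main__":
    print(shlex.join(streaming_command("input/input.txt", "output")))
```

If you rerun the job, remove the old output folder first (hdfs dfs -rm -r output), because Hadoop refuses to write into an existing output folder.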

Step 10: Pray

Brother, say a prayer: go do your ablution, offer your prayers and come back. The previous step will take a long time... and without Allah's mercy it will fail!

Step 11: Output

If, by Allah's mercy, the job succeeds, view the output with: hdfs dfs -cat output/*
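Each output line is a word, a tab and its count. As a rough sketch, the snippet below parses such text and lists the most frequent words; the function names and the sample data are made up, and in practice you would feed it the text printed by the command above.

```python
def parse_counts(text):
    """Turn "<word><TAB><count>" lines into a dict, skipping malformed lines."""
    counts = {}
    for line in text.splitlines():
        word, _, count = line.rstrip().partition("\t")
        if count.isdigit():
            counts[word] = int(count)
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    sample_output = "docker\t1\nhadoop\t1\nhello\t3\n"  # made-up sample data
    print(top_words(parse_counts(sample_output), n=2))
```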


KhanShaheb34/MapReduce
