Skip to content

avaiyang/Hadoop-HDFS-MapReduce

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hadoop-HDFS-MapReduce

Write a MapReduce code to count phrases (pair of words, taken sequentially) in lines of text as follows. The default input key for the TextInput format is the line number. In this program , the required output is the count of word pairs.

NOTE: punctuation will NOT count; so the words is ‘(1991)’ and ‘1991’ are the same. Therefore, you must parse your file: remove all characters not in this set: [a-z, A-Z, 0-9] ; ‘the-cat’ and ‘the cat’ should be counted as the same pair.

e.g. line:

Input: (k,v) = (1,“the cat in the hat is the best cat in the hat”)

Output: (k2,v2) = (“the cat”, 1), (“cat in”,2),(‘in the’,2),(‘the hat’,2’),(‘hat is’,1), etc

If the line contains a single word, then that becomes the ‘pair’. For this code,use the input file: adventures.txt

Note: The code MUST run in Hadoop.

Releases

No releases published

Packages

No packages published

Languages