daniefer/MapReduce-WordAndArticleCount

MapReduce-WordAndArticleCount

A MapReduce program for processing the Wikipedia data set. The data set is a 32 GB text file with one article per line. Each line is tab-separated into four fields: title, last update date and time, content, and external links. The program has two modes: count the five most common words in articles whose title contains a supplied keyword, or count the number of articles that contain the supplied keyword.
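
The two modes can be illustrated with a small in-memory sketch. This is not the repository's implementation, which is not shown here; it only mimics the map and reduce steps in plain Python. All function names are hypothetical, the tokenization rule is an assumption, and the article-count mode is assumed to match the keyword against the content field, since the description does not say which field is checked.

```python
import re
from collections import Counter


def parse_line(line):
    """Split one tab-separated record into (title, updated, content, links)."""
    parts = line.rstrip("\n").split("\t")
    # Pad short records so malformed lines do not raise an error.
    parts += [""] * (4 - len(parts))
    return parts[0], parts[1], parts[2], parts[3]


def map_words(line, keyword):
    """Mode 1 map step: emit (word, 1) for articles whose title contains keyword."""
    title, _, content, _ = parse_line(line)
    if keyword.lower() in title.lower():
        # Assumption: a word is a run of lowercase letters, digits, or apostrophes.
        for word in re.findall(r"[a-z0-9']+", content.lower()):
            yield word, 1


def map_articles(line, keyword):
    """Mode 2 map step: emit (keyword, 1) for each matching article.

    Assumption: the keyword is matched against the content field.
    """
    _, _, content, _ = parse_line(line)
    if keyword.lower() in content.lower():
        yield keyword, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def top_words(lines, keyword, n=5):
    """Return the n most common words in articles whose title contains keyword."""
    pairs = (pair for line in lines for pair in map_words(line, keyword))
    return reduce_counts(pairs).most_common(n)


def count_articles(lines, keyword):
    """Return the number of articles that contain keyword."""
    pairs = (pair for line in lines for pair in map_articles(line, keyword))
    return reduce_counts(pairs)[keyword]
```

Both functions accept any iterable of lines, so an open file handle can be passed directly and the 32 GB input is streamed rather than loaded into memory. A real MapReduce job would distribute the map calls across workers and shuffle the pairs by key before reducing.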
