MapReduce_CorpusCalculator

-The goal of the project is to find the three sentences with the highest probability once the calculations are complete.

-Three sets of mappers and reducers are chained together to achieve this.

-The code can be run on Amazon EMR, in Eclipse, or with the Linux test script.

-Running the code on EMR requires some changes to the main project source files.

-The main challenge of the project was getting the code to work in three different environments, which took extensive effort.

-This gave a lot of experience with Hadoop, both on local systems and on real cloud systems such as Amazon's services.

-The first mapper counts words together with their positions.

-Its reducer counts the total number of words in the same position.
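
The first map and reduce stage described above can be illustrated without Hadoop. The following is only a minimal sketch in plain Java, not the project's actual code: the class name `WordPositionCount`, the `word@position` key format, whitespace tokenization, lowercasing, and zero-based positions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordPositionCount {
    // Map step (assumed key format): emit one "word@position" key per token.
    // Positions are zero-based; tokens are split on whitespace and lowercased.
    static List<String> map(String sentence) {
        List<String> keys = new ArrayList<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return keys;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            keys.add(words[i].toLowerCase() + "@" + i);
        }
        return keys;
    }

    // Reduce step: sum how many times each "word@position" key was emitted.
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical three-sentence corpus, one sentence per input record.
        String[] corpus = {"the cat sat", "the dog sat", "a cat ran"};
        List<String> emitted = new ArrayList<>();
        for (String sentence : corpus) {
            emitted.addAll(map(sentence));
        }
        System.out.println(reduce(emitted));
    }
}
```

In a real Hadoop job the same logic would presumably live in a `Mapper` and a `Reducer` class, with the framework doing the grouping by key that the `TreeMap` does here.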

-The second mapper takes the results of the first mapper and finds the number of sentences with at least N words. Finally, the third mapper performs the total probability calculation and finds the sentences.
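
The later stages can be sketched in the same spirit. This README does not give the probability formula, so the sketch below leaves it out: it only shows counting the sentences with at least N words and picking the three best sentences from scores supplied by the caller. The class name `SentenceStages`, both method names, and the whitespace tokenization are hypothetical and are not taken from the project source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SentenceStages {
    // counts[n] = number of sentences with at least n words, for n >= 1.
    // Index 0 is unused. Words are split on whitespace (an assumption).
    static int[] sentencesWithAtLeast(List<String> sentences) {
        List<Integer> lengths = new ArrayList<>();
        int maxLen = 0;
        for (String s : sentences) {
            String t = s.trim();
            int len = t.isEmpty() ? 0 : t.split("\\s+").length;
            lengths.add(len);
            maxLen = Math.max(maxLen, len);
        }
        int[] counts = new int[maxLen + 1];
        for (int len : lengths) {
            for (int n = 1; n <= len; n++) {
                counts[n]++;
            }
        }
        return counts;
    }

    // Return up to three sentences with the highest score, best first.
    // The scores stand in for the probabilities computed by the third stage.
    static List<String> topThree(Map<String, Double> scores) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> best = new ArrayList<>();
        for (int i = 0; i < Math.min(3, entries.size()); i++) {
            best.add(entries.get(i).getKey());
        }
        return best;
    }

    public static void main(String[] args) {
        int[] counts = sentencesWithAtLeast(List.of("a b c", "a b", "a"));
        for (int n = 1; n < counts.length; n++) {
            System.out.println("at least " + n + " words: " + counts[n]);
        }
        // Made-up scores, used only to demonstrate the ranking step.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("s1", 0.1);
        scores.put("s2", 0.4);
        scores.put("s3", 0.2);
        scores.put("s4", 0.3);
        System.out.println(topThree(scores));
    }
}
```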
