kevinweil/IntegerListInputFormat

An input format for divvying up a range of input values to Hadoop mappers. Set the min, max, and number of splits, and each mapper will get an approximately equal number of input values.

An adaptation of codazzo's MultiRowInputFormat at http://github.com/codazzo/MultiRow.

This input format splits a range of integers into any number of input splits for use in a Hadoop job. It is useful when you need to, for example, crawl an id space. If you want to act in parallel on input values from 19 to 500 million with 917 mappers (roughly 545,000 ids per mapper), you would configure it as follows:

  1. In your main/run method of your Hadoop job driver class, add

Job job = new Job(new Configuration());
...

job.setInputFormatClass(IntegerListInputFormat.class);

IntegerListInputFormat.setListInterval(19, 500000000);
IntegerListInputFormat.setNumSplits(917);
  2. Then, make your mapper take a LongWritable as the key and a NullWritable as the value:

public static class MyMapper extends Mapper<LongWritable, NullWritable, ..., ...> {
    protected void map(LongWritable key, NullWritable value, Context context) throws IOException, InterruptedException {
        // Do something with the id.
        long id = key.get();
        ...
    }
}

The LongWritable key is the input value. That's it!
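Putting the two steps together, a complete map-only driver might look like the sketch below. Only the IntegerListInputFormat calls come from this project; the class names CrawlIdsJob and CrawlMapper, the LongWritable/Text output types, the job name, and the output path taken from args[0] are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// plus the import for IntegerListInputFormat from this project

public class CrawlIdsJob {

    // Map-only job: each map() call receives one id from the configured range.
    // The output types and the body of map() are illustrative.
    public static class CrawlMapper
            extends Mapper<LongWritable, NullWritable, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, NullWritable value, Context context)
                throws IOException, InterruptedException {
            long id = key.get();
            // Placeholder for real work, e.g. fetching the record with this id.
            context.write(key, new Text("processed " + id));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "crawl id space");
        job.setJarByClass(CrawlIdsJob.class);

        // Hand the id range and the number of splits to the input format.
        job.setInputFormatClass(IntegerListInputFormat.class);
        IntegerListInputFormat.setListInterval(19, 500000000);
        IntegerListInputFormat.setNumSplits(917);

        job.setMapperClass(CrawlMapper.class);
        job.setNumReduceTasks(0);  // no reduce phase needed
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would run this like any other jar-based Hadoop job, passing the output directory as the single argument.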
