Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
d9fb543
commit 2153178
Showing
2 changed files
with
80 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
build/ | ||
dist/ | ||
.idea/ | ||
*.ipr | ||
*.iml | ||
*.iws | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
An InputFormat to work with splittable multi-line JSON | ||
====================================================== | ||
|
||
## Motivation | ||
|
||
At the time of writing, there do not appear to be any Hadoop JSON InputFormat classes that support multi-line JSON. | ||
|
||
## License | ||
|
||
Apache licensed. | ||
|
||
## Usage | ||
|
||
To get started, simply: | ||
|
||
1. Download the source and run `ant` | ||
2. Include `dist/lib/json-mapreduce-1.0.jar` in your environment | ||
3. Use the `MultiLineJsonInputFormat` class as your Mapper InputFormat | ||
|
||
|
||
Assume you have some JSON that looks like this: | ||
|
||
<pre><code>{"menu": { | ||
  "header": "SVG Viewer", | ||
  "items": [ | ||
    {"id": "Open"}, | ||
    {"id": "OpenNew", "label": "Open New"}, | ||
    null, | ||
    {"id": "ZoomIn", "label": "Zoom In"}, | ||
    {"id": "ZoomOut", "label": "Zoom Out"}, | ||
    {"id": "OriginalView", "label": "Original View"}, | ||
    null, | ||
    {"id": "Quality"}, | ||
    {"id": "Pause"}, | ||
    {"id": "Mute"}, | ||
    null, | ||
    {"id": "Find", "label": "Find..."}, | ||
    {"id": "FindAgain", "label": "Find Again"}, | ||
    {"id": "Copy"}, | ||
    {"id": "CopyAgain", "label": "Copy Again"}, | ||
    {"id": "CopySVG", "label": "Copy SVG"}, | ||
    {"id": "ViewSVG", "label": "View SVG"}, | ||
    {"id": "ViewSource", "label": "View Source"}, | ||
    {"id": "SaveAs", "label": "Save As"}, | ||
    null, | ||
    {"id": "Help"}, | ||
    {"id": "About", "label": "About Adobe CVG Viewer..."} | ||
  ] | ||
}}</code></pre> | ||
|
||
With `MultiLineJsonInputFormat` you must specify a member name. The input format uses that name to find the | ||
enclosing JSON object, which it then returns to your Mapper. For example, to receive every object that contains | ||
an `"id"` member, configure the job as follows: | ||
|
||
<pre><code>Configuration conf = new Configuration(); | ||
Job job = new Job(conf); | ||
job.setMapperClass(...); | ||
job.setReducerClass(...); | ||
job.setInputFormatClass(MultiLineJsonInputFormat.class); | ||
MultiLineJsonInputFormat.setInputJsonMember(job, "id"); | ||
</code></pre> | ||
|
||
The InputFormat passes each matching JSON object to your Mapper in string form, as the `Text` value: | ||
|
||
<pre><code>public static class Map extends Mapper<LongWritable, Text, LongWritable, Text> { | ||
|
||
    @Override | ||
    protected void map(LongWritable key, Text value, Context context) | ||
            throws IOException, InterruptedException { | ||
        context.write(key, value); | ||
    } | ||
} | ||
</code></pre> |
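
To make the member-name selection described above more concrete, here is a rough plain-Java sketch of which objects a member name such as `"id"` would pick out of multi-line JSON. It is an illustration only and does not reflect how `MultiLineJsonInputFormat` is implemented: the class `EnclosingObjectSketch` and the method `objectsContaining` are hypothetical names, the sketch has no Hadoop dependency, and it ignores input splits entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; not part of the json-mapreduce library.
class EnclosingObjectSketch {

    /**
     * Returns every JSON object in json that directly contains the key member,
     * in the order in which the objects are closed. Keys are compared as raw
     * text, so escaped characters inside key names are not decoded.
     */
    static List<String> objectsContaining(String json, String member) {
        List<String> found = new ArrayList<>();
        // One frame per open object: {start offset, 1 if the member was seen}.
        Deque<int[]> open = new ArrayDeque<>();
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c == '{') {
                open.push(new int[] {i, 0});
            } else if (c == '}' && !open.isEmpty()) {
                int[] frame = open.pop();
                if (frame[1] == 1) {
                    found.add(json.substring(frame[0], i + 1));
                }
            } else if (c == '"') {
                // Skip over the whole string literal, honouring backslash escapes.
                int end = i + 1;
                while (end < json.length() && json.charAt(end) != '"') {
                    end += (json.charAt(end) == '\\') ? 2 : 1;
                }
                String token = json.substring(i + 1, Math.min(end, json.length()));
                // A string followed by ':' is a key rather than a value.
                int next = end + 1;
                while (next < json.length() && Character.isWhitespace(json.charAt(next))) {
                    next++;
                }
                boolean isKey = next < json.length() && json.charAt(next) == ':';
                if (isKey && token.equals(member) && !open.isEmpty()) {
                    open.peek()[1] = 1;
                }
                i = end; // continue after the closing quote
            }
            i++;
        }
        return found;
    }

    public static void main(String[] args) {
        String json = "{\"menu\": {\n  \"header\": \"SVG Viewer\",\n  \"items\": [\n"
                + "    {\"id\": \"Open\"},\n"
                + "    {\"id\": \"OpenNew\", \"label\": \"Open New\"},\n"
                + "    null\n  ]\n}}";
        for (String obj : objectsContaining(json, "id")) {
            System.out.println(obj);
        }
    }
}
```

Run against a shortened version of the sample document, the sketch prints the two item objects and skips both the `null` entries and the outer `menu` object, since neither has an `"id"` member of its own. Under this reading, each of those strings corresponds to one `Text` value handed to `map`.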