From 21531789d380ac5ec62bcec480469b693ea3ecc9 Mon Sep 17 00:00:00 2001
From: Alex Holmes
Date: Sat, 5 Nov 2011 22:29:22 -0400
Subject: [PATCH] readme

---
 .gitignore |  7 ++++++
 README.md  | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..8107765
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+build/
+dist/
+.idea/
+*.ipr
+*.iml
+*.iws
+.DS_Store
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..eac27f8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,73 @@
+An InputFormat to work with splittable multi-line JSON
+======================================================
+
+## Motivation
+
+Currently there don't seem to be any JSON InputFormat classes that can support multi-line JSON.
+
+## License
+
+Apache licensed.
+
+## Usage
+
+To get started, simply:
+
+1. Download, and run ant
+2. Include the `dist/lib/json-mapreduce-1.0.jar` in your environment
+3. Utilize the `MultiLineJsonInputFormat` class as your Mapper InputFormat
+
+
+Assume you have some JSON that looks like this:
+
+{"menu": {
+    "header": "SVG Viewer",
+    "items": [
+        {"id": "Open"},
+        {"id": "OpenNew", "label": "Open New"},
+        null,
+        {"id": "ZoomIn", "label": "Zoom In"},
+        {"id": "ZoomOut", "label": "Zoom Out"},
+        {"id": "OriginalView", "label": "Original View"},
+        null,
+        {"id": "Quality"},
+        {"id": "Pause"},
+        {"id": "Mute"},
+        null,
+        {"id": "Find", "label": "Find..."},
+        {"id": "FindAgain", "label": "Find Again"},
+        {"id": "Copy"},
+        {"id": "CopyAgain", "label": "Copy Again"},
+        {"id": "CopySVG", "label": "Copy SVG"},
+        {"id": "ViewSVG", "label": "View SVG"},
+        {"id": "ViewSource", "label": "View Source"},
+        {"id": "SaveAs", "label": "Save As"},
+        null,
+        {"id": "Help"},
+        {"id": "About", "label": "About Adobe CVG Viewer..."}
+    ]
+}}
+
+With the MultiLineJsonInputFormat you must indicate the member name which it will use to determine the
+encapsulating object to return to your Mapper. If for example we wanted all the objects that contained
+`"id"`, then we would do the following:
+
+Configuration conf = new Configuration();
+Job job = new Job(conf);
+job.setMapperClass(...);
+job.setReducerClass(...);
+job.setInputFormatClass(MultiLineJsonInputFormat.class);
+MultiLineJsonInputFormat.setInputJsonMember(job, "id");
+
+
+The InputFormat gives you the JSON object in string form:
+
+public static class Map extends Mapper<LongWritable, Text, LongWritable, Text> {
+
+  @Override
+  protected void map(LongWritable key, Text value, Context context)
+                     throws IOException, InterruptedException {
+    context.write(key, value);
+  }
+}
+
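To make the member-name setting concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of the selection rule the README describes: return every object that directly contains the given key. It is an illustration only and not necessarily how `MultiLineJsonInputFormat` is implemented. The class `EnclosingObjectSketch` and the method `objectsWithMember` are hypothetical names; the sketch ignores input-split boundaries and compares key names without decoding escape sequences.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration class; not part of the json-mapreduce library.
class EnclosingObjectSketch {

  /** Returns every JSON object in json that directly contains the key member. */
  public static List<String> objectsWithMember(String json, String member) {
    List<String> found = new ArrayList<>();
    Deque<Integer> starts = new ArrayDeque<>();     // offsets of the currently open '{'
    Deque<Boolean> hasMember = new ArrayDeque<>();  // parallel stack: does that object hold the key?
    int i = 0;
    while (i < json.length()) {
      char c = json.charAt(i);
      if (c == '{') {
        starts.push(i);
        hasMember.push(false);
      } else if (c == '}' && !starts.isEmpty()) {
        int start = starts.pop();
        if (hasMember.pop()) {
          found.add(json.substring(start, i + 1));
        }
      } else if (c == '"') {
        // Consume the whole string literal, skipping backslash-escaped characters.
        int end = i + 1;
        while (end < json.length() && json.charAt(end) != '"') {
          end += (json.charAt(end) == '\\') ? 2 : 1;
        }
        String literal = json.substring(i + 1, Math.min(end, json.length()));
        // A string followed by ':' is an object key; anything else is a value.
        int next = end + 1;
        while (next < json.length() && Character.isWhitespace(json.charAt(next))) {
          next++;
        }
        boolean isKey = next < json.length() && json.charAt(next) == ':';
        if (isKey && literal.equals(member) && !hasMember.isEmpty()) {
          hasMember.pop();
          hasMember.push(true);  // mark the innermost open object
        }
        i = end;  // resume after the closing quote
      }
      i++;
    }
    return found;
  }

  public static void main(String[] args) {
    // A shortened version of the menu document from the README.
    String json = "{\"menu\": {\"header\": \"SVG Viewer\", \"items\": ["
        + "{\"id\": \"Open\"}, {\"id\": \"OpenNew\", \"label\": \"Open New\"}, null]}}";
    for (String record : objectsWithMember(json, "id")) {
      System.out.println(record);
    }
  }
}
```

With member `"id"`, the sample prints the two item objects, one per line. Per the README, each such object is what a `map` call would receive as its `Text` value.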