[DOC] Fix typo
Fix #301
costin committed Nov 16, 2014
1 parent 353a912 commit d24f2bf
Showing 1 changed file with 1 addition and 1 deletion.
docs/src/reference/asciidoc/core/spark.adoc
@@ -534,7 +534,7 @@ Through {eh}, Spark can integrate with {es} through its dedicated `InputFormat`,

In short, one needs to set up a basic Hadoop +Configuration+ object with the target {es} cluster and index, potentially a query, and she's good to go.

-From Spark's perspective, they only thing required is setting up serialization - Spark relies by default on Java serialization which is convenient but fairly inefficient. This is the reason why Hadoop itself introduced its own serialization mechanism and its own types - namely ++Writable++s. As such, +InputFormat+ and ++OutputFormat++s are required to return +Writables+ which, out of the box, Spark does not understand.
+From Spark's perspective, the only thing required is setting up serialization - Spark relies by default on Java serialization which is convenient but fairly inefficient. This is the reason why Hadoop itself introduced its own serialization mechanism and its own types - namely ++Writable++s. As such, +InputFormat+ and ++OutputFormat++s are required to return +Writables+ which, out of the box, Spark does not understand.
The good news is, one can easily enable a different serialization (https://github.com/EsotericSoftware/kryo[Kryo]) which handles the conversion automatically and does so quite efficiently.

[source,java]
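The hunk ends at the opening of the docs' own `[source,java]` example, which the diff view truncates. Before that, the first paragraph says a basic Hadoop +Configuration+ with the target cluster, index, and an optional query is all the setup needed. Below is a minimal sketch of that setup from Java; the node address, `radio/artists` resource, and query string are illustrative placeholders, not values from the commit:

[source,java]
----
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class EsSparkRead {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("es-spark-read").setMaster("local"));

        // Basic Hadoop configuration: target cluster, index/type, optional query
        JobConf conf = new JobConf();
        conf.set("es.nodes", "localhost:9200");   // placeholder ES node
        conf.set("es.resource", "radio/artists"); // placeholder index/type
        conf.set("es.query", "?q=me*");           // optional query

        // EsInputFormat hands back Writables: Text doc ids and MapWritable documents
        JavaPairRDD esRDD = jsc.hadoopRDD(
            conf, EsInputFormat.class, Text.class, MapWritable.class);

        System.out.println("documents read: " + esRDD.count());
        jsc.stop();
    }
}
----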

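The serialization paragraph's point, switching Spark from Java serialization to Kryo, comes down to a one-line +SparkConf+ setting. A sketch using Spark's standard `spark.serializer` property:

[source,java]
----
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;

public class KryoSetup {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("es-spark-kryo").setMaster("local");
        // Swap Spark's default Java serialization for Kryo
        conf.set("spark.serializer", KryoSerializer.class.getName());

        JavaSparkContext jsc = new JavaSparkContext(conf);
        // ... jobs reading from or writing to {es} run with Kryo from here on ...
        jsc.stop();
    }
}
----

Unlike Java serialization, Kryo does not require classes to implement `java.io.Serializable`, which is how it gets around Spark not understanding ++Writable++s out of the box.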
