Docs: monitoring, streaming programming guide
Fix several awkward wordings and grammatical issues in the following
documents:

*   docs/monitoring.md

*   docs/streaming-programming-guide.md
kennyballou committed Jul 30, 2014
1 parent e3d85b7 commit e1b8ad6
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions docs/monitoring.md
@@ -33,7 +33,7 @@ application's UI after the application has finished.

If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
application through Spark's history server, provided that the application's event logs exist.
-You can start a the history server by executing:
+You can start the history server by executing:

./sbin/start-history-server.sh

@@ -106,7 +106,7 @@ follows:
<td>
Indicates whether the history server should use kerberos to login. This is useful
if the history server is accessing HDFS files on a secure Hadoop cluster. If this is
-true it looks uses the configs <code>spark.history.kerberos.principal</code> and
+true, it uses the configs <code>spark.history.kerberos.principal</code> and
<code>spark.history.kerberos.keytab</code>.
</td>
</tr>
4 changes: 2 additions & 2 deletions docs/streaming-programming-guide.md
@@ -939,7 +939,7 @@ Receiving multiple data streams can therefore be achieved by creating multiple i
and configuring them to receive different partitions of the data stream from the source(s).
For example, a single Kafka input stream receiving two topics of data can be split into two
Kafka input streams, each receiving only one topic. This would run two receivers on two workers,
-thus allowing data to received in parallel, and increasing overall throughput.
+thus allowing data to be received in parallel, and increasing overall throughput.

Another parameter that should be considered is the receiver's blocking interval. For most receivers,
the received data is coalesced together into large blocks of data before storing inside Spark's memory.
@@ -980,7 +980,7 @@ If the number of tasks launched per second is high (say, 50 or more per second),
of sending out tasks to the slaves maybe significant and will make it hard to achieve sub-second
latencies. The overhead can be reduced by the following changes:

-* **Task Serialization**: Using Kryo serialization for serializing tasks can reduced the task
+* **Task Serialization**: Using Kryo serialization for serializing tasks can reduce the task
sizes, and therefore reduce the time taken to send them to the slaves.

* **Execution mode**: Running Spark in Standalone mode or coarse-grained Mesos mode leads to