[hotfix][docs]Review to reduce passive voice, improve grammar and formatting #5277
| [Savepoints](../ops/state/savepoints.html) are **manually triggered checkpoints**, which take a snapshot of the program and write it out to a state backend. They rely on the regular checkpointing mechanism for this. During execution, programs are periodically snapshotted on the worker nodes and produce checkpoints. You only need the last completed checkpoint for recovery, and you can safely discard older checkpoints as soon as a new one is completed. | ||
|
|
||
| Savepoints are similar to these periodic checkpoints except that they are **triggered by the user** and **don't automatically expire** when newer checkpoints are completed. Savepoints can be created from the [command line](../ops/cli.html#savepoints) or when cancelling a job via the [REST API](../monitoring/rest_api.html#cancel-job-with-savepoint). | ||
| Savepoints are similar to these periodic checkpoints except that they are **triggered by the user** and **don't automatically expire** when newer checkpoints are completed. You can create savepoints can from the [command line](../ops/cli.html#savepoints) or when canceling a job via the [REST API](../monitoring/rest_api.html#cancel-job-with-savepoint). |
"savepoints can" -> "savepoints"
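For context on the paragraph under review, the retention difference it describes can be sketched as a toy model. This is only an illustration under stated assumptions: the class `SnapshotStore` and its method names are hypothetical and are not Flink APIs.

```python
class SnapshotStore:
    """Toy model of snapshot retention; not Flink's implementation."""

    def __init__(self):
        self.latest_checkpoint = None  # only the last completed checkpoint is kept
        self.savepoints = []           # savepoints stay until the user removes them

    def complete_checkpoint(self, snapshot):
        # A newly completed periodic checkpoint supersedes the older one.
        self.latest_checkpoint = snapshot

    def trigger_savepoint(self, snapshot):
        # User-triggered; does not expire when newer checkpoints complete.
        self.savepoints.append(snapshot)


store = SnapshotStore()
store.complete_checkpoint("chk-1")
store.trigger_savepoint("sp-1")
store.complete_checkpoint("chk-2")
print(store.latest_checkpoint, store.savepoints)
```

The savepoint survives the second checkpoint, while the first checkpoint is replaced.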
| handover and buffering, and increases overall throughput while decreasing latency. | ||
| The chaining behavior can be configured; see the [chaining docs](../dev/datastream_api.html#task-chaining-and-resource-groups) for details. | ||
| handover and buffering and increases overall throughput while decreasing latency. | ||
| You can configure the chaining behavior, read the [chaining docs](../dev/datastream_api.html#task-chaining-and-resource-groups) for details. |
Preserve the semicolon?
| You can configure the chaining behavior, read the [chaining docs](../dev/datastream_api.html#task-chaining-and-resource-groups) for details. | ||
|
|
||
| The sample dataflow in the figure below is executed with five subtasks, and hence with five parallel threads. | ||
| Five subtasks execute the sample data flow in the figure below with five parallel threads. |
The community, as well as other projects, uses "dataflow" (one word) extensively. The same applies below.
|
|
||
| There is always at least one Job Manager. A high-availability setup will have multiple JobManagers, one of | ||
| which one is always the *leader*, and the others are *standby*. | ||
| There is always at least one Job Manager. A high-availability setup should have multiple JobManagers, one of |
"JobManager"?
"A high-availability setup should have multiple JobManagers" is, in general, not true -- this is a detail that depends on the underlying cluster management framework.
I suggest reworking as follows:
There is always at least one JobManager, but some [high-availability setups]({{ site.baseurl }}/ops/jobmanager_high_availability.html) will have multiple JobManagers.
|
|
||
| As a rule-of-thumb, a good default number of task slots would be the number of CPU cores. | ||
| With hyper-threading, each slot then takes 2 or more hardware thread contexts. | ||
| As a rule-of-thumb, a reasonable default number of task slots would be the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts. |
Preserve the line split? I'm not sure what the second sentence is saying. Typically, hyper-threads are reported as separate cores, so wouldn't each slot take a single hardware thread context?
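To make the question concrete, here is a minimal sketch, assuming the slot count is derived from the CPU count the operating system reports. The function name `default_task_slots` is hypothetical and not a Flink API. Python's `os.cpu_count()` returns the number of logical CPUs, so on a hyper-threaded machine each hardware thread is already counted separately.

```python
import os


def default_task_slots(logical_cpus=None):
    """Rule-of-thumb slot count: one slot per CPU the OS reports.

    Hypothetical helper for illustration; not part of Flink.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    # Always keep at least one slot.
    return max(1, logical_cpus)


print(default_task_slots())
```

Under this assumption, one slot per reported CPU maps each slot to one hardware thread context rather than two or more.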
| stores data in an in-memory hash map, another state backend uses [RocksDB](http://rocksdb.org) as the key/value store. | ||
| In addition to defining the data structure that holds the state, the state backends also implement the logic to | ||
| take a point-in-time snapshot of the key/value state and store that snapshot as part of a checkpoint. | ||
| The exact data structures which store the key/values indexes depends on the chosen [state backend](../ops/state/state_backends.html). One state backend stores data in an in-memory hash map, another state backend uses [RocksDB](http://rocksdb.org) as the key/value store. In addition to defining the data structure that holds the state, the state backends also implement the logic to take a point-in-time snapshot of the key/value state and store that snapshot as part of a checkpoint. |
Preserve the line splits? The conversion to HTML ignores single newlines.
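As a side note on the paragraph itself, the two responsibilities it assigns to a state backend (holding key/value state and taking a point-in-time snapshot) can be sketched roughly as follows. This is a toy in-memory model; the class `HashMapStateBackend` and its methods are assumptions for illustration and do not mirror Flink's actual interfaces.

```python
import copy


class HashMapStateBackend:
    """Toy key/value state holder backed by a dict; not Flink's API."""

    def __init__(self):
        self._state = {}

    def put(self, key, value):
        self._state[key] = value

    def get(self, key, default=None):
        return self._state.get(key, default)

    def snapshot(self):
        # Deep copy so that later updates do not change the snapshot.
        return copy.deepcopy(self._state)

    def restore(self, snapshot):
        # Copy again so the stored snapshot stays reusable.
        self._state = copy.deepcopy(snapshot)


backend = HashMapStateBackend()
backend.put("user-1", 1)
snap = backend.snapshot()   # point-in-time view of the state
backend.put("user-1", 2)    # state moves on after the snapshot
backend.restore(snap)       # recovery rolls back to the snapshot
print(backend.get("user-1"))
```

A backend built on a different store would keep the same two responsibilities but hold the data elsewhere.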
|
I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue. |
Small review of the runtime concept doc to reduce passive voice, reduce future tense, and improve grammar and formatting. Let me know if it needs backporting to any other branch or if there are any other issues.