Docs: Update single-machine.rst with corrections [skip ci] #4020
This PR updates `single-machine.rst` with some corrections, mostly to punctuation and naming conventions. I've changed the capitalization of the link **Dask.distributed on a single machine** to **dask.distributed on a single machine**, mainly because, apart from the title of the next section, it is lower-case everywhere else in the documentation, and since it refers to a module it would be best to keep the name in lower case. If you agree with this minor change, then I'll make the same change in the next section in `single-distributed.rst`.
This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. With respect to dask#4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" for consistency when naming or referring to modules in the text.
- on numeric data like Pandas dataframes or NumPy arrays. Using processes
- avoids GIL issues but can also result in a lot of inter-process
+ dictionary data that holds onto the GIL_, but not very good when operating
+ on numeric data like Pandas DataFrames or NumPy arrays. Using processes
Thoughts on capitalization of DataFrames vs dataframes? The type name is DataFrame, but I wonder if we should use the uncapitalized version in normal prose. This would be analogous to using "arrays".
I don't think using the lower-case format would be the best way to refer to tabular data like Pandas' DataFrames. For instance, what is the correct way to write it? Is it "dataframes" or "data frames"? Surely "data frames" is the correct way to write it, but it does not convey the same meaning as a Pandas or Dask DataFrame object. (Also, "dataframes" looks like a typo.)
Indeed, it would be nice to use dataframes as we use arrays in normal prose, but it would come at the cost of most likely being used randomly alongside DataFrames. Now, if we just bite the bullet and adopt DataFrame in camel case, it would avoid inconsistent or random usage throughout the text, and it would always be correct. Also, "DataFrames" is how it is written exclusively in the pandas docs (no exceptions as far as I can see).
If the goal is to avoid inconsistencies of this kind, I think that DataFrames would be the foolproof way.
OK, happy to defer to the convention in the Pandas docs
docs/source/setup/single-machine.rst
Outdated
- - ``dask.local.get_sync``: The single-threaded scheduler
- - ``dask.threaded.get``: The threaded scheduler.
- - ``dask.multiprocessing.get``: The multiprocessing scheduler.
+ - ``dask.local.get_sync``: The single-threaded scheduler.
I tend not to end phrases like these with periods because they are not full sentences.
Before deciding which one to use, I dug through the documentation a bit to see which style was more common, and I found several conflicting examples of how list items are terminated (or not). For example, here are some lists whose items end with periods:
https://dask.pydata.org/en/latest/index.html#dask
https://dask.pydata.org/en/latest/support.html#discussion
https://dask.pydata.org/en/latest/support.html#asking-for-help
https://dask.pydata.org/en/latest/scheduling.html#scheduling
and the opposite as well:
https://dask.pydata.org/en/latest/spec.html#definitions
https://dask.pydata.org/en/latest/configuration.html#configuration
or even mixed use cases:
https://dask.pydata.org/en/latest/dataframe.html#common-uses-and-anti-uses
It was hard for me to decide which one to use, so I picked the one that made more sense to me at first.
However, after pondering this a bit and looking at the documentation of other projects, I came to the conclusion that it is best to avoid terminating list items with periods altogether, and that these changes (and others before this one) should be reverted / corrected to remove the punctuation at the end of each item.
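The kind of audit described above (checking which lists end their items with periods) could also be done mechanically rather than by hand. The following is only a minimal sketch, not part of this PR or of Dask: the function name `audit_bullets` and the regular expression are illustrative assumptions, and it only handles single-line `-` / `*` bullet items. The sample text is the scheduler list from the diff under review.

```python
import re

# Matches a single-line RST bullet item ("- text" or "* text") and captures its text.
# Section underlines such as "-----" do not match because no whitespace follows the marker.
BULLET = re.compile(r"^\s*[-*]\s+(?P<text>.+?)\s*$")


def audit_bullets(rst_source):
    """Split bullet items into those that end with a period and those that do not.

    Returns two lists of (line_number, text) tuples. Continuation lines of
    wrapped items are ignored in this sketch.
    """
    with_period, without_period = [], []
    for number, line in enumerate(rst_source.splitlines(), start=1):
        match = BULLET.match(line)
        if match is None:
            continue
        text = match.group("text")
        target = with_period if text.endswith(".") else without_period
        target.append((number, text))
    return with_period, without_period


# Sample taken from the old side of the diff discussed in this thread.
sample = """\
- ``dask.local.get_sync``: The single-threaded scheduler
- ``dask.threaded.get``: The threaded scheduler.
- ``dask.multiprocessing.get``: The multiprocessing scheduler.
"""

with_period, without_period = audit_bullets(sample)
print("ending with a period:", [n for n, _ in with_period])
print("no trailing period:", [n for n, _ in without_period])
```

A list that shows up in both result sets is a mixed case like the one linked above, and would be a candidate for cleanup whichever convention is chosen.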
docs/source/setup/single-machine.rst
Outdated
@@ -67,6 +67,6 @@ You can specify these functions in any of the following ways:
 Use the Distributed Scheduler
 -----------------------------

- The newer dask.distributed scheduler also works well on a single machine and
+ The newer ``dask.distributed`` scheduler also works well on a single machine and
Maybe just "The newer distributed scheduler"
I think that I find switching to code highlighting like this a bit distracting. I think that we need to find a nice way to talk about the distributed scheduler without using the module name.
Could also be "Dask's newer distributed scheduler also works ..."
This way seems to be better than using the highlighted module name. I'll rectify this change ASAP.
* Docs: Update single-distributed.rst with corrections [skip ci]

  This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. With respect to #4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" to keep the same consistency when naming or referring to modules in the text.
* Replace ``dask.array`` with Dask Array [skip ci]
* Remove periods from list [skip ci]