Docs: Update single-machine.rst with corrections [skip ci] #4020

farrajota · 2018-09-27T22:31:17Z

This PR updates single-machine.rst with some corrections, mostly to punctuation and naming conventions.

I've changed the capitalization of the link "Dask.distributed on a single machine" to "dask.distributed on a single machine". Apart from the title of the next section, it is lower-case everywhere else in the documentation, and since it refers to a module I assumed it would be best to keep it in lower case here as well.

If you agree with this minor change, I'll make the same change in the next section, in single-distributed.rst.

Tests added / passed
Passes flake8 dask


This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. With respect to dask#4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" to keep the same consistency when naming or referring to modules in the text.

mrocklin · 2018-09-28T12:47:46Z

docs/source/setup/single-machine.rst

-    on numeric data like Pandas dataframes or NumPy arrays.  Using processes
-    avoids GIL issues but can also result in a lot of inter-process
+    dictionary data that holds onto the GIL_, but not very good when operating
+    on numeric data like Pandas DataFrames or NumPy arrays.  Using processes


Thoughts on capitalization of DataFrames vs. dataframes? The type name is DataFrame, but I wonder if we should use the uncapitalized version in normal prose. This would be analogous to using "arrays".

I don't think the lower-case form is the best way to refer to tabular data like Pandas' DataFrames. For instance, what is the correct way to write it? Is it "dataframes" or "data frames"? Surely "data frames" is the correct spelling, but it does not convey the same meaning as a Pandas or Dask DataFrame object. (Also, "dataframes" looks like a typo.)

Indeed, it would be nice to use "dataframes" in normal prose the way we use "arrays", but that comes at the cost of it most likely being used interchangeably alongside "DataFrames". If we just bite the bullet and adopt DataFrame in camel case, we avoid inconsistent usage throughout the text, and it is always correct. Also, "DataFrames" is how it is written exclusively in the pandas docs (no exceptions from what I can see).

If the goal is to avoid inconsistencies of this kind, I think that DataFrames would be the foolproof way.

OK, happy to defer to the convention in the Pandas docs

mrocklin · 2018-09-28T12:48:44Z

docs/source/setup/single-machine.rst

-  ``dask.local.get_sync``: The single-threaded scheduler
+-  ``dask.threaded.get``: The threaded scheduler.
+-  ``dask.multiprocessing.get``: The multiprocessing scheduler.
+-  ``dask.local.get_sync``: The single-threaded scheduler.


I tend not to end phrases like these with periods because they are not full sentences.

Before deciding which style to use, I dug through the documentation to see which was more common, and found several conflicting examples of how lists are terminated (or not). For example, here are some lists whose items end with periods:

https://dask.pydata.org/en/latest/index.html#dask

https://dask.pydata.org/en/latest/support.html#discussion

https://dask.pydata.org/en/latest/support.html#asking-for-help

https://dask.pydata.org/en/latest/scheduling.html#scheduling

and the opposite as well:

https://dask.pydata.org/en/latest/spec.html#definitions

https://dask.pydata.org/en/latest/configuration.html#configuration

or even mixed use cases:

https://dask.pydata.org/en/latest/dataframe.html#common-uses-and-anti-uses

It was hard to decide which style to use, so I picked the one that made more sense to me at first.

However, after thinking this over and looking at the documentation of other projects, I've concluded that it is best not to end list items with periods at all, so these changes (and the earlier ones like them) should be reverted to remove the trailing punctuation.
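As a rough illustration of the cleanup described above, here is a minimal sketch that drops the trailing period from single-line bullet items in reStructuredText. The helper name `strip_list_periods` and the regular expression are hypothetical; nothing here is part of Dask or Sphinx. It assumes each list item fits on one line, so continuation lines of multi-line items are left untouched.

```python
import re

# Hypothetical pattern: a one-line bullet item ("-", "*" or "+") ending in a period.
BULLET_WITH_PERIOD = re.compile(r"^(\s*[-*+]\s+.*?)\.\s*$")


def strip_list_periods(text):
    """Remove a single trailing period from each one-line bullet item in `text`."""
    cleaned = []
    for line in text.splitlines():
        match = BULLET_WITH_PERIOD.match(line)
        # Leave ellipses ("...") alone; only a lone final period is dropped.
        if match and not line.rstrip().endswith(".."):
            line = match.group(1)
        cleaned.append(line)
    return "\n".join(cleaned)


sample = (
    "-  ``dask.threaded.get``: The threaded scheduler.\n"
    "-  ``dask.multiprocessing.get``: The multiprocessing scheduler.\n"
    "-  ``dask.local.get_sync``: The single-threaded scheduler.\n"
    "Regular prose keeps its period."
)
print(strip_list_periods(sample))
```

Only lines that start with a bullet marker are changed, so full sentences in ordinary paragraphs keep their punctuation. The output would still need a manual review for list items that really are complete sentences.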

mrocklin · 2018-09-28T12:49:08Z

docs/source/setup/single-machine.rst

@@ -67,6 +67,6 @@ You can specify these functions in any of the following ways:
 Use the Distributed Scheduler
 -----------------------------

-The newer dask.distributed scheduler also works well on a single machine and
+The newer ``dask.distributed`` scheduler also works well on a single machine and


Maybe just "The newer distributed scheduler"

I think that I find switching to code highlighting like this a bit distracting. I think that we need to find a nice way to talk about the distributed scheduler without using the module name.

Could also be "Dask's newer distributed scheduler also works ..."

This way seems better than using the highlighted module name. I'll rectify this change ASAP.

* Docs: Update single-distributed.rst with corrections [skip ci]

  This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. With respect to #4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" to keep the same consistency when naming or referring to modules in the text.
* Replace ``dask.array`` with Dask Array [skip ci]
* Remove periods from list [skip ci]

farrajota mentioned this pull request Sep 27, 2018

Docs: Update single-distributed.rst with corrections [skip ci] #4021

Merged

2 tasks

mrocklin reviewed Sep 28, 2018


farrajota added 2 commits September 28, 2018 15:12

Update sentence and remove module highlighting [skip ci]

6e0fac8

Remove periods from lists [skip ci]

5104adb

mrocklin merged commit d26ce29 into dask:master Sep 29, 2018

farrajota mentioned this pull request Dec 4, 2018

Docs: Update scheduling.rst [skip ci] #4265

Merged
