Docs: Update single-machine.rst with corrections [skip ci] #4020

Merged
merged 3 commits into from Sep 29, 2018

farrajota
Member

This PR updates single-machine.rst with some corrections, mostly to punctuation and naming conventions.

I've changed the capitalization of the link "Dask.distributed on a single machine" to "dask.distributed on a single machine", mainly because, apart from the title of the next section, it is lower-case everywhere else in the documentation, and since it refers to a module I assumed it would be best to keep it in lower case as well.

If you agree with this minor change, then I'll make the same change in the next section in single-distributed.rst.

  • Tests added / passed
  • Passes flake8 dask

farrajota added a commit to farrajota/dask that referenced this pull request Sep 27, 2018
This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. 

With respect to dask#4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" to keep the same consistency when naming or referring to modules in the text.
on numeric data like Pandas dataframes or NumPy arrays. Using processes
avoids GIL issues but can also result in a lot of inter-process
dictionary data that holds onto the GIL_, but not very good when operating
on numeric data like Pandas DataFrames or NumPy arrays. Using processes
Member

Thoughts on capitalization of DataFrames vs dataframes? The type name is DataFrame, but I wonder if we should use the uncapitalized version in normal prose. This would be analogous to using "arrays".

Member Author

I don't think using the lower-case format would be the best way to refer to tabular data like Pandas' DataFrames. For instance, what is the correct way to write it? Is it "dataframes" or "data frames"? Surely "data frames" is the correct way to write it, but it does not convey the same meaning as a Pandas or Dask DataFrame object. (Also, "dataframes" looks like a typo.)

Indeed, it would be nice to use "dataframes" in normal prose the way we use "arrays", but the cost is that it would most likely end up mixed at random with "DataFrames". If we just bite the bullet and adopt the camel-case DataFrame, we avoid inconsistent usage throughout the text, and it is always correct. Also, "DataFrames" is the form used exclusively in the pandas docs (no exceptions as far as I can see).

If the goal is to avoid inconsistencies of this kind, I think DataFrames is the foolproof way.

Member

OK, happy to defer to the convention in the Pandas docs

- ``dask.local.get_sync``: The single-threaded scheduler
- ``dask.threaded.get``: The threaded scheduler.
- ``dask.multiprocessing.get``: The multiprocessing scheduler.
- ``dask.local.get_sync``: The single-threaded scheduler.
Member

I tend not to end phrases like these with periods because they are not full sentences.

Member Author

Before deciding which one to use, I dug through the documentation a bit to see which form was more common, and I found several conflicting examples of how lists are terminated (or not). For example, here are some lists terminated with periods:

https://dask.pydata.org/en/latest/index.html#dask

https://dask.pydata.org/en/latest/support.html#discussion

https://dask.pydata.org/en/latest/support.html#asking-for-help

https://dask.pydata.org/en/latest/scheduling.html#scheduling

and the opposite as well:

https://dask.pydata.org/en/latest/spec.html#definitions

https://dask.pydata.org/en/latest/configuration.html#configuration

or even mixed use cases:

https://dask.pydata.org/en/latest/dataframe.html#common-uses-and-anti-uses

It was hard for me to decide which one to use, so I picked the one that made more sense to me at first.

However, after pondering this a bit and looking at the documentation of other projects, I came to the conclusion that it is best to avoid terminal punctuation on list items altogether, and that these changes (and others before this one) should be reverted / corrected to remove the punctuation at the end of each item.
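
For anyone who wants to survey this kind of inconsistency across the docs, here is a minimal sketch of a checker. It is not part of this PR or of Dask's tooling; the `bullet_endings` helper and the sample text are hypothetical, and it assumes that only the first line of each `-` or `*` bullet matters.

```python
import re

# Matches reST bullet items such as "- text" or "* text" (optionally indented).
# Section underlines like "-----" do not match because no space follows the dash.
BULLET = re.compile(r"^\s*[-*]\s+(?P<text>.+?)\s*$")


def bullet_endings(rst_text):
    """Count bullet items that end with a period versus those that do not.

    Simplification: only the first line of each bullet is inspected, so
    multi-line items are judged by their first line alone.
    """
    counts = {"period": 0, "no_period": 0}
    for line in rst_text.splitlines():
        match = BULLET.match(line)
        if match is None:
            continue
        key = "period" if match.group("text").endswith(".") else "no_period"
        counts[key] += 1
    return counts


# Hypothetical sample mirroring the mixed list discussed in this review.
sample = """\
- ``dask.threaded.get``: The threaded scheduler.
- ``dask.multiprocessing.get``: The multiprocessing scheduler.
- ``dask.local.get_sync``: The single-threaded scheduler
"""

print(bullet_endings(sample))
```

Running something like this over each `.rst` file would show which files mix the two styles, which could help decide where to remove trailing periods.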

@@ -67,6 +67,6 @@ You can specify these functions in any of the following ways:
Use the Distributed Scheduler
-----------------------------

The newer dask.distributed scheduler also works well on a single machine and
The newer ``dask.distributed`` scheduler also works well on a single machine and
Member

Maybe just "The newer distributed scheduler"

Member

I think that I find switching to code highlighting like this a bit distracting. I think that we need to find a nice way to talk about the distributed scheduler without using the module name.

Could also be "Dask's newer distributed scheduler also works ..."

Member Author

This seems better than using the highlighted module name. I'll rectify this change asap.

mrocklin pushed a commit that referenced this pull request Sep 29, 2018
* Docs: Update single-distributed.rst with corrections [skip ci]

This PR updates `single-distributed.rst` with some corrections to punctuation and naming conventions. 

With respect to #4020, I've changed the capitalization in the title from "**D**ask.distributed" to "**d**ask.distributed" to keep the same consistency when naming or referring to modules in the text.

* Replace ``dask.array`` with Dask Array [skip ci]

* Remove periods from list [skip ci]
mrocklin merged commit d26ce29 into dask:master Sep 29, 2018