This repository has been archived by the owner on Jul 6, 2020. It is now read-only.

Add a task to spell check documentation #6

Open
freakboy3742 opened this issue Jul 24, 2017 · 27 comments
Comments

@freakboy3742
Member

We need a Beefore task that will do a spell check of any changes to the documentation directory.

@Logan1x

Logan1x commented Jul 24, 2017

Can you elaborate on it, please, @freakboy3742?

@freakboy3742
Member Author

I'm not sure what more detail you're looking for. Beefore currently has a task for doing things like linting, so it can find (and comment on) code format problems. There are tools available for spell checking ReST documentation; we should create a Beefore task to perform a spell check on any PR that touches the /docs directory.
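As a rough illustration of the trigger condition: the check only needs to run when a PR modifies something under /docs. This minimal sketch assumes the task already has the list of file paths changed by the PR (how Beefore obtains that list is not shown), and the helper name touches_docs is hypothetical.

```python
from pathlib import PurePosixPath


def touches_docs(changed_paths, docs_dir="docs"):
    """Return True if any changed path lives under docs_dir (hypothetical helper)."""
    for path in changed_paths:
        # Normalise "/docs/x.rst" and "docs/x.rst" to the same top-level component.
        parts = PurePosixPath(path.lstrip("/")).parts
        if parts and parts[0] == docs_dir:
            return True
    return False


print(touches_docs(["src/app.py", "docs/index.rst"]))  # True
print(touches_docs(["src/app.py", "README.rst"]))      # False
```

Comparing the first path component, rather than using a plain string prefix, avoids treating a sibling directory such as documentation/ as part of docs/.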

@Logan1x

Logan1x commented Jul 26, 2017

What technology do I need to know before I can work on this issue, @freakboy3742?

@garretvo19

I used Peter Norvig's spelling-correction algorithm to build the spell checker. However, the spell checker would probably only work well if there were a dictionary of the words used in the documentation. I can submit it if you want.
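For context, Norvig's corrector generates every string one or two edits (deletion, transposition, replacement, insertion) away from a word, then picks the known candidate that occurs most often in a reference corpus, preferring closer candidates. This minimal sketch assumes the "dictionary" is simply word counts taken from the project's own docs (the sample sentence is made up). It also shows why a good word list matters: only words present in that corpus can ever be suggested.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lower-cased alphabetic tokens from a piece of text."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correction(word, counts):
    """Most frequent known candidate, trying 0, then 1, then 2 edits away."""
    def known(candidates):
        return {w for w in candidates if w in counts}

    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: counts[w])


counts = Counter(words("Beefore checks the documentation. The documentation uses Sphinx."))
print(correction("documentaton", counts))  # documentation
print(correction("beefroe", counts))       # beefore
```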

@Logan1x

Logan1x commented Aug 6, 2017

Yes, you can.

@ujjaldas1997

Hello @freakboy3742,
Can I do this task? I am new to contributing to open source, and I want to work on this.

@MeghaSharma21

Hey @freakboy3742,
I am new to the open source world. If this issue is still open, can I work on it?

@thomasoflight

Hey All -
I'm looking at PyEnchant for this. Since nothing has been submitted yet, I'm researching how to put this together. Feel free to beat me to it.

@ivanzaqqa

Can I do this, sir?

@freakboy3742
Member Author

@ivanzaqqa I believe @thomasoflight is working on this - but if he doesn't respond in the next few days, feel free to have a go!

@thomasoflight

Hi there @ivanzaqqa. I am currently developing this feature, albeit at a first-timer's pace. If another issue has caught your eye, perhaps consider that one first. Cheers!

@ivanzaqqa

ivanzaqqa commented Sep 13, 2017 via email

@cyberdrk

Hello, I was thinking of something along the lines of Peter Norvig's amazing tutorial here: http://norvig.com/spell-correct.html

@freakboy3742
Member Author

@cyberdrk We don't need a solution built from scratch. There are existing libraries for Sphinx that do spell checking. What we need is a plugin that integrates one of those libraries.

@thomasoflight

thomasoflight commented Oct 24, 2017

Hey @cyberdrk, @freakboy3742 - A while back I posted this gist, which documents my initial efforts to integrate Sphinx with Beefore. @freakboy3742, do you know of any resources that outline this process? The Sphinx documentation contains a lot of information, which makes it challenging to navigate. @cyberdrk, I'd love to work with you on this or see how you're solving it. I'm currently stepping through the Beefore codebase to see how the pieces fit together as I write a rough draft.

@zifn
Contributor

zifn commented May 15, 2018

Hi, I'm a first time contributor but if there aren't any objections, I would like to take a crack at this issue.

@zifn
Contributor

zifn commented May 15, 2018

I noticed that the Sphinx spell checker uses a library (PyEnchant) that is no longer being maintained, and I didn't see another spell-checking library as mature as that one. How should I proceed?

@zifn
Contributor

zifn commented May 16, 2018

@freakboy3742 Hi, I just want to confirm with you what's involved in addressing this issue. Does making a new Beefore task involve writing a new module (pyspellingbee?) that runs the spell checker libraries on the diffs of .rst files or other files in the docs folder?

@zifn
Contributor

zifn commented May 18, 2018

@freakboy3742 @cyberdrk @thomasoflight @ivanzaqqa So here's what I'm thinking thus far. It seems that to add a Beekeeper task for spell checking, a new file (possibly named spellingbee.py) will need to be added to the beefore\checks directory. The required modules (pyenchant, and possibly sphinxcontrib-spelling along with Sphinx, depending on the spell check implementation) will also need to be installed, as well as the Enchant C library that pyenchant wraps (note that as of May 2018 pyenchant is no longer supported and is looking for a new maintainer). To install these libraries, the install_requires parameter in setup.py may need to be modified to include the appropriate modules. Another solution could be to use the (potentially required) def prepare(directory) function for modules in the beefore\checks directory, which could be used to manually install the C libraries and modules needed to run the checks.

To perform the spell check, one option could be to run the pyenchant library directly on the diffs of files in the docs directory obtained from GitHub, using the suggested spellings to generate terminal output for correcting spelling errors in the docs. Also, either each repo that uses this spell check task will need to maintain its own dictionary file, or that file will need to be added to the beefore repo, in order to add exceptions for words like BeeWare and Beefore, which aren't misspelled but aren't in the standard English dictionary.

To test this modification, the Beekeeper YAML file will need to be updated to use the spellingbee.py task. It's not clear to me exactly what needs to change in the Beekeeper YAML file, or which functions/class definitions are and are not required in the spell check task file, and it seems that a lot of that depends on how Beekeeper runs Beefore.

I've been looking at the Beekeeper repo for this information, but its docs aren't very complete, so I've been stepping through its code instead. At this point I'm running up against the end of my vacation, so I won't be able to spend as much time on this ticket as I have over the last couple of days. I did find a couple of spelling errors in the docs for this repo, so I'll submit a PR for that small fix (see #19). Does this description of the ticket sound reasonable? Also, given the number of files that need to be touched, and that this issue seems to require some understanding of how this repo interacts with the main CI, is this issue appropriate for first-time contributors?
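The per-repo exceptions file for words like BeeWare and Beefore could be as simple as a plain-text list. This minimal sketch assumes a hypothetical format (one word per line, with `#` starting a comment) and uses a tiny stand-in set where a real checker would consult a full English dictionary such as the one PyEnchant provides.

```python
import re


def load_word_list(text):
    """Parse a project word list: one word per line, '#' starts a comment."""
    words = set()
    for line in text.splitlines():
        word = line.split("#", 1)[0].strip()
        if word:
            words.add(word.lower())
    return words


def unknown_words(text, dictionary, project_words):
    """Return the words in text that are in neither the dictionary nor the project list."""
    allowed = {w.lower() for w in dictionary} | project_words
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    return [t for t in tokens if t.lower() not in allowed]


# Hypothetical project word list and a tiny stand-in for a real English dictionary.
PROJECT_WORDS = load_word_list("# Project names\nBeeWare\nBeefore\n")
DICTIONARY = {"the", "project", "uses", "and", "to", "check", "its", "docs"}

print(unknown_words("The BeeWare project uses Beefore and Sphinxx to check its docs.",
                    DICTIONARY, PROJECT_WORDS))  # ['Sphinxx']
```

Matching case-insensitively means the list only needs "BeeWare" once, although it will also accept "beeware" written in lower case.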

@freakboy3742
Member Author

@zifn It sounds like you're on the right general track. Adding extra dependencies in install_requires is no problem; the rest of the task is to wrap the calls necessary to run a spell checker over the documentation directory.

Sphinx has some spell-checking functionality (both built in and as plugins), covering most of the cases you've described (e.g., having a "local" dictionary of words known to be OK); I'd expect to see those native tools used for the heavy lifting (just as we use flake8 to check for code style errors). The bulk of the task is collating the spelling issues with the lines that are covered by the patch (so you don't report a spelling error in code/text that isn't actually in the patch).
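The collation step comes down to knowing which line numbers a patch adds in each file. This minimal sketch assumes the spell checker's findings have already been converted into (path, line number, word) tuples, which is a hypothetical shape since real tools report in their own formats. It parses a unified diff and keeps only the findings that fall on added lines.

```python
import re

# Unified diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Map each file in a unified diff to the set of line numbers it adds (new-file numbering)."""
    result = {}
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    lineno = 0               # current line number in the new file
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                result[path].add(lineno)
                lineno += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # context line; skip "\ No newline" markers
                lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            target = line[4:].split("\t")[0]
            path = target[2:] if target.startswith("b/") else target
            result.setdefault(path, set())
            continue
        match = HUNK_RE.match(line)
        if match and path is not None:
            old_left = int(match.group(1) or 1)
            lineno = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return result


def errors_in_patch(errors, diff_text):
    """Keep only (path, lineno, word) spelling errors that fall on lines the patch added."""
    touched = added_lines(diff_text)
    return [e for e in errors if e[1] in touched.get(e[0], set())]


SAMPLE_DIFF = "\n".join([
    "diff --git a/docs/index.rst b/docs/index.rst",
    "--- a/docs/index.rst",
    "+++ b/docs/index.rst",
    "@@ -1,3 +1,4 @@",
    " Welcome",
    "-Old line",
    "+New line with a speling error",
    "+Another new line",
    " Footer",
])

errors = [("docs/index.rst", 2, "speling"), ("docs/index.rst", 4, "Footr"),
          ("docs/other.rst", 1, "teh")]
print(errors_in_patch(errors, SAMPLE_DIFF))  # [('docs/index.rst', 2, 'speling')]
```

Counting down the hunk lengths from the header, rather than matching on leading characters alone, keeps a removed reST line such as "-- note" from being mistaken for a "--- " file header.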

There's definitely some detail that needs to be worked through, but I can't see any reason that this couldn't be tackled by a first time contributor.

@devarshigoswami

Hello @freakboy3742 ! If the issue still persists, I'd love to help out :D

@freakboy3742
Member Author

@devarshigoswami Yes - the issue still exists! However, it's safe to say that the scope has changed a little; we've been moving our CI infrastructure to GitHub Actions, and reducing our use of Beefore and Beekeeper.

If you wanted to try a contribution here, I'd suggest trying to add a "Spell check" step to Briefcase's CI definition. Although Briefcase uses Beefore to check the Python formatting, using Beefore is not required for the spell check.

@devarshigoswami

devarshigoswami commented Feb 28, 2020 via email

@freakboy3742
Member Author

So the task, in the most high-level "use case" terms, is this:

As a project maintainer, I want to avoid ever merging a PR that contains a change to documentation with a spelling error, or an error in markup.

The ci.yml file I referenced is the configuration for GitHub Actions. Those are the commands that get executed every time a contributor submits a pull request. You could run the same commands yourself by hand; the configuration file gives GitHub enough detail to run them automatically.

So the task really has two parts:

  1. Work out how, from the command line, you can verify that there are no spelling errors or markup errors in the Sphinx documentation for the project.

  2. Work out how to configure GitHub Actions to invoke those commands.

The markup errors task is relatively straightforward - if you try to build the documentation, and there's a markup error, you'll find the return code of the build is non-zero - that's the usual Unix response for "this command raised an error". If a command in a GitHub Actions step returns a non-zero return code, that build step will fail.
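A CI step only needs the exit status of each command. This minimal sketch shows a helper script that a GitHub Actions step could call. It assumes the Sphinx sources live in docs/ and builds into docs/_build/, so adjust the paths to the project's real layout. sphinx-build's -W flag turns warnings into errors, so markup problems that Sphinx would otherwise only print as warnings also produce a non-zero exit code.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one check command and return its exit code; non-zero means it failed."""
    print("$", " ".join(cmd))
    return subprocess.run(cmd).returncode


def main(steps):
    """Run every step, then report failure (1) if any of them returned non-zero."""
    failed = [cmd for cmd in steps if run_step(cmd) != 0]
    return 1 if failed else 0


if __name__ == "__main__":
    # Assumed layout: sources in docs/, output in docs/_build/html.
    sys.exit(main([
        ["sphinx-build", "-W", "-b", "html", "docs", "docs/_build/html"],
    ]))
```

Running every step before exiting, rather than stopping at the first failure, lets a single CI run report all failing checks together. Further checks, such as a spell-check build, can be added to the list.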

The spell check task will be similar - but you'll need to work out how to run a spell check over a Sphinx documentation directory (including adding/excluding words that are spelled correctly, but aren't in the spell checker's dictionary).
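One Sphinx-integrated option for this is the sphinxcontrib-spelling extension, which itself relies on PyEnchant. This sketch of the relevant docs/conf.py settings assumes that extension is the one chosen. The word list file name is an assumption, and the option names should be checked against the installed version's documentation.

```python
# docs/conf.py (excerpt): assumes sphinxcontrib-spelling is installed.
extensions = [
    "sphinxcontrib.spelling",
]

# Dictionary language used by PyEnchant.
spelling_lang = "en_US"

# Project-specific words that are spelled correctly but are missing from the
# standard dictionary (e.g. BeeWare, Beefore), listed one word per line.
spelling_word_list_filename = "spelling_wordlist.txt"

# Report misspellings as Sphinx warnings, so that a build run with -W
# (warnings as errors) should exit with a non-zero status.
spelling_warning = True
```

With settings like these, the spell check should become one more command for CI, for example `sphinx-build -W -b spelling docs docs/_build/spelling`.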

I'm not especially concerned about spelling errors in code - with the possible exception of function docstrings, if the project's documentation includes them.

@devarshigoswami

devarshigoswami commented Feb 29, 2020 via email

@freakboy3742
Member Author

That might be one approach - however, I think you should possibly do a little more research into options that are already well integrated into Sphinx. A quick search revealed this one; there may be others.

@devarshigoswami

devarshigoswami commented Feb 29, 2020 via email
