Coke/clean spell #2662

Merged
coke merged 2 commits into master from coke/clean-spell on Mar 13, 2019

coke (Collaborator) commented Mar 12, 2019

The problem

Words are never removed from the dictionary files, although removal may be warranted in a few cases, such as:

  • bug fixes in xt/aspell.t mean we no longer have to mark something as a word
  • aspell itself gets smarter and fewer words have to be marked
  • usage of a word is removed from the docs

Solution provided

A script (util/clean-spell) that tests whether each word can safely be removed. It does a little more work than strictly necessary.

Updated the dictionary files to remove all 214 extra entries.

Opening this as a PR so that someone with a different aspell installation can verify that nothing is odd about my local English dictionary.
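The approach described above (drop a word, rerun the spelling test, and put the word back only if the test then fails) can be sketched in Perl 6 (now Raku) as follows. This is a hypothetical illustration, not the code in util/clean-spell: `clean-words`, `fake-check`, and the `&check` callback are invented names, and `&check` stands in for rerunning xt/aspell.t against the reduced word list.

```raku
# Hypothetical sketch, not the actual util/clean-spell script.
# Greedily drop dictionary entries: remove one word at a time and keep
# the removal only if the spelling check still passes without it.
# &check is a stand-in for rerunning xt/aspell.t against a word list;
# it should return True when the docs still pass with that list.
sub clean-words(@words, &check --> List) {
    my @kept = @words;
    for @words -> $word {
        # Candidate list: everything still kept, minus this word.
        my @trial = @kept.grep(* ne $word);
        # Removals accumulate, so each later check already runs without
        # the words dropped so far; what remains at the end is the last
        # list that passed (or the untouched input if nothing was dropped).
        @kept = @trial if check(@trial);
    }
    @kept.List
}

# Usage with a fake checker: pretend the docs only need these two words.
my @needed = <Rakudo pod6>;
sub fake-check(@list --> Bool) { (@needed (-) @list).elems == 0 }

say clean-words(<Rakudo radians pod6 searchable>, &fake-check);  # (Rakudo pod6)
```

Running the full check once per word is what makes this kind of approach do more work than strictly necessary; the upside is that it needs no knowledge of why a word was originally added, so it covers all three cases listed under "The problem".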

coke requested a review from cfa on March 12, 2019 05:05

cfa (Contributor) commented Mar 12, 2019

Good idea. A couple of things:

  • The script seems overzealous. "Rakudo" has been removed from xt/words.pws in 94702ee, yielding 60/376 test failures.
  • I wonder whether we want to whitelist things irrespective of their presence in the documentation to prevent churn.
    • For example, you've removed "radians", "recursing", "refactor", and "searchable"; if those show up in the documentation again, we'd need to re-whitelist them. Removing them seems counterproductive.
    • Proposal: enforce the distinction between codewords (code.pws) and English dictionary extensions (words.pws) more robustly, and apply util/clean-spell to the codewords file alone.
      • words.pws can of course be pared down first, perhaps with your script.

Thoughts?

JJ (Contributor) commented Mar 12, 2019

In principle, I think this is a great idea. But it means adding yet another utility to the perl6/doc repository, which is downloaded every time someone installs the module, and this utility (like most of the rest of this directory) is of interest mainly to us, the people actively working on the repo.
So I'm all for accepting this if it works, but I'd also favor spinning all the spelling-related material off into a new repo. We could even set up Travis to run periodically over the docs and send us a report, or even automatically open an issue for us to work against. What do you think?

JJ (Contributor) left a comment

With the caveat in my other comment.

coke (Collaborator, Author) commented Mar 13, 2019

Regarding size, the new script is dwarfed by the word lists we're already tracking. Releasing a slim download containing just the docs seems like a solution to that problem.

Regarding the failures: apparently I had a local ~/.aspell.en.pws with a few entries in it. I'll remove that file and clean up with a force push.

FYI, "searchable" was removed because it's already in the standard dictionary and doesn't need to be in our list. It's still present in doc/Language/about.pod6 and doesn't trigger a failure.

JJ (Contributor) commented Mar 13, 2019

It's not so much about size; it's about having one more thing in the repo, with its own commit history, maintenance commitments, and so on. Shipping all the spell-checking machinery, which only maintainers care about, is a bit much for the regular user who just wants to download the documentation. For the time being I think it's OK to accept the PR, but all the spelling-related material is a serious candidate for spinning off into its own repository, at least IMHO.

cfa (Contributor) left a comment

Thanks for clarifying @coke.

Nit: noted two lines with trailing whitespace.

util/clean-spell (outdated)

if +$promise.status {
say "Can't find $dict/$word anywhere, kill it.";
$keep = False;
cfa (Contributor):

Trailing whitespace.

util/clean-spell (outdated)
$keep = False;
} else {
say "aspell test failed, keep it";
}
cfa (Contributor):

Trailing whitespace.

After recent fixes to the spell checker, we have a lot of "words"
that aren't needed.

Also finding some that just aren't used anymore.

Also finding some that are included in the dictionary now.
Maybe:
* Now in dictionary
* Now unused
* No longer mistakenly marked as incorrect by xt/aspell.t
coke (Collaborator, Author) commented Mar 13, 2019

What's the ticket for the issue about users downloading the docs? I'd like to add some notes there.

coke (Collaborator, Author) commented Mar 13, 2019

Whitespace fixed.

coke (Collaborator, Author) commented Mar 13, 2019

Based on the comments, I see two remaining issues:

We might have to add words back in: that's an ongoing battle regardless, since the dictionary files are updated quite often. We won't have to go through this cleaning exercise very often. If we want to run it again on just code.pws, that's a simple modification to the script, and we can do that.

Regarding the thread about utilities moving outside the repo, that can be handled in the context of #2542.

Merging, thanks.

coke merged commit 5683411 into master on Mar 13, 2019
coke deleted the coke/clean-spell branch on March 13, 2019 21:35