
aspell should detect duplicated words #403

Open
aspell-helper opened this issue Mar 7, 2007 · 6 comments

Comments

@aspell-helper
Collaborator

reitblatt <reitblatt@sf> created a feature request on 2007-03-07 09:35:46 UTC
(Orig. from https://sourceforge.net/p/aspell/feature-requests/53)

This came in to the Ubuntu bug tracker, and I believe it would best be handled upstream (by you guys). Description from the original bug (https://launchpad.net/ubuntu/+source/aspell/+bug/90332 ):

It would be great if aspell detected duplicate words. MS Word's spell checker has supported this for ten years, and WordPerfect even had this way back in the DOS version. It is time that aspell caught up with these apps. :-)

@aspell-helper
Collaborator Author

lawlist <lawlist@sf> commented on 2014-01-20 04:06:34.544000 UTC

It's been 7 years since @reitblatt started this thread. What's the status? Why is this a priority 5 and not a higher priority?

@m1cm1c

m1cm1c commented Oct 22, 2020

Is this something that wouldn't be merged anyway or just something that no one got around to doing?

@kevina
Member

kevina commented Oct 22, 2020

I am not totally against it, but it is something I believe is outside the domain of a spell checker.

@daobrien

"I don't believe that that feature is something a spell checker could easily handle."

Perfectly valid, but contains duplicated words. I haven't looked into the history of this request, but agree it would be great to have.

@m1cm1c

m1cm1c commented Oct 23, 2020

"I don't believe that that feature is something a spell checker could easily handle."

Spell checkers always yield false positives. To find out how many false positives there would be when checking for duplicate words, I just wrote a bash function that I then applied to a couple of long texts that I wrote and thoroughly proof-read in the past:

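find-duplicate-words()
{
	# Put each whitespace-separated token on its own line (\s and \n need GNU sed),
	# print each token that repeats on adjacent lines once (uniq -d),
	# and drop tokens that contain no word characters.
	cat "$@" | sed -E 's/\s+/\n/g' | uniq -d | grep -E '\w'
}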

The only result it came up with was "end" which I traced back to this line in my bachelor's thesis:

%% Please enter the start end end time of your thesis

So I didn't find any false positives. That is not to say that they do not exist. They do. But they are rare.

However, I do know that I often find duplicate words that shouldn't be duplicate words when proof-reading texts.

I also know that I had a significant number of false positives back when I checked my Bachelor's thesis for spelling errors using aspell. For example:

  • names
  • technical terms
  • initialisms
  • shell commands such as the command stated above
  • LaTeX commands
  • URLs that contain an identifier

All of these cause way more false positives. In this comment, aspell yields these false positives:

  • sed
  • uniq
  • aspell

And probably quite a few more if it weren't for the fact that I have already added a few of the words used here ("initialisms", "LaTeX", and "URLs" come to mind) to my personal dictionary. Yet we don't throw out spell checkers entirely.

If "that that" is something that occurs often in texts people write, there could be an exception for this. I don't even think that an extension of the personal dictionary for this purpose would be difficult to achieve.
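
A minimal sketch of that exception idea, building on the function above. It is not an aspell feature and does not use aspell's personal dictionary format. The function name `find_duplicate_words_except` and the allow-list file (one permitted duplicate word per line) are assumptions made for illustration. It uses `tr` instead of `sed` so that it does not depend on GNU extensions.

```shell
# Hypothetical helper, not part of aspell: report adjacent duplicate words,
# skipping any word listed in an allow-list file (one word per line).
find_duplicate_words_except()
{
	allow_file=$1
	shift
	# 1. tr:      put each whitespace-separated token on its own line
	# 2. uniq -d: keep only tokens repeated on adjacent lines
	# 3. grep:    drop tokens without any letter or digit
	# 4. grep -v: drop tokens that exactly match an allow-list entry
	cat "$@" | tr -s '[:space:]' '\n' | uniq -d | grep -E '[[:alnum:]]' \
		| grep -v -x -F -f "$allow_file"
}
```

With an allow-list file containing the single line `that`, a call such as `find_duplicate_words_except allow.txt thesis.tex` would no longer report "that that", while other adjacent duplicates would still be listed. Like the original function, it compares raw tokens, so "end end." with trailing punctuation would not be caught.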

@kevina
Member

kevina commented Oct 23, 2020

To clarify, I said "I am not totally against it, but it is something I believe is outside the domain of a spell checker," not "I don't believe that that feature is something a spell checker could easily handle." A spell checker's main job is to check the spelling of words, and I do not consider duplicate words a spelling error.

In addition, for technical reasons related to how Aspell works and how other applications use Aspell, checking anything other than a single word at a time will be difficult.

This has nothing to do with false positives.
