spell checker is failing due to failure to get words file #6949

Closed
romani opened this issue Aug 7, 2019 · 7 comments

@romani
Member

commented Aug 7, 2019

build failure:
https://travis-ci.org/checkstyle/checkstyle/jobs/568596942#L307

Retrieve ./usr/share/dict/linux.words
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1271k  100 1271k    0     0   326k      0  0:00:03  0:00:03 --:--:--  326k
/usr/bin/unlzma: (stdin): File format not recognized
./.ci/test-spelling-unknown-words.sh failed to extract words (https://rpmfind.net/linux/fedora/linux/development/rawhide/Everything/aarch64/os/Packages/w/words-3.0-34.fc31.noarch.rpm as .ci-temp/words.rpm) (1 0)

The spell checker is disabled in #6945 until this issue is resolved.

The spell checker works fine locally (Ubuntu 16.04).

@romani romani added the approved label Aug 7, 2019

@romani

Member Author

commented Aug 7, 2019

Reproduced on my local machine when the script is executed in a different checkout folder.

@romani

Member Author

commented Aug 7, 2019

@jsoref, can you help us resolve this problem?
It looks like a problem with rpm2cpio.sh; the failure is at https://github.com/checkstyle/checkstyle/blob/master/.ci/test-spelling-unknown-words.sh#L28

@jsoref

Contributor

commented Aug 7, 2019

Did the package change its format? Probably discernable by comparing the previous version. I'll try to look tonight…

@romani

Member Author

commented Aug 7, 2019

I am not sure. I have not updated the scripts or the version of this file for a while; I just noticed this problem.

I uploaded a cached version of the words file to SourceForge:

$ SF_USER=romanivanov
$ ssh -t $SF_USER,checkstyle@shell.sourceforge.net create
$ scp .ci-temp/english.words $SF_USER,checkstyle@shell.sourceforge.net:/home/project-web/checkstyle/reports

and the script can then retrieve it with:

if [ ! -e "$dict" ]; then
  echo "Retrieve cached english.words"
  curl https://checkstyle.sourceforge.io/reports/english.words -o "$dict"
fi

However, during testing I noticed that it works completely fine on my local machine without the code that retrieves it from the rpm.
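
One way the cached download could be hardened is sketched below. This is not the script's actual code: the function names `has_min_lines` and `fetch_dict` and the 1000-line threshold are assumptions made for illustration. The idea is to let curl fail on HTTP errors and to check that the result looks like a word list, so that a broken download is caught here instead of later in the spelling check.

```shell
#!/bin/sh
# Sketch only: function names and the 1000-line threshold are assumptions.

# Succeeds if $1 is a non-empty file with at least $2 lines.
has_min_lines() {
  [ -s "$1" ] || return 1
  lines=$(wc -l < "$1")
  [ "$((lines))" -ge "$2" ]
}

# Download $1 to $2 unless a plausible copy is already present.
fetch_dict() {
  url=$1
  dest=$2
  if has_min_lines "$dest" 1000; then
    return 0
  fi
  echo "Retrieve cached english.words"
  # --fail makes curl return non-zero on HTTP errors instead of saving the error page.
  curl --fail --silent --show-error --location "$url" --output "$dest" || return 1
  has_min_lines "$dest" 1000
}
```

It might be called as `fetch_dict https://checkstyle.sourceforge.io/reports/english.words "$dict" || exit 1`, using the URL from the snippet above.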

@jsoref

Contributor

commented Aug 7, 2019

What I meant is that the version of the file changed: https://rpmfind.net/linux/fedora/linux/development/rawhide/Everything/aarch64/os/Packages/w/words-3.0-34.fc31.noarch.rpm -- is dated: 2019-07-27 22:40 (i.e. very recently).

Running through the script:

COMPRESSION=`($EXTRACTOR |file -) 2>/dev/null`

$ echo $COMPRESSION
/dev/stdin: Zstandard compressed data (v0.8+), Dictionary ID: None

It looks like RHEL got fancy and switched to Zstd (that was coming, they had announced it a while ago).
Zstd support is not available by default in Debian (and almost certainly various other random distributions).
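
The detection step above can be sketched as a small helper that maps the description printed by `file -` to a decompression command and then checks that the tool is installed. This is a minimal sketch, not the logic in rpm2cpio.sh: the function names `pick_decompressor` and `have_tool` and the exact match patterns are assumptions.

```shell
#!/bin/sh
# Hypothetical helpers; names and patterns are assumptions for illustration.

# Print a decompression command for the `file -` description in $1.
# Returns non-zero when the format is not recognized.
pick_decompressor() {
  case "$1" in
    *Zstandard*) echo "zstd -dc" ;;
    *gzip*)      echo "gzip -dc" ;;
    *bzip2*)     echo "bzip2 -dc" ;;
    *XZ*)        echo "xz -dc" ;;
    *LZMA*)      echo "unlzma -c" ;;
    *)           return 1 ;;
  esac
}

# Succeeds only if the first word of the command in $1 is on PATH.
have_tool() {
  command -v "${1%% *}" >/dev/null 2>&1
}
```

With something like this, a script could stop with a clear message such as "zstd is not installed" instead of passing Zstandard data to unlzma and failing with "File format not recognized".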

I don't have a good solution to this problem. It would be possible to try to write enough code to try to ask CPAN to get Compress::Zstd::Decompressor, but that's really overkill (the effort to reinvent rpm extraction is painful as is).

w/ CPAN, the following would work as DECOMPRESSOR=:
perl -MCompress::Zstd -e '$/=undef; print decompress(<>);'
But... that requires:

  • getting cpan to properly work (I'm fairly confident there's a way to give cpan non-interactive settings, I've done it in the past, but probably 10+years ago)
  • asking CPAN for the env vars to set to let the above actually work.

I think you're better served by just hosting a version of the dictionary yourself (and dealing w/ the copyright issues directly instead of trying to skirt it).

@romani

Member Author

commented Aug 7, 2019

> I think you're better served by just hosting a version of the dictionary yourself

Yes, after dealing with this issue I came to the same conclusion: maintenance will be easier, and the vocabulary does not change that often.

romani added a commit to romani/checkstyle that referenced this issue Aug 7, 2019

@romani romani added the miscellaneous label Aug 7, 2019

romani added a commit to romani/checkstyle that referenced this issue Aug 7, 2019
romani added a commit that referenced this issue Aug 8, 2019

@romani romani added this to the 8.24 milestone Aug 8, 2019

@romani

Member Author

commented Aug 8, 2019

The fix is merged.

@rnveach rnveach closed this Aug 8, 2019
