Where to get all models as one archive? #26

Closed
hodzanassredin opened this issue Oct 1, 2015 · 15 comments

@hodzanassredin

I'm trying to download models from http://whoisbigger.com/polyglot, but after some time the download speed drops to 0 bps. Could you give me a link to an alternative download?

@aboSamoor
Owner

Try:
http://whoisbigger.com/polyglot/index.json

This index includes all the files.

If you do not want to deal with the file, use the polyglot download subcommand.
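
For anyone who does want to work from the index by hand, here is a minimal sketch of pulling every URL out of such a file. The schema of index.json is not described in this thread, so the code assumes only that it is valid JSON with URL strings somewhere inside it. The helper name `collect_urls` and the sample data are hypothetical.

```python
import json


def collect_urls(node):
    """Recursively gather every string that looks like an HTTP(S) URL."""
    if isinstance(node, str):
        return [node] if node.startswith(("http://", "https://")) else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []
    urls = []
    for child in children:
        urls.extend(collect_urls(child))
    return urls


# Hypothetical sample data; the real index.json layout may differ.
sample = json.loads("""
{"packages": [
  {"id": "embeddings2.ru", "url": "http://example.org/embeddings2/ru/embeddings_pkl.tar.bz2"},
  {"id": "ner2.ru", "url": "http://example.org/ner2/ru/ner.tar.bz2"}
]}
""")

for url in collect_urls(sample):
    print(url)
```

To run this against the real index, load it with `json.load` on the downloaded file instead of the inline sample.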

@hodzanassredin
Author

The downloader from PyPI uses the Google mirror and doesn't work at all. The latest version from GitHub reports that everything was downloaded (downloader.download("embeddings2.ru")), but running POS or NER tagging code then fails with errors like "unexpected EOF of compressed file". The problem seems to be the mirror at http://whoisbigger.com/polyglot: when I try to download files manually, for example embeddings2 for ru, the download freezes and shows a speed of 0 bps.

One more problem: by default the downloader puts everything into the / directory, so after installation I have to go into the configuration and set a different location. It would be better to default to something like a /usr/local/ folder.

@aboSamoor
Owner

Yes, the PyPI package uses the old mirror; you need to use the updated GitHub code.

Embeddings alone are not enough to run POS and NER. You also need to download the POS and NER models for Russian. The new mirror works; you just did not download all the necessary models.
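
A quick way to see which models are still missing is to look at the data directory. The sketch below assumes the downloader stores each package under `<data dir>/<package>/<language>/`, and that the package names are `embeddings2` and `ner2`. Both are assumptions about the local layout, and `missing_packages` is a hypothetical helper, not part of polyglot.

```python
import os


def missing_packages(data_dir, lang, packages):
    """Return the packages with no files under data_dir/<package>/<lang>/."""
    missing = []
    for package in packages:
        lang_dir = os.path.join(data_dir, package, lang)
        # A missing or empty directory both count as "not downloaded".
        if not os.path.isdir(lang_dir) or not os.listdir(lang_dir):
            missing.append(package)
    return missing


# Assumed data directory and package names; adjust to your installation.
data_dir = os.path.expanduser("~/polyglot_data")
print(missing_packages(data_dir, "ru", ["embeddings2", "ner2"]))
```

An empty list means every listed package has at least one file on disk. It does not prove the files are complete.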

@hodzanassredin
Author

Ok I'll try. But is there any command to download everything? Or everything for a single language?

@hodzanassredin
Author

Or download on demand during code execution?

@aboSamoor
Owner

Look at the documentation for commands to download all models or all models
for a specific language.

@hodzanassredin
Author

My mistake. Thanks, I found this section. It is buried in the text; it would probably be better to highlight it somehow.

@hodzanassredin
Author

And yes, it seems the PyPI version uses the correct download directory inside the home dir. That was probably another mistake of mine while installing the latest version. Thanks for your work and support.

@hodzanassredin
Author

I started the download for LANG:ru half an hour ago, and it is still downloading the first file.

@aboSamoor
Owner

If it is downloading, then it is ok. If you do not see progress in the
download, then you should contact skiena@cs.stonybrook.edu.

The new mirror could be slower than the previous one (Google), but it is
free :).

@hodzanassredin
Author

OK, I'll leave it running overnight. Hopefully that does the trick.

@aboSamoor
Owner

Try starting another download in parallel and see whether it is faster;
maybe this one is pathologically slow.

@hodzanassredin
Author

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)?
Identifier> LANG:ru
Downloading collection u'LANG:ru'
|
| Downloading package counts2.ru to /home/hodza/polyglot_data...
| Downloading package sgns2.ru to /home/hodza/polyglot_data...
| Downloading package transliteration2.ru to
| /home/hodza/polyglot_data...
| Downloading package embeddings2.ru to
| /home/hodza/polyglot_data...
| Downloading package ner2.ru to /home/hodza/polyglot_data...
| Downloading package tsne2.ru to /home/hodza/polyglot_data...
| Downloading package sentiment2.ru to
| /home/hodza/polyglot_data...
| Downloading package morph2.ru to /home/hodza/polyglot_data...
|
Done downloading collection LANG:ru


d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> q
hodza@hodza-aspire:~$ polyglot --lang ru ner
Traceback (most recent call last):
  File "/usr/local/bin/polyglot", line 9, in <module>
    load_entry_point('polyglot==15.5.2', 'console_scripts', 'polyglot')()
  File "/usr/local/lib/python2.7/dist-packages/polyglot/__main__.py", line 280, in main
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/__main__.py", line 86, in ner_chunk
    chunker = NEChunker(lang=args.lang)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 99, in __init__
    super(NEChunker, self).__init__(lang=lang)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 40, in __init__
    self.predictor = self._load_network()
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 104, in _load_network
    self.embeddings = load_embeddings(self.lang, type='cw')
  File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/load.py", line 65, in load_embeddings
    e = Embedding.load(p)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/mapping/embeddings.py", line 254, in load
    content = _open(fname).read()
  File "/usr/lib/python2.7/tarfile.py", line 823, in read
    buf += self.fileobj.read()
  File "/usr/lib/python2.7/tarfile.py", line 743, in read
    return self.readnormal(size)
  File "/usr/lib/python2.7/tarfile.py", line 752, in readnormal
    return self.fileobj.read(size)
EOFError: compressed file ended before the logical end-of-stream was detected

hodzanassredin reopened this Oct 1, 2015

@aboSamoor
Owner

Make sure you have cleared all previously downloaded models out of your
models directory. If you start from a clean state, you should not get this
error.
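
One way to get back to a clean state for a single language is sketched below. It assumes the models live under `<data dir>/<package>/<language>/`, which is an assumption about the local layout. `purge_language` is a hypothetical helper, not a polyglot function, and it deletes files, so check the paths before running it.

```python
import glob
import os
import shutil


def purge_language(data_dir, lang):
    """Delete data_dir/<package>/<lang>/ for every package; return removed paths."""
    removed = []
    pattern = os.path.join(glob.escape(data_dir), "*", glob.escape(lang))
    for lang_dir in sorted(glob.glob(pattern)):
        if os.path.isdir(lang_dir):
            shutil.rmtree(lang_dir)
            removed.append(lang_dir)
    return removed


# Example call, left commented out because it is destructive:
# purge_language(os.path.expanduser("~/polyglot_data"), "ru")
```

After purging, download the language collection again so that every package is fetched from scratch.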

@hodzanassredin
Author

No. The reason for that error is a corrupted file.

hodza@py-trainer:~$ tar -vxjf /home/hodza/polyglot_data/embeddings2/ru/embeddings_pkl.tar.bz2 > /dev/null

bzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? Possible reason follows.
bzip2: Inappropriate ioctl for device
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
hodza@py-trainer:~$
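
The same check can be scripted, for example to test every downloaded archive before loading it. This is a minimal sketch that uses only the Python 3 standard library: it reads the whole bzip2 stream and reports whether it decompresses to the end. `bz2_is_intact` is a hypothetical helper, not part of polyglot.

```python
import bz2
import os


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole bzip2 stream at `path` decompresses cleanly."""
    try:
        if os.path.getsize(path) == 0:
            return False  # an empty file is not a usable archive
        with bz2.open(path, "rb") as stream:
            # Read to the end in chunks so large models do not fill memory.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError):
        # EOFError: stream truncated; OSError: invalid data or unreadable file.
        return False
    return True


# Example, using the path from the output above:
# bz2_is_intact("/home/hodza/polyglot_data/embeddings2/ru/embeddings_pkl.tar.bz2")
```

A truncated download like the one above makes the read fail with an EOFError, so the function returns False for it.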
