Where to get all models as one archive? #26

hodzanassredin · 2015-10-01T15:40:09Z

I'm trying to download the models from http://whoisbigger.com/polyglot, but after some time the download speed drops to 0 bps. Could you give me a link to an alternative download?

aboSamoor · 2015-10-01T16:54:44Z

Try:
http://whoisbigger.com/polyglot/index.json

This index includes all the files.

If you do not want to deal with the index file yourself, use the polyglot download subcommand.

hodzanassredin · 2015-10-01T18:40:19Z

The downloader from PyPI uses Google and doesn't work at all. The latest version from GitHub reports that everything is downloaded (downloader.download("embeddings2.ru")), but at run time, for example in POS or NER tagging code, I get errors like unexpected EOF of compressed file. The problem seems to be in the mirror http://whoisbigger.com/polyglot: when I try to download files manually, such as embeddings2 ru, the download just freezes and shows a speed of 0 bps.

One more problem: by default the downloader puts everything into the / directory, so after installation I have to go into the configuration and set a different location. It would be better to use some /usr/local/ folder.

aboSamoor · 2015-10-01T18:59:16Z

Yes, the PyPI package uses the old mirror; you need to use the updated GitHub code.

Embeddings alone are not enough to run POS and NER; you also need to download the POS and NER models for Russian. The new mirror works; you just did not download all the necessary models.
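To check which packages are actually on disk before running a tagger, something like the following could help. It is only a sketch: the `<data_dir>/<package>/<lang>/` layout is inferred from the path `/home/hodza/polyglot_data/embeddings2/ru/embeddings_pkl.tar.bz2` quoted later in this thread, `missing_packages` is a hypothetical helper (not part of polyglot), and the list of packages a task needs is an assumption the caller has to supply.

```python
from pathlib import Path


def missing_packages(data_dir, lang, required):
    """Return the names in `required` that have no files for `lang`.

    Assumes the layout <data_dir>/<package>/<lang>/<files>, which is
    inferred from paths shown in this thread, not from polyglot's docs.
    """
    missing = []
    for package in required:
        lang_dir = Path(data_dir) / package / lang
        # A package counts as present only if its directory holds a file.
        has_files = lang_dir.is_dir() and any(
            p.is_file() for p in lang_dir.iterdir()
        )
        if not has_files:
            missing.append(package)
    return missing


# Example call (package names taken from the download log in this thread):
# missing_packages(Path.home() / "polyglot_data", "ru", ["embeddings2", "ner2"])
```

A non-empty result only says that files are absent; it does not tell you whether the files that are present are complete.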

hodzanassredin · 2015-10-01T19:50:54Z

OK, I'll try. But is there a command to download everything? Or everything for a single language?

hodzanassredin · 2015-10-01T19:51:32Z

Or download on demand during code execution?

aboSamoor · 2015-10-01T19:54:55Z

Look at the documentation for commands to download all models or all models
for a specific language.

hodzanassredin · 2015-10-01T20:00:50Z

My mistake. Thanks, I found this section. It is buried in the text. It would probably be better to highlight it somehow.

hodzanassredin · 2015-10-01T20:02:24Z

And yes, it seems that the PyPI version has the correct download directory, inside the home dir. Probably one more mistake of mine during installation of the latest version. Thanks for your work and support.

hodzanassredin · 2015-10-01T20:21:16Z

I started the download for LANG:ru half an hour ago and it is still downloading the first file.

aboSamoor · 2015-10-01T20:23:16Z

If it is downloading, then it is ok. If you do not see progress in the
download, then you should contact skiena@cs.stonybrook.edu.

The new mirror could be slower than the previous one (Google), but it is
free :).

hodzanassredin · 2015-10-01T20:25:49Z

OK, I'll leave it running overnight. Hope it does the job.

aboSamoor · 2015-10-01T20:30:22Z

Try starting another download in parallel and see whether it is faster;
maybe this one is pathologically slow.
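For manual downloads from a slow mirror, a stalled or cut-off transfer can be made to fail loudly instead of leaving a half-written file behind. The sketch below is not polyglot's downloader; `fetch` is a hypothetical helper built only on the standard library. It writes to a `.part` file, relies on the socket timeout to abort a connection that stops sending data, compares the byte count with `Content-Length` when the server sends one, and renames the file only on success.

```python
import os
import urllib.request


def fetch(url, dest, timeout=60, chunk_size=1 << 16):
    """Download `url` to `dest`; raise if the transfer stalls or is short.

    `timeout` applies to each blocking socket operation, so a connection
    that stops delivering data raises instead of hanging indefinitely.
    """
    dest = os.fspath(dest)
    part = dest + ".part"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, \
                open(part, "wb") as out:
            expected = resp.headers.get("Content-Length")
            received = 0
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
                received += len(chunk)
        if expected is not None and received != int(expected):
            raise IOError(
                "got %d of %s bytes from %s" % (received, expected, url)
            )
        # Only a complete download ever appears under the final name.
        os.replace(part, dest)
        return received
    finally:
        # Remove the partial file on any failure path.
        if os.path.exists(part):
            os.remove(part)
```

Because the final name appears only after a complete transfer, a later run cannot mistake a truncated archive for a finished one.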

On Thu, Oct 1, 2015 at 1:25 PM Hodza Nassredin notifications@github.com
wrote:

OK I'll leave it for night. Hope it will do the thing.

—
Reply to this email directly or view it on GitHub
#26 (comment).

hodzanassredin · 2015-10-01T23:16:56Z

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)?
Identifier> LANG:ru
Downloading collection u'LANG:ru'
|
| Downloading package counts2.ru to /home/hodza/polyglot_data...
| Downloading package sgns2.ru to /home/hodza/polyglot_data...
| Downloading package transliteration2.ru to
| /home/hodza/polyglot_data...
| Downloading package embeddings2.ru to
| /home/hodza/polyglot_data...
| Downloading package ner2.ru to /home/hodza/polyglot_data...
| Downloading package tsne2.ru to /home/hodza/polyglot_data...
| Downloading package sentiment2.ru to
| /home/hodza/polyglot_data...
| Downloading package morph2.ru to /home/hodza/polyglot_data...
|
Done downloading collection LANG:ru

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> q
hodza@hodza-aspire:~$ polyglot --lang ru ner
Traceback (most recent call last):
  File "/usr/local/bin/polyglot", line 9, in <module>
    load_entry_point('polyglot==15.5.2', 'console_scripts', 'polyglot')()
  File "/usr/local/lib/python2.7/dist-packages/polyglot/__main__.py", line 280, in main
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/__main__.py", line 86, in ner_chunk
    chunker = NEChunker(lang=args.lang)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 99, in __init__
    super(NEChunker, self).__init__(lang=lang)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 40, in __init__
    self.predictor = self._load_network()
  File "/usr/local/lib/python2.7/dist-packages/polyglot/tag/base.py", line 104, in _load_network
    self.embeddings = load_embeddings(self.lang, type='cw')
  File "/usr/local/lib/python2.7/dist-packages/polyglot/decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/load.py", line 65, in load_embeddings
    e = Embedding.load(p)
  File "/usr/local/lib/python2.7/dist-packages/polyglot/mapping/embeddings.py", line 254, in load
    content = _open(fname).read()
  File "/usr/lib/python2.7/tarfile.py", line 823, in read
    buf += self.fileobj.read()
  File "/usr/lib/python2.7/tarfile.py", line 743, in read
    return self.readnormal(size)
  File "/usr/lib/python2.7/tarfile.py", line 752, in readnormal
    return self.fileobj.read(size)
EOFError: compressed file ended before the logical end-of-stream was detected

aboSamoor · 2015-10-01T23:32:14Z

Make sure that you have cleaned previously downloaded models out of your
models directory. If you start from a fresh, clean state, you should not get
this error.

hodzanassredin · 2015-10-02T08:16:47Z

No. The reason for that error is a corrupted file.

hodza@py-trainer:~$ tar -vxjf /home/hodza/polyglot_data/embeddings2/ru/embeddings_pkl.tar.bz2 > /dev/null

bzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? Possible reason follows.
bzip2: Inappropriate ioctl for device
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
hodza@py-trainer:~$
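The same check can be run over the whole data directory from Python. This is a sketch using only the standard `tarfile` module; `archive_is_intact` and `corrupted_archives` are hypothetical helper names, and the `*.tar.bz2` pattern is an assumption based on the single file name shown above.

```python
import tarfile
from pathlib import Path


def archive_is_intact(path):
    """Return True if every regular file in the tar archive reads to the end."""
    try:
        with tarfile.open(path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    handle = tar.extractfile(member)
                    # Read the payload in 1 MiB pieces; a truncated
                    # compressed stream raises before the data ends.
                    while handle.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        return False


def corrupted_archives(data_dir):
    """List the .tar.bz2 files under `data_dir` that fail the check."""
    return sorted(
        p for p in Path(data_dir).rglob("*.tar.bz2")
        if not archive_is_intact(p)
    )


# Example call (path taken from this thread):
# corrupted_archives("/home/hodza/polyglot_data")
```

Files reported here can be deleted and downloaded again; the check reads every archive in full, so it can take a while on large models.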

aboSamoor assigned alantian Oct 1, 2015

hodzanassredin closed this as completed Oct 1, 2015

hodzanassredin reopened this Oct 1, 2015

aboSamoor closed this as completed Oct 13, 2015

burtsev-cpu mentioned this issue Jan 11, 2019

Downloader: Russian collection is not complete #180

Open
