-
Notifications
You must be signed in to change notification settings - Fork 337
/
Download.rst
365 lines (286 loc) · 16.5 KB
/
Download.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
Downloading Models
==================
Polyglot requires a model for each task and language. These models are
essential for the library to function. Given the large size of some of
the models, we distribute the models through a download manager
separately. The download manager has several modes of operation.
Modes of Operation
------------------
Command Line Mode
~~~~~~~~~~~~~~~~~
The subcommand ``download`` takes a package or more as an argument and
download the specified packages in the ``polyglot_data`` directory.
.. code:: python
!polyglot download --help
.. parsed-literal::
usage: polyglot download [-h] [--dir DIR] [--quiet] [--force] [--exit-on-error] [--url SERVER_INDEX_URL] [packages [packages ...]]
positional arguments:
packages packages to be downloaded
optional arguments:
-h, --help show this help message and exit
--dir DIR download package to directory DIR
--quiet work quietly
--force download even if already installed
--exit-on-error exit if an error occurs
--url SERVER_INDEX_URL
download server index url
.. code:: python
!polyglot download morph2.en
.. parsed-literal::
[polyglot_data] Downloading package morph2.en to
[polyglot_data] /home/rmyeid/polyglot_data...
[polyglot_data] Package morph2.en is already up-to-date!
Interactive Mode
~~~~~~~~~~~~~~~~
You can reach this mode by not supplying any arguments to the command
line.
.. code:: python
!polyglot download
.. parsed-literal::
Polyglot Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader>
Library Interface
~~~~~~~~~~~~~~~~~
.. code:: python
from polyglot.downloader import downloader
downloader.download("embeddings2.en")
Collections
-----------
You noticed, by now, that we can install a specific model by specifying
its name and the target language.
Package name format is ``task_name.language_code``
Langauge Collections
^^^^^^^^^^^^^^^^^^^^
Packages are grouped by language. For example, if we want to download
all the models that are specific to Arabic, the arabic collection of
models name is **LANG:** followed by the language code of Arabic which
is ``ar``.
Therefore, we can just run:
.. code:: python
!polyglot download LANG:ar
.. parsed-literal::
[polyglot_data] Downloading collection u'LANG:ar'
[polyglot_data] |
[polyglot_data] | Downloading package tsne2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package tsne2.ar is already up-to-date!
[polyglot_data] | Downloading package transliteration2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package transliteration2.ar is already up-to-
[polyglot_data] | date!
[polyglot_data] | Downloading package morph2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package morph2.ar is already up-to-date!
[polyglot_data] | Downloading package counts2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package counts2.ar is already up-to-date!
[polyglot_data] | Downloading package sentiment2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package sentiment2.ar is already up-to-date!
[polyglot_data] | Downloading package embeddings2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package embeddings2.ar is already up-to-date!
[polyglot_data] | Downloading package ner2.ar to
[polyglot_data] | /home/rmyeid/polyglot_data...
[polyglot_data] | Package ner2.ar is already up-to-date!
[polyglot_data] |
[polyglot_data] Done downloading collection LANG:ar
Task Collections
^^^^^^^^^^^^^^^^
Packages are grouped by task. For example, if we want to download all
the models that perform transliteration. The collection name is
**TASK:** followed by the task name.
Therefore, we can just run:
.. code:: python
downloader.download("TASK:transliteration2", quiet=True)
.. parsed-literal::
True
Langauge & Task Support
-----------------------
We can query our download manager for which tasks are supported by
polyglot, as the following:
.. code:: python
downloader.supported_tasks(lang="en")
.. parsed-literal::
[u'embeddings2',
u'counts2',
u'pos2',
u'ner2',
u'sentiment2',
u'morph2',
u'tsne2']
We can query our download manager for which languages are supported by
polyglot named entity recognition subsystem, as the following:
.. code:: python
print(downloader.supported_languages_table(task="ner2"))
.. parsed-literal::
1. Polish 2. Turkish 3. Russian
4. Indonesian 5. Czech 6. Arabic
7. Korean 8. Catalan; Valencian 9. Italian
10. Thai 11. Romanian, Moldavian, ... 12. Tagalog
13. Danish 14. Finnish 15. German
16. Persian 17. Dutch 18. Chinese
19. French 20. Portuguese 21. Slovak
22. Hebrew (modern) 23. Malay 24. Slovene
25. Bulgarian 26. Hindi 27. Japanese
28. Hungarian 29. Croatian 30. Ukrainian
31. Serbian 32. Lithuanian 33. Norwegian
34. Latvian 35. Swedish 36. English
37. Greek, Modern 38. Spanish; Castilian 39. Vietnamese
40. Estonian
You can view all the available and/or installed collections or packages
through the list function
.. code:: python
downloader.list(show_packages=False)
.. parsed-literal::
Using default data directory (/home/rmyeid/polyglot_data)
=========================================
Data server index for <polyglot-models>
=========================================
Collections:
[ ] LANG:af............. Afrikaans packages and models
[ ] LANG:als............ als packages and models
[ ] LANG:am............. Amharic packages and models
[ ] LANG:an............. Aragonese packages and models
[ ] LANG:ar............. Arabic packages and models
[ ] LANG:arz............ arz packages and models
[ ] LANG:as............. Assamese packages and models
[ ] LANG:ast............ Asturian packages and models
[ ] LANG:az............. Azerbaijani packages and models
[ ] LANG:ba............. Bashkir packages and models
[ ] LANG:bar............ bar packages and models
[ ] LANG:be............. Belarusian packages and models
[ ] LANG:bg............. Bulgarian packages and models
[ ] LANG:bn............. Bengali packages and models
[ ] LANG:bo............. Tibetan packages and models
[ ] LANG:bpy............ bpy packages and models
[ ] LANG:br............. Breton packages and models
[ ] LANG:bs............. Bosnian packages and models
[ ] LANG:ca............. Catalan packages and models
[ ] LANG:ce............. Chechen packages and models
[ ] LANG:ceb............ Cebuano packages and models
[ ] LANG:cs............. Czech packages and models
[ ] LANG:cv............. Chuvash packages and models
[ ] LANG:cy............. Welsh packages and models
[ ] LANG:da............. Danish packages and models
[ ] LANG:de............. German packages and models
[ ] LANG:diq............ diq packages and models
[ ] LANG:dv............. Divehi packages and models
[ ] LANG:el............. Greek packages and models
[P] LANG:en............. English packages and models
[ ] LANG:eo............. Esperanto packages and models
[ ] LANG:es............. Spanish packages and models
[ ] LANG:et............. Estonian packages and models
[ ] LANG:eu............. Basque packages and models
[ ] LANG:fa............. Persian packages and models
[ ] LANG:fi............. Finnish packages and models
[ ] LANG:fo............. Faroese packages and models
[ ] LANG:fr............. French packages and models
[ ] LANG:fy............. Western Frisian packages and models
[ ] LANG:ga............. Irish packages and models
[ ] LANG:gan............ gan packages and models
[ ] LANG:gd............. Scottish Gaelic packages and models
[ ] LANG:gl............. Galician packages and models
[ ] LANG:gu............. Gujarati packages and models
[ ] LANG:gv............. Manx packages and models
[ ] LANG:he............. Hebrew packages and models
[ ] LANG:hi............. Hindi packages and models
[ ] LANG:hif............ hif packages and models
[ ] LANG:hr............. Croatian packages and models
[ ] LANG:hsb............ Upper Sorbian packages and models
[ ] LANG:ht............. Haitian packages and models
[ ] LANG:hu............. Hungarian packages and models
[ ] LANG:hy............. Armenian packages and models
[ ] LANG:ia............. Interlingua packages and models
[ ] LANG:id............. Indonesian packages and models
[ ] LANG:ilo............ Iloko packages and models
[ ] LANG:io............. Ido packages and models
[ ] LANG:is............. Icelandic packages and models
[ ] LANG:it............. Italian packages and models
[ ] LANG:ja............. Japanese packages and models
[ ] LANG:jv............. Javanese packages and models
[ ] LANG:ka............. Georgian packages and models
[ ] LANG:kk............. Kazakh packages and models
[ ] LANG:km............. Khmer packages and models
[ ] LANG:kn............. Kannada packages and models
[ ] LANG:ko............. Korean packages and models
[ ] LANG:ku............. Kurdish packages and models
[ ] LANG:ky............. Kyrgyz packages and models
[ ] LANG:la............. Latin packages and models
[ ] LANG:lb............. Luxembourgish packages and models
[ ] LANG:li............. Limburgish packages and models
[ ] LANG:lmo............ lmo packages and models
[ ] LANG:lt............. Lithuanian packages and models
[ ] LANG:lv............. Latvian packages and models
[ ] LANG:mg............. Malagasy packages and models
[ ] LANG:mk............. Macedonian packages and models
[ ] LANG:ml............. Malayalam packages and models
[ ] LANG:mn............. Mongolian packages and models
[ ] LANG:mr............. Marathi packages and models
[ ] LANG:ms............. Malay packages and models
[ ] LANG:mt............. Maltese packages and models
[ ] LANG:my............. Burmese packages and models
[ ] LANG:ne............. Nepali packages and models
[ ] LANG:nl............. Dutch packages and models
[ ] LANG:nn............. Norwegian Nynorsk packages and models
[ ] LANG:no............. Norwegian packages and models
[ ] LANG:oc............. Occitan packages and models
[ ] LANG:or............. Oriya packages and models
[ ] LANG:os............. Ossetic packages and models
[ ] LANG:pa............. Punjabi packages and models
[ ] LANG:pam............ Pampanga packages and models
[ ] LANG:pl............. Polish packages and models
[ ] LANG:pms............ pms packages and models
[ ] LANG:ps............. Pashto packages and models
[ ] LANG:pt............. Portuguese packages and models
[ ] LANG:qu............. Quechua packages and models
[ ] LANG:rm............. Romansh packages and models
[ ] LANG:ro............. Romanian packages and models
[ ] LANG:ru............. Russian packages and models
[ ] LANG:sa............. Sanskrit packages and models
[ ] LANG:sah............ Sakha packages and models
[ ] LANG:scn............ Sicilian packages and models
[ ] LANG:sco............ Scots packages and models
[ ] LANG:se............. Northern Sami packages and models
[ ] LANG:sh............. Serbo-Croatian packages and models
[ ] LANG:si............. Sinhala packages and models
[ ] LANG:sk............. Slovak packages and models
[ ] LANG:sl............. Slovenian packages and models
[ ] LANG:sq............. Albanian packages and models
[ ] LANG:sr............. Serbian packages and models
[ ] LANG:su............. Sundanese packages and models
[ ] LANG:sv............. Swedish packages and models
[ ] LANG:sw............. Swahili packages and models
[ ] LANG:szl............ szl packages and models
[ ] LANG:ta............. Tamil packages and models
[ ] LANG:te............. Telugu packages and models
[ ] LANG:tg............. Tajik packages and models
[ ] LANG:th............. Thai packages and models
[ ] LANG:tk............. Turkmen packages and models
[ ] LANG:tl............. Tagalog packages and models
[ ] LANG:tr............. Turkish packages and models
[ ] LANG:tt............. Tatar packages and models
[ ] LANG:ug............. Uyghur packages and models
[ ] LANG:uk............. Ukrainian packages and models
[ ] LANG:ur............. Urdu packages and models
[ ] LANG:uz............. Uzbek packages and models
[ ] LANG:vec............ vec packages and models
[ ] LANG:vi............. Vietnamese packages and models
[ ] LANG:vls............ vls packages and models
[ ] LANG:vo............. Volapük packages and models
[ ] LANG:wa............. Walloon packages and models
[ ] LANG:war............ Waray packages and models
[ ] LANG:yi............. Yiddish packages and models
[ ] LANG:yo............. Yoruba packages and models
[ ] LANG:zh............. Chinese packages and models
[ ] LANG:zhc............ Chinese Character packages and models
[ ] LANG:zhw............ zhw packages and models
[ ] TASK:counts2........ counts2
[ ] TASK:embeddings2.... embeddings2
[ ] TASK:ner2........... ner2
[P] TASK:sentiment2..... sentiment2
[ ] TASK:tsne2.......... tsne2
([*] marks installed packages; [P] marks partially installed collections)