Merge pull request #141 from sashafrey/master
Refactor BigARTM tutorial [skip ci]
Showing 14 changed files with 568 additions and 409 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,36 +1,62 @@ | ||
Download | ||
======== | ||
|
||
* Windows - latest release | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.8/BigARTM_v0.5.8_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.8/BigARTM_v0.5.8_x64.7z | ||
|
||
* Windows - previous releases | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.7/BigARTM_v0.5.7_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.7/BigARTM_v0.5.7_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.6/BigARTM_v0.5.6_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.6/BigARTM_v0.5.6_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.5/BigARTM_v0.5.5_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.5/BigARTM_v0.5.5_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.4/BigARTM_v0.5.4_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.4/BigARTM_v0.5.4_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.3/BigARTM_v0.5.3_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.3/BigARTM_v0.5.3_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.2/BigARTM_v0.5.2_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.2/BigARTM_v0.5.2_x64.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.1/BigARTM_v0.5.1_win32.7z | ||
* https://github.com/bigartm/bigartm/releases/download/v0.5.1/BigARTM_v0.5.1_x64.7z | ||
|
||
Please refer to the :doc:`tutorial` chapter for the installation guide. | ||
|
||
* Linux, Mac OS-X | ||
Currently there is no distribution package for Linux or Mac OS-X. | ||
To run BigARTM you need to download the source code and build it on your machine. | ||
The detailed procedure is available in the :doc:`tutorial` and :doc:`devguide` chapters. | ||
|
||
Other tools | ||
----------- | ||
|
||
* 7-zip -- http://www.7-zip.org/a/7z920-x64.msi | ||
* Python 2.7.9, 64 bit -- https://www.python.org/ftp/python/2.7.9/python-2.7.9.amd64.msi | ||
* Python 2.7.9, 32 bit -- https://www.python.org/ftp/python/2.7.9/python-2.7.9.msi | ||
Downloads | ||
========= | ||
|
||
* **Windows** | ||
|
||
* Latest 32 bit release: https://github.com/bigartm/bigartm/releases/download/v0.5.8/BigARTM_v0.5.8_win32.7z | ||
* Latest 64 bit release: https://github.com/bigartm/bigartm/releases/download/v0.5.8/BigARTM_v0.5.8_x64.7z | ||
* All previous releases are available at https://github.com/bigartm/bigartm/releases | ||
|
||
Please refer to :doc:`tutorials/windows_basic` for a step-by-step installation procedure. | ||
|
||
* **Linux, Mac OS-X** | ||
|
||
To run BigARTM on Linux and Mac OS-X you need to clone the BigARTM repository | ||
(https://github.com/bigartm/bigartm) and build it as described in | ||
:doc:`tutorials/linux_basic`. | ||
|
||
* **Datasets** | ||
|
||
========= ========= ======= ======= ================================================================================================================== | ||
Task Source #Words #Items Files | ||
========= ========= ======= ======= ================================================================================================================== | ||
kos `UCI`_ 6906 3430 * `docword.kos.txt.gz (1 MB) <https://s3-eu-west-1.amazonaws.com/artm/docword.kos.txt.gz>`_ | ||
* `vocab.kos.txt (54 KB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.kos.txt>`_ | ||
* `kos_1k (700 KB) <https://s3-eu-west-1.amazonaws.com/artm/kos_1k.7z>`_ | ||
* `kos_dictionary <https://s3-eu-west-1.amazonaws.com/artm/kos_dictionary>`_ | ||
|
||
|
||
nips `UCI`_ 12419 1500 * `docword.nips.txt.gz (2.1 MB) <https://s3-eu-west-1.amazonaws.com/artm/docword.nips.txt.gz>`_ | ||
* `vocab.nips.txt (98 KB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.nips.txt>`_ | ||
* `nips_200 (1.5 MB) <https://s3-eu-west-1.amazonaws.com/artm/nips_200.7z>`_ | ||
* `nips_dictionary <https://s3-eu-west-1.amazonaws.com/artm/nips_dictionary>`_ | ||
|
||
enron `UCI`_ 28102 39861 * `docword.enron.txt.gz (11.7 MB) <https://s3-eu-west-1.amazonaws.com/artm/docword.enron.txt.gz>`_ | ||
* `vocab.enron.txt (230 KB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.enron.txt>`_ | ||
* `enron_1k (7.1 MB) <https://s3-eu-west-1.amazonaws.com/artm/enron_1k.7z>`_ | ||
* `enron_dictionary <https://s3-eu-west-1.amazonaws.com/artm/enron_dictionary>`_ | ||
|
||
nytimes `UCI`_ 102660 300000 * `docword.nytimes.txt.gz (223 MB) <https://s3-eu-west-1.amazonaws.com/artm/docword.nytimes.txt.gz>`_ | ||
* `vocab.nytimes.txt (1.2 MB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.nytimes.txt>`_ | ||
* `nytimes_1k (131 MB) <https://s3-eu-west-1.amazonaws.com/artm/nytimes_1k.7z>`_ | ||
* `nytimes_dictionary <https://s3-eu-west-1.amazonaws.com/artm/nytimes_dictionary>`_ | ||
|
||
pubmed `UCI`_ 141043 8200000 * `docword.pubmed.txt.gz (1.7 GB) <https://s3-eu-west-1.amazonaws.com/artm/docword.pubmed.txt.gz>`_ | ||
* `vocab.pubmed.txt (1.3 MB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.pubmed.txt>`_ | ||
* `pubmed_10k (1 GB) <https://s3-eu-west-1.amazonaws.com/artm/pubmed_10k.7z>`_ | ||
* `pubmed_dictionary <https://s3-eu-west-1.amazonaws.com/artm/pubmed_dictionary>`_ | ||
|
||
wiki `Gensim`_ 100000 3665223 * `enwiki-20141208_10k (1.2 GB) <https://s3-eu-west-1.amazonaws.com/artm/enwiki-20141208_10k.7z>`_ | ||
* `enwiki-20141208_1k (1.4 GB) <https://s3-eu-west-1.amazonaws.com/artm/enwiki-20141208_1k.7z>`_ | ||
* `enwiki-20141208_dictionary (3.6 MB) <https://s3-eu-west-1.amazonaws.com/artm/enwiki-20141208_dictionary>`_ | ||
|
||
wiki_enru `Wiki`_ 196749 216175 * `wiki_enru (282 MB) <https://s3-eu-west-1.amazonaws.com/artm/wiki_enru.7z>`_ | ||
* `wiki_enru_dictionary (5.3 MB) <https://s3-eu-west-1.amazonaws.com/artm/wiki_enru_dictionary>`_ | ||
* class_id(s): ``@english``, ``@russian`` | ||
========= ========= ======= ======= ================================================================================================================== | ||
|
||
.. _UCI: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words | ||
|
||
.. _Gensim: http://radimrehurek.com/gensim/wiki.html | ||
|
||
.. _Wiki: http://dumps.wikimedia.org |
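The ``docword.*.txt.gz`` and ``vocab.*.txt`` files listed for the UCI datasets follow the UCI Bag-of-Words text layout: three header lines (number of documents, vocabulary size, number of non-zero counts) followed by ``docID wordID count`` triples, with 1-based word IDs that index the lines of the vocabulary file. The sketch below reads such a file using only the Python standard library. It is an illustration and not part of BigARTM; the function names ``open_docword``, ``read_docword`` and ``read_vocab`` are hypothetical.

```python
import gzip
import io
from collections import defaultdict


def open_docword(path):
    """Open a docword file as text, transparently handling .gz archives."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_docword(stream):
    """Parse a UCI Bag-of-Words stream.

    Returns (n_docs, n_words, n_nonzero, docs), where docs maps
    doc_id -> {word_id: count}. IDs are kept 1-based, as in the file.
    """
    n_docs = int(stream.readline())
    n_words = int(stream.readline())
    n_nonzero = int(stream.readline())
    docs = defaultdict(dict)
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines at the end of the file
        doc_id, word_id, count = map(int, line.split())
        docs[doc_id][word_id] = count
    return n_docs, n_words, n_nonzero, dict(docs)


def read_vocab(stream):
    """Return the vocabulary as a list; word ID n is element n - 1."""
    return [line.strip() for line in stream if line.strip()]


if __name__ == "__main__":
    # Tiny in-memory sample instead of a downloaded file.
    sample = io.StringIO("2\n3\n3\n1 1 2\n1 3 1\n2 2 4\n")
    vocab = read_vocab(io.StringIO("alpha\nbeta\ngamma\n"))
    _, _, _, docs = read_docword(sample)
    for doc_id, counts in sorted(docs.items()):
        words = {vocab[w - 1]: c for w, c in counts.items()}
        print(doc_id, words)
```

For a real dataset, ``read_docword(open_docword("docword.kos.txt.gz"))`` would be the intended call, assuming the file has been downloaded to the working directory.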
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
.. BigARTM documentation master file, created by | ||
sphinx-quickstart on Sun Jul 13 20:00:11 2014. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
|
||
.. _legacy_pages: | ||
|
||
Legacy documentation pages | ||
========================== | ||
|
||
Legacy pages are kept to preserve existing users' links (browser favourites, etc.). | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
|
||
tutorial |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,4 +17,6 @@ BigARTM Reference | |
c_interface | ||
cpp_interface | ||
cpp_client | ||
windows_distribution | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
==================== | ||
Windows distribution | ||
==================== | ||
|
||
This chapter describes the contents of the BigARTM distribution package for Windows, available at https://github.com/bigartm/bigartm/releases. | ||
|
||
=========================== ========================================================== | ||
``bin/`` | Precompiled binaries of BigARTM for Windows. | ||
| This folder must be added to ``PATH`` system variable. | ||
|
||
``bin/artm.dll`` | Core functionality of the BigARTM library. | ||
|
||
``bin/node_controller.exe`` | Executable that hosts BigARTM nodes in a distributed | ||
| setting. | ||
|
||
``bin/cpp_client.exe`` | Command-line utility for running simple experiments | ||
| with BigARTM. Not all BigARTM features are | ||
| available through cpp_client, but it is a good | ||
| starting point for learning the basic functionality. For | ||
| further details refer to :doc:`/ref/cpp_client`. | ||
|
||
``protobuf/`` | A minimalistic version of Google Protocol Buffers | ||
| (https://code.google.com/p/protobuf/) | ||
| library, required to run BigARTM from Python. | ||
| To set up this package follow the instructions in the | ||
| ``protobuf/python/README`` file. | ||
|
||
``python/artm/`` | Python programming interface to the BigARTM library. | ||
| This folder must be added to the ``PYTHONPATH`` | ||
| system variable. | ||
|
||
``library.py`` | Implements all classes of the BigARTM Python interface. | ||
|
||
``messages_pb2.py`` | Contains all protobuf messages that can be transferred in | ||
| and out of the BigARTM core library. The most common features | ||
| are exposed through their own API methods, so normally you | ||
| do not need Python protobuf messages to operate BigARTM. | ||
|
||
``python/examples/`` | Python examples of how to use BigARTM: | ||
|
||
* `example01_synthetic_collection.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example01_synthetic_collection.py>`_ | ||
|
||
* `example02_parse_collection.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example02_parse_collection.py>`_ | ||
|
||
* `example03_concurrency.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example03_concurrency.py>`_ | ||
|
||
* `example04_online_algorithm.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example04_online_algorithm.py>`_ | ||
|
||
* `example05_train_and_test_stream.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example05_train_and_test_stream.py>`_ | ||
|
||
* `example06_use_dictionaries.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example06_use_dictionaries.py>`_ | ||
|
||
* `example07_master_component_proxy.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example07_master_component_proxy.py>`_ | ||
|
||
* `example08_network_modus_operandi.py <https://raw.githubusercontent.com/bigartm/bigartm/master/src/python/examples/example08_network_modus_operandi.py>`_ | ||
|
||
| Files ``docword.kos.txt`` and ``vocab.kos.txt`` represent | ||
| a simple collection of text files in Bag-Of-Words format. | ||
| The files are taken from UCI Machine Learning Repository | ||
| (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words). | ||
|
||
``src/`` | Several programming interfaces to BigARTM library. | ||
|
||
``src/c_interface.h`` | :doc:`Low-level BigARTM interface </ref/c_interface>` in C. | ||
|
||
``cpp_interface.h,cc`` | :doc:`C++ interface of BigARTM </ref/cpp_interface>` | ||
|
||
``messages.pb.h,cc`` | Protobuf messages for C++ interface | ||
|
||
``messages.proto`` | Protobuf description for all messages that appear in the | ||
| API of BigARTM. Documented :doc:`here </ref/messages>`. | ||
|
||
``LICENSE`` License file of BigARTM. | ||
=========================== ========================================================== |
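The table above asks for ``bin/`` to be on ``PATH`` and ``python/artm/`` to be on ``PYTHONPATH``. As a minimal sketch of what that amounts to for a single Python process, the helper below prepends both folders if they are missing. The function name ``add_bigartm_paths`` and the install location in the usage line are hypothetical; the folder layout is taken from the table. This only affects the running process, so a persistent setting still has to be made through the Windows environment variable settings, and whether it is sufficient for loading ``artm.dll`` depends on the Python version and on how the library is loaded.

```python
import os
import sys


def add_bigartm_paths(install_dir, environ=None, module_path=None):
    """Prepend the BigARTM bin/ and python/artm/ folders for this process.

    install_dir  -- folder where the distribution archive was unpacked
                    (hypothetical; choose your own location)
    environ      -- mapping holding PATH; defaults to os.environ
    module_path  -- list used for module lookup; defaults to sys.path
    """
    environ = os.environ if environ is None else environ
    module_path = sys.path if module_path is None else module_path

    bin_dir = os.path.join(install_dir, "bin")
    python_dir = os.path.join(install_dir, "python", "artm")

    # PATH is a single string of folders joined by os.pathsep.
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    if bin_dir not in entries:
        environ["PATH"] = os.pathsep.join([bin_dir] + entries)

    # sys.path is what PYTHONPATH feeds at interpreter start-up.
    if python_dir not in module_path:
        module_path.insert(0, python_dir)

    return bin_dir, python_dir


if __name__ == "__main__":
    # Hypothetical unpack location; adjust to your machine.
    print(add_bigartm_paths(r"C:\BigARTM"))
```

The optional ``environ`` and ``module_path`` arguments exist so the helper can be exercised without touching the real process environment.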