Introduction
============
.. warning::

   Please note that this is a beta version of the BigARTM library,
   which is still undergoing final testing before its official release.
   Should you encounter any bugs, missing functionality, or
   other problems with our library, please let us know immediately.
   Your help in this regard is greatly appreciated.
This is the documentation for the BigARTM library.
BigARTM is a tool to infer `topic models`_,
based on a novel technique called
`Additive Regularization of Topic Models`_.
This technique builds multi-objective models
by adding weighted sums of regularizers to the optimization criterion.
BigARTM is known to combine very different objectives well,
including sparsing, smoothing, topic decorrelation, and many others.
Such combinations of regularizers significantly improve
several quality measures at once, almost without any loss of perplexity.
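
The underlying criterion can be sketched as follows (notation as in the ARTM
paper linked above: :math:`\Phi` and :math:`\Theta` are the topic and document
distribution matrices, :math:`L` is the log-likelihood of the collection, and
each regularizer :math:`R_i` enters with a nonnegative weight :math:`\tau_i`):

.. math::

   L(\Phi, \Theta) + \sum_i \tau_i R_i(\Phi, \Theta) \;\to\; \max_{\Phi, \Theta}

Tuning the weights :math:`\tau_i` is how the multiple objectives are traded
off against each other.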
.. sidebar:: Getting help

   Having trouble? We'd like to help!

   * Try the :doc:`FAQ <faq>` -- it has answers to many common questions.
   * Looking for specific information? Try the :ref:`genindex`
     or :ref:`search`.
   * Search the archives of the `bigartm-users`_ mailing list, or
     `post a question`_.
   * Report bugs with BigARTM in our `ticket tracker`_.
.. _bigartm-users: https://groups.google.com/group/bigartm-users
.. _post a question: https://groups.google.com/d/forum/bigartm-users
.. _ticket tracker: https://github.com/bigartm/bigartm/issues
**Online**.
BigARTM never stores the entire text collection
in main memory. Instead, the collection is split into
small chunks called *batches*, and BigARTM loads only a limited
number of batches into memory at any time.
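
Conceptually, the batch-wise streaming works like the following sketch (this
is an illustrative model of the idea, not BigARTM's actual API; the names and
batch size are invented for the example):

```python
from itertools import islice

def iter_batches(documents, batch_size):
    """Split a document stream into fixed-size chunks ("batches")
    without ever materializing the whole collection in memory."""
    it = iter(documents)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# A toy collection of 10 "documents" produced lazily; only one
# batch of at most 3 documents is resident in memory at a time.
collection = ("doc-%d" % i for i in range(10))
sizes = [len(b) for b in iter_batches(collection, batch_size=3)]
print(sizes)  # [3, 3, 3, 1]
```

Because the source is a generator, memory usage stays bounded by the batch
size regardless of how large the collection grows.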
**Parallel**.
BigARTM can process several batches concurrently,
which substantially improves the throughput
on multi-core machines. The library hosts all computation
in several threads within a single process,
which enables efficient use of shared memory across application threads.
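
The batch-parallel scheme can be sketched with a worker pool inside one
process (again, the function and data below are hypothetical stand-ins, not
BigARTM's API):

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    """Stand-in for per-batch work, e.g. one inference pass;
    here it just sums document lengths."""
    return sum(len(doc) for doc in batch)

batches = [["aa", "bbb"], ["c"], ["dddd", "ee", "f"]]

# Worker threads process several batches concurrently within a
# single process, so they can share common state (e.g. the model)
# without copying it between processes.
with ThreadPoolExecutor(max_workers=2) as pool:
    totals = list(pool.map(process_batch, batches))
print(totals)  # [5, 1, 7]
```

Sharing one address space is what lets all workers read and update a single
copy of the model, which is the point made above.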
**Distributed**.
BigARTM can distribute all CPU-intensive
processing across several machines interconnected by a network.
We aim to scale up to hundreds of machines,
but this scalability has not been fully tested yet.
**Extensible API**.
BigARTM comes with an API in Python,
but it can easily be extended to any other language
that has an implementation of `Google Protocol Buffers`_.
**Cross-platform**.
BigARTM is known to be compatible with gcc,
clang, and the Microsoft
compiler (VS 2012). We have tested our library on Windows, Ubuntu,
and Fedora.
**Open source**.
BigARTM is released under the `New BSD License`_.
If you plan to use our library commercially, be aware that
BigARTM depends on ZeroMQ, so please make sure to review the
`ZeroMQ license`_.
.. _Additive Regularization of Topic Models: http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf
.. _topic models: http://en.wikipedia.org/wiki/Topic_model
.. _Google Protocol Buffers: https://code.google.com/p/protobuf/
.. _New BSD license: http://opensource.org/licenses/BSD-3-Clause
.. _ZeroMQ license: http://zeromq.org/area:licensing
=========================================
**Acknowledgements**.
The BigARTM project is supported by the Russian Foundation for Basic Research (grants 14-07-00847, 14-07-00908, 14-07-31176),
the Skolkovo Institute of Science and Technology (project 081-R), and the Moscow Institute of Physics and Technology.
.. image:: _images/sponsors_RFBR.png
   :alt: RFBR
   :target: http://www.rfbr.ru/rffi/eng/about

.. image:: _images/sponsors_skoltech.png
   :alt: Skoltech
   :target: http://www.skoltech.ru/en

.. image:: _images/sponsors_MIPT.png
   :alt: MIPT
   :target: http://mipt.ru/en/