Merge pull request #149 from sashafrey/master
Documentation whitepaper - BigARTM as a Service [skip ci]
bigartm committed Mar 8, 2015
2 parents 4c3747c + d7cba68 commit 3b171a7
Showing 4 changed files with 25 additions and 2 deletions.
Binary file added docs/stories/_images/cloud_service.png
Binary file added docs/stories/_images/theta_update.png
22 changes: 22 additions & 0 deletions docs/stories/cloud_service.txt
@@ -0,0 +1,22 @@
BigARTM as a Service
====================

The following diagram shows a suggested topology for a query service that involves topic modelling on Big Data.

.. image:: _images/cloud_service.png
   :alt: cloud_service

Here, the main use for Hadoop / MapReduce is to convert your Big Unstructured Data into a compact bag-of-words representation.
Thanks to its out-of-core design and high performance, BigARTM can handle this data on a single compute-optimized node.
The resulting topic model should then be replicated to all query instances that serve user requests.
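
As an illustration of the bag-of-words step only, the following minimal sketch counts word occurrences per document in plain Python. The function names ``bag_of_words`` and ``build_corpus`` and the simple regular-expression tokenizer are assumptions made for this example; they are not part of BigARTM or Hadoop, and serializing the result into BigARTM's input format is not shown.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a word -> count mapping for a single document.

    Hypothetical helper: lowercases the text and keeps runs of
    ASCII letters and digits as tokens.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def build_corpus(documents):
    """Map each document id to its bag of words.

    `documents` is assumed to be a dict of document id -> raw text.
    In a MapReduce job this would be the per-document map step.
    """
    return {doc_id: bag_of_words(text) for doc_id, text in documents.items()}
```

Each resulting mapping plays the role of the counts ``n_dw`` for one document ``d``.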

To avoid a query-time dependency on the BigARTM component, you may want to infer the topic distributions ``theta_{td}`` for new documents in your own code.
This can be done as follows. Start from the uniform topic assignment ``theta_{td} = 1 / |T|`` and update it in the following loop:

.. image:: _images/theta_update.png
   :alt: theta_update

where ``n_dw`` is the number of occurrences of word ``w`` in document ``d``, and ``phi_wt`` is an element of the Phi matrix.
In BigARTM the loop is repeated :attr:`ModelConfig.inner_iterations_count` times (defaults to ``10``).
To replicate BigARTM behavior precisely, one needs to account for class weights and include regularizers.
Please contact us if you need more details.
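
The update formula itself is only available as an image, so the sketch below assumes the standard unregularized EM-style update: ``theta_{td}`` is set proportional to the sum over words of ``n_dw * phi_wt * theta_{td} / sum_s(phi_ws * theta_{sd})`` and then normalized over topics. The function name ``infer_theta`` and the dictionary-based data layout are hypothetical, and class weights and regularizers are deliberately left out, so this is not an exact replica of BigARTM.

```python
def infer_theta(n_dw, phi, num_topics, inner_iterations=10):
    """Infer the topic distribution of one new document.

    Assumed inputs (hypothetical layout, not a BigARTM API):
      n_dw: dict mapping word -> number of occurrences in the document
      phi:  dict mapping word -> list of phi_wt values, one per topic
    Class weights and regularizers are ignored.
    """
    # Start from the uniform assignment theta_td = 1 / |T|.
    theta = [1.0 / num_topics] * num_topics

    for _ in range(inner_iterations):
        n_t = [0.0] * num_topics
        for word, count in n_dw.items():
            phi_w = phi.get(word)
            if phi_w is None:
                continue  # word is not in the model vocabulary
            # Normalizer: sum over topics s of phi_ws * theta_sd.
            p_wd = sum(p * th for p, th in zip(phi_w, theta))
            if p_wd <= 0.0:
                continue
            for t in range(num_topics):
                n_t[t] += count * phi_w[t] * theta[t] / p_wd

        total = sum(n_t)
        if total <= 0.0:
            break  # no known words: keep the current theta
        theta = [value / total for value in n_t]

    return theta
```

For example, ``infer_theta({"a": 3}, {"a": [0.9, 0.1], "b": [0.1, 0.9]}, 2)`` returns a distribution that sums to one and puts most of its mass on the first topic. The default of ten iterations mirrors the ``inner_iterations_count`` default mentioned above.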
5 changes: 3 additions & 2 deletions docs/stories/index.txt
@@ -5,10 +5,11 @@

.. _stories:

Advanced experiments
====================
Whitepapers
===========

.. toctree::
   :maxdepth: 2

   experiment02_artm
   cloud_service
