v0.7.3 release notes
sashafrey committed Oct 28, 2015
1 parent acf8425 commit 2b1997e
Showing 4 changed files with 106 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/download.txt
@@ -3,8 +3,8 @@ Downloads

* **Windows**

* Latest 32 bit release: `BigARTM_v0.7.2_win32 <https://github.com/bigartm/bigartm/releases/download/v0.7.2/BigARTM_v0.7.2_win32.7z>`_
* Latest 64 bit release: `BigARTM_v0.7.2_x64 <https://github.com/bigartm/bigartm/releases/download/v0.7.2/BigARTM_v0.7.2_x64.7z>`_
* Latest 32 bit release: `BigARTM_v0.7.3_win32 <https://github.com/bigartm/bigartm/releases/download/v0.7.3/BigARTM_v0.7.3_win32.7z>`_
* Latest 64 bit release: `BigARTM_v0.7.3_x64 <https://github.com/bigartm/bigartm/releases/download/v0.7.3/BigARTM_v0.7.3_x64.7z>`_
* All previous releases are available at https://github.com/bigartm/bigartm/releases

Please refer to :doc:`tutorials/windows_basic` for step by step installation procedure.
@@ -65,6 +65,8 @@ Downloads
* `vocab.mmro.txt (150 KB) <https://s3-eu-west-1.amazonaws.com/artm/vocab.mmro.txt>`_
* `pPMI_w100.mmro.txt.7z (23 MB) <https://s3-eu-west-1.amazonaws.com/artm/pPMI_w100.mmro.txt.7z>`_
* `vw.mmro.txt.7z (1.4 MB) <https://s3-eu-west-1.amazonaws.com/artm/vw.mmro.txt.7z>`_

eurlex `eurlex`_ 19800 21000 * `eurlex_1k (13 MB) <https://s3-eu-west-1.amazonaws.com/artm/eurlex_1k.zip>`_
========= ========= ======= ======= ==================================================================================================================

.. _UCI: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words
@@ -76,3 +78,5 @@ Downloads
.. _lastfm: http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/

.. _mmro: http://mmro.ru/

.. _eurlex: http://www.ke.tu-darmstadt.de/resources/eurlex
Binary file added docs/release_notes/_images/v0.7.3_perf.png
1 change: 1 addition & 0 deletions docs/release_notes/index.txt
@@ -14,3 +14,4 @@ Release Notes
v0.7.0
v0.7.1
v0.7.2
v0.7.3
99 changes: 99 additions & 0 deletions docs/release_notes/v0.7.3.txt
@@ -0,0 +1,99 @@
BigARTM v0.7.3 Release notes
============================

BigARTM v0.7.3 releases the following changes:

* New command line tool for BigARTM
* Support for classification in bigartm CLI
* Support for asynchronous processing of batches
* Improvements in coherence regularizer and coherence score
* New *TopicMass* score for phi matrix
* Support for documents markup
* New API for importing batches through memory


New command line tool for BigARTM
---------------------------------

The new CLI is named ``bigartm`` (or ``bigartm.exe`` on Windows),
and it supersedes the previous CLI named ``cpp_client``.
The new CLI has the following features:

* Parse collection in one of the :doc:`/formats`
* Load dictionary
* Initialize a new model, or import previously created model
* Perform EM-iterations to fit the model
* Export predicted probabilities for all documents into CSV file
* Export model into a file

All command-line options are listed `here <http://docs.bigartm.org/en/latest/tutorials/bigartm_cli.html>`_,
and you may see several examples on the `BigARTM <http://github.com/bigartm/bigartm>`_ page on GitHub.
At the moment, full documentation is only available in `Russian <https://github.com/bigartm/bigartm-book/blob/master/junk/cli/BigARTM_CommandLineInferface.ipynb>`_.


Support for classification in BigARTM CLI
-----------------------------------------

The BigARTM CLI is now able to perform classification.
The following example assumes that your batches have a ``target_class`` modality in addition to the default modality (``@default_class``).

.. code-block:: bash

   # Fit model (replace <your batches> with the location of your batches)
   bigartm.exe --use-batches <your batches> \
               --use-modality @default_class,target_class \
               --topics 50 \
               --dictionary-min-df 10 \
               --dictionary-max-df 25% \
               --save-model model.bin

   # Apply model and output to text files
   bigartm.exe --use-batches <your batches> \
               --use-modality @default_class,target_class \
               --topics 50 \
               --passes 0 \
               --load-model model.bin \
               --predict-class target_class \
               --write-predictions pred.txt \
               --write-class-predictions pred_class.txt \
               --csv-separator=tab \
               --score ClassPrecision

Support for asynchronous processing of batches
----------------------------------------------

Asynchronous processing of batches enables applications to
overlap EM-iterations and better utilize CPU resources.
The following chart shows CPU utilization of ``bigartm.exe``
with (left-hand side) and without (right-hand side) the async flag.

.. image:: _images/v0.7.3_perf.png
   :alt: BigARTM performance in asynchronous mode


TopicMass score for phi matrix
------------------------------

The TopicMass score calculates the cumulative topic mass for each topic.
This is a useful metric for monitoring the balance between topics.
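
The idea can be illustrated with a minimal sketch in plain Python. It assumes that the mass of a topic is the sum of unnormalized token weights in the corresponding column of a token-by-topic matrix, and that balance is read off from each topic's share of the total. The names ``topic_mass`` and ``n_wt`` are illustrative and are not part of the BigARTM API; the exact definition used by BigARTM may differ.

```python
def topic_mass(n_wt):
    """Return (mass, share) per topic for a token-by-topic weight matrix.

    n_wt[w][t] is the unnormalized weight of token w in topic t
    (an assumption of this sketch, not BigARTM's documented definition).
    """
    num_topics = len(n_wt[0])
    # Mass of a topic: sum of its column over all tokens.
    mass = [sum(row[t] for row in n_wt) for t in range(num_topics)]
    total = sum(mass)
    # Share of the total mass held by each topic.
    share = [m / total for m in mass]
    return mass, share


# Toy matrix: 3 tokens x 2 topics (made-up numbers).
n_wt = [
    [4.0, 1.0],
    [2.0, 1.0],
    [2.0, 0.0],
]
mass, share = topic_mass(n_wt)
print(mass)   # [8.0, 2.0]
print(share)  # [0.8, 0.2]
```

In this toy example the first topic holds 80% of the total mass, which is the kind of imbalance such a score makes visible.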

Support for documents markup
------------------------------

Document markup provides a topic distribution for each word in a document.
Since BigARTM v0.7.3 it is possible to extract this information and use it in applications.
A potential application is a color-highlighted map of the document,
where every word is colored according to its most probable topic in the document.

In the code this feature is referred to as the ``ptdw`` matrix.
It is possible to extract and regularize ``ptdw`` matrices.
In future versions it will also be possible to calculate scores based on the ``ptdw`` matrix.
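
As a rough illustration of what a ``ptdw`` matrix contains, the sketch below uses the standard topic-model E-step formula, where p(t | d, w) is proportional to p(w | t) * p(t | d), and then picks the most probable topic for each word, as a highlighting application might. This is a plain-Python sketch under that assumption; the function names ``ptdw`` and ``top_topics`` and the toy matrices are hypothetical, and BigARTM's own computation (for example with ``ptdw`` regularizers) may differ.

```python
def ptdw(phi, theta_d, word_ids):
    """Return p(t | d, w) for each word occurrence in one document.

    phi[w][t]  : p(w | t), token-by-topic matrix
    theta_d[t] : p(t | d), topic distribution of the document
    word_ids   : token ids in document order
    """
    rows = []
    for w in word_ids:
        unnorm = [phi[w][t] * theta_d[t] for t in range(len(theta_d))]
        z = sum(unnorm)
        # Normalize over topics; fall back to zeros if the word has no support.
        rows.append([u / z for u in unnorm] if z > 0 else [0.0] * len(unnorm))
    return rows


def top_topics(rows):
    """Index of the most probable topic for each word occurrence."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Toy model: 3 tokens x 2 topics (made-up numbers, columns sum to 1).
phi = [
    [0.6, 0.1],  # token 0
    [0.3, 0.1],  # token 1
    [0.1, 0.8],  # token 2
]
theta_d = [0.5, 0.5]
doc = [0, 2, 1]  # the document as a sequence of token ids

rows = ptdw(phi, theta_d, doc)
labels = top_topics(rows)
print(labels)  # [0, 1, 0]
```

Each entry of ``labels`` could then be mapped to a color to produce the highlighted view of the document described above.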

New API for importing batches through memory
--------------------------------------------

New low-level APIs ``ArtmImportBatches`` and ``ArtmDisposeBatches``
allow importing batches from memory into BigARTM.
These batches are stored in BigARTM and can then be used for batch processing.

