Commit

Merge branch 'testing'
severinsimmler committed Nov 16, 2017
2 parents 2341326 + 2e87139 commit 05dbbae
Showing 7 changed files with 17 additions and 19 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# Topics – Easy Topic Modeling in Python

[Topics](http://dev.digital-humanities.de/ci/job/DARIAH-Topics/doclinks/1/) is a Python library for Text Mining and Topic Modeling. Furthermore, this repository provides a convenient, modular workflow that can be entirely controlled from within and which comes with a well-documented [Jupyter](http://jupyter.org/) notebook. Users not yet familiar with programming in Python can test basic Topic Modeling in a [Flask](http://flask.pocoo.org/)-based [GUI demonstrator](/demonstrator/README.md). For a standalone application, which does not require a Python interpreter or any extra installations, have a look at the [release-section](https://github.com/DARIAH-DE/Topics/releases).
[Topics](http://dev.digital-humanities.de/ci/job/DARIAH-Topics/doclinks/1/) is a Python library for Text Mining and Topic Modeling. Furthermore, this repository provides a convenient, modular workflow that can be entirely controlled from within and which comes with a well-documented [Jupyter](http://jupyter.org/) notebook. Users not yet familiar with programming in Python can test basic Topic Modeling in a [Flask](http://flask.pocoo.org/)-based [GUI demonstrator](/demonstrator/README.md). **For a standalone application**, which does not require a Python interpreter or any extra installations, **have a look at the [release-section](https://github.com/DARIAH-DE/Topics/releases)**.

At the moment, this library supports three LDA implementations:
* [lda](http://pythonhosted.org/lda/index.html), which is lightweight and provides basic LDA.
@@ -11,7 +11,7 @@ At the moment, this library supports three LDA implementations:
* [Topics website](http://dev.digital-humanities.de/ci/job/DARIAH-Topics/doclinks/1/)
* [Topics API documentation](http://dev.digital-humanities.de/ci/job/DARIAH-Topics/doclinks/1/docs/gen/modules.html)
* [Topics paper](https://dh2017.adho.org/abstracts/411/411.pdf)
* [Demonstrator releases](https://github.com/DARIAH-DE/Topics/releases)
* **[Standalone Demonstrator releases](https://github.com/DARIAH-DE/Topics/releases)**
* [An introduction to Topic Modeling using lda](IntroducingLda.ipynb)
* [An introduction to Topic Modeling using MALLET](IntroducingMallet.ipynb)
* [An introduction to Topic Modeling using Gensim](IntroducingGensim.ipynb)
28 changes: 14 additions & 14 deletions demonstrator/README.md
@@ -4,17 +4,27 @@ This web application introduces a user-friendly workflow, basically containing

![Demonstrator Screenshot](screenshot.png)


## First steps

**Important**: Please make sure all dependencies are properly installed, including the `dariah_topics` module. If they are not (or you are not sure), simply run `pip install -r requirements.txt` (or `pip3 install -r requirements.txt` if you are on a [UNIX-based](https://en.wikipedia.org/wiki/Unix) operating system such as macOS or Ubuntu Linux) from the [command line](https://en.wikipedia.org/wiki/Command-line_interface) in the `Topics` directory.
**Only Linux**: Please make sure all dependencies are properly installed, including the `dariah_topics` module. If they are not (or you are not sure), simply run `pip3 install -r requirements.txt` from the [command line](https://en.wikipedia.org/wiki/Command-line_interface) in the `Topics` directory.

### Running the application
To run the application, type `python demonstrator.py` (or `python3 demonstrator.py` on UNIX) in the command line and press Enter. Your default browser should then display the interface. It may take a few seconds for the browser to open; if it does not open automatically, open it yourself and go to `http://127.0.0.1:5000`.<br>
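If the browser does not open and you are not sure whether the demonstrator started at all, you can check whether anything is listening on port 5000. The following is a minimal sketch using only the Python standard library; the function name `server_is_listening` is hypothetical and not part of `dariah_topics` or `demonstrator.py`, and it assumes the default address `127.0.0.1:5000` mentioned above.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens there.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://127.0.0.1:5000"
    if server_is_listening():
        print(f"Something is running on port 5000; try opening {url} in your browser.")
    else:
        print("Nothing is listening on port 5000 yet; is `demonstrator.py` still starting?")
```

Run it in a second command-line window while the demonstrator is starting. A positive result only tells you that *some* program is using the port, not that it is the demonstrator.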

**Important**: This application aims for simplicity and usability. If you are working with a large corpus (more than 200 documents), you may want to use more sophisticated topic models, such as those implemented in MALLET, which is known to be more robust than standard LDA. Have a look at our Jupyter notebook [introducing topic modeling with MALLET](https://github.com/DARIAH-DE/Topics/blob/testing/IntroducingMallet.ipynb).<br>
This application aims for simplicity and usability. If you are working with a large corpus (more than 200 documents), you may want to use more sophisticated topic models, such as those implemented in MALLET, which is known to be more robust than standard LDA. Have a look at our Jupyter notebook [introducing topic modeling with MALLET](https://github.com/DARIAH-DE/Topics/blob/testing/IntroducingMallet.ipynb).<br>

**Hint**: To get better results, it is highly recommended to use one of the provided [stopword lists](https://github.com/DARIAH-DE/Topics/blob/master/tutorial_supplementals/stopwords). Simply removing the most frequent words is risky, because you might also remove important ones.
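The idea behind a stopword list can be sketched in a few lines of plain Python. This is not the `dariah_topics` API: the helper names below are hypothetical, and the sketch assumes the stopword list is a plain-text file with one word per line. It also illustrates why a frequency cut-off is risky.

```python
from collections import Counter
from pathlib import Path


def load_stopwords(path):
    """Read a stopword list, assuming plain text with one word per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Lower-case everything and skip blank lines.
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only tokens that are not on the stopword list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


def most_frequent(tokens, n):
    """The n most frequent tokens, i.e. what a naive frequency cut-off would remove."""
    counts = Counter(token.lower() for token in tokens)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    tokens = "The whale and the sea and the whale".split()
    print(remove_stopwords(tokens, {"the", "and"}))
    print(most_frequent(tokens, 3))
```

In this toy example, the stopword list keeps `whale`, `sea` and `whale`, while the three most frequent words are `the`, `and` and `whale`; cutting them off would discard the content word `whale` as well.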

#### Windows and macOS
Although this application is built with Python, it is possible to run it as if it were a native application, without having to install Python or any related packages. There is currently one build each for Windows and macOS.

1. Download `demonstrator-0.0.1-windows.zip` or `demonstrator-0.0.1-mac.zip` from the [release-section](https://github.com/DARIAH-DE/Topics/releases).
2. Open it by double-clicking.
3. Run the app by double-clicking.
4. **Mac**: If you get an error message saying that the file is from an “unidentified developer”, you can override it by holding control while double-clicking. The error message will still appear, but you will be given an option to run the file anyway.

#### Linux
To run the application, type `python3 demonstrator.py` in the command line and press Enter. Your default browser should then display the interface. It may take a few seconds for the browser to open; if it does not open automatically, open it yourself and go to `http://127.0.0.1:5000`.

### Handling the application
The application behaves just like any other website. Basically, there are only two pages: one to select text files and make some further adjustments, and one to show what your topic model has generated. Once you have clicked the `Send` button, all generated data is stored in the cache, and you can switch between the pages without losing any data. **But be careful**: once you click the `Send` button again, all of the previous data will be lost.

@@ -26,16 +36,6 @@ If you are confronted with any issues, please use `Issues` [on GitHub](https://g
- If you want to go back from the output page to the first page but your browser displays a blank page, press the reload button. Switching between pages should take only a few seconds; if it takes longer, something went wrong.
- If you get a `ModuleNotFoundError`, your dependencies are probably not up to date. Try running `pip install -r requirements.txt` (or `pip3 install -r requirements.txt` on UNIX) from the command line in the `Topics` directory.
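Before reinstalling everything, a quick check can show which modules the current interpreter cannot find. This is a sketch under assumptions: the module names in the example (`flask`, `dariah_topics`) are illustrative, and the suggested command uses `python -m pip` so that pip installs into the same interpreter that runs the demonstrator, which sidesteps the `pip` vs. `pip3` question.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(requirements="requirements.txt"):
    """Build a pip call bound to *this* interpreter (avoids pip vs. pip3 mix-ups)."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


if __name__ == "__main__":
    # Illustrative list: adjust it to the modules your copy of the demonstrator imports.
    missing = missing_modules(["flask", "dariah_topics"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try:", " ".join(pip_install_command()))
    else:
        print("All checked modules were found.")
```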


## Stand-alone application
Although this application is built with Python, it is possible to run it as if it were a native application, without having to install Python or any related packages. There is currently one build each for Windows and macOS.

### Running the stand-alone application
1. Download `demonstrator-0.0.1-windows.zip` or `demonstrator-0.0.1-mac.zip` from the [release-section](https://github.com/DARIAH-DE/Topics/releases/tag/0.0.1).
2. Open it by double-clicking.
3. Run the app by double-clicking.
4. **Mac**: If you get an error message saying that the file is from an “unidentified developer”, you can override it by holding control while double-clicking. The error message will still appear, but you will be given an option to run the file anyway.

## Creating a build
To create a stand-alone application, you need to install `pyinstaller` and run:

Binary file removed demonstrator/flower.ico
Binary file removed demonstrator/flower.png
Binary file added demonstrator/icon.ico
Binary file added demonstrator/icon.png
4 changes: 1 addition & 3 deletions demonstrator/templates/result.html
@@ -113,9 +113,7 @@ <h3>1.1. Topics</h3>
{% for table in topics %} {{ table|safe }} {% endfor %}
<br>
<h3>1.2. Topics in documents</h3>
<p>Each topic has a certain probability for each document in the corpus. These probability distributions are visualized in an interactive <b>heatmap</b> (the darker the color, the higher the probability), which displays the kind of information
that is presumably most useful to literary scholars. Going beyond pure exploration, this visualization can be used to show thematic developments over a set of texts as well as within a single text, akin to a dynamic topic model. What might become
apparent here is that some topics correlate highly with a specific author or group of authors, while other topics correlate highly with a specific text or group of texts.</p><br>
<p>The heatmap option displays the kind of information that is probably most useful to literary scholars. Going beyond pure exploration, this visualization can be used to show thematic developments over a set of texts as well as within a single text, akin to a dynamic topic model. What may also become apparent here is that some topics correlate highly with a specific author or group of authors, while other topics correlate highly with a specific text or group of texts. All in all, this illustrates two uses of LDA: as a distant reading tool that aims to get at text meaning, and as a provider of data that can be used in further computational analysis, such as document classification or authorship attribution.</p><br>
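The data behind such a heatmap is a document-topic matrix: one row per document, one column per topic, and each row a probability distribution that sums to 1. The sketch below illustrates this in plain Python with made-up numbers. It is not the demonstrator's actual implementation (which draws an interactive plot); the helper names and the character-based shading are hypothetical.

```python
# Characters from light to dark: a darker character means a higher probability.
SHADES = " .:-=+*#%@"


def normalize_rows(counts):
    """Turn per-document topic counts into probability distributions (each row sums to 1)."""
    rows = []
    for row in counts:
        total = sum(row)
        # Guard against empty documents, which have no topic assignments at all.
        rows.append([value / total for value in row] if total else [0.0] * len(row))
    return rows


def dominant_topics(doc_topic):
    """Index of the most probable topic for each document (the first one wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def text_heatmap(doc_topic):
    """Render each document as one line of shade characters, one character per topic."""
    top = len(SHADES) - 1
    return ["".join(SHADES[min(int(p * len(SHADES)), top)] for p in row) for row in doc_topic]


if __name__ == "__main__":
    # Made-up topic counts for three documents and three topics.
    doc_topic = normalize_rows([[2, 6, 2], [9, 1, 0], [1, 1, 8]])
    names = ["doc_a", "doc_b", "doc_c"]
    for name, line, topic in zip(names, text_heatmap(doc_topic), dominant_topics(doc_topic)):
        print(f"{name} |{line}| dominant topic: {topic}")
```

Reading down a column shows how strongly one topic is present across documents; reading along a row shows the topic mix of one document.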
{{ div|safe }}<br>
<h2>2. Getting deeper into topic modeling</h2>
<p>We want to empower users with little or no previous experience and programming skills to create custom workflows mostly using predefined functions within a familiar environment. So, if this practical introduction aroused your interest and