LDA Topic Modeling for Polar Deep Insights

pdi-topics

LDA topic modeling for Polar Deep Insights.

Build the Docker image and start a container

docker build -t pdi-topics .

Then start a container from it:

docker run -d -t -p 8888:8888 --name pdi-topics pdi-topics

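Or, to work with notebooks from a particular directory on your machine, mount that directory as a volume over the image's notebook directory ($MY_LOCAL_PATH should be an absolute path on the host):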

docker run -d -t -p 8888:8888 -v $MY_LOCAL_PATH:/opt/pdi-topics/notebooks --name pdi-topics pdi-topics
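The -v flag bind-mounts a host directory over /opt/pdi-topics/notebooks, so notebooks you save in Jupyter land in that host directory. A minimal sketch of a shell helper that resolves the directory to an absolute path before mounting it, assuming a POSIX shell; the function name run_pdi_topics and the DOCKER override variable are hypothetical and not part of this repository:

```shell
# Hypothetical helper (not part of pdi-topics): start the container with a
# host notebook directory mounted at /opt/pdi-topics/notebooks.
# DOCKER defaults to "docker"; set DOCKER=echo to preview the command.
run_pdi_topics() {
  # Resolve the argument (default: current directory) to an absolute path,
  # failing early if it is not an existing directory. CDPATH is cleared so
  # that cd prints nothing extra into the captured output.
  notebook_dir=$(CDPATH= cd -- "${1:-.}" 2>/dev/null && pwd) || {
    echo "not a directory: ${1:-.}" >&2
    return 1
  }
  ${DOCKER:-docker} run -d -t -p 8888:8888 \
    -v "$notebook_dir:/opt/pdi-topics/notebooks" \
    --name pdi-topics pdi-topics
}
```

For example, run_pdi_topics ~/pdi-notebooks starts the container with that directory mounted, and DOCKER=echo run_pdi_topics ~/pdi-notebooks only prints the docker arguments it would use.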

You'll need the Jupyter token to access the notebooks at http://localhost:8888. The token is printed in the container's logs:

docker logs pdi-topics
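Jupyter writes its startup messages, including the login URL ending in ?token=..., to stderr, so docker logs output has to be redirected with 2>&1 before it can be piped. A small sketch for pulling the token out, assuming Jupyter's default hexadecimal token; extract_jupyter_token is a hypothetical helper name:

```shell
# Hypothetical helper: print the first Jupyter token found in log text read
# from stdin. Assumes the default token format (hexadecimal) inside a URL
# such as http://localhost:8888/?token=<hex>. Prints nothing if no token.
extract_jupyter_token() {
  grep -o 'token=[0-9a-f][0-9a-f]*' | head -n 1 | cut -d= -f2
}
```

Usage: docker logs pdi-topics 2>&1 | extract_jupyter_token, then paste the value into the login page at http://localhost:8888. Alternatively, docker exec pdi-topics jupyter notebook list should ask the running server for its URLs, token included.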

Running pdi-topics on a local Solr index with Sparkler data

  1. Follow the steps at https://github.com/USCDataScience/sparkler to run Sparkler on a seed URL or seed file.
  2. When the crawl completes, the indexed data can be queried at http://localhost:8983/solr/#/crawldb/query
  3. Build the Docker image and run it with the following command, replacing {HOST-IP} with your machine's IP address (localhost will not work here, because inside the container it refers to the container itself):
     docker run -d -t --add-host=docker:{HOST-IP} -p 8888:8888 --name pdi-topics pdi-topics
  4. Run the sparkler-pdi-topics.ipynb and sparkler-pdi-scikit-topics.ipynb notebooks to view the topic-modeling results for the Sparkler data.
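The --add-host=docker:{HOST-IP} option writes an entry into the container's /etc/hosts so that the hostname docker resolves to your machine; presumably this is how the Sparkler notebooks reach Solr on port 8983 (an assumption, so check the Solr URL in the notebooks). A sketch of a helper that refuses to start if the placeholder was left unfilled; the name run_pdi_topics_sparkler and the DOCKER override variable are hypothetical:

```shell
# Hypothetical helper: start pdi-topics with "docker" mapped to the host IP
# given as $1, via an /etc/hosts entry added by --add-host.
# DOCKER defaults to "docker"; set DOCKER=echo to preview the command.
run_pdi_topics_sparkler() {
  host_ip=$1
  # Loose IPv4 check: reject empty values and anything containing characters
  # other than digits and dots (e.g. the literal placeholder {HOST-IP}).
  case $host_ip in
    *[!0-9.]* | "")
      echo "expected an IPv4 address, got: '$host_ip'" >&2
      return 1
      ;;
  esac
  ${DOCKER:-docker} run -d -t --add-host="docker:$host_ip" \
    -p 8888:8888 --name pdi-topics pdi-topics
}
```

On Debian or Ubuntu hosts, hostname -I lists the machine's addresses, so run_pdi_topics_sparkler "$(hostname -I | awk '{print $1}')" is one option; on macOS, ipconfig getifaddr en0 typically prints the primary interface's address. To confirm the crawl reached Solr before opening the notebooks, the standard select handler reports the document count as numFound: curl 'http://localhost:8983/solr/crawldb/select?q=*:*&rows=0'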

Using Conda environments

If you'd rather not use Docker, you can run the topic notebooks in a conda environment instead, created with Anaconda3 or Miniconda3:

conda env create -f environment.yml
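conda env create is meant for a fresh environment; once pdi-topics exists, running it again typically fails or prompts, while conda env update -f environment.yml applies changes from the file to the existing environment. A sketch of a create-or-update helper, assuming conda is on PATH, it is run from the repository root, and the environment defined in environment.yml is named pdi-topics (the name activated below); both function names are hypothetical:

```shell
# Hypothetical helper: succeed if the environment named $1 appears in
# `conda env list` output read from stdin. The name is the first column;
# header lines start with "#" and never match a real name.
has_conda_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create the environment on first use, otherwise update it in place from
# environment.yml in the current directory.
setup_pdi_env() {
  if conda env list | has_conda_env pdi-topics; then
    conda env update -f environment.yml
  else
    conda env create -f environment.yml
  fi
}
```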

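Then activate the environment and start Jupyter, replacing $MY_DIR with the directory that contains the notebooks (with conda 4.4 or later, use "conda activate pdi-topics" instead of "source activate pdi-topics"):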

source activate pdi-topics
jupyter notebook --allow-root --notebook-dir=$MY_DIR --ip='*' --port=8888 --no-browser