
Welcome to the RedVis wiki! This page explains how to run and view the RedVis data visualization tool.

Running RedVis

To run the visualization, data is required. As the paper explains, this is gathered by crawling Reddit comments. To do so, run the crawler.py script with python crawler.py, or ./crawler.py on a *nix system. The script can run for as long as necessary; the dataset we provide was gathered by running it continuously for a week. The script can be stopped with Ctrl+C, at which point it automatically stores its results in results.json. If the script is started again later with results.json present, it will pick up where it left off, so it does not need to run uninterrupted.
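
For illustration, the stop-and-resume behavior described above can be structured roughly as follows. This is a sketch, not the code in crawler.py: fetch_batch is a hypothetical stand-in for one round of comment crawling.

```python
import json
import os
import time

RESULTS_FILE = "results.json"

def fetch_batch(state):
    # Hypothetical stand-in: the real crawler.py gathers Reddit
    # comments here and folds them into the running state.
    time.sleep(1)
    return state

def main():
    # Resume from an earlier run if results.json already exists.
    state = {}
    if os.path.exists(RESULTS_FILE):
        with open(RESULTS_FILE) as f:
            state = json.load(f)
    try:
        while True:  # run for as long as necessary
            state = fetch_batch(state)
    except KeyboardInterrupt:
        # Ctrl+C lands here; store the results before exiting.
        with open(RESULTS_FILE, "w") as f:
            json.dump(state, f)

if __name__ == "__main__":
    main()
```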

Second, the server needs to be started. For this, another Python script is provided: server.py. Again, run it with either python server.py or ./server.py. This starts an HTTP server listening on port 8000.
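
For a sense of the moving parts, a minimal server of this shape can be built with Python's standard library alone, as sketched below (Python 3; on Python 2 the equivalent module is SimpleHTTPServer). The repository's server.py additionally answers the computation requests described under Viewing RedVis, which this sketch omits.

```python
# Serve the repository directory (the page, its scripts, and
# results.json) over HTTP on port 8000.
from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8000

if __name__ == "__main__":
    httpd = HTTPServer(("", PORT), SimpleHTTPRequestHandler)
    print("Serving on http://localhost:%d" % PORT)
    httpd.serve_forever()
```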

Viewing RedVis

Because crawling Reddit for several days before any data can be viewed is not possible for everyone, or simply not desired, we set up a server preloaded with crawled data, accessible at baft.asuscomm.com:8000. Due to security issues, this host has since been taken offline; the program can still be hosted locally.

Note: The program is heavily CPU-bound, and the server we provided is not the fastest, so retrieving results can take up to a minute. An alternative is to grab results.json from that server and run everything locally. To do this, follow these steps:

  1. Clone this repository into a folder we'll call dir.
  2. Download results.json and put it in dir. NOTE: this file is around 120 MB, so downloading it will take a few minutes.
  3. From a shell or command prompt, change into dir with cd (the command is the same on *nix and Windows).
  4. Run the command python server.py.
  5. Open localhost:8000 in your favorite browser. The application was tested using Chrome.

After opening a browser window at the correct URL (localhost:8000, or the host provided above), you are presented with an empty screen and a few controls. In the background, the page loads a list of all subreddits that have been crawled in order to provide auto-completion. In the right-most input field, enter the name of the subreddit to filter on. Once all subreddit names have been loaded from the server, a dropdown appears and auto-completion is supported.
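
The auto-completion data could, for example, come from an endpoint that returns the names of all crawled subreddits. The sketch below assumes a /subreddits path and a results.json keyed by subreddit name; neither detail is taken from the repository.

```python
import json
from http.server import HTTPServer, SimpleHTTPRequestHandler

with open("results.json") as f:
    DATA = json.load(f)  # assumed layout: keyed by subreddit name

class Handler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/subreddits":
            # Return every crawled subreddit name as a JSON array.
            body = json.dumps(sorted(DATA)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Everything else is served as a static file.
            super().do_GET()

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```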

The center field lets you set the maximum number of subreddits to retrieve a diagram for. On baft.asuscomm.com:8000 this was capped at 100, so as not to overload the server. If the script cannot find that many subreddits related to the provided one, it simply returns the dataset it found.

The left-most field lets you specify the minimum number of shared users each relation needs in order to be drawn: if it is set to 10 and only 9 users are shared for a specific relation, that relation is ignored in the diagram.

Once the fields have been set (all have default values, so none of them are required), pressing update asks the server to calculate a new dependency matrix. As mentioned earlier, this may take a while to compute, but the result is a Chord Diagram for the subreddit the user requested.
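
Putting the three controls together, the server-side computation behind the update button can be imagined along these lines. The data layout (a mapping from subreddit to its set of commenters) and the function name are assumptions for illustration, not the repository's actual code.

```python
from itertools import combinations

def dependency_matrix(users_by_sub, seed, max_subs=100, min_edges=10):
    # Rank the other subreddits by how many commenters they share
    # with the requested (seed) subreddit; keep at most max_subs total.
    seed_users = users_by_sub[seed]
    related = sorted(
        (s for s in users_by_sub if s != seed),
        key=lambda s: len(users_by_sub[s] & seed_users),
        reverse=True,
    )[:max_subs - 1]
    subs = [seed] + related

    # Symmetric matrix of shared-user counts; relations below the
    # min_edges threshold are dropped, as described above.
    n = len(subs)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(users_by_sub[subs[i]] & users_by_sub[subs[j]])
        if shared >= min_edges:
            matrix[i][j] = matrix[j][i] = shared
    return subs, matrix
```

A square matrix of this shape is the typical input for chord-diagram renderers, for example d3's chord layout.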
