Installing 4CAT

Stijn Peeters edited this page Jan 13, 2020 · 89 revisions

Install and run 4CAT

You can create your own 4CAT instance! While a user-friendly install script is still on the to-do list, the instructions below should help you set up your own instance of the tool.

Overview

4CAT has two components, the backend and the web tool. These share some bits of code and a configuration file but apart from that they run independently. Communication between the two happens via a PostgreSQL database.
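The idea of two independent components that communicate only through a shared database can be illustrated with a small sketch. This is not 4CAT's actual schema or code: the `jobs` table, its columns and the function names are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3


def create_queue(conn):
    """Create a hypothetical job table shared by both components."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "query TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'queued')"
    )
    conn.commit()


def web_tool_submit(conn, query):
    """Web tool side: record a new job and return its id."""
    cursor = conn.execute("INSERT INTO jobs (query) VALUES (?)", (query,))
    conn.commit()
    return cursor.lastrowid


def backend_claim_next(conn):
    """Backend side: pick the oldest queued job and mark it as running."""
    row = conn.execute(
        "SELECT id, query FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_queue(connection)
    web_tool_submit(connection, "example query")
    print(backend_claim_next(connection))
```

Neither function calls the other; each side only reads from and writes to the table, which is why the two components can be started, stopped and deployed separately.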

Installation

Requirements

It is recommended that you run 4CAT on a UNIX-like system (e.g. Linux or macOS). 4CAT further requires Python 3.7 (lower versions may work but are not supported) and PostgreSQL 9.5.
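Before installing, it can be useful to confirm which Python interpreter you are running. The following is a minimal sketch, assuming that versions newer than 3.7 are also acceptable; the function name is illustrative and not part of 4CAT.

```python
import sys

# Minimum version named in the requirements above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.7 is required; found", found)
```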

4CAT uses Sphinx as its search backend. This guide assumes you have a local instance of Sphinx running. Refer to the Sphinx website for further instructions on how to set this up.

Hardware-wise, it is recommended that you store both the database and the Sphinx indexes on a reasonably fast SSD. Furthermore, it is recommended that your server has about 8 GB of RAM per 100 million posts stored (this is a rule of thumb; your mileage may vary).
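To make the rule of thumb concrete, the sketch below scales the 8 GB per 100 million posts figure linearly. The linear scaling and the function name are assumptions for illustration; actual memory needs depend on your data and Sphinx configuration.

```python
# Rule of thumb from the text above: about 8 GB of RAM per 100 million posts.
GB_PER_100M_POSTS = 8


def estimate_ram_gb(post_count):
    """Estimate recommended RAM in GB, assuming linear scaling."""
    if post_count < 0:
        raise ValueError("post_count must not be negative")
    return GB_PER_100M_POSTS * post_count / 100_000_000


if __name__ == "__main__":
    # 250 million posts works out to roughly 20 GB under this rule.
    print(estimate_ram_gb(250_000_000))
```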

Instructions

Clone the repository somewhere:

git clone https://www.github.com/digitalmethodsinitiative/4cat.git

After cloning the repository, copy config.py-example to config.py and edit the file to match your machine's configuration. The various options are explained in the file itself:

cd 4cat
cp config.py-example config.py
nano config.py

Next, install the dependencies. While in the 4CAT root folder, run:

pip3 install -r requirements.txt

This should take care of installing the required third-party libraries.

Next, you should make sure a database is available for 4CAT. 4CAT requires a PostgreSQL database to store dataset metadata, the job queue and other assorted data. You should create the database yourself, and add the database login details to config.py. After doing so, run the following command to create the tables, indices, et cetera, required by 4CAT:

psql --username=[username] --dbname=[database name] < backend/database.sql

Replace [username] and [database name] with the relevant values.
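If you prefer to script this step, the same command can be assembled and run from Python. This is a sketch rather than part of 4CAT: the function names are illustrative, and it adds psql's `ON_ERROR_STOP` variable so that the import stops at the first SQL error instead of continuing past it.

```python
import subprocess


def build_psql_command(username, dbname, schema_file="backend/database.sql"):
    """Build the psql argument list that loads the schema file."""
    return [
        "psql",
        f"--username={username}",
        f"--dbname={dbname}",
        "--set=ON_ERROR_STOP=1",  # abort on the first SQL error
        f"--file={schema_file}",
    ]


def load_schema(username, dbname, schema_file="backend/database.sql"):
    """Run psql; raises CalledProcessError if psql exits with an error."""
    subprocess.run(build_psql_command(username, dbname, schema_file), check=True)
```

Call `load_schema("[username]", "[database name]")` from the 4CAT root folder, with the placeholders replaced as described above.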

You can now run 4CAT.

Running 4CAT

Running the backend

The backend is run as a daemon that can be started and stopped using the included 4cat-daemon.py script:

python3 4cat-daemon.py start

Other valid arguments are stop, restart and status. Note that if you change any configuration options, you will need to restart the daemon for the changes to take effect.

Note: 4CAT was made to run on a UNIX-like system, and the above will not work on Windows. Instead, running 4cat-daemon.py on Windows starts the 4CAT backend in the terminal window, regardless of the argument given. The backend can then be quit by entering q, followed by Enter.

Running the web tool

The web tool is a Flask app. It is recommended that you run the web tool as a WSGI module: see the Flask documentation for more details. For testing and development, you can run the Flask app locally from the command line:

FLASK_APP=webtool flask run

With the default configuration, you can now navigate to http://localhost:5000 where you'll find the web tool that allows you to query the database and create datasets.
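Once the development server is running, a quick way to confirm that the web tool responds is to request the page from a script. The sketch below assumes the default address mentioned above; the function name is illustrative. Any HTTP response, including an error status, counts as "up", because it shows that the Flask app is answering.

```python
import urllib.error
import urllib.request


def web_tool_is_up(url="http://localhost:5000", timeout=5):
    """Return True if a server answers at the URL, False if unreachable."""
    # Bypass any configured proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, but with an error status code.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return False


if __name__ == "__main__":
    print("Web tool reachable:", web_tool_is_up())
```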

Acquiring data

4CAT requires one or more data sources to function. By default, all data sources that acquire data from third parties are enabled, and these should work out of the box. Others may be enabled on demand. Notably, the fourchan data source allows interfacing with a locally stored 4chan archive that can be searched with the Sphinx full-text search engine.

Data sources may be enabled by adding them to the DATASOURCES configuration variable in config.py. Refer to each data source's own README.md for more information on how to configure it.

Separating the backend and web tool

While by default the web tool and backend run on the same server, you can also run them on separate servers. Start only the backend on one server and only the web tool on the other. As long as both are configured to connect to the same PostgreSQL database, they will be able to communicate.
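When the two components run on different machines, a common stumbling block is that the database server is not reachable over the network. The sketch below checks only whether a TCP connection can be opened; it does not test login credentials. The function name and the example host name are hypothetical, and 5432 is PostgreSQL's default port.

```python
import socket


def can_reach(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # "db.example.org" is a placeholder; use the host from your config.py.
    print(can_reach("db.example.org"))
```

If this returns False, check that PostgreSQL is set to accept remote connections (the `listen_addresses` setting and `pg_hba.conf`) and that no firewall blocks the port.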
