Installing 4CAT
You can create your own 4CAT instance! While a user-friendly install script is still on the to-do list, the instructions below should help you set up your own instance of the tool.
4CAT has two components, the backend and the web tool. These share some bits of code and a configuration file but apart from that they run independently. Communication between the two happens via a PostgreSQL database.
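To illustrate how two independent processes can coordinate through a shared database, here is a minimal sketch. It is not 4CAT's actual code: the `jobs` table, its columns and both function names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example runs without a server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
# Table and column names are hypothetical, not 4CAT's real schema.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, jobtype TEXT, claimed INTEGER DEFAULT 0)"
)


def queue_job(db, jobtype):
    """Web tool side: record a job for the backend to pick up."""
    cursor = db.execute("INSERT INTO jobs (jobtype) VALUES (?)", (jobtype,))
    db.commit()
    return cursor.lastrowid


def claim_next_job(db):
    """Backend side: fetch the oldest unclaimed job and mark it claimed."""
    row = db.execute(
        "SELECT id, jobtype FROM jobs WHERE claimed = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET claimed = 1 WHERE id = ?", (row[0],))
    db.commit()
    return row


queue_job(db, "search")
print(claim_next_job(db))  # (1, 'search')
print(claim_next_job(db))  # None
```

Neither side calls the other directly; each only reads from and writes to the shared table, which is why the two components can run independently.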
It is recommended that you run 4CAT on a UNIX-like system (e.g. Linux or macOS). 4CAT further requires Python 3.7 (lower versions may work but are not supported) and PostgreSQL 9.5.
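A quick way to confirm that the interpreter meets the Python requirement is a small check like the one below. This is a sketch, not part of 4CAT; the function name `python_is_supported` is made up for illustration.

```python
import sys

# Minimum supported version, taken from the requirement stated above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version OK")
else:
    print("Python %d.%d or newer is recommended" % MIN_PYTHON)
```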
4CAT uses Sphinx as its search backend. This guide assumes you have a local instance of Sphinx running. Refer to the Sphinx website for further instructions on how to set this up.
Hardware-wise, it is recommended that you store both the database and the Sphinx indexes on a reasonably fast SSD. Furthermore, it is recommended that your server has about 8 GB of RAM per 100 million posts stored (this is a rule of thumb; your mileage may vary).
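The RAM rule of thumb is a simple linear scaling, which can be written out as follows. The function name `estimate_ram_gb` is hypothetical, and the result is only as reliable as the rule of thumb itself.

```python
# Rule of thumb from the text above: about 8 GB of RAM per 100 million posts.
GB_PER_100M_POSTS = 8


def estimate_ram_gb(post_count):
    """Estimate the recommended RAM in GB for a number of stored posts."""
    if post_count < 0:
        raise ValueError("post_count must be non-negative")
    return GB_PER_100M_POSTS * post_count / 100_000_000


print(estimate_ram_gb(250_000_000))  # 20.0
```

For example, an archive of 250 million posts would call for roughly 20 GB of RAM under this rule.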
Clone the repository somewhere:
git clone https://www.github.com/digitalmethodsinitiative/4cat.git
After cloning the repository, copy config.py-example to config.py and edit
the file to match your machine's configuration. The various options are
explained in the file itself:
cd 4cat
cp config.py-example config.py
nano config.py
Next, install the dependencies. While in the 4CAT root folder, run:
pip3 install -r requirements.txt
This should take care of installing the required third-party libraries.
Next, you should make sure a database is available for 4CAT. 4CAT requires a
PostgreSQL database to store dataset metadata, the job queue and other assorted
data. You should create the database yourself, and add the database login
details to config.py. After doing so, run the following command to create the
tables, indices, et cetera, required by 4CAT:
psql --username=[username] --dbname=[database name] < backend/database.sql
Replace [username] and [database name] with the relevant values.
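If you prefer to script this step, the same schema import can be wrapped in Python. The sketch below is an assumption-laden convenience, not something shipped with 4CAT: the function names are hypothetical, it uses psql's documented `--username`, `--dbname`, `--file` and `--set` options in place of shell redirection, and it adds `ON_ERROR_STOP=1` so that psql exits with an error if a statement fails.

```python
import subprocess


def build_psql_command(username, dbname, schema="backend/database.sql"):
    """Build the psql argument list for loading the 4CAT schema."""
    return [
        "psql",
        "--username=" + username,
        "--dbname=" + dbname,
        "--set=ON_ERROR_STOP=1",  # addition: abort on the first SQL error
        "--file=" + schema,
    ]


def load_schema(username, dbname):
    """Run psql from the 4CAT root folder; raises if psql reports failure."""
    subprocess.run(build_psql_command(username, dbname), check=True)


# Example values only; substitute your own login details.
print(" ".join(build_psql_command("fourcat", "fourcat")))
```

Calling `load_schema("fourcat", "fourcat")` from the 4CAT root folder would then create the tables, assuming psql is on the PATH and the database already exists.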
You can now run 4CAT.
The backend is run as a daemon that can be started and stopped using the
included 4cat-daemon.py script:
python3 4cat-daemon.py start
Other valid arguments are stop, restart and status. Note that if you
change any configuration options, you will need to restart the daemon for the
changes to take effect.
Note: 4CAT was made to run on a UNIX-like system and the above will not
work on Windows. Instead, running 4cat-daemon.py on Windows will start the
4CAT backend in the terminal window, regardless of the argument given. The
backend can then be quit by entering q, followed by enter.
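When the daemon is controlled from another script (for instance to restart it after editing config.py), a thin wrapper can guard against mistyped arguments. This is a sketch under the assumption that it is run from the 4CAT root folder on a UNIX-like system; `daemon_command` and `control_daemon` are hypothetical helper names.

```python
import subprocess
import sys

# The four arguments named in the text above.
VALID_ACTIONS = ("start", "stop", "restart", "status")


def daemon_command(action):
    """Build the command line for 4cat-daemon.py, rejecting unknown actions."""
    if action not in VALID_ACTIONS:
        raise ValueError("unknown action %r, expected one of %s"
                         % (action, ", ".join(VALID_ACTIONS)))
    # sys.executable reuses the interpreter that runs this script.
    return [sys.executable, "4cat-daemon.py", action]


def control_daemon(action):
    """Run the daemon script and return its exit code."""
    return subprocess.run(daemon_command(action)).returncode


print(daemon_command("restart")[1:])  # ['4cat-daemon.py', 'restart']
```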
The web tool is a Flask app. It is recommended that you run the web tool as a WSGI module: see the Flask documentation for more details. For testing and development, you can run the Flask app locally from the command line:
FLASK_APP=webtool flask run
With the default configuration, you can now navigate to
http://localhost:5000 where you'll find the web tool that allows you to query
the database and create datasets.
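To verify from a script that the development server is answering, a small reachability check using only the standard library could look like this. It is a sketch: `webtool_is_up` is a hypothetical helper, and treating any HTTP response below 500 as "up" is an assumption made here, not a 4CAT convention.

```python
import urllib.error
import urllib.request


def webtool_is_up(url="http://localhost:5000", timeout=5):
    """Return True if something answers HTTP requests at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (e.g. 403 or 500).
        return error.code < 500
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return False
```

Calling `webtool_is_up()` right after `flask run` has started should return `True` with the default configuration; a `False` result usually means the server is not running or listens on a different port.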
4CAT requires one or more data sources to function. By default, all data sources that
acquire data from third parties are enabled, and these should work out-of-the-box.
Others may be enabled on demand. Notably, the fourchan data source allows
interfacing with a locally stored 4chan archive that can be searched with the
Sphinx full-text search engine.
Data sources may be enabled by adding them to the DATASOURCES configuration
variable in config.py. Refer to each data source's own README.md for more
information on how to configure individual data sources.
While by default the web tool and backend run on the same server, you can also run them on separate servers: start only the backend on one server and only the web tool on the other. As long as both are configured to connect to the same PostgreSQL database, the backend and web tool will be able to communicate.