Installing 4CAT
You can create your own 4CAT instance! While a user-friendly install script is still on the to-do list, the instructions below should help you set up your own instance of the tool.
4CAT has two components, the backend and the web tool. These share some bits of code and a configuration file but apart from that they run independently. Communication between the two happens via a PostgreSQL database.
It is recommended that you run 4CAT on a UNIX-like system (e.g. Linux or macOS). 4CAT further requires Python 3.7 (lower versions may work but are not supported) and PostgreSQL 9.5. For Windows, we recommend installing Python via Anaconda.
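A quick way to confirm that an interpreter meets the stated Python requirement is to compare its version tuple against 3.7. The following is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of 4CAT.

```python
import sys

# Minimum Python version stated in the requirements above.
MINIMUM = (3, 7)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Only major and minor are compared; patch level is ignored.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", found)
    else:
        print("Python 3.7 is the documented requirement; found", found)
```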
Hardware-wise, it is recommended that you store the database on a reasonably fast SSD. Furthermore, it is recommended that your server is configured with at least 8 GB of RAM (but more is better).
Clone the repository somewhere:
git clone https://www.github.com/digitalmethodsinitiative/4cat.git
After cloning the repository, copy config.py-example to config.py and edit
the file to match your machine's configuration. The various options are
explained in the file itself:
cd 4cat
cp config.py-example config.py
nano config.py
Next, install the dependencies. While in the 4CAT root folder, run:
pip3 install -r requirements.txt
This should take care of installing the required third-party libraries. Note that
you may be asked to install some other dependencies. For instance, on Windows the
pyahocorasick library needs Microsoft Visual C++ Build Tools to be installed.
Two other dependencies cannot be installed through requirements.txt:
- The web interface uses the Font Awesome icon library, which is a commercial product
  that cannot be included (a free version exists, but its license also prevents inclusion).
  You should get a copy of Font Awesome and unpack it into /webtool/static/fontawesome.
  The web interface is mostly usable without the icons, but may look odd in a few places.
- The sigma.js network module requires sigma.js version 1.2.1. Just extract the
  /sigma.js-1.2.1 folder from the zip to the webtool/static/js directory of your 4CAT instance.
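To check that both manually installed assets ended up where the web tool expects them, something like the following sketch could be run from the 4CAT root folder. The helper `missing_assets` is hypothetical, and the exact sigma.js folder name after extraction (`webtool/static/js/sigma.js-1.2.1`) is an assumption based on the instructions above.

```python
from pathlib import Path

# Locations taken from the instructions above, relative to the 4CAT root folder.
# The sigma.js folder name after extraction is an assumption.
EXPECTED_ASSETS = (
    Path("webtool/static/fontawesome"),
    Path("webtool/static/js/sigma.js-1.2.1"),
)


def missing_assets(root, expected=EXPECTED_ASSETS):
    """Return the expected asset folders that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_assets("."):
        print("Missing:", rel)
```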
Next, you should make sure a database is available for 4CAT. 4CAT requires a
PostgreSQL database to store dataset metadata, the job queue and other assorted
data. You should create the database yourself, and add the database login
details to config.py. After doing so, run the following command to create the
tables, indices, et cetera, required by 4CAT:
psql --username=[username] --dbname=[database name] < backend/database.sql
Replace [username] and [database name] with the relevant values. You may be
prompted for a password.
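If you prefer to script this step, the same schema import can be sketched in Python by feeding the SQL file to psql on standard input. The function names here are illustrative, not part of 4CAT, and the sketch uses psql's documented `--username` and `--dbname` options.

```python
import subprocess


def build_psql_command(username, dbname):
    """Build the psql argument list for the given login and database."""
    return ["psql", f"--username={username}", f"--dbname={dbname}"]


def load_schema(username, dbname, schema_path="backend/database.sql"):
    """Feed the schema file to psql on stdin.

    Raises subprocess.CalledProcessError if psql exits with a non-zero status.
    """
    with open(schema_path, "rb") as schema:
        subprocess.run(build_psql_command(username, dbname), stdin=schema, check=True)
```

Calling `load_schema("[username]", "[database name]")` from the 4CAT root folder, with the placeholders replaced, should behave like the shell command; psql may still prompt for a password.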
You can now run 4CAT.
The backend is run as a daemon that can be started and stopped using the
included 4cat-daemon.py script:
python3 4cat-daemon.py start
Other valid arguments are stop, restart and status. Note that if you
change any configuration options, you will need to restart the daemon for the
changes to take effect. For development/testing it may be helpful to run
4cat-daemon.py interactively with the -i switch (i.e., python3 4cat-daemon.py -i start).
This will log output to the terminal as well.
Note: 4CAT was made to run on a UNIX-like system and the above will not
work on Windows. Instead, running 4cat-daemon.py on Windows will start the
4CAT backend in the terminal window, regardless of the argument given. The
backend can then be quit by entering q, followed by Enter.
4CAT logs to 4cat.log in the root folder by default.
The web tool is a Flask app. It is recommended that you run the web tool as a WSGI module: see the Flask documentation for more details. For testing and development, you can run the Flask app locally from the command line. On macOS or Linux:
FLASK_APP=webtool flask run
For Windows:
set FLASK_APP=webtool
flask run
With the default configuration, you can now navigate to
http://localhost:5000 where you'll find the web tool that allows you to query
the database and create datasets.
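To confirm from a script that the development server is answering, a small standard-library check along these lines may help. It is only a sketch: `is_reachable` is a hypothetical helper, and it reports whether any HTTP server responds at the URL, not whether 4CAT itself is healthy.

```python
import urllib.error
import urllib.request


def is_reachable(url="http://localhost:5000", timeout=3.0):
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    # Ignore any system proxy settings, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status.
        return True
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return False


if __name__ == "__main__":
    print("Web tool reachable:", is_reachable())
```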
4CAT requires one or more data sources to function. By default, all data sources that
acquire data from third parties are enabled, and these should work out-of-the-box.
Others may be enabled on demand. Notably, the fourchan data source allows
interfacing with a locally stored 4chan archive that can be searched with the
Sphinx full-text search engine.
Data sources may be enabled by adding them to the DATASOURCES configuration
variable in config.py. Refer to the data sources' own README.md files for more
information on how to configure individual data sources. If you add or make changes
to a data source but they don't show up, you can delete the module_cache.pb
file in the backend folder to start fresh.
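Clearing the cache can also be scripted. The sketch below assumes it is run against the 4CAT root folder; `clear_module_cache` is an illustrative name, not a 4CAT function.

```python
from pathlib import Path


def clear_module_cache(root="."):
    """Delete backend/module_cache.pb under `root`; return True if a file was removed."""
    cache = Path(root) / "backend" / "module_cache.pb"
    if cache.is_file():
        cache.unlink()
        return True
    return False


if __name__ == "__main__":
    print("Cache removed" if clear_module_cache() else "No cache file found")
```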
While by default the web tool and backend run on the same server, you could set things up so that they run on separate servers instead. Start only the backend on one server and only the web tool on the other. If you configure the web tool to connect to the database on the other server (or vice versa), the backend and web tool will be able to communicate.