
Installing 4CAT

Dale Wahl edited this page Oct 20, 2021 · 89 revisions

Install and run 4CAT

You can install 4CAT on your local machine or a server. This can be useful if you want to capture data from various online platforms, or analyse data you've captured previously. This page describes how you can install 4CAT and run it.

Please note that at this time, scraping 4chan, 8chan, and 8kun requires Sphinx, which is not yet working in the Docker setup. If you need those sources, you will need to install 4CAT directly and then follow these instructions to set up Sphinx. For some data sources, you will additionally need to provide keys for the API from which the data is retrieved.

Datasource       Docker install   Manual + Sphinx   Need API keys
Bitchute         yes              yes               no
Reddit           yes              yes               no
Telegram         yes              yes               yes
Twitter APIv2    yes              yes               yes
Tumblr           yes              yes               yes + add to config.py
4chan            no               yes               no
8chan            no               yes               no
8kun             no               yes               no

Install 4CAT via Docker

The recommended method is to use Docker.

  1. Install Docker Desktop and start it. On Windows, you may need to make sure that WSL (Windows Subsystem for Linux) integration is enabled in Docker. You can find this option in the Docker settings under Settings -> Resources -> WSL Integration -> Enable integration with required distros.
  2. Clone the 4CAT repository, or download the most recent release and unzip it somewhere.
  3. In a terminal/command prompt, navigate to the folder in which you just installed 4CAT (the folder that contains the docker-compose.yml file).
  4. Optionally, if you know what you are doing, you can copy config.py-example to config.py and edit 4CAT's configuration before building it. The default configuration will be sufficient in most cases, but you might want to configure e.g. the MAIL_ variables to allow 4CAT to send e-mails via a local SMTP server or a service like SendGrid.
  5. Run the command docker-compose up
  6. If this is the first time you're starting the Docker container, it will take a while for all components to be built. Keep an eye on the output: the login data for the 4CAT interface will be displayed here.
  7. Once this is done, you can access the 4CAT interface via http://localhost:80.
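Because the first build can take a while, it can be handy to wait for the interface from a script instead of refreshing the browser. The following is a minimal sketch, not part of 4CAT: the function name `wait_until_up` and the timeout and interval defaults are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout=600.0, interval=5.0, fetch=urllib.request.urlopen):
    """Poll url until it answers; return the HTTP status code."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with fetch(url, timeout=interval) as response:
                return response.status
        except urllib.error.HTTPError as error:
            # An HTTP error status still proves the web server is answering
            return error.code
        except OSError:
            # Connection refused or timed out: the frontend is not up yet
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} did not respond within {timeout} seconds")
        time.sleep(interval)
```

Calling `wait_until_up("http://localhost:80")` returns once the interface responds. The `fetch` parameter only exists so the function can be exercised without a running server.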

Note: if your computer/server is already using some of the ports that Docker wants to use, you can change those ports in the .env file in the 4CAT root folder. Any modification to the configuration files requires you to rebuild the Docker images with docker-compose up --build.
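To find out whether a port is already taken before starting Docker, you could probe it first. This is a small sketch using only the Python standard library; the helper name `port_is_free` is hypothetical, and it only checks for a listener on the local machine.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # 80 is the default port on which the 4CAT interface is served
    for port in (80,):
        state = "free" if port_is_free(port) else "already in use"
        print(f"port {port}: {state}")
```

If a port is reported as in use, pick another value in .env and rebuild.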

Occasionally, on the first build, the frontend fails to start because the backend takes longer than normal to install. After a successful installation, you should see three containers running in the Docker UI or in the output of docker ps. If the frontend is missing, start that container again via the UI or with docker start 4cat_frontend.
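The check described above can also be scripted. The sketch below assumes the Docker CLI is on your PATH and uses `docker ps --format "{{.Names}}"` to list running containers. Only the name 4cat_frontend comes from this page; the helper names are made up for illustration.

```python
import subprocess
from typing import List


def parse_names(docker_ps_output: str) -> List[str]:
    """Split one-name-per-line output into a clean list of names."""
    return [line.strip() for line in docker_ps_output.splitlines() if line.strip()]


def running_container_names() -> List[str]:
    """Ask the Docker CLI for the names of all running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_names(result.stdout)


def frontend_is_running(names: List[str], frontend: str = "4cat_frontend") -> bool:
    """True if the frontend container appears among the running containers."""
    return frontend in names
```

If `frontend_is_running(running_container_names())` is false, start the frontend container as described above.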

Install 4CAT via Docker onto a server

With Docker, 4CAT is set to host itself on localhost:80 by default. You can change this in the .env file located in the main directory. If you would like to make 4CAT externally accessible, you need to change SERVER_NAME to your domain or IP address. Doing so adds it to the whitelists in the config.py-example file. You could also redirect to 4CAT via another service such as Apache or Nginx.

# Modify SERVER_NAME and/or PUBLIC_PORT to make 4CAT available externally

SERVER_NAME = 4cat.example.com # You could also use your server's IP address
PUBLIC_PORT = 80

# This example would allow you to navigate to http://4cat.example.com (or http://4cat.example.com:80) and access your version of 4CAT
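To double-check which address a given .env would produce, you could parse the two settings and build the URL. This is an illustrative sketch only: the parser is deliberately simple (it treats everything after a # as a comment), and the function names are not part of 4CAT.

```python
def parse_env(text):
    """Parse KEY = value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def public_url(settings):
    """Build the address at which 4CAT should be reachable."""
    host = settings.get("SERVER_NAME", "localhost")
    port = settings.get("PUBLIC_PORT", "80")
    # Port 80 is the HTTP default, so it can be left out of the URL
    return f"http://{host}" if port == "80" else f"http://{host}:{port}"
```

For the example values above, `public_url(parse_env(open(".env").read()))` would give http://4cat.example.com.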

Install 4CAT manually

If you cannot or don't want to use Docker, you can run 4CAT directly from the code rather than via Docker. This requires more set-up and the manual installation of various dependencies, but can be useful if you want to develop data sources or processors for 4CAT.

Requirements

It is recommended that you run 4CAT on a UNIX-like system (e.g. Linux or macOS). It will also run under Windows, but the instructions below are written with a UNIX-like system in mind. 4CAT further requires Python 3.8 and PostgreSQL 9.5. Lower versions of either may work, but are not officially supported.
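A quick way to check these requirements is to compare version numbers programmatically. The sketch below is an assumption-laden helper, not 4CAT code: it extracts the first version number it finds in a string, such as the output of `psql --version`.

```python
import re
import sys

MIN_PYTHON = (3, 8)
MIN_POSTGRES = (9, 5)


def parse_version(text):
    """Extract the first dotted version number from text as (major, minor)."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return (int(match.group(1)), int(match.group(2) or 0))


def meets_minimum(version, minimum):
    """Compare the (major, minor) part of version against minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Python OK:", meets_minimum(sys.version_info, MIN_PYTHON))
```

Note that `psql --version` reports the version of the client, which may differ from that of the database server it connects to.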

Installation

Clone the repository somewhere:

git clone https://www.github.com/digitalmethodsinitiative/4cat.git

After cloning the repository, copy config.py-example to config.py and edit the file to match your machine's configuration. The various options are explained in the file itself:

cd 4cat
cp config.py-example config.py
nano config.py

Next, install the dependencies. On Linux systems that use apt, the following should suffice:

apt install python3-pip libpq-dev python3-dev postgresql-server-dev-all unzip postgresql-client

Adapt these to your own package manager (e.g. yum or brew) as necessary. From here on, we are working within Python, so it is recommended that you create a virtual environment in which to install Python packages and run 4CAT. There are several ways to set up a virtual environment; the link earlier in this paragraph lists the best practices.
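If you are unsure whether a virtual environment is active, you can ask the interpreter itself. This is a minimal sketch that works for environments created with Python's built-in venv module (and, as far as we know, recent versions of virtualenv); the function name is hypothetical.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```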

Within your virtual environment, while in the 4CAT root folder, install the required Python packages:

pip3 install -r requirements.txt

Some of the dependencies may have their own dependencies. For instance, on Windows the pyahocorasick library needs Microsoft Visual C++ Build Tools to be installed. If you encounter similar issues, please file an issue!

Next, you should make sure a database is available for 4CAT. 4CAT requires a PostgreSQL database to store dataset metadata, the job queue and other assorted data. You should create the database yourself, and add the database login details to config.py. After doing so, run the following command to create the tables, indices, et cetera, required by 4CAT:

psql --username=[username] --dbname=[database name] < backend/database.sql

Replace [username] and [database name] with the relevant values. You may be prompted for a password.
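If you prefer to script this step, the same command can be run from Python. The sketch below is one possible way to do it and not how 4CAT itself loads the schema. It uses the full `--username` spelling of the option and adds `--set=ON_ERROR_STOP=1`, which is our own addition, so that psql stops at the first SQL error. The function names are hypothetical.

```python
import subprocess


def build_psql_command(username, dbname):
    """Build the psql invocation used to load the 4CAT schema."""
    return [
        "psql",
        f"--username={username}",
        f"--dbname={dbname}",
        # Assumption: stop at the first SQL error instead of continuing
        "--set=ON_ERROR_STOP=1",
    ]


def load_schema(username, dbname, schema_path="backend/database.sql"):
    """Feed the schema file to psql on standard input, like `psql ... < file`."""
    with open(schema_path, "rb") as schema:
        # check=True raises CalledProcessError if psql exits with a non-zero status
        subprocess.run(build_psql_command(username, dbname), stdin=schema, check=True)
```

Run `load_schema(...)` from the 4CAT root folder so that the relative path to backend/database.sql resolves.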

Finally, to make sure everything is in working order, run the following command and follow the instructions:

python3 helper-scripts/migrate.py

You can now run 4CAT!

Running the backend

The backend is run as a daemon that can be started and stopped using the included 4cat-daemon.py script:

python3 4cat-daemon.py start

Other valid arguments are stop, restart and status. Note that if you change any configuration options, you will need to restart the daemon for the changes to take effect. For development/testing it may be helpful to run 4cat-daemon.py interactively with the -i switch (i.e., python3 4cat-daemon.py -i start). This will log output to the terminal as well.

Note: The 4CAT daemon was made to run on a UNIX-like system and the above will not work on Windows. On Windows, the 4CAT daemon will always run interactively, and can be quit by entering 'q' and pressing Enter.

4CAT logs to 4cat.log in the root folder by default.
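When something goes wrong, the last few lines of the log are usually the most relevant. The following sketch prints them; it assumes the default log location mentioned above, and the `tail` helper is our own, not part of 4CAT.

```python
from collections import deque


def tail(path, lines=20):
    """Return the last `lines` lines of a text file, without newlines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while reading
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


if __name__ == "__main__":
    try:
        for entry in tail("4cat.log"):
            print(entry)
    except FileNotFoundError:
        print("4cat.log not found; run this from the 4CAT root folder")
```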

Running the web tool

The web tool is a Flask app. It is recommended that you run the web tool as a WSGI module: see the Flask documentation for more details. For testing and development, you can run the Flask app locally from the command line. On macOS or Linux:

FLASK_APP=webtool flask run

For Windows:

set FLASK_APP=webtool
flask run

With the default configuration, you can now navigate to http://localhost:5000 where you'll find the web tool that allows you to query the database and create datasets.
