Upgrading TCAT

Hoylen Sue edited this page May 5, 2016 · 25 revisions

Every now and then TCAT receives important upgrades or bug fixes. We distinguish regular feature upgrades from critical upgrades and important bug fixes. The administration panel informs you whenever updates are available. We recommend keeping your installation up to date. Please follow the steps below when upgrading your own installation.

There are three ways to upgrade TCAT:

  • Automatic updates
  • Update through the administration panel
  • Manual updates

Automatic updates

TCAT can update itself automatically in the background. Both the codebase and the database are upgraded.

Automatic updates can be enabled when installing TCAT with the helpers/tcat-install-linux.sh install script. They are disabled by default.

To enable automatic updates later, edit your config.php file and change two variables.

Change the AUTOUPDATE_ENABLED variable from false to true.

Set AUTOUPDATE_LEVEL to the appropriate value. Legal values are trivial, substantial and expensive.
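To check what an installation currently has configured, the sketch below reads both settings from config.php and checks that AUTOUPDATE_LEVEL holds one of the legal values. It assumes the settings are written as PHP define() calls, as shown in the comment; the function names are hypothetical and the pattern may need adjusting for your config.php.

```shell
#!/bin/sh
# Sketch: report the auto-update settings in a TCAT config.php.
# ASSUMPTION: both settings are PHP define() calls, for example
#   define('AUTOUPDATE_ENABLED', true);
#   define('AUTOUPDATE_LEVEL', 'substantial');
# config_value, valid_level and check_autoupdate_config are hypothetical names.

# Print the value of constant $2 defined in file $1, without quotes.
config_value() {
    sed -n "s/^[[:space:]]*define([[:space:]]*'$2'[[:space:]]*,[[:space:]]*'\{0,1\}\([^')]*\)'\{0,1\}[[:space:]]*).*/\1/p" "$1" | head -n 1
}

# Succeed only for the legal AUTOUPDATE_LEVEL values.
valid_level() {
    case "$1" in
        trivial|substantial|expensive) return 0 ;;
        *) return 1 ;;
    esac
}

check_autoupdate_config() {
    cfg="${1:-/var/www/dmi-tcat/config.php}"
    level=$(config_value "$cfg" AUTOUPDATE_LEVEL)
    echo "AUTOUPDATE_ENABLED=$(config_value "$cfg" AUTOUPDATE_ENABLED)"
    echo "AUTOUPDATE_LEVEL=$level"
    if ! valid_level "$level"; then
        echo "AUTOUPDATE_LEVEL should be trivial, substantial or expensive" >&2
        return 1
    fi
}
```

For example, run check_autoupdate_config /var/www/dmi-tcat/config.php after editing the file.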

Because many of these updates hold locks on the database, capture may be blocked until the upgrade has finished. Setting AUTOUPDATE_LEVEL to a lower value (such as trivial, or substantial for medium-sized datasets) avoids long lock times and interrupted capture. The downside is that some upgrades, and therefore new features, may not become available automatically.

Update through the administration panel

TCAT displays a message on the administration panel whenever a new update is available. Clicking the upgrade link schedules a one-time background upgrade.

The AUTOUPDATE_LEVEL defined in config.php is honored (see the description in the previous section).

Manual updates

These are the steps to manually update TCAT:

  • Step 1: Disable the cron jobs.
  • Step 2: Kill all running dmi-tcat processes (i.e. controller.php, dmitcat_*).
  • Step 3: Remove all files in the proc directory.
  • Step 4: Pull the latest code.
  • Step 5: Upgrade the database (optional; see the Upgrading database tables section for details).
  • Step 6: Re-enable the cron jobs.
  • Step 7: Check TCAT is running properly.

The example commands below assume TCAT was installed using the helpers/tcat-install-linux.sh install script (where TCAT is installed in /var/www/dmi-tcat and owned by the "tcat" user). They may need to be modified if TCAT was installed differently.

Step 1: Disable the cron jobs

Open the cron file that runs the TCAT cron jobs.

If cron was configured by the TCAT installation script, edit the system cron file for TCAT:

$ sudoedit /etc/cron.d/tcat

If cron was configured manually in a particular user's crontab, edit it using crontab -e or sudo crontab -u username -e (if it is not the current user).

Comment out the TCAT jobs by adding the comment symbol (#) at the start of each of their lines. The TCAT jobs are "controller.php" and (if the URL expander has been installed) "urlexpand.sh". For example:

# * * * * * tcat (cd "/var/www/dmi-tcat/capture/stream"; php controller.php)
# 0 * * * * tcat (cd "/var/www/dmi-tcat/helpers"; sh urlexpand.sh)

Save the changes.
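If you prefer not to edit the file by hand, the commenting-out can be scripted. A minimal sketch, assuming the TCAT jobs are exactly the lines that mention controller.php or urlexpand.sh (as in the example) and that the file is the /etc/cron.d/tcat written by the install script. It rewrites the file through a temporary copy instead of leaving a backup next to it, since a stray backup in /etc/cron.d could itself be picked up by cron on some systems. The function name is hypothetical; run it from a root shell.

```shell
#!/bin/sh
# Sketch: comment out the TCAT jobs in a cron file (Step 1).
# ASSUMPTION: the TCAT jobs are the lines mentioning controller.php or urlexpand.sh.
# disable_tcat_cron is a hypothetical name; run it as root for /etc/cron.d/tcat.

disable_tcat_cron() {
    cronfile="${1:-/etc/cron.d/tcat}"
    tmp=$(mktemp) || return 1
    # Leave existing comment lines alone; prefix "# " to the TCAT job lines.
    sed -E -e '/^[[:space:]]*#/b' \
           -e '/controller\.php|urlexpand\.sh/s/^/# /' "$cronfile" > "$tmp" &&
        cat "$tmp" > "$cronfile"   # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}
```

Running it twice is harmless: lines that are already comments are skipped.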

Step 2: Kill all running dmi-tcat processes

You need to make sure TCAT is not running. Either reboot your server or kill the processes manually. To kill them manually, log in to your server or console, find all the processes related to TCAT and kill them one by one. For example:

$ ps aux | grep php | grep -E 'dmitcat|controller'
tcat      426  0.0  0.3 262188 14848 pts/5    S    16:09   0:00 /usr/bin/php /var/www/dmi-tcat/capture/stream/dmitcat_follow.php
tcat    32638  0.5  0.3 263568 15876 pts/5    S    16:08   0:07 /usr/bin/php /var/www/dmi-tcat/capture/stream/dmitcat_track.php
tcat    32655  0.7  0.2 263568 15876 pts/5    S    16:08   0:01 /usr/bin/php /var/www/dmi-tcat/capture/stream/controller.php
$ kill 426
$ kill 32638
$ kill 32655

(The number in the second column, after the user name, is the process ID you need to kill.)

TODO: does this example also detect the URL expander process?
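As an alternative to picking the IDs out by eye, the sketch below prints only the process IDs. It assumes TCAT processes can be recognised by dmitcat_, controller.php or urlexpand.sh in their command line; the last pattern is meant to cover the URL expander, assuming it runs under that script name. Anything else that happens to match (for example an editor with controller.php open) is listed too, so review the output before killing anything. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the PIDs of running TCAT processes, one per line.
# ASSUMPTION: they can be recognised by "dmitcat_", "controller.php" or
# "urlexpand.sh" in their command line. The bracketed first letters keep
# grep from matching its own command line.

tcat_pids() {
    ps -eo pid=,args= |
        grep -E '[d]mitcat_|[c]ontroller\.php|[u]rlexpand\.sh' |
        awk '{ print $1 }'
}
```

Run tcat_pids to check the list, then stop the processes with kill $(tcat_pids) as root or as the tcat user.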

Step 3: Remove all files in the proc directory

Run the following command:

$ sudo rm -f /var/www/dmi-tcat/proc/*

Step 4: Pull the latest code

Retrieve the latest version of DMI-TCAT.

In the TCAT directory, run "git pull" as the user who owns the TCAT files:

$ cd /var/www/dmi-tcat
$ sudo su tcat -c "git pull"
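Before pulling, you may want to see which commits a pull would bring in. A minimal sketch using standard git commands: git fetch, then git log over the range HEAD..@{u}, which lists commits on the upstream branch that are not yet in your checkout. It assumes the checkout tracks an upstream branch, which is normally the case for a clone; the function name is hypothetical and it should run as the owner of the files.

```shell
#!/bin/sh
# Sketch: list the commits that "git pull" would bring into a TCAT checkout.
# ASSUMPTION: the checkout tracks an upstream branch (true for a normal clone).
# pending_commits is a hypothetical name; run it as the owner of the files.

pending_commits() {
    dir="${1:-/var/www/dmi-tcat}"
    git -C "$dir" fetch --quiet || return 1
    # Commits on the upstream branch that are not in the local HEAD yet.
    git -C "$dir" log --oneline 'HEAD..@{u}'
}
```

Without defining a function, the same check can be run directly as the tcat user from /var/www/dmi-tcat: sudo su tcat -c 'git fetch && git log --oneline HEAD..@{u}'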

Step 5: Upgrade the database

This step may be optional. See the Upgrading database tables section below for details.

Step 6: Re-enable the cron jobs

We are ready to bring TCAT back online. Remove the comment symbols (#) you added in Step 1 so that controller.php (and urlexpand.sh, if it is installed) will start again automatically.
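The reverse of Step 1 can be scripted the same way. A minimal sketch, assuming the TCAT jobs are the lines mentioning controller.php or urlexpand.sh. It re-enables every such line, so if a urlexpand.sh line was already commented out before the upgrade (for example because the expander is not in use), comment it out again afterwards. The function name is hypothetical; run it as root for /etc/cron.d/tcat.

```shell
#!/bin/sh
# Sketch: re-enable the TCAT jobs in a cron file (Step 6).
# ASSUMPTION: the TCAT jobs are the lines mentioning controller.php or urlexpand.sh.
# enable_tcat_cron is a hypothetical name.

enable_tcat_cron() {
    cronfile="${1:-/etc/cron.d/tcat}"
    tmp=$(mktemp) || return 1
    # On TCAT job lines only, strip a leading "#" and the blanks after it.
    sed -E -e '/controller\.php|urlexpand\.sh/s/^[[:space:]]*#[[:space:]]*//' "$cronfile" > "$tmp" &&
        cat "$tmp" > "$cronfile"   # overwrite in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}
```

Other comment lines in the file are left untouched.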

Step 7: Check TCAT is running properly

Inspect the contents of the controller's log file:

$ tail -f /var/www/dmi-tcat/logs/controller.log

After a successful upgrade, controller.log should look something like this:

2014-09-24 15:08:00 script track was not running - starting
2014-09-24 15:09:01 script track is running with pid [32638] and has been idle for 2 seconds
2014-09-24 15:10:04 script track is running with pid [32638] and has been idle for 12 seconds
2014-09-24 15:11:02 script track is running with pid [32638] and has been idle for 8 seconds
2014-09-24 15:12:09 script track is running with pid [32638] and has been idle for 18 seconds

If you get warning messages and TCAT is not capturing new tweets, you can file an issue on GitHub. Please include the error messages from controller.log and all recent messages from track.error.log, follow.error.log and/or onepercent.error.log.
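To scan the log quickly, the sketch below prints recent controller.log lines that differ from the normal status messages in the example above. It assumes normal lines contain either "script NAME is running with pid [PID]" or "script NAME was not running - starting", with lowercase script names such as track. Anything else is printed for a closer look, which does not necessarily mean something is wrong. The function name and the default of 50 lines are assumptions.

```shell
#!/bin/sh
# Sketch: print recent controller.log lines that do not match the normal
# status messages. ASSUMPTION: normal lines look like
#   ... script track is running with pid [32638] and has been idle for 2 seconds
#   ... script track was not running - starting
# unexpected_log_lines is a hypothetical name.

unexpected_log_lines() {
    log="${1:-/var/www/dmi-tcat/logs/controller.log}"
    count="${2:-50}"    # how many of the most recent lines to inspect
    tail -n "$count" "$log" |
        grep -Ev 'script [a-z]+ (is running with pid \[[0-9]+]|was not running - starting)' ||
        true            # finding no unexpected lines is not an error
}
```

Empty output means the recent lines all look like the example above.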

Upgrading database tables

New releases of TCAT may involve significant changes to the database architecture. Generally, these changes apply only to newly created tables. We do not apply them to existing tables automatically, because on huge datasets they can take a very long time to complete. Please read the release announcements for specifics.

Before you run the upgrade script, stop all tracking and controller processes and disable the controller.php cron job, as described in the manual update steps above.

The procedure to keep your database architecture up to date is simple. Log in to your server or console, change to the TCAT installation directory and run:

$ cd common/
$ screen
$ php upgrade.php

The screen command ensures the upgrade process is not interrupted if your SSH session is terminated (in that event, use screen -r to reattach to the session).

The upgrade.php script is interactive. Trivial upgrades are executed automatically, but for any step that may take a long time to complete, a warning is issued and you are asked a yes/no/all question whether to execute it. Each such step comes with a short explanation of the upgrade and its complexity: substantial or expensive.

As a crude estimate, non-expensive updates on a reasonably fast machine and on a bin with fewer than 100,000 tweets should finish within several minutes. Expensive updates on bins with fewer than 30,000 tweets should take a similar amount of time.

If you are executing expensive updates on a very big database (with millions of tweets), be prepared for them to take several hours, or even days. Capture remains interrupted for the duration of the update.

The upgrade.php script can be safely run multiple times. It will only consider bins and tables which need updates (and will thus not redo any completed operations).

Command-line options for upgrade.php

If you are confident you know what you are doing, you can instruct upgrade.php to execute certain steps, or all steps, automatically. This allows you to run upgrade.php from cron and keep your database up to date without user interaction. Remember: running the script while a track process on a bin is running is strongly discouraged, as a deadlock is likely to occur.

That said, these are the additional command-line options for upgrade.php:

--non-interactive        ( run upgrade.php without asking the user any questions )
--au0                    ( auto-upgrade everything up to level trivial; this is the default )
--au1                    ( auto-upgrade everything up to level substantial )
--au2                    ( auto-upgrade everything up to level expensive )
binname                  ( restrict updates to this bin, ignoring others; specify a single bin only )
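When scripting upgrade.php, it can help to reuse the level names from AUTOUPDATE_LEVEL. The sketch below maps trivial, substantial and expensive to --au0, --au1 and --au2 (following the option list above) and adds --non-interactive plus an optional bin name. It assumes the options go before the bin name and that TCAT lives in /var/www/dmi-tcat, which can be overridden through TCAT_DIR, a hypothetical variable; the function names are hypothetical too. As warned above, make sure no track process is running on the affected bins first.

```shell
#!/bin/sh
# Sketch: run upgrade.php unattended, using AUTOUPDATE_LEVEL-style level names.
# ASSUMPTIONS: options precede the bin name; TCAT lives in $TCAT_DIR
# (default /var/www/dmi-tcat). upgrade_args, run_unattended_upgrade and
# TCAT_DIR are hypothetical names.

# Print the upgrade.php arguments for level $1 and optional bin $2.
upgrade_args() {
    case "$1" in
        trivial)     level=--au0 ;;
        substantial) level=--au1 ;;
        expensive)   level=--au2 ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "--non-interactive $level${2:+ $2}"
}

run_unattended_upgrade() {
    args=$(upgrade_args "$@") || return 1
    # Subshell so the caller's working directory is left alone.
    # $args is unquoted on purpose so it splits into separate options.
    ( cd "${TCAT_DIR:-/var/www/dmi-tcat}/common" && php upgrade.php $args )
}
```

For example, run_unattended_upgrade substantial upgrades all bins up to level substantial, and run_unattended_upgrade expensive mybin (mybin being a placeholder) restricts an expensive run to one bin.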

Handling upgrades to very big datasets

When you must upgrade a large bin with minimal downtime, you could follow this procedure:

  • stop TCAT
  • run common/upgrade.php once and execute all non-expensive upgrades
  • start TCAT again
  • temporarily disable your big bin via the administration panel
  • run: php common/upgrade.php bigbinname (this time without stopping TCAT!)
  • after the script has finished, re-enable the big bin via the administration panel

Repeating the last three steps bin by bin allows you to upgrade a server with large datasets.
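The per-bin loop can be scripted, pausing for the steps that happen in the administration panel. A minimal sketch, assuming TCAT lives in /var/www/dmi-tcat. TCAT_DIR and UPGRADE_CMD are hypothetical variables; UPGRADE_CMD defaults to the php common/upgrade.php command from the procedure above and mainly exists so the loop can be tried with a harmless command such as echo. The function stops at the first failed upgrade.

```shell
#!/bin/sh
# Sketch: upgrade big bins one at a time while the rest of TCAT keeps running.
# ASSUMPTIONS: TCAT lives in $TCAT_DIR (default /var/www/dmi-tcat); the upgrade
# command is $UPGRADE_CMD (default "php common/upgrade.php"). Both variable
# names and upgrade_big_bins are hypothetical.

upgrade_big_bins() {
    for bin in "$@"; do
        printf 'Disable bin "%s" in the administration panel, then press Enter. ' "$bin"
        read -r reply || return 1
        # Unquoted on purpose so "php common/upgrade.php" splits into words.
        ( cd "${TCAT_DIR:-/var/www/dmi-tcat}" && ${UPGRADE_CMD:-php common/upgrade.php} "$bin" ) || return 1
        printf 'Re-enable bin "%s" in the administration panel, then press Enter. ' "$bin"
        read -r reply || return 1
    done
}
```

For example, upgrade_big_bins bigbin1 bigbin2 (placeholder bin names) walks through both bins in turn.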
