ManuelThurner edited this page Nov 17, 2014 · 4 revisions

The Site Monitor is a crawler that looks for errors on the Climate CoLab website. It is a Java program that can be run from the command line.

It is located in the repository under other/site-monitor. To run it, change into that directory and execute java -cp "target/dependency/*:target/classes" org.xcolab.utils.sitemonitor.SiteMonitor. It is currently configured to run every day on the cognosis server via a cron job.

Configuration

The Site Monitor is configured through the file other/site-monitor/src/main/resources/siteMonitor-config.xml. The contents of the configuration file are described below. Keep in mind that you have to recompile the crawler after changing the configuration, since the file under src/main/resources is copied into target/classes during the build.

Checkers

The checkers configured in this section decide whether a crawled page is considered erroneous. There are currently three types of checkers:

- RegexMatchingChecker: Checks whether a given regular expression matches on the page. If it does, the page is considered erroneous.
- ResponseStatusChecker: Checks for a specific HTTP response code (e.g. 500).
- InverseCheckResultChecker: Inverts the result of another checker.

Typical checker configuration:

<checkers>
	<checker>
		<name>errorNotificationChecker</name>
		<class>org.xcolab.utils.sitemonitor.checkers.RegexMatchingChecker</class>
		<configuration>(alert\-error|Internal\s*error)</configuration>
		<message>Error notification found on a page</message>
	</checker>
	<checker>
		<name>noErrorNotificationChecker</name>
		<class>org.xcolab.utils.sitemonitor.checkers.InverseCheckResultChecker</class>
		<configuration>errorNotificationChecker</configuration>
		<message>Error notification found on a page</message>
	</checker>
	<checker>
		<name>unavailablePortletChecker</name>
		<class>org.xcolab.utils.sitemonitor.checkers.RegexMatchingChecker</class>
		<configuration>.*is\s*temporarily\s*unavailable.*</configuration>
		<message>Unavailable portlet found</message>
	</checker>
	<checker>
		<name>noUnavailablePortletsChecker</name>
		<class>org.xcolab.utils.sitemonitor.checkers.InverseCheckResultChecker</class>
		<configuration>unavailablePortletChecker</configuration>
		<message>Unavailable portlet found on a page</message>
	</checker>
</checkers>
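
To make the three checker types more concrete, here is a minimal Java sketch of how they could relate to one another. It is not the actual Site Monitor code: the class CheckerSketch, the Checker interface, the isError method and the three factory methods are hypothetical names chosen for illustration, and the use of a "find anywhere in the page" regex match is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; these are NOT the real org.xcolab.utils.sitemonitor classes.
final class CheckerSketch {

    /** A checker reports whether a crawled page counts as erroneous. */
    interface Checker {
        boolean isError(int statusCode, String body);
    }

    /** Erroneous when the regex is found anywhere in the page body (assumed semantics). */
    static Checker regexMatching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return (statusCode, body) -> pattern.matcher(body).find();
    }

    /** Erroneous when the HTTP response code equals the configured one, e.g. 500. */
    static Checker responseStatus(int errorCode) {
        return (statusCode, body) -> statusCode == errorCode;
    }

    /** Inverts the result of another checker. */
    static Checker inverse(Checker delegate) {
        return (statusCode, body) -> !delegate.isError(statusCode, body);
    }

    public static void main(String[] args) {
        // Same regular expression as errorNotificationChecker in the configuration above.
        Checker errorNotification = regexMatching("(alert\\-error|Internal\\s*error)");
        Checker inverted = inverse(errorNotification);
        String page = "<div class=\"alert-error\">Something went wrong</div>";
        System.out.println("regex checker:   " + errorNotification.isError(200, page));
        System.out.println("inverse checker: " + inverted.isError(200, page));
    }
}
```

In this sketch the inverse checker simply wraps another checker, which mirrors how InverseCheckResultChecker refers to another checker by name in its configuration element.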

A checker is only applied to pages whose URL matches one of its mappings. The pattern /** matches any URL on the same domain. A checker without a mapping is never applied.

<checkerMappings>
	<checkerMapping checker="noUnavailablePortletsChecker">
		<url>/**</url>
	</checkerMapping>
	<checkerMapping checker="noErrorNotificationChecker">
		<url>/**</url>
	</checkerMapping>
</checkerMappings>
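
The following Java sketch illustrates one way a pattern such as /** could be matched against a crawled URL. It assumes Ant-style semantics (** spans path segments, * stays within one segment) and a comparison against the host of the start URL. Both are assumptions, and UrlPatternSketch, toRegex and matches are hypothetical names rather than the monitor's real implementation.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch of Ant-style URL pattern matching; not the monitor's real code.
final class UrlPatternSketch {

    /** Translates an Ant-style path pattern to a regex: "**" spans slashes, "*" does not. */
    static Pattern toRegex(String antPattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < antPattern.length()) {
            if (antPattern.startsWith("**", i)) {
                regex.append(".*");
                i += 2;
            } else if (antPattern.charAt(i) == '*') {
                regex.append("[^/]*");
                i += 1;
            } else {
                // Quote every literal character so that '.', '?' etc. carry no regex meaning.
                regex.append(Pattern.quote(String.valueOf(antPattern.charAt(i))));
                i += 1;
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True when the candidate is on the same host as the start URL and its path matches. */
    static boolean matches(String startUrl, String antPattern, String candidateUrl) {
        URI start = URI.create(startUrl);
        URI candidate = URI.create(candidateUrl);
        if (start.getHost() == null || !start.getHost().equalsIgnoreCase(candidate.getHost())) {
            return false;
        }
        String path = candidate.getPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // treat http://host like http://host/
        }
        return toRegex(antPattern).matcher(path).matches();
    }

    public static void main(String[] args) {
        String start = "http://climatecolab.org/";
        System.out.println(matches(start, "/**", "http://climatecolab.org/web/guest/plans"));
        System.out.println(matches(start, "/**", "http://example.com/web/guest/plans"));
    }
}
```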

Crawler Config

In this section you configure the start URL of the crawl, the recursion depth up to which links are followed, and which links are followed (/** means any link on the same domain).

<crawlerConfig>
	<crawlConfig>
		<!-- traverse the whole web application -->
		<url>http://climatecolab.org/</url>
		<recursionLevel>4</recursionLevel>
		<linkPatterns>
			<linkPattern>/**</linkPattern>
		</linkPatterns>
	</crawlConfig>
</crawlerConfig>
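
As an illustration of what a recursion level means for a crawl, the Java sketch below performs a breadth-first traversal that stops after a fixed number of link hops. It assumes that the start URL counts as level 0 and that a recursionLevel of 4 follows links up to four hops away. The real crawler may count differently. CrawlDepthSketch, crawl and linksOf are hypothetical names, and the link graph is an in-memory stand-in for fetching pages.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a depth-limited crawl; not the monitor's real code.
final class CrawlDepthSketch {

    /**
     * Visits the start URL and every page reachable within recursionLevel link hops.
     * linksOf stands in for "download the page and extract its links".
     */
    static Set<String> crawl(String startUrl, int recursionLevel,
                             Function<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        visited.add(startUrl);
        frontier.add(startUrl);
        for (int depth = 0; depth < recursionLevel && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String page : frontier) {
                for (String link : linksOf.apply(page)) {
                    if (visited.add(link)) { // add() is false for pages already seen
                        next.add(link);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up link graph: / -> /a -> /b -> /c
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("/", Arrays.asList("/a"));
        graph.put("/a", Arrays.asList("/b", "/"));
        graph.put("/b", Arrays.asList("/c"));
        Function<String, List<String>> linksOf =
                page -> graph.getOrDefault(page, Collections.emptyList());
        System.out.println(crawl("/", 2, linksOf));
    }
}
```

A higher recursion level covers more of the site at the cost of a longer run, which is worth keeping in mind when the monitor runs on a schedule.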

E-Mail Notification

Set the recipients of the error notification e-mail here, as well as the SMTP server used to send it. Be sure to test the SMTP credentials: if they are wrong, the monitor cannot notify anyone.

<emailNotification>
	<to>thurner@mit.edu</to>
	<to>pdeboer@mit.edu</to>
	<from>support@climatecolab.com</from>
	<smtphost>smtp.gmail.com</smtphost>
	<smtpport>465</smtpport>
	<smtpusetsl>true</smtpusetsl>
	<smtpuser>***</smtpuser>
	<smtppassword>***</smtppassword>
	<subject>Site Monitor has detected errors in climatecolab.org</subject>
</emailNotification>

Cronjob

There is a script in scripts/ which starts the Site Monitor and writes its output to a log file. You can use it with cron to run the monitor periodically, e.g. once a night. Run crontab -e and add an entry like the following, which starts the monitor every day at midnight:

0 0 * * * /home/manuel/XCoLab/scripts/run-site-monitor.sh

Compiling

After changing the source code or the configuration, compile the monitor as follows:

mvn compile package
mvn dependency:copy-dependencies