rsimon edited this page Feb 8, 2013 · 7 revisions

Project Layout

monitrix is built on the Play! framework (Java). The Play version at the time of writing is 2.0.4. As its primary backend, monitrix uses the MongoDB NoSQL database. So far, monitrix has been tested with version 2.0.4 of MongoDB (no typo: this is identical to the Play version).

Folder Structure

The project layout follows standard Play! conventions, with the following top-level folders:

  • app contains the actual application source code
  • conf contains application configuration, route definitions and Jasper .jrxml templates for the downloadable reports
  • project holds the build file + properties
  • public contains static Web resources (images, CSS, JavaScript files)
  • test contains unit test classes + resources

Source Code Layout

Following Play! conventions, the project has two top-level packages named controllers and views, which contain the controller implementation classes and the view templates, respectively. controllers also contains a sub-package mapping, which holds helper classes that wrap model objects into a form the Play! framework can translate automatically to a convenient JSON representation.
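
To make the idea of a mapping wrapper concrete, here is a minimal sketch. It is not taken from the monitrix source: the interface HostRecord, the class HostRecordMapper and all field names are hypothetical stand-ins, and the real KnownHost interface may expose different properties. The point is only that a wrapper with plain public fields gives a field-based JSON serializer (such as the Jackson-based one behind Play's Json.toJson) something it can translate without custom code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a model interface; the real
// uk.bl.monitrix.model.KnownHost may look different.
interface HostRecord {
    String getHostname();
    long getFirstAccess();
    long getLastAccess();
}

// Wrapper in the spirit of controllers.mapping: it copies the values it
// wants to expose into public fields, so a serializer only sees those.
class HostRecordMapper {
    public final String hostname;
    public final long firstAccess;
    public final long lastAccess;

    HostRecordMapper(HostRecord host) {
        this.hostname = host.getHostname();
        this.firstAccess = host.getFirstAccess();
        this.lastAccess = host.getLastAccess();
    }

    // Convenience helper for wrapping a whole list of model objects.
    static List<HostRecordMapper> map(List<HostRecord> hosts) {
        List<HostRecordMapper> mapped = new ArrayList<>();
        for (HostRecord host : hosts) {
            mapped.add(new HostRecordMapper(host));
        }
        return mapped;
    }
}
```

In a controller, an instance of such a wrapper (or a list of them) would then be handed to the framework's JSON conversion instead of the model object itself.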

The core application logic is located in a third package named uk.bl.monitrix. This package has the following sub-packages:

  • model
    This package contains interface (or abstract base class) definitions for the core datamodel concepts used by monitrix (cf. Technical Overview).

    • CrawlLog and CrawlLogEntry represent the crawl log and an individual log line, respectively
    • KnownHostList and KnownHost represent the list of known hosts and an individual host, respectively
    • CrawlStats and CrawlStatsUnit represent the crawl stats collection, and an individual base-resolution data point
    • AlertLog and Alert represent the alert log collection and an individual alert
    • VirusLog and VirusRecord represent the virus log and a record of occurrences of an individual virus
  • heritrix
    This package contains classes specific to reading and ingesting Heritrix log files.

    • The class LogFileEntry is an implementation of CrawlLogEntry, based on a line read from a log file.
    • SimpleLogFileReader is a class that exposes a Heritrix log file through an Iterator over LogFileEntry objects.
    • IncrementalLogFileReader is a log file reader that implements incremental batch loading on a log file that is being concurrently written by Heritrix.
    • Classes for ingesting data into monitrix are contained in the sub-package ingest.
      • IngestWatcher provides a "frontend" API to the ingest system (with methods to start and stop the watching process, query status, and add logs for watching).
      • IngestActor handles the actual watch- and ingest-process in the background.
      • IngestStatus and IngestControlMessage are simple classes used for communication with the ingest system.
  • database
    This package contains DBConnector (a minimal, generic database read interface), DBIngestConnector (a generic database write interface) and one subpackage, mongodb, which holds the implementation classes for the MongoDB storage backend. The mongodb subpackage has the following contents:

    • MongoProperties holds the string constants (collection and field names) used for MongoDB
    • MongoDBConnector implements the DBConnector interface for MongoDB
    • Package model contains implementation classes for monitrix' core datamodel concepts (as contained in uk.bl.monitrix.model).
    • Package ingest contains extensions to the core datamodel implementations which provide write access. These classes also contain the core ingest processing logic!
  • analytics
    This package contains data structures and processing functions for computing various aggregate stats from the raw data held in the backend.

    • The class CrawlStatsAnalytics contains helpers to compute/resample timeseries from the data held in the Crawl Stats collection.
    • The class LogAnalytics contains helpers to compute various stats and property distributions from series of log entries.
    • PieChartValue and TimeseriesValue are data structures used to represent computation results in the analytics classes.
  • export
    This package contains the classes that implement rendering of printable reports, using Jasper Reports. Note that the actual report templates (extension .jrxml) are located in the /conf folder.
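
The incremental batch loading mentioned for IncrementalLogFileReader in the heritrix package can be pictured roughly as follows. This is a sketch under stated assumptions, not the monitrix implementation: the class name IncrementalLineReader and the method nextBatch are hypothetical, the log is assumed to be UTF-8 with \n line endings, and each batch is read into memory in one go. The idea is to remember the byte offset after the last complete line, so that a line Heritrix is still writing is left for the next batch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of incremental reading from a file that another
// process keeps appending to.
class IncrementalLineReader {
    private final String path;
    // Byte position just after the last complete line handed out so far.
    private long offset = 0;

    IncrementalLineReader(String path) {
        this.path = path;
    }

    // Returns all complete lines appended since the previous call.
    List<String> nextBatch() {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long available = Math.max(0, file.length() - offset);
            byte[] buffer = new byte[(int) Math.min(available, Integer.MAX_VALUE)];
            file.seek(offset);
            file.readFully(buffer);

            int lineStart = 0;
            for (int i = 0; i < buffer.length; i++) {
                if (buffer[i] == '\n') {
                    lines.add(new String(buffer, lineStart, i - lineStart, StandardCharsets.UTF_8));
                    lineStart = i + 1;
                }
            }
            // Advance only past complete lines; a trailing partial line
            // is picked up again by the next call.
            offset += lineStart;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

A background worker could call nextBatch() periodically and hand each returned line to the parsing and ingest code. A production reader would also need to deal with log rotation and very large backlogs, which this sketch ignores.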

Finally, there are two additional classes in the top-level uk.bl.monitrix package: Global, which is an implementation of the Play! Global object, and NumberFormat, which provides helpers for formatting numbers and dates in the view templates.
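
As a closing illustration of the timeseries resampling attributed to CrawlStatsAnalytics in the analytics package, the sketch below sums fine-grained data points into coarser buckets. It is an assumption-laden example rather than monitrix code: the class TimeseriesResampler and its resample method are hypothetical, data points are modelled as a plain map from timestamp (milliseconds) to a count, and summing is only one of several possible aggregation choices.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: aggregates base-resolution points into wider buckets.
class TimeseriesResampler {

    // Keys of 'points' are timestamps in milliseconds; keys of the result
    // are the start times of buckets that are 'bucketMillis' wide.
    static TreeMap<Long, Long> resample(Map<Long, Long> points, long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        TreeMap<Long, Long> buckets = new TreeMap<>();
        for (Map.Entry<Long, Long> point : points.entrySet()) {
            // floorDiv keeps bucket boundaries consistent for negative timestamps too.
            long bucketStart = Math.floorDiv(point.getKey(), bucketMillis) * bucketMillis;
            buckets.merge(bucketStart, point.getValue(), Long::sum);
        }
        return buckets;
    }
}
```

For example, resampling per-minute counts with a bucket width of 3,600,000 ms yields one summed value per hour, ordered by time because the result is a TreeMap.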