Usage examples for Divolte collector


Divolte Collector usage examples

Contained here are some common usage examples of Divolte Collector. Here you will find:

  • avro-schema/: A custom Avro schema and Divolte schema mapping for capturing clickstream events on Javadoc pages (this is used in the other examples).
  • hdfs-hive/: A howto for using Divolte Collector data in Hive/Impala on Hadoop.
  • tcp-kafka-consumer/: A Kafka consumer example that sends events from Divolte's Kafka topic to a TCP socket.
  • Some examples of processing Divolte Collector data using Spark:
    • pyspark/: Python API, standalone and using IPython notebook.
    • spark/: Scala API, standalone and Spark Streaming.

Before You Begin

Prerequisites:

  • You need an HTTP server that can serve static files locally.
    Serving static files over HTTP is easy if you have Python installed: run python -m SimpleHTTPServer (Python 2) or python3 -m http.server (Python 3) in a directory with static files. (On a Mac, you could also use Homebrew to install http-server.)
  • You must have Java 8 installed. *
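
If you would rather start the static file server from a script, the following is a minimal sketch using only the Python 3 standard library. The function name make_static_server and the default port 8000 are our own choices for illustration; they are not part of Divolte Collector.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(directory, port=8000):
    """Return an HTTP server that serves the files in `directory` on localhost.

    The port is an assumption for this sketch; pass 0 to let the
    operating system pick a free one.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so there is no need to change the working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_static_server("YOUR_OUTPUT_DIRECTORY").serve_forever() should then serve that directory until you interrupt the process.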

For specific examples we assume some familiarity with the tools used, such as Apache Kafka, Hadoop or Apache Spark. We don't go into the details of installing and configuring these. If you do want to try the Hadoop examples but don't know how to set up Hadoop locally, we recommend using the Quickstart VM from Cloudera, which contains Cloudera's CDH distribution.

Javadoc Click Stream Data

All the examples above are based on a Divolte Collector setup that collects click stream data for Javadoc pages. Since all generated Javadoc pages use the same URL layout, it should work with the Javadocs for any project of your choice. To generate Divolte Collector enabled Javadoc for your project, you can use the following command from the source directory of your project:

% javadoc -d YOUR_OUTPUT_DIRECTORY \
    -bottom '<script src="//localhost:8290/divolte.js" defer async></script>' \
    -subpackages .

Note that if you have a special source encoding you should add -encoding "YOUR_ENCODING" (e.g. -encoding "ISO-8859-1") to the command.
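
If you generate Javadoc for several projects, the command above can be wrapped in a small script. This is only a sketch: the helper names build_javadoc_command and generate_javadoc are hypothetical, and it assumes the javadoc tool is on your PATH. It passes exactly the options shown above, plus -encoding when one is given.

```python
import subprocess

# The script tag from the command above; it loads divolte.js on every page.
DIVOLTE_SCRIPT = '<script src="//localhost:8290/divolte.js" defer async></script>'


def build_javadoc_command(output_dir, encoding=None):
    """Build the javadoc argument list, optionally with a source encoding."""
    cmd = [
        "javadoc",
        "-d", output_dir,
        "-bottom", DIVOLTE_SCRIPT,
        "-subpackages", ".",
    ]
    if encoding is not None:
        cmd += ["-encoding", encoding]
    return cmd


def generate_javadoc(source_dir, output_dir, encoding=None):
    """Run javadoc from the source directory of the project."""
    subprocess.run(build_javadoc_command(output_dir, encoding),
                   cwd=source_dir, check=True)
```

Because the arguments are passed as a list, the script tag does not need any shell quoting.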

For convenience, the javadoc-commons-lang-divolte.tar.gz archive contains a pre-built set of Javadoc for the Apache Commons Lang project.
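
Unpacking the archive can be done with tar or, as a minimal sketch, with Python's tarfile module. The helper name extract_javadoc is our own, and we make no assumption here about the directory layout inside the archive; inspect the target directory after extraction to find the pages to serve.

```python
import tarfile


def extract_javadoc(archive_path, target_dir):
    """Extract a gzip-compressed tar archive into `target_dir`."""
    # Only extract archives you trust: extractall writes whatever
    # paths the archive contains.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)
    return target_dir
```

For example, extract_javadoc("javadoc-commons-lang-divolte.tar.gz", "javadoc") would unpack the pre-built Javadoc into a javadoc directory next to the archive.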

HAVE FUN!


* Divolte Collector itself needs Java 8, as do some of the examples. Any Java libraries we ship for use in third-party applications are compatible with Java 7 and above.