[FLAGON-379] WIP updates
poorejc committed Apr 21, 2019
1 parent a868775 commit 028a3213a2a684497f2502858fd96e19e78b71b9
Showing 2 changed files with 37 additions and 51 deletions.
@@ -13,7 +13,7 @@ Before you begin, you'll need [NPM and Node.js](https://nodejs.org/), [Docker](h

[Apache UserALE.js](https://github.com/apache/incubator-flagon-useralejs) is Apache Flagon's thin-client behavioral logging solution. Below, you'll find short-hand instructions for getting started with UserALE.js. For complete instructions, see our [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md).

First, download the [release](http://flagon.incubator.apache.org/releases/) or clone our [repository on GitHub](https://github.com/apache/incubator-flagon-useralejs/tree/master). Apache UserALE.js is also available as an [NPM package](https://www.npmjs.com/package/useralejs).
First, download the [release](http://flagon.incubator.apache.org/releases/) or clone our [repo on GitHub](https://github.com/apache/incubator-flagon-useralejs/tree/master). Apache UserALE.js is also available as an [NPM package](https://www.npmjs.com/package/useralejs).

Next, **install Dependencies**.
```shell
@@ -44,7 +44,7 @@ You can now start generating behavioral log data from your page, or through your

[Apache Flagon](https://github.com/apache/incubator-flagon) utilizes an Elastic stack for transforming, indexing, and storing log data. With Elastic, you'll not only have the ability to search and query log data, but you'll also be able to monitor it and visualize it through Kibana.

To build our single-node Elastic instance, **first clone our [Docker repo](https://github.com/apache/incubator-flagon/tree/master/docker)**.
To build our single-node Elastic instance, **first clone our [Docker repo](https://github.com/apache/incubator-flagon/tree/master/docker)**. Note that for production-level deployments, you should check out our [Kubernetes build](https://github.com/apache/incubator-flagon/tree/master/kubernetes) and our [guide for scaling]({{ '/docs/stack/scaling' | prepend: site.baseurl }}).

Then, **start up a virtual machine**.
```shell
@@ -67,6 +67,4 @@ Before starting Kibana, **generate some logs**. Move your mouse around, click ar

Finally, **navigate to localhost:5601 (Kibana), set an index pattern, and load our visualizations and dashboards** to see your logs. Find simple instructions in our [README](https://github.com/apache/incubator-flagon/tree/master/docker).

Note that the single-node container isn't meant for persistent use, but is built to scale. See our [Kubernetes build](https://github.com/apache/incubator-flagon/tree/master/kubernetes), or configure your own cluster using this container to suit your needs.

Subscribe to our [dev list](mailto:dev-subscribe@flagon.incubator.apache.org) and join the conversation!
@@ -1,72 +1,60 @@
---
title: Getting Started
title: Scaling Considerations
component: stack
permalink: /docs/stack/
permalink: /docs/stack/scaling
priority: 0
---

The Apache Flagon project provides a streamlined deployment solution for including behavioral logging capabilities in your project and for monitoring and analyzing your log data in a containerized Elastic backend. Our Docker container includes an Elastic backend and pre-configured, interactive Kibana dashboards. The container also includes prototype applications for exploration, Apache Distill and Apache Tap.
### Scaling Apache Flagon: An Introduction and First Principles

Before you begin, you'll need [NPM and Node.js](https://nodejs.org/), [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/install/) installed.
**"It Depends..."**

### Building and Deploying UserALE.js
The best way to scale [Apache Flagon](https://github.com/apache/incubator-flagon) depends entirely on your use-case: how you'll use your [Apache UserALE.js]({{ '/docs/useralejs' | prepend: site.baseurl }}) data and which [UserALE.js data streams]({{ '/docs/useralejs/dataschema' | prepend: site.baseurl }}) you'll use. This page provides a few Apache Flagon-specific considerations to think about as you decide how best to scale, and some useful methods for benchmarking to help you make that determination.

[Apache UserALE.js](https://github.com/apache/incubator-flagon-useralejs) is Apache Flagon's thin-client behavioral logging solution. Below, you'll find short-hand instructions for getting started with UserALE.js. For complete instructions, see our [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md).
**The Apache Flagon Single-Node Elastic Container is an Ingredient, Not a Whole Solution**

First, download the [release](http://flagon.incubator.apache.org/releases/) or clone our [repository on GitHub](https://github.com/apache/incubator-flagon-useralejs/tree/master). Apache UserALE.js is also available as an [NPM package](https://www.npmjs.com/package/useralejs).
The single-node Elastic (ELK) stack in the Docker container distributed by [Apache Flagon]({{ '/docs/stack' | prepend: site.baseurl }}) is not, on its own, suitable for most production-level use-cases. It may be sufficient for "grab-and-go" user-testing use-cases, for example, a few days of data collection from a specific application and just a few users. On its own, however, it will fail quickly for most persistent data collection and enterprise-scale use-cases. Rather, this container is meant to be a building block for larger solutions:

Next, **install Dependencies**.
```shell
#install NPM packages into build directory
$ npm install
```
1. Our ELK .yml config files for our Docker container can be used as the building-blocks for your very own [multi-node cluster](https://dzone.com/articles/elasticsearch-tutorial-creating-an-elasticsearch-c) with load-balancing capabilities.
1. You can use our [Kubernetes build](https://github.com/apache/incubator-flagon/tree/master/kubernetes), which relies on our Docker assets, to scale your Apache Flagon stack to meet your needs.
1. You can use our single-node container to scale out in [AWS Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/).

Then, **build UserALE.js**.
```shell
#produce UserALE.js build artifacts
$ npm run build
```
**Apache Flagon Data Also Scales**

The build process produces a minified version of UserALE.js and a Web Extension package, giving you two options depending on your needs.
It's important to note that the burden of scale isn't placed wholly on your Apache Flagon stack. Flagon's behavioral logging capability, [Apache UserALE.js]({{ '/docs/useralejs' | prepend: site.baseurl }}), also scales. This means that one of the most cost-efficient ways to manage the resources behind your Apache Flagon stack is to [configure]({{ '/docs/useralejs/API' | prepend: site.baseurl }}) or [modify]({{ '/docs/useralejs/modifying' | prepend: site.baseurl }}) UserALE.js to meet your use-case.
### Sizing Up an Elastic Stack

**Option 1: Include Apache UserALE.js in your project:**
When thinking about how to scale your Apache Flagon Elastic stack, it's important to note that Elasticsearch isn't a database; it's a datastore, which stores documents. Elastic is built on top of [Lucene](http://lucene.apache.org/). That means that Apache Flagon "logs" aren't logs once they're indexed in Elastic; they become searchable documents. That's a huge strength (and why we chose Elastic), but it also means that assumptions about resource consumption based purely on records and fields are misleading in Elastic. Elastic has many useful [guides](https://www.elastic.co/blog/found-sizing-elasticsearch) on sizing and scaling. Below, we're adding a few of our own thoughts based on Apache Flagon's own eccentricities.

```markdown
#include userale in your project via script tag
<script src="/path/to/userale-1.0.0.min.js" data-url="http://yourLoggingUrl"></script>
```
Apache UserALE.js allows for configuration via HTML 5 data parameters. For a complete list of options, see the [docs]({{ '/docs/useralejs' | prepend: site.baseurl }}) or the [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md). You can also modify Apache UserALE.js using our API. Find examples in our [repos](https://github.com/apache/incubator-flagon-useralejs/tree/FLAGON-192).
* Given that documents are the atomic unit of storage in Elastic, *document generation rate* is the most important consideration in scaling. Default [Apache UserALE.js parameters]({{ '/docs/useralejs' | prepend: site.baseurl }}) produce a lot of [data]({{ '/docs/useralejs/dataschema' | prepend: site.baseurl }}), even from single users. In fact, we used to say that "drinking from the fire-hose" didn't quite do our data-rate justice; opening up UserALE.js was more like "drinking from the flame-thrower". We strongly suggest that you think about your needs and consider whether you need data from all our event-handlers. If you don't need mouseover events, for example, you can dramatically reduce the rate at which you generate data and the resources you'll need. Note that with our [interval logs]({{ '/docs/useralejs/dataschema' | prepend: site.baseurl }}), you can still get metrics for gross user-behavior without high-velocity data streams (e.g., mouseover, scrolls). Alternatively, you can use the [UserALE.js API]({{ '/docs/useralejs/API' | prepend: site.baseurl }}) or the [configurable HTML5 parameters in our script tag]({{ '/docs/useralejs' | prepend: site.baseurl }}) to modify the frequency with which UserALE.js logs these kinds of user events.

**Option 2: Follow the [instructions](https://github.com/apache/incubator-flagon-useralejs/tree/FLAGON-192/src/UserALEWebExtension) to install the Apache UserALE.js web extension into your browser.**
![alt text][logBreakdown]

You can now start generating behavioral log data from your page, or through your browser. To view these logs, you can either utilize our [example logging server](https://github.com/apache/incubator-flagon-useralejs/tree/master/example) and log to file, or you can log directly to our Elastic backend. For complete instructions, see the [README](https://github.com/apache/incubator-flagon/tree/master/docker).
### Building and Using the Elastic Backend
* Your Elastic resource needs will also grow with *document length*, especially the length of [strings](https://blog.appdynamics.com/product/estimating-costs-of-storing-documents-in-elasticsearch/) within your logs. One of the discriminating features of Apache UserALE.js is its precision (its ability to capture both the target of user behaviors and the entire DOM path of that target) and rich metadata. Apache UserALE.js fields like `path` and `pageUrl` can get quite long for certain kinds of web sites and applications. If you need precision, it's worth considering using our [API]({{ '/docs/useralejs/API' | prepend: site.baseurl }}) to abstract fields like `path` with labels, or relying on `target` alone if you can get by with only knowing users' `target` elements. Similarly, you might consider relying on `pageTitle` rather than `pageUrl` if it sufficiently differentiates pages and if you're dealing with very deep website trees. Through simple [modifications to UserALE.js source]({{ '/docs/useralejs/modifying' | prepend: site.baseurl }}), you can alias verbose fields in your logs to reduce resource consumption.

[Apache Flagon](https://github.com/apache/incubator-flagon) utilizes an Elastic stack for transforming, indexing, and storing log data. With Elastic, you'll not only have the ability to search and query log data, but you'll also be able to monitor it and visualize it through Kibana.
![alt text][verboseLogs]

To build our single-node Elastic instance, **first clone our [Docker repo](https://github.com/apache/incubator-flagon/tree/master/docker)**.
* Apache Flagon is built to scale as a platform, allowing you to connect additional services to your platform that consume user behavioral data. When considering scale, the *number of services* you connect to your Apache Flagon platform will affect your Elastic stack's performance. Any production-level deployment will require, at minimum, a simple three-node [Elastic cluster](https://dzone.com/articles/elasticsearch-tutorial-creating-an-elasticsearch-c) (with one load-balancing node). As you scale and configure that cluster, be mindful that in addition to performing indexing functions, Elasticsearch is also servicing queries and aggregations of that data. Analytical services connected to Apache Flagon, especially if they both read from and write back to the index, can consume significant resources and increase indexing and search time. This can be problematic for real-time analytical and monitoring applications (including Kibana). This can also result in data loss if logs are flushed from pipelines prior to being indexed. For hefty analytical services, it may be worth dedicating specific nodes in your cluster to service them.

Then, **start up a virtual machine**.
```shell
# start virtual machine and requisite network
$ docker-machine create --virtualbox-memory 3072 --virtualbox-cpu-count 2 senssoft
$ docker-machine ssh senssoft sudo sysctl -w vm.max_map_count=262144
$ docker network create esnet
```
Next, **start Elastic services**.
For the reasons above, it's critical to do some benchmarking for your use-case before deciding on a scaling strategy. Below, you'll find some guidance for how to do this with Apache Flagon and Elastic tools. We also provide some simple benchmarks and underlying assumptions. But again, each webpage or application is a bit different, so it's important to create some of your own benchmarks for comparison with ours.
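
Before measuring anything, a back-of-envelope estimate can help frame the problem. The sketch below multiplies a document generation rate by an average document size to get a rough daily volume. Every number in it is a placeholder assumption chosen for illustration, not an Apache Flagon benchmark; substitute values measured from your own application.

```shell
# Rough daily index growth estimate. All values are hypothetical placeholders.
users=50               # concurrently active users (assumption)
logs_per_user_sec=10   # UserALE.js logs per user per second (assumption)
bytes_per_log=1000     # average indexed document size in bytes (assumption)
active_hours=8         # hours of active use per day (assumption)

# documents generated per day = users x rate x active seconds
daily_logs=$((users * logs_per_user_sec * active_hours * 3600))
# raw volume per day, before replicas or index overhead
daily_bytes=$((daily_logs * bytes_per_log))

echo "documents per day: ${daily_logs}"
echo "approx. bytes per day: ${daily_bytes}"
```

Halving the log rate (for example, by dropping high-velocity events) or the average document size halves the estimate, which is why those two factors are worth tuning before adding nodes.
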
### Benchmarking Tools and Methods for Sizing your Apache Flagon Stack

```shell
#start Elastic services
$ docker-compose up -d elasticsearch
$ docker-compose up -d logstash
$ docker-compose up -d kibana
```
**Configure UserALE.js** to send logs to localhost:8100. This is easy: either modify the script tag for port 8100 or open up the "options" tab of the web extension and enter localhost:8100 as your logging end-point.
[Elastic discussion boards](https://discuss.elastic.co/t/10-billion-records-writen-to-es-ervery-day-how-many-nodes-and-hardware-should-need/45307/4)

Before starting Kibana, **generate some logs**. Move your mouse around, click around, etc. Do this for a couple minutes to populate the index.
[Apache UserALE.js](https://github.com/apache/incubator-flagon-useralejs) is Apache Flagon's thin-client behavioral logging solution. Below, you'll find short-hand instructions for getting started with UserALE.js. For complete instructions, see our [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md).

Finally, **navigate to localhost:5601 (Kibana), set an index pattern, and load our visualizations and dashboards** to see your logs. Find simple instructions in our [README](https://github.com/apache/incubator-flagon/tree/master/docker).
[Elastic's Stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html)
[Overview of Elastic's APIs](https://www.datadoghq.com/blog/collect-elasticsearch-metrics/#index-stats-api)

**Cluster and Index Size**
```shell
#Index Stats using Elastic's GET _stats API (URL quoted so the shell does not expand "?")
$ curl "localhost:9200/index_name/_stats?pretty=true"
#Use me for Apache Flagon index configs
$ curl "localhost:9200/userale/_stats?pretty=true"
```
[http://localhost:9200/userale/_stats?pretty=true](http://localhost:9200/userale/_stats?pretty=true)
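
One figure worth deriving from these stats is the average on-disk size per document, since it feeds directly into any sizing estimate. The sketch below assumes a `userale` index on `localhost:9200` and uses Elasticsearch's `_cat/indices` API to get the document count and primary store size in bytes. The sample values are made up so the arithmetic can be followed without a running cluster.

```shell
# With a live cluster, capture the two columns from the _cat/indices API, e.g.:
#   stats_line=$(curl -s "localhost:9200/_cat/indices/userale?h=docs.count,pri.store.size&bytes=b")
# The line below is a made-up sample response (doc count, primary store bytes).
stats_line="120000 96000000"

# Split the response into its two whitespace-separated fields
read -r doc_count store_bytes <<EOF
$stats_line
EOF

# Integer average; includes index overhead, so treat it as a rough figure
avg_bytes=$((store_bytes / doc_count))
echo "average bytes per document: ${avg_bytes}"
```

Running this before and after changing your UserALE.js configuration gives a simple way to compare how much each change reduces per-document cost.
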

Note that the single-node container isn't meant for persistent use, but is built to scale. See our [Kubernetes build](https://github.com/apache/incubator-flagon/tree/master/kubernetes), or configure your own cluster using this container to suit your needs.

Subscribe to our [dev list](mailto:dev-subscribe@flagon.incubator.apache.org) and join the conversation!
