
Doc updates for 0.6.0

1 parent abc6a77 commit cb3c58aa414f5c5946b53cb6ee4d46ac77d281fc @matthieumorel matthieumorel committed Feb 25, 2013
@@ -12,9 +12,12 @@ Entry pages are written with haml and the documentation is written with markdown
The generated static website is in `output/`
-# To upload the site to apache
-## first, commit the generated website to svn
+There are also a number of dependencies on other gems; error messages are explicit about which ones are missing and how to install them.
+
+We also use Pygments for code syntax highlighting. It's a Python program; see [here](http://pygments.org/docs/installation/) for installation instructions.
+
+# To upload the site to Apache, commit the generated website to svn (the site/ directory)
cp -R output/* $S4_SVN_LOC/site
cd $S4_SVN_LOC
@@ -23,7 +26,4 @@ The generated static website is in `output/`
svn add <whatever is missing>
svn commit --username <apache username> -m "commit message"
-## then checkout into web server
- ssh people.apache.org
- cd /www/incubator.apache.org/content/s4
- svn checkout http://svn.apache.org/repos/asf/incubator/s4/site .
+With svnpubsub, the website is automatically updated.
@@ -22,23 +22,18 @@ compile '/images/*/' do
# do nothing
end
-compile '/doc/*' do
- if item.binary?
- # don’t filter binary items
- else
+compile '*' do
+ if item[:extension] == "haml"
+ filter :haml
+ layout 'default'
+ end
+ if item[:extension] == "md"
filter :kramdown
filter :colorize_syntax,
:default_colorizer => :pygmentize,
:pygmentize => { :linenos => 'inline', :options => { :startinline => 'True' } }
layout 'default'
end
-end
-
-compile '*' do
- if item[:extension] == "haml"
- filter :haml
- layout 'default'
- end
filter :relativize_paths, :type => :html
end
@@ -41,4 +41,4 @@ data_sources:
layouts_root: /
google_analytics_account_id: UA-19490961-1
-google_analytics_domain: .s4.io
+google_analytics_domain: incubator.apache.org
@@ -2,6 +2,8 @@
title: Configuration
---
+> How to configure S4 clusters and applications
+
# Toolset
S4 provides a set of tools to:
@@ -13,15 +15,8 @@ S4 provides a set of tools to:
* start a Zookeeper server for easy testing: `s4 zkServer`
* `s4 zkServer -t` will start a Zookeeper server and automatically configure 2 clusters
* view the status of S4 clusters coordinated by a given Zookeeper ensemble: `s4 status`
-
-
- ./s4
-
-will give you a list of available commands.
-
- ./s4 <command> -help
-
-will provide detailed documentation for each of these commands.
+* `./s4` will give you a list of available commands.
+* `./s4 <command> -help` will provide detailed documentation for each of these commands.
# Cluster configuration
@@ -38,23 +33,24 @@ Before starting S4 nodes, you must define a logical cluster by specifying:
The cluster configuration is maintained in Zookeeper, and can be set using S4 tools:
./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
+
See tool documentation by typing:
./s4 newCluster -help
# Node configuration
-*Platform* *code and* *application* *code are fully configurable,* *at deployment time{*}*.*
+**Platform code and application code are fully configurable, at deployment time.**
-S4 nodes start as simple *bootstrap* processes whose initial role is merely to connect the cluster manager:
+S4 nodes start as simple **bootstrap** processes whose initial role is merely to connect the cluster manager:
-* the bootstrap code connects to the cluster manager
-* when an application is available on the cluster, the node gets notified
-* it downloads the platform configuration and code, as specified in the configuration of the deployed application.
-* the communication and core components are loaded, bound and initialized
-* the application configuration and code, as specified in the configuration of the deployed applciation, is downloaded
-* the application is initialized and started
+1. the bootstrap code connects to the cluster manager
+1. when an application is available on the cluster, the node gets notified
+1. it downloads the platform configuration and code, as specified in the configuration of the deployed application.
+1. the communication and core components are loaded, bound and initialized
+1. the application configuration and code, as specified in the configuration of the deployed application, are downloaded
+1. the application is initialized and started
This figure illustrates the separation between the bootstrap code, the S4 platform code, and application code in an S4 node:
@@ -73,9 +69,9 @@ Example:
# Application configuration
-Deploying applications is easier when we can define both the parameters of the application *and* the target environment.
+Deploying applications is easier when we can define both the parameters of the application **and** the target environment.
-In S4, we achieve this by specifying *both* application parameters and S4 platform parameters in the deployment phase :
+In S4, we achieve this by specifying **both** application parameters and S4 platform parameters in the deployment phase:
* which application class to use
* where to fetch application code
@@ -87,7 +83,7 @@ In S4, we achieve this by specifying *both* application parameters and S4 platfo
## Modules configuration
-S4 follows a modular design and uses[Guice](http://code.google.com/p/google-guice/) for defining modules and injecting dependencies.
+S4 follows a modular design and uses [Guice](http://code.google.com/p/google-guice/) for defining modules and injecting dependencies.
As illustrated above, an S4 node is composed of:
* a base module that specifies how to connect to the cluster manager and how to download code
@@ -105,30 +101,36 @@ For the core module, there is no default parameters.
We provide default modules, but you may directly specify others through the command line, and it is also possible to override them with new modules and even specify new ones (custom modules classes must provide an empty no-args constructor).
-Custom overriding modules can be specified when deploying the application, through the`deploy` command, through the _emc_ or _modulesClasses_ option.
+Custom overriding modules can be specified when deploying the application, through the `deploy` command, with the `emc` or `modulesClasses` option.
For instance, in order to enable file system based checkpointing, pass the corresponding checkpointing module class:
./s4 deploy -s4r=uri/to/app.s4r -c=cluster1 -appName=myApp \
-emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
-You can also write your own custom modules. In that case, just package them into a jar file, and specify how to fetch that file when deploying the application, with the _mu_ or _modulesURIs_ option.
+You can also write your own custom modules. In that case, just package them into a jar file, and specify how to fetch that file when deploying the application, with the `mu` or `modulesURIs` option.
For instance, if you checkpoint through a specific key value store, you can write your own checkpointing implementation and module, package that into fancyKeyValueStoreCheckpointingModule.jar, and then:
- ./s4 node -c=cluster1 -emc=my.project.FancyKeyValueStoreBackendCheckpointingModule \
+ ./s4 deploy -c=cluster1 -emc=my.project.FancyKeyValueStoreBackendCheckpointingModule \
-mu=uri/to/fancyKeyValueStoreCheckpointingModule.jar
### overriding parameters
A simple way to pass parameters to your application code is by:
-* injecting them in the application class:
+* injecting them in the application class (primitive types, enums and class literals are automatically converted), for instance:
+
+~~~
+#!java
+
+@Inject
+@Named("thePortNumber")
+int port;
+
+~~~
- @Inject
- @Named('myParam')
- param
-* specifying the parameter value at node startup (using -p inline with the node command, or with the '@' syntax)
+* specifying the parameter value at node startup (using `-p` inline with the node command, or with the '`@`' syntax)
S4 uses an internal Guice module that automatically injects configuration parameters passed through the deploy command to matching `@Named` parameters.
@@ -141,7 +143,7 @@ Both application and platform parameters can be overriden. For instance, specify
## File-based configuration
Instead of specifying node parameters inline, you may refer to a file with the '@' notation:
-./s4 deploy @/path/to/config/file
+`./s4 deploy @/path/to/config/file`
With contents of the referenced file like:
-s4r=uri/to/app.s4r
@@ -2,10 +2,10 @@
title: Development tips
---
-Here are a few tips to ease the development of S4 applications.
+> Here are a few tips to ease the development of S4 applications.
-### Import an S4 project into your IDE
+# Import an S4 project into your IDE
You can run `gradlew eclipse` or `gradlew idea` at the root of your S4 application directory. Then simply import the project into eclipse or intellij. You'll have both your application classes _and_ S4 libraries imported to the classpath of the project.
@@ -18,28 +18,31 @@ In order to get the transitive dependencies of the platform included as well, yo
./gradlew install -DskipTests
* Then run `gradlew eclipse` or `gradlew idea`
+----
+# Start a local Zookeeper instance
-### Start a local Zookeeper instance
-
-* Use the default test configuration (2 clusters with following configs: `c=testCluster1:flp=12000:nbTasks=1` and `c=testCluster2:flp=13000:nbTasks=1`)
+* Use the default test configuration (2 clusters with following configs: `-c=testCluster1:flp=12000:nbTasks=1` and `-c=testCluster2:flp=13000:nbTasks=1`)
s4 zkServer -t
* Start a Zookeeper instance with your custom configuration, e.g. with 1 partition:
s4 zkServer -clusters=c=testCluster1:flp=12000:nbTasks=1
+----
-### Load an application in a new node directly from an IDE
+# Load an application in a new node directly from an IDE
This allows you to *skip the packaging phase*!
-A requirement is that you have both the application classes and the S4 classes in your classpath. See above.
+Requirements:
+
+* application classes **and** S4 classes are in your classpath. See above.
+* application already configured in the cluster (with the `-appClass` option, no need to package the app)
-Then you just need to run the `org.apache.s4.core.Main` class and pass:
+Then just run the `org.apache.s4.core.S4Node` class and pass:
* the cluster name: `-c=testCluster1`
-* the app class name: `-appClass=myAppClass`
-If you use a local Zookeeper instance, there is no need to specify the `-zk` option.
+If you use a local Zookeeper instance on the default port (2181), there is no need to specify the `-zk` option.
@@ -2,6 +2,8 @@
title: Event dispatch
---
+> Exploring how events are dispatched to, from and within S4 nodes
+
Events are dispatched according to their key.
The key is identified in an `Event` through a `KeyFinder`.
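As a rough, self-contained sketch of how a key drives dispatch: the `Event` and `KeyFinder` shapes below are simplified stand-ins rather than the actual S4 API, and hash-modulo partitioning is an assumed scheme for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class KeyDispatchSketch {

    // simplified stand-in for an S4 Event carrying a user id
    static class Event {
        final String userId;
        Event(String userId) { this.userId = userId; }
    }

    // simplified stand-in for a KeyFinder: extracts the key(s) of an event
    interface KeyFinder<T> {
        List<String> get(T event);
    }

    static final KeyFinder<Event> USER_KEY = e -> Arrays.asList(e.userId);

    // illustrative hash-modulo partitioning: same key, same partition
    static int partition(String key, int nbPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % nbPartitions;
    }

    public static void main(String[] args) {
        Event e = new Event("user-42");
        String key = USER_KEY.get(e).get(0);
        System.out.println(partition(key, 4)); // some fixed partition in [0, 4)
    }
}
```

The important property is that all events sharing a key map to the same partition, hence reach the same PE instance.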
@@ -97,10 +99,10 @@ S4 follows a staged event driven architecture and uses a pipeline of executors t
An executor is an object that executes tasks. It usually keeps a bounded queue of task items and schedules their execution through a pool of threads.
When processing queues are full, executors may adopt various possible behaviours, in particular, in S4:
- * **blocking**: the current thread simply waits until the queue is not full
- * **shedding**: the current event is dropped
-**Throttling**, i.e. placing an upper bound on the maximum processing rate, is a convenient way to avoid sending too many messages too fast.
+* **blocking**: the current thread simply waits until the queue is not full
+* **shedding**: the current event is dropped
+* **throttling**: an upper bound is placed on the processing rate, a convenient way to avoid sending too many messages too fast
S4 provides various default implementations of these behaviours and you can also define your own custom executors as appropriate.
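For illustration, the blocking and shedding behaviours can be sketched with plain `java.util.concurrent` primitives. This is a simplified stand-in, not S4's actual executor code; throttling (not shown) would additionally rate-limit submissions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OverflowBehaviours {

    // blocking: when the queue is full, the submitting thread waits for space
    static final RejectedExecutionHandler BLOCK = (task, pool) -> {
        try {
            pool.getQueue().put(task);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    };

    // shedding: when the queue is full, the task (event) is simply dropped
    static final RejectedExecutionHandler SHED = new ThreadPoolExecutor.DiscardPolicy();

    // saturate a 1-thread / 1-slot shedding executor with 3 tasks: one runs,
    // one is queued, one is shed; returns the number actually executed
    static int demoShedding() throws InterruptedException {
        AtomicInteger executed = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor shedding = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1), SHED);
        shedding.execute(() -> {
            try { gate.await(); } catch (InterruptedException e) { }
            executed.incrementAndGet();
        });
        shedding.execute(executed::incrementAndGet); // fills the single queue slot
        shedding.execute(executed::incrementAndGet); // queue full: dropped
        gate.countDown();
        shedding.shutdown();
        shedding.awaitTermination(5, TimeUnit.SECONDS);
        return executed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demoShedding()); // 2
    }
}
```

With `BLOCK` installed as the rejection handler instead, the third submission would wait for queue space rather than being dropped.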
@@ -116,7 +118,7 @@ The following picture illustrates the pipeline of executors.
1. the message is passed to a deserializer executor
* this executor is loaded with the application, and therefore has access to application classes, so that application specific messages can be deserialized
* by default it uses 1 thread and **blocks** if the processing queue is full
-1. the event (deserialized message) is dispatched to a stream executor
+1. the event (the deserialized message) is dispatched to a stream executor
* the stream executor is selected according to the stream information contained in the event
* by default it **blocks** if the processing queue is full
1. the event is processed in the PE instance that matches the key of the event
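The pipeline above can be sketched as single-threaded stages connected by bounded blocking queues. This is a minimal, illustrative SEDA-style sketch, not S4's actual implementation; the stage and queue names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PipelineSketch {

    // runs the messages through a 2-stage pipeline and returns the processed events, in order
    public static List<String> process(List<byte[]> messages) throws InterruptedException {
        BlockingQueue<byte[]> networkQueue = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> streamQueue = new ArrayBlockingQueue<>(4);
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        int n = messages.size();
        CountDownLatch done = new CountDownLatch(n);

        // stage 1: "deserializer" executor; put() blocks when the next queue is full
        Thread deserializer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    byte[] raw = networkQueue.take();
                    streamQueue.put(new String(raw, StandardCharsets.UTF_8));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // stage 2: "stream" executor; here the matching PE instance just records the event
        Thread stream = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    processed.add(streamQueue.take());
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        deserializer.start();
        stream.start();
        for (byte[] m : messages) {
            networkQueue.put(m); // the "network" hands raw messages to the pipeline
        }
        done.await(5, TimeUnit.SECONDS);
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<byte[]> raw = Arrays.asList("event-a".getBytes(StandardCharsets.UTF_8),
                "event-b".getBytes(StandardCharsets.UTF_8));
        System.out.println(process(raw)); // [event-a, event-b]
    }
}
```

Because each stage is single-threaded and the queues are FIFO, event order is preserved end to end; the bounded queues are what give the blocking behaviour described above.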
@@ -12,19 +12,33 @@ S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalabl
* You may start with an [overview](overview) of the platform
* Then follow a [walkthrough](walkthrough) for a hands-on introduction
+* Complement with a look at a [topic trending](twitter_trending_example) application using Twitter data
* And [here](dev_tips) are some tips to ease the development process
## Configuration
-* How to [customize the platform and pass configuration parameters](configuration)
-* How to [add application dependencies](application_dependencies)
-* How to [dispatch events ](event_dispatch) within an application and between applications
+* [Customize the platform and pass configuration parameters](configuration)
+* Add [application dependencies](application_dependencies)
+* [Dispatch events](event_dispatch) within an application and between applications
+
+## Running S4
+* [Commands](tools) for creating, running and managing applications
+* [Monitor](metrics) the system
+
## Features
* Details about [fault tolerance](fault_tolerance)
## Troubleshooting
+* [Recommended practices](recommended_practices)
* Try the [FAQ](https://cwiki.apache.org/confluence/display/S4/FAQ)
-* Try the [mailing lists](https://cwiki.apache.org/S4/s4-apache-mailing-lists.html)
+* Try the [mailing lists](https://cwiki.apache.org/S4/s4-apache-mailing-lists.html)
+
+## Resources
+* Questions can be asked through the [mailing lists](https://cwiki.apache.org/confluence/display/S4/S4+Apache+mailing+lists)
+* The source code is available through [git](https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git); [here](http://incubator.apache.org/s4/contrib/) are instructions for fetching the code.
+* A nice set of [slides](http://www.slideshare.net/leoneu/20111104-s4-overview) was used for a presentation at Stanford in November 2011.
+* The driving ideas are detailed in a [conference publication](http://www.4lunas.org/pub/2010-s4.pdf) from KDCloud'11
+* You can also watch the [video](http://vimeo.com/20489778) of a presentation given at LinkedIn.
@@ -0,0 +1,41 @@
+---
+title: Metrics
+---
+
+
+> S4 continuously collects runtime statistics. Let's see how to access these and add custom ones.
+
+# Why?
+
+S4 aims at processing large quantities of events with low latency. In order to achieve this goal, a key requirement is to be able to monitor system internals at runtime.
+
+# How?
+For that purpose, we include a system for gathering statistics about various parts of the S4 platform.
+
+We rely on the [metrics](http://metrics.codahale.com) library, which offers an efficient way to gather such information and relies on statistical techniques to minimize memory consumption.
+
+# What?
+
+By default, S4 instruments queues, caches, checkpointing, and event reception and emission; statistics are available for all of these components.
+
+You can also monitor your own PEs. Simply add new probes (`Meter`, `Gauge`, etc.) and report interesting updates to them. There is nothing else to do; these custom metrics will be reported along with the S4 metrics, as explained next.
+
+# Where?
+
+By default, metrics are exposed by each node through JMX.
+
+The `s4.metrics.config` parameter enables periodic dumps of aggregated statistics to the **console** or to **files** in CSV format. This parameter is specified as an application parameter, and must match the following regular expression:
+
+ (csv:.+|console):(\d+):(DAYS|HOURS|MICROSECONDS|MILLISECONDS|MINUTES|NANOSECONDS|SECONDS)
+
+Examples:
+
    # dump metrics as csv files in /path/to/directory every 10 seconds
+ csv:file://path/to/directory:10:SECONDS
+
+ # dump metrics to the console every minute
+ console:1:MINUTES
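A value for this parameter can be sanity-checked against the regular expression above with a few lines of Java (a quick validation sketch; the class name is illustrative):

```java
import java.util.regex.Pattern;

public class MetricsConfigCheck {

    // the regular expression given above for s4.metrics.config
    static final Pattern CONFIG = Pattern.compile(
            "(csv:.+|console):(\\d+):(DAYS|HOURS|MICROSECONDS|MILLISECONDS|MINUTES|NANOSECONDS|SECONDS)");

    public static void main(String[] args) {
        System.out.println(CONFIG.matcher("csv:file://path/to/directory:10:SECONDS").matches()); // true
        System.out.println(CONFIG.matcher("console:1:MINUTES").matches());                        // true
        System.out.println(CONFIG.matcher("console:often").matches());                            // false
    }
}
```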
+
+
+
+Reporting to Ganglia or Graphite is not provided out of the box with S4, but it's quite easy to add. You simply have to add the corresponding dependencies to your project and enable reporting to these systems during the initialization of your application. See the [metrics](http://metrics.codahale.com) documentation for more information.