[FLINK-4520][flink-siddhi] Integrate Siddhi as a light-weight Streaming CEP Library #2486

Closed
haoch wants to merge 88 commits

Conversation


@haoch haoch commented Sep 9, 2016

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following checklist into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General
    • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation
    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build
    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

Abstract

Siddhi CEP is a lightweight, easy-to-use, open-source Complex Event Processing (CEP) engine released as a Java library under the Apache Software License v2.0. Siddhi CEP processes events generated by various event sources, analyzes them, and reports the complex events that match user-specified queries.

It would be very helpful for Flink users (especially streaming application developers) to provide a library that runs Siddhi CEP queries directly inside Flink streaming applications.

Features

  • Integrate Siddhi CEP as a stream operator (i.e. TupleStreamSiddhiOperator), supporting rich CEP features such as:
    • Filter
    • Join
    • Aggregation
    • Group by
    • Having
    • Window
    • Conditions and Expressions
    • Pattern processing
    • Sequence processing
    • Event Tables
      ...
  • Provide an easy-to-use Siddhi CEP API that integrates with the Flink DataStream API (see SiddhiCEP and SiddhiStream, and the sketch after this list)
    • Register a Flink DataStream, associating its native type information with a Siddhi stream schema, supporting POJO, Tuple, primitive types, etc.
    • Connect single or multiple Flink DataStreams to a Siddhi CEP execution plan
    • Return the output stream as a DataStream whose type is inferred from the Siddhi stream schema
  • Integrate Siddhi runtime state management with Flink state (see AbstractSiddhiOperator)
  • Support Siddhi plugin management to extend CEP functions (see SiddhiCEP#registerExtension)
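
For example, the stream registration and type inference described above could be used roughly as follows. This is only a sketch: TradeEvent is a hypothetical POJO introduced here for illustration, and the single-stream from(...).sql(...).returns(...) chain and the inferred Tuple2 output type are assumptions based on the two-stream example further below.

 // Hypothetical POJO event type (illustration only, not part of this PR).
 // Flink POJO rules apply: public no-arg constructor and public fields.
 public class TradeEvent {
     public int id;
     public String name;
     public double price;
     public long timestamp;

     public TradeEvent() { }

     public TradeEvent(int id, String name, double price, long timestamp) {
         this.id = id;
         this.name = name;
         this.price = price;
         this.timestamp = timestamp;
     }
 }

 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
 DataStream<TradeEvent> trades = env.fromElements(
     new TradeEvent(1, "a", 120.0, 1L),
     new TradeEvent(2, "b", 80.0, 2L));

 SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);

 // The listed field names map the POJO attributes onto the Siddhi stream schema.
 cep.registerStream("tradeStream", trades, "id", "name", "price", "timestamp");

 // Single-stream filter query; the output type is inferred from the schema of "expensiveTrades".
 DataStream<Tuple2<String, Double>> expensive = cep
     .from("tradeStream")
     .sql("from tradeStream[price > 100.0] select name, price insert into expensiveTrades")
     .returns("expensiveTrades");

 expensive.print();
 env.execute();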

Test Cases

Example

 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
 SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);

 // Register a custom Siddhi function extension, callable as custom:plus(...) in queries.
 cep.registerExtension("custom:plus", CustomPlusFunctionExtension.class);

 // Register the input DataStreams together with their Siddhi stream schemas.
 cep.registerStream("inputStream1", input1, "id", "name", "price", "timestamp");
 cep.registerStream("inputStream2", input2, "id", "name", "price", "timestamp");

 // Run a Siddhi pattern query across both streams; the result type is inferred
 // from the schema of "outputStream".
 DataStream<Tuple5<Integer,String,Integer,String,Double>> output = cep
    .from("inputStream1").union("inputStream2")
    .sql(
        "from every s1 = inputStream1[id == 2] "
         + " -> s2 = inputStream2[id == 3] "
         + " select s1.id as id_1, s1.name as name_1, s2.id as id_2, s2.name as name_2, custom:plus(s1.price, s2.price) as price "
         + " insert into outputStream"
    )
    .returns("outputStream");

 env.execute();
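
The CustomPlusFunctionExtension registered above is referenced but not shown in this description. As a rough sketch only, written against the Siddhi 3.x FunctionExecutor extension API (the class body and method signatures below are assumptions for illustration, not the actual class from this PR), such an extension adds its numeric arguments and is then callable as custom:plus(a, b) inside queries:

 // Sketch of a custom "plus" function extension (assumed Siddhi 3.x API; illustration only).
 public class CustomPlusFunctionExtension extends FunctionExecutor {

     @Override
     protected void init(ExpressionExecutor[] attributeExpressionExecutors,
                         ExecutionPlanContext executionPlanContext) {
         // Stateless arithmetic function: nothing to initialize.
     }

     @Override
     protected Object execute(Object[] data) {
         // Invoked for custom:plus(a, b, ...): sum the numeric arguments.
         double sum = 0;
         for (Object value : data) {
             sum += ((Number) value).doubleValue();
         }
         return sum;
     }

     @Override
     protected Object execute(Object data) {
         // Single-argument form.
         return ((Number) data).doubleValue();
     }

     @Override
     public Attribute.Type getReturnType() {
         return Attribute.Type.DOUBLE;
     }

     // The remaining FunctionExecutor lifecycle/state callbacks required by the
     // Siddhi version in use (start/stop, currentState/restoreState) are omitted here.
 }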

@haoch haoch changed the title FLINK-4520 Integrate Siddhi as a light-weight Streaming CEP Library [FLINK-4520][flink-siddhi] Integrate Siddhi as a light-weight Streaming CEP Library Sep 9, 2016
mushketyk and others added 14 commits September 10, 2016 09:24
…t() throws UnknownHostException

- If InetAddress.getLocalHost() throws UnknownHostException when
  attempting to connect with LOCAL_HOST strategy, the code will move on
  to try the other strategies instead of immediately failing.

- Also made minor code style improvements for trying the different strategies.

This closes apache#2383
…mentation

This also includes minor code and test cleanups.

This closes apache#2416
… fail on resharding

This no longer allows the Kinesis consumer to transparently handle resharding.
This is a short-term workaround until we have a min-watermark notification service available in the JobManager.

This closes apache#2414
Implemented Mesos AppMaster including:
- runners for AppMaster and TaskManager
- MesosFlinkResourceManager as a Mesos framework
- ZK persistent storage for Mesos tasks
- reusable scheduler actors for:
  - offer handling using Netflix Fenzo (LaunchCoordinator)
  - reconciliation (ReconciliationCoordinator)
  - task monitoring (TaskMonitor)
  - connection monitoring (ConnectionMonitor)
- lightweight HTTP server to serve artifacts to the Mesos fetcher (ArtifactServer)
- scenario-based logging for:
  - connectivity issues
  - offer handling (receive, process, decline, rescind, accept)
- incorporated FLINK-4152, FLINK-3904, FLINK-4141, FLINK-3675, FLINK-4166
- Fenzo usage fix - always call scheduleOnce after expireAllLeases.
- increased aggressiveness of task scheduler
- factored YarnJobManager and MesosJobManager to share base class
`ContaineredJobManager`
- improved supervision for task actors, unit tests
- support for zombie tasks (i.e. non-strict slave registry)
- improved javadocs
- fix for style violations (e.g. line length)
- completed the SchedulerProxy
- final fields
- improved preconditions
- log lines to use {}
- cleanup ZK state
- serializable messages
twalthr and others added 28 commits September 10, 2016 09:24
The version change didn't cause the Scalastyle errors. Seems like the
only viable solution to prevent random failures of the Scalastyle
plugin is to disable Scalastyle checks for the affected source file.
…ctionInfo to TaskManagerLocation

This adds the ResourceId to the TaskManagerLocation
…nt of 'Instance'.

To allow for a future dynamic slot allocation and release model, the slots should not depend on 'Instance'.
In this change, the Slots hold most of the necessary information directly (location, gateway) and
they interact with the Instance only via a 'SlotOwner' interface.
…nstance' to have more intuitive names

getResourceID() --> getTaskManagerID()
getInstanceConnectionInfo() --> getTaskManagerLocation()
This caused Scalastyle to fail, presumably depending on the locale
used. After a bit of debugging on the Scalastyle plugin I found out that
the number in the error is the byte position.

"Expected identifier, but got Token(COMMA,,,1772,,)"

head -c 1772 flink-mesos/src/test/scala/org/apache/flink/mesos/Utils.scala

pointed to the Unicode character '⇒' which causes Scalastyle to fail in
certain environments.

This closes apache#2466
There is RocksDBAsyncSnapshotTest which tests async snapshots for the
RocksDB state backend. Operators themselves cannot do asynchronous
checkpoints right now.
Yarn reports null or (1, maxVcores) depending on its internal logic. The
test only worked in the past because it summed up the used vcores of the
RM and the TM containers. We have checks in place to ensure the vcores
config value is passed on to the Flink ResourceManager.
…to not return null

Return a DefaultAWSCredentialsProviderChain instead of null when
AWS_CREDENTIALS_PROVIDER config is set to "AUTO"

This closes apache#2470
Adds a NoOpOperator which is unwound in OperatorTranslation.translate.
This will be first used by Gelly as a placeholder to support implicit
operator reuse.

This closes apache#2294
Rename _configuration to originalConfiguration

Remove testing classes from main scope in flink-runtime

Previously, the ForkableFlinkMiniCluster which resided in flink-test-utils required
these files to be in the main scope of flink-runtime. With the removal of the
ForkableFlinkMiniCluster, these classes are now no longer needed and can be moved
back to the test scope.

This closes apache#2450.
Introduce TaskExecutionStateListener for Task

Replace JobManagerGateway in Task by InputSplitProvider and CheckpointNotifier

Replace the TaskManager ActorGateway by TaskManagerConnection in Task

Rename taskmanager.CheckpointNotifier into CheckpointResponder; rename TaskExecutionStateListener.notifyTaskExecutionState into notifyTaskExecutionStateChanged

Remove InputSplitProvider.start; add ClassLoader parameter to InputSplitProvider.getNextInputSplit

Removes the unused class InputSplitIterator.

Update InputSplitProvider JavaDocs

This closes apache#2456.
The Gelly documentation was recently split into multiple pages in
FLINK-4104 but was missing a redirect. This commit updates the Gelly
redirect to point to the old page.

This closes apache#2464