This repository has been archived by the owner on May 12, 2021. It is now read-only.
APEXMALHAR-2085: Operator supporting the Beam concepts of windowing, watermarks, triggering and accumulation #319
Merged
76 changes: 76 additions & 0 deletions
library/src/main/java/org/apache/apex/malhar/lib/window/Accumulation.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.apex.malhar.lib.window; | ||
|
||
import org.apache.hadoop.classification.InterfaceStability; | ||
|
||
/** | ||
* This interface is for the processing part of the WindowedOperator. | ||
* We can assume that all stateful processing of the WindowedOperator is a form of accumulation. | ||
* | ||
* In most cases, AccumT is the same as OutputT. But in some cases, the accumulated type and the output type may be | ||
* different. For example, if we are doing the AVERAGE of doubles, InputT will be double, and we need the SUM and the | ||
* COUNT stored as type AccumT, and AccumT will be a pair of double and long, in which double is the sum of the inputs, | ||
* and long is the number of inputs. OutputT will be double, because it represents the average of the inputs. | ||
*/ | ||
@InterfaceStability.Evolving | ||
public interface Accumulation<InputT, AccumT, OutputT> | ||
{ | ||
/** | ||
* Returns the default accumulated value when nothing has been accumulated. | ||
* | ||
* @return the default accumulated value | ||
*/ | ||
AccumT defaultAccumulatedValue(); | ||
|
||
/** | ||
* Accumulates the input into the accumulated value. | ||
* | ||
* @param accumulatedValue the value accumulated so far | ||
* @param input the input to accumulate | ||
* @return the new accumulated value | ||
*/ | ||
AccumT accumulate(AccumT accumulatedValue, InputT input); | ||
|
||
/** | ||
* Merges two accumulated values into one. | ||
* | ||
* @param accumulatedValue1 the first accumulated value | ||
* @param accumulatedValue2 the second accumulated value | ||
* @return the merged accumulated value | ||
*/ | ||
AccumT merge(AccumT accumulatedValue1, AccumT accumulatedValue2); | ||
|
||
/** | ||
* Gets the output of the accumulated value. This is used for generating the data for triggers. | ||
* | ||
* @param accumulatedValue the accumulated value | ||
* @return the output derived from the accumulated value | ||
*/ | ||
OutputT getOutput(AccumT accumulatedValue); | ||
|
||
/** | ||
* Gets the retraction of the value. This is used for retracting previous panes in | ||
* ACCUMULATING_AND_RETRACTING accumulation mode. | ||
* | ||
* @param value the output value to retract | ||
* @return the retraction of that value | ||
*/ | ||
OutputT getRetraction(OutputT value); | ||
} |
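The AVERAGE case described in the Javadoc above can be sketched as follows. This is a minimal illustration, not code from this pull request: the `Accumulation` interface is copied locally so the block compiles on its own, and `SumCount`, `AverageAccumulation` and `AverageDemo` are hypothetical names. Returning 0.0 for an empty accumulation and representing a retraction as the negated value are both assumptions that the source does not specify.

```java
// Local copy of the interface from this diff so the sketch compiles on its own.
interface Accumulation<InputT, AccumT, OutputT>
{
  AccumT defaultAccumulatedValue();

  AccumT accumulate(AccumT accumulatedValue, InputT input);

  AccumT merge(AccumT accumulatedValue1, AccumT accumulatedValue2);

  OutputT getOutput(AccumT accumulatedValue);

  OutputT getRetraction(OutputT value);
}

// Hypothetical AccumT: the pair (sum of inputs, number of inputs).
final class SumCount
{
  final double sum;
  final long count;

  SumCount(double sum, long count)
  {
    this.sum = sum;
    this.count = count;
  }
}

// Hypothetical implementation: InputT = Double, AccumT = SumCount, OutputT = Double.
class AverageAccumulation implements Accumulation<Double, SumCount, Double>
{
  @Override
  public SumCount defaultAccumulatedValue()
  {
    return new SumCount(0.0, 0L);
  }

  @Override
  public SumCount accumulate(SumCount accumulatedValue, Double input)
  {
    // Returns a new value instead of mutating the argument.
    return new SumCount(accumulatedValue.sum + input, accumulatedValue.count + 1);
  }

  @Override
  public SumCount merge(SumCount a, SumCount b)
  {
    return new SumCount(a.sum + b.sum, a.count + b.count);
  }

  @Override
  public Double getOutput(SumCount accumulatedValue)
  {
    // Assumption: an empty accumulation yields 0.0.
    return accumulatedValue.count == 0 ? 0.0 : accumulatedValue.sum / accumulatedValue.count;
  }

  @Override
  public Double getRetraction(Double value)
  {
    // Assumption: a retraction is represented by the negated output value.
    return -value;
  }
}

class AverageDemo
{
  public static void main(String[] args)
  {
    AverageAccumulation avg = new AverageAccumulation();
    SumCount acc = avg.defaultAccumulatedValue();
    for (double d : new double[] {1.0, 2.0, 6.0}) {
      acc = avg.accumulate(acc, d);
    }
    System.out.println(avg.getOutput(acc));
  }
}
```

Here AccumT and OutputT differ, which is the situation the interface's third type parameter exists for.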
42 changes: 42 additions & 0 deletions
library/src/main/java/org/apache/apex/malhar/lib/window/ControlTuple.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.apex.malhar.lib.window; | ||
|
||
import org.apache.hadoop.classification.InterfaceStability; | ||
|
||
/** | ||
* Control tuple interface. | ||
* TODO: This should be removed or moved to Apex Core when Apex Core has native support for custom control tuples. | ||
*/ | ||
@InterfaceStability.Evolving | ||
public interface ControlTuple | ||
{ | ||
/** | ||
* Watermark control tuple | ||
*/ | ||
interface Watermark extends ControlTuple | ||
{ | ||
/** | ||
* Gets the timestamp associated with this watermark. | ||
* | ||
* @return the watermark timestamp | ||
*/ | ||
long getTimestamp(); | ||
} | ||
} |
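A minimal sketch of how the `Watermark` interface above might be implemented and consulted. The interface is copied locally so the block compiles on its own; `SimpleWatermark` and `WatermarkDemo.isLate` are hypothetical and not part of this pull request. Treating an event as late when its timestamp is strictly less than the watermark timestamp is an assumption borrowed from the usual Beam-style reading of watermarks.

```java
// Local copy of the interface from this diff so the sketch compiles on its own.
interface ControlTuple
{
  interface Watermark extends ControlTuple
  {
    long getTimestamp();
  }
}

// Hypothetical immutable watermark implementation.
final class SimpleWatermark implements ControlTuple.Watermark
{
  private final long timestamp;

  SimpleWatermark(long timestamp)
  {
    this.timestamp = timestamp;
  }

  @Override
  public long getTimestamp()
  {
    return timestamp;
  }
}

class WatermarkDemo
{
  // Assumption: an event is late if its timestamp is strictly before the watermark.
  static boolean isLate(long eventTimestamp, ControlTuple.Watermark watermark)
  {
    return eventTimestamp < watermark.getTimestamp();
  }

  public static void main(String[] args)
  {
    ControlTuple.Watermark wm = new SimpleWatermark(1000L);
    System.out.println(isLate(999L, wm));
    System.out.println(isLate(1000L, wm));
  }
}
```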
65 changes: 65 additions & 0 deletions
library/src/main/java/org/apache/apex/malhar/lib/window/JoinAccumulation.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.apex.malhar.lib.window; | ||
|
||
import org.apache.hadoop.classification.InterfaceStability; | ||
|
||
/** | ||
* This is the interface for accumulation when joining multiple streams. | ||
*/ | ||
@InterfaceStability.Evolving | ||
public interface JoinAccumulation<InputT1, InputT2, InputT3, InputT4, InputT5, AccumT, OutputT> extends Accumulation<InputT1, AccumT, OutputT> | ||
{ | ||
/** | ||
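* Accumulates an input of the second input type into the accumulated value. | ||
* | ||
* @param accumulatedValue the value accumulated so far | ||
* @param input the input from the second stream | ||
* @return the new accumulated value | ||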
*/ | ||
AccumT accumulate2(AccumT accumulatedValue, InputT2 input); | ||
|
||
/** | ||
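* Accumulates an input of the third input type into the accumulated value. | ||
* | ||
* @param accumulatedValue the value accumulated so far | ||
* @param input the input from the third stream | ||
* @return the new accumulated value | ||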
*/ | ||
AccumT accumulate3(AccumT accumulatedValue, InputT3 input); | ||
|
||
/** | ||
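* Accumulates an input of the fourth input type into the accumulated value. | ||
* | ||
* @param accumulatedValue the value accumulated so far | ||
* @param input the input from the fourth stream | ||
* @return the new accumulated value | ||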
*/ | ||
AccumT accumulate4(AccumT accumulatedValue, InputT4 input); | ||
|
||
/** | ||
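* Accumulates an input of the fifth input type into the accumulated value. | ||
* | ||
* @param accumulatedValue the value accumulated so far | ||
* @param input the input from the fifth stream | ||
* @return the new accumulated value | ||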
*/ | ||
AccumT accumulate5(AccumT accumulatedValue, InputT5 input); | ||
|
||
} |
47 changes: 47 additions & 0 deletions
library/src/main/java/org/apache/apex/malhar/lib/window/SessionWindowedStorage.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.apex.malhar.lib.window; | ||
|
||
import java.util.Collection; | ||
import java.util.Map; | ||
|
||
import org.apache.hadoop.classification.InterfaceStability; | ||
|
||
/** | ||
* This interface is for storing data for session windowed streams. | ||
* | ||
* @param <K> The key type | ||
* @param <V> The value type | ||
*/ | ||
@InterfaceStability.Evolving | ||
public interface SessionWindowedStorage<K, V> extends WindowedKeyedStorage<K, V> | ||
{ | ||
/** | ||
* Given the key, the timestamp and the gap, gets the data that falls into timestamp +/- gap. | ||
* This is used to find the entry to which data with the given timestamp belongs, and to determine whether to merge | ||
* session windows. | ||
* This should return at most two entries if sessions have been merged appropriately. | ||
* | ||
* @param key the key | ||
* @param timestamp the timestamp | ||
* @param gap the session gap | ||
* @return the session window entries that fall into timestamp +/- gap | ||
*/ | ||
Collection<Map.Entry<Window.SessionWindow<K>, V>> getSessionEntries(K key, long timestamp, long gap); | ||
} |
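The "timestamp +/- gap" lookup described in the Javadoc above can be sketched for a single key as follows. This is an illustration under stated assumptions, not the storage implementation from this pull request: `Session` is a hypothetical stand-in for `Window.SessionWindow`, modeled as a closed range of event timestamps, and `sessionsNear` is a hypothetical linear scan. It also assumes that "merged appropriately" means any two sessions for the same key are separated by more than the gap, in which case at most two sessions can overlap the lookup range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a session window: closed range [start, end] of event timestamps.
final class Session
{
  final long start;
  final long end;

  Session(long start, long end)
  {
    this.start = start;
    this.end = end;
  }
}

class SessionLookupSketch
{
  // Returns the sessions that overlap [timestamp - gap, timestamp + gap].
  static List<Session> sessionsNear(List<Session> sessions, long timestamp, long gap)
  {
    List<Session> result = new ArrayList<>();
    for (Session s : sessions) {
      if (s.end >= timestamp - gap && s.start <= timestamp + gap) {
        result.add(s);
      }
    }
    return result;
  }

  public static void main(String[] args)
  {
    // Sessions for one key, each separated from the next by more than the gap of 10.
    List<Session> sessions = new ArrayList<>();
    sessions.add(new Session(0, 10));
    sessions.add(new Session(30, 40));
    sessions.add(new Session(60, 70));

    // A tuple at timestamp 20 touches both neighbouring sessions, so they would be merged.
    System.out.println(sessionsNear(sessions, 20, 10).size());
    // A tuple at timestamp 100 touches no session, so a new session would be created.
    System.out.println(sessionsNear(sessions, 100, 10).size());
  }
}
```

Zero entries would mean a new session, one entry would mean extending that session, and two entries would mean the new tuple bridges two sessions that should be merged.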
What is the reason for making the interface stateless? Should it be stateful? My concern is that the interface promotes a boxing/unboxing pattern and leads to excessive GC.
@vrozov Good point. I'm assuming you're asking why not:
with accumulatedValue updated in place. But doing so would make the interface a lot less flexible, because the underlying storage might not support this kind of operation. For example, if the storage supports only get(key) and put(key, value), and get(key) returns a copy rather than a reference to the stored object (possibly as a result of deserialization), then an in-place update would never reach the stored value.
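The storage concern in this reply can be illustrated with a small sketch. Everything here is hypothetical: `CopyingStore` stands in for a store whose get(key) hands back a copy (as deserialization would), and the two static methods contrast an in-place update with the return-a-new-value style used by `Accumulation.accumulate`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store whose get() returns a copy, as a deserializing store would.
class CopyingStore
{
  private final Map<String, long[]> data = new HashMap<>();

  long[] get(String key)
  {
    long[] value = data.get(key);
    return value == null ? null : value.clone();
  }

  void put(String key, long[] value)
  {
    data.put(key, value.clone());
  }
}

class InPlaceVsReturnDemo
{
  // In-place style: mutates the argument and returns nothing.
  static void accumulateInPlace(long[] accumulatedValue, long input)
  {
    accumulatedValue[0] += input;
  }

  // Return style: leaves the argument alone and returns the new accumulated value.
  static long[] accumulate(long[] accumulatedValue, long input)
  {
    return new long[] {accumulatedValue[0] + input};
  }

  public static void main(String[] args)
  {
    CopyingStore store = new CopyingStore();
    store.put("k", new long[] {0});

    // The update lands on a copy and is lost.
    accumulateInPlace(store.get("k"), 5);
    System.out.println(store.get("k")[0]);

    // The returned value is written back explicitly, so the update is kept.
    store.put("k", accumulate(store.get("k"), 5));
    System.out.println(store.get("k")[0]);
  }
}
```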
My suggestion is to delegate the handling of accumulation to a class that implements the Accumulation interface, and to change the interface to
The implementation class will need to define how it handles accumulation and how AccumT is defined. The implementation may use Collection for AccumT or may use primitive types such as int, long or double to accumulate values.