[FLINK-3836] Add LongHistogram accumulator #1966

mbode · 2016-05-06T15:13:20Z

New accumulator LongHistogram; the Histogram accumulator now throws an IllegalArgumentException instead of letting the int overflow.
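The overflow check described above can be sketched with `Math.addExact`, which throws `ArithmeticException` where plain int addition would wrap. The sketch turns that into the `IllegalArgumentException` mentioned in the description. `CheckedIntHistogram` is a hypothetical class backed by a `TreeMap`, not Flink's `Histogram`.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical int-count histogram that fails fast on overflow
 * instead of letting a bin count wrap around to a negative value.
 */
class CheckedIntHistogram {
    private final Map<Integer, Integer> counts = new TreeMap<>();

    /** Increments the count of the bin for {@code value}. */
    void add(int value) {
        int current = counts.getOrDefault(value, 0);
        try {
            // Math.addExact throws ArithmeticException where plain + would wrap.
            counts.put(value, Math.addExact(current, 1));
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException(
                    "Count for value " + value + " would overflow an int", e);
        }
    }

    /** Sets a bin count directly, e.g. to simulate a bin that is about to overflow. */
    void setCount(int value, int count) {
        counts.put(value, count);
    }

    int getCount(int value) {
        return counts.getOrDefault(value, 0);
    }
}
```

When the increment fails, the stored count stays at `Integer.MAX_VALUE`. The review below questions whether failing hard like this is desirable for streaming jobs.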

General
- The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
- The pull request addresses only one issue
- Each commit in the PR has a meaningful commit message (including the JIRA id)
Documentation
- Documentation has been added for new functionality
- Old documentation affected by the pull request has been updated
- JavaDoc for public methods has been added
Tests & Build
- Functionality added by the pull request is covered by tests
- mvn clean verify has been executed successfully locally or a Travis build has passed

StephanEwen · 2016-05-19T09:30:01Z

.../src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/MockRuntimeContext.java

@@ -148,6 +149,11 @@ public Histogram getHistogram(String name) {
 	}

 	@Override
+	public LongHistogram getLongHistogram(String name) {


I think we should avoid adding a utility method for each accumulator type to the runtime context. Otherwise the number of methods grows quickly. The methods getHistogram() etc. are also marked @PublicEvolving because they may be removed in the future.
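One alternative to a getter per accumulator type is a single generic pair of register and lookup methods. In Flink this role is played by `RuntimeContext#addAccumulator` and `RuntimeContext#getAccumulator`. The sketch below is a self-contained stand-in for that pattern. `SimpleAccumulator`, `LongCounterAccumulator` and `AccumulatorRegistry` are hypothetical types, not Flink's.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal accumulator contract: V is the input type, R the result type. */
interface SimpleAccumulator<V, R> {
    void add(V value);

    R getLocalValue();
}

/** Hypothetical long counter used to exercise the registry. */
class LongCounterAccumulator implements SimpleAccumulator<Long, Long> {
    private long sum;

    @Override
    public void add(Long value) {
        sum += value;
    }

    @Override
    public Long getLocalValue() {
        return sum;
    }
}

/** One generic add/get pair replaces a dedicated getter per accumulator type. */
class AccumulatorRegistry {
    private final Map<String, SimpleAccumulator<?, ?>> accumulators = new HashMap<>();

    void addAccumulator(String name, SimpleAccumulator<?, ?> accumulator) {
        if (accumulators.putIfAbsent(name, accumulator) != null) {
            throw new IllegalStateException("Accumulator '" + name + "' already exists");
        }
    }

    /** Returns the accumulator registered under name, or null; Class.cast fails on a type mismatch. */
    <A extends SimpleAccumulator<?, ?>> A getAccumulator(String name, Class<A> type) {
        SimpleAccumulator<?, ?> accumulator = accumulators.get(name);
        return accumulator == null ? null : type.cast(accumulator);
    }
}
```

With this design, adding a new accumulator type such as a long histogram needs no new method on the context.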

StephanEwen · 2016-05-19T09:36:21Z

I like the addition. Two things, however, that I am not sure about:

The Histogram uses a LongHistogram internally, which results in a lot of wrapping and converting. Each accumulator report (each heartbeat) needs to do the conversion. I think the Histogram should rather hold the proper (int, int) map directly.
I am skeptical about failing hard in the Histogram on an integer overflow. These kinds of hard failures in utility types are always tricky. A program that causes an overflow will result in a non-recoverable failure (it will always overflow again). For streaming programs, that is not a nice property. I would rather deprecate the int histogram (let it overflow) and encourage users to replace it completely with the long histogram over time.
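The two suggestions can be sketched as two histograms that each own a map of their own count type, so no wrapping or conversion is needed on each accumulator report. The int variant lets counts wrap silently, as proposed for the deprecated type. `WrappingIntHistogram` and `LongCountHistogram` are hypothetical names, not Flink classes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical int histogram holding an (int, int) map directly; counts wrap on overflow. */
class WrappingIntHistogram {
    private final Map<Integer, Integer> counts = new TreeMap<>();

    void add(int value) {
        // Integer::sum is plain int addition: Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE.
        counts.merge(value, 1, Integer::sum);
    }

    void setCount(int value, int count) {
        counts.put(value, count);
    }

    int getCount(int value) {
        return counts.getOrDefault(value, 0);
    }

    /** Merges another histogram's counts into this one, bin by bin. */
    void merge(WrappingIntHistogram other) {
        other.counts.forEach((bin, count) -> counts.merge(bin, count, Integer::sum));
    }
}

/** Hypothetical histogram with the same int bins but long counts. */
class LongCountHistogram {
    private final Map<Integer, Long> counts = new TreeMap<>();

    void add(int value) {
        counts.merge(value, 1L, Long::sum);
    }

    void setCount(int value, long count) {
        counts.put(value, count);
    }

    long getCount(int value) {
        return counts.getOrDefault(value, 0L);
    }

    /** Merges another histogram's counts into this one, bin by bin. */
    void merge(LongCountHistogram other) {
        other.counts.forEach((bin, count) -> counts.merge(bin, count, Long::sum));
    }
}
```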

StephanEwen · 2016-05-19T09:37:22Z

Please let me know what you think about these suggestions!

mbode · 2016-05-19T09:46:19Z

Hi Stephan, sounds good.

I wanted to avoid too much duplication, but I see your point.
Ok, throwing a new exception breaks the API. So should I just mark Histogram as deprecated? I guess the proper way would be to make Histogram generic, enabling users to instantiate Histogram<Long>. Again, this breaks the API, so we would have to wait for the next major release – what is the process for cases like that?

StephanEwen · 2016-05-19T17:28:02Z

I think for now, we should add a @Deprecated annotation to the Histogram and its related method on RuntimeContext and mention in the comment that we encourage the LongHistogram instead.

On a major release, we should go through all deprecated parts and remove them.
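A minimal sketch of the suggested deprecation, assuming the standard Java `@Deprecated` annotation combined with a JavaDoc `@deprecated` tag that points to the long-based replacement. `LegacyHistogram`, `ReplacementHistogram` and `LegacyContext` are hypothetical stand-ins for `Histogram`, `LongHistogram` and `RuntimeContext`.

```java
/**
 * Hypothetical stand-in for the int-count Histogram accumulator.
 *
 * @deprecated Counts are ints and may overflow. Use {@link ReplacementHistogram} instead.
 */
@Deprecated
class LegacyHistogram {
}

/** Hypothetical stand-in for the LongHistogram accumulator. */
class ReplacementHistogram {
}

/** Hypothetical stand-in for the runtime context with its convenience getter. */
class LegacyContext {
    /**
     * Returns a new int-count histogram.
     *
     * @deprecated Counts are ints and may overflow. Use {@link ReplacementHistogram} instead.
     */
    @Deprecated
    LegacyHistogram getHistogram(String name) {
        return new LegacyHistogram();
    }
}
```

`@Deprecated` has runtime retention and triggers compiler warnings at call sites. The JavaDoc tag carries the migration hint that users see in the generated docs.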

- replaced CharSets with StandardCharsets
- added checkElementIndex to Flink Preconditions
- replaced Guava Preconditions with Flink Preconditions
- removed the single usages of Ints.max() and Joiner()

This closes apache#1938

Depending on the context, the ExecutionConfig's type fields may be deserialized using either a custom class loader or the default class loader. The ExecutionConfig may be serialized explicitly for the Task, shipped inside the PojoSerializer (where it is serialized as well), or passed directly in local mode. Because an ExecutionConfig may be reused, its fields can't be set to null after it has been shipped once.

The entire ExecutionConfig is now serialized when it is set on the JobGraph. It is no longer passed through the JobGraph's constructor but set explicitly on the JobGraph. If no ExecutionConfig has been set, the default is used. Unlike before, no code may modify the ExecutionConfig after it has been set on the JobGraph.

This closes apache#1913
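The "serialize once when set" rule can be sketched with plain Java serialization. `JobConfig` and `JobSpec` are hypothetical stand-ins for `ExecutionConfig` and `JobGraph`. The setter stores only bytes, so later changes to the original object have no effect. The getter deserializes with a caller-supplied class loader, which stands in for the custom-class-loader case described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical stand-in for ExecutionConfig. */
class JobConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    int parallelism = 1;
}

/** Hypothetical stand-in for JobGraph: stores only a serialized snapshot of the config. */
class JobSpec {
    private byte[] serializedConfig;

    /** Serializes the config immediately; later changes to the passed object are not seen. */
    void setConfig(JobConfig config) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(config);
        }
        serializedConfig = bytes.toByteArray();
    }

    /** Deserializes the snapshot with the given class loader, or returns defaults if unset. */
    JobConfig getConfig(ClassLoader loader) throws IOException, ClassNotFoundException {
        if (serializedConfig == null) {
            return new JobConfig();
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(serializedConfig)) {
                    @Override
                    protected Class<?> resolveClass(ObjectStreamClass desc)
                            throws IOException, ClassNotFoundException {
                        // Resolve classes through the supplied (e.g. user-code) class loader.
                        return Class.forName(desc.getName(), false, loader);
                    }
                }) {
            return (JobConfig) in.readObject();
        }
    }
}
```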

…e code

- Metric groups are generally thread-safe
- Metric groups are closable. Closed groups do not register metric objects anymore.
- TaskManager's JobMetricsGroup auto-disposes when all TaskMetricGroups are closed
- Maven project with metric reporters renamed to 'flink-metric-reporters'
- Various code style cleanups
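The thread-safe, closable behavior listed above can be sketched with a group that guards its state with a single lock and ignores registrations after `close()`. `SimpleMetricGroup` is a hypothetical class, not Flink's metric API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical metric group: all access is synchronized, and closing it stops new registrations. */
class SimpleMetricGroup implements AutoCloseable {
    private final Object lock = new Object();
    private final Map<String, Object> metrics = new HashMap<>();
    private boolean closed;

    /** Registers a metric under name; returns false (and ignores it) if the group is closed. */
    boolean register(String name, Object metric) {
        synchronized (lock) {
            if (closed) {
                return false;
            }
            metrics.put(name, metric);
            return true;
        }
    }

    int size() {
        synchronized (lock) {
            return metrics.size();
        }
    }

    boolean isClosed() {
        synchronized (lock) {
            return closed;
        }
    }

    @Override
    public void close() {
        synchronized (lock) {
            closed = true;
        }
    }
}
```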

After 38698c0, there are now two executions defined for the Surefire plugin: unit-tests and integration-tests. In addition, there is an implicit default execution called default-test, which causes the unit tests to be executed twice. This commit renames unit-tests to default-test to prevent the duplicate execution.

This closes apache#2019

mbode · 2016-05-27T14:07:26Z

Sorry guys, botched the PR :/

greghogan · 2016-05-27T14:29:07Z

I was curious what happened here; a simple rebase and force-push fixes the problem.

Make sure local master is up-to-date
$ git checkout master
$ git pull apache master

Fetch this PR and checkout the branch
$ git fetch github pull/1966/head:pr1966
$ git checkout pr1966

Move the new commits after the last commit on master
$ git rebase master

Push the changes to your repo
$ git push -f origin pr1966

[Flink-3836] Add LongHistogram accumulator

f457319

mbode changed the title ~~[Flink-3836] Add LongHistogram accumulator~~ [FLINK-3836] Add LongHistogram accumulator May 8, 2016

mbode closed this May 11, 2016

mbode reopened this May 11, 2016

StephanEwen reviewed May 19, 2016

uce and others added 21 commits May 27, 2016 15:47

[docs] Add note about S3AFileSystem 'buffer.dir' property

7d6dfdf

[FLINK-3881] [docs] Java 8 Documentation Sample Correction

dce10b6

This closes apache#1970.

[docs] Adjust network buffer config for slots and add tl;dr

ad3a70d

[FLINK-3772] [gelly] Graph algorithms for vertex and edge degree

d3f11a1

Graph algorithms for annotating:
- vertex degree for undirected graphs
- vertex out-, in-, and out- and in-degree for directed graphs
- edge source, target, and source and target degree for undirected graphs
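For directed graphs, vertex out- and in-degree can be illustrated on a plain edge list. This sketch uses a hypothetical `DegreeCounter` helper over `long` vertex IDs rather than Gelly's `Graph` API, and it covers only the directed case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: out- and in-degree of each vertex in a directed edge list. */
class DegreeCounter {
    private final Map<Long, Long> outCounts = new HashMap<>();
    private final Map<Long, Long> inCounts = new HashMap<>();

    /** Each edge is {source, target}. */
    DegreeCounter(long[][] edges) {
        for (long[] edge : edges) {
            outCounts.merge(edge[0], 1L, Long::sum);
            inCounts.merge(edge[1], 1L, Long::sum);
        }
    }

    long outDegree(long vertex) {
        return outCounts.getOrDefault(vertex, 0L);
    }

    long inDegree(long vertex) {
        return inCounts.getOrDefault(vertex, 0L);
    }
}
```

Annotating an edge with its source or target degree then reduces to calling `outDegree(source)` or `inDegree(target)` for each edge.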

[FLINK-3877] [gelly] Create TranslateFunction interface for Graph tra…

35e61a5

…nslators

The TranslateFunction interface is similar to MapFunction but may be called multiple times before serialization.

This closes apache#1968

[FLINK-3880] remove mutex for user accumulators hash map

76a1628

This closes apache#1976

[FLINK-3155] Update Flink Dockerfile

2444cd6

* ubuntu trusty -> xenial
* jdk 7u51 -> 8u91
* flink 0.10.1 -> 1.0.2

This closes apache#1969

[hotfix] [tableAPI] Fix SQL queries on TableSources.

a974e32

[FLINK-3842] [tableApi] Fix handling null record/row in generated code

f2fa73f

This closes apache#1974

[FLINK-3882] [docs] Fix errors in sample Java code for the Elasticsea…

43272f5

…rch2 sink

This closes apache#1971

[FLINK-3856] [core] [api-extending] Create types for java.sql.Date/Ti…

b1cff6b

…me/Timestamp

This closes apache#1959

[FLINK-3855] Upgrade and unify to Jackson 2.7.4

d62e8ec

This closes apache#1952

[FLINK-3878] Fix support multiple identical temp directories in FileC…

fd7ba44

…ache

This closes apache#1965

[FLINK-3776] Flink Scala shell does not allow to set configuration fo…

4e14213

…r local execution

This closes apache#1945

[FLINK-3856] adapt test assertion to type stack changes

689317e

Addition to bbd02d2. The java.lang.Date type shouldn't be automatically registered with Kryo anymore.

[FLINK-3912] [docs] Fix errors in Batch Scala API Documentation, Join…

60f00ef

… section

This closes apache#1991

[FLINK-3768] [gelly] Local Clustering Coefficient

ea34260

The local clustering coefficient measures the connectedness of each vertex's neighborhood. Scores range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique). This closes apache#1896
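For an undirected simple graph, the score described above is commonly defined as C(v) = 2·T(v) / (d(v)·(d(v) − 1)). Here d(v) is the degree of v and T(v) is the number of edges between its neighbors. The sketch assumes a score of 0.0 for vertices with fewer than two neighbors. It uses a hypothetical `LocalClustering` helper over adjacency sets, not Gelly's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: local clustering coefficient on an undirected simple graph. */
class LocalClustering {
    /** Builds symmetric adjacency sets from an undirected edge list. */
    static Map<Integer, Set<Integer>> undirected(int[][] edges) {
        Map<Integer, Set<Integer>> adjacency = new HashMap<>();
        for (int[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new HashSet<>()).add(edge[1]);
            adjacency.computeIfAbsent(edge[1], k -> new HashSet<>()).add(edge[0]);
        }
        return adjacency;
    }

    /** Returns 2 * (edges between neighbors) / (d * (d - 1)), or 0.0 when d < 2. */
    static double coefficient(Map<Integer, Set<Integer>> adjacency, int vertex) {
        List<Integer> neighbors =
                new ArrayList<>(adjacency.getOrDefault(vertex, Collections.emptySet()));
        int degree = neighbors.size();
        if (degree < 2) {
            return 0.0;
        }
        int linksBetweenNeighbors = 0;
        // Check every unordered pair of neighbors once.
        for (int i = 0; i < degree; i++) {
            Set<Integer> adjacentToI =
                    adjacency.getOrDefault(neighbors.get(i), Collections.emptySet());
            for (int j = i + 1; j < degree; j++) {
                if (adjacentToI.contains(neighbors.get(j))) {
                    linksBetweenNeighbors++;
                }
            }
        }
        return 2.0 * linksBetweenNeighbors / ((double) degree * (degree - 1));
    }
}
```

The pairwise loop costs O(d²) per vertex, which is fine for illustration but not how a distributed implementation would enumerate triangles.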

[FLINK-3488] [tests] Fix flaky test Kafka08ITCase.testBigRecordJob

2056e3f

This closes apache#1998

[FLINK-3782] [tests] Properly close streams in CollectionInputFormatTest

bfd2c02

This closes apache#1995

greghogan and others added 26 commits May 27, 2016 15:47

[FLINK-3780] [gelly] Jaccard Index

9ec80da

The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared). This closes apache#1980
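The score described above is the size of the intersection of two neighbor sets divided by the size of their union. This is a minimal sketch over plain `Set<Integer>` neighborhoods. The hypothetical `JaccardIndex` helper returns 0.0 when both sets are empty, which is an assumption. The code is not Gelly's implementation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: Jaccard index of two neighborhood sets. */
class JaccardIndex {
    /** Returns |intersection| / |union| of the two sets, or 0.0 when both are empty. */
    static double of(Set<Integer> neighborsU, Set<Integer> neighborsV) {
        Set<Integer> union = new HashSet<>(neighborsU);
        union.addAll(neighborsV);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<Integer> shared = new HashSet<>(neighborsU);
        shared.retainAll(neighborsV);
        return (double) shared.size() / union.size();
    }
}
```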

[FLINK-1502] [core] Add basic metric system

804ea5a

[FLINK-3927][yarn] make container id consistent across Hadoop versions

d2df640

- introduce a unique container id independent of the Hadoop version
- improve printing of exceptions during registration
- minor improvements to the Yarn ResourceManager code

This closes apache#2013

[FLINK-3939] [tableAPI] Prevent translation of unsupported distinct a…

ccb91f2

…ggregates and grouping sets.

This closes apache#2014

[hotfix] [tableAPI] Throw helpful exception for unsupported ORDER BY …

ef667cf

…features.

[hotfix] [tableAPI] Throw helpful exception for unsupported outer joins.

0f0869c

[FLINK-3632] [tableAPI] Clean up TableAPI exceptions.

e39f5dc

This closes apache#2015

[docs] Fix outdated default value for akka.ask.timeout

7650ba4

[hotfix] [tableAPI] Moved tests to correct package.

79166ea

[FLINK-3955] [tableAPI] Rename Table.toSink() to Table.writeToSink().

9d0fd5b

This closes apache#2023

[FLINK-3728] [tableAPI] Improve error message and documentation for u…

14033d8

…nsupported SQL features.

This closes apache#2018

[FLINK-3960] ignore EventTimeWindowCheckpointingITCase for now

8572ece

Until FLINK-3960 is fixed, we need to disable this test to allow other tests to execute properly. This closes apache#2022

[FLINK-3963] Removed shaded import

3a6c0c8

This closes apache#2026

[FLINK-3963] AbstractReporter uses wrong ConcurrentHashMap

ffb369e

We should use java.util.concurrent.ConcurrentHashMap because Netty's ConcurrentHashMap is not available for Hadoop 1. Also, Netty's ConcurrentHashMap is merely a copy of Java's, kept to support Java versions prior to 1.5.

[FLINK-3586] Fix potential overflow of Long AVG aggregation.

1cc39f8

- Add unit tests for Aggregates.

This closes apache#2024
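One overflow-safe way to average `long` values is to accumulate the sum in a `BigInteger` and divide only at the end. This is shown here as an assumption, not as the approach taken in the commit above. A naive `long` sum of two `Long.MAX_VALUE` values would wrap to -2. `LongAverage` is a hypothetical class.

```java
import java.math.BigInteger;

/** Hypothetical AVG aggregate for long values whose running sum cannot overflow. */
class LongAverage {
    private BigInteger sum = BigInteger.ZERO;
    private long count;

    void add(long value) {
        sum = sum.add(BigInteger.valueOf(value));
        count++;
    }

    /** Integer average, truncated toward zero like long division; fails if nothing was added. */
    long result() {
        if (count == 0) {
            throw new IllegalStateException("no values aggregated");
        }
        return sum.divide(BigInteger.valueOf(count)).longValue();
    }
}
```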

[FLINK-2044] [gelly] Implementation of HITS algorithm.

749f0cf

This closes apache#1956

[FLINK-3936] [tableAPI] Add MIN/MAX aggregation for Boolean.

639eb74

This closes apache#2035

[FLINK-3941] [tableAPI] Add support for UNION to Table API.

1307f95

- Fix FLINK-3696 (type issues of DataSetUnion by forwarding expected types to input operators). This closes apache#2025

[hotfix] Remove leftover config key constant from ExecutionConfig

ef135c3

[hotfix] Fix access to temp file directories in SpillingAdaptiveSpann…

030b4f8

…ingRecordDeserializer

[FLINK-3962] [core] Properly initialize I/O Metric Group

18015ca

Revert to Histogram with internal <int, int> map

d5ff226

[FLINK-3836] Remove HistogramTest.

84fa0db

Only tested behavior on int overflow.

[FLINK-3836] Deprecate Histogram and getHistogram.

a717f71

Recommend LongHistogram instead.

mbode closed this May 27, 2016

rmetzger added the component=<none> label Mar 14, 2019
