
[FLINK-4491] Handle index.number_of_shards in the ES connector #2790

Closed
wants to merge 5 commits into from

Conversation

ddolzan

@ddolzan ddolzan commented Nov 11, 2016

Implemented the Index Template and Index Mapping creation.
Number of shards and many other properties can be defined in the Index Template.

Usage

Before calling ElasticsearchSink, instantiate an ElasticSearchHelper:

ElasticSearchHelper esHelper = new ElasticSearchHelper(config, transports);
//Create an Index Template given a name and the json structure
esHelper.initTemplate(templateName, templateRequest);
//Create an Index Mapping given the Index Name, DocType and the json structure
esHelper.initIndexMapping(indexName, docType, mappingsRequest);

TemplateRequest example

{
  "template": "te*",
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}
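Under the hood, a template like the one above could be submitted through the ES transport client. The following is only a sketch: the builder methods (`preparePutTemplate`, `setSource`) are assumptions based on the ES 2.x admin API, not code taken from this PR.

```java
// Hedged sketch: submitting the template JSON above via the ES 2.x
// transport client. Method names are assumptions based on the ES 2.x
// admin API, not code from this PR; templateName/templateRequest are
// the parameters of initTemplate shown in the usage section.
client.admin().indices()
    .preparePutTemplate(templateName)
    .setSource(templateRequest)   // the JSON string shown above
    .execute()
    .actionGet();
```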

MappingRequest example

{
  "mappings": {
    "user": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}
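Similarly, `initIndexMapping` would have to push the mapping to the cluster. A hedged sketch against the ES 2.x admin API (`preparePutMapping`, `setType`, `setSource` are assumptions, not code from this PR):

```java
// Hedged sketch: how initIndexMapping might apply a mapping JSON with
// the ES 2.x transport client. indexName/docType/mappingsRequest are
// the parameters of initIndexMapping shown in the usage section.
client.admin().indices()
    .preparePutMapping(indexName)
    .setType(docType)
    .setSource(mappingsRequest)   // the JSON string shown above
    .execute()
    .actionGet();
```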

import java.util.List;
import java.util.Map;

public class ElasticSearchHelper {
Contributor

This class needs a Javadoc that explains what it can be used for.

@fpompermaier
Contributor

Any other feedback on this? Could it be merged?

for (String indexName : mappingRequest.indices()) {
    // If the index does not exist, create it
    client.admin().indices().prepareCreate(indexName)
        .setSettings(Settings.builder().put("index.number_of_shards", DEFAULT_INDEX_SHARDS)
Contributor

Use the Elasticsearch constant for "index.number_of_shards" => IndexMetaData.SETTING_NUMBER_OF_SHARDS

// If the index does not exist, create it
client.admin().indices().prepareCreate(indexName)
    .setSettings(Settings.builder().put("index.number_of_shards", DEFAULT_INDEX_SHARDS)
        .put("index.number_of_replicas", DEFAULT_INDEX_REPLICAS)).execute().actionGet();
Contributor

Use the Elasticsearch constant for "index.number_of_replicas" => IndexMetaData.SETTING_NUMBER_OF_REPLICAS
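Applying both reviewer suggestions, the `prepareCreate` call from the diff might look like this (a sketch against the ES 2.x API; `DEFAULT_INDEX_SHARDS` and `DEFAULT_INDEX_REPLICAS` come from the helper class under review):

```java
// Sketch of the snippet above with the suggested IndexMetaData
// constants (ES 2.x API) replacing the raw setting-key strings.
client.admin().indices().prepareCreate(indexName)
    .setSettings(Settings.builder()
        .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, DEFAULT_INDEX_SHARDS)
        .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, DEFAULT_INDEX_REPLICAS))
    .execute().actionGet();
```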

// This instructs the sink to emit after every element, otherwise they
// would be buffered
config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
config.put("cluster.name", "my-transport-client-cluster");
Contributor

Try to see if "cluster.name" can be replaced by ClusterName.CLUSTER_NAME_SETTING

@fhueske
Contributor

fhueske commented Nov 22, 2016

Thanks for the pull request @ddolzan and sorry for the delayed review.

To be honest, I am not sure if we should include the ElasticSearchHelper in Flink. I see that it might be helpful for some users, but it does not contain any Flink-related code and is in principle a lightweight Elasticsearch client.

@fpompermaier
Contributor

From my point of view, ElasticSearchHelper is very useful if you plan to use the Flink Elasticsearch sink in a real production use case, because it greatly reduces the complexity of interfacing with ES.
We hadn't used the ES Java client before, and it took about two days to write that helper... I think it's not that obvious to rewrite that logic from scratch.
In a production environment you really need to be able to easily customize index templates and mappings, IMHO.

@rmetzger
Contributor

I agree with Fabian here. The ESHelper does not use any Flink code at all, so the relation to Flink is not clear. A user of Hadoop would equally benefit from such a utility. I would expect that ES provides such tools to Java developers. Maybe there is a repository somewhere offering ES tools / abstractions?
Apache Kafka for example also provides a set of tools to query the status of consumers, etc. independent of the framework consuming the data.

} else {
    throw new RuntimeException("An error occurred in ElasticsearchSink.");
}
LOG.error("Some documents failed while indexing to Elasticsearch: " + failureThrowable.get());
Contributor

I would suggest adding a debug log statement as well, logging the full stack trace.
Also, in the other connectors we have a flag that allows the user to control whether an error should be logged or fail the connector. I would suggest adding that here as well.

Contributor

Do you mean to replace the current line 263 with a LOG.debug + stack trace, to keep the current line, or to add another LOG.debug line where we log only the stack trace? Currently the detail of the error is logged within afterBulk(). Should we modify that part?

How do you suggest controlling the error behaviour? Is it OK to have something like

public static final String CONFIG_KEY_BULK_ERROR_POLICY = "bulk.error.policy";

that a user can set to "strict" or "lenient" in the userConfig object?
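One possible shape for such a flag, as a self-contained sketch: the config key follows the proposal above, while the "strict"/"lenient" semantics, the default value, and the class/method names are hypothetical illustrations, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class BulkErrorPolicyDemo {

    // Key name as proposed above; the "strict"/"lenient" values and the
    // "strict" default are assumptions for illustration.
    public static final String CONFIG_KEY_BULK_ERROR_POLICY = "bulk.error.policy";

    /**
     * Decides whether a bulk failure should fail the sink ("strict")
     * or only be logged ("lenient"). Defaults to "strict".
     */
    public static boolean shouldFailOnError(Map<String, String> userConfig) {
        String policy = userConfig.getOrDefault(CONFIG_KEY_BULK_ERROR_POLICY, "strict");
        switch (policy) {
            case "strict":
                return true;
            case "lenient":
                return false;
            default:
                throw new IllegalArgumentException(
                    "Invalid " + CONFIG_KEY_BULK_ERROR_POLICY + ": " + policy);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println(shouldFailOnError(config));  // strict by default: true

        config.put(CONFIG_KEY_BULK_ERROR_POLICY, "lenient");
        System.out.println(shouldFailOnError(config));  // false
    }
}
```

Failing loudly on an unknown value (rather than silently falling back) keeps misconfigured jobs from swallowing indexing errors.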

@fpompermaier
Contributor

fpompermaier commented Nov 23, 2016

Fine for us regarding the ES helper. Since it took some time to implement that functionality and there was a JIRA ticket for it, we thought it was a good idea to share that code with other users.
I think we can still share it in our flink-examples GitHub repository (https://github.com/okkam-it/flink-examples) for anyone who has similar needs.

@ddolzan
Author

ddolzan commented Dec 1, 2016

Index template and index mapping creation/configuration will be kept outside of Flink.
An example of how to do it can be found at https://github.com/okkam-it/flink-examples.

@ddolzan ddolzan closed this Dec 1, 2016
@fhueske
Contributor

fhueske commented Dec 1, 2016

Thank you @ddolzan
