🎉 New Destination: Elasticsearch (#7005)
* feat: adds destination-elasticsearch

* feat: adds destination-elasticsearch es server container

* refactor: header configuration

* update: only call createIndex when preparing the writes

* update: reuse container

* fix: make index names valid and use namespace

* refactor: use bulk process and buffered consumer

* refactor: fix bulk process and buffered consumer

* chore: update documentation

* update: remove ssl reference

* fix: bulk indexing

adds test logging config to inspect http wire
begins work for overwriting existing records

* docs: update for authentication

* refactor: simplify config

* refactor: cleanup indices, implement auth

* update: cleanup equals/toString in Elasticsearch ConnectionConfiguration

* chore: use conventions and remove unused code

* update: close underlying rest connection

* update: enable `supportsNormalization`

* refactor: better encapsulate index naming

* update: allow upserting

* update: use oneOf for auth method

* refactor: use encapsulated auth object

* chore: pretty

* update: simplify auth header creation

* chore: remove unused class

* update: use boolean as field type

* adds: elasticsearch example server

* fix: api secret test
jdbranham authored and lmossman committed Nov 3, 2021
1 parent fe35eb1 commit eb8c6f5
Showing 27 changed files with 1,649 additions and 3 deletions.
5 changes: 2 additions & 3 deletions .gitignore
@@ -60,6 +60,5 @@ resources/examples/airflow/logs/*

# Cloud Demo
!airbyte-webapp/src/packages/cloud/data

# Sphinx Docs
_build
/bin/
/**/bin/
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
  "java.configuration.updateBuildConfiguration": "automatic"
}
@@ -0,0 +1,7 @@
{
  "destinationDefinitionId": "68f351a7-2745-4bef-ad7f-996b8e51bb8c",
  "name": "Elasticsearch",
  "dockerRepository": "airbyte/destination-elasticsearch",
  "dockerImageTag": "0.1.0",
  "documentationUrl": "https://docs.airbyte.io/integrations/destinations/elasticsearch"
}
@@ -23,6 +23,11 @@
  dockerRepository: airbyte/destination-dynamodb
  dockerImageTag: 0.1.0
  documentationUrl: https://docs.airbyte.io/integrations/destinations/dynamodb
- destinationDefinitionId: 68f351a7-2745-4bef-ad7f-996b8e51bb8c
  name: Elasticsearch
  dockerRepository: airbyte/destination-elasticsearch
  dockerImageTag: 0.1.0
  documentationUrl: https://docs.airbyte.io/integrations/destinations/elasticsearch
- name: Google Cloud Storage (GCS)
  destinationDefinitionId: ca8f6566-e555-4b40-943a-545bf123117a
  dockerRepository: airbyte/destination-gcs
1 change: 1 addition & 0 deletions airbyte-integrations/builds.md
@@ -100,6 +100,7 @@
| Azure Blob Storage | [![destination-azure-blob-storage](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-azure-blob-storage%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-azure-blob-storage) |
| BigQuery | [![destination-bigquery](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-bigquery%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-bigquery) |
| Databricks | (Temporarily Not Available) |
| Elasticsearch | (Temporarily Not Available) |
| Google Cloud Storage (GCS) | [![destination-gcs](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-gcs%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-gcs) |
| Google PubSub | [![destination-pubsub](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-pubsub%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-pubsub) |
| Kafka | [![destination-kafka](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-kafka%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-kafka) |
@@ -0,0 +1,3 @@
*
!Dockerfile
!build
@@ -0,0 +1,11 @@
FROM airbyte/integration-base-java:dev

WORKDIR /airbyte
ENV APPLICATION destination-elasticsearch

COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar

RUN tar xf ${APPLICATION}.tar --strip-components=1

LABEL io.airbyte.version=0.1.0
LABEL io.airbyte.name=airbyte/destination-elasticsearch
@@ -0,0 +1,68 @@
# Destination Elasticsearch

This is the repository for the Elasticsearch destination connector in Java.
For information about how to use this connector within Airbyte, see [the User Documentation](https://docs.airbyte.io/integrations/destinations/elasticsearch).

## Local development

#### Building via Gradle
From the Airbyte repository root, run:
```
./gradlew :airbyte-integrations:connectors:destination-elasticsearch:build
```

#### Create credentials
**If you are a community contributor**, generate the necessary credentials and place them in `secrets/config.json` conforming to the spec file in `src/main/resources/spec.json`.
Note that the `secrets` directory is git-ignored by default, so there is no danger of accidentally checking in sensitive information.

**If you are an Airbyte core member**, follow the [instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci) to set up the credentials.

### Locally running the connector docker image

#### Build
Build the connector image via Gradle:
```
./gradlew :airbyte-integrations:connectors:destination-elasticsearch:airbyteDocker
```
When building via Gradle, the docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` `LABEL`s in
the Dockerfile.

#### Run
Then run any of the connector commands as follows:
```
docker run --rm airbyte/destination-elasticsearch:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/destination-elasticsearch:dev check --config /secrets/config.json
# messages.jsonl is an example file of newline-delimited AirbyteMessages (one JSON object per line)
cat messages.jsonl | docker run --rm -i -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/destination-elasticsearch:dev write --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

## Testing
We use `JUnit` for Java tests.

### Unit and Integration Tests
Place unit tests under `src/test/java/io/airbyte/integrations/destination/elasticsearch`.

#### Acceptance Tests
Airbyte has a standard test suite that all destination connectors must pass. Implement the `TODO`s in
`src/test-integration/java/io/airbyte/integrations/destination/elasticsearch/ElasticsearchDestinationAcceptanceTest.java`.

### Using gradle to run tests
All commands should be run from the Airbyte project root.
To run unit tests:
```
./gradlew :airbyte-integrations:connectors:destination-elasticsearch:unitTest
```
To run acceptance and custom integration tests:
```
./gradlew :airbyte-integrations:connectors:destination-elasticsearch:integrationTest
```

## Dependency Management

### Publishing a new version of the connector
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
1. Make sure your changes are passing unit and integration tests.
1. Bump the connector version in `Dockerfile` -- just increment the value of the `LABEL io.airbyte.version` appropriately (we use [SemVer](https://semver.org/)).
1. Create a Pull Request.
1. Pat yourself on the back for being an awesome contributor.
1. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
@@ -0,0 +1,33 @@
# Elasticsearch Destination

Elasticsearch is a Lucene-based search engine and a type of NoSQL storage.
Documents are created in an `index`, similar to a `table` in a relational database.

The documents are structured with fields that may contain nested complex structures.
[Read more about Elastic](https://elasticsearch.org/)

This connector maps an incoming `stream` to an Elastic `index`.
When using the `append` or `append_dedup` destination sync mode, an `upsert` operation is performed against the Elasticsearch index.
When using `overwrite`, the records (documents) are placed in a temporary index and then cloned to the target index.
If the target index exists before the sync, it is deleted first.
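The sync-mode behavior above can be summarized as an ordered list of REST calls. The sketch below is hypothetical: the `SyncPlan` class and the `_tmp` suffix for the temporary index are illustrative assumptions, not the connector's code. It does reflect two documented constraints of the Elasticsearch clone API: the source index must be read-only (`index.blocks.write: true`) and the target index must not already exist, which fits with deleting the target first.

```java
import java.util.List;

/**
 * Hypothetical sketch (not the connector's code): the ordered Elasticsearch REST
 * calls a destination could issue for each destination sync mode. The "_tmp"
 * suffix for the temporary index is an assumption made for illustration.
 */
final class SyncPlan {

  private SyncPlan() {}

  static List<String> plan(String syncMode, String index) {
    switch (syncMode) {
      case "append":
      case "append_dedup":
        // Write straight into the target index; each bulk item is an "update"
        // action with "doc_as_upsert": true, i.e. an upsert.
        return List.of("POST /" + index + "/_bulk");
      case "overwrite": {
        String tmp = index + "_tmp"; // assumed naming for the temporary index
        return List.of(
            "DELETE /" + index, // drop the old target if it exists; clone requires a missing target
            "PUT /" + tmp, // create the temporary index
            "POST /" + tmp + "/_bulk", // load all records into the temporary index
            "PUT /" + tmp + "/_settings", // body {"index.blocks.write": true}: clone source must be read-only
            "POST /" + tmp + "/_clone/" + index); // copy the temporary index to the target name
      }
      default:
        throw new IllegalArgumentException("Unsupported destination sync mode: " + syncMode);
    }
  }
}
```

For example, `SyncPlan.plan("overwrite", "users")` starts with `DELETE /users` and ends with `POST /users_tmp/_clone/users`, while both append modes reduce to a single bulk upsert into `users`.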

The [ElasticsearchConnection.java](./src/main/java/io/airbyte/integrations/destination/elasticsearch/ElasticsearchConnection.java)
handles the communication with the Elastic server.
It uses the `elasticsearch-java` REST client from the Elasticsearch team:
[https://github.com/elastic/elasticsearch-java/](https://github.com/elastic/elasticsearch-java/)

The [ElasticsearchAirbyteMessageConsumerFactory.java](./src/main/java/io/airbyte/integrations/destination/elasticsearch/ElasticsearchAirbyteMessageConsumerFactory.java)
contains the logic for organizing a batch of records and reporting progress.
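A minimal sketch of the batching idea, assuming records are buffered in memory and flushed as one batch (for example, one `_bulk` request) whenever a size threshold is reached, with a final flush on close. `RecordBuffer` and its size-only flush policy are hypothetical and only illustrate the pattern; the running count of flushed records is the kind of number progress reporting can use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a buffered consumer: records accumulate in memory and
 * are handed to a flush callback (e.g. one Elasticsearch _bulk request) whenever
 * the buffer reaches maxBatchSize, plus once more on close().
 */
final class RecordBuffer<T> implements AutoCloseable {

  private final int maxBatchSize;
  private final Consumer<List<T>> flusher;
  private final List<T> buffer = new ArrayList<>();
  private long flushedCount = 0;

  RecordBuffer(int maxBatchSize, Consumer<List<T>> flusher) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    this.maxBatchSize = maxBatchSize;
    this.flusher = flusher;
  }

  void accept(T record) {
    buffer.add(record);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  /** Number of records handed to the flusher so far. */
  long flushedCount() {
    return flushedCount;
  }

  private void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    // Pass a copy so the callback may keep the batch after the buffer is cleared.
    flusher.accept(new ArrayList<>(buffer));
    flushedCount += buffer.size();
    buffer.clear();
  }

  @Override
  public void close() {
    flush(); // write whatever is left as a final, possibly smaller, batch
  }
}
```

With `maxBatchSize` set to 2, five records produce batches of 2 and 2 while streaming, and a last batch of 1 on `close()`.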

The `namespace` and stream `name` are used to generate an index name.
The index is created if it doesn't exist, but no other index configuration is done at this time.
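Index names have to satisfy Elasticsearch's documented restrictions: lowercase only; none of `\ / * ? " < > |`, space, comma or `#` (and `:` is deprecated); no leading `-`, `_` or `+`; not `.` or `..`; at most 255 bytes. The sketch below shows one way to turn a namespace and stream name into such a name. The `namespace_stream` format, the `_` replacement character and the `IndexNames` class are assumptions for illustration, not necessarily the connector's scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical sketch: derive a valid Elasticsearch index name from an Airbyte
 * namespace and stream name. The "namespace_stream" format and '_' replacement
 * are assumptions; the enforced rules follow Elasticsearch's index-name limits.
 */
final class IndexNames {

  private static final int MAX_BYTES = 255;

  private IndexNames() {}

  static String toIndexName(String namespace, String stream) {
    String raw = (namespace == null || namespace.isBlank()) ? stream : namespace + "_" + stream;
    // Index names must be lowercase.
    String name = raw.toLowerCase(Locale.ROOT);
    // Replace forbidden characters: \ / * ? " < > | space , # and the deprecated ':'.
    name = name.replaceAll("[\\\\/*?\"<>| ,#:]", "_");
    // Names may not start with '-', '_' or '+'; a leading '.' is reserved for hidden/system indices.
    name = name.replaceFirst("^[-_+.]+", "");
    if (name.isEmpty()) {
      throw new IllegalArgumentException("Cannot derive an index name from: " + raw);
    }
    // Names are limited to 255 bytes; drop trailing code points until the UTF-8 length fits.
    while (name.getBytes(StandardCharsets.UTF_8).length > MAX_BYTES) {
      name = name.substring(0, name.offsetByCodePoints(name.length(), -1));
    }
    return name;
  }
}
```

For example, namespace `Public` with stream `Users` yields `public_users`, and a stream named `_My Stream#1` without a namespace yields `my_stream_1`.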

Elasticsearch infers each field's type through dynamic mapping.
To customize field types, create the index with explicit mappings ahead of time.
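An index with explicit field types is created with `PUT /<index>` and a body of the form `{"mappings":{"properties":{"<field>":{"type":"<type>"}}}}`. The helper below is a hypothetical sketch that builds that body from a field-to-type map; it assumes field names need no JSON escaping and does not handle nested objects.

```java
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: build the JSON body for "PUT /<index>" that creates an
 * index with explicit field mappings instead of relying on dynamic mapping.
 * Fields are emitted in the map's iteration order; names are assumed to need
 * no JSON escaping.
 */
final class MappingBody {

  private MappingBody() {}

  static String of(Map<String, String> fieldTypes) {
    StringJoiner properties = new StringJoiner(",", "{", "}");
    fieldTypes.forEach((field, type) ->
        properties.add("\"" + field + "\":{\"type\":\"" + type + "\"}"));
    return "{\"mappings\":{\"properties\":" + properties + "}}";
  }
}
```

Passing a `LinkedHashMap` keeps the field order predictable, which makes the generated body easy to compare or log.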

Basic authentication and API key authentication are supported.
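Both methods travel in the HTTP `Authorization` header. Basic authentication sends `Basic ` followed by the Base64 encoding of `username:password`; an Elasticsearch API key is sent as `ApiKey ` followed by the Base64 encoding of `id:api_key`. The helper below is a hypothetical sketch of that encoding, not the connector's own header code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical sketch of the Authorization header values for the two supported
 * methods. Class and method names are illustrative.
 */
final class AuthHeaders {

  private AuthHeaders() {}

  /** "Basic " + base64("username:password"). */
  static String basic(String username, String password) {
    return "Basic " + encode(username + ":" + password);
  }

  /** "ApiKey " + base64("id:api_key"), the format Elasticsearch expects for API keys. */
  static String apiKey(String apiKeyId, String apiKeySecret) {
    return "ApiKey " + encode(apiKeyId + ":" + apiKeySecret);
  }

  private static String encode(String credentials) {
    return Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
  }
}
```

For example, `AuthHeaders.basic("user", "pass")` returns `Basic dXNlcjpwYXNz`.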

## Development
See the Elasticsearch client tests for examples of how to use the library.

[https://github.com/elastic/elasticsearch-java/blob/main/java-client/src/test/java/co/elastic/clients/elasticsearch/end_to_end/RequestTest.java](https://github.com/elastic/elasticsearch-java/blob/main/java-client/src/test/java/co/elastic/clients/elasticsearch/end_to_end/RequestTest.java)
@@ -0,0 +1,49 @@
plugins {
    id 'application'
    id 'airbyte-docker'
    id 'airbyte-integration-test-java'
}

application {
    mainClass = 'io.airbyte.integrations.destination.elasticsearch.ElasticsearchDestination'
}

dependencies {
    implementation project(':airbyte-config:models')
    implementation project(':airbyte-protocol:models')
    implementation project(':airbyte-integrations:bases:base-java')
    implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)

    implementation 'co.elastic.clients:elasticsearch-java:7.15.0'
    implementation 'com.fasterxml.jackson.core:jackson-databind:2.12.3'
    implementation 'org.projectlombok:lombok:1.18.20'

    // EPL-2.0 OR GPL-2.0 WITH Classpath-exception-2.0
    // https://eclipse-ee4j.github.io/jsonp/
    implementation 'jakarta.json:jakarta.json-api:2.0.1'

    // Needed even if using Jackson to have an implementation of the Jsonp object model
    // EPL-2.0 OR GPL-2.0 WITH Classpath-exception-2.0
    // https://github.com/eclipse-ee4j/jsonp
    implementation 'org.glassfish:jakarta.json:2.0.1'

    // MIT
    // https://www.testcontainers.org/
    //implementation "org.testcontainers:testcontainers:1.16.0"
    testImplementation "org.testcontainers:elasticsearch:1.15.3"
    integrationTestJavaImplementation "org.testcontainers:elasticsearch:1.15.3"

    integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test')
    integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-elasticsearch')
}

repositories {
    maven {
        name = "ESSnapshots"
        url = "https://snapshots.elastic.co/maven/"
    }
    maven {
        name = "ESJavaGithubPackages"
        url = "https://maven.pkg.github.com/elastic/elasticsearch-java"
    }
}
@@ -0,0 +1,11 @@
version: "3.7"

services:
  elastic:
    image: "docker.elastic.co/elasticsearch/elasticsearch:7.15.1"
    ports:
      - "9200:9200"
    environment:
      ES_JAVA_OPTS: "-Xms256m -Xmx256m"
      discovery.type: "single-node"
      network.host: "0.0.0.0"
@@ -0,0 +1,159 @@
/*
 * Copyright (c) 2021 Airbyte, Inc., all rights reserved.
 */

package io.airbyte.integrations.destination.elasticsearch;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Objects;

@JsonIgnoreProperties(ignoreUnknown = true)
public class ConnectorConfiguration {

  private String endpoint;
  private boolean upsert;
  private AuthenticationMethod authenticationMethod = new AuthenticationMethod();

  public ConnectorConfiguration() {}

  public static ConnectorConfiguration fromJsonNode(JsonNode config) {
    return new ObjectMapper().convertValue(config, ConnectorConfiguration.class);
  }

  public String getEndpoint() {
    return this.endpoint;
  }

  public boolean isUpsert() {
    return this.upsert;
  }

  public AuthenticationMethod getAuthenticationMethod() {
    return this.authenticationMethod;
  }

  public void setEndpoint(String endpoint) {
    this.endpoint = endpoint;
  }

  public void setUpsert(boolean upsert) {
    this.upsert = upsert;
  }

  public void setAuthenticationMethod(AuthenticationMethod authenticationMethod) {
    this.authenticationMethod = authenticationMethod;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o)
      return true;
    if (o == null || getClass() != o.getClass())
      return false;
    ConnectorConfiguration that = (ConnectorConfiguration) o;
    return upsert == that.upsert && Objects.equals(endpoint, that.endpoint) && Objects.equals(authenticationMethod, that.authenticationMethod);
  }

  @Override
  public int hashCode() {
    return Objects.hash(endpoint, upsert, authenticationMethod);
  }

  @Override
  public String toString() {
    return "ConnectorConfiguration{" +
        "endpoint='" + endpoint + '\'' +
        ", upsert=" + upsert +
        ", authenticationMethod=" + authenticationMethod +
        '}';
  }

  static class AuthenticationMethod {

    private ElasticsearchAuthenticationMethod method = ElasticsearchAuthenticationMethod.none;
    private String username;
    private String password;
    private String apiKeyId;
    private String apiKeySecret;

    public ElasticsearchAuthenticationMethod getMethod() {
      return this.method;
    }

    public String getUsername() {
      return this.username;
    }

    public String getPassword() {
      return this.password;
    }

    public String getApiKeyId() {
      return this.apiKeyId;
    }

    public String getApiKeySecret() {
      return this.apiKeySecret;
    }

    public void setMethod(ElasticsearchAuthenticationMethod method) {
      this.method = method;
    }

    public void setUsername(String username) {
      this.username = username;
    }

    public void setPassword(String password) {
      this.password = password;
    }

    public void setApiKeyId(String apiKeyId) {
      this.apiKeyId = apiKeyId;
    }

    public void setApiKeySecret(String apiKeySecret) {
      this.apiKeySecret = apiKeySecret;
    }

    public boolean isValid() {
      return switch (this.method) {
        case none -> true;
        case basic -> Objects.nonNull(this.username) && Objects.nonNull(this.password);
        case secret -> Objects.nonNull(this.apiKeyId) && Objects.nonNull(this.apiKeySecret);
      };
    }

    @Override
    public boolean equals(Object o) {
      if (this == o)
        return true;
      if (o == null || getClass() != o.getClass())
        return false;
      AuthenticationMethod that = (AuthenticationMethod) o;
      return method == that.method &&
          Objects.equals(username, that.username) &&
          Objects.equals(password, that.password) &&
          Objects.equals(apiKeyId, that.apiKeyId) &&
          Objects.equals(apiKeySecret, that.apiKeySecret);
    }

    @Override
    public int hashCode() {
      return Objects.hash(method, username, password, apiKeyId, apiKeySecret);
    }

    @Override
    public String toString() {
      return "AuthenticationMethod{" +
          "method=" + method +
          ", username='" + username + '\'' +
          ", apiKeyId='" + apiKeyId + '\'' +
          '}';
    }

  }

}
