Commit

Merge branch 'fixup_task_bootstrap_inject' of github.com:abhishekrb19…
…/incubator-druid into fixup_task_bootstrap_inject
abhishekrb19 committed Jun 4, 2024
2 parents c2c9fd1 + b036808 commit a16d69e
Showing 31 changed files with 1,175 additions and 1,101 deletions.
4 changes: 2 additions & 2 deletions docs/configuration/extensions.md
@@ -44,11 +44,11 @@ Core extensions are maintained by Druid committers.
|druid-google-extensions|Google Cloud Storage deep storage.|[link](../development/extensions-core/google.md)|
|druid-hdfs-storage|HDFS deep storage.|[link](../development/extensions-core/hdfs.md)|
|druid-histogram|Approximate histograms and quantiles aggregator. Deprecated, please use the [DataSketches quantiles aggregator](../development/extensions-core/datasketches-quantiles.md) from the `druid-datasketches` extension instead.|[link](../development/extensions-core/approximate-histograms.md)|
|druid-kafka-extraction-namespace|Apache Kafka-based namespaced lookup. Requires namespace lookup extension.|[link](../development/extensions-core/kafka-extraction-namespace.md)|
|druid-kafka-extraction-namespace|Apache Kafka-based namespaced lookup. Requires namespace lookup extension.|[link](../querying/kafka-extraction-namespace.md)|
|druid-kafka-indexing-service|Supervised exactly-once Apache Kafka ingestion for the indexing service.|[link](../ingestion/kafka-ingestion.md)|
|druid-kinesis-indexing-service|Supervised exactly-once Kinesis ingestion for the indexing service.|[link](../ingestion/kinesis-ingestion.md)|
|druid-kerberos|Kerberos authentication for druid processes.|[link](../development/extensions-core/druid-kerberos.md)|
|druid-lookups-cached-global|A module for [lookups](../querying/lookups.md) providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.|[link](../development/extensions-core/lookups-cached-global.md)|
|druid-lookups-cached-global|A module for [lookups](../querying/lookups.md) providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.|[link](../querying/lookups-cached-global.md)|
|druid-lookups-cached-single| Per lookup caching module to support the use cases where a lookup need to be isolated from the global pool of lookups |[link](../development/extensions-core/druid-lookups.md)|
|druid-multi-stage-query| Support for the multi-stage query architecture for Apache Druid and the multi-stage query task engine.|[link](../multi-stage-query/index.md)|
|druid-orc-extensions|Support for data in Apache ORC data format.|[link](../development/extensions-core/orc.md)|
2 changes: 1 addition & 1 deletion docs/configuration/index.md
@@ -627,7 +627,7 @@ the [HTTP input source](../ingestion/input-sources.md#http-input-source).

You can use the following properties to specify permissible JDBC options for:
- [SQL input source](../ingestion/input-sources.md#sql-input-source)
- [globally cached JDBC lookups](../development/extensions-core/lookups-cached-global.md#jdbc-lookup)
- [globally cached JDBC lookups](../querying/lookups-cached-global.md#jdbc-lookup)
- [JDBC Data Fetcher for per-lookup caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).

These properties do not apply to metadata storage connections.
@@ -22,7 +22,7 @@ title: "Apache Kafka Lookups"
~ under the License.
-->

To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.
To use this Apache Druid extension, [include](../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.

If you need updates to populate as promptly as possible, it is possible to plug into a Kafka topic whose key is the old value and message is the desired new value (both in UTF-8) as a LookupExtractorFactory.

@@ -41,13 +41,13 @@ If you need updates to populate as promptly as possible, it is possible to plug
| `kafkaTopic` | The Kafka topic to read the data from | Yes ||
| `kafkaProperties` | Kafka consumer properties (`bootstrap.servers` must be specified) | Yes ||
| `connectTimeout` | How long to wait for an initial connection | No | `0` (do not wait) |
| `isOneToOne` | The map is a one-to-one (see [Lookup DimensionSpecs](../../querying/dimensionspecs.md)) | No | `false` |
| `isOneToOne` | The map is a one-to-one (see [Lookup DimensionSpecs](./dimensionspecs.md)) | No | `false` |

The extension `kafka-extraction-namespace` enables reading from an [Apache Kafka](https://kafka.apache.org/) topic which has name/key pairs to allow renaming of dimension values. An example use case would be to rename an ID to a human-readable format.

## How it Works

The extractor works by consuming the configured Kafka topic from the beginning, and appending every record to an internal map. The key of the Kafka record is used as they key of the map, and the payload of the record is used as the value. At query time, a lookup can be used to transform the key into the associated value. See [lookups](../../querying/lookups.md) for how to configure and use lookups in a query. Keys and values are both stored as strings by the lookup extractor.
The extractor works by consuming the configured Kafka topic from the beginning, and appending every record to an internal map. The key of the Kafka record is used as they key of the map, and the payload of the record is used as the value. At query time, a lookup can be used to transform the key into the associated value. See [lookups](./lookups.md) for how to configure and use lookups in a query. Keys and values are both stored as strings by the lookup extractor.

The extractor remains subscribed to the topic, so new records are added to the lookup map as they appear. This allows for lookup values to be updated in near-realtime. If two records are added to the topic with the same key, the record with the larger offset will replace the previous record in the lookup map. A record with a `null` payload will be treated as a tombstone record, and the associated key will be removed from the lookup map.
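
The map semantics described in this hunk (a later offset overwrites an earlier one, and a `null` payload removes the key) can be sketched in a few lines. This is a minimal illustration in Python, not the extension's actual Java implementation; `apply_records` and the sample records are hypothetical names used only for this example.

```python
from typing import Dict, Iterable, Optional, Tuple


def apply_records(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Replay (key, payload) records in offset order into a lookup map."""
    lookup: Dict[str, str] = {}
    for key, payload in records:
        if payload is None:
            # Tombstone record: remove the key if it is present.
            lookup.pop(key, None)
        else:
            # A later record with the same key replaces the earlier one.
            lookup[key] = payload
    return lookup


# Hypothetical topic contents, listed in increasing offset order.
records = [
    ("appid-12345", "Super Mega Awesome App"),
    ("appid-67890", "Old Name"),
    ("appid-67890", "New Name"),  # larger offset wins
    ("appid-12345", None),        # tombstone removes the key
]

print(apply_records(records))  # {'appid-67890': 'New Name'}
```

Because the topic is replayed from the beginning, the final map depends only on the last record seen for each key.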

@@ -22,12 +22,12 @@ title: "Globally Cached Lookups"
~ under the License.
-->

To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
To use this Apache Druid extension, [include](../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.

## Configuration
:::info
Static configuration is no longer supported. Lookups can be configured through
[dynamic configuration](../../querying/lookups.md#configuration).
[dynamic configuration](./lookups.md#configuration).
:::

Globally cached lookups are appropriate for lookups which are not possible to pass at query time due to their size,
@@ -36,7 +36,7 @@ and are small enough to reasonably populate in-memory. This usually means tens t

Globally cached lookups all draw from the same cache pool, allowing each process to have a fixed cache pool that can be used by cached lookups.

Globally cached lookups can be specified as part of the [cluster wide config for lookups](../../querying/lookups.md) as a type of `cachedNamespace`
Globally cached lookups can be specified as part of the [cluster wide config for lookups](./lookups.md) as a type of `cachedNamespace`

```json
{
@@ -84,7 +84,7 @@ The parameters are as follows
|--------|-----------|--------|-------|
|`extractionNamespace`|Specifies how to populate the local cache. See below|Yes|-|
|`firstCacheTimeout`|How long to wait (in ms) for the first run of the cache to populate. 0 indicates to not wait|No|`0` (do not wait)|
|`injective`|If the underlying map is [injective](../../querying/lookups.md#query-rewrites) (keys and values are unique) then optimizations can occur internally by setting this to `true`|No|`false`|
|`injective`|If the underlying map is [injective](./lookups.md#query-rewrites) (keys and values are unique) then optimizations can occur internally by setting this to `true`|No|`false`|

If `firstCacheTimeout` is set to a non-zero value, it should be less than `druid.manager.lookups.hostUpdateTimeout`. If `firstCacheTimeout` is NOT set, then management is essentially asynchronous and does not know if a lookup succeeded or failed in starting. In such a case logs from the processes using lookups should be monitored for repeated failures.

@@ -93,7 +93,7 @@ Proper functionality of globally cached lookups requires the following extension

## Example configuration

In a simple case where only one [tier](../../querying/lookups.md#dynamic-configuration) exists (`realtime_customer2`) with one `cachedNamespace` lookup called `country_code`, the resulting configuration JSON looks similar to the following:
In a simple case where only one [tier](./lookups.md#dynamic-configuration) exists (`realtime_customer2`) with one `cachedNamespace` lookup called `country_code`, the resulting configuration JSON looks similar to the following:

```json
{
@@ -170,7 +170,7 @@ It's highly recommended that `druid.lookup.namespace.numBufferedEntries` is set

## Supported lookups

For additional lookups, please see our [extensions list](../../configuration/extensions.md).
For additional lookups, please see our [extensions list](../configuration/extensions.md).

### URI lookup

@@ -345,7 +345,7 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol

|Parameter|Description|Required|Default|
|---------|-----------|--------|-------|
|`connectorConfig`|The connector config to use. You can set `connectURI`, `user` and `password`. You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes||
|`connectorConfig`|The connector config to use. You can set `connectURI`, `user` and `password`. You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes||
|`table`|The table which contains the key value pairs|Yes||
|`keyColumn`|The column in `table` which contains the keys|Yes||
|`valueColumn`|The column in `table` which contains the values|Yes||
@@ -377,7 +377,7 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#installing-the-mysql-connector-library) or [MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
See the MySQL extension documentation for instructions to obtain [MySQL](../development/extensions-core/mysql.md#installing-the-mysql-connector-library) or [MariaDB](../development/extensions-core/mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
The connector JAR should reside in the classpath of Druid's main class loader.
To add the connector JAR to the classpath, you can copy the downloaded file to `lib/` under the distribution root directory. Alternatively, create a symbolic link to the connector in the `lib` directory.
:::
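
The link rewrites in these hunks are consistent with the two lookup pages moving from `docs/development/extensions-core/` to `docs/querying/`. The following sketch checks that each old and new relative link resolves to the same target. Both directories are assumptions inferred from the changed links, since the file headers for the moved pages are not shown here, and `resolve` is a hypothetical helper.

```python
import posixpath


def resolve(doc_dir: str, link: str) -> str:
    """Resolve a relative Markdown link against the directory of the linking file."""
    path = link.split("#", 1)[0]  # drop the anchor; it does not affect the file path
    return posixpath.normpath(posixpath.join(doc_dir, path))


OLD_DIR = "docs/development/extensions-core"  # assumed previous location
NEW_DIR = "docs/querying"                     # assumed new location

# (old link, new link) pairs taken from the hunks above.
pairs = [
    ("../../configuration/extensions.md#loading-extensions",
     "../configuration/extensions.md#loading-extensions"),
    ("../../querying/lookups.md#configuration",
     "./lookups.md#configuration"),
    ("./mysql.md#installing-the-mysql-connector-library",
     "../development/extensions-core/mysql.md#installing-the-mysql-connector-library"),
]

for old_link, new_link in pairs:
    # Each rewritten link should point at the same file as before the move.
    assert resolve(OLD_DIR, old_link) == resolve(NEW_DIR, new_link)
    print(resolve(NEW_DIR, new_link))
```

A check like this only compares paths; it does not confirm that the target files or anchors exist.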
10 changes: 5 additions & 5 deletions docs/querying/lookups.md
@@ -24,7 +24,7 @@ title: "Lookups"

Lookups are a concept in Apache Druid where dimension values are (optionally) replaced with new values, allowing join-like
functionality. Applying lookups in Druid is similar to joining a dimension table in a data warehouse. See
[dimension specs](../querying/dimensionspecs.md) for more information. For the purpose of these documents, a "key"
[dimension specs](./dimensionspecs.md) for more information. For the purpose of these documents, a "key"
refers to a dimension value to match, and a "value" refers to its replacement. So if you wanted to map
`appid-12345` to `Super Mega Awesome App` then the key would be `appid-12345` and the value would be
`Super Mega Awesome App`.
@@ -43,12 +43,12 @@ and such data belongs in the raw denormalized data for use in Druid.

Lookups are generally preloaded in-memory on all servers. But very small lookups (on the order of a few dozen to a few
hundred entries) can also be passed inline in native queries time using the "map" lookup type. Refer to the
[dimension specs](dimensionspecs.md) documentation for details.
[dimension specs](./dimensionspecs.md) documentation for details.

Other lookup types are available as extensions, including:

- Globally cached lookups from local files, remote URIs, or JDBC through [lookups-cached-global](../development/extensions-core/lookups-cached-global.md).
- Globally cached lookups from a Kafka topic through [kafka-extraction-namespace](../development/extensions-core/kafka-extraction-namespace.md).
- Globally cached lookups from local files, remote URIs, or JDBC through [lookups-cached-global](./lookups-cached-global.md).
- Globally cached lookups from a Kafka topic through [kafka-extraction-namespace](./kafka-extraction-namespace.md).

Query Syntax
------------
@@ -213,7 +213,7 @@ Injective lookups are eligible for the largest set of query rewrites. Injective
function may encounter null input values.

To determine whether a lookup is injective, Druid relies on an `injective` property that you can set in the
[lookup definition](../development/extensions-core/lookups-cached-global.md). In general, you should set
[lookup definition](./lookups-cached-global.md). In general, you should set
`injective: true` for any lookup that satisfies the required properties, to allow Druid to run your queries as fast as
possible.

19 changes: 17 additions & 2 deletions extensions-core/azure-extensions/pom.xml
@@ -143,8 +143,23 @@
</dependency>
<!-- Tests -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<scope>test</scope>
</dependency>
<dependency>

This file was deleted.
