Skip to content
The event streaming database purpose-built for stream processing applications
Java ANTLR Shell HTML Dockerfile TSQL
Branch: master
Clone or download
big-andy-coates and stevenpyzhang test: fix timezone issue in `RecordFormatterTest` (#4585)
* test: fix timezone issue in `RecordFormatterTest`

Test depends on timezone, so was failing in US.

* spotbugs fix

Co-authored-by: Steven Zhang <35498506+stevenpyzhang@users.noreply.github.com>
Latest commit 8274d5a Feb 19, 2020
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github docs: update codeowners file for docs-md directory (DOCS-3120) (#4161) Dec 17, 2019
bin Add confluent-security to classpath to enable security plugins (#2993) Jun 19, 2019
build-tools Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
checkstyle chore: re-enable final param and var checkstyle rule Dec 30, 2019
config feat: enhance `PRINT TOPIC`'s format detection (#4551) Feb 14, 2020
debian Merge branch '5.1.x' Jan 3, 2019
design-proposals docs: Bring KLIP-15 up to date (#4499) Feb 14, 2020
docs-md Merge branch '5.5.x' Feb 14, 2020
docs feat: enhance `PRINT TOPIC`'s format detection (#4551) Feb 14, 2020
ext Reduce log level when loading blacklist to INFO from ERROR (#1522) Jul 4, 2018
findbugs Fix old copyrights on some files (#2456) Feb 20, 2019
ksql-api keep findbugs happy (#4587) Feb 18, 2020
ksql-benchmark Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-cli chore: reintroduce 0dc8e64 to fix deprecation warning (#4588) Feb 18, 2020
ksql-common Merge branch '5.5.x' Feb 15, 2020
ksql-console-scripts Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-docker Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-engine feat: Plug insert streams into backend (#4512) Feb 14, 2020
ksql-etc Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-examples Merge branch '5.5.x' Feb 11, 2020
ksql-execution fix: support conversion of STRING to BIGINT for window bounds (#4500) Feb 10, 2020
ksql-functional-tests test: regenerate the stream stream join qtt (#4576) Feb 17, 2020
ksql-metastore Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-package Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-parser chore: add tests covering use of system columns in projection (#4502) Feb 10, 2020
ksql-rest-app test: fix timezone issue in `RecordFormatterTest` (#4585) Feb 19, 2020
ksql-rest-client feat: Adds a new RoutingFilter, MaximumLagFilter that looks at offset… Feb 8, 2020
ksql-rest-model chore: merge from 5.5.x Feb 14, 2020
ksql-rocksdb-config-setter Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-serde feat: enhance `PRINT TOPIC`'s format detection (#4551) Feb 14, 2020
ksql-streams feat: Adds a new RoutingFilter, MaximumLagFilter that looks at offset… Feb 8, 2020
ksql-test-util chore: Primitive Keys comes to PRINT TOPIC (#4507) Feb 11, 2020
ksql-tools Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-udf-quickstart fix: use HTTPS instead of HTTP to resolve dependencies in Maven arche… Feb 18, 2020
ksql-udf Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
ksql-version-metrics-client Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
licenses Bump Confluent to 6.0.0-SNAPSHOT, Kafka to 6.0.0-SNAPSHOT Feb 7, 2020
notices Starting a clean ksql repo by removing all the previous commit history. Aug 16, 2017
.github_changelog_generator Add ability to autogenerate change logs (#692) Feb 3, 2018
.gitignore feat: add .vscode/ and .factorypath to .gitignore (#4035) Dec 4, 2019
.readthedocs.yml docs: add readthedocs.yml to 5.4.0-post branch (DOCS-3255) (#4338) Jan 16, 2020
CHANGELOG.md docs: add note in breaking changes about not skipping statements from… Feb 14, 2020
CONTRIBUTING.md Add note re. logs folder for server output (#4318) Jan 15, 2020
Jenkinsfile build: Do not run twist lock scan and other docker image operations s… Jan 13, 2020
LICENSE Update to use CCL (#2278) Dec 14, 2018
PULL_REQUEST_TEMPLATE.md Fix typo Sep 25, 2018
README.md Slack channels renamed (#3927) Nov 21, 2019
commitlint.config.js build: add commitlint for conventional commits (#3008) Jun 25, 2019
docker-compose.yml docs: clarify Docker image build availability (#4345) Jan 17, 2020
ksql-rocket.png chore: rename ksql rocket image Jul 22, 2019
mkdocs.yml docs: add ksqlDb upgrade topic to TOC (DOCS-3396) (#4549) Feb 13, 2020
package-lock.json chore: update npm package dependencies to remove known vulnerabilities ( Aug 14, 2019
package.json chore: update npm package dependencies to remove known vulnerabilities ( Aug 14, 2019
pom.xml Merge branch '5.5.x' Feb 11, 2020
screencast.jpg KSQL screencast image Aug 30, 2017

README.md

KSQL rocket ksqlDB

The event streaming database purpose-built for stream processing applications

Overview

ksqlDB is an event streaming database for Apache Kafka. It is distributed, scalable, reliable, and real-time. ksqlDB combines the power of real-time stream processing with the approachable feel of a relational database through a familiar, lightweight SQL syntax. ksqlDB offers these core primitives:

  • Streams and tables - Create relations with schemas over your Apache Kafka topic data
  • Materialized views - Define real-time, incrementally updated materialized views over streams using SQL
  • Push queries- Continuous queries that push incremental results to clients in real time
  • Pull queries - Query materialized views on demand, much like with a traditional database
  • Connect - Integrate with any Kafka Connect data source or sink, entirely from within ksqlDB

Composing these powerful primitives enables you to build a complete streaming app with just SQL statements, minimizing complexity and operational overhead. ksqlDB supports a wide range of operations including aggregations, joins, windowing, sessionization, and much more. You can find more ksqlDB tutorials and resources here.

Getting Started

Documentation

See the ksqlDB documentation for the latest stable release.

Use Cases and Examples

Materialized views

ksqlDB allows you to define materialized views over your streams and tables. Materialized views are defined by what is known as a "persistent query". These queries are known as persistent because they maintain their incrementally updated results using a table.

CREATE TABLE hourly_metrics AS
  SELECT url, COUNT(*)
  FROM page_views
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY url EMIT CHANGES;

Results may be "pulled" from materialized views on demand via SELECT queries. The following query will return a single row:

SELECT * FROM hourly_metrics
  WHERE url = 'http://myurl.com' AND WINDOWSTART = '2019-11-20T19:00';

Results may also be continuously "pushed" to clients via streaming SELECT queries. The following streaming query will push to the client all incremental changes made to the materialized view:

SELECT * FROM hourly_metrics EMIT CHANGES;

Streaming queries will run perpetually until they are explicitly terminated.

Streaming ETL

Apache Kafka is a popular choice for powering data pipelines. ksqlDB makes it simple to transform data within the pipeline, readying messages to cleanly land in another system.

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum' EMIT CHANGES;

Anomaly Detection

ksqlDB is a good fit for identifying patterns or anomalies on real-time data. By processing the stream as data arrives you can identify and properly surface out of the ordinary events with millisecond latency.

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3 EMIT CHANGES;

Monitoring

Kafka's ability to provide scalable ordered messages with stream processing make it a common solution for log data monitoring and alerting. ksqlDB lends a familiar syntax for tracking, understanding, and managing alerts.

CREATE TABLE error_counts AS
  SELECT error_code, count(*)
  FROM monitoring_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  WHERE  type = 'ERROR'
  GROUP BY error_code EMIT CHANGES;

Integration with External Data Sources and Sinks

ksqlDB includes native integration with Kafka Connect data sources and sinks, effectively providing a unified SQL interface over a broad variety of external systems.

The following query is a simple persistent streaming query that will produce all of its output into a topic named clicks_transformed:

CREATE STREAM clicks_transformed AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id EMIT CHANGES;

Rather than simply send all continuous query output into a Kafka topic, it is often very useful to route the output into another datastore. ksqlDB's Kafka Connect integration makes this pattern very easy.

The following statement will create a Kafka Connect sink connector that continuously sends all output from the above streaming ETL query directly into Elasticsearch:

 CREATE SINK CONNECTOR es_sink WITH (
  'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
  'key.converter'   = 'org.apache.kafka.connect.storage.StringConverter',
  'topics'          = 'clicks_transformed',
  'key.ignore'      = 'true',
  'schema.ignore'   = 'true',
  'type.name'       = '',
  'connection.url'  = 'http://elasticsearch:9200');

Join the Community

For user help, questions or queries about KSQL please use our user Google Group or our public Slack channel #ksqldb in Confluent Community Slack

For discussions about development of KSQL please use our developer Google Group. You can also hang out in our developer Slack channel #ksqldb-dev in - Confluent Community Slack - this is where day to day chat about the development of KSQL happens. Everyone is welcome!

You can get help, learn how to contribute to KSQL, and find the latest news by connecting with the Confluent community.

For more general questions about the Confluent Platform please post in the Confluent Google group.

Contributing

Contributions to the code, examples, documentation, etc. are very much appreciated.

License

The project is licensed under the Confluent Community License.

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation.

You can’t perform that action at this time.