
Support using KSQL as library to write streaming applications (aka KSQL embedded mode) #734

Closed
miguno opened this issue Feb 14, 2018 · 25 comments
Labels
enhancement popular Issues with significant user demand

Comments

@miguno
Copy link
Contributor

miguno commented Feb 14, 2018

Some users have expressed interest in leveraging KSQL as a Java library to write stream processing applications (on the JVM, i.e. primarily Java/Scala), similar to how Kafka's Streams API is used for developing such applications. Sometimes this has been called "KSQL embedded mode".

If you are interested in this functionality, please:

  1. Add your +1 vote (👍 ) to this message
  2. Add a comment that gives us context about why you would need it, what for, and in which environment you'd use it. For example, what would be the first concrete use case where you'd use KSQL as a library? Which programming language (Java, Scala, ...)? Would you like to mix and match Kafka's Streams API with KSQL in your processing application, or ...?
@deinspanjer

deinspanjer commented Feb 19, 2018

We currently have a streaming ETL system implemented with the Kafka Streams API in Spring Boot microservice applications.

This Java code replicates a lot of general SQL-like logic that could potentially be replaced by a few straightforward KSQL statements.

Our general use cases are reading a raw stream of activity events, de-duplicating them, joining them with other KTable data from lookup tables, and then writing them to processed topics for reuse by other modules or for eventual loading into the data warehouse.
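To make that use case concrete, here is a minimal in-memory sketch of the pipeline's semantics in plain Java. It is not Kafka Streams or KSQL code. The event shape, field names, and the `DedupJoinSketch` class are hypothetical, and it assumes duplicates share an event ID and that the lookup table already holds the latest value per key (as a KTable would). A real Streams app would keep the seen-ID set in a state store with bounded retention rather than in an unbounded `HashSet`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of the pipeline described above (NOT Kafka Streams or KSQL code):
// raw events -> drop duplicate event IDs -> enrich from a lookup table -> output list.
class DedupJoinSketch {

    // Hypothetical raw activity event: a unique event ID, a user key, and an action.
    record Event(String eventId, String userId, String action) {}

    // Hypothetical enriched record that would go to a "processed" topic.
    record Enriched(String eventId, String userId, String action, String userName) {}

    static List<Enriched> process(List<Event> rawEvents, Map<String, String> userTable) {
        Set<String> seenIds = new HashSet<>(); // de-duplication state (kept forever here)
        List<Enriched> out = new ArrayList<>();
        for (Event e : rawEvents) {
            if (!seenIds.add(e.eventId())) {
                continue; // event ID already seen: drop the duplicate
            }
            String userName = userTable.get(e.userId());
            if (userName == null) {
                continue; // inner-join semantics: no lookup row, no output
            }
            out.add(new Enriched(e.eventId(), e.userId(), e.action(), userName));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> raw = List.of(
                new Event("e1", "u1", "click"),
                new Event("e1", "u1", "click"), // redelivered duplicate
                new Event("e2", "u2", "view"),
                new Event("e3", "u9", "view")); // no matching user row
        Map<String, String> users = Map.of("u1", "Alice", "u2", "Bob");
        process(raw, users).forEach(System.out::println);
    }
}
```

Running `main` prints two enriched records (e1 for Alice, e2 for Bob); the duplicate e1 and the unmatched e3 are dropped. The join here behaves like an inner stream-table join; a left join would instead emit e3 with a null `userName`.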

@tobihofmann

We are building a set of microservices using the Streams API. If KSQL could be used as a library (like the Streams API), we could simplify our code.

If KSQL offers point-in-time queries (see #530), it would be good to have them available as an API/library, too.

@giltig

giltig commented Mar 25, 2018

Hi,
We use Kafka Streams as our primary way to access our data in the way we want.
It would be great to use KSQL as an embedded language (similar to DataFrames in Spark),
so we could replace streaming code with KSQL code to simplify things.

@illusiz

illusiz commented Apr 6, 2018

Hi,
We use KSQL to create a table which calculates user balances from different streams and we want to get that table data in our Java code.

@miguno
Contributor Author

miguno commented Apr 9, 2018

@illusiz:

We use KSQL to create a table which calculates user balances from different streams and we want to get that table data in our Java code.

FYI: You can already do that today by reading the KSQL table's backing Kafka topic into a KTable with Kafka Streams.
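A minimal plain-Java sketch of what that KTable ends up holding: the backing topic is read as a changelog, where each record upserts its key and a record with a null value (a tombstone) deletes it. In a real app you would call something like `StreamsBuilder#table(topicName, Consumed.with(keySerde, valueSerde))`, with serdes matching the key and value formats KSQL wrote (for example JSON or Avro). The `Rec` type, keys, and balances below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model (NOT Kafka Streams code) of how a KTable reads a topic:
// each record upserts its key, and a null value (a tombstone) deletes the key.
class TableFromTopicSketch {

    // One record of the backing topic; value == null marks a tombstone.
    record Rec(String key, Long value) {}

    static Map<String, Long> materialize(List<Rec> topic) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Rec r : topic) { // records are applied in offset order
            if (r.value() == null) {
                table.remove(r.key()); // tombstone: delete the row
            } else {
                table.put(r.key(), r.value()); // upsert: the latest value wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical balance updates, as a KSQL table's backing topic might contain them.
        List<Rec> topic = List.of(
                new Rec("alice", 100L),
                new Rec("bob", 50L),
                new Rec("alice", 70L),
                new Rec("bob", null));
        System.out.println(materialize(topic)); // {alice=70}
    }
}
```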

@hpgrahsl
Contributor

It'd be really great if it were possible to access the state stores backing KSQL queries and run interactive queries against them, the way we already can for Kafka Streams apps. AFAIK that's currently not possible, or only with an additional Kafka Streams app.

@coderztf

We are building a log-monitoring system with Kafka and want to use KSQL to filter the log data continuously. However, we have doubts about the reliability of filtering data through the REST API. Using KSQL as a library rather than via HTTP requests would be more reliable.

@Bas83

Bas83 commented Jun 9, 2018

We want to use Kafka in a multi-tenant way, with many customers having their own login credentials and ACLs defined for a limited number of topics on the Kafka cluster. We'd like customers to access Kafka only with their own credentials, but instantiating and running whole KSQL server instances for every customer, and also restricting access to those instances, would get very cumbersome. Also, queries are created dynamically, and starting a headless server with a predefined query file would add a lot of overhead. So this embedded mode sounds like a great feature for us. We'd be using it from Java (not Scala) for now, and for our use case there would be no great need to mix in the Streams API, as another app could just read the resulting topic for us. It would be ideal, though, if we could read the results of SELECT (non-persistent) queries directly from the API.

@johnbrinnand

We want to make Kafka the source of truth for all our data, and since we generate a great deal of data, we want to query it in real time. We could use Kafka Streams, but KSQL's semantics abstract the underlying Streams API, and its programming model could make it easier to adopt. I vote for KSQL as a Java library.

@lkokhreidze

lkokhreidze commented Jul 15, 2018

We have an SPaaS (Stream-Processing as a Service) platform built on the Kafka Streams API, and running and supporting a separate KSQL server just to run KSQL queries seems like extra overhead. We would love to use KSQL in our existing streaming apps. In addition, if running KSQL in Java streaming apps were possible, it would let us enhance our SPaaS platform so that engineers, data scientists, and analysts don't have to write Java code at all when they need new aggregations and/or pipelines. They could submit a pull request with KSQL statements, and the rest of the magic would happen inside the platform 🚀 🚀 🚀 Looking forward to this!!!

@joncourt

joncourt commented Oct 4, 2018

We're not big fans of having platforms to run things like KSQL and Kafka Connect; we'd rather embed them in a JAR that can be run through our normal development cycle and pipelines. Having this stuff on a "platform" requires someone to manage that platform, adds yet another deployment burden, and risks impacting unrelated jobs.

Embedded, even if it's just a wrapper JAR.

@tkaszuba

tkaszuba commented Feb 2, 2019

We have a lot of SQL developers who are uncomfortable working with KStream and KTable directly. Allowing them to use KSQL in their Spring Boot microservices would really increase the adoption rate.

@jemantheiy

I am developing a near-real-time architecture with Kafka Streams, KSQL, and Schema Registry. Our API reads data in near real time off of Kafka topics using Spring Boot with Flux and a reactive Kafka consumer. It'd be nice if I could convert that to KSQL. I know I can post to the KSQL REST interface, which I am doing in some cases. A client library would greatly simplify things overall.

@dzmitry-kankalovich

We were struggling with adoption of application logic written in plain Kafka Streams, so we refactored it into KSQL. However, due to the complexity of the business logic and some features KSQL currently lacks, some parts are still in Kafka Streams, in a separate module. This creates a bit of a discrepancy: an otherwise single flow is split across two modules. An embedded KSQL server could have helped solve this problem and let us build a single Java app.

Another thing is that we'd like to run it in integration tests, similar to EmbeddedKafka.

@confluentinc confluentinc deleted a comment from raguessner Jul 2, 2019
@kali786516

kali786516 commented Jul 16, 2019

Is Confluent building this? Has anyone tried this: https://github.com/mmolimar/ksql-jdbc-driver

@thobrien

thobrien commented Oct 2, 2019

I would like to use KSQL directly (particularly for its time-alignment capabilities when combining streams, e.g. with sliding or session windows) as a step in the middle of my Kafka Streams processing. I'd like to add my own custom logic before and after the KSQL query.
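Session windows are one of the time-alignment features mentioned here (in KSQL, a clause such as `WINDOW SESSION (5 SECONDS)`). The plain-Java sketch below, for a single key, illustrates the core idea: consecutive events stay in one session until the gap between them exceeds the inactivity gap. It is not KSQL or Kafka Streams code; it assumes timestamps arrive sorted and ignores out-of-order records and grace periods, which a real engine handles. Treating a gap exactly equal to the inactivity gap as still inside the session is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of session windowing for a single key (NOT KSQL or Kafka Streams code).
// A session closes once the gap to the next event exceeds the inactivity gap.
class SessionWindowSketch {

    // One session: first and last event timestamps (ms) plus the number of events.
    record Session(long start, long end, int count) {}

    // Assumes timestamps are sorted ascending; out-of-order data is not handled.
    static List<Session> sessionize(List<Long> sortedTimestamps, long inactivityGapMs) {
        List<Session> sessions = new ArrayList<>();
        long start = 0;
        long end = 0;
        int count = 0;
        for (long t : sortedTimestamps) {
            if (count > 0 && t - end > inactivityGapMs) {
                sessions.add(new Session(start, end, count)); // gap too large: close session
                count = 0;
            }
            if (count == 0) {
                start = t; // first event of a new session
            }
            end = t;
            count++;
        }
        if (count > 0) {
            sessions.add(new Session(start, end, count)); // close the last open session
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Long> ts = List.of(0L, 1_000L, 2_000L, 10_000L, 11_000L);
        // With a 5-second gap this yields two sessions: [0, 2000] and [10000, 11000].
        sessionize(ts, 5_000L).forEach(System.out::println);
    }
}
```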

@apurvam apurvam added the popular Issues with significant user demand label Oct 16, 2019
@arpan2501

Spark Streaming provides a very easy-to-use interface, Spark SQL, for Java, Python, and Scala. Why isn't Kafka coming up with similar libraries, so that we can leverage this feature too?

@andaag

andaag commented Dec 17, 2019

We were struggling with adoption of application logic written in plain Kafka Streams, so we refactored it into KSQL. However, due to the complexity of the business logic and some features KSQL currently lacks, some parts are still in Kafka Streams, in a separate module. This creates a bit of a discrepancy: an otherwise single flow is split across two modules. An embedded KSQL server could have helped solve this problem and let us build a single Java app.

Another thing is that we'd like to run it in integration tests, similar to EmbeddedKafka.

I think this is a great example of why this should be made available as a JAR. With KSQL plus Kafka Streams you can avoid complexity until you need it, and build only the complex parts in Kafka Streams code. As of today, we have to deal with the large, complex Kafka Streams API rather than being able to put the simple cases in SQL and the complex ones in code.

Offering this as a library is also a big advantage, because you are probably already monitoring your services closely. The cost here really isn't just "you can start this service"; you also need your company's monitoring frameworks integrated, and health checks, metrics, and alerts working. These are things you've already integrated with standard Kafka and your service...

@ShahOdin

ShahOdin commented Jan 2, 2020

I'd love to see an fs2-based client for Scala code.

@wdonne

wdonne commented Feb 28, 2020

KSQL uses Kafka Streams, and the major advantage of the latter is that it is not a cluster but a library. This is a selling point for Confluent. It is therefore strange that KSQL is moving away from that.
I also find it operationally unappealing to have all kinds of queries in one pot. This reminds me of the client-server era, when all applications had their triggers and stored procedures in the same catalog.
KSQL queries are part of an application, but they can't be deployed with it in a simple way. Deployment pipelines become more difficult.
The number of KSQL instances fixes the parallelism for all queries, while it can be useful to vary it. With embedded KSQL we would have more control over our consumer groups.
The KSQL language has limitations that require separate Kafka Streams applications to compensate for them. Such a mix is more natural and manageable when you have just one application that creates one Kafka Streams topology. This would be easy if StreamsBuilder also accepted KSQL statements.

@mark-dlc

mark-dlc commented May 29, 2020

Wasn't it planned to offer KSQL as an embedded library as well?

Both the KSQL paper and some slides from Confluent specifically highlight the different deployment modes of KSQL, including "embedded".
http://openproceedings.org/2019/conf/edbt/EDBT19_paper_329.pdf
https://www.itoug.it/wp-content/uploads/2017/12/2018-02-01-Kafka-Streams-and-KSQL.pdf

Confluent advertises the benefits of Kafka Streams as an embedded stream-processing solution; wouldn't these also apply to KSQL for the same reasons? KSQL's DSL has advantages over the Kafka Streams Java DSL, like ease of use.

@miguno
Contributor Author

miguno commented Aug 12, 2020

For those of you wanting to use ksqlDB as a library: while this is not the same functionality as in this feature request, ksqlDB 0.10+ now ships with a native Java client:

This might cover some of the needs of the people in this thread. Feedback is of course welcome!

@bernhardttom

bernhardttom commented Apr 28, 2021

Disclaimer: I work for EsperTech.
Esper has been available as an embedded library and provides much richer functionality. Its license is GPLv2.

@up-to-you

up-to-you commented Jun 23, 2021

This issue will never be resolved, since an embedded library on top of truly open-source Kafka Streams contradicts Confluent's revenue model.
As far as I know, a lot of companies that use Kafka extensively eventually build their own framework for transformation/routing/aggregation on top of Kafka Streams. ksqlDB as a standalone solution in the world of microservices, which requires other Confluent products (like the registry, monitoring, syntax highlighting, etc.) to be fully functional, is a dead-end product, and I'm sure the embedded version will require them too.
To be honest, my team was able to build 90% of ksqlDB's functionality for a cloud solution in half a year, with more advanced features and fine-grained control.
I hope there will soon be an open-source solution for cloud environments that doesn't conflict with company policies and can be opened up to the community.

@miguno
Contributor Author

miguno commented Apr 7, 2022

Closing this issue. If there are future requests similar to this, please reopen or create a new issue.

@miguno miguno closed this as completed Apr 7, 2022