Add streaming ingestion support #2

Closed
danield137 opened this issue Nov 28, 2018 · 9 comments

@danield137

danield137 commented Nov 28, 2018

There were a few major reasons this wasn't the first option developed.

  1. Streaming wasn't (still isn't) available on our Java client (Support streaming ingest client azure-kusto-java#20)
  2. Streaming without a fallback mechanism can put a potentially high load on our ingestion service, which in turn will cause 429 or 5xx responses and result in data loss
  3. Streaming Async might cause data loss due to offset management issues
  4. Streaming Sync can cause a bottleneck on Kafka itself, as it is much faster to write locally than over the network

Going forward, this is something we definitely want, but we need good answers to the issues above first.
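
To make points 2 and 3 concrete, here is a minimal sketch of what a fallback path could look like. This is not the connector's implementation: `TransientIngestError`, `ingest_with_fallback`, and the `stream` / `queue` callables are hypothetical names invented for illustration, and the retry count is arbitrary.

```python
from typing import Callable


class TransientIngestError(Exception):
    """Hypothetical error standing in for an HTTP 429 or 5xx response."""

    def __init__(self, status: int):
        super().__init__(f"transient ingest failure: HTTP {status}")
        self.status = status


def ingest_with_fallback(batch: bytes,
                         stream: Callable[[bytes], None],
                         queue: Callable[[bytes], None],
                         max_stream_attempts: int = 2) -> str:
    """Try the streaming path first; fall back to queued ingestion.

    Returns the name of the path that accepted the batch. The caller is
    assumed to commit the Kafka offset only after this returns, so that a
    crash replays the batch instead of silently dropping it.
    """
    for _ in range(max_stream_attempts):
        try:
            stream(batch)
            return "streamed"
        except TransientIngestError:
            # Throttled or server error: retry, then give up on streaming.
            continue
    # Streaming kept failing, so hand the batch to the slower queued path.
    queue(batch)
    return "queued"
```

The point of the sketch is only that a throttled streaming call degrades to higher latency rather than to lost data, and that offsets move forward strictly after one of the two paths has accepted the batch.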

@danield137 danield137 added enhancement New feature or request help wanted Extra attention is needed labels Nov 28, 2018
@vladikbr vladikbr self-assigned this Aug 5, 2020
@vladikbr vladikbr assigned yihezkel and unassigned vladikbr Nov 24, 2020
@yogilad
Contributor

yogilad commented Jun 1, 2021

Added to SDK backlog

@nielsberglund

Where are we with streaming ingestion? I just compared ingesting from Kafka using the Kusto Connector with ingesting via EventHubs. No matter how I tweak the ingestionbatching policy, I cannot get the latency with the Kafka Connector below 3 - 5 seconds at best. When I ingest from EventHubs, the latency is ~50 - 100 ms!

@yogilad
Contributor

yogilad commented Aug 23, 2021

Hi Niels,

This change is not yet scheduled for implementation.
In general, Kusto is not built for millisecond velocities, and configuring it that way means making a big compromise on ingestion capacity (MB/s).

In the meantime, I can suggest the following configuration:

  • Set the batching policy blob count to 1, preferably on the lowest possible level (table or DB)
  • Tune Kafka's Kusto Sink Connector flush.interval.ms parameter down to achieve a desirable result

Note!!!
This configuration cannot ingest large amounts of data, and is highly inefficient for the engine.
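
For readers who want to try the two settings above, here is a rough sketch that only builds the relevant strings; it does not talk to a cluster or to Kafka Connect. Assumptions on my part: that "blob count" maps to the `MaximumNumberOfItems` property of the ingestionbatching policy, that a policy with only that property set is accepted, and the table name and flush interval in the usage note, which are placeholders. `low_latency_settings` is a hypothetical helper name.

```python
import json
from typing import Dict, Tuple


def low_latency_settings(table: str, flush_interval_ms: int) -> Tuple[str, Dict[str, str]]:
    """Build the two settings suggested above (illustrative only).

    Returns a Kusto management command that sets the batching policy item
    count to 1 on the given table, plus the sink connector property that
    controls how often the connector flushes.
    """
    # Assumption: "blob count" corresponds to MaximumNumberOfItems.
    policy = {"MaximumNumberOfItems": 1}
    command = (
        f".alter table {table} policy ingestionbatching "
        f"'{json.dumps(policy)}'"
    )
    # flush.interval.ms is the connector parameter named in this thread.
    connector_overrides = {"flush.interval.ms": str(flush_interval_ms)}
    return command, connector_overrides
```

Calling `low_latency_settings("MyTable", 1000)` returns the command text to run against the database and a one-entry dict to merge into the connector config. The same caveat applies: this trades ingestion capacity for latency.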

@nielsberglund

nielsberglund commented Aug 23, 2021

Thanks @yogilad!

I managed to get the latency down to the 3 - 5 seconds range by changing configuration as per your suggestion above - Thanks!

As for Kusto not being built for millisecond latencies: I would be happy if I could get the latency down to the one-second range, especially seeing that with EventHubs I get sub 100 - 300 ms latency. The fact that I am getting that low latency from EventHubs "out of the box" makes me a bit concerned when you say that Kusto is not built for millisecond velocities.

My use case right now is that I am comparing ADX with Apache Pinot, looking at what to use for real-time user-facing analytics.

Anyway, thank you again for your input - most valuable!

@yogilad
Contributor

yogilad commented Aug 23, 2021

Let me clarify.
Kusto has multiple data intake channels.
The commonly used ones are built to ingest gigabytes or more of data with latency on the order of minutes.
The streaming ones handle less data, but do so with latency on the order of seconds.
You can tune the former to come close to the latter, but at the expense of data volume.
This is the concern I raised above.

Can you share the amount of data you wish to ingest (MB/s, rate of requests, number of concurrent clients)?
Also, what's your timeline for the evaluation?

@nielsberglund

Hi Yochai!

Thanks for the clarifications!

If it is OK with you, I will send you an email, where I describe a bit more what we are thinking of doing.

Thanks!

Niels

@yogilad
Contributor

yogilad commented Aug 23, 2021

Adding @ohadbitt

Update:

  • We'll try to push this change up the backlog, and hopefully get it working sometime next week.
    You can then check if you are able to achieve a satisfactory result with it.

  • After discussing this with our PM: if you do need 'under 1 second' times, Azure Stream Analytics (ASA) is more suitable. If you need 'under 1 second' times in combination with Kusto's analytics capabilities, you can opt for a joint solution where ASA also forwards the data to Kusto for deeper analysis (at Kusto latency times).
    This feature is about to be released (if it has not been already).

@nielsberglund

Hi Yochai!

Thanks for the above. I just sent you an email as well. I'll definitely check the connector out once you have made the changes. Oh, speaking of which, there seems to be an issue with the latest connector (2.1), where it loads into Kafka Connect but is not added as a connector. I have a separate issue logged for that.

Thanks again!

Niels

@yogilad
Contributor

yogilad commented Oct 19, 2021

Released in PR #61.
Please use v2.2.0

@yogilad yogilad closed this as completed Oct 19, 2021