Completely rework the Druid getting started process #2216
I love this one. 👍

You will need:

* Java 7 or better
s/better/higher
@fjy could you gzip
@himanshug @navis @gianm @pjain1 added clustering docs. More changes to come.

## Tune Druid Brokers

Druid Brokers also benefit greatly from being tuned to the hardware it
"they run on"?
@himanshug addressed comments

```bash
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o $zookeeper-3.4.6.tar.gz
tar xzf $zookeeper-3.4.6.tar.gz
```
I don't see why there's a $ in front of zookeeper-3.4.6.tar.gz on this line and the one before.
that should be removed
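For anyone wondering what the stray `$` would actually do: in a POSIX shell, `$zookeeper` is read as a variable reference (the name ends at the first `-`), and when that variable is unset the file name silently loses its first word. A minimal demonstration, assuming `zookeeper` is not set in the environment; the variable names `with_dollar` and `without_dollar` are only for illustration:

```bash
# Assumption: no variable called "zookeeper" is defined in this shell.
unset zookeeper

# With the "$", the shell expands $zookeeper to an empty string.
with_dollar="$zookeeper-3.4.6.tar.gz"

# Without it, the file name is taken literally, which is what the docs intend.
without_dollar="zookeeper-3.4.6.tar.gz"

printf '%s\n' "$with_dollar" "$without_dollar"
```

So dropping the `$` on both the `curl -o` line and the `tar xzf` line keeps the two commands pointing at the same, correctly named archive.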
```bash
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/* io.druid.cli.Main server coordinator
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/* io.druid.cli.Main server overlord
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/* io.druid.cli.Main server middleManager
```
I guess most people trying this will know to put these each in background or run each in a different window or whatever, but it's tempting to cut/paste this whole thing to execute...
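One possible way around the cut/paste trap is a small wrapper that backgrounds each process and gives it its own log file. This is only a sketch: `start_service`, the `DRUID_JAVA` variable, and the `log/` directory are made up for illustration and are not part of this PR; the config paths, classpath, and main class are the ones from the diff above.

```bash
# Hypothetical helper, not part of the PR.
# DRUID_JAVA lets you point at a specific java binary; defaults to "java".
DRUID_JAVA="${DRUID_JAVA:-java}"

start_service() {
  local svc="$1"
  local conf="conf-quickstart/druid"
  mkdir -p log
  # The command substitution is unquoted on purpose: the JVM flags read from
  # jvm.config must be split into separate arguments.
  # The classpath is quoted so that "lib/*" reaches java unexpanded.
  "$DRUID_JAVA" $(xargs < "$conf/$svc/jvm.config") \
    -cp "$conf/_common:$conf/$svc:lib/*" \
    io.druid.cli.Main server "$svc" > "log/$svc.log" 2>&1 &
  # Remember the background PID so the process can be stopped later.
  echo "$!" > "log/$svc.pid"
}
```

From the package root this could be used as `for svc in coordinator overlord middleManager; do start_service "$svc"; done`, and `kill $(cat log/*.pid)` would stop the processes again. Running each command in its own terminal window, as suggested above, works just as well.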
@rasahner addressed comments
👍

We recommend this kind of architecture if you need real-time analytics but *also* need 100% fidelity
for historical data. All streaming ingestion methods currently supported by Druid do introduce the
possibility of dropped or duplicated messages in certain failure scenarios, and batch re-ingestion
eliminates this potential source of error for historical data. This also gives you the option to
The first part of the "also" isn't really an "also": the need to re-ingest because of possible errors is exactly what was just being discussed. I'd replace both sentences with something like
"Hybrid streaming also gives you the option to re-ingest your data if you need to revise it for any reason."
+1 when author thinks it is ready.

- [Streams-based tutorial](tutorial-streams.html) showing you how to push data over HTTP.
- [Kafka-based tutorial](tutorial-kafka.html) showing you how to load data from Kafka.

## Hybrid batch/streaming
Sorry if my comments were confusing. Here's my recommended text for this whole section. I think it's not necessary to say anything right here about queries not caring how the data was ingested - it potentially adds more confusion than it takes away.
You can combine batch and streaming methods in a hybrid batch/streaming architecture. In a hybrid architecture, you use a streaming method to do initial ingestion, and then periodically re-ingest older data in batch mode (typically every few hours, or nightly). When Druid re-ingests data for a time range, the new data automatically replaces the data from the earlier ingestion.

All streaming ingestion methods currently supported by Druid do introduce the possibility of dropped or duplicated messages in certain failure scenarios, and batch re-ingestion eliminates this potential source of error for historical data.

Batch re-ingestion also gives you the option to re-ingest your data if you need to revise it for any reason.
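
If it helps readers picture the nightly batch half, a rough driver could look like the following. Treat it as a sketch under assumptions rather than anything this PR ships: the helper names, the `__INTERVAL__` placeholder, the template file, and the `OVERLORD_URL` default (port 8090) are all hypothetical, the endpoint path is assumed to be the indexing service's task endpoint, and `day_interval` relies on GNU `date`.

```bash
# Sketch of a nightly re-ingestion driver; all names here are hypothetical.

# Print the one-day interval that starts at the given UTC date (GNU date only).
day_interval() {
  local day="$1"
  local next
  next=$(date -u -d "$day +1 day" +%Y-%m-%d) || return 1
  printf '%s/%s\n' "$day" "$next"
}

# Fill the __INTERVAL__ placeholder of a batch task template ($1) with $2.
render_task() {
  sed "s|__INTERVAL__|$2|g" "$1"
}

# POST a task spec read from stdin to the indexing service.
# Assumption: the overlord listens at OVERLORD_URL and accepts tasks on this path.
submit_task() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data-binary @- "${OVERLORD_URL:-http://localhost:8090}/druid/indexer/v1/task"
}
```

A nightly cron job could then run something like `render_task reingest-task.json.tmpl "$(day_interval "$(date -u -d yesterday +%Y-%m-%d)")" | submit_task`, so that each run replaces the previous day's streamed data with the batch result.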
updated
I have no other comments.
f82e1c7 to 067bfda
Completely rework the Druid getting started process
This PR depends on some CSS changes to Druid docs which are coming in a separate PR. The updated pages will not render correctly without those changes.
This will rework the Druid getting started process to be very similar to Imply's recommended getting started process, which was mostly written by @gianm. The packaging of Druid will also be similar to what Imply is doing.