Completely rework the Druid getting started process #2216

Merged: 1 commit, Feb 4, 2016

fjy (Contributor) commented Jan 6, 2016

  • rejigger the distribution packaging to make more sense
  • new quickstart tutorial
  • new load batch data tutorial
  • new load streaming data tutorial
  • new load kafka tutorial
  • new clustering tutorial
  • new streaming ingestion overview page
  • new stream push page
  • new stream pull page
  • lots of new information about using Druid and Tranquility
  • added a new query optimization page
  • added a new caching info page

This PR depends on some CSS changes to Druid docs which are coming in a separate PR. The updated pages will not render correctly without those changes.

This will rework the Druid getting started process to be very similar to Imply's recommended getting started process, which was mostly written by @gianm. The packaging of Druid will also be similar to what Imply is doing.

navis (Contributor) commented Jan 7, 2016

I love this one. 👍


You will need:

* Java 7 or better
Contributor

s/better/higher

gianm (Contributor) commented Jan 7, 2016

@fjy could you gzip examples/quickstart/wikiticker-2015-09-12-sampled.json? There's not much reason to have it there as a text file.

fjy (Contributor Author) commented Jan 7, 2016

@himanshug @navis @gianm @pjain1 added clustering docs. More changes to come.


## Tune Druid Brokers

Druid Brokers also benefit greatly from being tuned to the hardware it
Contributor

"they run on"?

fjy (Contributor Author) commented Feb 3, 2016

@himanshug addressed comments


```bash
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o $zookeeper-3.4.6.tar.gz
tar xzf $zookeeper-3.4.6.tar.gz
```

Contributor

I don't see why there's a $ in front of zookeeper-3.4.6.tar.gz on this line and the one before.

Contributor Author

that should be removed

```bash
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/* io.druid.cli.Main server coordinator
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/* io.druid.cli.Main server overlord
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/* io.druid.cli.Main server middleManager
```
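
A small illustration of the `$` problem raised on the ZooKeeper download lines earlier in this thread. It assumes a shell in which no variable named `zookeeper` is set (simulated here with an empty assignment); in that case the shell treats `$zookeeper` as a variable reference and the intended file name loses its prefix.

```bash
# Assumption: `zookeeper` is not set in the user's shell; an empty value stands in for that here.
zookeeper=""

# As written in the diff: `$zookeeper` expands to nothing.
broken="$zookeeper-3.4.6.tar.gz"

# With the `$` removed, the literal file name survives.
fixed="zookeeper-3.4.6.tar.gz"

echo "with \$:    $broken"
echo "without \$: $fixed"
```

Under that assumption, `curl -o` and `tar xzf` would both operate on a file named `-3.4.6.tar.gz` rather than on the intended archive name.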
Contributor

I guess most people trying this will know to put these each in background or run each in a different window or whatever, but it's tempting to cut/paste this whole thing to execute...
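
One possible way to make the block safe to paste, sketched under assumptions: wrap each command in a helper that backgrounds it. `start_bg` is a hypothetical helper, not part of Druid, and the `.log` and `.pid` file names are illustrative only.

```bash
# Hypothetical helper (not part of Druid): run a command in the background,
# send its output to <name>.log, and record its PID in <name>.pid.
start_bg() {
  name="$1"
  shift
  "$@" > "$name.log" 2>&1 &
  echo "$!" > "$name.pid"
}

# Intended use with the quickstart commands quoted above (needs a Druid install):
#   start_bg coordinator java $(cat conf-quickstart/druid/coordinator/jvm.config | xargs) \
#     -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" \
#     io.druid.cli.Main server coordinator
```

A process started this way could later be stopped with `kill "$(cat coordinator.pid)"`. Running each command in its own terminal window, as the comment suggests, works just as well.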

fjy (Contributor Author) commented Feb 3, 2016

@rasahner addressed comments

pjain1 (Member) commented Feb 3, 2016

👍

We recommend this kind of architecture if you need real-time analytics but *also* need 100% fidelity
for historical data. All streaming ingestion methods currently supported by Druid do introduce the
possibility of dropped or duplicated messages in certain failure scenarios, and batch re-ingestion
eliminates this potential source of error for historical data. This also gives you the option to
Contributor

The first part of the "also" isn't really an "also": necessary re-ingestion because of possible errors is exactly what has been discussed. I'd replace both sentences with something like:
"Hybrid streaming also gives you the option to re-ingest your data if you needed to revise it for any reason."

rasahner (Contributor) commented Feb 3, 2016

+1 when author thinks it is ready.

- [Streams-based tutorial](tutorial-streams.html) showing you how to push data over HTTP.
- [Kafka-based tutorial](tutorial-kafka.html) showing you how to load data from Kafka.

## Hybrid batch/streaming
Contributor

Sorry if my comments were confusing. Here's my recommended text for this whole section. I think it's not necessary to say anything right here about queries not caring how the data was ingested - it potentially adds more confusion than it takes away.

You can combine batch and streaming methods in a hybrid batch/streaming architecture. In a hybrid architecture, you use a streaming method to do initial ingestion, and then periodically re-ingest older data in batch mode (typically every few hours, or nightly). When Druid re-ingests data for a time range, the new data automatically replaces the data from the earlier ingestion.

All streaming ingestion methods currently supported by Druid do introduce the possibility of dropped or duplicated messages in certain failure scenarios, and batch re-ingestion eliminates this potential source of error for historical data.

Batch re-ingestion also gives you the option to re-ingest your data if you needed to revise it for any reason.
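
As a rough sketch of the periodic batch step in such a hybrid setup: a scheduled job could submit a batch indexing task to the overlord over HTTP. Everything specific here is an assumption for illustration: the overlord address `http://localhost:8090`, the task endpoint path, the helper name `submit_batch_task`, the `DRY_RUN` and `OVERLORD_URL` variables, and the spec file name `reindex-task.json`.

```bash
# Assumption: the overlord listens on localhost:8090 unless OVERLORD_URL says otherwise.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"

# Hypothetical helper: POST a batch indexing task spec to the overlord.
# With DRY_RUN set to a non-empty value, the curl command is printed instead of run.
submit_batch_task() {
  spec="$1"
  ${DRY_RUN:+echo} curl -X POST -H 'Content-Type: application/json' \
    -d "@$spec" "$OVERLORD_URL/druid/indexer/v1/task"
}

# Example (hypothetical spec file covering the time range to re-ingest):
#   submit_batch_task reindex-task.json
```

Scheduling is left to whatever the deployment already uses (cron, a workflow tool, and so on); the interval covered by the task spec determines which previously streamed data gets replaced.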

Contributor Author

updated

rasahner (Contributor) commented Feb 3, 2016

I have no other comments.

fjy force-pushed the new-tutorials branch 2 times, most recently from f82e1c7 to 067bfda, February 4, 2016 01:49
fjy added a commit that referenced this pull request Feb 4, 2016: "Completely rework the Druid getting started process"
fjy merged commit 7abad74 into master Feb 4, 2016
fjy deleted the new-tutorials branch February 4, 2016 18:43
fjy mentioned this pull request Feb 5, 2016
9 participants