First refactor of compaction docs #10935

Merged
merged 18 commits into from
Mar 24, 2021

Contributor

@techdocsmith techdocsmith commented Mar 2, 2021

#10897

First pass refactor / update of compaction docs

Updates to "Data management" topic as follows:

  • Adds an introduction that describes the content in the topic.
  • Removes a duplicated section about "Schema changes" and leaves it in design/segments.md

Adds a new topic "Compaction" that defines compaction and automatic compaction as a strategy for segment optimization.

Repairs links for the refactor above.

This PR doesn't handle the remaining task of identifying reindexing and compaction as data management tasks for existing data and comparing the use cases between the two. This should come in a subsequent PR.

cc: @maytasm, @suneet-s, @loquisgon, @sthetland


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.

Contributor

@2bethere 2bethere left a comment

Thanks for putting this up! I think it's a much-needed improvement. Added some comments.

Resolved review threads: docs/configuration/index.md (1, outdated), docs/ingestion/compaction.md (6, outdated)

@sthetland sthetland left a comment

Comments & suggestions below. Looks good though! It makes compaction clearer.

Resolved review threads: docs/configuration/index.md (2, outdated), docs/ingestion/compaction.md (7, outdated), docs/ingestion/index.md (1, outdated)
Contributor

suneet-s commented Mar 11, 2021

I echo others' comments on this PR. This is a huge improvement, thank you @techdocsmith! I haven't verified the correctness of how exactly compaction works, or the details of the different tuning knobs.

Some overall structural feedback (doesn't need to be addressed in this PR):

  • I think the data management doc should be broken into a few separate docs. With compaction pulled out of there, data management feels like it would be a good landing page that then points you to "getting data in", "Optimizing data", "Updating data" (maybe), and "Deleting data". This is obviously beyond the scope of this PR, but I think it's worth mentioning because it adds structure around how to think about data and managing data in Druid.
  • Data management also talks about lookups, while the rest of the doc talks about datasources. This seemed a little out of place when I was reading locally. I don't have a suggestion for how to structure this right now, but wanted to surface it in case you had better ideas.
  • The compaction page currently talks about the what. I wonder if it needs to be split into 2 pages (or sections): one that spells out the "why should I care / I want to do..." a little bit more, and another that spells out "how do I do that". Maybe they can be intertwined in the same page?
  • I really like the distinction between auto-compaction and manual compaction. However, the page doesn't link to anything that tells me how to use auto-compaction, but it does link to something about manual compaction. Are there instructions for auto-compaction elsewhere?
  • There are some known differences between auto-compaction and manual compaction. Support for queryGranularity is one right now. Do you think we should call this out in the section that talks about the differences between the two? This is tricky because it's a gap in functionality, but it's a gotcha I think users will want to know about.

Resolved review threads: docs/configuration/index.md (2, of which 1 outdated), docs/ingestion/compaction.md (5, outdated), docs/ingestion/data-management.md (3)
@suneet-s
Contributor

Docs failure looks legit

Could not find self anchor '#compaction-tuningconfig' in './build/ApacheDruid/docs/configuration/index.html'
Could not find './native_batch.md' linked from './build/ApacheDruid/docs/ingestion/compaction.html'
Could not find '../native-batch.md' linked from './build/ApacheDruid/docs/ingestion/compaction.html'
Could not find '../data-management.md' linked from './build/ApacheDruid/docs/ingestion/index.html'
Could not find '../compaction.md' linked from './build/ApacheDruid/docs/ingestion/index.html'
There are 5 issues
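
For readers puzzling over the '../compaction.md' and '../data-management.md' entries: both files are referenced elsewhere in this thread as siblings of docs/ingestion/index.md, so a leading '../' resolves one directory too high. The following is a minimal sketch of that path arithmetic, not the link checker used by the Druid docs build; the function names and the file set are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import posixpath


def resolve_link(source_page: str, link: str) -> str:
    """Resolve a relative link against the directory of the page containing it."""
    base_dir = posixpath.dirname(source_page)
    return posixpath.normpath(posixpath.join(base_dir, link))


def broken_links(pages: dict[str, list[str]], existing: set[str]) -> list[tuple[str, str]]:
    """Return (page, link) pairs whose resolved target is not in `existing`."""
    return [
        (page, link)
        for page, links in pages.items()
        for link in links
        if resolve_link(page, link) not in existing
    ]


# Hypothetical file set, limited to paths mentioned in this thread.
existing = {
    "docs/ingestion/index.md",
    "docs/ingestion/compaction.md",
    "docs/ingestion/data-management.md",
    "docs/configuration/index.md",
}

# '../compaction.md' climbs out of docs/ingestion; './compaction.md' stays inside it.
pages = {"docs/ingestion/index.md": ["../compaction.md", "./compaction.md"]}

for page, link in broken_links(pages, existing):
    print(f"Could not find '{link}' linked from '{page}'")
```

Under these assumptions only the '../' form is reported, which matches the shape of the failures listed above.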

@techdocsmith
Contributor Author

Apologies for that, @suneet-s. I fixed links and spelling in a later commit.

Resolved review threads: docs/configuration/index.md (2, outdated), docs/ingestion/compaction.md (4, of which 3 outdated)
@maytasm maytasm merged commit d69533d into apache:master Mar 24, 2021
@clintropolis clintropolis changed the title from "First refactor of compaction" to "First refactor of compaction docs" Aug 12, 2021
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
6 participants