Add configurable final stages to MSQ ingestion queries #16699

Open
wants to merge 17 commits into
base: master

Contributor

@adarshsanjeev adarshsanjeev commented Jul 8, 2024

Description

Modifies DataSourceMSQDestination to carry a TerminalStageSpec, which tells the controller how segments are created. This lets a query configure how segments are created once the rest of the query has finished running.
It also makes it possible to add new TerminalStageSpec implementations later, so that a query can produce other results instead of generating segments.

The PR also adds a SegmentGenerationStageSpec, used by default, which makes queries create new segments from the query results. The SegmentGenerationStageSpec for ingestion is serialized as:

{"type":"segmentGeneration"}

This PR should have no functional impact on its own; it is groundwork for other features in the future.

Backward Compatibility

The changes are backward compatible. When no terminal stage spec is supplied, DataSourceMSQDestination defaults to SegmentGenerationStageSpec, since ingestions normally require a segment generation stage. If indexers/MiddleManagers are upgraded first, the missing stage spec automatically deserializes to segment generation.
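A minimal sketch of the defaulting behavior described above, in plain Java. The class names `TerminalStageSpecSketch`, `SegmentGenerationSpecSketch`, and `DestinationSketch`, and the data source name, are hypothetical stand-ins rather than the actual Druid classes, and the real code relies on JSON deserialization, which is omitted here. The only point illustrated is that a payload with no stage spec (represented as `null`) falls back to segment generation.

```java
import java.util.Objects;

// Hypothetical stand-in for the TerminalStageSpec named in the PR.
interface TerminalStageSpecSketch {
    String type();
}

// Singleton spec; the type name mirrors the serialized form {"type":"segmentGeneration"}.
final class SegmentGenerationSpecSketch implements TerminalStageSpecSketch {
    private static final SegmentGenerationSpecSketch INSTANCE = new SegmentGenerationSpecSketch();

    private SegmentGenerationSpecSketch() {}

    static SegmentGenerationSpecSketch instance() {
        return INSTANCE;
    }

    @Override
    public String type() {
        return "segmentGeneration";
    }
}

// Hypothetical stand-in for DataSourceMSQDestination, reduced to two fields.
final class DestinationSketch {
    private final String dataSource;
    private final TerminalStageSpecSketch terminalStageSpec;

    DestinationSketch(String dataSource, TerminalStageSpecSketch terminalStageSpec) {
        this.dataSource = Objects.requireNonNull(dataSource, "dataSource");
        // A payload written before the field existed carries no spec,
        // so fall back to segment generation.
        this.terminalStageSpec = terminalStageSpec != null
            ? terminalStageSpec
            : SegmentGenerationSpecSketch.instance();
    }

    String getDataSource() {
        return dataSource;
    }

    TerminalStageSpecSketch getTerminalStageSpec() {
        return terminalStageSpec;
    }
}

class DefaultTerminalStageDemo {
    public static void main(String[] args) {
        DestinationSketch legacy = new DestinationSketch("example_datasource", null);
        System.out.println(legacy.getTerminalStageSpec().type()); // prints "segmentGeneration"
    }
}
```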


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions (bot) added the labels "Area - Batch Ingestion", "Area - Querying", and "Area - MSQ" (For multi stage queries - https://github.com/apache/druid/issues/12262) on Jul 8, 2024
@adarshsanjeev changed the title from "Add a segmentMorphFactory to MSQ Datasource Destination" to "Add configurable final stages to MSQ ingestion queries" on Jul 17, 2024
DataSourceMSQDestination destination = (DataSourceMSQDestination) querySpec.getDestination();
TerminalStageSpec terminalStageSpec = destination.getTerminalStageSpec();
if (terminalStageSpec instanceof SegmentGenerationStageSpec) {
  return ((SegmentGenerationStageSpec) terminalStageSpec).constructFinalStage(queryId, queryDef, querySpec, jsonMapper);
Contributor

Can this be an interface method, with only the query def passed in?

Contributor

We can inject the jsonMapper.

Contributor Author

We would also need the querySpec for the tuningConfig. Injecting the jsonMapper rather than passing it here would also mean SegmentGenerationStageSpec could no longer be a singleton class, since it would need a jsonMapper injected.
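The trade-off discussed in this thread can be sketched roughly as follows. This is an illustration only, not code from the PR: `QueryDefSketch`, `QuerySpecSketch`, `StageSpecSketch`, and `SegmentGenerationSketch` are hypothetical placeholder types, `rowsPerSegment` stands in for whatever the stage reads from the tuningConfig, and a `Function<Object, String>` stands in for the jsonMapper. It shows the interface-method variant in which the serializer is still passed as an argument, so the implementation can remain a stateless singleton and the call site needs no `instanceof` check.

```java
import java.util.function.Function;

// Hypothetical placeholder for the query definition; only a stage count is kept.
final class QueryDefSketch {
    final int stageCount;

    QueryDefSketch(int stageCount) {
        this.stageCount = stageCount;
    }
}

// Hypothetical placeholder for the query spec; rowsPerSegment stands in for the tuningConfig.
final class QuerySpecSketch {
    final int rowsPerSegment;

    QuerySpecSketch(int rowsPerSegment) {
        this.rowsPerSegment = rowsPerSegment;
    }
}

interface StageSpecSketch {
    // Interface method: each spec builds its own final stage.
    // The serializer is a parameter, so implementations hold no injected state.
    String constructFinalStage(
        QueryDefSketch queryDef,
        QuerySpecSketch querySpec,
        Function<Object, String> serializer
    );
}

final class SegmentGenerationSketch implements StageSpecSketch {
    static final SegmentGenerationSketch INSTANCE = new SegmentGenerationSketch();

    private SegmentGenerationSketch() {}

    @Override
    public String constructFinalStage(
        QueryDefSketch queryDef,
        QuerySpecSketch querySpec,
        Function<Object, String> serializer
    ) {
        // Needs both arguments: the definition for the existing stages,
        // the spec for the tuning value.
        return "segmentGeneration after " + queryDef.stageCount
            + " stages, rowsPerSegment=" + serializer.apply(querySpec.rowsPerSegment);
    }
}

class InterfaceMethodDemo {
    public static void main(String[] args) {
        StageSpecSketch spec = SegmentGenerationSketch.INSTANCE;
        // No instanceof check or cast at the call site.
        String stage = spec.constructFinalStage(
            new QueryDefSketch(2), new QuerySpecSketch(3000000), String::valueOf);
        System.out.println(stage); // prints "segmentGeneration after 2 stages, rowsPerSegment=3000000"
    }
}
```

If the serializer were instead injected through the constructor, `SegmentGenerationSketch` would need one instance per injected serializer, which is the singleton concern raised above.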
