Skip to content

Updating segment map function for QueryDataSource to ensure group by …#14112

Merged
clintropolis merged 3 commits intoapache:masterfrom
somu-imply:outer_join_groupby
Apr 20, 2023
Merged

Updating segment map function for QueryDataSource to ensure group by …#14112
clintropolis merged 3 commits intoapache:masterfrom
somu-imply:outer_join_groupby

Conversation

@somu-imply
Copy link
Contributor

@somu-imply somu-imply commented Apr 18, 2023

…of group by of join data source gets into proper segment map function path

A query such as

{
    "queryType": "groupBy",
    "dimensions": ["related_object_id", "r.category_id"],
    "dataSource":
    {
        "type": "query",
        "query":
        {
            "queryType": "groupBy",
            "dimensions": ["related_object_id","r.category_id"],
            "intervals": "2020-03-20/2020-03-21",
            "dataSource":
            {
                "type": "join",
                "left":
                {
                    "type": "inline",
                    "name": "DRUID_INLINE_DATASOURCE",
                    "columnNames":
                    [
                        "__time",
                        "related_object_id"
                    ],
                    "rows":
                    [
                        [1584662400000,"3229"],
                        [1584662400000,"4510"]
                    ]
                },
                "right":
                {
                    "type": "inline",
                    "columnNames":
                    ["related_object_id","category_id"],
                    "rows":
                    [
                        ["3229", 2985],
                        ["4510", 2985]
                    ]
                },
                "rightPrefix": "r.",
                "condition": "related_object_id == \"r.related_object_id\"",
                "joinType": "LEFT"
            },
            "granularity": "day",
            "filter":
            {
                "type": "in",
                "dimension": "related_object_id",
                "values":
                [
                    "3229",
                    "4510"
                ]
            }
        }
    },
    "intervals": "2020-03-20/2020-03-21",
    "granularity": "all",
}

was producing nulls as the Segment map function of Join was not called as the Inner Query Data source's segment map function was returning an identity function. Changed the query data source's segment map function which should delegate to the segment map function of the subquery's datasource. The test now passes.

Fixes #14088

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…of group by of join data source gets into proper segment map function path
@somu-imply somu-imply marked this pull request as ready for review April 18, 2023 22:54
@somu-imply
Copy link
Contributor Author

somu-imply commented Apr 19, 2023

Although I added the test, the change in processing might not be covered and there might be a lack of coverage. But the issue is fixed with this. The screenshot is attached.

Screen Shot 2023-04-19 at 1 39 39 PM

),
"j0.",
equalsCondition(
makeColumnExpression("v0"),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
"j0.",
equalsCondition(
makeColumnExpression("v0"),
makeColumnExpression("j0.dim1")

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
@imply-cheddar
Copy link
Contributor

If you additionally add a test to the QueryDataSourceTest that calls the segment MapFn method and validates that it returns the thing that its child returned (you should be able to do this relatively easily by passing in a child Query object that you can control), then you can get coverage for the lines as well.

Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as this is good for now. But please add the test to QueryDataSourceTest so that coverage robot is happy too.

@clintropolis clintropolis merged commit 8d60edc into apache:master Apr 20, 2023
clintropolis pushed a commit to clintropolis/druid that referenced this pull request Apr 20, 2023
apache#14112)

* Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path

* Adding unit tests for the failed case

* There you go coverage bot, be happy now
@clintropolis clintropolis added this to the 26.0 milestone Apr 20, 2023
clintropolis added a commit that referenced this pull request Apr 21, 2023
#14112) (#14134)

* Updating segment map function for QueryDataSource to ensure group by of group by of join data source gets into proper segment map function path

* Adding unit tests for the failed case

* There you go coverage bot, be happy now

Co-authored-by: Soumyava <93540295+somu-imply@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data from join not pushed to the outer groupby

3 participants