docs: add tutorial with examples of sql null handling #16185

Merged 17 commits on Apr 1, 2024


techdocsmith
Contributor

Description

  • Adds a tutorial to demonstrate the basics of null handling in Apache Druid.
  • Updates the description of null values to prioritize the default behavior.
  • Calls out the description of the legacy mode and adds an info box noting that it is going away.

This PR has:

  • [x] been self-reviewed.
  • [x] been tested in a test Druid cluster.

Before starting this tutorial, download and run Apache Druid on your local machine as described in
the [Local quickstart](index.md).

The tutorial assumes you are familiar with using the [Query view](./tutorial-sql-query-view.md) to ingest and query data.
Member

I wonder if we need to call out that this tutorial expects `druid.generic.useDefaultValueForNull` and `druid.generic.useThreeValueLogicForNativeFilters` to be set to their default values of `false` and `true`, respectively, or if that would be confusing.

Contributor Author

Added a comment without detail, since we don't want folks changing it if they're just getting started.

runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, maintain the default value of `false`.

There is some performance impact for null handling. See [segment internals](../design/segments.md#handling-null-values) for more information.
For examples of null handling, see the [null handling tutorial](../tutorials/tutorial-ansi-sql-null.md).
Member

Link seems wrong; the filename is `tutorial-sql-null.md`.

docs/querying/sql-data-types.md: four outdated review threads (resolved)
For example, the following expressions are equivalent:
- col IS NULL
- col = ''
Both evaluate to true if col contains an empty string.
Contributor

Needs white spaces
[screenshot]
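
For contrast with the equivalence in the quoted snippet, here is a minimal sketch of how the same two expressions behave under standard SQL three-valued logic, where an empty string and NULL are distinct values. It uses Python's built-in `sqlite3` module purely as a stand-in for SQL-compliant semantics; it is not Druid, and the table `t` and column `col` are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database: a stand-in for SQL-compliant null handling, not Druid.
conn = sqlite3.connect(":memory:")

# Hypothetical one-column table holding an empty string and a NULL.
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("",), (None,)])

# Evaluate both expressions for each row, in insertion order.
rows = conn.execute(
    "SELECT col IS NULL AS is_null, col = '' AS is_empty FROM t ORDER BY rowid"
).fetchall()

# First tuple is the empty-string row, second is the NULL row.
print(rows)  # [(0, 1), (1, None)]
```

For the empty string, `col IS NULL` is false and `col = ''` is true. For the NULL row, `col = ''` evaluates to NULL (unknown) rather than true, so the two expressions are not interchangeable under these semantics.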


## Load data with null values

The tutorial loads some data with null values for string and numeric columns as follows:
Contributor

The tutorial doesn't load data for them. Rather, they need to load data for the tutorial, right?

SELECT * FROM "null_example"
```

|`__time`|`title`|`string_value`|`numeric_value`|
Contributor

The table has empty columns:
[screenshot]


Druid returns 2 for "another_value" and the empty string "". The null value is not counted.

Note that the null value is included in COUNT(*) but not as a count of the values in the column as follows:
Contributor

Code font for `COUNT(*)`, since it's a specific instance of COUNT and not COUNT as a function in general.
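
The difference between `COUNT(*)` and a count of a column's values can be sketched outside Druid. The example below is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for SQL-compliant null handling, and the four rows are an assumed reconstruction of the tutorial data (the `example_N` titles are hypothetical), not the exact ingested data set.

```python
import sqlite3

# In-memory SQLite database: a stand-in for SQL-compliant null handling, not Druid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE null_example (title TEXT, string_value TEXT)")

# Assumed rows: two regular strings, one empty string, and one NULL.
conn.executemany(
    "INSERT INTO null_example VALUES (?, ?)",
    [
        ("example_1", "some_value"),
        ("example_2", "another_value"),
        ("example_3", ""),
        ("example_4", None),
    ],
)

# COUNT(*) counts rows; COUNT(column) counts only non-null values in that column.
count_all, count_col = conn.execute(
    'SELECT COUNT(*), COUNT("string_value") FROM null_example'
).fetchone()

print(count_all, count_col)  # 4 3
```

The empty string counts as a value, so only the NULL row accounts for the difference between the two results.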

GROUP BY 1
```

Druid returns the following data:
Contributor

Your results above don't have an intro sentence like this, and a subsequent intro sentence is slightly different, but that's not a huge deal.


The resulting data set only includes two rows. Druid has filtered out example 1 (`some_value`) and example 4 (`null`):

|`__time`|`title`|`string_value`|`numeric_value`|
Contributor

empty columns:
[screenshot]
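
The filtering result quoted in this thread (example 1 and example 4 both dropped) is what an inequality filter produces under three-valued logic. The sketch below assumes the filter is `string_value <> 'some_value'`, which is not visible in the quoted snippet. It again uses Python's built-in `sqlite3` module as a stand-in rather than Druid, with hypothetical `example_N` titles.

```python
import sqlite3

# In-memory SQLite database: a stand-in for SQL-compliant null handling, not Druid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE null_example (title TEXT, string_value TEXT)")

# Assumed rows: two regular strings, one empty string, and one NULL.
conn.executemany(
    "INSERT INTO null_example VALUES (?, ?)",
    [
        ("example_1", "some_value"),
        ("example_2", "another_value"),
        ("example_3", ""),
        ("example_4", None),
    ],
)

# Assumed filter. For the NULL row the comparison is unknown, not true,
# so WHERE drops that row along with the row that equals 'some_value'.
filtered = conn.execute(
    "SELECT title, string_value FROM null_example "
    "WHERE string_value <> 'some_value' ORDER BY title"
).fetchall()

print(filtered)  # [('example_2', 'another_value'), ('example_3', '')]
```

Only the `another_value` row and the empty-string row remain, matching the two-row result described in the snippet.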

techdocsmith and others added 5 commits March 26, 2024 12:10
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Contributor

317brian left a comment

LGTM!

317brian changed the title from "add tutorial with examples of sql null handling" to "docs: add tutorial with examples of sql null handling" on Apr 1, 2024
317brian merged commit 1aa6808 into apache:master on Apr 1, 2024
12 checks passed
adarshsanjeev added this to the 30.0.0 milestone on May 6, 2024