Description
Since my last feature request, the bitwise operators have been added. These are VERY helpful and work almost perfectly.
The problem now is that using them always requires a virtual column. For example:
A simple query like:
SELECT * FROM "table" WHERE BITWISE_AND("flags", 4) = 0
is converted into:
{
"queryType": "scan",
...
"virtualColumns": [
{
"type": "expression",
"name": "v0",
"expression": "bitwiseAnd(\"flags\",4)",
"outputType": "LONG"
}
],
...
"filter": {
"type": "selector",
"dimension": "v0",
"value": "0",
"extractionFn": null
},
...
}
As you can see, the bitwise expression can be used here because the query defines a virtual column.
However, Druid has other places where records can be filtered, for example the DruidInputSource and the Filtered Aggregator.
In these places it is not possible to use a virtual column, and thus we cannot use the bitwise expressions.
Example: assume a table where we store the roles a user can have in a "roles" field. This field is an integer in which each bit represents a role.
Thus:
1 = Content moderator
2 = Financial user
4 = Data analyst
8 = Sales
... etc
A person can of course have multiple roles, so somebody with a "roles" value of 5 is both a Content moderator and a Data analyst.
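To make the bit layout concrete, here is a small Python sketch of the role check described above. The `Role` enum and the `has_role` helper are hypothetical names used only for illustration; they are not part of Druid.

```python
from enum import IntFlag


class Role(IntFlag):
    # Bit values taken from the list above; names are illustrative only.
    CONTENT_MODERATOR = 1
    FINANCIAL_USER = 2
    DATA_ANALYST = 4
    SALES = 8


def has_role(roles: int, role: Role) -> bool:
    """Return True when the bit for `role` is set in the `roles` bitmask."""
    return (roles & role) == role


roles = 5  # binary 0101: Content moderator (1) + Data analyst (4)
print(has_role(roles, Role.CONTENT_MODERATOR))
print(has_role(roles, Role.DATA_ANALYST))
print(has_role(roles, Role.FINANCIAL_USER))
```

This is the same test that `BITWISE_AND("roles", 4) = 4` expresses in Druid SQL.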
Now, back to Druid.
When I want to create a new Druid dataSource and fill it using a native batch ingestion task with data from another Druid dataSource, I cannot filter on "roles" the way I want to. See the "filter" part below:
{
"type": "index_parallel",
"spec": {
"dataSchema": {
"dataSource": "dataAnalystUsers",
"timestampSpec": { ... },
"dimensionsSpec": { ... },
"metricsSpec": [ ... ],
"granularitySpec": { ... }
},
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "druid",
"dataSource": "users",
"interval": "2022-02-12/P1D",
"filter": {
"type": "????",
"dimension": "roles",
"value": 4
}
}
},
"tuningConfig": { ... }
}
}
As you can see, there is no way to filter here using the bitwise expressions. My suggestion is to create native filter types for the bitwise expressions.
Maybe something like the example below, but this is only an idea:
{
"filter": {
"type": "bitwiseAnd",
"dimension": "<DIMENSION>",
"value": "<VALUE>",
"expected": "<VALUE>"
}
}
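The intended semantics of this proposed filter could be sketched in Python as follows. This is only an illustration of the idea, not Druid code: the function name `bitwise_and_filter` is hypothetical, and dropping rows where the dimension is missing is an assumption, since the proposal does not say how nulls should be handled.

```python
def bitwise_and_filter(row: dict, dimension: str, value: int, expected: int) -> bool:
    """Keep a row when (row[dimension] & value) == expected.

    Mirrors the proposed "bitwiseAnd" filter fields: dimension, value, expected.
    """
    raw = row.get(dimension)
    if raw is None:
        # Assumption: rows without the dimension never match.
        return False
    return (int(raw) & value) == expected


# Hypothetical sample rows from the "users" dataSource.
users = [
    {"name": "a", "roles": 5},  # Content moderator + Data analyst
    {"name": "b", "roles": 2},  # Financial user
    {"name": "c", "roles": 12}, # Data analyst + Sales
]

# Select all data analysts: value = 4, expected = 4.
analysts = [u["name"] for u in users if bitwise_and_filter(u, "roles", 4, 4)]
print(analysts)
```

With `expected` set to 0, the same filter would select rows where the bit is not set, which matches the `BITWISE_AND("flags", 4) = 0` query at the top of this issue.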