
The map implementation no longer converts empty strings after the update from 0.20.0 to 0.20.1 #10859

Closed
odajun opened this issue Feb 5, 2021 · 4 comments · Fixed by #10869

Comments


odajun commented Feb 5, 2021

The map implementation no longer converts empty strings after the update from 0.20.0 to 0.20.1

Affected Version

0.20.1

Description

In our environment, the results of the following query changed after the upgrade.

{
  "queryType": "topN",
  "dataSource": "data_source_name",
  "intervals":  ["2018-01-07T15:00:00.000Z/2018-01-14T15:00:00.000Z"],
  "granularity": "all",
  "aggregations": [{"type": "longSum", "name": "value", "fieldName": "field_name"}],
  "dimension": {
    "type": "extraction", "dimension": "dimension", "outputName": "output_name", "outputType": "STRING",
    "extractionFn": {
      "type": "lookup",
      "lookup": {
        "type": "map",
        "map": { "", "(empty set)"},
        "isOneToOne": false
      },
      "retainMissingValue": true,
      "replaceMissingValueWith": null
    }
  },
  "threshold": 3,
  "metric": "value",
  "segments": []
}

Expected result

output_name  value
(empty set)  10000
aaa          9000
bbb          8000

Actual result

output_name  value
null         10000
aaa          9000
bbb          8000
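As a sketch of the semantics the query relies on (hypothetical class and method names, not Druid's actual MapLookupExtractor API): with retainMissingValue=true, a dimension value that appears as a key in the map is replaced by its mapped value, and any other value passes through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the map-lookup extractionFn semantics; the class and
// method names are illustrative, not Druid's actual MapLookupExtractor API.
public class MapLookupSketch {
    private static final Map<String, String> LOOKUP = new HashMap<>();
    static {
        // The single mapping from the query above: empty string -> "(empty set)".
        LOOKUP.put("", "(empty set)");
    }

    // With retainMissingValue=true, a value absent from the map passes
    // through unchanged instead of being replaced.
    static String apply(String dimensionValue) {
        String mapped = LOOKUP.get(dimensionValue);
        return mapped != null ? mapped : dimensionValue;
    }

    public static void main(String[] args) {
        System.out.println(apply(""));    // "(empty set)" -- the expected result above
        System.out.println(apply("aaa")); // "aaa" -- retained as-is
    }
}
```

Under these semantics the empty string should map to "(empty set)", which is why the actual result of null is surprising.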

We believe the cause is the following update:
ae4b192#diff-a6471b0e75ec5f881cceafb6186%5B%E2%80%A6%5De04e6fe4e8a1fca21674bf9a5R64-R112

Contributor

suneet-s commented Feb 5, 2021

For the person looking into this: a similar problem with this type of extraction function was raised in #4301. I didn't dig enough to know whether they're related, but I thought I'd link them.

It would probably be good to add an integration test to the query tests as part of this bug fix.

@Gabriel39

You are right, this is caused by ae4b192#diff-a6471b0e75ec5f881cceafb6186%5B%E2%80%A6%5De04e6fe4e8a1fca21674bf9a5R64-R112

It seems that the introspected representation of org.apache.druid.query.extraction.MapLookupExtractor is a com.fasterxml.jackson.databind.introspect.AnnotatedClass, so the expression ac instanceof AnnotatedParameter in org.apache.druid.guice.GuiceAnnotationIntrospector is always false in this case.
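For readers unfamiliar with Jackson's introspection types, a minimal standalone analogue (hypothetical stand-in classes, not Jackson's real hierarchy) of why that instanceof check can never match when the introspector is handed the class-level wrapper rather than a constructor parameter:

```java
// Hypothetical stand-ins for Jackson's introspection hierarchy; the real
// classes live in com.fasterxml.jackson.databind.introspect, where
// AnnotatedClass and AnnotatedParameter are sibling subtypes of Annotated.
abstract class Annotated {}
class AnnotatedClass extends Annotated {}
class AnnotatedParameter extends Annotated {}

public class IntrospectionSketch {
    // Mirrors the shape of the check described in GuiceAnnotationIntrospector.
    static boolean isParameter(Annotated ac) {
        return ac instanceof AnnotatedParameter;
    }

    public static void main(String[] args) {
        // A class-level wrapper is never an AnnotatedParameter, so the branch
        // guarded by this check is skipped during class introspection.
        System.out.println(isParameter(new AnnotatedClass()));     // false
        System.out.println(isParameter(new AnnotatedParameter())); // true
    }
}
```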

@abhishekagarwal87
Contributor

@odajun Thanks for filing this bug. Can you answer a few questions?

  • What value is the property druid.generic.useDefaultValueForNull set to?
  • Are you setting this property via common.runtime.properties or via a system property?
  • Can you share the settings in common.runtime.properties?
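For context on why this property matters here, a hedged sketch (plain Java with illustrative names, not Druid's actual NullHandling code): as we understand it, when druid.generic.useDefaultValueForNull is true, Druid treats null strings and the empty string as interchangeable, so whether the map entry keyed on "" matches null dimension values can depend on this setting.

```java
// Illustrative sketch only: this is not Druid's actual NullHandling code,
// just a model of the documented effect of druid.generic.useDefaultValueForNull.
public class NullCoercionSketch {
    // In "default value" mode (useDefaultValueForNull=true), null strings are
    // coerced to the empty string, so a map entry keyed on "" can also match
    // rows whose dimension value is null.
    static String nullToEmptyIfNeeded(String value, boolean useDefaultValueForNull) {
        if (useDefaultValueForNull && value == null) {
            return "";
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(nullToEmptyIfNeeded(null, true));  // ""
        System.out.println(nullToEmptyIfNeeded(null, false)); // null
    }
}
```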

@jihoonson jihoonson added this to the 0.21.0 milestone Feb 8, 2021
Author

odajun commented Feb 9, 2021

@abhishekagarwal87 Thanks for checking.

what value is the property druid.generic.useDefaultValueForNull set to?
are you setting this property via common.runtime.properties or via a system property?

We don't set this property.

can you share settings in common.runtime.properties?

druid.extensions.loadList=["druid-datasketches", "druid-lookups-cached-global", "mysql-metadata-storage", "druid-hdfs-storage", "druid-kafka-indexing-service"]
druid.startup.logging.logProperties=true
druid.zk.service.host=hostA,hostB,hostC,hostD,hostE
druid.zk.paths.base=/druid

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql:mysql_uri:port_num/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=pass

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

druid.storage.type=hdfs
druid.storage.storageDirectory=storage_hdfs_path

druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=indexer_log_hdfs_path
druid.indexer.logs.kill.enabled=true
druid.indexer.logs.kill.durationToRetain=86400000
druid.indexer.logs.kill.delay=3600000

druid.hadoop.security.kerberos.principal=principal_name
druid.hadoop.security.kerberos.keytab=keytab_file

druid.indexing.doubleStorage=double
druid.server.hiddenProperties=["druid.metadata.storage.connector.password"]
druid.lookup.enableLookupSyncOnStartup=false
