
SQL firehose error #9359

Closed
saulfrank opened this issue Feb 13, 2020 · 6 comments · Fixed by #9365
@saulfrank

SQL firehose fails with: org.apache.druid.segment.realtime.firehose.SqlFirehoseFactory cannot be cast to org.apache.druid.data.input.FiniteFirehoseFactory

Affected Version

0.17.0

Description

Ran this command (spec below):
bin/post-index-task --file postgresql-test.json --url http://localhost:8081

Got this error:

2020-02-13T17:52:53,628 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in NOT_STARTED.
java.lang.ClassCastException: org.apache.druid.segment.realtime.firehose.SqlFirehoseFactory cannot be cast to org.apache.druid.data.input.FiniteFirehoseFactory
	at org.apache.druid.indexing.common.task.IndexTask$IndexIOConfig.getNonNullInputSource(IndexTask.java:1148) ~[druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:477) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

All database connection details were correct and networking was working fine.
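The ClassCastException itself is just the JVM reporting that, as shipped in 0.17.0, SqlFirehoseFactory does not implement the FiniteFirehoseFactory interface that IndexTask requires (the linked fix, #9365, addresses this). A minimal standalone sketch of the same failure mode, using stand-in types rather than the real Druid classes:

```java
// Stand-in types mirroring the Druid classes involved; these are
// illustrative only, not the real Druid definitions.
interface FiniteFirehoseFactory {}

// In 0.17.0 the real SqlFirehoseFactory did not implement FiniteFirehoseFactory.
class SqlFirehoseFactory {}

public class CastDemo {
    // Returns true if casting the object to FiniteFirehoseFactory throws,
    // i.e. the same ClassCastException seen in the task log above.
    static boolean castFails(Object factory) {
        try {
            FiniteFirehoseFactory f = (FiniteFirehoseFactory) factory;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new SqlFirehoseFactory()));
    }
}
```

The cast compiles fine (any Object may be cast to an interface type), so the problem only surfaces at runtime when the task starts.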

Using this spec

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "some_datasource",
      "parser": {
        "parseSpec": {
          "format": "timeAndDims",
          "dimensionsSpec": {
            "dimensionExclusions": [],
            "dimensions": [
              "dim1",
              "dim2"
            ]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "ts"
          }
        }
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": null
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://<location>:5432/db",
            "user": "user",
            "password": "password"
          }
        },
        "sqls": [
          "SELECT * FROM some_table"
        ]
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}
@fjy (Contributor) commented Feb 13, 2020

The SQL firehose is community contributed and we recommend that you don't use it in any real workload.

@saulfrank (Author)

@fjy We do a small daily batch upload, which I wouldn't consider a "real workload". Dumping the data to CSV (where date and number types are lost), moving it to storage/Kafka, and then ingesting feels like overkill to create and manage, especially with several data streams. I was looking at Airflow, but that is a lot of steps. Database connectivity would make Druid far more useful and accessible in general.

@fjy (Contributor) commented Feb 13, 2020

@saulfrank Absolutely agree. In fact, we very much plan to rework the SQL firehose to be more production-ready.

@jihoonson (Contributor)

Hi @saulfrank, the SQL firehose with a single SQL statement is currently processed by one worker task. Would you try the "index" task instead?
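jihoonson's suggestion amounts to swapping the parallel task for the plain index task, i.e. changing the task and ioConfig types. A sketch of the spec under that change (untested against 0.17.0; the firehose block is copied from the spec above, and the unchanged dataSchema is omitted):

```json
{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://<location>:5432/db",
            "user": "user",
            "password": "password"
          }
        },
        "sqls": ["SELECT * FROM some_table"]
      }
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}
```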

@saulfrank (Author)

@jihoonson I tried that too and it gave the same error message.

@jihoonson (Contributor)

@saulfrank 😢 thanks for trying it out.
