
SQL firehose error #9359

Closed
saulfrank opened this issue Feb 13, 2020 · 6 comments · Fixed by #9365
@saulfrank

SQL firehose fails with: org.apache.druid.segment.realtime.firehose.SqlFirehoseFactory cannot be cast to org.apache.druid.data.input.FiniteFirehoseFactory

Affected Version

0.17.0

Description

Ran this command (spec below):
bin/post-index-task --file postgresql-test.json --url http://localhost:8081

Got this error:

2020-02-13T17:52:53,628 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in NOT_STARTED.
java.lang.ClassCastException: org.apache.druid.segment.realtime.firehose.SqlFirehoseFactory cannot be cast to org.apache.druid.data.input.FiniteFirehoseFactory
	at org.apache.druid.indexing.common.task.IndexTask$IndexIOConfig.getNonNullInputSource(IndexTask.java:1148) ~[druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:477) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

All database connection details were correct and networking was working fine.
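The ClassCastException itself is just the JVM reporting that, as shipped in 0.17.0, SqlFirehoseFactory does not implement the FiniteFirehoseFactory interface that IndexTask requires (the linked fix, #9365, addresses this). A minimal standalone sketch of the same failure mode, using stand-in types rather than the real Druid classes:

```java
// Stand-in types mirroring the Druid classes involved; these are
// illustrative only, not the real Druid definitions.
interface FiniteFirehoseFactory {}

// In 0.17.0 the real SqlFirehoseFactory did not implement FiniteFirehoseFactory.
class SqlFirehoseFactory {}

public class CastDemo {
    // Returns true if casting the object to FiniteFirehoseFactory throws,
    // i.e. the same ClassCastException seen in the task log above.
    static boolean castFails(Object factory) {
        try {
            FiniteFirehoseFactory f = (FiniteFirehoseFactory) factory;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new SqlFirehoseFactory()));
    }
}
```

The cast compiles fine (any Object may be cast to an interface type), so the problem only surfaces at runtime when the task starts.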

Using this spec

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "some_datasource",
      "parser": {
        "parseSpec": {
          "format": "timeAndDims",
          "dimensionsSpec": {
            "dimensionExclusions": [],
            "dimensions": [
              "dim1",
              "dim2"
            ]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "ts"
          }
        }
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": null
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://<location>:5432/db",
            "user": "user",
            "password": "password"
          }
        },
        "sqls": [
          "SELECT * FROM some_table"
        ]
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}
@fjy (Contributor) commented Feb 13, 2020

The SQL firehose is community contributed and we recommend that you don't use it in any real workload.

@saulfrank (Author)

@fjy We do a small daily batch upload, which I wouldn't consider a "real workload". Dumping the data to CSV (where date and number types are lost), moving it to storage/Kafka, and then ingesting feels like overkill to create and manage, especially with several data streams. I was looking at Airflow, but that is a lot of steps. Database connectivity would make Druid far more useful and accessible in general.

@fjy (Contributor) commented Feb 13, 2020

@saulfrank Absolutely agree. In fact, we very much plan to rework the SQL firehose to be more production-ready.

@jihoonson (Contributor)

Hi @saulfrank, the SQL firehose with a single SQL statement is currently processed by one worker task. Would you try the "index" task instead?
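jihoonson's suggestion amounts to swapping the parallel task for the plain index task, i.e. changing the task and ioConfig types. A sketch of the spec under that change (untested against 0.17.0; the firehose block is copied from the spec above, and the unchanged dataSchema is omitted):

```json
{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://<location>:5432/db",
            "user": "user",
            "password": "password"
          }
        },
        "sqls": ["SELECT * FROM some_table"]
      }
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}
```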

@saulfrank (Author)

@jihoonson I tried that too and it gave the same error message.

@jihoonson (Contributor)

@saulfrank 😢 thanks for trying it out.
