[Bug] [dataquality] Data quality - null value detection - execution error #16435

@wuchunfu

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When I use PostgreSQL as the database that DolphinScheduler is initialized with and run a data quality null value detection task, the task fails with an error reporting that the schema "dolphinschedulers" does not exist.
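One plausible reading of this symptom, which the report itself does not confirm, is that the result writer builds a two-part table name by prefixing the database name. MySQL reads `a.b` as database.table, while PostgreSQL reads it as schema.table, so a prefix that works on MySQL can point at a nonexistent schema on PostgreSQL. The sketch below illustrates the difference with a hypothetical helper; `QualifiedTableName`, `qualify`, and the table name are assumptions for illustration and are not DolphinScheduler code.

```java
public class QualifiedTableName {

    // Hypothetical helper: builds the table name to hand to a JDBC writer.
    // For PostgreSQL the prefix must be a schema (or be omitted so that the
    // search_path applies); for MySQL the prefix is the database name.
    static String qualify(String dbType, String database, String schema, String table) {
        if ("postgresql".equalsIgnoreCase(dbType)) {
            boolean noSchema = (schema == null || schema.isEmpty());
            return noSchema ? table : schema + "." + table;
        }
        return database + "." + table;
    }

    public static void main(String[] args) {
        // "example_table" is a placeholder, not a table named in the report.
        System.out.println(qualify("mysql", "dolphinschedulers", null, "example_table"));
        System.out.println(qualify("postgresql", "dolphinschedulers", "public", "example_table"));
        System.out.println(qualify("postgresql", "dolphinschedulers", null, "example_table"));
    }
}
```

With this shape, the PostgreSQL branch never uses the database name as a prefix, which is the condition that would avoid a "schema does not exist" error of the kind shown in the log.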

What you expected to happen

[INFO] 2024-08-09 14:54:29.629 +0800 -  -> 
	24/08/09 14:54:29 INFO Client: Application report for application_1722308400032_0053 (state: RUNNING)
[INFO] 2024-08-09 14:54:30.630 +0800 -  -> 
	24/08/09 14:54:30 INFO Client: Application report for application_1722308400032_0053 (state: FINISHED)
	24/08/09 14:54:30 INFO Client: 
		 client token: N/A
		 diagnostics: User class threw exception: org.postgresql.util.PSQLException: ERROR: schema "dolphinschedulers" does not exist
	  Position: 14
		at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
		at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
		at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
		at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
		at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
		at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
		at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
		at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
		at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:265)
		at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:844)
		at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:95)
		at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
		at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
		at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
		at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
		at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
		at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
		at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
		at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
		at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
		at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
		at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
		at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
		at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
		at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
		at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
		at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
		at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
		at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
		at org.apache.dolphinscheduler.data.quality.flow.batch.writer.JdbcWriter.write(JdbcWriter.java:87)
		at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.executeWriter(SparkBatchExecution.java:132)
		at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.execute(SparkBatchExecution.java:58)
		at org.apache.dolphinscheduler.data.quality.context.DataQualityContext.execute(DataQualityContext.java:62)
		at org.apache.dolphinscheduler.data.quality.DataQualityApplication.main(DataQualityApplication.java:78)
		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
	
		 ApplicationMaster host: 10.10.4.230
		 ApplicationMaster RPC port: 0
		 queue: default
		 start time: 1723186518789
		 final status: FAILED
		 tracking URL: http://node02:8088/proxy/application_1722308400032_0053/
		 user: default
	Exception in thread "main" org.apache.spark.SparkException: Application application_1722308400032_0053 finished with failed status
		at org.apache.spark.deploy.yarn.Client.run(Client.scala:1269)
		at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1627)
		at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
		at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
		at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
		at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
		at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	24/08/09 14:54:30 INFO ShutdownHookManager: Shutdown hook called
	24/08/09 14:54:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-71c505a9-2358-4455-b8dd-3838611055c9
	24/08/09 14:54:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-33420b51-2b30-4a48-9b4f-502a4b7976a0
[INFO] 2024-08-09 14:54:30.632 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46, processId:1301503 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2024-08-09 14:54:30.633 +0800 - Start finding appId in /opt/dolphinscheduler/worker-server/logs/20240809/14559278303968/3/39/46.log, fetch way: log 
[INFO] 2024-08-09 14:54:30.639 +0800 - Find appId: application_1722308400032_0053 from /opt/dolphinscheduler/worker-server/logs/20240809/14559278303968/3/39/46.log
[INFO] 2024-08-09 14:54:30.640 +0800 - ***********************************************************************************************
[INFO] 2024-08-09 14:54:30.640 +0800 - *********************************  Finalize task instance  ************************************
[INFO] 2024-08-09 14:54:30.640 +0800 - ***********************************************************************************************
[INFO] 2024-08-09 14:54:30.641 +0800 - Upload output files: [] successfully
[INFO] 2024-08-09 14:54:30.657 +0800 - Send task execute status: FAILURE to master : 10.10.4.251:1234
[INFO] 2024-08-09 14:54:30.658 +0800 - Remove the current task execute context from worker cache
[INFO] 2024-08-09 14:54:30.658 +0800 - The current execute mode isn't develop mode, will clear the task execute file: /tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46
[INFO] 2024-08-09 14:54:30.697 +0800 - Success clear the task execute file: /tmp/dolphinscheduler/exec/process/default/14382384652384/14559278303968_3/39/46
[INFO] 2024-08-09 14:54:30.699 +0800 - FINALIZE_SESSION
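The `Position: 14` in the PostgreSQL error is a 1-based character offset into the failing statement. Since the stack trace passes through `JdbcUtils.createTable`, the statement is presumably a `CREATE TABLE`, and offset 14 is exactly where the table identifier begins after `CREATE TABLE `. The sketch below checks that arithmetic. The table and column names are placeholders, and only the schema name comes from the log.

```java
public class CreateTablePosition {

    // Returns the 1-based character position at which `needle` first occurs
    // in `sql`, or -1 if it does not occur. PostgreSQL reports error
    // positions using the same 1-based convention.
    static int oneBasedPosition(String sql, String needle) {
        int idx = sql.indexOf(needle);
        return idx < 0 ? -1 : idx + 1;
    }

    public static void main(String[] args) {
        // Assumed statement shape; "example_table" and "id INTEGER" are hypothetical.
        String sql = "CREATE TABLE dolphinschedulers.example_table (id INTEGER)";
        System.out.println(oneBasedPosition(sql, "dolphinschedulers")); // prints 14
    }
}
```

This is consistent with the error pointing at a schema-qualified table name whose first part is `dolphinschedulers`, rather than at a column or option later in the statement.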

How to reproduce

Initialize DolphinScheduler with PostgreSQL as its database, then run a data quality null value detection task.

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels: bug (Something isn't working), help wanted (Extra attention is needed)