
[Bug] Spark doris connector cannot read data from doris table via temporary view #36429

Open
wuzg2000 opened this issue Jun 18, 2024 · 5 comments
Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris: 2.1.3 & 2.1.4-rc02
spark doris connector: 1.3.2
spark: 3.4.3

What's Wrong?

Successfully created a temporary view in spark-sql that refers to a Doris table, but reading from the view fails, while writing data works fine.
The following logs occur repeatedly in be.out:

I20240617 15:57:55.426945 2202688 thrift_client.cpp:72] (Attempt 1 of 1)
I20240617 15:57:56.783890 2202683 mem_info.cpp:455] Refresh cgroup memory win, refresh again after 10s, cgroup mem limit: 9223372036854710272, cgroup mem usage: 1454833664, cgroup mem info cached: 0
I20240617 15:57:57.527108 2202688 client_cache.h:174] Failed to get client from cache: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)

    0#  doris::ThriftClientImpl::open()
    1#  doris::ThriftClientImpl::open_with_retry(int, int)
    2#  doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    3#  doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    4#  doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int)
    5#  doris::RuntimeQueryStatiticsMgr::report_runtime_query_statistics()
    6#  doris::Daemon::report_runtime_query_statistics_thread()
    7#  doris::Thread::supervise_thread(void*)
    8#  ?
    9#  ?

, retrying[2]...
W20240617 15:57:57.527325 2202688 doris_main.cpp:123] thrift internal message: TSocket::open() getaddrinfo() <Host: Port: 0>Name or service not known
W20240617 15:57:57.527401 2202688 status.h:412] meet error status: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)

    0#  doris::ThriftClientImpl::open()
    1#  doris::ThriftClientImpl::open_with_retry(int, int)
    2#  doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    3#  doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    4#  doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int)
    5#  doris::RuntimeQueryStatiticsMgr::report_runtime_query_statistics()
    6#  doris::Daemon::report_runtime_query_statistics_thread()
    7#  doris::Thread::supervise_thread(void*)
    8#  ?
    9#  ?

I20240617 15:57:57.527413 2202688 thrift_client.cpp:67] Unable to connect to :0

What You Expected?

Reading data from the temporary view should succeed.

How to Reproduce?

  1. Create a temporary view in spark-sql:
    spark-sql (default)> create temporary view pdm_org_organization1
    > using doris options (
    > "table.identifier" = "zfdsp_pdm.pdm_org_organization1",
    > "fenodes" = "node01:18030",
    > "user" = "dsp",
    > "password" = "*********"
    > );
    Response code
    Time taken: 4.441 seconds

  2. Read data from the view, which fails with the following errors:
    spark-sql (default)> select * from pdm_org_organization1;
    08:29:57.287 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.doris.spark.backend.BackendClient - Connect to doris Doris BE{host='node01', port=9060} failed.
    08:31:27.441 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
    org.apache.doris.spark.exception.ConnectedFailedException: Connect to Doris BE{host='node01', port=9060}failed.
    at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:195) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.rdd.ScalaValueReader.$anonfun$hasNext$2(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.rdd.ScalaValueReader.org$apache$doris$spark$rdd$ScalaValueReader$$lockClient(ScalaValueReader.scala:239) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.rdd.ScalaValueReader.hasNext(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.rdd.AbstractDorisRDDIterator.hasNext(AbstractDorisRDDIterator.scala:56) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.scheduler.Task.run(Task.scala:139) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.3.jar:3.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_391]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_391]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_391]
    Caused by: org.apache.doris.shaded.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:179) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    ... 23 more
    Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_391]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_391]
    at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_391]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_391]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_391]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_391]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_391]
    at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:177) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
    ... 23 more
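The executor error above says it cannot connect to BE{host='node01', port=9060}. As a first check (a hypothetical sketch, reusing the host and port from the log above), one can verify from the Spark executor host whether the BE thrift port is reachable at all:

```shell
#!/usr/bin/env bash
# Hypothetical connectivity check; host and port are taken from the error log above.
# Uses bash's /dev/tcp pseudo-device, so this needs bash rather than plain sh.
check_be_port() {
    # Try to open a TCP connection to $1:$2 with a 5-second timeout.
    if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}
check_be_port node01 9060
```

If this prints "unreachable" from the executor host, the failure is a network or DNS problem between Spark and the BE rather than a connector bug.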

Anything Else?

The error logs continue to appear even after the spark-sql session has terminated!

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@wuzg2000 (Author)

Not sure whether this is a problem in Doris or in the Doris Spark connector, so I'm reporting it here under Doris first.

@SeaSand1024

Please assign it to me

@jnfJNF commented Jun 20, 2024

Not sure whether this is a problem in Doris or in the Doris Spark connector, so I'm reporting it here under Doris first.

Has this been resolved?

@wuzg2000 (Author) commented Jul 4, 2024

Update: the platform where the problem occurs is arm64; on x86_64 it works fine. After upgrading Doris to the official 2.1.4 release today, be.out no longer shows the error logs, but the spark-sql query still fails the same way, reporting a BE connection timeout.

@JNSimba (Member) commented Jul 4, 2024

It may be related to compatibility issues in the ARM environment. When the Spark read gets stuck, you can run pstack on the corresponding BE.
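The suggestion above can be sketched as a small helper, run on the BE node while the Spark read hangs (hypothetical: it assumes the BE process is named doris_be and that the pstack utility is installed):

```shell
#!/usr/bin/env bash
# Hypothetical helper for the debugging step suggested above: dump the stack of
# the running Doris BE while the Spark read is stuck.
# Assumes the BE process is named doris_be and pstack is installed.
dump_be_stack() {
    local pid
    pid=$(pgrep -f doris_be | head -n 1)
    if [ -z "$pid" ]; then
        echo "doris_be process not found"
        return 0
    fi
    pstack "$pid" > "be_stack_${pid}.txt"
    echo "stack written to be_stack_${pid}.txt"
}
dump_be_stack
```

The resulting stack file can then be attached to the issue to show which BE thread is blocked.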
