[Bug][Yarn] Access to RM port 8032 was denied #3566

Closed
zhilinli123 opened this issue Feb 19, 2024 · 1 comment
Labels
bug Something isn't working


@zhilinli123
Contributor

Search before asking

  • I have searched in the issues and found no similar issues.

Java Version

1.8

Scala Version

2.11.x

StreamPark Version

current dev

Flink Version

1.13.5

deploy mode

yarn-perjob

What happened


------------------------------------------------------------------
Effective submit configuration: {restart-strategy.failure-rate.max-failures-per-interval=3, env.java.opts="-Dfile.encoding=UTF-8", jobmanager.rpc.address=localhost, metrics.reporter.influxdb.password=tianyancha, yarn.application.type=Apache Flink, high-availability.zookeeper.path.root=/flink, state.checkpoint-storage=filesystem, high-availability.storageDir=hdfs:///flink/recovery, metrics.reporter.influxdb.connectTimeout=60000, parallelism.default=1, pipeline.classpaths=[], restart-strategy.failure-rate.failure-rate-interval=10 min, historyserver.archive.fs.dir=hdfs:///flink/completed-jobs/, taskmanager.memory.process.size=1024mb, execution.checkpointing.mode=EXACTLY_ONCE, execution.checkpointing.tolerable-failed-checkpoints=3, pipeline.name=TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test, metrics.reporter.influxdb.username=admin, yarn.tags=streampark, historyserver.archive.fs.refresh-interval=20000, jobmanager.rpc.port=6123, taskmanager.memory.preallocate=false, execution.checkpointing.interval=5 s, execution.checkpointing.timeout=10 min, metrics.reporter.influxdb.port=8086, metrics.reporter.influxdb.retentionPolicy=flink_retention, high-availability.zookeeper.quorum=ip1:2181,ip-89:2181,ip2:2181,ip3:2181,ip3:2181, $internal.pipeline.job-id=dba5285bc6b1eaee8dba9ffb38834870, state.backend=hashmap, execution.checkpointing.max-concurrent-checkpoints=1, $internal.deployment.config-dir=/home/work/streampark/flink-1.13.5-streampark/conf, historyserver.web.address=ip-108, state.checkpoints.num-retained=3, historyserver.web.port=8082, metrics.reporter.influxdb.interval=60 SECONDS, classloader.check-leaked-classloader=false, metrics.reporter.influxdb.host=ip-96, jobmanager.execution.failover-strategy=region, state.savepoints.dir=hdfs:///flink/savepoints, metrics.reporter.influxdb.db=flink, execution.savepoint.ignore-unclaimed-state=false, $internal.application.program-args=[--conf, 
one_data/company_base/company_base_annual_report_social_security_details.properties], yarn.application-attempts=3, taskmanager.numberOfTaskSlots=1, yarn.application.name=TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test, $internal.application.main=com.tyc.darwin.transform.JobStart, jobmanager.archive.fs.dir=hdfs:///flink/completed-jobs/, restart-strategy.failure-rate.delay=1 min, classloader.resolve-order=child-first, metrics.reporter.influxdb.scheme=http, execution.target=yarn-per-job, jobmanager.memory.process.size=1024mb, yarn.application.submit.user=zhaojie, execution.attached=true, metrics.reporter.influxdb.writeTimeout=60000, taskmanager.memory.managed.size=0m, high-availability=NONE, execution.checkpointing.externalized-checkpoint-retention=RETAIN_ON_CANCELLATION, execution.shutdown-on-attached-exit=true, pipeline.jars=[file:/home/work/workspace_prod/workspace/100004/streampark-flinkjob_TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test.jar], metrics.reporter.influxdb.consistency=ANY, execution.checkpointing.min-pause=5 s, restart-strategy=failure-rate, metrics.reporter.influxdb.factory.class=org.apache.flink.metrics.influxdb.InfluxdbReporterFactory, state.checkpoints.dir=hdfs:///flink/checkpoints}
------------------------------------------------------------------

2024-02-19 18:03:50 | WARN  | streampark-flink-app-bootstrap-0 | org.apache.flink.yarn.configuration.YarnLogConfigUtil:73] The configuration directory ('/home/work/streampark/flink-1.13.5-streampark/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.yarn.YarnClusterDescriptor:202] No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived from fraction network memory (57.600mb (60397978 bytes)) is less than its min value 64.000mb (67108864 bytes), min value will be used instead
18:03:50.587 [streampark-flink-app-bootstrap-0] INFO org.apache.streampark.flink.client.impl.YarnPerJobClient - [StreamPark] 
------------------------<<specification>>-------------------------
ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, slotsPerTaskManager=1}
------------------------------------------------------------------

sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@7b2cb54f
2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, property file name: flink.properties
2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, property file name: kafka-source.properties
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:1994] class com.tyc.tethys.common.models.MultiRow does not contain a setter for field value
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class com.tyc.tethys.common.models.MultiRow cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:1991] class com.tyc.tethys.common.models.OneRow does not contain a getter for field metaData
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:1994] class com.tyc.tethys.common.models.OneRow does not contain a setter for field metaData
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class com.tyc.tethys.common.models.OneRow cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:1991] class com.tyc.tethys.common.models.OneRow does not contain a getter for field metaData
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:1994] class com.tyc.tethys.common.models.OneRow does not contain a setter for field metaData
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class com.tyc.tethys.common.models.OneRow cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, property file name: mysql-sink.properties
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:137] Sender retryInterval is 5000
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:138] Sender retryQueueLength is 100
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:139] Sender maxRetries is 0
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:140] Sender connectName is mysql.rds465.company_base.prod
2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:141] Sender compatExpression is false
18:03:50.743 [streampark-flink-app-bootstrap-0] INFO org.apache.streampark.flink.client.impl.YarnPerJobClient - [StreamPark] 
-------------------------<<applicationId>>------------------------
jobGraph getJobID: 471df45049f4f572399d8d9064ad276a
__________________________________________________________________

2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.yarn.YarnClusterDescriptor:582] Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, slotsPerTaskManager=1}
2024-02-19 18:03:50 | WARN  | streampark-flink-app-bootstrap-0 | org.apache.flink.core.plugin.PluginConfig:69] The plugins directory [plugins] does not exist.
2024-02-19 18:03:52 | WARN  | streampark-flink-app-bootstrap-0 | org.apache.flink.core.plugin.PluginConfig:69] The plugins directory [plugins] does not exist.
2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.flink.yarn.YarnClusterDescriptor:1177] Submitting application master application_1683968765756_7675
2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21. Trying to failover immediately.
2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 1 failover attempts. Trying to failover after sleeping for 44837ms.
2024-02-19 18:04:00 | WARN  | XNIO-1 task-1 | com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@7adaa88e (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2024-02-19 18:04:40 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 21
2024-02-19 18:04:40 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 2 failover attempts. Trying to failover after sleeping for 30371ms.
2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 3 failover attempts. Trying to failover after sleeping for 26618ms.
2024-02-19 18:05:37 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 21
2024-02-19 18:05:37 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 4 failover attempts. Trying to failover after sleeping for 44556ms.
2024-02-19 18:06:00 | WARN  | XNIO-1 task-5 | com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@714da0da (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2024-02-19 18:06:22 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:06:22 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 5 failover attempts. Trying to failover after sleeping for 29735ms.
2024-02-19 18:06:52 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 21
2024-02-19 18:06:52 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 6 failover attempts. Trying to failover after sleeping for 24812ms.
2024-02-19 18:07:00 | WARN  | XNIO-1 task-2 | com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@4ddb038a (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2024-02-19 18:07:17 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:07:17 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 7 failover attempts. Trying to failover after sleeping for 40249ms.
2024-02-19 18:07:57 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 21
2024-02-19 18:07:57 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 8 failover attempts. Trying to failover after sleeping for 16279ms.
2024-02-19 18:08:13 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:08:13 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 9 failover attempts. Trying to failover after sleeping for 39926ms.
2024-02-19 18:08:53 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 21
2024-02-19 18:08:53 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag., while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 10 failover attempts. Trying to failover after sleeping for 29811ms.
2024-02-19 18:09:23 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing over to 22
2024-02-19 18:09:23 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 11 failover attempts. Trying to failover after sleeping for 39440ms.
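The ProcessMemoryUtils INFO lines in the log follow from the 1024mb process sizes in the submit configuration and appear unrelated to the RM connection failure. A minimal sketch of that arithmetic, assuming Flink 1.13 default TaskManager options that the log does not state (jvm-metaspace 256m; jvm-overhead fraction 0.1, min 192m, max 1g; network fraction 0.1, min 64mb, max 1gb): 1024 × 0.1 = 102.4 < 192, so 192 is used; 1024 − 256 − 192 = 576; 576 × 0.1 = 57.6 < 64, so 64 is used. The class name is hypothetical and this is not Flink's own implementation.

```java
import java.util.Locale;

// Hypothetical sketch reproducing the logged memory derivation under assumed Flink 1.13 defaults.
public class TmMemoryDerivation {

    // Clamp a fraction-derived size into [minMb, maxMb], as the log messages describe.
    static double clamp(double derivedMb, double minMb, double maxMb) {
        return Math.min(maxMb, Math.max(minMb, derivedMb));
    }

    // Returns {jvmOverheadMb, totalFlinkMb, networkMb} for a TaskManager process size.
    static double[] derive(double processMb) {
        double metaspaceMb = 256;                               // assumed default jvm-metaspace.size
        double overheadMb = clamp(processMb * 0.1, 192, 1024);  // assumed jvm-overhead fraction/min/max
        double totalFlinkMb = processMb - metaspaceMb - overheadMb;
        double networkMb = clamp(totalFlinkMb * 0.1, 64, 1024); // assumed network fraction/min/max
        return new double[] {overheadMb, totalFlinkMb, networkMb};
    }

    public static void main(String[] args) {
        // taskmanager.memory.process.size=1024mb comes from the submit configuration
        double[] r = derive(1024);
        System.out.println(String.format(Locale.ROOT,
                "jvm overhead=%.1fmb, total flink=%.1fmb, network=%.1fmb", r[0], r[1], r[2]));
    }
}
```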

Error Exception

2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | org.apache.hadoop.io.retry.RetryInvocationHandler:411] java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 3 failover attempts. Trying to failover after sleeping for 26618ms.
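The client keeps alternating between RM ids 21 and 22 ("Failing over to 21/22"). With RM HA in stock Hadoop, ConfiguredRMFailoverProxyProvider cycles through the ids listed in `yarn.resourcemanager.ha.rm-ids` and connects to each id's `yarn.resourcemanager.address.<rm-id>`; here id 22 resolves to `node-master2rcgc.mrs-2qu0.com:8032`. It may help to dump that mapping from the yarn-site.xml the StreamPark host actually loads and compare it with where the Huawei MRS ResourceManagers listen. A minimal sketch using only the JDK XML parser; the class name, the file-path argument, and the fallback to `yarn.resourcemanager.hostname.<rm-id>` plus default port 8032 are assumptions about a stock Hadoop setup, not something the issue confirms.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: print which host:port each RM HA id resolves to.
public class RmAddressDump {

    // Read all <property><name/><value/></property> pairs from a Hadoop-style XML file.
    static Map<String, String> readProperties(File xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    // Map each id in yarn.resourcemanager.ha.rm-ids to its configured client address.
    static Map<String, String> rmAddresses(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String raw : props.getOrDefault("yarn.resourcemanager.ha.rm-ids", "").split(",")) {
            String rmId = raw.trim();
            if (rmId.isEmpty()) {
                continue;
            }
            String addr = props.get("yarn.resourcemanager.address." + rmId);
            if (addr == null) {
                // Assumption: stock Hadoop falls back to hostname.<rm-id> with default port 8032
                String host = props.get("yarn.resourcemanager.hostname." + rmId);
                addr = (host == null) ? "<not configured>" : host + ":8032";
            }
            result.put(rmId, addr);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the yarn-site.xml used by the StreamPark host (assumed)
        Map<String, String> props = readProperties(new File(args[0]));
        rmAddresses(props).forEach((id, addr) -> System.out.println("rm-id " + id + " -> " + addr));
    }
}
```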

Screenshots

[screenshot]

We use Huawei Cloud services, including the cluster. The two ResourceManagers (RM) are master1 and master2. Requests are currently being sent to master2 itself, but port 8032 is not open on master2.
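The two RM ids fail differently in the log: id 22 gets `Connection refused` (nothing accepted a TCP connection on `node-master2rcgc.mrs-2qu0.com:8032`), while id 21 returns `InvalidProtocolBufferException`, which suggests a connection was made but the reply was not the protobuf response the YARN client expected. A plain TCP probe run from the StreamPark host can separate these cases. This is a hypothetical helper (`PortProbe` and `canConnect` are illustrative names); it only checks whether the port accepts a connection, not whether a compatible ResourceManager answers there.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical reachability probe for RM client ports.
public class PortProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // Says nothing about which service or protocol version is listening.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // e.g. connection refused, timeout, or unknown host
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <rm-host> [<rm-host> ...]; probes port 8032 on each host
        for (String host : args) {
            System.out.println(host + ":8032 reachable=" + canConnect(host, 8032, 3000));
        }
    }
}
```

If the probe reports the port reachable on one master while the YARN client still fails there, the mismatch is more likely in the configured address or the client/server versions than in the network.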

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

zhilinli123 added the bug (Something isn't working) label on Feb 19, 2024
@zhilinli123
Contributor Author

[screenshot]

Access to the port currently works without any problem.
