[FLINK-37703][UnitTest] The testRecoverFromIntermWithoutAdditionalState test fails in the azure cron connector pipeline #26633
Conversation
@lsyldliu is this PR OK?
lsyldliu
left a comment
@liangyu-1 Thanks for your contribution, I left some comments.
recoverables.put(INIT_EMPTY_PERSIST, stream.persist());
} catch (IOException e) {
    System.err.println("Unable to open file for writing " + path.toString());
    throw e;
Apr 28 12:19:16 java.io.IOException: All datanodes [DatanodeInfoWithStorage[127.0.0.1:46278,DS-26d47d25-42de-4eef-a409-8a700a8bc82a,DISK]] are bad. Aborting...
Apr 28 12:19:16 at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537)
Apr 28 12:19:16 at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472)
Apr 28 12:19:16 at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244)
Apr 28 12:19:16 at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663)
Based on the original error message, we need to find out why the HDFS DataNode is going bad. Can throwing an exception on the client side reveal the root cause? I don't know much about HDFS, so I'm not sure. Do we need to turn on server-side logging when we bring up the HDFS cluster and observe the behavior on the server side?
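If server-side logs are what we need, one possible approach (just a sketch, assuming the test cluster uses the log4j 1.x backend that Hadoop's MiniDFSCluster setups usually ship with; the helper class and method names below are made up for illustration) would be to raise the DataNode-side log levels before the cluster is started:

// Hypothetical sketch: raise HDFS server-side log levels before the mini cluster starts,
// so the DataNode explains why it drops out of the write pipeline.
// Assumes a log4j 1.x backend on the test classpath; the logger names are the standard
// Hadoop package/class names taken from the stack trace above.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public final class HdfsDebugLogging {

    private HdfsDebugLogging() {}

    // Call once before the HDFS mini cluster is created, e.g. in the test's setup method.
    public static void enableDataNodeDebugLogs() {
        Logger.getLogger("org.apache.hadoop.hdfs.server.datanode").setLevel(Level.DEBUG);
        Logger.getLogger("org.apache.hadoop.hdfs.DataStreamer").setLevel(Level.DEBUG);
    }
}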
try {
    stream.write(testData1.getBytes(StandardCharsets.UTF_8));
} catch (IOException e) {
    System.err.println("Initial write failed: " + e.getMessage());
Can you use LOG.info to print the message?
recoverables.put(INIT_EMPTY_PERSIST, stream.persist());

stream.write(testData1.getBytes(StandardCharsets.UTF_8));
try {
I think we should simplify the log-printing logic and only print it in the catch block, as follows:
// This is just for locating the root cause:
// https://issues.apache.org/jira/browse/FLINK-37703
// After the fix, this logic should be reverted.
int branch = 0;
try {
    branch++;
    stream = initWriter.open(path);
    branch++;
    recoverables.put(INIT_EMPTY_PERSIST, stream.persist());
    branch++;
    stream.write(testData1.getBytes(StandardCharsets.UTF_8));
    branch++;
    recoverables.put(INTERM_WITH_STATE_PERSIST, stream.persist());
    branch++;
    recoverables.put(INTERM_WITH_NO_ADDITIONAL_STATE_PERSIST, stream.persist());
    // and write some more data
    branch++;
    stream.write(testData2.getBytes(StandardCharsets.UTF_8));
    branch++;
    recoverables.put(FINAL_WITH_EXTRA_STATE, stream.persist());
} catch (IOException e) {
    LOG.info(
            "The exception branch was: {}, detail exception msg: {}",
            branch,
            e.getMessage());
    throw e;
} finally {
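The exception is re-thrown, so the test still fails exactly as before; the logged branch value just tells us which call was in flight when the IOException was raised.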
recoveredStream.write(testData3.getBytes(StandardCharsets.UTF_8));
recoveredStream.closeForCommit().commit();
try {
ditto.
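For the recovery path, the same pattern would look roughly like this (just a sketch; recoveredStream and testData3 are the existing members of this test class, and this temporary logic should likewise be reverted once FLINK-37703 is resolved):

// Sketch of the same branch-counter logging applied to the recovery write.
// Only for locating FLINK-37703; to be reverted after the fix.
int branch = 0;
try {
    branch++;
    recoveredStream.write(testData3.getBytes(StandardCharsets.UTF_8));
    branch++;
    recoveredStream.closeForCommit().commit();
} catch (IOException e) {
    LOG.info(
            "The exception branch was: {}, detail exception msg: {}",
            branch,
            e.getMessage());
    throw e;
}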
 */
public abstract class AbstractRecoverableWriterTest {

    private static final Logger Log = LoggerFactory.getLogger(AbstractRecoverableWriterTest.class);
Suggested change:
- private static final Logger Log = LoggerFactory.getLogger(AbstractRecoverableWriterTest.class);
+ private static final Logger LOG = LoggerFactory.getLogger(AbstractRecoverableWriterTest.class);

What is the purpose of the change
This PR adds diagnostic logging to find out what makes the hadoop-fs unit test unstable.
Brief change log
AbstractRecoverableWriterTest.java

Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)

Documentation