-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-23000] Add log bundler to collect and compress logs #16166
base: master
Are you sure you want to change the base?
Conversation
triggerBundle() { | ||
var res = this.httpClient.get(`${BASE_URL}/logbundler?action=trigger`).pipe(catchError(() => { | ||
return EMPTY | ||
})); | ||
// TODO this seems to be needed to trigger the call? | ||
res.toPromise().then(() => { | ||
console.log("fulfilled") | ||
}) | ||
return res; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Airblader What do I have to do to do a "fire and forget" HTTP call here?
Could you also take a look at the other front-end related changes? Thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at the rest, but to answer the question quickly:
this.http.get(...).subscribe();
Observables are lazy, you need to subscribe to them for it to fire. The toPromise hack worked because promises are eager and thus the conversion triggers it.
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 436ccf1 (Sat Aug 28 13:07:42 UTC 2021) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
Open TODOs:
|
@zentol Could you take a look at this PR? |
Are there any requirements w.r.t to the log configuration for this feature to work? |
The logs need to be accessible in the Flink UI, since its using the same mechanism for fetching the logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've exclusively reviewed the frontend changes)
@@ -49,6 +49,9 @@ <h1>Apache Flink Dashboard</h1></a> | |||
<span><i nz-icon type="upload"></i><span>Submit New Job</span></span> | |||
</li> | |||
</ul> | |||
<nz-affix *ngIf="!collapsed" style="position: absolute; bottom: 15px; left:15px" id="affix-container-target"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline styling is a bad practice, the minimum should be to use a CSS class instead and move the styles into app.component.less
.
Absolute positioning is kind of an art form. It can be the fitting solution in some cases, but most of the time it's an escape hatch that just causes problems (for example when collapsing the menu, as you noticed), because it takes the element entirely out of the document flow. If the intent is simply to have this on the bottom of the menu, an easier solution would be
margin: auto 15px 15px 15px;
I haven't verified this here, but it should work (we use the same thing in Ververica Platform).
(The final nit would be that I'm guessing the value of 15px was more or less chosen at random, AntD actually uses a different progression for spacing, so technically 12px or 16px would be in line with the design language.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But let's maybe address this once we have agreement where to place the download button in general
Sounds good. If it does stay here, I'd also not hard-oppose the absolute positioning, but alternatively we could also do this in pairing.
*/ | ||
|
||
export interface LogsBundlerStatus { | ||
status: string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines should both end with ;
.
@@ -83,7 +84,7 @@ export function AppInitServiceFactory( | |||
|
|||
@NgModule({ | |||
declarations: [AppComponent], | |||
imports: [BrowserModule, AppRoutingModule, NgZorroAntdModule, FormsModule, HttpClientModule, BrowserAnimationsModule], | |||
imports: [BrowserModule, AppRoutingModule, NgZorroAntdModule, FormsModule, HttpClientModule, BrowserAnimationsModule, ShareModule], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be improperly formatted, both the import and this line here. If the tooling doesn't complain then I guess we have no tooling for the UI for this, in which case feel free to disregard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the formatting, but no tooling complains here. Would be nice to use an auto formatter for the project, in particular for noobies like me.
triggerBundle() { | ||
var res = this.httpClient.get(`${BASE_URL}/logbundler?action=trigger`).pipe(catchError(() => { | ||
return EMPTY | ||
})); | ||
// TODO this seems to be needed to trigger the call? | ||
res.toPromise().then(() => { | ||
console.log("fulfilled") | ||
}) | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
triggerBundle() { | |
var res = this.httpClient.get(`${BASE_URL}/logbundler?action=trigger`).pipe(catchError(() => { | |
return EMPTY | |
})); | |
// TODO this seems to be needed to trigger the call? | |
res.toPromise().then(() => { | |
console.log("fulfilled") | |
}) | |
return res; | |
} | |
triggerBundle() { | |
return this.httpClient.get(`${BASE_URL}/logbundler?action=trigger`).pipe( | |
catchError(() => EMPTY) | |
); | |
} |
The subscribe
should not actually happen here, but rather at the caller site anyway. Otherwise this is very surprising behavior. Either you fire-and-forget, meaning you don't return anything, or you return the observable and let the caller manage the subscription. Since this is a service which acts as a facade to the API, I would always opt for returning the observable either way so the caller can choose whether and what to do with the response.
--> | ||
|
||
<button nz-button nzType="primary" (click)="requestArchive()">Request New Log Archive</button> | ||
<button nz-button nzType="primary" [hidden]="hideDownloadButton" (click)="downloadArchive()" style="margin-top:5px">Download Log Archive</button> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use inline CSS for the styling (and use 4px?), also in the lines below.
*/ | ||
|
||
export interface LogsBundlerStatus { | ||
status: string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the possible values for status
are known, so we could be more precise than string
here:
status: "PROCESSING" | "BUNDLE_READY" | "… possible other ones…";
With string literal types we get completion and better type-safety for where you compare the status against these values.
this.hideSpinner = true; | ||
this.hideDownloadButton = true; | ||
|
||
if(status.status == "PROCESSING") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JavaScript, using ==
is a deadly sin these days, and we use ===
instead. :-)
this.hideSpinner = true; | ||
this.hideDownloadButton = true; | ||
|
||
if(status.status == "PROCESSING") { | ||
this.hideSpinner = false; | ||
} | ||
if (status.status == "BUNDLE_READY") { | ||
this.hideDownloadButton = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.hideSpinner = true; | |
this.hideDownloadButton = true; | |
if(status.status == "PROCESSING") { | |
this.hideSpinner = false; | |
} | |
if (status.status == "BUNDLE_READY") { | |
this.hideDownloadButton = false; | |
} | |
this.hideSpinner = (status.status !== "PROCESSING"); | |
this.hideDownloadButton = (status.status !== "BUNDLE_READY"); |
?
this.hideSpinner = false; | ||
this.hideDownloadButton = true; | ||
} | ||
downloadArchive() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think this method belongs into the service since it builds the specific path of the API, which is generally one of the things you want to encapsulate in those services so that components don't need to be aware of the API.
*/ | ||
triggerBundle() { | ||
var res = this.httpClient.get(`${BASE_URL}/logbundler?action=trigger`).pipe(catchError(() => { | ||
return EMPTY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapping errors to EMPTY
means this observable will emit nothing (and just complete) in a failure case. I'll have to read on to see if this is problematic.
Edit from my future self: It means that if the API starts failing, the UI will simply not update anymore. Probably not what we'd want, so a nicer solution might be to map to a "fake" response, or let the error propagate and handle it at the caller site.
This is better than doing nothing at all, at least, since it prevents the error from clogging up the console. But it would leave the user in the dark. I'll leave it to you whether this is something we want to address or not given that this is probably the status quo of how things are done in Flink UI and thus this doesn't make it worse.
On a very high level, putting this feature (and not just a button, but two + spinner + message) into the navigation menu feels a bit cluttered to me, personally. It makes this a super important thing users will be looking at at all times when they're in the UI. I'd personally probably put this into one of the pages rather than the side menu, but I might be unaware of considerations, discussions and decisions to place it there. |
changeDetection: ChangeDetectionStrategy.OnPush | ||
}) | ||
export class LogsBundlerComponent implements OnInit{ | ||
destroy$ = new Subject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed this isn't actually ever "fed", which means this component currently leaks subscriptions. You need this component to implement OnDestroy
and then add
ngOnDestroy() {
this.destroy$.next();
this.destroy.complete();
}
Thanks a lot for your extensive feedback, it was very helpful!
I'm also not fully convinced that the current location makes a lot of sense. The other option would be the "Overview" page. |
*/ | ||
|
||
export interface LogsBundlerStatus { | ||
status: "IDLE" | "PROCESSING" | "BUNDLE_READY" | "BUNDLE_FAILED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing ;
still
this.hideDownloadButton = (status.status !== "BUNDLE_READY"); | ||
this.cdr.markForCheck(); | ||
}, error => { | ||
this.message = "Error while fetching status: " + error.message; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're going to want to this.cdr.markForCheck();
here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true true, need to pay more attention 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UI changes look good to me (% formatting, but this is a job for tooling)
internalEntryName = getNextEntryName(internalEntryName); | ||
} | ||
entryNames.add(internalEntryName); | ||
// ArchiveEntry entry = archiveOutputStream.createArchiveEntry(file, internalEntryName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code?
Thanks a lot for your approval. I would like to revisit the location and prominence of the download mechanism. Instead of having it in the sidebar, how about this (cc @knaufk) |
+1 to that proposal. I think the side menu is both too prominent and also not the place for such actions at all. |
@rmetzger It looks like Konstantin agrees with the proposal (not sure you saw it). |
Yes, I saw his reaction. I'll pick up this PR again after my vacation. |
@zentol Could you review this PR? |
// copy with limit: the input log file could have additional data, which would confuse | ||
// the archiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was indeed sloppy. I added a more extensive one to the method:
/**
* Special copy method which defines a "reading window" on the input stream. The
* "TarArchiveOutputStream" expects an TarArchiveEntry with a defined size. If the log file is
* growing while the entry is added to the output stream, it will throw an exception. This
* method ensures that only "limit" bytes will be written from the "input" to the "output"
* stream.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only case where the file can grow are the JM logs, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK then the limit makes sense.
But it does make me wonder how well this approach works when the log files roll over; the contents may suddenly change, as does the file size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(How well does the log transmission from the TM handle rollovers?)
entry.setSize(size); | ||
entry.setMode(TarArchiveEntry.DEFAULT_FILE_MODE); | ||
entry.setModTime(modTime); | ||
entry.setUserName(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment that is done because it defaults to the user.name
system property.
What does the userName actually do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, setting this explicitly is not needed: https://github.com/apache/commons-compress/blob/rel/1.20/src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveEntry.java#L298
} | ||
|
||
@Test | ||
public void testCopy_limited2() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this test for?
|
||
// status of the bundler | ||
@GuardedBy("statusLock") | ||
private volatile Status status = Status.IDLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this handler implemented with a custom state machine, instead of as async handler like the savepoint handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I wasn't aware of the async handler. I'll try to use it.
this.message = "No tmp directory available for log bundler"; | ||
log.error(message); | ||
this.tmpDir = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to handle this case. If no tmpDir was configured the cluster won't even start up.
} | ||
synchronized (statusLock) { | ||
final String action = queryParams.get(0); | ||
if ("download".equals(action)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move download/trigger into an enum that is also used in the headers.
private void addTaskManagerLogFile(TaskManagerLogAndId logFile) { | ||
checkNotNull(logArchiver, "Assuming log archiver to be set"); | ||
try { | ||
logArchiver.addArchiveEntry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should run in a separate executor to ensure the REST API remains responsive.
return resourceManagerGateway | ||
.requestTaskManagerFileUploadByType( | ||
taskManager.getResourceId(), FileType.LOG, rpcTimeout) | ||
.thenApplyAsync( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is an async operation such that subsequent steps run in the executor? If so, then I'd prefer making this explicitly when we setup the addTaskManagerLogFile
call.
} | ||
|
||
private void collectAndCompressLogs() { | ||
synchronized (statusLock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This locking kind of voids the benefit of using an executor, because as long as this operation is running any request to the handler results in a blocked thread.
extends AbstractHandler<RestfulGateway, EmptyRequestBody, LogBundlerMessageParameters> { | ||
|
||
// set high timeout for uploading files to the blob store | ||
private final Time rpcTimeout = Time.minutes(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be based in some form on the configured rpc timeout.
FYI: I was hoping to find some time to address the last outstanding comments before my parental leave, but I won't be able to. |
@rmetzger Do you still plan to complete this feature? |
I like the feature, but I currently lack the time to complete it. Afaik there's some substantial effort required to address all remaining PR comments. |
What is the purpose of the change
Users often struggle to provide all relevant logs. By having a button in the Web UI that collects the logs of all TaskManagers and the JobManager will make things easier for the community supporting users.
Screenshots:
Verifying this change
Documentation