New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/#5390 segment indexing reload status api #5718
Feature/#5390 segment indexing reload status api #5718
Conversation
- added a new API endpoint for users to query segment reload status API - Table metadata from Server - added a new endpoint to fetch segment metadata - added helper classes and methods to fetch metadata from the server Tests - added test to server API to fetch metadata including indexing information
6914c32
to
422c76a
Compare
- Moved status classes to logical places Logs - Added logging statements Tests - Added unit tests for Pinot Controller reload status and segment metadata API - Added unit tests for Pinot Server reload status and segment metadata API License Headers - Add license headers to files added to this feature
422c76a
to
7e31c11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadocs to all new classes and to any public methods introduced
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/ServerSegmentMetadataReader.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/ServerSegmentMetadataReader.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/ServerSegmentMetadataReader.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
...ler/src/main/java/org/apache/pinot/controller/api/resources/ServerSegmentMetadataReader.java
Outdated
Show resolved
Hide resolved
...-server/src/main/java/org/apache/pinot/server/api/resources/SegmentColumnIndexesFetcher.java
Outdated
Show resolved
Hide resolved
...-server/src/main/java/org/apache/pinot/server/api/resources/SegmentColumnIndexesFetcher.java
Outdated
Show resolved
Hide resolved
...-server/src/main/java/org/apache/pinot/server/api/resources/SegmentColumnIndexesFetcher.java
Outdated
Show resolved
Hide resolved
pinot-server/src/test/java/org/apache/pinot/server/api/TablesResourceTest.java
Outdated
Show resolved
Hide resolved
- Updating code as per PR review comments
Removing SegmentMetadataFetcher as it seemed redundant Refactoring code to save failed segment reload status API calls as part of response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of code duplication with the existing table size reader. Please find ways to use a common base class if possible
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
tableReloadStatus = getSegmentsReloadStatus(tableNameWithType); | ||
} catch (InvalidConfigException e) { | ||
throw new ControllerApplicationException(LOGGER, | ||
"Failed to load segment reload status for table: " + tableName, Status.NOT_FOUND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are returning 404 (NOT_FOUND) then please do not use "Failed" in the exception message. Since the exception is invalid config, determine what is invalid and throw that exception, may be as 400 (BAD_REQUEST)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in latest commit.
"Failed to load segment reload status for table: " + tableName, Status.NOT_FOUND); | ||
} | ||
if (Objects.isNull(tableReloadStatus)) | ||
throw new ControllerApplicationException(LOGGER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception message reads as if the table is not found. That is not the case, right?
@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName, | ||
@ApiParam(value = "OFFLINE|REALTIME") @QueryParam("type") String tableTypeStr) { | ||
List<String> tableNamesWithType = getExistingTableNamesWithType(tableName, Constants.validateTableType(tableTypeStr)); | ||
Map<String, TableMetadataReader.TableReloadStatus> reloadStatusMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no tables found, then this is the place to throw 404
|
||
private Map<String, String> getSegmentsMetadataFromServer(String tableNameWithType) | ||
throws InvalidConfigException, IOException { | ||
LOGGER.trace("Inside getSegmentsMetadataFromServer() entry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these trace logs please
|
||
import java.util.Objects; | ||
|
||
public class SegmentStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document each member in this object clearly, what it contains in various situations
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
@ApiOperation(value = "Get the server metadata for all table segments", notes = "Get the server metadata for all table segments") | ||
public Map<String, String> getServerMetadata(@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName, | ||
@ApiParam(value = "OFFLINE|REALTIME") @QueryParam("type") String tableTypeStr) { | ||
LOGGER.info("Received a request to fetch metadata for all segments for table {}", tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a debug level log.
throw new ControllerApplicationException(LOGGER, e.getMessage(), Status.BAD_REQUEST); | ||
} catch (IOException ioe) { | ||
throw new ControllerApplicationException(LOGGER, | ||
"Error parsing Pinot server response: " + ioe.getMessage(), Status.INTERNAL_SERVER_ERROR, ioe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indicate the server name here that caused the error (unless that is logged elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, is logged in the helper class
@GET | ||
@Path("segments/{tableName}/reload-status") | ||
@Produces(MediaType.APPLICATION_JSON) | ||
@ApiOperation(value = "Status of segment reload", notes = "Status of segment reload") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly suggest adding a verbosity level and/or a limit here. Can be added later if you wish. Imagine a table with a million segments. Do we really want to kill the servers trying to query all the segments? Or, output them only to let the client time out?
An example could be: limit=100 by default, verbosity=5. A level of 4, 3, 2,1 will show less information for each segment. Maybe 0 will only show how many segments that are online/offline etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Makes sense to add limit. I did think about this, but then, the issue will be knowing the status of the remaining segments. For a table with say, 1000 segments, how do we let the user know of the status of the rest of the segments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an API to get status for a range of segments, maybe? Or, add some sort of start/limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are adding one to get the status of one segment at a time, then the user can (if needed) iterate over the segments and get each segment. Let us evaluate the use case first. Are we talking about a full table reload or a segment reload? If full table reload, maybe we only want to return those segments that DID NOT reload properly?
The API definition leaves much discussion to be desired, and a PR is NOT the place to discuss API. If you have a design doc, we will discuss there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be done in another PR. Lets get this in and add the optimizations as we need them. Million segments in a table is not a common use case.
Initially, the thought was to have status for a single segment reload. But
then it was observed that usually, users request for a full segment reload.
So, I added the API to show reload status for all segments.
I do have a design doc
<https://docs.google.com/document/d/1E_J7PxF9WtaE6ido__u0O-Emtyu0IghQicC-1eLZeRo/edit#>.
Sure, let's discuss there. :)
…-Guru
On Fri, Jul 31, 2020 at 5:25 PM Subbu Subramaniam ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
<#5718 (comment)>
:
> @@ -485,4 +503,80 @@ private void deleteSegmentsInternal(String tableNameWithType, List<String> segme
throw new ControllerApplicationException(LOGGER, e.getMessage(), Response.Status.FORBIDDEN);
}
}
+
+ @get
+ @path("segments/{tableName}/reload-status")
+ @produces(MediaType.APPLICATION_JSON)
+ @ApiOperation(value = "Status of segment reload", notes = "Status of segment reload")
If you are adding one to get the status of one segment at a time, then the
user can (if needed) iterate over the segments and get each segment. Let us
evaluate the use case first. Are we talking about a full table reload or a
segment reload? If full table reload, maybe we only want to return those
segments that DID NOT reload properly?
The API definition leaves much discussion to be desired, and a PR is NOT
the place to discuss API. If you have a design doc, we will discuss there.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#5718 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACFCTWTQNNHNVC63IBMT3DDR6NOHNANCNFSM4PBWT4GA>
.
|
is this ready to go? |
There were a few concerns raised by Subbu on the number of API calls that
would be made to a server in the case where there are a large number of
segments.
Wanted to discuss further on what to do next for this feature. What do you
suggest?
…On Sun, Aug 9, 2020 at 7:58 AM Kishore Gopalakrishna < ***@***.***> wrote:
is this ready to go?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#5718 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACFCTWVTNNAWM3UHNLGNRV3R722SLANCNFSM4PBWT4GA>
.
|
Pinot codestyle corrections Moving ServerSegmentMetadataReader to util
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well
Is this addressed? |
Oh! Somehow missed this comment. I think it got lost in between. Let me make this change and commit again. Sorry about that! |
…xing-reload-status-api' into feature/apache#5390-segment-indexing-reload-status-api # Conflicts: # pinot-controller/src/main/java/org/apache/pinot/controller/util/CompletionServiceHelper.java
|
||
import java.util.Objects; | ||
|
||
public class SegmentStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all comments have been addressed. Please justify the use of the same class to return values to the user. It makes upgrades bad. Add json ignore case so that the pain is at least reduced a bit.
serverToSegmentSizeInfoListMap.put(streamResponse.getKey(), tableSizeInfo.segments); | ||
} catch (IOException e) { | ||
failedParses++; | ||
LOGGER.error("Unable to parse server response due to an error: ", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the server name to this log
} else { | ||
LOGGER.info("Finish reading segment sizes for table: {}", tableNameWithType); | ||
if (failedParses != 0) { | ||
LOGGER.warn("Failed to parse {} segment size info responses from server.", failedParses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.warn("Failed to parse {} segment size info responses from server.", failedParses); | |
LOGGER.warn("Failed to parse segment size info responses from {} servers.", failedParses); |
If possible, add the total number of servers to this message as well
int numServersResponded = completionServiceResponse._httpResponses.size(); | ||
if (numServersResponded != serverURLs.size()) { | ||
LOGGER.warn("Finish reading information for table: {} with {}/{} server responses", tableNameWithType, | ||
numServersResponded, serverURLs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numServersResponded, serverURLs); | |
numServersResponded, serverURLs.size()); |
segmentsMetadata.add(JsonUtils.objectToString(segmentMetadata)); | ||
} catch (IOException e) { | ||
failedParses++; | ||
LOGGER.error("Unable to parse server response due to an error: ", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add server name in the log
segmentsStatus.add(segmentStatus); | ||
} catch (IOException e) { | ||
failedParses++; | ||
LOGGER.error("Unable to parse server response due to an error: ", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add server name
} | ||
} | ||
if (failedParses != 0) { | ||
LOGGER.warn("Failed to parse {} segment load status responses from server.", failedParses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.warn("Failed to parse {} segment load status responses from server.", failedParses); | |
LOGGER.warn("Failed to parse segment load status responses from {} servers.", failedParses); |
...t-controller/src/main/java/org/apache/pinot/controller/util/ServerSegmentMetadataReader.java
Outdated
Show resolved
Hide resolved
public String _segmentName; | ||
// The last segment reload time in ISO date format (yyyy-MM-dd HH:mm:ss:SSS UTC) | ||
// If the segment reload failed for a segment, then the value will be the previous segment reload was successful | ||
public String _segmentReloadTimeUTC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a String and not long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The date is in string format: ISO date format (yyyy-MM-dd HH:mm:ss:SSS UTC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the API to return long instead of String
...ler/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentRestletResource.java
Outdated
Show resolved
Hide resolved
f3e67e7
to
6e03c9b
Compare
@mcvsubbu can you please review again? |
Updating variable names to reflect their value type
f26a5aa
to
ac98a41
Compare
Codecov Report
@@ Coverage Diff @@
## master #5718 +/- ##
=========================================
Coverage ? 67.15%
=========================================
Files ? 1205
Lines ? 63599
Branches ? 9741
=========================================
Hits ? 42708
Misses ? 17747
Partials ? 3144
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Code refactor to clean up lines that went beyond line length
ae2ce79
to
ec3603b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing! Very well done
Most of the changes are addressed
Description
This PR adds an APIs on the controller and one new API on the server. The purpose of the APIs is to provide segment metadata from the Pinot Server.
Following updates need to be made in the documentation:
Sample response:
Documentation
The documentation to the PR is here.