Feature/#5390 segment indexing reload status api #5718

Merged

guruguha
Contributor

@guruguha guruguha commented Jul 20, 2020

Description

This PR adds new APIs on the controller and one new API on the server. The purpose of the APIs is to provide segment metadata from the Pinot Server.

The following documentation update is needed:

  • Document how to query segment metadata from the server

Sample response:

{
  "baseballStats_OFFLINE_0": {
    "segmentName": "baseballStats_OFFLINE_0",
    "schemaName": null,
    "crc": 2400783875,
    "creationTimeMillis": 1599866671163,
    "creationTimeReadable": "2020-09-11T23:24:31:163 UTC",
    "timeGranularitySec": null,
    "startTimeMillis": null,
    "startTimeReadable": null,
    "endTimeMillis": null,
    "endTimeReadable": null,
    "pushTimeMillis": -9223372036854776000,
    "pushTimeReadable": null,
    "refreshTimeMillis": -9223372036854776000,
    "refreshTimeReadable": null,
    "segmentVersion": "v3",
    "creatorName": null,
    "paddingCharacter": "\u0000",
    "columns": [],
    "indexes": {
      "homeRuns": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "$hostName": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "playerStint": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "groundedIntoDoublePlays": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "numberOfGames": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "$segmentName": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "AtBatting": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "stolenBases": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "tripples": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "hitsByPitch": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "teamID": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "numberOfGamesAsBatter": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "strikeouts": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "sacrificeFlies": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "caughtStealing": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "baseOnBalls": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "playerName": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "league": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "doules": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "$docId": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "yearID": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "hits": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "runsBattedIn": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "G_old": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "sacrificeHits": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "intentionalWalks": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "runs": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "NO",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      },
      "playerID": {
        "bloom-filter": "NO",
        "dictionary": "YES",
        "forward-index": "YES",
        "inverted-index": "YES",
        "null-value-vector-reader": "NO",
        "range-index": "NO"
      }
    }
  }
}
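In the `indexes` section above, each column maps to a YES/NO flag per index type. The sketch below shows one way a client could list the columns that carry a given index. It assumes the JSON has already been deserialized into nested maps; the deserialization step and the helper name `columnsWithIndex` are hypothetical and not part of this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SegmentIndexInspector {
  /**
   * Returns, sorted, the columns whose flag for the given index type is "YES".
   * Hypothetical helper: assumes the "indexes" object was already parsed into
   * Map<column, Map<indexType, "YES" | "NO">>.
   */
  static List<String> columnsWithIndex(Map<String, Map<String, String>> indexes, String indexType) {
    TreeSet<String> columns = new TreeSet<>();
    for (Map.Entry<String, Map<String, String>> entry : indexes.entrySet()) {
      // A missing index type yields null, which is treated the same as "NO".
      if ("YES".equals(entry.getValue().get(indexType))) {
        columns.add(entry.getKey());
      }
    }
    return List.copyOf(columns);
  }
}
```

Applied to the full sample with `"inverted-index"`, this would return the seven columns flagged YES: `$docId`, `$hostName`, `$segmentName`, `league`, `playerID`, `teamID` and `yearID`.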

Documentation

The documentation for this PR is here.

- added a new API endpoint for users to query segment reload status

API - Table metadata from Server
- added a new endpoint to fetch segment metadata

- added helper classes and methods to fetch metadata from the server

Tests
- added test to server API to fetch metadata including indexing information
@guruguha guruguha force-pushed the feature/#5390-segment-indexing-reload-status-api branch from 6914c32 to 422c76a Compare July 21, 2020 01:50
- Moved status classes to logical places

Logs
- Added logging statements

Tests
- Added unit tests for Pinot Controller reload status and segment metadata API
- Added unit tests for Pinot Server reload status and segment metadata API

License Headers
- Add license headers to files added to this feature
@guruguha guruguha force-pushed the feature/#5390-segment-indexing-reload-status-api branch from 422c76a to 7e31c11 Compare July 21, 2020 02:44
Contributor

@npawar npawar left a comment

Please add javadocs to all new classes and to any public methods introduced

- Updating code as per PR review comments
Removing SegmentMetadataFetcher as it seemed redundant
Refactoring code to save failed segment reload status API calls as part of response
Contributor

@mcvsubbu mcvsubbu left a comment

A lot of code duplication with the existing table size reader. Please find ways to use a common base class if possible
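One way to act on this is to move the parallel per-server fan-out into a shared base, so that the table size reader and the new metadata and reload-status readers differ only in how they fetch and parse a single server's response. The sketch below assumes a plain `ExecutorService` and a caller-supplied list of server URLs. `ServerFanOutReader`, `FanOutResult` and `fetch` are hypothetical names, not Pinot's classes. The PR's own shared piece is `CompletionServiceHelper`, and its API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical base class: one request per server, results collected in completion order. */
abstract class ServerFanOutReader<T> {
  /** Successful responses keyed by server, plus servers that failed or did not answer in time. */
  static final class FanOutResult<R> {
    final Map<String, R> responses;
    final List<String> failedServers;

    FanOutResult(Map<String, R> responses, List<String> failedServers) {
      this.responses = responses;
      this.failedServers = failedServers;
    }
  }

  private final ExecutorService _executor;

  protected ServerFanOutReader(ExecutorService executor) {
    _executor = executor;
  }

  /** Per-server call (for example an HTTP GET plus JSON parsing); subclasses supply this. */
  protected abstract T fetch(String serverUrl) throws Exception;

  public FanOutResult<T> readAll(List<String> serverUrls, long timeoutMs) {
    CompletionService<Map.Entry<String, Optional<T>>> completionService =
        new ExecutorCompletionService<>(_executor);
    for (String url : serverUrls) {
      completionService.submit(() -> {
        try {
          return Map.entry(url, Optional.ofNullable(fetch(url)));
        } catch (Exception e) {
          // Keep the server name so callers can report exactly which server failed.
          return Map.entry(url, Optional.<T>empty());
        }
      });
    }

    Map<String, T> responses = new HashMap<>();
    long deadline = System.currentTimeMillis() + timeoutMs;
    try {
      for (int i = 0; i < serverUrls.size(); i++) {
        long remaining = Math.max(0, deadline - System.currentTimeMillis());
        Future<Map.Entry<String, Optional<T>>> future =
            completionService.poll(remaining, TimeUnit.MILLISECONDS);
        if (future == null) {
          break; // Timed out: servers that have not answered are reported as failed below.
        }
        Map.Entry<String, Optional<T>> entry = future.get();
        entry.getValue().ifPresent(value -> responses.put(entry.getKey(), value));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      // Not expected: each task catches its own exceptions.
    }

    List<String> failedServers = new ArrayList<>(serverUrls);
    failedServers.removeAll(responses.keySet());
    return new FanOutResult<>(responses, failedServers);
  }
}
```

Because failed and timed-out servers are returned by name, callers can also include those names in their logs, which later review comments ask for.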

tableReloadStatus = getSegmentsReloadStatus(tableNameWithType);
} catch (InvalidConfigException e) {
throw new ControllerApplicationException(LOGGER,
"Failed to load segment reload status for table: " + tableName, Status.NOT_FOUND);
Contributor

If you are returning 404 (NOT_FOUND), then please do not use "Failed" in the exception message. Since the exception is an invalid config, determine what is invalid and throw that exception, maybe as 400 (BAD_REQUEST).
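The suggestion is that the HTTP status and the message should both reflect the actual cause. The sketch below maps an invalid config to a 400 whose message says what is wrong. `InvalidConfigException` and `ApiException` here are hypothetical stand-ins for Pinot's `InvalidConfigException` and `ControllerApplicationException`, whose real constructors are not shown in this excerpt.

```java
/** Hypothetical stand-in for the InvalidConfigException caught in the snippet above. */
class InvalidConfigException extends Exception {
  InvalidConfigException(String message) {
    super(message);
  }
}

/** Hypothetical stand-in for ControllerApplicationException: a message plus an HTTP status. */
class ApiException extends RuntimeException {
  final int status;

  ApiException(String message, int status, Throwable cause) {
    super(message, cause);
    this.status = status;
  }
}

class ReloadStatusErrors {
  static final int BAD_REQUEST = 400;

  /** States what is invalid instead of "Failed ...", and reports it as a client error. */
  static ApiException invalidConfig(String tableNameWithType, InvalidConfigException e) {
    return new ApiException(
        "Invalid config for table " + tableNameWithType + ": " + e.getMessage(), BAD_REQUEST, e);
  }
}
```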

Contributor Author

Updated in the latest commit.

"Failed to load segment reload status for table: " + tableName, Status.NOT_FOUND);
}
if (Objects.isNull(tableReloadStatus))
throw new ControllerApplicationException(LOGGER,
Contributor

The exception message reads as if the table is not found. That is not the case, right?

@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName,
@ApiParam(value = "OFFLINE|REALTIME") @QueryParam("type") String tableTypeStr) {
List<String> tableNamesWithType = getExistingTableNamesWithType(tableName, Constants.validateTableType(tableTypeStr));
Map<String, TableMetadataReader.TableReloadStatus> reloadStatusMap = new HashMap<>();
Contributor

If no tables are found, then this is the place to throw a 404.
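In the snippet above, `getExistingTableNamesWithType(...)` produces the list that should drive the 404. The sketch below assumes that list comes back empty when neither the OFFLINE nor the REALTIME table exists. That is an assumption, since the helper may instead throw on its own. `TableLookup`, `requireExistingTables` and `TableNotFoundException` are hypothetical names.

```java
import java.util.List;

/** Hypothetical stand-in for a ControllerApplicationException carrying 404 (NOT_FOUND). */
class TableNotFoundException extends RuntimeException {
  static final int NOT_FOUND = 404;
  final int status = NOT_FOUND;

  TableNotFoundException(String message) {
    super(message);
  }
}

class TableLookup {
  /** Fails fast with 404, before any server is queried, when no table of either type exists. */
  static List<String> requireExistingTables(String tableName, List<String> tableNamesWithType) {
    if (tableNamesWithType.isEmpty()) {
      throw new TableNotFoundException("Table not found: " + tableName);
    }
    return tableNamesWithType;
  }
}
```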


private Map<String, String> getSegmentsMetadataFromServer(String tableNameWithType)
throws InvalidConfigException, IOException {
LOGGER.trace("Inside getSegmentsMetadataFromServer() entry");
Contributor

remove these trace logs please


import java.util.Objects;

public class SegmentStatus {
Contributor

Please document each member in this object clearly, what it contains in various situations
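Here is a sketch of what documenting each member could look like. The field names follow the diff quoted later in this review (`_segmentName`, `_segmentReloadTimeUTC`). The `long` epoch-millisecond type reflects the author's later switch away from a String. The class is illustrative, not the merged code.

```java
/** Hypothetical documented status object for one segment on one server. */
class SegmentStatus {
  /** Segment name as assigned by Pinot, e.g. "baseballStats_OFFLINE_0". */
  public final String _segmentName;

  /**
   * Time of the last successful reload, in milliseconds since the epoch (UTC).
   * If the most recent reload of this segment failed, this still holds the time of the
   * previous successful reload, so callers should compare it with when they triggered the reload.
   */
  public final long _segmentReloadTimeUTC;

  public SegmentStatus(String segmentName, long segmentReloadTimeUTC) {
    _segmentName = segmentName;
    _segmentReloadTimeUTC = segmentReloadTimeUTC;
  }
}
```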

@ApiOperation(value = "Get the server metadata for all table segments", notes = "Get the server metadata for all table segments")
public Map<String, String> getServerMetadata(@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName,
@ApiParam(value = "OFFLINE|REALTIME") @QueryParam("type") String tableTypeStr) {
LOGGER.info("Received a request to fetch metadata for all segments for table {}", tableName);
Contributor

Seems like a debug level log.

throw new ControllerApplicationException(LOGGER, e.getMessage(), Status.BAD_REQUEST);
} catch (IOException ioe) {
throw new ControllerApplicationException(LOGGER,
"Error parsing Pinot server response: " + ioe.getMessage(), Status.INTERNAL_SERVER_ERROR, ioe);
Contributor

Indicate the server name here that caused the error (unless that is logged elsewhere)

Contributor Author

@guruguha guruguha Aug 1, 2020

Yes, it is logged in the helper class.

@GET
@Path("segments/{tableName}/reload-status")
@Produces(MediaType.APPLICATION_JSON)
@ApiOperation(value = "Status of segment reload", notes = "Status of segment reload")
Contributor

I strongly suggest adding a verbosity level and/or a limit here. Can be added later if you wish. Imagine a table with a million segments. Do we really want to kill the servers trying to query all the segments? Or, output them only to let the client time out?

An example could be: limit=100 by default, verbosity=5. Levels 4, 3, 2 and 1 would show progressively less information for each segment. Maybe 0 would only show how many segments are online/offline, etc.?
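The "verbosity 0" level suggested here amounts to a count per state, with no per-segment detail, which keeps the response size independent of the number of segments. This is a sketch only. The `SegmentState` values are hypothetical, and the actual reload-status API may report different states.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical per-segment states; the real API may use different ones. */
enum SegmentState { ONLINE, OFFLINE, ERROR }

class ReloadStatusSummary {
  /** Counts segments per state; states with no segments are simply absent from the result. */
  static Map<SegmentState, Integer> countByState(Map<String, SegmentState> stateBySegment) {
    Map<SegmentState, Integer> counts = new EnumMap<>(SegmentState.class);
    for (SegmentState state : stateBySegment.values()) {
      counts.merge(state, 1, Integer::sum);
    }
    return counts;
  }
}
```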

Contributor Author

Thanks for the suggestion. Adding a limit makes sense. I did think about this, but then the issue is letting the user know the status of the remaining segments. For a table with, say, 1000 segments, how do we report the status of the rest?

Contributor

Add an API to get status for a range of segments, maybe? Or, add some sort of start/limit?
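A start/limit scheme also addresses the earlier question about the remaining segments: each page reports the total, so a client knows whether to request `start + limit` next. Here is a minimal sketch. `SegmentPage` is a hypothetical name, and sorting by segment name is an assumed way to give pages a stable order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical page of segment names for a start/limit style reload-status API. */
class SegmentPage {
  final List<String> segmentNames;
  final int start;
  final int total;

  private SegmentPage(List<String> segmentNames, int start, int total) {
    this.segmentNames = segmentNames;
    this.start = start;
    this.total = total;
  }

  /** True if segments remain after this page. */
  boolean hasMore() {
    return start + segmentNames.size() < total;
  }

  static SegmentPage of(Collection<String> allSegments, int start, int limit) {
    if (start < 0 || limit <= 0) {
      throw new IllegalArgumentException("start must be >= 0 and limit > 0");
    }
    List<String> sorted = new ArrayList<>(allSegments);
    Collections.sort(sorted); // Stable order so consecutive pages neither overlap nor skip segments.
    int from = Math.min(start, sorted.size());
    int to = (int) Math.min((long) from + limit, sorted.size()); // long math avoids int overflow
    return new SegmentPage(new ArrayList<>(sorted.subList(from, to)), from, sorted.size());
  }
}
```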

Contributor

If you are adding one to get the status of one segment at a time, then the user can (if needed) iterate over the segments and get each segment. Let us evaluate the use case first. Are we talking about a full table reload or a segment reload? If full table reload, maybe we only want to return those segments that DID NOT reload properly?

The API definition leaves much discussion to be desired, and a PR is NOT the place to discuss API. If you have a design doc, we will discuss there.
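For the full-table-reload case, the "only return segments that did NOT reload" idea can build on the semantics documented in the diff: a failed reload leaves the previous successful reload time in place. The sketch below assumes per-segment last-reload times in epoch milliseconds and a known reload request time. It also assumes controller and server clocks are close enough to compare, which is a real caveat under clock skew. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ReloadCheck {
  /**
   * Returns, sorted, the segments whose last successful reload predates the reload request,
   * i.e. the ones that did not reload (or have not reloaded yet).
   */
  static List<String> segmentsNotReloadedSince(
      Map<String, Long> lastReloadMillisBySegment, long reloadRequestedAtMillis) {
    List<String> notReloaded = new ArrayList<>();
    for (Map.Entry<String, Long> entry : lastReloadMillisBySegment.entrySet()) {
      if (entry.getValue() < reloadRequestedAtMillis) {
        notReloaded.add(entry.getKey());
      }
    }
    Collections.sort(notReloaded);
    return notReloaded;
  }
}
```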

Member

This can be done in another PR. Let's get this in and add the optimizations as we need them. A million segments in a table is not a common use case.

@guruguha
Contributor Author

guruguha commented Aug 1, 2020 via email

@kishoreg
Member

kishoreg commented Aug 9, 2020

is this ready to go?

@guruguha
Contributor Author

guruguha commented Aug 9, 2020 via email

Pinot codestyle corrections
Moving ServerSegmentMetadataReader to util
Contributor

@npawar npawar left a comment

LGTM as well

@kishoreg
Member

A lot of code duplication with the existing table size reader. Please find ways to use a common base class if possible

Is this addressed?

@guruguha
Contributor Author

Oh! Somehow missed this comment. I think it got lost in between. Let me make this change and commit again. Sorry about that!

…xing-reload-status-api' into feature/apache#5390-segment-indexing-reload-status-api

# Conflicts:
#	pinot-controller/src/main/java/org/apache/pinot/controller/util/CompletionServiceHelper.java
@kishoreg kishoreg requested a review from mcvsubbu August 17, 2020 03:17

import java.util.Objects;

public class SegmentStatus {
Contributor

Not all comments have been addressed. Please justify the use of the same class to return values to the user. It makes upgrades bad. Add json ignore case so that the pain is at least reduced a bit.

serverToSegmentSizeInfoListMap.put(streamResponse.getKey(), tableSizeInfo.segments);
} catch (IOException e) {
failedParses++;
LOGGER.error("Unable to parse server response due to an error: ", e);
Contributor

Add the server name to this log
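In the snippet above, `streamResponse.getKey()` is used as the server key, so it is available in the `catch` block. The sketch below records the failing server names instead of only incrementing a counter. The `Parser` interface is a hypothetical hook standing in for the JSON parsing step, and `System.err` stands in for the SLF4J logger.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ServerResponseParser {
  /** Hypothetical parsing hook; in the PR this step is JSON deserialization. */
  interface Parser<T> {
    T parse(String body) throws IOException;
  }

  /** Parses each server's response body; names of servers whose body fails to parse go to failedServers. */
  static <T> Map<String, T> parseAll(
      Map<String, String> bodyByServer, Parser<T> parser, List<String> failedServers) {
    Map<String, T> parsed = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : bodyByServer.entrySet()) {
      try {
        parsed.put(entry.getKey(), parser.parse(entry.getValue()));
      } catch (IOException e) {
        failedServers.add(entry.getKey());
        // Real code would log via SLF4J, e.g. LOGGER.error("... server: {}", entry.getKey(), e).
        System.err.println("Unable to parse response from server " + entry.getKey() + ": " + e.getMessage());
      }
    }
    return parsed;
  }
}
```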

} else {
LOGGER.info("Finish reading segment sizes for table: {}", tableNameWithType);
if (failedParses != 0) {
LOGGER.warn("Failed to parse {} segment size info responses from server.", failedParses);
Contributor

Suggested change
LOGGER.warn("Failed to parse {} segment size info responses from server.", failedParses);
LOGGER.warn("Failed to parse segment size info responses from {} servers.", failedParses);

If possible, add the total number of servers to this message as well
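Once failures are tracked by server name, the summary can include both the failure count and the total number of servers queried. The sketch builds the text with `String.format` so it can be checked on its own. In the controller, the same fields would go through `LOGGER.warn` with `{}` placeholders. `ParseFailureMessage` is a hypothetical name.

```java
import java.util.List;

class ParseFailureMessage {
  /** Builds "Failed to parse <what> responses from <failed>/<total> servers: [names]". */
  static String summary(String what, List<String> failedServers, int totalServers) {
    return String.format("Failed to parse %s responses from %d/%d servers: %s",
        what, failedServers.size(), totalServers, failedServers);
  }
}
```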

int numServersResponded = completionServiceResponse._httpResponses.size();
if (numServersResponded != serverURLs.size()) {
LOGGER.warn("Finish reading information for table: {} with {}/{} server responses", tableNameWithType,
numServersResponded, serverURLs);
Contributor

Suggested change
numServersResponded, serverURLs);
numServersResponded, serverURLs.size());

segmentsMetadata.add(JsonUtils.objectToString(segmentMetadata));
} catch (IOException e) {
failedParses++;
LOGGER.error("Unable to parse server response due to an error: ", e);
Contributor

Add server name in the log

segmentsStatus.add(segmentStatus);
} catch (IOException e) {
failedParses++;
LOGGER.error("Unable to parse server response due to an error: ", e);
Contributor

add server name

}
}
if (failedParses != 0) {
LOGGER.warn("Failed to parse {} segment load status responses from server.", failedParses);
Contributor

Suggested change
LOGGER.warn("Failed to parse {} segment load status responses from server.", failedParses);
LOGGER.warn("Failed to parse segment load status responses from {} servers.", failedParses);

public String _segmentName;
// The last segment reload time in ISO date format (yyyy-MM-dd HH:mm:ss:SSS UTC)
// If the reload of a segment failed, the value is the time of the previous successful reload
public String _segmentReloadTimeUTC;
Contributor

Why is this a String and not long?

Contributor Author

The date is in string format: ISO date format (yyyy-MM-dd HH:mm:ss:SSS UTC)

Contributor Author

I have updated the API to return long instead of String
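Returning a `long` keeps the value comparable and sortable without parsing. Clients can still render it in the readable form described in the diff comment. Here is a sketch using `java.time`. The class name is hypothetical, and the pattern is the one quoted in the diff (not strict ISO-8601, since it uses a space and a `:` before the milliseconds). For the sample response's `creationTimeMillis` (1599866671163), it produces the same date and time as `creationTimeReadable`, with a space instead of the `T` separator.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ReloadTimeFormat {
  // Pattern from the diff comment: yyyy-MM-dd HH:mm:ss:SSS UTC, always rendered in UTC.
  private static final DateTimeFormatter READABLE_UTC =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS 'UTC'").withZone(ZoneOffset.UTC);

  /** Renders an epoch-millisecond value for display; the API itself returns the raw long. */
  static String toReadable(long epochMillis) {
    return READABLE_UTC.format(Instant.ofEpochMilli(epochMillis));
  }
}
```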

@guruguha guruguha force-pushed the feature/#5390-segment-indexing-reload-status-api branch from f3e67e7 to 6e03c9b Compare August 22, 2020 15:33
@guruguha
Contributor Author

@mcvsubbu can you please review again?

@guruguha guruguha force-pushed the feature/#5390-segment-indexing-reload-status-api branch from f26a5aa to ac98a41 Compare September 2, 2020 04:17
@codecov-commenter

codecov-commenter commented Sep 2, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@7708341).
The diff coverage is 62.02%.

@@            Coverage Diff            @@
##             master    #5718   +/-   ##
=========================================
  Coverage          ?   67.15%           
=========================================
  Files             ?     1205           
  Lines             ?    63599           
  Branches          ?     9741           
=========================================
  Hits              ?    42708           
  Misses            ?    17747           
  Partials          ?     3144           
Flag            Coverage Δ
#integration    43.55% <0.63%> (?)
#unittests      58.36% <62.02%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                                         Coverage Δ
...che/pinot/controller/util/TableMetadataReader.java  0.00% <0.00%> (ø)
...ler/api/resources/PinotSegmentRestletResource.java  14.19% <5.88%> (ø)
...che/pinot/server/api/resources/TablesResource.java  72.83% <25.00%> (ø)
...ontroller/api/resources/ServerTableSizeReader.java  83.87% <58.33%> (ø)
...t/server/api/resources/SegmentMetadataFetcher.java  70.45% <70.45%> (ø)
...t/controller/util/ServerSegmentMetadataReader.java  84.37% <84.37%> (ø)
...pinot/controller/util/CompletionServiceHelper.java  93.75% <93.75%> (ø)
...e/indexsegment/immutable/ImmutableSegmentImpl.java  74.35% <100.00%> (ø)
...pinot/core/util/IntDoubleIndexedPriorityQueue.java  85.36% <0.00%> (ø)
...java/org/apache/pinot/spi/utils/DataSizeUtils.java  88.23% <0.00%> (ø)
... and 1203 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7708341...b36d4cf.

@npawar npawar force-pushed the feature/#5390-segment-indexing-reload-status-api branch from ae2ce79 to ec3603b Compare September 12, 2020 00:32
Member

@kishoreg kishoreg left a comment

This is amazing! Very well done

5 participants