Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV Realtime Decoder #8658

Merged
merged 13 commits into from
May 16, 2022
Merged

CSV Realtime Decoder #8658

merged 13 commits into from
May 16, 2022

Conversation

suddendust
Copy link
Contributor

@suddendust suddendust commented May 7, 2022

@codecov-commenter
Copy link

codecov-commenter commented May 7, 2022

Codecov Report

Merging #8658 (22a0b01) into master (4e14101) will increase coverage by 36.76%.
The diff coverage is 13.70%.

❗ Current head 22a0b01 differs from pull request most recent head 568a68d. Consider uploading reports for the commit 568a68d to get more accurate results

@@              Coverage Diff              @@
##             master    #8658       +/-   ##
=============================================
+ Coverage     29.36%   66.13%   +36.76%     
- Complexity        0     4385     +4385     
=============================================
  Files          1691     1277      -414     
  Lines         89308    64445    -24863     
  Branches      13530    10018     -3512     
=============================================
+ Hits          26225    42620    +16395     
+ Misses        60671    18806    -41865     
- Partials       2412     3019      +607     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 66.13% <13.70%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...common/config/provider/AccessControlUserCache.java 0.00% <0.00%> (ø)
...ache/pinot/common/metadata/ZKMetadataProvider.java 14.52% <0.00%> (-64.67%) ⬇️
...not/common/metadata/segment/SegmentZKMetadata.java 53.74% <0.00%> (-28.58%) ⬇️
...a/org/apache/pinot/common/metrics/BrokerMeter.java 93.02% <ø> (-6.98%) ⬇️
...t/common/response/broker/BrokerResponseNative.java 92.48% <ø> (+6.15%) ⬆️
...ava/org/apache/pinot/common/utils/BcryptUtils.java 0.00% <0.00%> (ø)
.../main/java/org/apache/pinot/common/utils/Pair.java 60.00% <ø> (ø)
...mon/utils/config/AccessControlUserConfigUtils.java 0.00% <0.00%> (ø)
...g/apache/pinot/common/utils/helix/HelixHelper.java 13.30% <0.00%> (-37.50%) ⬇️
...pache/pinot/common/utils/request/RequestUtils.java 81.57% <ø> (+49.60%) ⬆️
... and 1584 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e14101...568a68d. Read the comment docs.


public class CSVMessageDecoder implements StreamMessageDecoder<byte[]> {

private static final String CONFIG_FILE_FORMAT = "csv.fileFmt";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the properties is under the context of the decoder, we may remove the csv. prefix for simplicity and consistency with other decoders. Also, suggest using the full name for the key, e.g. fileFormat, header, delimiter etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. My thought process was that

...decoder.props.csv.delimiter

is more idiomatic than:

...decoder.props.delimiter

, as the former tells the users that this property configures the delimiter of the CSV record. But I think this prefix is not really needed as the user already configures the decoder class in the config, so he would already know what these props are about even without the prefix. Thanks!

format = CSVFormat.TDF;
break;
default:
format = CSVFormat.DEFAULT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log a warning here stating that we cannot recognize the format, and fall back to the default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

String incomingRecord = "Alice;18;F";
GenericRow destination = new GenericRow();
messageDecoder.decode(incomingRecord.getBytes(StandardCharsets.UTF_8), destination);
Assert.assertNotNull(destination.getValue("name"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We may static import these Assert methods in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@suddendust
Copy link
Contributor Author

@Jackie-Jiang Requesting re-review, thanks :)

@suddendust
Copy link
Contributor Author

@Jackie-Jiang @npawar Just a reminder, thanks :)

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor comments. Thanks for contributing the feature!

format = CSVFormat.TDF;
break;
default:
LOGGER.info("Could not recognise the configured CSV file format: {}, falling back to DEFAULT format",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log warning for unrecognized format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh missed it, done.

if (csvFormat == null) {
format = CSVFormat.DEFAULT;
} else {
switch (csvFormat.toUpperCase()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a case for the "DEFAULT"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
format = format.withHeader(StringUtils.split(csvHeader, csvDelimiter));
}
if (props.containsKey(CONFIG_COMMENT_MARKER)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Follow the same way aw csvDelimiter to reduce one map lookup (get and check if it is null), same for other configs

format = format.withHeader(StringUtils.split(csvHeader, csvDelimiter));
}
if (props.containsKey(CONFIG_COMMENT_MARKER)) {
Character commentMarker = props.get(CONFIG_COMMENT_MARKER).charAt(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) no need to boxing, same for other configs

Copy link
Contributor Author

@suddendust suddendust May 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined all of the configs.

@Jackie-Jiang Jackie-Jiang merged commit 7548b67 into apache:master May 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants