Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-6680: Nifi PutKudu processor - Convert record field names to low… #3748

Closed

Conversation

kjmccarthy
Copy link
Contributor

@kjmccarthy kjmccarthy commented Sep 18, 2019

Thank you for submitting a contribution to Apache NiFi.

Please provide a short description of the PR here:

Description of PR

Adds additional attribute that specifies whether or not to convert columns in the Kudu schema to lowercase when retrieving the column's index

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on both JDK 8 and JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@@ -103,6 +103,7 @@ private void setUpTestRunner(TestRunner testRunner) throws InitializationExcepti
testRunner.setProperty(PutKudu.KUDU_MASTERS, DEFAULT_MASTERS);
testRunner.setProperty(PutKudu.SKIP_HEAD_LINE, SKIP_HEAD_LINE);
testRunner.setProperty(PutKudu.IGNORE_NULL, "true");
testRunner.setProperty(PutKudu.LOWERCASE_FIELD_NAMES, "true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the current tests are not capable of verifying the correct behaviour regarding this change.
Wether these settings are omitted or or changed to "false" makes no difference.

This setProperty could be removed (a bit misleading as it doesn't really do anything in the current state) and a couple targeted tests could be added, something like these:

    @Test
    public void testBuildPartialRowWithLowercaseKuduColumn() {
        String kuduColumn = "name";
        final String recordField = "NAME";

        boolean lowercaseFields = true;

        testBuildPartialRow(kuduColumn, recordField, lowercaseFields);
    }

    @Test
    public void testBuildPartialRowWithUppercaseKuduColumn() {
        String kuduColumn = "NAME";
        final String recordField = "NAME";

        boolean lowercaseFields = false;

        testBuildPartialRow(kuduColumn, recordField, lowercaseFields);
    }

    private void testBuildPartialRow(String kuduColumn, String recordField, boolean lowercaseFields) {
        // GIVEN
        final Schema kuduSchema = new Schema(Arrays.asList(
                new ColumnSchema.ColumnSchemaBuilder(kuduColumn, Type.STRING).nullable(true).build()
        ));

        final RecordSchema schema = new SimpleRecordSchema(Arrays.asList(
                new RecordField(recordField, RecordFieldType.STRING.getDataType())
        ));

        Map<String, Object> values = new HashMap<>() {{
            put(recordField, "foo");
        }};

        PartialRow row = kuduSchema.newPartialRow();
        
        // WHEN
        processor.buildPartialRow(
                kuduSchema,
                row,
                new MapRecord(schema, values),
                schema.getFieldNames(),
                true,
                lowercaseFields
        );

        // THEN
        assertEquals("foo", row.getString(kuduColumn));
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tpalfy, I will add a test to verify the correct behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpalfy, I've pushed additional unit tests to ensure behavior is as expected.

@@ -412,6 +413,7 @@ private void buildPartialRow(Long id, String name, Short age) {
kuduSchema.newPartialRow(),
new MapRecord(schema, values),
schema.getFieldNames(),
true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting lowercaseFields to true is a bit misleading. At the moment this only works because

  1. All the fieldNames in the test are already in lower case.
  2. There are no assertions that the buildPartialRow actually did set the values correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a unit test with uppercase schema.

@kjmccarthy kjmccarthy force-pushed the lowercase_kudu_column_names branch 2 times, most recently from f0fcd47 to 0e07e11 Compare September 20, 2019 21:28
for (String colName : fieldNames) {
String recordFieldName = colName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's kinda minor, but I think this way it would be really less confusing and more consistent:

...
        for (String recordFieldName : fieldNames) {
            String colName = recordFieldName;
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -366,54 +368,219 @@ public void testUpdateFlowFiles() throws Exception {

@Test
public void testBuildRow() {
buildPartialRow((long) 1, "foo", (short) 10);
final Schema kuduSchema = new Schema(Arrays.asList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's quite a few code duplication here. Not sure why even the existing abstraction was replaced with code duplication.

Also it might be better for tests to be as thin and focused as possible so that it's easier to understand what and how is being tested and to know what went wrong when the test fails.

To test different types we indeed need a more complex schema.
But to test only the lowercase functionality, a simply schema with a single field would be enough (unless the the logic depends on the type of the field).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will revert back to the abstracted method and reduce the test size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since logic doesn't depend on type, I added a parameter in the buildPartialRow method to allow you to modify the name of the id kudu column and id record field and modified this argument for unit tests.

Copy link
Contributor

@tpalfy tpalfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
(I'm still reluctant of

public class TestPutKudu {
...
    private void setUpTestRunner(TestRunner testRunner) throws InitializationException {
...
        testRunner.setProperty(PutKudu.LOWERCASE_FIELD_NAMES, "true");

as it sets the property to a non-default value while not really having any effect, but I guess it's not that big of a deal.)

@kjmccarthy
Copy link
Contributor Author

@SandishKumarHN, Would you be able to merge these changes, or should I tag someone else?

@SandishKumarHN
Copy link
Contributor

@SandishKumarHN, Would you be able to merge these changes, or should I tag someone else?

@kjmccarthy after build pass. I think @tpalfy should be able to do it.

@kjmccarthy
Copy link
Contributor Author

@tpalfy would you be able to merge this?

@asfgit asfgit closed this in eb366c8 Nov 7, 2019
patricker pushed a commit to patricker/nifi that referenced this pull request Jan 22, 2020
…record field names to lowercase

Signed-off-by: Joe Witt <joewitt@apache.org>
ekovacs pushed a commit to ekovacs/nifi-apache that referenced this pull request Feb 24, 2020
…record field names to lowercase

Signed-off-by: Joe Witt <joewitt@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants