Skip to content

NIFI-10143 - ListFTP/GetFTP charset issues#6172

Closed
greyp9 wants to merge 4 commits intoapache:mainfrom
greyp9:NIFI-10143
Closed

NIFI-10143 - ListFTP/GetFTP charset issues#6172
greyp9 wants to merge 4 commits intoapache:mainfrom
greyp9:NIFI-10143

Conversation

@greyp9
Copy link
Contributor

@greyp9 greyp9 commented Jun 30, 2022

Summary

NIFI-10143

  • Additional unit test for FTP processors, using embedded mina/ftpserver.
  • Additional configuration for FTP client in the case of usage in non-UTF-8 environments.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 8
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@greyp9 greyp9 requested a review from exceptionfactory June 30, 2022 20:49
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this issue and developing a new test @greyp9! The ClientProvider changes make sense.

The new test is helpful, and covers a lot of ground. It does a lot of work, and it seems to stretch the boundary of a unit test versus an integration test. In light of having existing unit tests for FTP processors, it seems better to maintain a minimal unit test, and then provide this new test class as an integration test. As noted in the detailed comments, there are several ways the test could be optimized. Supporting an external FTP server is also useful, and could be parameterized through System properties or environment variables.

What do you think about changing the test to an integration test, an updating existing unit tests with more minimal changes to incorporate evaluation of UTF-8 encoding?

private static final int PORT = 2121;
private static final String USER = "ftpuser";
private static final String PASSWORD = "admin";
private static final File FOLDER = new File("target", TestFTPCharset.class.getSimpleName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a static directory in the target directory, it would better to use the JUnit 5 TempDir annotation to manage the lifecycle of the temporary directory.


private static final String USE_UTF8 = Boolean.TRUE.toString();
private static final String HOSTNAME = "localhost";
private static final int PORT = 2121;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use a random port to avoid theoretical conflicts in different environments.

* @throws FtpException on server startup failure
*/
@BeforeAll
static void beforeAll() throws FtpException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to name this method something like startServer, which may obviate the need for the method comments.

// setup ftp user
final PropertiesUserManagerFactory userManagerFactory = new PropertiesUserManagerFactory();
userManagerFactory.setUrl(TestFTPCharset.class.getClassLoader()
.getResource("TestFTPCharset/users.properties"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it would take a little more setup work, creating the properties file in the test class, using the parameterized username and password, would help avoid potential mismatch between class properties and file properties.

final FileSystemView view = fileSystemFactory.createFileSystemView(ftpuser);
final FtpFile workingDirectory = view.getWorkingDirectory();
final Object physicalFile = workingDirectory.getPhysicalFile();
assertTrue(physicalFile instanceof File);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JUnit 5 supports assertInstanceOf

import static org.junit.jupiter.params.provider.Arguments.arguments;

/**
* Seed a UTF-7 folder in FTP server with files using i18n known filenames. Iterate through a set of i18n folder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Seed a UTF-7 folder in FTP server with files using i18n known filenames. Iterate through a set of i18n folder
* Seed a UTF-8 folder in FTP server with files using i18n known filenames. Iterate through a set of i18n folder

Comment on lines 67 to 71
* To test against a live FTP server:
* <ol>
* <li>replace EMBED_FTP_SERVER value with <code>false</code></li>
* <li>replace HOSTNAME, PORT, USER, PASSWORD as needed</li>
* </ol>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to provide the option for testing against a real FTP server, this could be parameterized through JUnit 5 annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added method serverParametersProvider(), which allows for override based on a system property.

throw new RuntimeException(e);
}
}
FileUtils.deleteDirectory(FOLDER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring to use JUnit 5 TempDir should remove the need for deleting the directory manually, although it might be necessary to clear the directory, depending on the approach.

Comment on lines 153 to 157
try {
Thread.sleep(100L);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InterruptedException could be declared on the method signature instead of caught and thrown again.

Comment on lines 156 to 157
final Charset charset = unicodeEnabled ? StandardCharsets.UTF_8 : Charset.defaultCharset();
client.setCharset(charset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change the current behavior, or does the internal character set of the client use Charset.defaultCharset() when not otherwise specified? This looks like the right way to go, just wanted to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After evaluating the Apache FTPClient class and parent classes, it appears that the Charset property is not used, so it seems best to avoid setting the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a speculative change. I have no evidence that it makes a difference. Removed this.

@greyp9
Copy link
Contributor Author

greyp9 commented Jul 1, 2022

Thanks for digging into this issue and developing a new test @greyp9! The ClientProvider changes make sense.

The new test is helpful, and covers a lot of ground. It does a lot of work, and it seems to stretch the boundary of a unit test versus an integration test. In light of having existing unit tests for FTP processors, it seems better to maintain a minimal unit test, and then provide this new test class as an integration test. As noted in the detailed comments, there are several ways the test could be optimized. Supporting an external FTP server is also useful, and could be parameterized through System properties or environment variables.

What do you think about changing the test to an integration test, an updating existing unit tests with more minimal changes to incorporate evaluation of UTF-8 encoding?

I've renamed to *IT to align with integration test naming convention. I'd like to defer modifications of the existing unit tests to be part of future efforts, as needed.

@greyp9 greyp9 requested a review from exceptionfactory July 11, 2022 17:18
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @greyp9, noted a couple additional issues, but this looks close to completion.

@greyp9 greyp9 requested a review from exceptionfactory July 11, 2022 19:54
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through the feedback @greyp9! The additional integration test is helpful and the latest version looks good. +1 merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants