Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1534: [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows #627

Merged
merged 3 commits into from May 27, 2019

Conversation

masayuki038
Copy link
Contributor

Some commands of parquet-cli don't work on Windows.

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main schema C:\Users\masayuki\Downloads\test.parquet                                                               Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.parquet          
                                    
$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main csv-schema C:\Users\masayuki\Downloads\test.csv --record-name Test                                              
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.csv              

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main convert-csv C:\Users\masayuki\Downlo ads\test.csv -o C:\Users\masayuki\Downloads\test-csv.parquet
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.csv

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main convert C:\Users\masayuki\Downloads\ userdata1.avro -o C:\Users\masayuki\Downloads\userdata1.parquet
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\userdata1.avro

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main to-avro C:\Users\masayuki\Downloads\ test.parquet -o C:\Users\masayuki\Downloads\test.avro
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.parquet

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main cat C:\Users\masayuki\Downloads\test.parquet
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.parquet

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main head -n 10 C:\Users\masayuki\Downloa ds\test.parquet
Argument error: Illegal character in opaque part at index 2: C:\Users\masayuki\Downloads\test.parquet

Calling BaseCommand#qualifiedURI with Windows file path, java.net.URI.create throws IllegalArgumentException and the command execution is aborted.

This PR fixes the issue. With this PR, parquet-cli will work all commands on Windows.

parquet-cli has no tests. I created PR to add simple tests for each command at #625. Could you review that first?

…ue part at index 2 on Windows

Calling BaseCommand#qualifiedURI with Windows file path,  java.net.URI.create throws IllegalArgumentException and the command execution is aborted.
@gszadovszky
Copy link
Contributor

Could you fix the test failures?

Add Assume#assumeFalse because org.apache.hadoop.fs.Path.initialize throws java.lang.IllegalArgumentException  when called it with a windows file path on Linux.
@masayuki038
Copy link
Contributor Author

@gszadovszky I fixed the test failures. Could you review this?

…ue part at index 2 on Windows

Clarify on which OS the tests are supposed to work.
@masayuki038
Copy link
Contributor Author

Could anyone merge or review this?

@zivanfi zivanfi merged commit f1a719b into apache:master May 27, 2019
if (RESOURCE_URI_SCHEME.equals(fileURI.getScheme())) {
return fileURI;
}
} catch (URISyntaxException ignore) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change that breaks the stated contract of this method: @throws IOException if there is an error creating a qualified URI.

In addition, the tests do not validate this change. What happens to invalid URIs after this change? Are characters automatically escaped by using Path? What about a resource URI with an unescaped character?

It is fine to change the method contract, but the new behavior must be as well understood as the previous behavior and it should be tested. The tests in this PR check, for example, that the file name has not changed and don't exercise the error handling code at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue Thanks for your comment.

This is a behavior change that breaks the stated contract of this method: @throws IOException if there is an error creating a qualified URI.

I don't agree. URI.create and fileURI.getScheme() don't throw IOException and URISyntaxException is not a subclass of IOException.

In addition, the tests do not validate this change. What happens to invalid URIs after this change?

The difference from URI.create and new URI is only whether the exception to throw is the IllegalArgumentException or URISyntaxException. The reason to use new URI is that catching with IllegalArgumentException may catch an unintended exception.

https://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/2d4edd21bb38/src/share/classes/java/net/URI.java#l848

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants