Member

@yatharthranjan yatharthranjan commented May 29, 2018

Add a command line parser for all the options

closes #22

The following options are available:

java -jar build/libs/restructurehdfs-all-0.3.3-SNAPSHOT.jar --help
Usage: hadoop jar restructurehdfs-all-0.3.3.jar [options] <input_path_1> 
      [<input_path_2> ...]
  Options:
    -c, --compression
      Compression to use when converting the files. Gzip is available.
      Default: none
    -d, --deduplicate
      Boolean to define if to use deduplication or not.
      Default: false
    -f, --format
      Format to use when converting the files. JSON and CSV is available.
      Default: csv
  * -u, --hdfs-uri
      The HDFS uri to connect to. Eg - 'hdfs://<HOST>:<RPC_PORT>/<PATH>'.
    -h, --help
      Display the usage of the program with available options.
  * -o, --output-directory
      The output folder where the files are to be extracted.



// Default set to false because causes loss of records from Biovotion data. https://github.com/RADAR-base/Restructure-HDFS-topic/issues/16
@Parameter(names = { "-d", "--deduplicate" }, description = "Boolean to define if to use deduplication or not.", validateWith = BooleanValidator.class)
public Boolean deduplicate = false;
Contributor

JCommander supports on/off flags, right? A Boolean value should not be needed, just the -d flag, like --help.

}

USE_GZIP = "gzip".equalsIgnoreCase(commandLineArgs.compression);
DO_DEDUPLICATE = commandLineArgs.deduplicate;
Contributor

If these parameters are explicitly parsed anyway, perhaps just pass them to the class instead of having them static?

Contributor

In that case, restructureRecords could possibly use the builder pattern.

@Parameter(names = { "-u", "--hdfs-uri" }, description = "The HDFS uri to connect to. Eg - 'hdfs://<HOST>:<RPC_PORT>/<PATH>'.", required = true, validateWith = { HdfsUriValidator.class, PathValidator.class })
public String hdfsUri;

@Parameter(names = { "-i", "--hdfs-root-directory" }, description = "The input HDFS root directory from which files are to be read. Eg - '/topicAndroidNew'", required = true, validateWith = PathValidator.class)
Contributor

@blootsvoets blootsvoets May 30, 2018

How about we choose between having a URI or multiple paths? Having both a path and a URI seems overkill... So

command <input_uri> <output_dir>
# or
command -u <uri> -o <dir> <input1> [<input2> ...]
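
To make the second form concrete, here is a minimal, dependency-free sketch of how `command -u <uri> -o <dir> <input1> [<input2> ...]` could be parsed, with `-d` treated as an on/off flag whose presence means true. It is hand-rolled for illustration and is not JCommander or the code in this PR; the class name `CliOptions`, its fields, and the helper `valueAfter` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option holder; not the PR's actual class.
final class CliOptions {
    String hdfsUri;
    String outputDirectory;
    boolean deduplicate; // on/off flag: present means true, absent means false
    final List<String> inputPaths = new ArrayList<>();

    // Parses: command -u <uri> -o <dir> [-d] <input1> [<input2> ...]
    static CliOptions parse(String[] args) {
        CliOptions opts = new CliOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            switch (arg) {
                case "-u":
                case "--hdfs-uri":
                    // i++ consumes the option's value so the loop skips it.
                    opts.hdfsUri = valueAfter(args, i++);
                    break;
                case "-o":
                case "--output-directory":
                    opts.outputDirectory = valueAfter(args, i++);
                    break;
                case "-d":
                case "--deduplicate":
                    opts.deduplicate = true; // takes no value
                    break;
                default:
                    if (arg.startsWith("-")) {
                        throw new IllegalArgumentException("Unknown option: " + arg);
                    }
                    opts.inputPaths.add(arg); // positional input path
            }
        }
        if (opts.hdfsUri == null || opts.outputDirectory == null) {
            throw new IllegalArgumentException("-u and -o are required");
        }
        if (opts.inputPaths.isEmpty()) {
            throw new IllegalArgumentException("At least one input path is required");
        }
        return opts;
    }

    // Returns the token following args[index], or fails if there is none.
    private static String valueAfter(String[] args, int index) {
        if (index + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value for " + args[index]);
        }
        return args[index + 1];
    }
}
```

With this shape the HDFS URI is given once and every positional argument is an input path, so no separate root-directory option is needed.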

@yatharthranjan
Member Author

@blootsvoets the changes requested have been made. Thanks

Contributor

@blootsvoets blootsvoets left a comment

Looks good, some minor comments


public RestructureAvroRecords(String inputPath, String outputPath) {
this.setInputWebHdfsURL(inputPath);
private RestructureAvroRecords(String hdfsUri, String outputPath, boolean gzip, boolean dedup, String format) {
Contributor

You can pass the Builder class here instead, to prevent large argument lists.
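
As a rough illustration of that suggestion, the sketch below has a private constructor that takes the builder itself, so adding an option later does not lengthen the constructor's argument list. The class name `RestructureAvroRecordsSketch`, the field names, the getters, and the defaults (taken from the help output above) are assumptions for illustration, not the merged code.

```java
import java.util.Objects;

// Sketch only: names are assumptions based on the snippets in this review.
final class RestructureAvroRecordsSketch {
    private final String hdfsUri;
    private final String outputPath;
    private final boolean useGzip;
    private final boolean doDeduplicate;
    private final String format;

    // The constructor receives the builder instead of five separate arguments.
    private RestructureAvroRecordsSketch(Builder builder) {
        this.hdfsUri = builder.hdfsUri;
        this.outputPath = builder.outputPath;
        this.useGzip = builder.useGzip;
        this.doDeduplicate = builder.doDeduplicate;
        this.format = builder.format;
    }

    static final class Builder {
        // Required settings are fixed at construction time.
        private final String hdfsUri;
        private final String outputPath;
        // Optional settings start at the defaults shown in the help text.
        private boolean useGzip = false;
        private boolean doDeduplicate = false;
        private String format = "csv";

        Builder(String hdfsUri, String outputPath) {
            this.hdfsUri = Objects.requireNonNull(hdfsUri, "hdfsUri");
            this.outputPath = Objects.requireNonNull(outputPath, "outputPath");
        }

        Builder compression(String compression) {
            this.useGzip = "gzip".equalsIgnoreCase(compression);
            return this;
        }

        Builder doDeduplicate(boolean dedup) {
            this.doDeduplicate = dedup;
            return this;
        }

        Builder format(String format) {
            this.format = Objects.requireNonNull(format, "format");
            return this;
        }

        RestructureAvroRecordsSketch build() {
            return new RestructureAvroRecordsSketch(this);
        }
    }

    String getHdfsUri() { return hdfsUri; }
    String getOutputPath() { return outputPath; }
    boolean isUseGzip() { return useGzip; }
    boolean isDoDeduplicate() { return doDeduplicate; }
    String getFormat() { return format; }
}
```

A caller would then write something like `new RestructureAvroRecordsSketch.Builder(uri, out).compression("gzip").doDeduplicate(true).build()`, which also removes the need for static configuration fields.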

return this;
}

public Builder doDeuplicate(final boolean dedup) {
Contributor

doDeuplicate -> doDeduplicate

Contributor

@blootsvoets blootsvoets left a comment

Thanks, LGTM

@yatharthranjan yatharthranjan merged commit e015f31 into dev May 31, 2018
@yatharthranjan yatharthranjan deleted the command_line_parser branch May 31, 2018 10:31