ANY23-308 [NEW] Adding option -d support for yaml files #42
Conversation
- validate yaml file - rename csvutils -> utils - bring all utility class into util module - update README Signed-off-by: Jacek Grzebyta <grzebyta.dev@gmail.com>
- resolve conflicts: README.md Signed-off-by:Jacek Grzebyta <grzebyta.dev@gmail.com>
Just for testing I have added another travis branch at https://github.com/jgrzebyta/any23/tree/travis to run all unit tests with command What do you think? Would it be useful to run tests on travis and build on the jenkins as it is now? Or is it better to run both on jenkins? |
Jenkins is good the way it is right now doing both testing and deploying snapshots. Just running the usual java sequence without customizing the "script" on TravisCI is the simplest to have it as an alternate regression test site. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs redesigning inside of YAMLValidator for maintainability and reuse.
@Test | ||
public void csvTest() | ||
throws Exception { | ||
assertExtract("/org/apache/any23/extractor/csv/test-comma.csv"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with YAML, but is CSV (RFC 4180 Section 2 and all other variations) a strict subset of YAML or is it incidental that this test file works with the YAML parser? If it is only incidental we shouldn't be testing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is false positive. It is just proof that for text-based formats additional recognising should be done from meta
. I will delete that.
break; | ||
} | ||
|
||
is.reset(); // MUST BE AT THE END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this must be called last, it should be done in a try-finally block. If this must be done at the end, it isn't clear why it is also being done inside of the loop.
return false; | ||
} | ||
|
||
if (!is.markSupported()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original InputStream will be modified without the ability to roll it back in this case. Simply try to call is.mark and if that fails, they need to change their calling code to provide a markable InputStream and try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked in the source and markSupported()
method olny contains return true
or return false
.
is = new BufferedInputStream(is); | ||
} | ||
|
||
boolean result = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A variable isn't a great idea in the default false case, as you can simply return true whenever you decide it is try (and rely on try-finally to cleanup), and then return false at the end of the method.
} | ||
|
||
// final break | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
breaking at the end of a while(true) loop with no calls to "continue" shows the code needs to be changed. There doesn't seem to be anything iterative about the algorithm, so not clear why you are doing it this way.
break; | ||
} | ||
} catch (Exception ex) { | ||
//do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return false rather than silently aborting
// mark the reading frame position. MUST BE FIRST | ||
is.mark(Integer.MAX_VALUE); | ||
|
||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop is not being used. Replace it with try-finally and replace the "result =..." cases with "return ...".
- file path add to meta - add documentation to unit test Signed-off-by: Jacek Grzebyta <grzebyta.dev@gmail.com>
- restore csvutils - detect yaml based on the file name - remove utils module Signed-off-by:Jacek Grzebyta <grzebyta.dev@gmail.com>
I have changed code and yaml mime type is detected based on file name only rather by the content. I removed useless code: restored |
I am fixing some compile and test errors and merging it into master. Thanks! |
Continuation of #38:
TikaMIMETypeDetector
classapache-any23-utils