Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

ANY23-308 [NEW] Adding option -d support for yaml files #42

Merged
merged 4 commits into from Jul 13, 2017

Conversation

jgrzebyta
Copy link
Contributor

Continuation of #38:

  • Fix ANY23-308
    • Add unit test
    • Upadate TikaMIMETypeDetector class
    • Bring two utility classes into one module apache-any23-utils

- validate yaml file
- rename csvutils -> utils
- bring all utility class into util module
- update README

Signed-off-by: Jacek Grzebyta <grzebyta.dev@gmail.com>
- resolve conflicts:
	README.md

Signed-off-by:Jacek Grzebyta <grzebyta.dev@gmail.com>
@jgrzebyta jgrzebyta changed the title Any23 308 Any23-308 NEW Adding option -d support for yaml files Jul 11, 2017
@jgrzebyta jgrzebyta changed the title Any23-308 NEW Adding option -d support for yaml files ANY23-308 [NEW] Adding option -d support for yaml files Jul 11, 2017
@jgrzebyta
Copy link
Contributor Author

Hi @ansell @lewismc ,

Just for testing I have added another travis branch at https://github.com/jgrzebyta/any23/tree/travis to run all unit tests with command mvn -B clean test. Tests results in displayed in README file.

What do you think? Would it be useful to run tests on travis and build on the jenkins as it is now? Or is it better to run both on jenkins?

@ansell
Copy link
Member

ansell commented Jul 11, 2017

Jenkins is good the way it is right now doing both testing and deploying snapshots.

Just running the usual java sequence without customizing the "script" on TravisCI is the simplest to have it as an alternate regression test site.

Copy link
Member

@ansell ansell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs redesigning inside of YAMLValidator for maintainability and reuse.

@Test
public void csvTest()
throws Exception {
assertExtract("/org/apache/any23/extractor/csv/test-comma.csv");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with YAML, but is CSV (RFC 4180 Section 2 and all other variations) a strict subset of YAML or is it incidental that this test file works with the YAML parser? If it is only incidental we shouldn't be testing it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is false positive. It is just proof that for text-based formats additional recognising should be done from meta. I will delete that.

break;
}

is.reset(); // MUST BE AT THE END
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this must be called last, it should be done in a try-finally block. If this must be done at the end, it isn't clear why it is also being done inside of the loop.

return false;
}

if (!is.markSupported()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original InputStream will be modified without the ability to roll it back in this case. Simply try to call is.mark and if that fails, they need to change their calling code to provide a markable InputStream and try again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked in the source and markSupported() method olny contains return true or return false.

is = new BufferedInputStream(is);
}

boolean result = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variable isn't a great idea in the default false case, as you can simply return true whenever you decide it is try (and rely on try-finally to cleanup), and then return false at the end of the method.

}

// final break
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breaking at the end of a while(true) loop with no calls to "continue" shows the code needs to be changed. There doesn't seem to be anything iterative about the algorithm, so not clear why you are doing it this way.

break;
}
} catch (Exception ex) {
//do nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return false rather than silently aborting

// mark the reading frame position. MUST BE FIRST
is.mark(Integer.MAX_VALUE);

while (true) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is not being used. Replace it with try-finally and replace the "result =..." cases with "return ...".

- file path add to meta
- add documentation to unit test

Signed-off-by: Jacek Grzebyta <grzebyta.dev@gmail.com>
- restore csvutils
- detect yaml based on the file name
- remove utils module

Signed-off-by:Jacek Grzebyta <grzebyta.dev@gmail.com>
@jgrzebyta
Copy link
Contributor Author

jgrzebyta commented Jul 12, 2017

I have changed code and yaml mime type is detected based on file name only rather by the content. I removed useless code: restored csvutils module from master branch and deleted utils one.

@ansell
Copy link
Member

ansell commented Jul 13, 2017

I am fixing some compile and test errors and merging it into master. Thanks!

@asfgit asfgit merged commit 9839e21 into apache:master Jul 13, 2017
@jgrzebyta jgrzebyta deleted the ANY23-308 branch July 13, 2017 10:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants