Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINIFICPP-72: Add Tar and Zip Support for MergeContent #146

Closed
wants to merge 1 commit into from

Conversation

minifirocks
Copy link

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

int64_t ret = 0;
uint64_t read_size;
ret = stream->read(buffer, buffer_size_);
if (!stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking whether stream is null after you attempt a function call on it seems counterintuitive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!stream means that we read EOF after the operation, in this case, we did not get the full size.

}
}
}
ReadCallback readCb(flow->getSize(), arch, entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We instantiate the entire buffer of the archive into memory? If that is our process, should we not have memory limits?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will try to do incremental read.

@@ -24,7 +24,7 @@ find_package(CURL REQUIRED)
set(CMAKE_EXE_LINKER_FLAGS "-Wl,--export-all-symbols")
set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--export-symbols")

include_directories(../../libminifi/include ../../libminifi/include/c2 ../../libminifi/include/c2/protocols/ ../../libminifi/include/core/state ./libminifi/include/core/statemanagement/metrics ../../libminifi/include/core/yaml ../../libminifi/include/core ../../thirdparty/spdlog-20170710/include ../../thirdparty/concurrentqueue ../../thirdparty/yaml-cpp-yaml-cpp-0.5.3/include ../../thirdparty/civetweb-1.9.1/include ../../thirdparty/jsoncpp/include ../../thirdparty/leveldb-1.18/include ../../thirdparty/)
include_directories(../../libminifi/include ../../libminifi/include/c2 ../../libminifi/include/c2/protocols/ ../../libminifi/include/core/state ./libminifi/include/core/statemanagement/metrics ../../libminifi/include/core/yaml ../../libminifi/include/core ../../thirdparty/spdlog-20170710/include ../../thirdparty/concurrentqueue ../../thirdparty/yaml-cpp-yaml-cpp-0.5.3/include ../../thirdparty/civetweb-1.9.1/include ../../thirdparty/jsoncpp/include ../../thirdparty/leveldb-1.18/include ../../thirdparty/libarchive-3.3.2/libarchive ../../thirdparty/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you recall the error that necessitated adding lib archive here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scanning dependencies of target minifi-http-curl
[ 15%] Building CXX object extensions/http-curl/CMakeFiles/minifi-http-curl.dir/HttpCurlLoader.cpp.o
In file included from /Users/binqiu/m/nifi-minifi-cpp/extensions/http-curl/HttpCurlLoader.cpp:20:
In file included from /Users/binqiu/m/nifi-minifi-cpp/extensions/http-curl/../../libminifi/include/core/FlowConfiguration.h:37:
/Users/binqiu/m/nifi-minifi-cpp/extensions/http-curl/../../libminifi/include/processors/MergeContent.h:24:10: fatal error: 'archive_entry.h' file not found

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's far too generic of an include directories statement, that's why. we shouldn't be cross compiling merge content for curl.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is FlowConfiguration.h which include MergeContent.h

try {
permInt = std::stoi(perm);
archive_entry_set_perm(entry, (mode_t) permInt);
} catch (...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we log perm when this happens?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apiri @phrocker fixed.

flows.front()->getAttribute(BinFiles::SEGMENT_ORIGINAL_FILENAME, fileName);
}
if (!fileName.empty()) {
fileName += ".tar";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if your file is already a tar? Will we handle that case and then also rename it fileName.tar.tar?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as blew zip. we will tar the tar file into a tar ball.

flows.front()->getAttribute(BinFiles::SEGMENT_ORIGINAL_FILENAME, fileName);
}
if (!fileName.empty()) {
fileName += ".zip";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here...I don't really mind if it's named file.zip.zip, but if we have the flow going through zip twice, will we zip the two separate zips into a larger one or combine them? This isn't clear to me from reading the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we will zip the two separate zip into a larger one

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if two sep zip files fail into the same bin, we will create a single zip file which zip these two zip files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool, thanks!

@minifirocks
Copy link
Author

@phrocker please review the same
@apiri @jdye64 please review the LICENSE file for libarchive to see whether it is OK. Joe raise some concern in last PR.

@minifirocks
Copy link
Author

@phrocker @apiri please let me know if you have comments.

Copy link
Member

@apiri apiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minifirocks I think the LICENSE needs to be revisited and left some thoughts on that. Also, were you able to adjust the incremental approach to avoid the allocation for memory as pointed out by Marc? Functionality looks okay otherwise and did some simple testing.

LICENSE Outdated
@@ -564,4 +564,55 @@ ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
========================================================================
The libarchive distribution as a whole is Copyright by Tim Kientzle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be more detailed and specific in our LICENSE than this. We will need to break out particular items and their terms for those efforts as there are several licenses and considerations at play for the various components of this library.

I would additionally recommend removing all extraneous parts of libarchive to reduce this footprint such as examples and test resources and those build scripts unneeded for our build.

@minifirocks
Copy link
Author

The incremental read already been added to address Marc comments.

@minifirocks minifirocks changed the title Archive merge MINIFICPP-72: Add Tar and Zip Support for MergeContent Oct 13, 2017
@minifirocks
Copy link
Author

@apiri remove test and doc from lib archieve, update LICENSE to add each author.

Copy link
Member

@apiri apiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do a bit more with the LICENSE terms as it is not currently in the correct format nor explicitly enumerating the various components of the library. I also noticed there are still a couple items that @phrocker had requested changes on you were going to evaluate that have not had any updates I can see around them (the incremental read of the archive and logging requests are the two that come to mind).

LICENSE Outdated
(END LICENSE TEXT)

Each individual file in this distribution should have a clear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to call out the specific items and not just include the boilerplate message by the source LICENSE.

In the case of those exclusions we must note that "the project includes <project/source> which is available under " and the associated copyright. As a reference, scope out the other items in this LICENSE to see how they are handled.

We should not have the caveats listed by the source but interpret them and include the appropriate clauses for those items which we do include.

LICENSE Outdated
@@ -564,4 +564,64 @@ ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
========================================================================
The libarchive distribution as a whole is Copyright by Tim Kientzle and others
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be converted to a "This project includes ... which is available under ..." phrasing in lieu of taking the associated words from the contributing license.

@@ -564,4 +564,68 @@ ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apiri fix the license

@@ -67,15 +70,19 @@ class BinaryConcatenationMerge : public MergeBin {
~ReadCallback() {
}
int64_t process(std::shared_ptr<io::BaseStream> stream) {
uint8_t buffer[buffer_size_];
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apiri fixed for incremental read

@minifirocks
Copy link
Author

@apiri @phrocker rebased.

Copy link
Contributor

@phrocker phrocker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 will merge as soon as I have a chance. Thanks!

@minifirocks minifirocks deleted the archive_merge branch October 20, 2017 20:11
asfgit pushed a commit that referenced this pull request Oct 20, 2017
This closes #146.

Signed-off-by: Marc Parisi <phrocker@apache.org>
nghiaxlee pushed a commit to nghiaxlee/nifi-minifi-cpp that referenced this pull request Jul 8, 2019
This closes apache#146.

Signed-off-by: Marc Parisi <phrocker@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants