Add self extracting executable #35775

Merged
merged 20 commits into from Jun 10, 2022

Conversation

FArthur-cmd
Contributor

@FArthur-cmd FArthur-cmd commented Mar 30, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add self extracting executable #34755

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 30, 2022
@FArthur-cmd FArthur-cmd added the can be tested Allows running workflows for external contributors label Mar 30, 2022
@alexey-milovidov alexey-milovidov self-assigned this Apr 1, 2022
@yakov-olkhovskiy yakov-olkhovskiy self-assigned this Apr 11, 2022
@FArthur-cmd FArthur-cmd marked this pull request as ready for review April 14, 2022 11:44
@FArthur-cmd FArthur-cmd changed the title [WIP] Add self exctr exec Add self extracting executable Apr 14, 2022
@robot-ch-test-poll1 robot-ch-test-poll1 removed the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 21, 2022
else
{
/// move decompressed file instead of this binary and apply command
char bash[] = "/usr/bin/bash";
Member

But why can't we run it directly, without bash?
ClickHouse should not require a shell interpreter to run.

Contributor Author

We have to perform two steps: replace the current binary with another file (clickhouse.decompressed) and then execute it. I only found a solution using bash. I'm open to ideas.

Member

Use the rename function?

Contributor Author

From man: If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. However, there will probably be a window in which both oldpath and newpath refer to the file being renamed.
In practice the rename succeeds, but execve tries to execute the decompressor instead of the other file and fails with Text file busy. The next command run against this binary succeeds, though.
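
A minimal sketch of the shell-free variant discussed above, using rename(2) followed by execv(2). The helper names `replaceFile` and `execReplaced` are hypothetical and do not come from the PR. According to the execve(2) man page, ETXTBSY is returned when the executable is open for writing by some process, so one possible cause of the error (not confirmed in this thread) is that the decompressor still holds a writable descriptor on the new file when it calls exec.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <unistd.h>

/// Hypothetical helper: atomically move `decompressed_path` over `target_path`.
/// Returns true on success; on failure errno is left as set by rename(2).
bool replaceFile(const std::string & decompressed_path, const std::string & target_path)
{
    return std::rename(decompressed_path.c_str(), target_path.c_str()) == 0;
}

/// Hypothetical helper: replace the current process image with `path`,
/// forwarding argv unchanged. Returns only if execv failed.
int execReplaced(const std::string & path, char * argv[])
{
    execv(path.c_str(), argv);
    /// execv returns only on error, for example ETXTBSY ("Text file busy").
    std::fprintf(stderr, "execv(%s) failed: %s\n", path.c_str(), std::strerror(errno));
    return 1;
}
```

A caller would close every descriptor it opened on the decompressed file, call `replaceFile(self + ".decompressed", self)`, and then `execReplaced(self, argv)`. Because argv is handed to execv as an array, no quoting or re-splitting of arguments takes place.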

@@ -0,0 +1,352 @@
#include <cstring>
Member

Btw, the compressor can be written in C++ with exceptions, RAII, etc.
Only the decompressor is intended to be lightweight.

Contributor Author

I wrote it this way so that the compressor and decompressor look the same. I can change it, but this version looks good to me.

Member

Ok.

@alexey-milovidov
Member

@yakov-olkhovskiy one month has passed, I will reassign.

@yakov-olkhovskiy
Member

@alexey-milovidov missed this one doing other projects... can do next week

@yakov-olkhovskiy yakov-olkhovskiy self-assigned this May 28, 2022
@nikitamikhaylov
Member

nikitamikhaylov commented Jun 3, 2022

@yakov-olkhovskiy Did you test it with ClickHouse directly? It would be nice to add it by default to our containers

@yakov-olkhovskiy
Member

@nikitamikhaylov no, I didn't - only tested with some helloworld executable to check arguments and environment transfer

@antonio2368 antonio2368 self-assigned this Jun 3, 2022
Member

@antonio2368 antonio2368 left a comment

Before reviewing the code I noticed it's written more in a C-style.
What do you think about rewriting it first in a more C++-style (e.g. replacing new[], delete[] with std::vector)?

EDIT: now I see that a lot of C++ code was removed on purpose.

@nikitamikhaylov
Member

It would also be nice to forward arguments to the binary when we decompress and execute it. From a quick look at the code, I can't tell whether we do that or not.

@FArthur-cmd
Contributor Author

@nikitamikhaylov If I understand you correctly

execvp(bash, newargv);

@yakov-olkhovskiy
Member

@nikitamikhaylov we do

/// Set command to `mv filename.decompressed filename && filename args...`
void fillCommand(char command[], int argc, char * argv[], size_t length)
{
    memset(command, '\0', 3 + strlen(argv[0]) + 14 + strlen(argv[0]) + 4 + strlen(argv[0]) + length + argc);
    /// position in command
    size_t shift = 0;
    /// Support variables to create command
    char mv[] = "mv ";
    char decompressed[] = ".decompressed ";
    char add_command[] = " && ";
    char space[] = " ";
    fill(command, mv, 3, shift);
    fill(command, argv[0], strlen(argv[0]), shift);
    fill(command, decompressed, 14, shift);
    fill(command, argv[0], strlen(argv[0]), shift);
    fill(command, add_command, 4, shift);
    fill(command, argv[0], strlen(argv[0]), shift);
    fill(command, space, 1, shift);
    /// forward all arguments
    for (int i = 1; i < argc; ++i)
    {
        fill(command, argv[i], strlen(argv[i]), shift);
        if (i != argc - 1)
            fill(command, space, 1, shift);
    }
}
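
The command built above joins the arguments with plain spaces, so an argument that itself contains a space or a shell metacharacter would be split or interpreted again by the shell. If a shell were kept, each word would need quoting. Below is a minimal sketch of that idea; `shellQuote` and `buildCommand` are hypothetical names, and the use of `std::string` is an assumption that may not suit the intentionally lightweight decompressor.

```cpp
#include <string>

/// Hypothetical helper: wrap `arg` in single quotes for a POSIX shell,
/// rewriting each embedded ' as '\'' (close quote, escaped quote, reopen quote).
std::string shellQuote(const std::string & arg)
{
    std::string quoted = "'";
    for (char c : arg)
    {
        if (c == '\'')
            quoted += "'\\''";
        else
            quoted += c;
    }
    quoted += "'";
    return quoted;
}

/// Build `mv <self>.decompressed <self> && <self> <args...>` with every word quoted.
std::string buildCommand(int argc, char * argv[])
{
    const std::string self = shellQuote(argv[0]);
    std::string command = "mv " + shellQuote(std::string(argv[0]) + ".decompressed") + " " + self + " && " + self;
    /// forward all arguments, each one quoted separately
    for (int i = 1; i < argc; ++i)
        command += " " + shellQuote(argv[i]);
    return command;
}
```

With this sketch an argument such as `SELECT 1` reaches the target as a single word. Passing argv straight to an exec-family call avoids the quoting problem entirely, since no shell parses the arguments.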

yakov-olkhovskiy and others added 2 commits June 7, 2022 07:04
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
@yakov-olkhovskiy
Member

@mergify update

@mergify
Contributor

mergify bot commented Jun 8, 2022

update

✅ Branch has been successfully updated

@yakov-olkhovskiy yakov-olkhovskiy merged commit 5a59957 into ClickHouse:master Jun 10, 2022
fillCommand(command, argc, argv, length);

/// replace file and call executable
char * newargv[] = { bash, executable, command, nullptr };
Member

Why do we run it with sh -c?
Why is a shell needed?

It makes everything more complicated, and proper argument forwarding is nearly impossible.

I think ClickHouse should run under chroot with no shell.

Member

As I remember, the reason for using sh was that an attempt to run clickhouse directly gives a "Text file busy" error.
And what about chroot here? How does it help?

Member

You can work around the "Text file busy" error by renaming the old executable to a temporary file first.
Chroot will not help; it is just a motivation not to use sh.

Member

will try

Member

@alexey-milovidov that doesn't work either. The algorithm is as follows:

const std::string self = argv[0];
system(("mv " + self + " " + self + ".tmp").c_str());
system(("mv " + self + ".decompressed " + self).c_str());
system(("rm " + self + ".tmp").c_str()); // optional, result is the same without it
execvp(argv[0], argv);

The result is the same: the last execvp gives a "Text file busy" error.

Member

@alexey-milovidov BUT replacing mv clickhouse.decompressed clickhouse with cp clickhouse.decompressed clickhouse seems to work! :)

Labels
can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

8 participants