Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make (Read/Write)BinaryFile work with char vector, use AutoFile #29229

Closed
wants to merge 1 commit into from

Conversation

Sjors
Copy link
Member

@Sjors Sjors commented Jan 11, 2024

ReadBinaryFile and WriteBinaryFile current work with std::string. This PR adds support for std::vector<unsigned char>>.

It also uses AutoFile now.

This is [update: probably not] used in #28983 to store the static key for the Template Provider, in a manner very similar to how we store the Tor v3 and i2p key. However it made no sense to me to store a CKey as plain text. See commit "Persist static key for Template Provider" for how it's used.

It uses a template and leverages the fact that both std::string and std::vector<unsigned char>> have an insert() method that can take a char array.

The unsigned char support is not used in this PR, but tests do cover it.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 11, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Sjors
Copy link
Member Author

Sjors commented Jan 11, 2024

cc @vasild

@Sjors Sjors mentioned this pull request Jan 11, 2024
30 tasks
src/util/readwritefile.h Outdated Show resolved Hide resolved
@Sjors
Copy link
Member Author

Sjors commented Jan 11, 2024

Added extern template ... to the header to prevent "implicit instantiation of undefined template" error on the "no wallet, libbitcoinkernel" CI (https://cirrus-ci.com/task/6254856449556480)

That error doesn't happen on my machine running macOS clang 15, nor on Ubuntu gcc 13.2 - maybe a specific configure warning flag?

Update: that didn't work, still not sure how to reproduce.

 util/readwritefile.cpp:23:84: error: implicit instantiation of undefined template 'std::vector<unsigned char>'
        const size_t n = fread(buffer, 1, std::min(sizeof(buffer), maxsize - output.size()), f);

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/20390186031

@Sjors Sjors force-pushed the 2024/01/rw-binary-file branch 2 times, most recently from 18d18ac to 257b14c Compare January 12, 2024 08:26
@Sjors
Copy link
Member Author

Sjors commented Jan 12, 2024

I switched to using std::optional as the return type. It happily compiles on my end, we'll see what CI thinks...

@Sjors
Copy link
Member Author

Sjors commented Jan 12, 2024

Had to #include <vector> to please clang13 (should have included it anyway since it's used).

@vasild
Copy link
Contributor

vasild commented Jan 18, 2024

However it made no sense to me to store a CKey as plain text

I guess that by "plain text" here you mean std::string, right? std::string can store arbitrary non-ASCII characters, including '\0', so it is technically ok to use it for binary data.

More relevant in this case is that CKey stores sensitive data and takes care to wipe it from memory when freed. In #28983 Read/WriteBinaryData() is used in a way that defeats that - the sensitive data will be copied to a temporary variable (ordinary std::vector) for IO. Can we change Read/WriteBinaryData() to operate directly on CKey in such a way that data goes directly from CKey to the disk?

src/Makefile.test.include Show resolved Hide resolved
src/util/readwritefile.h Outdated Show resolved Hide resolved
@maflcko
Copy link
Member

maflcko commented Jan 18, 2024

Can we change Read/WriteBinaryData() to operate directly on CKey in such a way that data goes directly from CKey to the disk?

I haven't looked in detail, but writing bytes to a file can be achieved with one line of code:

CAutoFile{...} << Span{data};

@Sjors
Copy link
Member Author

Sjors commented Jan 22, 2024

I like the idea of operating on CKey directly. I'll try to add that, though not sure how to implement that securely.

The keys used in #28983 are not as important as wallet keys, but if we add a generic method to store a CKey on disk, then it should do so securely - so that future devs using that function don't shoot themselves in the foot. At least the current std::vector<unsigned char>> does not pretend to be secure.

@Sjors
Copy link
Member Author

Sjors commented Jan 22, 2024

I guess that by "plain text" here you mean std::string, right? std::string can store arbitrary non-ASCII characters, including '\0', so it is technically ok to use it for binary data.

I think the way WriteBinaryFile was initially used with Tor was that we parse the JSON returned by the Tor daemon and store the PrivateKey key field (UTF8 encoced?) in a file.

@Sjors
Copy link
Member Author

Sjors commented Jan 22, 2024

I haven't looked in detail, but writing bytes to a file can be achieved with one line of code

That would be a nice simplification. Almost (?) to the point of not needing these helper functions.

@maflcko
Copy link
Member

maflcko commented Jan 22, 2024

I haven't looked in detail, but writing bytes to a file can be achieved with one line of code

That would be a nice simplification. Almost (?) to the point of not needing these helper functions.

WriteBinaryFile is unused right now either way outside of tests, so I guess this could be removed regardless, as future code can just use the in-line one-liner?

@Sjors
Copy link
Member Author

Sjors commented Jan 23, 2024

@maflcko WriteBinaryFile is used by Tor and I2P to cache the service private key.

But I might still close this PR if all that's needed is one-liner.

@Sjors Sjors marked this pull request as draft January 23, 2024 09:51
@Sjors
Copy link
Member Author

Sjors commented Jan 23, 2024

It's not quite a one-liner because you still need to open a close a FILE and deal with various errors. So instead I modified [Write/Write]BinaryFile to use AutoFile instead of fwrite / fread.

However, AutoFile{f} >> output only returns a fraction of the file in the test...

@maflcko
Copy link
Member

maflcko commented Jan 23, 2024

However, AutoFile{f} >> output only returns a fraction of the file in the test...

(De)serialization of vectors or strings assumes the run-time length to be encoded first. Only arrays and spans assume no length, because it is assumed to be known at compile-time.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/20764871422

@Sjors
Copy link
Member Author

Sjors commented Jan 23, 2024

Ok, fixed that issue by first getting the file size and then resizing the std::vector / std::string.

That doesn't generalise nicely to known-size things like CKey. The maxsize argument also makes no sense for things with known size. So I might make a separate method for that. Which in turn makes this PR just a refactor with unused, but tested, support for std::vector<unsigned char>.

(this is ready for review even without CKey support, which I'll add in a separate commit and/or PR when I get to it)

@Sjors Sjors marked this pull request as ready for review January 23, 2024 10:39
@Sjors Sjors changed the title Make (Read/Write)BinaryFile work with char vector Make (Read/Write)BinaryFile work with char vector, use AutoFile Jan 23, 2024
@Sjors
Copy link
Member Author

Sjors commented Jan 23, 2024

I tried to find some existing code that could use the std::vector<unsigned char> variant. Didn't find it at first glance. It seems we almost always know what size to expect (per object).

So that might be a good reason to kill that variant and only support loading an std::string of unknown size for now.

Comment on lines 40 to 43
std::FILE *f = fsbridge::fopen(filename, "wb");
if (f == nullptr) return false;
try {
AutoFile{f} << Span{data};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::FILE *f = fsbridge::fopen(filename, "wb");
if (f == nullptr) return false;
try {
AutoFile{f} << Span{data};
try {
AutoFile{fsbridge::fopen(filename, "wb")} << Span{data};

nit: Can be written shorter

Copy link
Member Author

@Sjors Sjors Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to wrap my head around the various calls involved, but I guess the nullptr check is handled in read(), which is called by the various ser_read... functions in seralize.h, which is called by the Unserialize implementations, which is called by <<. (presumably the same for writing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the read side I'm keeping the explicit nullptr check for now, so I don't have to catch fs::file_size failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point WriteBinaryFile is so small you might as well have the called do the try - catch. But might as well keep if around as long as ReadBinaryFile can't be made smaller.

ReadBinaryFile and WriteBinaryFile currently work with std::string.
This commit adds support for std::vector<unsigned char>>.

It uses a template and leverages the fact that both std::string and
std::vector<unsigned char>> have an insert() method that can take
a char array.

Also switch to using AutoFile.
@Sjors
Copy link
Member Author

Sjors commented Jan 23, 2024

I opened #29295 for CKey.

I also refactored #28983 to use AutoFile directly, see Sjors#32. So I no longer need this PR myself, but happy to continue it.

size_t file_size = fs::file_size(filename);
output.resize(std::min(file_size, maxsize));
try {
AutoFile{f} >> Span{output};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked more the previous version which called fread(3) here. It was simple stupid. This >> is now hard to follow, especially given that it depends on T. For vector it ends up calling AutoFile::detail_fread(). It does not check whether ferror(3) occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For vector it ends up calling AutoFile::detail_fread(). It does not check whether ferror(3) occurred.

It does. If the return value of detail_fread is not output.size(), operator>> will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not check whether ferror(3) occurred.

It does.

Where? There is no ferror(3) call. From the man page: "The function fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred."

If the return value of detail_fread is not output.size(), operator>> will fail.

Yes, but it can be equal to the desired size under two conditions: eof, or error. The previous code distinguished between both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the return value of detail_fread is not output.size(), operator>> will fail.

Yes, but it can be equal to the desired size under two conditions: eof, or error.

No, the eof-error would only be raised if read past the desired size, not to it. Unless I am missing something?

I am asking, because if there was a bug, it should be fixed, or at least an issue should be filed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That "must" in the man page is pretty clear: "callers must use feof(3) and ferror(3) to determine which occurred". A ferror(3) check can't hurt and it is better to have an extra check that always returns "no error" than a missing check, failing to detect an IO error. The previous code was doing that - a dumb fread(3) followed by an unconditional ferror(3).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is called to determine which error occurred, see

throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed");

Again, if there is a bug in the current code in master, it should be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extend the check to:

 void AutoFile::read(Span<std::byte> dst)
 {
-    if (detail_fread(dst) != dst.size()) {
+    if (detail_fread(dst) != dst.size() || std::ferror(m_file)) {
         throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is a bug in the current code in master, it should be fixed.

Done in #29307

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad something useful came out of this PR :-)

}
if (fclose(f) != 0) {
try {
AutoFile{fsbridge::fopen(filename, "wb")} << Span{data};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, the dumb version was easier to follow. The new one does not check whether fclose(3) failed, but it should. I think that is a serious deficiency in AutoFile itself.

fwrite(3) may succeed, but if a subsequent fclose(3) fails we should consider the data did not make it safely to disk and that the file is corrupted (fclose(3) writes any buffered data to disk using fflush(3), so a failure at fclose(3) is as bad as failure at fwrite(3)).

Copy link
Contributor

@vasild vasild Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a serious deficiency in AutoFile itself

Logged as #29307

Copy link
Member Author

@Sjors Sjors Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild I would also prefer to fix things in AutoFile, but perhaps add a few comments to explain what happens under the hood?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could happen:

  1. fwrite() succeeds, returns ok the caller, some of the data is buffered in the OS and not yet on disk
  2. fclose() is called, it tries to fflush() the buffered data to disk but fails due to IO error. The caller ignores the error returned by fclose()
  3. the program continues with the wrong assumption that the data is safely on disk

Copy link
Member Author

@Sjors Sjors Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not good. I guess we also don't want to sync to disk, and block for that to complete, for every field that's >>'d to a file though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That's why I don't see a good solution. It is a design issue with AutoFile to flush/close from the destructor which can't signal the failure to the caller.

@achow101
Copy link
Member

achow101 commented Apr 9, 2024

The PR didn't seem to attract much attention in the past. Also, the issue seems not important enough right now to keep it sitting around idle in the list of open PRs.

Closing due to lack of interest.

@achow101 achow101 closed this Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants