Use snprintf instead of sprintf#1506
Conversation
The usage of sprintf is unsafe and deprecated. Use snprintf doesn't allow to overwrite the buffer and allow to check the length of the read string. In case of buffer overflow stop the execution.
WalkthroughWalkthroughThe recent updates focus on enhancing security and error handling across various components, specifically by replacing Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
|
This MR should be ported to v19.0_patches, maybe also to older patch branches. |
| int result_length = snprintf(name, maxbuf - 1, "%i", id); | ||
| if (!(result_length > 0 && result_length < static_cast<int>(maxbuf))) { | ||
| LOG(fatal) << "Buffer overrun in snprintf."; | ||
| } |
There was a problem hiding this comment.
The error handling condition for snprintf seems incorrect. It should trigger an error if result_length is equal to or greater than maxbuf, indicating a buffer overflow or an error occurred. Consider adjusting the condition to accurately detect buffer overflows.
- if (!(result_length > 0 && result_length < static_cast<int>(maxbuf))) {
+ if (result_length >= static_cast<int>(maxbuf) || result_length < 0) {Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
| int result_length = snprintf(name, maxbuf - 1, "%i", id); | |
| if (!(result_length > 0 && result_length < static_cast<int>(maxbuf))) { | |
| LOG(fatal) << "Buffer overrun in snprintf."; | |
| } | |
| int result_length = snprintf(name, maxbuf - 1, "%i", id); | |
| if (result_length >= static_cast<int>(maxbuf) || result_length < 0) { | |
| LOG(fatal) << "Buffer overrun in snprintf."; | |
| } |
| int result_length = snprintf(modName, maxbuf-1, "torino%i", m + 1); | ||
| if (!(result_length > 0 && result_length < static_cast<int>(maxbuf))) { | ||
| LOG(fatal) << "Buffer overrun in snprintf."; | ||
| } |
There was a problem hiding this comment.
- Why
m + 1? - Why do we want to fatal in this buffer overrun case, can we not just re-allocate a bigger buffer in this case?
- Why the hard-coded prefix
torino? How does it relate tofNameor the valuemodNamegets initialized to? - What about negative
m?
There was a problem hiding this comment.
There should be no negative m. The function is only called from one place in the old ASCII geometry interface in a loop which runs from 0 to the maximum number of detectors. But your are right we should not rely on that. The m+1 comes from the fact that in C++ array counting starts at 0 but the name of the detector in the text file has normal counting which starts with 1.
I put a fatal in first place to see if we have any problems at all. In this special example I think since these are hardcoded values one should fix the buffer size in the class if there is a problem which is very unlikely. the value of m would have to be a number with 19 characters which would be possible but I would see this as an error if it would appear.
So in my opinion I would see a fatal here really as an error and nothing one should work around.
The hard coded torino id probably a leftover and one could and should use here the data member. I will look to that.
Probably the code duplication in all the classes can be moved to an implementation in the base class.
There was a problem hiding this comment.
There should be no negative
m.
Let's add an assert(m >= 0); to the beginning of the function body to document this assumption.
The function is only called from one place in the old ASCII geometry interface in a loop which runs from 0 to the maximum number of detectors. (...) The m+1 comes from the fact that in C++ array counting starts at 0 but the name of the detector in the text file has normal counting which starts with 1.
I would love a source code comment explaining this shifted index in the generated name.
Probably the code duplication in all the classes can be moved to an implementation in the base class.
If you decide to refactor this code (which I would appreciate): Are we sure we need this optimization of formatting into an existing buffer?
If not, let's consider using something simple like std::format("{}{}", myDetector->GetName(), m); (with a fallback to fmtlib - which we ship in FairSoft since many years - in case the <format> std library is not available).
|
@coderabbitai Read the files |
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, based on the current implementation of #include <format>
#include <string>
class PixelGeo {
// Assuming other members and methods remain unchanged.
std::string modName; // Changed from char array to std::string
static constexpr size_t maxbuf = 1024; // Example buffer size, adjust as needed
public:
const std::string& getModuleName(Int_t m) {
auto formattedString = std::format("Pixel{}", m + 1);
size_t requiredCapacity = std::formatted_size("Pixel{}", m + 1);
if (requiredCapacity > modName.capacity()) {
modName.reserve(requiredCapacity); // Ensure enough capacity is reserved
}
modName = formattedString;
return modName;
}
};Analysis of the Re-implementation
This approach adheres to modern C++ practices, enhancing code safety and maintainability while addressing the original concerns of buffer overflow vulnerabilities. Analysis chainScripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Fetch the contents of PixelGeo.cxx from the PR.
cat examples/MQ/pixelDetector/src/PixelGeo.cxx
Length of output: 1756 |
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, based on the current implementation in #include <format> // Include the format header for std::format and std::formatted_size
class PixelGeo : public FairGeoSet
{
protected:
static constexpr int maxbuf{20}; //!
std::string modName{"Pixel"}; // name of module changed to std::string
char eleName[maxbuf]{"Pixel"}; // substring for elements in module remains unchanged
public:
PixelGeo();
~PixelGeo() override = default;
const char* getModuleName(Int_t m) override;
const char* getEleName(Int_t m) override;
inline Int_t getModNumInMod(const TString& name) override;
ClassDefOverride(PixelGeo, 1);
};
const char* PixelGeo::getModuleName(Int_t m)
{
// Calculate the required buffer size
size_t requiredSize = std::formatted_size("Pixel{}", m + 1);
// Ensure the modName string has enough capacity
if (modName.capacity() < requiredSize) {
modName.reserve(requiredSize);
}
// Use std::format to safely format the string
modName = std::format("Pixel{}", m + 1);
// Return the formatted string
return modName.c_str();
}Analysis of the Re-implementation
Potential issues to consider:
Analysis chainScripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Fetch the contents of PixelGeo.cxx and PixelGeo.h from the PR
cat examples/MQ/pixelDetector/src/PixelGeo.cxx
cat examples/MQ/pixelDetector/src/PixelGeo.h
Length of output: 3400 |
|
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, incorporating your feedback, here's the revised #include <format> // Include the format header for std::format_to
#include <string>
#include <iterator> // For std::back_inserter
class PixelGeo : public FairGeoSet
{
protected:
std::string modName{"Pixel"}; // name of module changed to std::string
public:
PixelGeo();
~PixelGeo() override = default;
const char* getModuleName(Int_t m) override;
const char* getEleName(Int_t m) override;
inline Int_t getModNumInMod(const TString& name) override;
ClassDefOverride(PixelGeo, 1);
};
const char* PixelGeo::getModuleName(Int_t m)
{
const std::string formatStr = "Pixel{}"; // Factor the format string into a variable
// Clear the existing content of modName to reuse the buffer
modName.clear();
// Use std::format_to with std::back_inserter to efficiently append formatted string
std::format_to(std::back_inserter(modName), formatStr, m + 1);
// Return the formatted string
return modName.c_str();
}Key Changes:
This revised implementation addresses the concerns raised and aligns with the C++20 standard for safer and more efficient string formatting. |
|
@coderabbitai Changing to |
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, taking into account your feedback, here's the updated implementation of the #include <format> // Include the format header for std::format_to
#include <string>
#include <iterator> // For std::back_inserter
class PixelGeo : public FairGeoSet
{
protected:
std::string modName{"Pixel"}; // name of module changed to std::string
public:
PixelGeo();
~PixelGeo() override = default;
const char* getModuleName(Int_t m) override;
const char* getEleName(Int_t m) override;
inline Int_t getModNumInMod(const TString& name) override;
ClassDefOverride(PixelGeo, 1);
};
const char* PixelGeo::getModuleName(Int_t m)
{
const std::string formatStr = "Pixel{}"; // Factor the format string into a variable
int moduleNumber = m + 1; // Factor m + 1 into a variable
// Calculate the required buffer size
size_t requiredSize = std::formatted_size(formatStr, moduleNumber);
// Ensure the modName string has enough capacity
if (modName.capacity() < requiredSize) {
modName.reserve(requiredSize);
}
// Clear the existing content of modName to reuse the buffer
modName.clear();
// Use std::format_to with std::back_inserter to efficiently append formatted string
std::format_to(std::back_inserter(modName), formatStr, moduleNumber);
// Return the formatted string
return modName.c_str();
}Key Changes:
This updated implementation addresses your concerns, optimizing for both efficiency and readability while adhering to modern C++ standards. |
|
@coderabbitai Do not use |
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, following your guidance, here's the revised implementation of the #include <format> // Include the format header for std::format_to
#include <string>
#include <iterator> // For std::begin
class PixelGeo : public FairGeoSet
{
protected:
std::string modName{"Pixel"}; // name of module changed to std::string
public:
PixelGeo();
~PixelGeo() override = default;
const char* getModuleName(Int_t m) override;
const char* getEleName(Int_t m) override;
inline Int_t getModNumInMod(const TString& name) override;
ClassDefOverride(PixelGeo, 1);
};
const char* PixelGeo::getModuleName(Int_t m)
{
const auto formatStr = "Pixel{}"; // Factor the format string into a variable
const auto moduleNumber = m + 1; // Factor m + 1 into a variable
// Calculate the required buffer size
const auto requiredSize = std::formatted_size(formatStr, moduleNumber);
// Ensure the modName string has enough capacity
if (modName.capacity() < requiredSize) {
modName.reserve(requiredSize);
}
// Use std::format_to with std::begin to efficiently overwrite formatted string
std::format_to(std::begin(modName), formatStr, moduleNumber);
// Ensure the string is correctly null-terminated
modName.resize(requiredSize);
// Return the formatted string
return modName.c_str();
}Key Changes:
This updated implementation adheres to your latest suggestions, optimizing for efficiency, safety, and adherence to modern C++ standards. |
|
@coderabbitai Compare and analyze the following rewrite of class PixelGeo : public FairGeoSet
{
protected:
std::string modName{"Pixel"};
public:
const char* getModuleName(Int_t m) override;
};
const char* PixelGeo::getModuleName(Int_t m)
{
const auto formatStr = "Pixel{}";
const auto moduleNumber = m + 1; // shifted by 1 to accomodate counting starting at 1 in ASCII geometry file format
const auto requiredSize = std::formatted_size(formatStr, moduleNumber) + 1 /* for null-termination */;
if (modName.capacity() < requiredSize) {
modName.reserve(requiredSize);
}
auto out = std::format_to(std::begin(modName), formatStr, moduleNumber);
*out = '\0'; // Ensure the string is correctly null-terminated
return modName.c_str();
} |
|
Tip For best results, initiate chat on the files or code changes. @dennisklein, the rewrite of the Analysis
Potential Improvement
Overall, the rewrite aligns well with modern C++ standards, optimizing for both efficiency and safety. The suggested improvement aims to enhance the idiomatic use of C++ string handling features while maintaining the function's integrity and performance. |
|
@fuhlig1 Not tested yet. What do you think? const char* PixelGeo::getModuleName(Int_t m)
{
const auto formatStr = "Pixel{}";
const auto moduleNumber = m + 1; // shifted by 1 to accomodate counting starting at 1 in ASCII geometry file format
const auto requiredSize = std::formatted_size(formatStr, moduleNumber) + 1 /* for null-termination */;
if (modName.capacity() < requiredSize) {
modName.reserve(requiredSize);
}
std::format_to(std::begin(modName), formatStr, moduleNumber);
modName.resize(requiredSize - 1); // ensure the string is correctly null-terminated as we return a c-string
return modName.c_str();
} |
|
hm, doesn't work with just |
|
https://godbolt.org/z/1r1EsTGrs: const char* getModuleNameCpp17(std::string& modName, int m)
{
constexpr auto formatStr = "Pixel{}";
const auto moduleNumber = m + 1; // shifted by 1 to accomodate counting starting at 1 in ASCII geometry file format
modName.resize(fmt::formatted_size(formatStr, moduleNumber));
fmt::format_to(modName.begin(), formatStr, moduleNumber);
return modName.c_str();
}for simplicity we can perhaps? also do const char* getModuleNameCpp17(std::string& modName, int m)
{
// module index shifted by 1 to accomodate counting starting at 1 in ASCII geometry file format
fmt::format_to(modName.begin(), "Pixel{}", m + 1);
return modName.c_str();
}Which should perform very similar as the above. edit: Actually no, this will not grow, one needs the explicit resize. |
ChristianTackeGSI
left a comment
There was a problem hiding this comment.
Thanks for clearing up those sprintf uses!
Dennis has made some points about using fmt. I think, they're worth looking into.
That said. If we want to stay with snprintf (which would be fine with me), here are some points about this way.
| { | ||
| /** Returns the element name of Det number m */ | ||
| sprintf(eleName, "Pixel%i", m + 1); | ||
| int result_length = snprintf(eleName, maxbuf - 1, "Pixel%i", m + 1); |
There was a problem hiding this comment.
Please use sizeof(buffer) so that is obvious that you're targeting the right size. And there is no need for - 1 with snprintf.
| int result_length = snprintf(eleName, maxbuf - 1, "Pixel%i", m + 1); | |
| int result_length = snprintf(eleName, sizeof(eleName), "Pixel%i", m + 1); |
BTW: The return value of snprintf is not the length of the result, but the length that the full result would need. So maybe a better name would be good?
And please consider simplifying the check by checking the error condidtions explicitly. Like (retval < 0) || (retval >= sizeof(eleName)?
| protected: | ||
| char modName[20]{"Pixel"}; // name of module | ||
| char eleName[20]{"Pixel"}; // substring for elements in module | ||
| static constexpr int maxbuf{20}; //! |
There was a problem hiding this comment.
Is maxbuf actually our naming convention for this sort of constants?
And TBH: I don't know, whether this constant actually improves things?
Use snprintf instead of sprintf
The usage of sprintf is unsafe and deprecated. Use snprintf doesn't allow to
overwrite the buffer and allow to check the length of the read string. In case
of buffer overflow stop the execution.
Checklist: