Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Edit atoe_sprintf to handle arguments of any length #6601

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

danjapapajani
Copy link

This commit rewrites atoe_sprintf to handle string arguments of any length by:

  • Computing length of incoming arguments using vsnprintf
  • Use this length to create wrkbuf

Signed-off-by: Danja Papajani danja.papajani@ibm.com

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for supporting the project, and congratulations on your first contribution! A project committer will shortly review your contribution. In the mean time, if you haven't had a chance please skim over the contribution guidelines which all pull requests must adhere to. If the ECA pull request check fails, have a look at the instructions for signing the ECA in the legal considerations section.

If you run into any problems our community will be happy to assist you in any way we can. There are a number of recommended ways to interact with the community. We encourage you to ask questions, or drop by to say hello.

@danjapapajani danjapapajani changed the title WIP: Edit atoe_sprintf to handle arguments of any length Edit atoe_sprintf to handle arguments of any length Jul 7, 2022
@joransiu
Copy link
Contributor

@0xdaryl / @mstoodle : Can you recommend the appropriate committers who can review this PR? Thanks!

@0xdaryl
Copy link
Contributor

0xdaryl commented Jul 12, 2022

@babsingh : please review if this is within your purview.

@babsingh
Copy link
Contributor

jenkins build all

@babsingh
Copy link
Contributor

  1. Why these changes are needed? Which problem/issue will this fix resolve?
  2. Can you confirm if these changes are zOS specific?
  3. Was any OpenJ9 testing done with this change?
  4. There will be a performance impact. How expensive is the new implementation? Can you evaluate the perf cost with a micro-benchmark?
  5. There are other places where a static char array of size BUFLEN is used. Updating just atoe_sprintf will make the rest of the API look inconsistent. So, the other similar locations will need to be updated in the same manner.
  6. Can the problem/issue be resolved by just increasing the size of BUFLEN?

Reducing the perf cost

With the below impl, the va_list size will only be evaluated for outlier cases that cannot be satisfied with the BUFLEN static char array.

    Take the Fast-path // current implementation
    if (Fast-path fails):
       Take the slow-path // your implementation

util/a2e/atoe.c Outdated
@@ -1342,12 +1343,20 @@ int
atoe_sprintf(char *buf, char *ascii_chars, ...)
{
int len;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign an initial value to each variable: https://github.com/eclipse/omr/blob/master/doc/CodingStandard.md#variable-declarations

Suggested change
int len;
int len = 0;

util/a2e/atoe.c Outdated
@@ -1342,12 +1343,20 @@ int
atoe_sprintf(char *buf, char *ascii_chars, ...)
{
int len;
char wrkbuf[BUFLEN];
int argSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int argSize;
int argSize = 0;

util/a2e/atoe.c Outdated

va_list args;
va_list args, args_copy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never declare more than one variable on a line: https://github.com/eclipse/omr/blob/master/doc/CodingStandard.md#variable-declarations

Suggested change
va_list args, args_copy;
va_list args;
va_list args_copy;

util/a2e/atoe.c Outdated Show resolved Hide resolved

len = atoe_vsnprintf(wrkbuf, BUFLEN, ascii_chars, args);
len = atoe_vsnprintf(wrkbuf, argSize, ascii_chars, args);

va_end(args);
if (-1 == len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcpy((char *)buf, wrkbuf);

Will the strcpy segfault if buf == NULL (user-input)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, even with your suggested implementation for reducing performance costs, we will at least have to grab the length of the incoming arguments to decide if the "fast-path" option would fail

Not necessarily. See https://linux.die.net/man/3/vsnprintf.

The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. (See also below under NOTES.)

Please confirm if the above statement is also true for our implementation.

If so, then the following should work:

len = atoe_vsnprintf(wrkbuf, BUFLEN, ascii_chars, args);
if (len >= BUFLEN) {
   /* Truncated: take the slow path. */
}

Size of wrkbuf must be defined before the call to vsnprintf, this is why we grab the length of the incoming string before creating the char array. If the conditional you suggest passes, we would create another char array with the correct size. Is this acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it helps or not but with my changes to atoe_vsnprintf in this1 PR, it would behave more like std vsnprintf! for instance if you call it like atoe_vsnprintf(NULL, 0, format, arguments) it would return estimate of how much space that operation would take without actually write to any buffer! (it was returning -1 before my PR)

util/a2e/atoe.c Outdated
/* Add 1 to accommodate for null terminating char */
argSize = vsnprintf(NULL, 0, ebcdic_chars, args_copy) + 1;

char wrkbuf[argSize];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
char wrkbuf[argSize];
char wrkbuf[argSize] = {0};

util/a2e/atoe.c Outdated
va_copy(args_copy, args);

char *ebcdic_chars = a2e_string(ascii_chars);
/* Add 1 to accommodate for null terminating char */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper grammar should be enforced for comments.

Suggested change
/* Add 1 to accommodate for null terminating char */
/* Add 1 to accommodate for the null terminating char. */

@danjapapajani
Copy link
Author

danjapapajani commented Jul 14, 2022

  1. Why these changes are needed? Which problem/issue will this fix resolve?
  2. Can you confirm if these changes are zOS specific?
  3. Was any OpenJ9 testing done with this change?
  4. There will be a performance impact. How expensive is the new implementation? Can you evaluate the perf cost with a micro-benchmark?
  5. There are other places where a static char array of size BUFLEN is used. Updating just atoe_sprintf will make the rest of the API look inconsistent. So, the other similar locations will need to be updated in the same manner.
  6. Can the problem/issue be resolved by just increasing the size of BUFLEN?

Reducing the perf cost

With the below impl, the va_list size will only be evaluated for outlier cases that cannot be satisfied with the BUFLEN static char array.

    Take the Fast-path // current implementation
    if (Fast-path fails):
       Take the slow-path // your implementation
  1. When creating a classpath value, only paths that completely occur within the first 6144 characters are honoured. When setting the classpath string that is dependant on user input, sprintf located here: https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/openj9/src/java.base/share/native/libjli/java.c#L996 appends the classpath. On z/OS, if a user specifies a long enough classpath, the following call to atoe_sprintf currently truncates the argument, resulting in incorrect behaviour. For example, if the location of a jar file does not completely occur in the first 6144 characters, a NoClassDefFoundError will occur while attempting to load classes which exist in that jar file.
  2. Although the atoe code can be used by any platform, it is likely only used on z/OS. This specific code is built into libwrappers.so on z/OS
  3. OpenJ9 testing has been successful. VMFarm Build ID (Axxon): 32382
  4. There haven't been any formal tests/microbenchmarks regarding performance. These can be looked into. However, even with your suggested implementation for reducing performance costs, we will at least have to grab the length of the incoming arguments to decide if the "fast-path" option would fail
  5. Changes to the following functions can be made for consistency: atoe_fprintf, atoe_printf, atoe_vfprintf, and atoe_vsprintf
  6. Increasing BUFLEN will only increase the character limit allowed for incoming arguments (>6144). Truncation will still occur if any string length is larger than BUFLEN without the atoe_sprintf logic change

@babsingh
Copy link
Contributor

babsingh commented Jul 14, 2022

However, even with your suggested implementation for reducing performance costs, we will at least have to grab the length of the incoming arguments to decide if the "fast-path" option would fail

Not necessarily. See https://linux.die.net/man/3/vsnprintf.

The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. (See also below under NOTES.)

Please confirm if the above statement is also true for our implementation.

If so, then the following should work:

len = atoe_vsnprintf(wrkbuf, BUFLEN, ascii_chars, args);
if (len >= BUFLEN) {
   /* Truncated: take the slow path. */
}

@babsingh
Copy link
Contributor

Size of wrkbuf must be defined before the call to vsnprintf, this is why we grab the length of the incoming string before creating the char array. If the conditional you suggest passes, we would create another char array with the correct size. Is this acceptable?

Stack allocating a char array based on a user specified string, which can vary in size, has the risk of exhausting the stack. In case of the slow-path, the char array should be allocated on the heap via malloc. The slow-path will be only be used in rare cases where the 6144 char array is not big enough. It should be acceptable to create another char array since this scenario won't happen very often. But, it will be cheaper than evaluating the size and performing va_copy everytime atoe_sprintf is invoked.

Will the strcpy segfault if buf == NULL (user-input)?

It will segfault if the src and dest pointers are uninitialized. There needs to be a NULL check for the strcpy.

If the src/buf string provided by the user is not big enough to store the generated string (wrkbuf), then there will be a buffer-overflow, which is a security issue. This is allowed because the user is expected to provide a sufficiently sized string before calling sprintf.

On z/OS, if a user specifies a long enough classpath, the following call to atoe_sprintf currently truncates the argument, resulting in incorrect behaviour.

In libjli/java.c#L996, a string of sufficient size (def/buf) is created before sprintf is invoked. There will be no buffer overflow. But, snprintf can be used in libjli/java.c#L996, which should resolve the NoClassDefFoundError.

size_t defSize = 0;
...
defSize = sizeof(format) - 2 + JLI_StrLen(s);
def = JLI_MemAlloc(defSize);
snprintf(def, defSize, format, s);

@danjapapajani
Copy link
Author

danjapapajani commented Jul 15, 2022

Stack allocating a char array based on a user specified string, which can vary in size, has the risk of exhausting the stack. In case of the slow-path, the char array should be allocated on the heap via malloc. The slow-path will be only be used in rare cases where the 6144 char array is not big enough. It should be acceptable to create another char array since this scenario won't happen very often. But, it will be cheaper than evaluating the size and performing va_copy everytime atoe_sprintf is invoked.

It will segfault if the src and dest pointers are uninitialized. There needs to be a NULL check for the strcpy.

If the src/buf string provided by the user is not big enough to store the generated string (wrkbuf), then there will be a buffer-overflow, which is a security issue. This is allowed because the user is expected to provide a sufficiently sized string before calling sprintf.

On z/OS, if a user specifies a long enough classpath, the following call to atoe_sprintf currently truncates the argument, resulting in incorrect behaviour.

In libjli/java.c#L996, a string of sufficient size (def/buf) is created before sprintf is invoked. There will be no buffer overflow. But, snprintf can be used in libjli/java.c#L996, which should resolve the NoClassDefFoundError.

size_t defSize = 0;
...
defSize = sizeof(format) - 2 + JLI_StrLen(s);
def = JLI_MemAlloc(defSize);
snprintf(def, defSize, format, s);

After some tests, I can confirm that atoe_vsnprintf does not perform as vsnprintf is stated to in the man page. The len returns as the BUFLEN in https://github.com/eclipse/omr/blob/master/util/a2e/atoe.c#L1350. This is due to the nature of the atoe_vsnprintf function, where the length passed in is used accordingly: https://github.com/eclipse/omr/blob/master/util/a2e/atoe_utils.c#L430. I can edit the function to, in turn, call vsnprintf on the incoming arguments to find the true length, but this does not change our performance result.

Are you suggesting to grab the length of string s in libjli/java.c#L996 and use this to pass onto atoe_sprintf? I'm not sure I understand this comment

A null check can be added for buf, is there an error message that is preferred for these methods? i.e. return -1?

@babsingh
Copy link
Contributor

atoe_vsnprintf does not perform as vsnprintf is stated to in the man page. The len returns as the BUFLEN

Can you elaborate? Does it return BUFLEN always or just when the string is truncated?

Based on atoe_utils.c#L587-L588, it should return the correct size of the string unless it is truncated. strlen derives the length based on where the null-terminator (\0) is placed.

For tests, did you add print statements in atoe_sprintf to monitor len returned by atoe_vsnprintf for the truncated and valid scenarios?

Are you suggesting to grab the length of string s in libjli/java.c#L996 and use this to pass onto atoe_sprintf? I'm not sure I understand this comment

They already evaluate the length in java.c#L993. There are two different functions: sprintf == atoe_sprintf and snprintf == atoe_snprintf. snprintf takes the length of buf/def string as the second input parameter. Instead of using sprintf, we can explore using snprintf in libjli/java.c#L996. Over there, snprintf is also defined as JLI_Snprintf.

A null check can be added for buf, is there an error message that is preferred for these methods? i.e. return -1?

Ref: https://cplusplus.com/reference/cstdio/sprintf/. In the glibc implementation, sprintf segfaults if the first input parameter (buf/str) is NULL. Our current implementation matches this behaviour. We don't need the null check since we want to match the glibc implementation.

@danjapapajani
Copy link
Author

Can you elaborate? Does it return BUFLEN always or just when the string is truncated?
Based on atoe_utils.c#L587-L588, it should return the correct size of the string unless it is truncated. strlen derives the length based on where the null-terminator (\0) is placed.

Yes, sorry for misspeaking. To clarify, atoe_vsnprintf returns a len of BUFLEN when the length of the arguments are greater than BUFLEN (i.e. when the string is truncated).

For tests, did you add print statements in atoe_sprintf to monitor len returned by atoe_vsnprintf for the truncated and valid scenarios?

I can confirm my test was adding print statements in atoe_sprintf to monitor len returned by atoe_vsnprintf for the truncated and valid scenarios.

They already evaluate the length in java.c#L993. There are two different functions: sprintf == atoe_sprintf and snprintf == atoe_snprintf. snprintf takes the length of buf/def string as the second input parameter. Instead of using sprintf, we can explore using snprintf in libjli/java.c#L996. Over there, snprintf is also defined as JLI_Snprintf.

Ah I see, let me run some tests regarding this suggestion.

Ref: https://cplusplus.com/reference/cstdio/sprintf/. In the glibc implementation, sprintf segfaults if the first input parameter (buf/str) is NULL. Our current implementation matches this behaviour. We don't need the null check since we want to match the glibc implementation.

Heard

@danjapapajani
Copy link
Author

@babsingh UPDATE: An internal PR was made regarding your suggestion of using snprintf in place of sprintf at libjli/java.c#L996. That being said, I would still like to address the issue presented in the PR regarding atoe_sprintf for completion.

There currently stands two possible options:

  1. Edit/Update atoe_vsnprintf to behave as vsnprintf, and return the number of characters which would have been written if enough space had been available. We can then use this len to allocate an array of the correct size
  2. Similar to your original suggestion, in atoe_sprintf, we can use a len return value of BUFLEN to signify truncation, and take the "slow path" which involves allocating a char array with the correct length found from invoking vsnprintf

What are your thoughts on these approaches? Do you think it's worth it to explore editing atoe_vsnprintf?

@babsingh
Copy link
Contributor

What are your thoughts on these approaches? Do you think it's worth it to explore editing atoe_vsnprintf?

I am in favour of the first approach.

  • The first approach is more efficient. We can avoid the extra va_list, and calls to va_copy and vsnprintf for determining the size.
  • #define vsnprintf atoe_vsnprintf: Also, we will match the widely accepted definition of vsnprintf (V2).

In an online search, I found two different definitions of vsnprintf: V1 and V2.

V1 is only IBM specific whereas all others follow V2. V1 may have been documented incorrectly.

If we decide to change our implementation from V1 to V2,

  • we will need to make sure that existing usages of atoe_vsnprintf do not break;
  • we will also need to account for changes to any related tests; and
  • we should also add a test for truncation, which may not exist at this point.

@joransiu Do you foresee any issues in changing our atoe_vsnprintf implementation to match V2?

@joransiu
Copy link
Contributor

The first approach is more efficient. We can avoid the extra va_list, and calls to va_copy and vsnprintf for determining the size.

Just to make sure I'm following..... Even with the first approach, if the returned size on the atoe_vsnprintf call is greater or equal to BUFLEN, we'd still need a subsequent call to atoe_vsnprintf with the properly sized buffer. Won't this force an extra va_list / va_copy in the slow path?

Do you foresee any issues in changing our atoe_vsnprintf implementation to match V2?

No, I scanned the code calling atoe_vsnprintf, and don't foresee any issues. I agree with approach to update the definition to V2.

@babsingh
Copy link
Contributor

babsingh commented Jul 26, 2022

Just to make sure I'm following..... Even with the first approach, if the returned size on the atoe_vsnprintf call is greater or equal to BUFLEN, we'd still need a subsequent call to atoe_vsnprintf with the properly sized buffer. Won't this force an extra va_list / va_copy in the slow path?

It actually depends on the compiler as per https://stackoverflow.com/a/10627731. Not sure what the behaviour is on Z-platforms.

In atoe_vsnprintf, we are only iterating over the args via va_arg. We might also be able to reuse the list by repeating the va_start and va_end calls. Ref: https://stackoverflow.com/a/15660085.

I will leave it to @danjapapajani to do the experiments and determine the most efficient approach that is feasible.

@danjapapajani danjapapajani changed the title Edit atoe_sprintf to handle arguments of any length WIP: Edit atoe_sprintf to handle arguments of any length Jul 28, 2022
@danjapapajani
Copy link
Author

danjapapajani commented Jul 28, 2022

UPDATE: committed new changes to atoe.c and atoe_utils.c. I have put the PR into a WIP due to the cleanup I will provide later. Relevant information:

  • Changes made in atoe_sprintf and atoe_vsnprintf are working as expected. It is correct that we can reuse args by calling va_start on the same va_list, as long as va_end is called first: https://stackoverflow.com/questions/15659859/how-can-i-get-my-va-list-arguments-to-repeat-itself/15660085#15660085
  • For functions atoe_vfprintf and atoe_vsprintf, the recommended way to reuse incoming va_list parameters is unfortunately to make a copy. This is because the va_list parameter is destroyed after it's first call to vsnprintf: https://stackoverflow.com/a/10627731
  • For functions atoe_vfprintf and atoe_fprintf, the return result regarding the classpath is correct, but when opening the file to read, is truncated. I used a .txt for testing. When opening the file the following appears at the bottom: "check.txt" 1 lines, 6166 characters 1 Lines too long - truncated

Overall, the only change I feel comfortable merging as of right now would be that of atoe_sprintf and atoe_vsnprintf. More thorough testing must be done on the other function changes. Let me know what you think about next steps- if we want to create a separate PR for the working changes while we pursue testing for the others, for example.

@knn-k
Copy link
Contributor

knn-k commented Jul 29, 2022

@danjapapajani Please rebase your commit so that unrelated changes don't appear in https://github.com/eclipse/omr/pull/6601/files . Thanks.

…rguments of any length by:

-Add variable in atoe_vsnprintf() to track the number of characters which would have been written to the final string if enough space had been available
-Edit atoe_sprintf, atoe_fprintf, atoe_printf,atoe_vfprintf, and atoe_vsprintf to create a new char array with the correct size

Signed-off-by: Danja Papajani <danja.papajani@ibm.com>

ebuf = a2e(buf, len);
va_end(args);
if (-1 == len) {
Copy link
Contributor

@joransiu joransiu Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_buf would need to be freed on this path prior to the return statement. There are similar cases in the other functions you updated.

@@ -496,6 +497,7 @@ atoe_vsnprintf(char *str, size_t count, const char *fmt, va_list args)
goto next_char;
case 's':
strvalue = va_arg(args, char *);
final_len += (int)strlen(strvalue);
Copy link
Contributor

@joransiu joransiu Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to review and handle the final_len calculations for all the other cases besides "%s".

@joransiu
Copy link
Contributor

For functions atoe_vfprintf and atoe_fprintf, the return result regarding the classpath is correct, but when opening the file to read, is truncated. I used a .txt for testing. When opening the file the following appears at the bottom: "check.txt" 1 lines, 6166 characters 1 Lines too long - truncated

@danjapapajani : That message suggests it's a limitation in the application you areusing to review the output file. What if you cat or offload the file to your desktop environment. Are all the expected characters present in the output? Might be good to add some explicit tests to OMR for lengths > BUFLEN to validate.. at least for the s*printf functions.

Similarly, atoe_printf returns a correct classpath length, but is cut off from viewing in the terminal.

Can you pipe the output into a file, and confirm the printf output is valid?

When trying to write to a file instead, atoe_vsnprintf returns a value of -1 which occurs in this check: https://github.com/eclipse/omr/blob/master/util/a2e/atoe_utils.c#L104

Why is that the case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants