Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some updates for PR 157 #171

Merged
merged 1 commit into from
Apr 20, 2023
Merged

some updates for PR 157 #171

merged 1 commit into from
Apr 20, 2023

Conversation

afni-rickr
Copy link
Contributor

  • put variable definitions at function blocks
  • print int64_t using PRId64
  • znz_off_t varies, so try not to print
  • extensions are int size, limit size and check in read_file_text()
  • call strlen() only after NULL check
  • modify_field(), check length before copy
  • act_disp_ci() limited to int number of elements
  • formatting

@@ -5021,7 +5022,7 @@ nifti_image* nifti_convert_n2hdr2nim(nifti_2_header nhdr, const char * fname)

nim->nifti_type = (is_onefile) ? NIFTI_FTYPE_NIFTI2_1 : NIFTI_FTYPE_NIFTI2_2;

int byteOrder = nifti_short_order() ;
byteOrder = nifti_short_order() ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why split the declaration and initial assignment? The codebase clearly doesn't support C89, as things like int64_t are used unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not required by the compiler, it is certainly the convention in the code, making location of types predictable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if that's the project's style, I can follow it in the future, and update my other pending PRs accordingly.

But I don't think that style is advantageous. It's not for nothing that C99 and C++ got rid of the old C requirement for having declarations at the top. It puts everything in a larger scope than necessary, see for example:

https://wiki.sei.cmu.edu/confluence/display/c/DCL19-C.+Minimize+the+scope+of+variables+and+functions

Tools like cppcheck and clang-tidy can warn about such things. I was going to work on such a scope reduction patch after my other cppcheck PR lands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please do not alter the style of programming in this library. Save that for a new library if you wish to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will keep the C89 style in future changes.

long long new_ecode = strtoll(argv[ac], NULL, 10);
if ((new_ecode == 0 && errno != 0) || new_ecode <= INT_MIN || new_ecode >= INT_MAX) {
fprintf(stderr,"** invalid -add_ext parameter\n");
new_ecode = atoi(argv[ac]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why revert to atoi? It is obsolete. It results in undefined behaviour if the string can't fit into an int.

See for example:

https://wiki.sei.cmu.edu/confluence/display/c/MSC24-C.+Do+not+use+deprecated+or+obsolescent+functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, it looks like it is being pushed towards obsolescence, but of course, will never be. But I believe it returns 0 on error, not an undefined value. I wanted to remove the unnecessary use of long long (which is a very annoying type, since it's actual type varies). The ecode is question only has a small range of acceptable values, which will be checked. Using atoi() is not important, but going to large types is ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I believe it returns 0 on error, not an undefined value

C99 7.20.1 "If the value of the result cannot be represented, the behavior is undefined."

... use of long long (which is a very annoying type, since it's actual type varies)

That's true of all the types though, even the int that atoi uses.

but going to large types is ugly

Ugly is preferable to undefined behaviour though.

fprintf(stderr,"** RFT: file '%s' appears empty\n", filename);
return NULL;
} else if( len64 > INT_MAX ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INT32_MAX would be more correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I almost never use those terms, but would expect INT_MAX to be reasonably appropriate when comparing with ints. Either is fine. We require int to mean int32_t.

@@ -3829,8 +3834,7 @@ int modify_field(void * basep, field_s * field, const char * data)
if( g_debug > 1 )
fprintf(stderr,"+d modifying field '%s' with '%s'\n", field->name, data);

size_t dataLength = strlen(data);
if( !data || dataLength == 0 )
if( !data || strlen(data) == 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

{
fprintf(stderr,"** FAILURE for dataset '%s'\n", nim->fname);
if( data ) { free(data); data = NULL; }
err++;
} else if ( len64/nim->nbyper > INT_MAX ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INT32_MAX would be more correct.

} else if ( len64/nim->nbyper > INT_MAX ) {
fprintf(stderr,"** %" PRId64 " is too many values to display\n",
len64/nim->nbyper);
if( data ) { free(data); data = NULL; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if(data) check is unnecessary. free(NULL) is well-defined.

@@ -6710,7 +6721,8 @@ int act_disp_ci( nt_opts * opts )
fprintf(stdout,")\n");
}

disp_raw_data(data, nim->datatype, len / nim->nbyper, space, 1);
/* should we change disp_raw_data to allow for 64-bit nvalues? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should, and I did in #165

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, though the counter was only of type int.
Still, it would be more clear to put functionality enhancement PRs separate from code cleanup ones. This looked more like "well, we've altered the type, so now it should propagate". It would be fine to go 64-bit here, but if you want that, maybe just make it a separate change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, it would be more clear to put functionality enhancement PRs separate from code cleanup ones.

I did split my -Wshorten-64-to-32 fixes into 2 PRs:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would like to make the 64-bit update for this, please do not change library APIs in that PR. Modifying types in nifti_tool is okay, modifying them in the library is a bigger deal. Let's keep them separate. Any library updates should be done in the context of heading to a 4.x.x release. We should make useful updates in the current 3.x.x version, first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are separate. That's what I meant when I said "I did split my -Wshorten-64-to-32 fixes into 2 PRs". The former is merged already, the latter I don't expect to be merged in the short term.

@@ -9258,7 +9261,8 @@ static int rci_read_data(nifti_image * nim, int64_t * pivots, int64_t * prods,

/* make sure things look good here */
if( *pivots != 0 ){
fprintf(stderr,"** NIFTI rciRD: final pivot == %lld!\n", *pivots);
fprintf(stderr,"** NIFTI rciRD: final pivot == %" PRId64 "\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to remove the exclamation point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seemed like a mistake to be there. I am not a fan of programs yelling at the users. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, way to pay attention to the little details... :)

@@ -4405,15 +4405,15 @@ static int nifti_read_extensions( nifti_image *nim, znzFile fp, int remain )
return -1;
}

znz_off_t posn = znztell(fp);
posn = (int)znztell(fp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

znztell returns znz_off_t. The cast you've added could cause information loss.

You mentioned "znz_off_t varies, so try not to print" which I guess is why you made this change...

But what's wrong with using %lld for znz_off_t? If znz_off_t is smaller than long long it'll still get printed correctly.

With this change, if znztell returns a huge number, your error message won't even print the actual thing znztell returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

posn is required to be 358, anything else is a fatal error. And %lld will produce compiler warnings on many systems (since znz_off_t varies). Using int64_t would be okay, but seems like overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it is not a fatal error, just a warning (since we don't care soooo much about extensions, I guess)...

Copy link
Contributor

@seanm seanm Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And %lld will produce compiler warnings on many systems (since znz_off_t varies)

Ah true, -Wformat will warn about that. Then the most correct thing is to:

  1. not do the truncating cast you added
  2. cast the %lld param to long long to match %lld
znz_off_t posn = znztell(fp);

fprintf(stderr,"-d nre: posn = %lld, offset = %d, type = %d, remain = %d\n",
    (long long)posn, nim->iname_offset, nim->nifti_type, remain);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will do. I sill worry about getting warnings, we will see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Wformat won't warn then, I tried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These things can vary across systems, as the sizes vary. "long" types do not seem to be so reliable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their size varies, but "%lld" means "long long" regardless of what the size of long long is.


if( !dp ) return 0; /* nothing to clear */

size_t len = strlen(dp);
len = (int)strlen(dp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use size_t here?

Sure, it's super-unlikely that any string passed to this function will be longer than INT_MAX, but still... are you trying to save 4 bytes of stack space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason size_t was added was because of warnings, and casting is generally the more simple and less impactful way to do that. The function is clearing some number of zeros from a maximum 32 character string. Unnecessarily promoting variables to higher sizes confuses the purposes of the functions. int isn't better than size_t here, but it would have been the preferred change in the first place. This is just more in line with how overall changes should be made to this mature (but not perfect) library. Unless something is actually broken, have as little impact as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason size_t was added was because of warnings

I turned on -Wshorten-64-to-32 to find bugs, not just to shut the compiler up. This function clear_float_zeros is buggy. It does not do what it's supposed to do when the string is huge. The warning found this bug for us.

Unnecessarily promoting variables to higher sizes...

There is no promotion to higher sizes. strlen itself returns size_t. strlen will correctly compute the length of a huge string. My changing the len variable from int to size_t only makes it match what strlen is already giving us. Your changing it back to int here is discarding 50% of strlen's result.

We can even compare the assembly of the int vs size_t variants. The truncation to int takes extra work, on top of just being incorrect.

Unless something is actually broken, have as little impact as possible.

I understand not wanting to add regressions to a mature codebase. We indeed have to be careful of that. But something is actually broken here, albeit not something likely to be encountered in normal use.

The fix (changing int to size_t) is extremely safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bring this back to size_t. I appreciate you being open to argument!

@@ -3885,7 +3885,9 @@ int disp_raw_data( void * data, int type, int nvals, char space, int newline )
{
float temp;
memcpy(&temp, dp, sizeof(temp));
snprintf(fbuf,sizeof(fbuf),"%f", temp);
snprintf(fbuf, NT_LOC_MAX_FLOAT_BUF,"%f", temp);
/* terminate, since some versions might not */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of any snprintf with such a disastrous bug...

Are you perhaps thinking of the old Microsoft _snprintf (note the leading underscore)? See:

https://stackoverflow.com/questions/7706936/is-snprintf-always-null-terminating

As we are not calling _snprintf this change should not be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Bug" is not for us to decide regarding this, the question is what does the doc say vs what does the function do. Reading the man page many times, it is not 100% clear that it always does the termination, the help seems a bit ambiguous. But yes, I guess it was the _ version that raised the flag.
So I don't really care about this either way. Being safe is good, not mentioning M!cr*s@ft is very good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Bug" is not for us to decide regarding this

The C standard requires snprintf to always null terminate. I've updated sprintf to snprintf in many open source libraries recently, and have never seen the MS version having any problem. It was only the old non-standard underscore version that was busted.

I would remove this change, it an unnecessary complication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is the last question remaining for this PR. You can choose: keep the termination or remove it, what do you prefer? I am happy keeping it. If I have to do google searches to verify the functionality, it is preferable not to make "the next programmer" have to as well, so clarity is nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will remove this, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I did look at the C standard, and indeed it is quite clear about the null termination, while the man page is not quite so clear. That should by my go-to reference, rather than the man pages. Thanks.

@seanm
Copy link
Contributor

seanm commented Apr 20, 2023

@afni-rickr can you squash the 3 commits? Otherwise history will be weird with changes done and undone.

@afni-rickr
Copy link
Contributor Author

I was going to just do this as a new PR and a single commit, unless you know of a sneakier way to go.

@seanm
Copy link
Contributor

seanm commented Apr 20, 2023

I was going to just do this as a new PR and a single commit, unless you know of a sneakier way to go.

You can transform this PR into a single commit by "squashing the commits". Is that something you are unfamiliar with? If so, it's not that hard, I can walk you through it...

@afni-rickr
Copy link
Contributor Author

I was going to just do this as a new PR and a single commit, unless you know of a sneakier way to go.

You can transform this PR into a single commit by "squashing the commits". Is that something you are unfamiliar with? If so, it's not that hard, I can walk you through it...

Sure, I have generally avoided that. My concern is that committing a git reset --soft HEAD~3 result would require a force push, diverging with other versions of this branch (e.g. yours). How would you personally perform the squash?

@seanm
Copy link
Contributor

seanm commented Apr 20, 2023

I would do git rebase -i HEAD~3 and mark the last 2 commits to be squashed. Then git push -f origin HEAD to update this PR. The force is no problem because this PR isn't merged to master yet, so nothing else is based on these commits.

@afni-rickr
Copy link
Contributor Author

afni-rickr commented Apr 20, 2023

Okay, will do. But to be sure, this would cause your local branch of update_157 to diverge, is that right?
Thanks for the info.

@@ -219,7 +218,7 @@ static int g_debug = 1;
/* local prototypes */
static int free_opts_mem(nt_opts * nopt);
static int num_volumes(nifti_image * nim);
static char * read_file_text(const char * filename, int64_t * length);
static char * read_file_text(const char * filename, int * length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why revert this one (and some others) back to int?

Is it because you still consider this part of the API?, even though it is:

  1. static and thus private to the library
  2. has no prototype in the header file

@seanm
Copy link
Contributor

seanm commented Apr 20, 2023

Okay, will do. But to be sure, this would cause your local branch of update_157 to diverge, is that right? Thanks for the info.

I only have a local branch to do code review on my own computer, for big changes I find the github UI not great.

@afni-rickr afni-rickr merged commit 848913f into master Apr 20, 2023
@seanm
Copy link
Contributor

seanm commented Apr 22, 2023

Ah, there were a few other changes I was going to suggest, but as you've already merged, I'll make a new PR...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants