Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support files with 2^31 or more PRNs #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rabauke
Copy link

@rabauke rabauke commented Apr 14, 2024

Given the typical storage capacities of recent hardware, dieharder should support reading PRNs from files with more than 2^31 numbers.

@@ -325,12 +325,12 @@ static void file_input_set (void *vstate, unsigned long int s)
}
}
if(strncmp(splitbuf[0],"count",5) == 0){
state->flen = atoi(splitbuf[1]);
state->flen = atoll(splitbuf[1]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change. count might be larger than INT_MAX, thus atoll is preferred.

@@ -170,7 +170,7 @@ static unsigned long int file_input_get (void *vstate)
state->rptr++;
state->rtot++;
if(verbose){
fprintf(stdout,"# file_input() %lu: %lu/%lu -> %u\n", state->rtot, state->rptr,state->flen,(uint)iret);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the original off_t is an alias for the signed type long int. Furthermore, one can never know what off_t actually is. Could be any signed 64-bit-integer, e.g., long int or long long int on Linux on x64, which are distinct types with distinct format strings.

@@ -274,9 +274,9 @@
*/
typedef struct {
FILE *fp;
off_t flen;
off_t rptr;
off_t rtot;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good reason for using off_t. No POSIX api is used, which requires off_t.

@eddelbuettel
Copy link
Owner

While we're at it, should we modernise this use of long vs long long and pin it down with int64_t ?

and improve error handling in parameter parsing
@rabauke
Copy link
Author

rabauke commented Apr 14, 2024

@eddelbuettel I updated the PR incorporating your suggestion.

Copy link
Owner

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice enhancement!

@eddelbuettel
Copy link
Owner

@mbroz Before I merge it, do you mind having a look at it too?

@mbroz
Copy link
Contributor

mbroz commented Apr 15, 2024

Well, I am afraid this will not be so simple. First, this is the ABI/API libdieharder break, which modifies exported prototypes in libdieharder.h (I guess the Debian build will scream for this).
And why not use uint64_t here? Also, IIRC that sscanf does not check for some conversion errors (we usually use strtoull() and check errno for ERANGE, but that's detail.)

Anyway, this will probably not be a simple place where a 32-bit count is used...
(We fight with side effects with various "simple" changes for 64-bit offset for years in storage applications, so be very careful here.)
I can check it, but not until next week, if it can wait.

@rabauke
Copy link
Author

rabauke commented Apr 15, 2024

If ABI/API stability is a concern, I could revise the change to use off_t again. The main issue with off_t is that it has been introduced for a rather specific use-case. It's not a general purpose integer type and there is no sane way to convert a string directly into an off_t. atoi (which is currently used) is certainly not a good choice.

I was not aware that the behaviour of sscanf is undefined if the result of the conversion cannot be represented in the object (e.g. integer overflow). I agree, strto* should be used instead of ato* or *scanf.

@mbroz
Copy link
Contributor

mbroz commented Apr 29, 2024

I think most of this patch is actually not needed.

dieharder.h alteady contains line
#define _FILE_OFFSET_BITS 64
that makes off_t type 64bit on all platforms.

Perhaps we can add AC_SYS_LARGEFILE macro to configure.ac that will switch off_t to 64bit in config.h globally.

But even without that, I see sizeof(off_t) 8-bytes (64bits) even on 32bit platforms. So why we need to redefine off_t to different type? Perhaps only CLI parsing fix is needed?

@eddelbuettel
Copy link
Owner

dieharder is clearly a child of another time and period, and 32-bit systems are mostly gone now.

And I generally like both to simplify, and to keep changes small and limited. @rabauke raised a valid point: we can / could / should support larger files. I still see as a bit of edge case. Maybe we can try to address that more narrowly?

@mbroz
Copy link
Contributor

mbroz commented Apr 29, 2024

I think that redefinition of _FILE_OFFSET_BITS already switches everything to large file support. Compilation on 32bit produces some warnings (some of them will be off_t in printf for sure), but these should be easily fixed. I am not sure anyone uses dieharder on 32bit systems, but with large file support it should work identically to 64bit systems.

So the only problem here is to properly handle 64bit off_t in terminal input/output. Or do I miss anything here?

}
}
if(strncmp(splitbuf[0],"numbit",6) == 0){
filenumbits = atoi(splitbuf[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more comment - filenumbits is int, so atoi is enough here. Just it should check for 0 as error (0 does not make sense anyway here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants