Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-defined memory functions #157

Open
notcancername opened this issue Aug 7, 2023 · 4 comments · May be fixed by #159
Open

User-defined memory functions #157

notcancername opened this issue Aug 7, 2023 · 4 comments · May be fixed by #159
Assignees
Labels
enhancement New feature or request

Comments

@notcancername
Copy link

notcancername commented Aug 7, 2023

tl;dr

user-specified memory functions are hyperbased for a whole host of reasons, we've thoroughly discussed it in the discord, and I will make a PR to implement them Soon™ unless someone objects, in which case we go back to the drawing board.

Status quo

Currently, concord uses the dynamic memory allocation routines provided by the OS. It does not check their return values, even though they can fail. This is probably fine -- not elegant or correct, but perfectly usable.

Proposed change

This backwards-compatible change would introduce a function ccord_global_init_memory, which would take as arguments the following five function pointers:

void *malloc_fn(size_t length), a malloc equivalent.
void *calloc_fn(size_t nmemb, size_t length), a calloc equivalent.
void *realloc_fn(void *old_ptr, size_t length), a realloc equivalent.
void free_fn(void *ptr), a free equivalent.
char *strdup_fn(const char *str), a strdup equivalent.

The names are irrelevant, please reply if you wish for them to be changed.

All functions must be thread-safe. They need not be async-signal-safe (Async-signal-safety would be impractical for many implementations).
The functions must behave as follows:

  • malloc_fn allocates exactly length bytes of memory and returns a pointer to it. It must return a valid pointer or abort execution. The returned pointer must be suitably aligned for any type. If length is 0 or exceeds PTRDIFF_MAX, the behavior is undefined.
  • calloc_fn allocates memory for exactly length items of size nmemb and returns a pointer to it. It must return a valid pointer or abort execution. The returned pointer must be suitably aligned for any type. If length or nmemb are 0, or the value of length * nmemb exceeds PTRDIFF_MAX, or the multiplication overflows, the behavior is undefined.
  • realloc_fn returns a pointer to a new allocation of length length for the data in old_ptr . It must return a valid pointer or abort execution. It need not (but may) return old_ptr. The returned pointer must be suitably aligned for any type. If length is 0 or exceeds PTRDIFF_MAX, the behavior is undefined.
  • free_fn releases the memory pointed to by ptr. It must succeed or abort execution. If ptr is a null pointer, it must return without any operation being performed.
  • strdup_fn returns a new allocation for the null-terminated string pointed to by str. It must return a valid pointer or abort execution. The returned pointer need not be aligned. If str is a null pointer, or the length of str exceeds PTRDIFF_MAX, the behavior is undefined.

The function ccord_global_init_memory would:

  • Initialize global variables with the specified function pointers.
  • Call curl_global_init_mem with callbacks derived from these function pointers, as specified by libcurl.
  • Perform all work previously performed in ccord_global_init.

The implementation of ccord_global_init would change as follows:

  • Call ccord_global_init_memory with callbacks derived from the system's implementation of these functions.

The interface and behavior of ccord_global_init would not change. This change is entirely backwards compatible.

To prevent decision fatigue and confusion for the library user, a note stating that ccord_global_init_memory is reserved for advanced use cases exclusively, and that ccord_global_init should be used, would be prominently displayed in the documentation.

All calls to malloc, calloc, realloc, free, and strdup would need to be converted to calls to internal functions with identical signatures. No other alterations to the calling code would be required, making porting trivial. These functions would, in debugging builds of concord, assert compliance with the interface specified above, verifying the existing assumption that memory allocation functions cannot fail.

Use cases

  • Using concord on memory-constrained systems, like SBCs.
  • Using concord with a limit on memory usage.
  • Using concord with extra memory allocation information or logging.
  • Using concord without dynamic allocation by allocating within a statically or automatically allocated buffer.
  • Using concord with bindings from programming languages that provide their own memory allocation routines or have standard mechanisms that allow the user to choose how to allocate memory.
  • Using concord with custom allocators, like jemalloc or mimalloc.
  • Using concord with a GC.

Additional context

I have sought critique on the discord server. This led to the following reasoning for the choices made:

  • These functions must not fail because the existing code relies on this behavior already and I intend to make this change as minimally invasive as possible.
  • The reasoning for the curl memory allocation functions to be set based on the concord memory allocation functions is to enable the user to control memory allocation for the entire program. A possible second reason would have been to resolve the "curl free bug", apparently caused by us failing to use curl_free to release memory acquired by libcurl. This is invalid, as curl's memory allocation debugging is implemented on top of the memory allocation functions provided to it.
  • Thread safety is required because it is assumed that a library user using this interface knows what they are doing and the performance penalty of implementing thread safety with a mutex on top of the user-provided memory allocation functions, possibly redundantly, may be too great.
  • Performance considerations should generally be taken seriously, however, according to this StackOverflow answer, an indirect call only takes approximately 7ns more than a direct call. Thus, I consider performance concerns to be unfounded until a benchmark indicates otherwise.
  • Concerns revolving around decision fatigue and confusion are addressed by documentation and complete backwards compatibility.

I would happily implement this. I may underestimate the scope of this change, but I do not expect it to be particularly technically challenging, difficult to implement cleanly and elegantly, or labor-intensive.

Thank you for considering this proposal.

@notcancername notcancername added the enhancement New feature or request label Aug 7, 2023
@lcsmuller
Copy link
Collaborator

Hello @notcancername, I think anything that adds customizability to concord while also being fully backwards compatible is a win in my book.

Not sure if it's in your plans, but to make sure this works well I think we should also implement some custom dynamic memory functions of our own for when concord is built on debug mode (memory tracking purposes). This should give us a good idea if the implementation makes sense.

I may underestimate the scope of this change

You might be a little bit, but we'll see how it goes!

@HackerSmacker
Copy link
Collaborator

I'm about a day delinquent on this, so, let me compose a long and supposedly eloquent response. I'll do this as a bulleted list:

malloc output isn't being checked!
Honestly, personally, I feel that if malloc is failing, there are now other issues that are out of the scope of Concord. It could be a memory exhaustion issue (in which case, something somewhere else is about to start suffering, the OS is paging aggressively, or there is some hardware fault with the memory (I don't know of any OSes that report this as a failed malloc, usually it'll just emit a message on the operator console and hope the system operator is smart enough to shut the system down and do hardware repair). If malloc fails for whatever reason, I think a message should be logged somewhere, as that could be implicative of some other issues somewhere else. This should definitely be checked.

Signal safety need not apply!
Why not be signal-safe? Granted, it's hard to implement, but, like all good things in life, one must consider if this is really necessary or not -- I personally don't think it'll be an issue (you don't have to use SIGALRM because you can use a Discord timer), but it could be a very specific use-case thing. Should someone require it, they can implement it on their end since I don't think this is a pressing matter.

Debugging purposes
This is an extremely good reason in my book. "Back in the day," there were (and still are, look at Android's libc) so-called "debugging malloc" libraries that would aid with debugging programs that don't work so well. With tagged debugging information within Concord, debugging information would be more sophisticated than some basic lowest-common-denominator stuff. I think this is reason enough to implement this proposal.

Other language support
Ideally, for other languages, you don't want to pull control out of their runtime library with random C library functions -- this could especially be an "issue" in languages that have more sophisticated memory-safety features that don't look too positively upon hidden-away memory allocation or freeing.

Using a GC with Concord
That'd be an interesting concept, but, I don't know if anyone's really looking for that per se, most C programmers look down upon the idea of garbage collection, since all of that was figured out ahead-of-time in the program implementation stage. Nonetheless, it'd be a cool technical feat!

@notcancername
Copy link
Author

malloc error checking

I agree. The system functions installed by ccord_global_init check the return value.

why no signal safety

mmap(2) is not AS-safe. fprintf, which is used for error reporting, is not AS-safe.

debug malloc

Seems reasonable, I probably won't implement this in this proposal though. It would also require some fairly unclean macro hackery.

other language support

I agree.

GC

fair

@HackerSmacker
Copy link
Collaborator

The printf family is a classic example because you can (quite easily) get two strings mixing together on the terminal. Fortunately, I don't think it'll be a massive issue for us at all, since we basically don't ever handle signals. I think that weeds out most of the possible issues!

@notcancername notcancername linked a pull request Aug 9, 2023 that will close this issue
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants