This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Memory management: limited memory with dataClay GC

dgasull edited this page Dec 2, 2019 · 1 revision

In the Problem section we stated:

We have specified the problem; the next sections explain the solution. Our problem does not have only one possible solution. In fact, we have found two kinds of solutions:

  • Solutions based on limited access to resources: we establish limits on users, threads, requests, … in order to ensure that there is no leak.
  • Solutions based on unlimited access to resources: we modify our core design, our use cases (dataClay threads, execution classes, models) or the services we provide to ensure that there are no leaks.

Limited solution

This solution tries to solve all ‘reference retention’ cases explained in the Problem section by placing limits on access to resources.

Predicate 1: No memory leak caused by variables kept alive by the stack

Remote executions

Recall that when a thread needs to execute a method on another node, it must wait some time T. We will specify a maximum time per request, maxT, so waiting threads get an answer in at most maxT. This means that there is a maximum time that memory can be retained by remote calls. However, receiving the answer within a certain time does not mean that the memory will be released; it only means that the remote call is not going to be responsible for memory leaks.

One possible design is to propagate the ‘time spent by the request’ along the call chain, so that if a request needs to issue a new one on another node, the new request gets maxT - spentT, where spentT is the time already spent by the first request. If a timeout occurs, an exception must be raised.
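This deadline propagation can be sketched as follows; the class, method, and exception names here are illustrative assumptions, not the actual dataClay API.

```java
// Sketch of deadline propagation for nested remote calls: each request chain
// carries its start time, and a nested request receives maxT - spentT as its
// remaining budget. If the budget is exhausted, an exception is raised.
public class DeadlineBudget {
    /** Thrown when the accumulated time exceeds maxT (illustrative name). */
    public static class RequestTimeoutException extends RuntimeException {
        public RequestTimeoutException(String msg) { super(msg); }
    }

    private final long maxMillis;    // maxT for the whole request chain
    private final long startMillis;  // when the first request started

    public DeadlineBudget(long maxMillis, long startMillis) {
        this.maxMillis = maxMillis;
        this.startMillis = startMillis;
    }

    /** Budget left for a nested request: maxT - spentT. */
    public long remaining(long nowMillis) {
        long spentT = nowMillis - startMillis;
        long left = maxMillis - spentT;
        if (left <= 0) {
            // The accumulated time exceeded maxT: raise the timeout.
            throw new RequestTimeoutException("request exceeded maxT=" + maxMillis + " ms");
        }
        return left;
    }
}
```

A first request with maxT = 1000 ms that has already spent 300 ms would forward a 700 ms budget to any nested request.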

Let’s explain some corner cases:

  • What if my code calls another user’s code and the timeout is caused by their code? This can happen, but it is the code provider’s responsibility to provide code that does not cause a timeout.
  • What if I need to register a method that uses N methods of other users, and a timeout occurs because the accumulated time of the N methods exceeds the maximum time? This solution forces you to modify your model so it does not spend too much time. The method to register could be restructured in many other ways (stages, …). The point is to provide a usable method.

Specifying a maximum time per request is necessary and sufficient to solve the retention of memory caused by waiting threads. However, it is not enough to solve the case of memory retention caused by the method itself.

Memory filled by objects kept alive by stacks

Here, we have N threads that are keeping alive too many objects in memory. Those objects cannot be cleaned since they are local variables of the threads. The timeout explained before is not enough, because the threads could create many objects before the timeout occurs, and therefore our node could still have a memory leak.

In order to solve this problem we could think about establishing a maximum memory per request and raising an OutOfMemoryException if necessary. But this is not possible, since objects are shared and could be retained by other requests even if we release some of them. Let’s think: the Java Garbage Collector is responsible for releasing memory. As we explained in previous sections, it cleans all objects that are unreachable:

An object is unreachable if and only if there is no path from a GC root to the object

However, in dataClay all objects are persisted, so we can unload them and load them from disk when necessary. Therefore, the Java GC should clean all objects except those directly referenced from a GC root, i.e.:

An object is unreachable if and only if it is not referenced from a GC root

All other objects can be removed from memory and loaded from disk on demand. Since all Java GC algorithms implement the first statement (there is no path to…), they are not useful for us. So, if we are able to clean those objects that are not referenced from a GC root, could we solve our problem?

An unused object is an object that is not being used by any thread. Cleaning all objects that are not being used by a thread means that threads only hold primitive values, immutable objects, or ObjectIDs (dataClay’s representation of references) on their stacks. Therefore, a thread will only consume the memory needed for the object it is using for the execution plus S elements in the stack. A memory leak could still be caused if there are too many elements in the stacks:

  • One thread creates N local variables (primitives, immutables, or references), exhausting all available memory. This case is handled by Java: a StackOverflowError is raised if there are too many elements declared in a single method. Otherwise, we could establish a limit while registering the method. ✔
  • X threads create M local variables each, where M < N but M*X exhausts all available memory. We must solve this. ✗
  • X threads create M collections or arrays and add all their local variables to them. We must solve this. ✗

The problems explained above are memory leaks caused by elements in the stack. Again, a thread must only retain in memory the object being used for execution and the values in the stack. Later, we will talk about memory leaks caused by too many threads (too many objects under execution). Now, we should design a system that is able to:

  1. Unload unused objects
  2. Solve the stack memory leaks explained here.

First, let’s design a system that allows us to unload unused objects. We have two main solutions for that:

Weak references per any variable

First, let’s recall something: in order to provide execution following references and access to shared objects in memory, dataClay has a ‘heap’ of objects, implemented as a map of weak references. A weak reference is a reference that does not prevent the GC from collecting its referent.
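The ‘heap’ of weak references described above can be sketched as follows; the class and method names are illustrative, not dataClay’s real implementation.

```java
// Sketch of an object cache implemented as a map of weak references: the map
// itself does not keep objects alive, so the GC may collect any object that
// no GC root (strongly) references, while live objects remain shareable.
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class WeakObjectHeap<T> {
    private final Map<UUID, WeakReference<T>> heap = new ConcurrentHashMap<>();

    public void put(UUID objectId, T obj) {
        heap.put(objectId, new WeakReference<>(obj));
    }

    /** Returns the cached object, or null if never cached or already collected. */
    public T get(UUID objectId) {
        WeakReference<T> ref = heap.get(objectId);
        return ref == null ? null : ref.get();
    }
}
```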

Imagine that we modify all our registered classes so that their fields and variables are all weak references. The GC would then be able to remove any object that is not referenced by a GC root. This solution requires a ‘finalize’ method in each object; the finalize method would be responsible for storing the object to disk when it is collected. The current version of dataClay uses a finalizer method like this. It also uses weak references, but only from its ‘heap’ or object cache.

Also, making all references weak implies:

  • Any access to a variable or field requires extra instructions (weakRef.get()…).
  • Weak references return null when the object has been collected. We would have to add an ‘if-else’ statement to handle this on each access to a variable. If the actual reference is null, wait…
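What an individual access would look like under this scheme can be sketched as follows; the helper names are illustrative, and the reload callback stands in for loading the object back from disk.

```java
// Sketch of the per-access cost if every field were a weak reference:
// dereference, null-check (the referent may have been collected), and
// fall back to reloading when necessary.
import java.lang.ref.WeakReference;

public class WeakFieldAccess {
    /** Read through a weak reference, falling back to a loader if collected. */
    public static <T> T deref(WeakReference<T> ref,
                              java.util.function.Supplier<T> reload) {
        T value = ref.get();          // extra instruction on every access
        if (value == null) {          // extra if-else: referent was collected
            value = reload.get();     // e.g. load the object back from disk
        }
        return value;
    }
}
```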

In conclusion, the bytecode intervention would not be easy and it would imply a serious performance penalty. Finalizers in Java are also less performant since:

“When you have finalizers in the mix, though, the GC can’t simply wipe an entire generation at once. Instead, it needs to figure out all the objects in that generation that need to be finalized, and queue them on a thread that actually executes the finalizers. In the meantime, the GC can’t finish cleaning up the objects efficiently. So it either has to keep them alive longer than they should be, or it has to delay collecting other objects, or both. Plus you have the arbitrary wait time of actually executing the finalizers”

Unload and Garbage Collector Hint

Let’s think further. In this solution we want to unload all objects that are not referenced by a GC root and update the database, without any finalizer method.

Unloading means ‘nullifying’ all fields of the object and setting the ‘isLoaded’ flag to false. Recall that at the beginning of each method we inject code that checks this flag and, if it is false, loads the object from disk. Unloading an object is actually a ‘hint’ to the GC, since nullifying the fields decrements the reference counts.
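The injected check and the unload operation can be sketched like this; the class names and the disk-backed field are illustrative assumptions, not dataClay’s actual bytecode injection.

```java
// Sketch of the isLoaded flag, the injected check at the start of each
// method, and the unload operation (persist, nullify fields, clear flag).
public abstract class PersistentObject {
    private volatile boolean isLoaded = true;

    /** Injected at the start of each method: reload from disk if needed. */
    protected void ensureLoaded() {
        if (!isLoaded) {
            loadFieldsFromDisk();
            isLoaded = true;
        }
    }

    /** Unload: persist, nullify fields, flag not loaded (a 'hint' to the GC). */
    public void unload() {
        storeFieldsToDisk();
        nullifyFields();
        isLoaded = false;
    }

    public boolean isLoaded() { return isLoaded; }

    protected abstract void loadFieldsFromDisk();
    protected abstract void storeFieldsToDisk();
    protected abstract void nullifyFields();
}

/** Illustrative persistent class using the injected check. */
class CachedDoc extends PersistentObject {
    private String text;
    private String onDisk;  // stands in for the real database

    CachedDoc(String text) { this.text = text; }

    public String read() {
        ensureLoaded();     // the injected check
        return text;
    }

    protected void loadFieldsFromDisk() { text = onDisk; }
    protected void storeFieldsToDisk()  { onDisk = text; }
    protected void nullifyFields()      { text = null; }
}
```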

Our proposal is based on a MSC algorithm (Mark, sweep, compact):

  1. Mark: our objects will have a mark indicating whether the object is being used by any thread.
  2. Sweep: every once in a while, or maybe only when there is memory pressure (we should define this), we analyze our ‘object heaps’ (object cache) to see which objects are marked as not being used. Unlike typical GC algorithms, this does not require a STW (stop-the-world) pause. In Java, it is very important that no new references are created during the analysis of the heap; in our case, we just ‘block’ each object that is not being used by any thread, unload it, and send it to disk (nullify the object, …).
  3. Compact: each object that is not being used is blocked and unloaded.

TODO: update this; no blocking system is needed anymore, we use dirty flags (the penalty and injection are still true). The ‘blocking’ process is not new: the current version of dataClay requires blocking an object while it is being updated or stored to disk, so new requests that need it must wait.
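The sweep step above can be sketched as follows, with an illustrative `Unloadable` interface standing in for real dataClay objects; the real implementation would also handle the blocking or dirty-flag protocol.

```java
// Sketch of the sweep step: walk the object cache and unload every object
// not marked as in use by any thread. Each candidate is handled
// individually, so no stop-the-world pause is required.
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SweepStep {
    /** Objects that report whether a thread is using them and can unload. */
    public interface Unloadable {
        boolean inUse();   // mark: is any thread executing on this object?
        void unload();     // nullify fields, set isLoaded = false, persist
    }

    /** Returns how many objects were unloaded in this sweep. */
    public static int sweep(Map<String, Unloadable> cache) {
        AtomicInteger unloaded = new AtomicInteger();
        cache.forEach((id, obj) -> {
            if (!obj.inUse()) {     // only objects not used by any thread
                obj.unload();       // unload and send to disk
                unloaded.incrementAndGet();
            }
        });
        return unloaded.get();
    }
}
```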

This solution allows us to influence Java’s GC. It adds some performance penalty due to the code injection, but we avoid using finalizers. Avoiding finalizers should perform better than using them (i.e., better than the current version of dataClay).

Improvements: This solution could allow us to provide new functionality: the manual hint. We could provide a function that users call to force a hint. DataClayObject.unload(), for example, means that, for the user, the object can be unloaded since it is not going to be used anymore. When this call happens, we could improve the Sweep step of our algorithm (every once in a while…). As an example, when iterating a DataClayCollection we could add an unload after visiting each chunk.

Java Garbage Collector: It is very important to know that our JVM will still be responsible for releasing the memory, so there will be STW pauses produced by the Java GC. In order to reduce the penalty of these pauses, and since our heaps are big and we have concurrency, we should use the G1 collector.

Language dependency: This solution is completely agnostic of any language that has a garbage collector, since what we actually propose is a way to influence it. Of course, some parts will be Java-dependent.

At this point, we are able to unload unused objects from memory. Now we should solve the stack memory leaks explained before. A stack memory leak is the memory leak produced by elements (primitives, immutables, or references) on the stacks, with no other objects in memory except the ones being used by the threads. It is very important to understand the difference! Applying the solution explained, we can unload unused objects, so let’s suppose that each thread has only elements on its stack plus the object being used. Let’s solve the leaks:

X threads create M variables where M < N but M*X exhausts all memory: imagine that our available memory only allows 100 integers, and 100 threads each create one integer. How are we going to solve this? We need a new limit: a maximum number of concurrent threads per node. So, in our example, if we allow 50 threads, they have enough memory to work and a maximum time (the timeout specified before).
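A per-node limit on concurrent requests can be sketched with a counting semaphore; the class name and the timeout-based admission are illustrative assumptions, not the dataClay implementation.

```java
// Sketch of a per-node limit on concurrent requests: a Semaphore bounds how
// many requests may hold stack memory at once. The rest wait, subject to
// the per-request timeout discussed earlier.
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class RequestLimiter {
    private final Semaphore permits;

    public RequestLimiter(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Try to admit a request within the timeout; false means rejected. */
    public boolean tryAdmit(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
            return false;
        }
    }

    /** Release the permit when the request finishes. */
    public void release() {
        permits.release();
    }
}
```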

X threads create M collections or arrays and add all the variables created: a collection or an array is actually a reference to an object. However, collections and arrays are not DataClayObjects in the current version of dataClay. This problem is related to another issue, the ‘Big objects’ problem. The solution to the Big Objects problem lets us turn arrays and collections into DataClayObjects, and therefore this case is solved. Let’s explain it then:

Big objects

Our files, database rows and other storage units have a maximum size. This implies that our objects should also have a maximum size. Before explaining the solution, we should specify that:

Any object that can be considered big contains at least one collection or array

We understand that:

  • Strings are arrays of characters.
  • Maps are collections of key-value pairs.

This statement is true since, in Java, any non-primitive field is a reference. So, if a class does not contain any collection or array, it will only have primitives or references to other objects (ObjectIDs for us). You could argue that adding thousands of primitive fields to a class would make the object very big, but current languages limit the number of fields in a class. Only classes containing collections or arrays can be very big, since these structures can hold up to MAX_INTEGER references or primitives.

The current version of dataClay treats all arrays and collections as ‘immutable’ objects embedded into the object they belong to. This is not correct, since the object can be big and the arrays and collections cannot be shared.

The solution requires dataClay to be able to ‘wrap’ those collections and arrays into DataClayObjects. This wrapping could imply a penalty, since some bytecode modification might be necessary.

Having collections and arrays in wrappers allows us to unload them and split them into chunks, solving the ‘stack memory leak’ explained before.
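The chunking idea can be sketched as follows; `ChunkedArray` and its chunk size are illustrative, not the actual dataClay wrapper, and in dataClay each chunk would itself be a DataClayObject that can be unloaded and reloaded independently.

```java
// Sketch of wrapping a big array into fixed-size chunks: the wrapper holds
// references to chunks, so each chunk can be unloaded or loaded on its own
// instead of embedding the whole array inside its owner object.
import java.util.ArrayList;
import java.util.List;

public class ChunkedArray {
    static final int CHUNK_SIZE = 4;  // illustrative; real chunks would be larger

    private final List<int[]> chunks = new ArrayList<>();
    private int size;

    public void add(int value) {
        if (size % CHUNK_SIZE == 0) {
            chunks.add(new int[CHUNK_SIZE]);  // new chunk: its own object
        }
        chunks.get(size / CHUNK_SIZE)[size % CHUNK_SIZE] = value;
        size++;
    }

    public int get(int index) {
        // In dataClay, the chunk would be reloaded from disk on demand here.
        return chunks.get(index / CHUNK_SIZE)[index % CHUNK_SIZE];
    }

    public int size() { return size; }
    public int chunkCount() { return chunks.size(); }
}
```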

Improvement: Using our Garbage Collector Hint algorithm we could improve the cleaning of these arrays and collections.

Embedded objects: Remember that users can specify their own serialization and therefore can ‘embed’ objects. This process is the user’s responsibility, but we should place some limits on user-defined serialization to avoid serializing big objects.

Language dependency: This solution is conceptual and therefore language-agnostic. Each language should be able to ‘wrap’ those collections or arrays into DataClayObjects.

Discarded option: There was another solution for big objects that was discarded: splitting classes. Splitting a class could reduce the size of the object (only some parts of it are loaded), but the problem of collections and arrays would still be present.

At this point, we have solved the Predicate 1 in our Limited Solution!

Predicate 2: No memory leak caused by alive threads

Remember that this memory leak is caused by having too many Thread objects, which cannot be cleaned because they are GC roots. Our solution here is easy and already appeared in the previous Predicate: we establish a limit of concurrent requests per node, so it is not possible to have enough Thread objects to cause a memory leak.

The worst case here is N threads that do not finish due to remote executions or long executions (while true). Sounds familiar, right? Yes, the solution to the first part of Predicate 1 is also valid here: a timeout.

Finally, we also considered specifying a maximum number of nested requests, so we could make sure that users do not register classes that cause ‘too many jumps’. This limitation is not necessary, since we already have a timeout.

We have formally proved that our limited solution satisfies both predicates.
