Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Virtual Threads #2055

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

bdeneuter
Copy link

It seems that synchronized will still pinn carrier threads in JDK 21 for the moment. This is the draft JEP for JDK21:
https://openjdk.org/jeps/8303683

So I'm reoping the PR for using ReentrantLock.

@lfbayer
Copy link
Collaborator

lfbayer commented Apr 5, 2023

Reading this page seems to imply that having the synchronized blocks is supported in Java 19.
https://neiljbrown.com/2022/07/23/preparing-for-java-virtual-threads/

However, currently there is a limitation which means code executing in a synchronized block may in some cases hinder the throughput benefits of virtual threads.

This statement in that page implies that the synchronized blocks might not be a problem. Do you have any benchmarks or tests that can show that there is a real benefit to changing to using the locks? Or is there some evidence that the synchronized blocks are in fact not supported?

…atements can throw an exception and and as a consequence won't release the lock
@bdeneuter
Copy link
Author

@lfbayer Virtual threads use a small number of carrier threads (OS Threads) to execute. Normally this will be the same amount as the number of CPUs. If the virtual thread runs in a synchronized block, the carrier thread won't be available to execute other work when the virtual thread performs blocking IO. So it depends. When no IO is being performed, synchronized is fine to use. But when IO is being performed it will be better to use a ReentrantLock as the carrier thread can perform other work in the meantime.

I was hoping that this limitation would be gone in JDK 21 but as the draft JEP is being prepared, it currently seems the limitation will still be present in JDK 21. It might still change of course in the coming months.

@slutmaker
Copy link

slutmaker commented Jul 7, 2023

I was hoping that this limitation would be gone in JDK 21 but as the draft JEP is being prepared, it currently seems the limitation will still be present in JDK 21. It might still change of course in the coming months.

There is already confirmed information that in Loom GA in JDK 21 this problem still persists. So I think we should accept this pull request.

https://openjdk.org/jeps/444
изображение

Copy link

@slutmaker slutmaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock() call must be in try {} block

@bdeneuter
Copy link
Author

lock() call must be in try {} block

I don't think it must be in the try block. This is documented in the ReentrantClass:
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/locks/ReentrantLock.html

class X {
   private final ReentrantLock lock = new ReentrantLock();
   // ...

   public void m() {
     lock.lock();  // block until condition holds
     try {
       // ... method body
     } finally {
       lock.unlock();
     }
   }
 }

@slutmaker
Copy link

lock() call must be in try {} block

I don't think it must be in the try block. This is documented in the ReentrantClass: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/locks/ReentrantLock.html

class X {
   private final ReentrantLock lock = new ReentrantLock();
   // ...

   public void m() {
     lock.lock();  // block until condition holds
     try {
       // ... method body
     } finally {
       lock.unlock();
     }
   }
 }

you're right

@cj848
Copy link

cj848 commented Jul 19, 2023

When can this PR be merged and released? JPA
I'm using it through Virtual Thread and I'm really looking forward to it.

@enragedginger
Copy link

JDK 21 has released. Virtual threads are no longer an experimental feature. Can we get a new release with this change included?

@XcrigX
Copy link

XcrigX commented Oct 19, 2023

This is somewhat of a philosophical question related to this PR.

In previous multi-threaded usages involving Hikari, there was likely a threadpool of platform threads which effectively throttled the number of requests to the connection pool.
If applications simply switch from a fixed threadpool to virtual threads this throttling goes away.
One can imagine then that there might be higher contention over connections in the connection pool, resulting in (likely) timeout exceptions waiting for a free connection.

I'm trying to figure out where/how to handle this?

Applications can implement their own counting semaphores around DB-bound operations to re-introduce throttling - but such operations could be sprinkled all over an application. It's probably just cleaner to stick with platform threads in a threadpool at that point.

Applications could increase the connection pool timeout, but this may mask actual errors creating DB connections and seems iffy regardless.

Should Hikari itself put counting Semaphores in getConnection() outside of the timeout checking?

Nothing seems quite right.

Thoughts?

@bdeneuter
Copy link
Author

bdeneuter commented Oct 20, 2023

Hi, my pull request was not trying to solve philosophical questions or determine what is the best way to use HikariCP with virtual threads. The goal is to make HikariCP virtual thread friendly and avoid pinning of the carrier thread when it is being called from a virtual thread.
With virtual threads, a small set of carrier threads, which are OS threads, are being used. These few carrier threads are used by all virtual threads running in the Java process. So also virtual threads that don't call HikariCP (Tomcat, http client, ...) are using the same carrier threads.
If HikariCP pins the carrier threads during IO then these carrier threads are not available for other virtuals threads and the service becomes unresponsive. This can happen when HikariCP opens a DB connection and has the virtual thread pinned for example.

Pinning the carrier thread is a limitation from virtual threads in JDK 21. The developers from Loom said that they will work on avoiding pinning the carrier thread when synchronize is used. So in future versions of the JDK it is a problem that will go away but I'm not sure if they will backport their fix to JDK 21.

@enragedginger
Copy link

@XcrigX The Oracle docs on virtual threads specifically address this use case. They recommend "unlearning this pattern" and using a counting semaphore.

(Look specifically for the "Use Semaphores to Limit Concurrency" section in the link above)

@XcrigX
Copy link

XcrigX commented Oct 20, 2023

@bdeneuter - understood - it's a welcome PR. I'm just trying to spark some discussion/thought around "what's next".

@enragedginger - yes, counting semaphores are the likely answer, but from a client perspective this can be messy. If your data access operations are spread out in many places it's pretty ugly to go wrap each in a try/finally block, deal with InterruptedExceptions acquiring the semaphore, etc.

getConnection() in Hikari is a potentially clean place to do this but may break existing contracts with the timeout.
Actually, getConnection() is already using a semaphore in SuspendResumeLock which is acquired prior to the timeout clock starting, but it has 10000 permits. If this number could be adjusted to be based on the connection pool max - it may solve the problem I'm alluding to.

@svendiedrichsen
Copy link

svendiedrichsen commented Oct 20, 2023

IMHO HikariCP already conceptually provides a (timeout-based) semaphore to guard access to pooled connections. This PR only prevents virtual threads to pin to platform threads while accessing the pool.

@cj848
Copy link

cj848 commented Oct 20, 2023

I'm really curious, when will this PR be merged?

@lfbayer
Copy link
Collaborator

lfbayer commented Oct 22, 2023

As far as I understand, applications can continue to use virtual threads in their applications while using HikariCP, but it just means that there is a potential for the JVM to spawn non-virtual threads.

So that means that the purpose of this pull request presumably isn't specifically to be able to use HikariCP in an environment that uses virtual threads but rather, the intent is to get some performance improvement by having HikariCP itself use the virtual threads.

In my very first comment I asked for benchmarks that demonstrated the performance impact of these changes.

These changes are not trivial, and the risk to existing users is still there. I would want assurance that these changes are both correct and meaningful. And meaningful beyond just a cargo-cult thought process of "syncrhonized = bad". I don't entirely disagree with the change, but I also think, if semantically a lock and a synchronize block are the same, shouldn't it be on the JVM itself to add support for this? If they are not semantically the same, then clearly there are other risks of change from synchronize to locks.

This is the kind of change that I would want to truly understand myself before merging. To be perfectly honest, I don't have any immediate plan to put the work in to understand the change and its impact.

I think the "philosophical questions" are actually very much worth having. Part of me thinks that the model for a connection pool in the virtual thread use-case is likely a bit different from HikariCP's implementation. My gut tells me that simply changing the synchronize blocks to locks wouldn't actually solve as much as it might first seem.

To answer the question of "when will this be merged": I don't have any specific plan to do so. My gut tells me it would take two days of full time work to build the confidence that this change is safe. And I don't see myself spending those two days. If someone else can provide the evidence and convincing argument for why this specific change is safe, then I would be happy to merge it. And I mean this specific change, not just an argument for virtual-threads in general. I would also like to hear a good explanation of the risks and impact of the change, as well as how conceptually a database connection pool should work in virtual-thread environment.

I'm also really curious about how the current implementation actually behaves in a virtual-thread environment. Was the impetus for this PR in response to real experience using HikariCP with virtual threads? I think almost all of the current hot-path synchronize blocks aren't expected to ever really block anyway. For example, the getConnection() method actually uses zero synchronize blocks (it only uses a synchronize if the pool isn't already initialized). So if none of the hot-paths synchronize then is there any performance benefit to these changes?

@enragedginger
Copy link

@lfbayer Thanks for the thorough explanation. It's helpful to understand where you're at on all of this.

As far as I understand, applications can continue to use virtual threads in their applications while using HikariCP, but it just means that there is a potential for the JVM to spawn non-virtual threads.

I don't think this assumption is correct. Check out the section from the Oracle docs on virtual threads titled Scheduling Virtual Threads and Pinned Virtual Threads.

If a virtual thread hits a synchronized block of code, it remains pinned to its OS carrier thread and cannot be unpinned until that block of code finishes.

Deadlocks due to resource starvation due to pinned virtual threads is the situation we're all trying to avoid.

I've already encountered this situation with virtual threads when using pooling with the Apache commons HTTP library and it sucks. Switching to using ReentrantLock / Semaphore is simple if you're in control of the offending library, but is annoying to deal with if you're an application developer like me who just free loads on open source libraries others have created. I ultimately just rolled my own virtual thread safe object pooling code and switched over to the internal HttpClient provided by the JDK. It was annoying, but now I get the benefits of virtual threads for my IO and don't have deadlocks in my system.

FWIW, the section of the docs on Avoid Lengthy and Frequent Pinning states:

Pinning may adversely affect the throughput of the server if the blocking operation is both long-lived and frequent. Guarding short-lived operations, such as in-memory operations, or infrequent ones with synchronized blocks or methods should have no adverse effect.

After reading this statement and re-reviewing this PR, I don't think any of the current usages of synchronized classify as both long-lived and frequent and therefore using synchronized should have no adverse effect. If synchronized was being used to wrap the execution of SQL statements, then we'd have a problem, but that doesn't appear to be the case here (someone please tell me if I'm wrong on this though).

if semantically a lock and a synchronize block are the same, shouldn't it be on the JVM itself to add support for this?

This is also an interesting statement that I hadn't initially considered. That aforementioned section of the docs has this too:

A current limitation of the implementation of virtual threads is that performing a blocking operation while inside a synchronized block

I'd largely skipped over the word "current" there; it implies that this might change in the future. The docs advocate for switching to use explicit ReentrantLock / Semaphore instead of synchronized--implying it's a suitable drop-in replacement. However, if developers who wish to use virtual threads have to wait for all of their favorite IO libraries to move away from synchronized until they can safely use virtual threads, the adoption rate for virtual threads is going to be pretty low. Therefore, I'd like to think that some future release of the JVM will remove this synchronized limitation, but that's wishful thinking at this point in time.

In summary, I don't see how the changes from synchronized over to ReentrantLock in this PR would cause any kinds of troubles for existing users as the docs imply that ReentrantLock is a suitable drop-in replacement. The tests still pass, so that should be sufficient to guarantee we're not breaking existing users, no?

However, before advertising that "HikariCP is virtual thread compatible" it'd be good to add JDK 21 to the testing grid and add some virtual thread executor related test cases. IMO it doesn't make sense to do that until after this lands though.

Thoughts?

@lfbayer
Copy link
Collaborator

lfbayer commented Oct 22, 2023

@enragedginger thank you for your detailed reply.

I also wouldn't go as far as saying HikariCP is specifically "virtual thread compatible" at this point. Although I don't feel confident saying it isn't either. It seems like there is potentially other issues than just the use of synchronize that might mean we don't play well. Since I don't have ANY experience using virtual threads yet it's hard for me to feel comfortable with any judgement on whether this change is sufficient.

My main concern with using the ReentrantLock is that even if it doesn't change anything functionally, it might have a negative impact on performance somehow.

For context: when I write code these days I always prefer java.util.concurrent over synchronized. So I have nothing against ReentrantLock and no specific preference for synchronized. What I don't like is change without full comprehension of the impact of the change.

I would absolutely expect Reentrant locks to not perform worse than synchronize, but I still want to see some benchmark numbers to verify this before merging. Just because junit passes doesn't mean there is no negative impact.

I also feel like the issue of pinning a virtual thread still wouldn't be enough to cause a deadlock. It would make me feel more comfortable if I could see a concrete case where this actually is a problem.

Also, I'm still not satisfied with the answer related to locks and synchronize blocks being semantically the same. If the JVM doesn't support something with synchronize that it does with Reentrant locks, then that implies there is something about synchronize that is different. So what are the reasons that synchronize support isn't available. Knowing these underlying reasons would be helpful for us to analyze whether those same reasons would have any impact on us.

For me to be satisfied I need the following questions to be resolved:

  1. What is a specific example of code that causes a virtual thread to be pinned? A JUnit case that uses virtual-threads and can force this scenario would be appreciated, because then we could be guaranteed no future change could cause a regression without us noticing.
  2. If the code paths that can cause pinning are only in the initial DataSource setup code, then is this really an issue that needs to be solved?
  3. For a user to get support for virtual-threads with this pull request, are they required to provide their own ThreadFactory to the HikariConfig? If so, is there a more idiomatic way a library is expected to be implemented so that the library doesn't create non-virtual-threads when run within a vritual-thread environment? How is the library user supposed to know they need to do this. Or is HikariCP's creation of its own threads not a problem?
  4. I would like to see the existing benchmarks run pre- and post- PR and see a comparison of the performance to show that it is at least unchanged if not better.

…ehaviour when the class is loaded multiple times in different ClassLoaders.
@bdeneuter
Copy link
Author

Also, I'm still not satisfied with the answer related to locks and synchronize blocks being semantically the same. If the JVM doesn't support something with synchronize that it does with Reentrant locks, then that implies there is something about synchronize that is different

The pinning of the OS carrier thread occurs when calling native code from Java. It seems that Java is calling native code in the implementation for Java monitors and this seems to be the root cause that synchronized pins the carrier thread. That is why in the JEP it is specified that both native calls and synchronized pins the carrier thread.

I found this BUG report for OpenJDK where they mention that they are reimplementing Java monitors:
https://bugs.openjdk.org/browse/JDK-8305069

So I'm fine to wait for this new implementation and to not make code changes to avoid pinning. Not sure if they would backport this to JDK 21 though. I would expect not.

@lfbayer
Copy link
Collaborator

lfbayer commented Oct 22, 2023

@bdeneuter Your answer satisfies me as to how synchronized and ReentrantLock are different.

I think looking forward, even if support for synchronized in virtual-threads is added, HikariCP likely needs to be changed somehow. Do you have any thoughts on my questions above? Specifically 1 and 3.

My comments up until now might sounds like I do not want to fix this problem. But the opposite is true, I just want to be absolutely sure the answer is the correct one.

@bdeneuter
Copy link
Author

bdeneuter commented Oct 23, 2023

Each time the code enters a synchronized block the underlying carrier thread gets pinned. If this is a bad thing depends on what happens in the synchronized block. If the virtual thread yields and gets parked, the carrier thread stays pinned and won't pick up other virtual threads that want to continue. If all carrier threads are pinned, the system can block and stops working. If the virtual thread doesn't yield and just leaves the synchronized block after it did its work everything is fine.

This is a nice example that is not complex and where the pinning can cause a deadlock:
https://www.reddit.com/r/java/comments/1512xuo/virtual_threads_interesting_deadlock/

This is their question on the loom-dev mailing list:
https://mail.openjdk.org/pipermail/loom-dev/2023-July/005989.html

What happens in this example:

  • Virtual threads entering a synchronized block and each virtual thread will pin its carrier thread while in the synchronized block
  • It appears that System.out.println uses a ReentrantLock underneath
  • When a virtual thread calls the System.out.println before the synchronized block, this virtual thread can get the lock on the internal ReentrantLock. This virtual yields because it has the lock and can perform its IO.
  • The other virtual threads that entered the synchronized block are all waiting for getting the internal lock.
  • But when the virtual thread that has the internal lock wants to continue after it performed its IO, it can not continue as all carrier threads are pinned
  • And the system is in a deadlock

To me it starts to look like it is safer to wait until they solved the pinning of the threads.

To answer your other question. I don't think it would break the users application if HikariCP creates OS threads. Of course it would be better if the different libraries avoid creating their own OS threads or allow the user to configure the library to use virtual threads if they want them.

My comments up until now might sounds like I do not want to fix this problem. But the opposite is true, I just want to be absolutely sure the answer is the correct one.

I understand and don't worry. Everybody is searching on how to use this technology with its current limitations. I prefer to not rush it in if there are doubts.

@XcrigX
Copy link

XcrigX commented Oct 23, 2023

I can add some color to my own experience trying to use virtual threads with Hikari.

I have a batch processing app that takes a set of data in one format, and converts to a flattened SQL model.
The app uses a regular thread pool i.e. (Executors.newFixedThreadPool(8)).

It receives a an array of records, and for each it creates a task and submits to the ExecutorService to schedule and run on a thread, then blocks for them all to complete.
Each task will do some conversion on the source data, and eventually insert or update a record in the DB.

I switched to using virtual threads to see how it would work, pretty much by simply replacing the fixed threadpool with:
Executors.newVirtualThreadPerTaskExecutor();

So suppose I receive 10000 records in a batch. Before at most 8 (threadpool size) could ever be executing concurrently.
Hikari's default connection pool size is 8, so there would always be a free connection. A connection timeout would mean an actual problem connecting to the DB, not just an timeout waiting for a free connection from the pool.

When switching to virtual threads there is no limit on the number of threads that can be running concurrently. Since the DB IO is the main bottleneck, many more than 8 virtual threads will spin up. Eventually they all start to stack up waiting for a connection from Hikari. And eventually, one of the tasks will wait > 30 seconds and timeout.

Again, my suggestion to alleviate this problem would be to have a counting semaphore in getConnection() that re-introduces some throttling based on the connection pool size before the timeout timer starts. This may change the existing contract with clients however, so there may be more to it - some additional explicit configuration or something so clients can "opt-in" to this contract change.

@svendiedrichsen
Copy link

@XcrigX If HikariCP would allow to disable the connection timeout you essentially would end up with a counting semaphore.

@XcrigX
Copy link

XcrigX commented Oct 23, 2023

@svendiedrichsen I suppose one could set the connection timeout to a really high value to approximate it. My thought was that I'd still want to know about problems actually getting a connection from the DB vs just waiting for a connection to become available from the pool.

@svendiedrichsen
Copy link

@XcrigX But this is more a question on how HikariCP differentiates those two cases. i.e. By using different exceptions.

@dpiliouras-ut
Copy link

I'm certainly not familiar with the hikariCP internals, but it sounds like there are two topics here:

  1. Allowing/enabling upstream code to use virtual-threads (w/o pinning). Think of the canonical web-server use-case, where your server is configured to use virtual-threads, and the actual handlers that will run on these threads, talk to the DB. You really don't want these handlers to pin the carrier thread. If I'm understanding correctly, this PR addresses, or is attempting to address, this part. A good/similar example would be the native HttpClient, which was tweaked to not pin (in java-21) in a similar fashion.
  2. Allowing/configuring hikariCP itself to use virtual-threads (i.e. internally). This may be a nice enhancement, but is somewhat unrelated to this PR. Yes, a semaphore will be needed if this 'enhancement' goes forward (because virtual executors are unbounded, and you're not supposed to pool virtual-threads), but as far as I understand, there is nothing wrong with hikariCP spawning native threads (e.g. via a pooled-executor) internally. If these threads are doing a lot of blocking IO, then perhaps this increases the value proposition of the aforementioned enhancement, but still, the two should not be conflated. In fact, the example with the native HttpClient applies here too! It also provides support for a custom executor (or thread-factory), for the threads it may spawn.

Finally, i think the deadlock behaviour (with println), can only occur under somewhat unique circumstances. As Ron (the tech-lead for project loom) states here:

The problem occurs when some shared j.u.c lock is acquired both when some virtual thread is pinned (i.e. inside a synchronized block/method) as well as unpinned

Hope my understanding is correct, and that this helps someone 👍 . Thanks for the great work - eagerly awaiting on a resolution...

@jimpil
Copy link

jimpil commented Mar 19, 2024

Hello,

I was just watching the java 22 launch live stream, and apparently, synchronized still pins the carrier thread :( . Is there any news/consensus around replacing with ReentrantLock, or is this PR considered 'stale'? thanks in advance...

@sureshg
Copy link

sureshg commented Mar 19, 2024

@jimpil Sorry, this isn't answering your question. I'd like to let you know that the short-term fix for pinning is available on the Loom-ea build (https://mail.openjdk.org/pipermail/loom-dev/2024-February/006433.html). Hopefully, this will be ready for production use on JDK 23.

@sinsuren
Copy link

@sureshg given that java 21 is GA, are we taking a stand that users of HikariCP lib should not use virtual thread and have to wait for java 25 (optimistic that issue of pinning gets fixed in java 25).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet