Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak #6

Open
Marko19907 opened this issue Oct 25, 2023 · 29 comments
Open

Memory leak #6

Marko19907 opened this issue Oct 25, 2023 · 29 comments
Assignees

Comments

@Marko19907
Copy link

There's a memory leak somewhere, the RAM usage climbs to about 1.6 GB after around 40 minutes and all the connected devices get lost.

image

image

@Valkryst
Copy link
Owner

Valkryst commented Oct 25, 2023

It's likely an issue with something in JInput, as I was just reading a similar issue yesterday. I'll poke around the source and see if there's anything solvable with reflection 🤔

@Valkryst
Copy link
Owner

As far as I could tell, the issue seems to be that this class was spawning a new thread every time HotSwapPoller#getControllers re-initialized the DefaultControllerEnvironment.

If you switch your version to 74eaae30ec931807ee39ba7f854403cfad65a6df, the issue should be resolved.

Though I can't currently test the library on OSX/Linux, I didn't see any obvious issues in their code

@Marko19907
Copy link
Author

Thanks! Unfortunately, that breaks the HotSwapExample at least. It seems like it's stuck in thread.join(); here 🤔

@Valkryst
Copy link
Owner

Documenting a few attempted solutions from today

QueueThread with Thread.interrupt()

The straightforward way of eliminating all of the "dead" threads seemed to be something like this:

final String[] JINPUT_THREADS_TO_KILL = {
    "net.java.games.input.RawInputEventQueue$QueueThread"
};

for (final var thread : Thread.getAllStackTraces().keySet()) {
    if (Arrays.asList(JINPUT_THREADS_TO_KILL).contains(thread.getClass().getName())) {
        thread.interrupt();
        thread.join();
        break;
    }
}

You would think that this would work, and maybe it does under certain configurations as litiengine also uses the same technique. However, the thread gets stuck on a poll call and the isInterrupted check is never run.

I also attempted to call DummyWindow.destroy on the QueueThread#window variable, thinking it might cause poll to throw an exception and allow the isInterrupted check to work. This causes an exception to be thrown from the native function. I believe the error was due to calling DestroyWindow from outside of the thread that created the window (e.g. QueueThread).

Note: Need to double-check that last solution. Maybe I can catch the error and continue to join the thread? I forget whether it was crashing the whole program or just causing an exception in the code.

QueueThread with Forced Exception

Attempted the following:

  1. Setting the QueueThread#exception variable. There is a check which causes it's run function to quit if exception is set.
  2. Setting the QueueThread#window variable to null. This should have caused an exception when accessing it.
  3. Setting the QueueThread#monitor variable to null. This should have caused an exception when accessing it.

None of these solutions had an affect as QueueThread was stuck on the poll call and never ran code which would cause the thread to hit an exception.

QueueThread with Thread.stop()

Against all advice, I attempted to use the deprecated Thread.stop() function to forcefully terminate the "dead" threads. This had no affect.

Manually Update QueueThread#devices

As you can see on the hotswap-fix-attempt-2 branch, I tried copying some of the code from RawInputEventQueue (it's a mess of reflection and a bit tough to read). The intent was to retrieve an updated list of connected devices, without deleting and re-creating the DefaultControllerEnvironment, and then access the existing QueueThread instance and update it's devices instance variable.

For some reason, I havent had a chance to look into it yet, the returned devices are always the same. Though unplugging my controller caused some sort of reflection exception.

@sunsigne
Copy link

sunsigne commented Oct 25, 2023

it looks like thread.join(); is causing some issues, while thread.interrupt() alone seems enough to do the trick... I don't really have time to push deep tests but I'm exhibitor in a video game event mi-november, all a saturday long ! It will be the perfect occasion to see if it's all ok
(EDIT : my bad, didn't checked correctly, it does NOT do the trick D: !)

@sunsigne
Copy link

I mean, it just seems to work fine, writting this :

Capture

@Valkryst
Copy link
Owner

Which operating system are you testing on @sunsigne?

I believe each native implementation works a little differently, and the one that Windows 11 defaults to is the one with an issue. When I test that approach, you can see a new thread being created on every call:

image

@Marko19907
Copy link
Author

I mean, it just seems to work fine, writting this :

Capture

I've tried this myself but no luck, still creates a new thread and doesn't kill off the old ones

@sunsigne
Copy link

Which operating system are you testing on @sunsigne?

I believe each native implementation works a little differently, and the one that Windows 11 defaults to is the one with an issue. When I test that approach, you can see a new thread being created on every call:

image

Ho dang ! You're right ! It was all an illusion, it still create some threads ... my bad, didn't checked correctly (I'm on windows 10 btw)

@Valkryst
Copy link
Owner

Checking JInput's source it looks like almost every Windows version uses DirectAndRawInputEnvironmentPlugin, which in-turn uses the QueueThread that we're trying to work around. I also checked litiengine's PRs to see if anyone has complained about their implementation, but I didn't see anything at a glance.

I'll DM their devs, if I can, and ask if they did anything more to bypass the issue 🤔

@Valkryst Valkryst self-assigned this Oct 25, 2023
@sunsigne
Copy link

sunsigne commented Oct 25, 2023

It looks like it's even a bigger problem than I thought ... Any gamepad input CREATE a new thread, even without ours hacking-patches : you clicked 100 times ? You created 100 thread and they all are living in the same time ... (i might be wrong, but my console is pretty clear on how many of them was created) :

Capture

EDIT : i might be dumb, if for (final var thread : Thread.getAllStackTraces().keySet()) informs you about all thread created, even dead ones, so it's ok

@Valkryst
Copy link
Owner

Gave that a shot with the ControllerExample from the README, but I can't reproduce that issue:

image

Just saw your edit, that makes sense 👍

@Valkryst
Copy link
Owner

One possible solution, and this should be a last resort due to how wasteful it is, would be to refactor this library into a client-server architecture where:

  • The client would listen for events on a local port.
  • The server would listen for events from JInput and publish them to a local port.
  • The client would start the server on a separate JVM (possibly using ProcessBuilder) and allow the memory leak to happen for some set period of time (a few minutes?) before being shut down and restarted.
    • When the JVM shuts down, it would free up any leaked resources.

Another would be to look into why this PR was never merged into JInput, as it adds hot swap functionality 👀

@Marko19907
Copy link
Author

I'll take a look at that PR today, seems like someone there had trouble getting it to compile

@Marko19907
Copy link
Author

Marko19907 commented Oct 26, 2023

I got it to build on Windows with the command mvn clean install from the VS Command Prompt and setting the dependencies to this since the result is not an uber jar anymore, Ant must also be installed:

<dependency>
    <groupId>net.java.jinput</groupId>
    <artifactId>coreapi</artifactId>
    <version>2.0.10-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>net.java.jinput</groupId>
    <artifactId>windows-plugin</artifactId>
    <version>2.0.10-SNAPSHOT</version>
</dependency>

Also, the PR doesn't solve the issue as it's still behaving the same as your solution 😬

@Marko19907
Copy link
Author

I've tried multiple approaches to kill the threads but they're all stuck here private final native void nPoll(long hwnd_handle) throws IOException; in the RawInputEventQueue. I'm not sure I know enough about C and the API to debug the native code 😅

@Valkryst
Copy link
Owner

Valkryst commented Oct 26, 2023

I'm with you there. I can work with C to an extent, but likely not well enough to dive straight into debugging JInput's native plugins 🥲

Currently looking into the RMI API to create a client-server version of HotSwapPoller. If the latency and resource usage aren't too bad, then it might be an okay solution.

Edit: Tried doing it with a client/server, but it ran into a lot of issues. JVM startup times, convoluted multithreading, and hard to read/understand code.

One of the litiengine devs responded to my question, but we're waiting on the dev who actually worked on their JInput code to respond now. They also hinted at having a new solution for input management, so maybe they have their own controller library in the works?

I also prepared two smaller PRs for JInput, just to see whether Endolf is open to reviewing/merging changes if we do end up needing to work on it

@Marko19907
Copy link
Author

Nice, looking forward to see what the litiengine guys say

Also, I may have spotted the issue in the C part of the code by reading the Windows API, I'll try it out tomorrow morning 👌

@Valkryst
Copy link
Owner

Valkryst commented Oct 26, 2023

I spent some time Googling the issue, to see if anyone had hints or solutions. It seems like some people gave up, had similar broken (or unknowingly broken) solutions, and Endolf repeatedly mentions that he's tried asking people to implement it and that he'd review the changes.

Unimplemented Interface

It looks like the resource leak was noticed a long time ago.

Based on Endolf's comment, it seems as though it can be implemented VIA implementing an existing interface. This post contains another reference to the unused/unimplemented interface.

Possible Solution #1

In another post, Perfect Slayer posted a link to his patch for fixing the issue on Windows.

I'll try testing this when I get a chance.

Possible Solution

This post suggests using an alternative method for retrieving the controllers.

I tried testing this, but it also resulted in "dead" threads leaking resources.

@Valkryst
Copy link
Owner

I spent a few hours trying to get the entire project to build with Maven, but the uberjar kept running into build issues.

Though mvn clean install didn't work on my machine, I was able to compile on both Windows and Linux (I didn't have a chance to test the compiled JARs) with:

mvn --batch-mode --update-snapshots -Dmaven.javadoc.skip=true verify --file pom.xml --projects applet,coreAPI,plugins,tests --also-make install

If the build fails, possibly due to previous build attempts which included uberjar, a quick mvn clean was able to resolve the issue.

This should be enough for us to poke around JInput's source and test any changes.


It looks like there's a Jenkinsfile which I attempted running VIA a Dockerized Jenkins instance, but I wasn't familiar enough with it to get it running properly.

@Marko19907
Copy link
Author

I managed to fix the "dead threads" issues. This is basically what it comes down to, in here:

// net_java_games_input_RawInputEventQueue.c
if (GetMessage(&msg, hwnd, 0, 0) != 0) {

The GetMessage function is blocking, meaning that the thread is unable to exit out of this poll here:

// RawInputEventQueue.java
while (!isInterrupted()) {
	poll(window);
}

That then calls this private final native void nPoll(long hwnd_handle) throws IOException; which is where it gets stuck due to GetMessage

There's two possible solutions to this:

1. Use PostMessage to send a "dummy" message that breaks GetMessage

To implement this solution, add the following code underneath the GetMessage call in net_java_games_input_RawInputEventQueue.c

// net_java_games_input_RawInputEventQueue.c
if (msg.message == WM_USER) {
    // Dummy message used to break GetMessage, just return
    return;
}

The final code should look like this:

// net_java_games_input_RawInputEventQueue.c
if (GetMessage(&msg, hwnd, 0, 0) != 0) {
    if (msg.message == WM_USER) {
        // Dummy message used to break GetMessage, just return
        return;
    }
    if (msg.message != WM_INPUT) {
        DefWindowProc(hwnd, msg.message, msg.wParam, msg.lParam);
        return; // ignore it
    }

Additionally, add a new function in the same file:

JNIEXPORT void JNICALL Java_net_java_games_input_RawInputEventQueue_nPostMessage(JNIEnv *env, jobject self, jlong hwnd_handle) {
    HWND hwnd = (HWND)(INT_PTR)hwnd_handle;
    PostMessage((HWND) hwnd, WM_USER, 0, 0);
}

On the Java side, add the following:

// RawInputEventQueue.java
private final void postMessage(DummyWindow window) {
	this.nPostMessage(window.getHwnd());
}
private final native void nPostMessage(long hwnd_handle);

I've also used that PR you mentioned above:

// ControllerEnvironment.java
public static ControllerEnvironment refreshDefaultEnvironment() {
    DefaultControllerEnvironment newControllerEnvironment = new DefaultControllerEnvironment();
    int oldControllerListLength = defaultEnvironment.getControllers().length;
    int newControllerListLength = newControllerEnvironment.getControllers().length;
    // If a controller is removed, the remaining controllers in the old environment should still be valid
    // so the new environment can simply be discarded. Otherwise, the controllers in the old environment
    // should be disposed before returning the new environment.
    if (newControllerListLength < oldControllerListLength) {
        for (int i = 0; i < newControllerListLength; i++) {
            if (newControllerEnvironment.controllers.get(i) instanceof DisposableDevice) {
                try {
                    ((DisposableDevice) newControllerEnvironment.controllers.get(i)).close();
                } catch (IOException e) {
                    log("Device file descriptor is already closed: " + e.getMessage());
                }
            }
        }
    } else {
        for (int i = 0; i < newControllerListLength; i++) {
            if (i < oldControllerListLength) {
                if (!((DefaultControllerEnvironment) defaultEnvironment).controllers.get(i).poll()) {
                    if (((DefaultControllerEnvironment) defaultEnvironment).controllers.get(i) instanceof DisposableDevice) {
                        try {
                            ((DisposableDevice) ((DefaultControllerEnvironment) defaultEnvironment).controllers.get(i)).close();
                        } catch (IOException e) {
                            log("Device file descriptor is already closed: " + e.getMessage());
                        }
                    }
                }
            }
        }
        // defaultEnvironment = newControllerEnvironment; // Not sure why this was placed here???
    }
    defaultEnvironment.release();
    defaultEnvironment = newControllerEnvironment;
    return defaultEnvironment;
}

I've also added a function called release() to the ControllerEnvironment.java that allows me to call release and propagate that down all the way to the RawInputEventQueue.java.

The bad news is that while this works, the threads can stop and get cleaned, the hidden windows that get spawned to get input aren't cleaned. You can see that with Spy++ or WinLister.

I've tried to solve this with this addition here in nDestroy that tries to resolve all messages in the window before killing them:

// net_java_games_input_DummyWindow.c
JNIEXPORT void JNICALL Java_net_java_games_input_DummyWindow_nDestroy(JNIEnv *env, jclass unused, jlong hwnd_address) {
	HWND hwndDummy = (HWND)(INT_PTR)hwnd_address;
	MSG msg;

	while (PeekMessage(&msg, hwndDummy, 0, 0, PM_REMOVE)) {
	    TranslateMessage(&msg);
	    DispatchMessage(&msg);
	}

	BOOL result = DestroyWindow(hwndDummy);
	if (!result) {
		throwIOException(env, "Failed to destroy window (%d)\n", GetLastError());
	}
}

But no luck. 😬

The strange thing is that the DestroyWindow returns true, so success, but the windows are still there as you can see here:

image

2. Or we can swap GetMessage with PeekMessage and poll the controller on a time delay

The hint can be found here:

https://learn.microsoft.com/en-us/windows/win32/winmsg/using-messages-and-message-queues

The main difference between the two functions is that GetMessage does not return until a message matching the filter criteria is placed in the queue, whereas PeekMessage returns immediately regardless of whether a message is in the queue.

This is a bad approach due to busy waiting and wouldn't solve the window issue anyways. 🤔

I've also just got permission from work to create a fork so I'll push those changes later today! 🥳

@Valkryst
Copy link
Owner

Awesome work! I left some comments on your PR, hopefully to get ahead of some minor things that Endolf might notice as well 👌

Looking at the native code, it seems as though the windows should close. I followed the flows below and didn't see any obvious issues in the original code.

I also reviewed the PeekMessage code and none of Microsoft's documentation led me to believe that it would cause an issue with the subsequent DestroyWindow call.

There is one possible issue though. The thread that creates a window is the only one that can destroy it. You'd think that the DestroyWindow call would know to fail if a different thread calls it, but maybe it thinks it's succeeding when in reality it fails due to being called from the wrong thread?

Window Creation

  1. The DummyWindow() constructor is called.
  2. DummyWindow.createWindow() is called.
  3. Java_net_java_games_input_DummyWindow_createWindow is called.
  4. Within the native createWindow function, it looks as though the hwndDummy value is retrieved from CreateWindow.
  5. CreateWindow returns either NULL or a handle (type seems unspecified, but the code says it's a (jlong)(intptr_t)) to the new window.
  6. The returned handle is saved as a long hwnd_address instance variable on the DummyWindow object.

Window Destruction

  1. DummyWindow.destroy() is called.
  2. DummyWindoww.nDestroy() is called and passes the long hwnd_address instance variable to the native function.
  3. Java_net_java_games_input_DummyWindow_nDestroy is called.
  4. DestroyWindow is called.
  5. DestroyWindow returns either 0 (failure) or a positive value (success) if the window is destroyed. We'd get an IOException if there was a failure.

@Marko19907
Copy link
Author

There is one possible issue though. The thread that creates a window is the only one that can destroy it. You'd think that the DestroyWindow call would know to fail if a different thread calls it, but maybe it thinks it's succeeding when in reality it fails due to being called from the wrong thread?

https://stackoverflow.com/a/65458240
I tried to close the window from PostMessage by passing WM_CLOSE like this answer suggested and then later adding a DestroyWindow like this in nPoll:

if (GetMessage(&msg, hwnd, 0, 0) != 0) {
    if (msg.message == WM_USER) {
        // Dummy message used to break GetMessage, just return
        BOOL result = DestroyWindow(hwnd);
        if (!result) {
            throwIOException(env, "Failed to destroy window (%d)\n", GetLastError());
        }
        return;
    }
    if (msg.message != WM_INPUT) {
        DefWindowProc(hwnd, msg.message, msg.wParam, msg.lParam);
        return; // ignore it
    }

To my understanding, PostMessage just delivers a message to the owner thread, so the window should've closed, but none of those solutions I tried worked. 🤯

@Valkryst
Copy link
Owner

Following this explanation posted in the comments of that SO thread, it seems like the message might be discarded by the message loop of the receiving thread. So it would look as though it's being received, but it's never acted upon.

Noting the final paragraph in that explanation, I'm not certain if we actually have a hidden window. I think we might based on this SO answer and the list of Window Styles. If that's the case, then I think PostMessage should be working. But the fact that it doesn't may mean that we're not working with a hidden window 🤔

Below that comment is another suggesting PostMessageW. The docs are a tad confusing, but it seems like it works differently?

@Marko19907
Copy link
Author

Tools like WinLister and Spy++ all confirm that it's a hidden window. 🤔

image

This is one of them for example:

image


https://stackoverflow.com/a/7424550
It's the same function, just different input and output format

@Valkryst
Copy link
Owner

Thanks for double-checking! I wonder if Endolf will have any ideas when they review your PR 🤔

@Marko19907
Copy link
Author

Hopefully they'll have some tips, I've kinda reached the end of the road 🤷‍♂️

Did the litiengine people respond?

@Valkryst
Copy link
Owner

I haven't heard back from the other developer, which the first one @'d in their Discord. Guess we're stuck waiting to hear back from people for now

@Marko19907
Copy link
Author

Marko19907 commented Nov 1, 2023

It might be possible to work around this with LWJGL. We could use it to detect controller changes and only then refresh the Jinput environment. While this approach is functional, it may lead to potential memory leaks over extended periods, but it should keep it to a minimum because I think it's safe to assume that a regular user wouldn't add and remove controllers once every second for 40 minutes.

Here's the code for a slim example I made of how it could work:

public class Slim {

    // The window handle
    private long window;

    // Save the callback to free it later
    private GLFWJoystickCallback joystickCallback;

    public void run() {
        System.out.println("Hello LWJGL " + Version.getVersion() + "!");

        this.init();
        this.loop();

        // Free the window callbacks and destroy the window
        glfwFreeCallbacks(this.window);
        glfwDestroyWindow(this.window);

        // Terminate GLFW and free the error callback
        glfwTerminate();
        glfwSetErrorCallback(null).free();
        if (this.joystickCallback != null) {
            this.joystickCallback.free();
        }
    }

    private void init() {
        // Setup an error callback. The default implementation
        // will print the error message in System.err.
        GLFWErrorCallback.createPrint(System.err).set();

        // Initialize GLFW. Most GLFW functions will not work before doing this.
        if ( !glfwInit() )
            throw new IllegalStateException("Unable to initialize GLFW");

        // Configure GLFW
        glfwDefaultWindowHints(); // optional, the current window hints are already the default
        glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE); // the window will stay hidden after creation
        glfwWindowHint(GLFW_RESIZABLE, GLFW_TRUE); // the window will be resizable

        // Create the window
        this.window = glfwCreateWindow(300, 300, "Hello World!", NULL, NULL);
        if (this.window == NULL )
            throw new RuntimeException("Failed to create the GLFW window");

        // Setup a key callback. It will be called every time a key is pressed, repeated or released.
        glfwSetKeyCallback(this.window, (window, key, scancode, action, mods) -> {
            if ( key == GLFW_KEY_ESCAPE && action == GLFW_RELEASE )
                glfwSetWindowShouldClose(window, true); // We will detect this in the rendering loop
        });

        this.joystickCallback = new GLFWJoystickCallback() {
            @Override
            public void invoke(int jid, int event) {
                if (event == GLFW.GLFW_CONNECTED) {
                    System.out.println("Joystick " + jid + " connected");
                } else if (event == GLFW.GLFW_DISCONNECTED) {
                    System.out.println("Joystick " + jid + " disconnected");
                }
            }
        };
        GLFW.glfwSetJoystickCallback(this.joystickCallback);

        // Get the thread stack and push a new frame
        try ( MemoryStack stack = stackPush() ) {
            IntBuffer pWidth = stack.mallocInt(1); // int*
            IntBuffer pHeight = stack.mallocInt(1); // int*

            // Get the window size passed to glfwCreateWindow
            glfwGetWindowSize(this.window, pWidth, pHeight);

            // Get the resolution of the primary monitor
            GLFWVidMode vidmode = glfwGetVideoMode(glfwGetPrimaryMonitor());

            // Center the window
            glfwSetWindowPos(
                    this.window,
                    (vidmode.width() - pWidth.get(0)) / 2,
                    (vidmode.height() - pHeight.get(0)) / 2
            );
        } // the stack frame is popped automatically

        // Make the OpenGL context current
        glfwMakeContextCurrent(this.window);

        // Don't make the window visible
        // glfwShowWindow(this.window);
    }

    private void loop() {
        // This line is critical for LWJGL's interoperation with GLFW's
        // OpenGL context, or any context that is managed externally.
        // LWJGL detects the context that is current in the current thread,
        // creates the GLCapabilities instance and makes the OpenGL
        // bindings available for use.
        GL.createCapabilities();

        // Set the clear color
        glClearColor(1.0f, 0.0f, 0.0f, 0.0f);

        // Run the rendering loop until the user has attempted to close
        // the window or has pressed the ESCAPE key.
        while ( !glfwWindowShouldClose(this.window) ) {
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // clear the framebuffer

            glfwSwapBuffers(this.window); // swap the color buffers

            // Poll for window events. The key callback above will only be
            // invoked during this call.
            glfwPollEvents();
        }
    }

    public static void main(String[] args) {
        new Slim().run();
    }
}

And the dependencies with only the stuff we need:

<dependencies>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl</artifactId>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl-glfw</artifactId>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl-opengl</artifactId>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl</artifactId>
        <classifier>${lwjgl.natives}</classifier>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl-glfw</artifactId>
        <classifier>${lwjgl.natives}</classifier>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl-opengl</artifactId>
        <classifier>${lwjgl.natives}</classifier>
    </dependency>
    <dependency>
        <groupId>org.lwjgl</groupId>
        <artifactId>lwjgl-rpmalloc</artifactId>
        <classifier>${lwjgl.natives}</classifier>
    </dependency>
</dependencies>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants