Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOUNTY $25] JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED #62

Closed
KIC opened this issue Oct 15, 2017 · 17 comments
Assignees
Labels
bounty $$$ Cash reward! bug Fix something that is broken

Comments

@KIC
Copy link

KIC commented Oct 15, 2017

Hmm when I create more then one Kernel from the same class then I suddenly run into such messages like below. I could not find anything in the documentation that this should not be supported. Basically I have an object pool of a limited number of Kernels and different threads can aquire those limited resources. But this results in a JVM crash.

Thanks
KIC

PS you could find a runnable version at: https://github.com/KIC/LPPL the problem arises at kic.lppl.SornetteTest

Connected to the target VM, address: '127.0.0.1:64263', transport: 'socket'
Oct 15, 2017 9:46:04 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
Oct 15, 2017 9:46:05 AM com.aparapi.internal.kernel.KernelDeviceProfile onEvent
SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffe652ebbdf, pid=18076, tid=0x0000000000004310
#
# JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [ntdll.dll+0x3bbdf]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\xxxx\sources\kic\dataframe\LPPL\hs_err_pid18076.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
@KIC
Copy link
Author

KIC commented Oct 15, 2017

hs_err_pid18076.log

@KIC KIC closed this as completed Oct 15, 2017
@KIC KIC reopened this Oct 15, 2017
@KIC KIC changed the title SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED Oct 15, 2017
@freemo
Copy link
Member

freemo commented Oct 18, 2017

@KIC Can you share a simple test case so I can investigate this further please? (feel free to submit it as a unit test to the project if you want).

@KIC
Copy link
Author

KIC commented Oct 18, 2017

This is pretty hard to tackle via simple unit test as the jvm not always crashes. Stilll you can see a lot of severe meesages by just running the test below. As I have mentioned you can also just clone https://github.com/KIC/LPPL and run the kic.lppl.SornetteTest test.

package kic.aparapi;

import com.aparapi.Kernel;
import com.aparapi.Range;
import org.junit.Test;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class JVMCrashTest {
    public static float[] t = new float[]{416667f, 416668f, 416669f, 416670f, 416671f, 416672f, 416673f, 416674f, 416675f, 416676f, 416677f, 416678f, 416679f, 416680f, 416681f, 416682f, 416683f, 416684f, 416685f, 416686f, 416687f, 416688f, 416689f, 416690f, 416691f, 416692f, 416693f, 416694f, 416695f, 416696f, 416697f, 416698f, 416699f, 416700f, 416701f, 416702f, 416703f, 416704f, 416705f, 416706f, 416707f, 416708f, 416709f, 416710f, 416711f, 416712f, 416713f, 416714f, 416715f, 416716f, 416717f, 416718f, 416719f, 416720f, 416721f, 416722f, 416723f, 416724f, 416725f, 416726f, 416727f, 416728f, 416729f, 416730f, 416731f, 416732f, 416733f, 416734f, 416735f, 416736f, 416737f, 416738f, 416739f, 416740f, 416741f, 416742f, 416743f, 416744f, 416745f, 416746f, 416747f, 416748f, 416749f, 416750f, 416751f, 416752f, 416753f, 416754f, 416755f, 416756f, 416757f, 416758f, 416759f, 416760f, 416761f, 416762f, 416763f, 416764f, 416765f, 416766f, 416767f, 416768f, 416769f, 416770f, 416771f, 416772f, 416773f, 416774f, 416775f, 416776f, 416777f, 416778f, 416779f, 416780f, 416781f, 416782f, 416783f, 416784f, 416785f, 416786f, 416787f, 416788f, 416789f, 416790f, 416791f, 416792f, 416793f, 416794f, 416795f, 416796f, 416797f, 416798f, 416799f, 416800f, 416801f, 416802f, 416803f, 416804f, 416805f, 416806f, 416807f, 416808f, 416809f, 416810f, 416811f, 416812f, 416813f, 416814f, 416815f, 416816f, 416817f, 416818f, 416819f, 416820f, 416821f, 416822f, 416823f, 416824f, 416825f, 416826f, 416827f, 416828f, 416829f, 416830f, 416831f, 416832f, 416833f, 416834f, 416835f, 416836f, 416837f, 416838f, 416839f, 416840f, 416841f, 416842f, 416843f, 416844f, 416845f, 416846f, 416847f, 416848f, 416849f, 416850f, 416851f, 416852f, 416853f, 416854f, 416855f, 416856f, 416857f, 416858f, 416859f, 416860f, 416861f, 416862f, 416863f, 416864f, 416865f, 416866f};
    public static float[] p = new float[]{2034.91f, 2042.03f, 2044.13f, 2063.15f, 2060.7f, 2056.29f, 2057.51f, 2058.18f, 2047.97f, 2001.26f, 2008.72f, 1999.9f, 1981.21f, 1933.88f, 1923.55f, 1939.44f, 1929.48f, 1893.06f, 1910.54f, 1930.46f, 1940.75f, 1929.77f, 1914.37f, 1872.92f, 1831.74f, 1857.52f, 1832.32f, 1830.06f, 1785.75f, 1784.81f, 1796.98f, 1809.07f, 1791.85f, 1768.04f, 1754.01f, 1741.92f, 1765.34f, 1785.87f, 1807.05f, 1797.9f, 1813.57f, 1758.5f, 1775.85f, 1768.07f, 1725.15f, 1716.43f, 1731.33f, 1761.02f, 1760.95f, 1742.41f, 1734.22f, 1731.33f, 1711.96f, 1710.56f, 1663.85f, 1656.57f, 1616.43f, 1611.78f, 1669.43f, 1651.7f, 1662.17f, 1707.62f, 1698.16f, 1669.59f, 1623.32f, 1622.89f, 1648.25f, 1685.88f, 1670.64f, 1694f, 1722.13f, 1706.31f, 1705.01f, 1735.95f, 1743.88f, 1775.36f, 1818.09f, 1812.17f, 1813.04f, 1775.78f, 1756.78f, 1756.94f, 1764.34f, 1812.26f, 1794.48f, 1802.72f, 1865.28f, 1863.09f, 1843.68f, 1893.35f, 1882.88f, 1915.17f, 1934.48f, 1912f, 1907.62f, 1910.38f, 1881.93f, 1914.18f, 1933.92f, 1952.46f, 1974.38f, 1954.74f, 1958.4f, 1956.65f, 1993.21f, 2000.73f, 2019.83f, 2045.57f, 2047.59f, 2002.27f, 2030.48f, 2037.54f, 2043.64f, 2069.43f, 2043.62f, 1992.19f, 2011.31f, 1964.96f, 1992.72f, 1985.37f, 2038.22f, 2018.29f, 2041.33f, 2053.25f, 2086.2f, 2079.23f, 2061.43f, 2060.43f, 2047.58f, 2027.91f, 2004.27f, 2042.32f, 2022.79f, 1999.28f, 1995.37f, 1989.62f, 1945.19f, 1961.19f, 1968.31f, 1971.8f, 1968.05f, 1980.25f, 2009.48f, 2026.87f, 2022.79f, 2023.39f, 2041.25f, 2021.78f, 2022.01f, 2026.15f, 2030.88f, 2041.01f, 2141.41f, 2190.04f, 2248.67f, 2221.63f, 2270.04f, 2259.56f, 2249.08f, 2280.56f, 2284.33f, 2348.77f, 2364.71f, 2438.89f, 2436.04f, 2370.25f, 2411.94f, 2341.18f, 2330.4f, 2377.23f, 2400.77f, 2372f, 2373.93f, 2324.76f, 2355f, 2328.78f, 2369.15f, 2392.48f, 2376.29f, 2359.96f, 2340.38f, 2339.7f, 2304.06f, 2286.95f, 2258.27f, 2268.42f, 2266.97f, 2317.76f, 2295.68f, 2299.64f, 2323.04f, 2340.8f, 2332.45f, 2316.51f, 2351.41f, 2362.82f, 2378.29f, 2404.29f, 2410.59f, 2443.15f};

    @Test
    public void test() throws InterruptedException {
        List<ABCCKernel> kernels = IntStream.range(0, 50).mapToObj(j -> new ABCCKernel(t, p)).collect(Collectors.toList());
        // repeat until we eventually crash
        for (int i=0; i<1000; i++) {
            ExecutorService executorService = Executors.newFixedThreadPool(kernels.size());
            kernels.forEach(k -> executorService.submit(() -> k.execute(Range.create(t.length))));

            executorService.shutdown();
            executorService.awaitTermination(1, TimeUnit.DAYS);
        }

        kernels.forEach(ABCCKernel::dispose);
    }

    public class ABCCKernel extends Kernel {
        public static final int FI = 0, GI = 1, HI = 2, YI = 3, FI2 = 4, FIGI = 5, GI2 = 6, YIFI = 7, YIGI = 8, FIHI = 9, GIHI = 10, HI2 = 11, YIHI = 12;
        public static final int v = 13;
        private static final int TC = 0, M = 1, W = 2;
        private float[] T, p;
        private float[] tcmw = new float[3];
        private float[] result;
        //public final int N;

        public ABCCKernel(float[] t, float[] p) {
            setExplicit(true);
            this.T = t;
            this.p = p;
            this.result = new float[t.length * v];
            put(this.T).put(this.p).put(result);
        }

        public void setNewTandP(float[] t, float[] p) {
            this.T = t;
            this.p = p;
            this.result = new float[t.length * v];
            put(this.T).put(this.p).put(result);
        }

        public void set_tcmw(float tc, float m, float w) {
            this.tcmw = new float[]{tc, m, w};
            put(this.tcmw);
        }

        @Override
        public void run() {
            int i = getGlobalId();
            int j = i * v;
            int fi = FI + j, gi = GI + j, hi = HI + j, yi = YI + j, fi2 = FI2 + j, figi = FIGI + j, gi2 = GI2 + j, yifi = YIFI + j, yigi = YIGI + j, fihi = FIHI + j, gihi = GIHI + j, hi2 = HI2 + j, yihi = YIHI + j;
            float tc = tcmw[TC];
            float w = tcmw[W];
            float m = tcmw[M];

            result[fi] = pow((tc - T[i]), m);
            result[gi] = result[fi] * cos(w * log(tc - T[i]));
            result[hi] = result[fi] * sin(w * log(tc - T[i]));
            result[yi] = p[i];
            result[fi2] = result[fi] * result[fi];
            result[figi] = result[fi] * result[gi];
            result[gi2] = result[gi] * result[gi];
            result[yifi] = result[yi] * result[fi];
            result[yigi] = result[yi] * result[gi];
            result[fihi] = result[fi] * result[hi];
            result[gihi] = result[gi] * result[hi];
            result[hi2] = result[hi] * result[hi];
            result[yihi] = result[yi] * result[hi];
        }

        public float[] getResult() {
            get(result);
            return result;
        }
    }

}

@freemo
Copy link
Member

freemo commented Oct 21, 2017

@KIC have you had any luck with this? I'll probably place a small bounty on this issue to see if i can get any takers unless you've already resolved it.

@freemo freemo self-assigned this Oct 21, 2017
@freemo freemo added bounty $$$ Cash reward! bug Fix something that is broken labels Oct 21, 2017
@freemo freemo changed the title JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED [BOUNTY $25] JVM Crash: SEVERE: ProfilingEvent.START encountered without ProfilingEvent.EXECUTED Oct 21, 2017
@KIC
Copy link
Author

KIC commented Oct 22, 2017

No I was not able to solve or work around this so far. Currently the only workaroud is not using multiple threads. Thanks a lot for putting a bounty on it!!

@automenta
Copy link
Collaborator

i tried the JVMCrashTest and without going further into the actual aparapi code it looks like it might be solved by using an AtomicReference, or at least synchronization, to make the public void onEvent(ProfilingEvent event) procedure Atomic, or to make the Profiles local to each thread using ThreadLocal and then merge the per-thread results after. still i dont know how this works it just looks like an ordinary multithread race condition that should be solvable.

btw for the JVMCrashTest i tried:
kernels.forEach(k -> ForkJoinPool.commonPool().submit(() -> k.execute(Range.create(t.length))));

and also suggest making ABCCKernel a static class in case that matters

@freemo
Copy link
Member

freemo commented Oct 22, 2017

@automenta Would you be willing to try a solution out against @KIC 's unit test above? I'm sure you can work with him to get better unit tests if we need to but that one should suffice?

@automenta
Copy link
Collaborator

im working on it now, seems to sort of fix that one but then it started complaining about the sequence of events it was tracking. so im looking into completely multithreading all a Kernel's KernelRunner's if it's possible

https://gist.github.com/automenta/2d2d194976b2ed451cea5dae7e9bbffc#file-kerneldeviceprofile-java-L49

@freemo
Copy link
Member

freemo commented Oct 22, 2017

@automenta If we do that I think we want to make sure that the user can choose to run it asynchronously or synchronously when they run a kernel as well. We may be able to kill two birds with one stone and also solve issue #41 at the same time don't you think?

@automenta
Copy link
Collaborator

yeah i think it can solve that. what i'm trying to do is keep the exact same user-level API but transparently handle the case when another thread gets involved, by keeping KernelRunner thread-local they dont need to interact but can share bytecode and whatever JNI resources are safe to share.

@freemo
Copy link
Member

freemo commented Oct 22, 2017

@automenta Sure, and that should be what we do. Though I'm saying once we have that done we can add a similar api call for executing the kernel that is the async variety of it that uses many of the same threading your doing to handle the async flavor as well.

@automenta
Copy link
Collaborator

right it could be a wrapper around the Executors threadpool api that allows management and async callback on events etc. i keep seeing things id like to use java 8 and java 9 stuff for as well as other libraries. maybe after this ill make an experimental fork and call it aparapoop

@freemo
Copy link
Member

freemo commented Oct 22, 2017 via email

@freemo
Copy link
Member

freemo commented Apr 15, 2018

@automenta and @KIC Any movement on this ticket? Do I need to up the bounty and see if we have any other takers willing to finish it. I'll wait if you want to finish it though to give you a chance.

@CoreRasurae
Copy link
Collaborator

@KIC I've tested your JVMCrashTest and I am not getting a JVM crash, however I do get the SEVERE level log messages.
Tested with aparapi-1.4.1 and aparapi-1.7.0-SNAPSHOT along with aparapi-jni-1.2.0 on Ubuntu 16.04 LTS with an i7 CPU,a Xeon E5-2620v2, a NVIDIA Quadro NVS 4200M, a NVIDIA GTX 760 and an AMD RX460. JVM Never crashed.

@freemo @automenta
From a quick code analysis it looks like only the Profiling information is compromised, because currently a KernelProfile is kept per Kernel class, however since multiple threads run the same kernel, they will concurrently update the profiling data randomly at different stages, invalidating its details and producing the SEVERE level log messages.

@CoreRasurae
Copy link
Collaborator

CoreRasurae commented Apr 18, 2018

@KIC Can you confirm if the JVM crash still happens with aparapi-1.4.1 and aparapi-1.7.0-SNAPSHOT both with aparapi-jni-1.2.0 regarding the above JVMCrashTest?

freemo added a commit that referenced this issue Apr 21, 2018
Fix and Update: Fix issue #62 and provide new API for kernel profiling under multithreading (refs #62)
@freemo
Copy link
Member

freemo commented Apr 21, 2018

Issue is now resolved in master branch, closing ticket.

@freemo freemo closed this as completed Apr 21, 2018
freemo added a commit that referenced this issue Apr 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bounty $$$ Cash reward! bug Fix something that is broken
Projects
None yet
Development

No branches or pull requests

4 participants