A managed platform and language for GPGPU
Fetching latest commit…
Cannot retrieve the latest commit at this time
This project is copyright 2012 by Chris Jang (firstname.lastname@example.org). All source code is licensed under _The Artistic License 2.0_ of _The Perl Foundation_ except for the sample code (under /sample). Some samples are taken from PeakStream (TM) whitepapers and presentations. These are indicated in source code comments. =============================================================================== What is this? Chai is a clean-room implementation of the PeakStream (TM) managed platform for GPGPU. PeakStream was an array programming language embedded as a domain specific language in C++. It had a virtual machine and JIT which could target and schedule across heterogeneous devices (CPUs, GPUs and accelerators). It was still in a developmental state when the company was acquired in 2007 and the product line discontinued. Chai's JIT back-end generates OpenCL (which did not exist during PeakStream's time). This means it works with the OpenCL SDKs from AMD, Intel, and NVIDIA. Chai supports compute devices from all three vendors. =============================================================================== Why recreate it? PeakStream had a very practical approach to the GPGPU problem. It was a solution which minimized software development costs. It struck a good balance between old and new. GPGPU could be a managed platform embedded in C/C++ applications. Native and managed code could co-exist naturally and work together. We don't write applications software in low level languages. We use high level languages. Why should applications that use GPUs be any different? =============================================================================== How is Chai related to PeakStream? Other than the inspirational idea, there is no other relationship. I never used the PeakStream platform product in real life. All I have seen are marketing whitepapers, presentations, and a few research papers that compare competing GPGPU platforms circa 2007. To be honest, this lack of exposure to PeakStream's technology is not out of respect for clean-room development. Had I been able to find a copy of PeakStream, I would have looked at it. It is just that Google did a thorough job of discontinuing PeakStream's products after the acquisition. The engineering and design of Chai is almost certainly very different from PeakStream as a result of this independent history. =============================================================================== Where is this going? Write once and run anywhere. Code that adapts and optimizes across different kinds of computers. This is the heterogeneous computing problem. =============================================================================== Historical comments below: July 12th 2011 - pre alpha, development code checkpoint This code is not in a working state. The interpreter works. I believe memory management works. That includes (primitive but hopefully correct) management of host arrays and compute device memory buffers and images. Everything JIT comes next: 1) tracing JIT for glue; 2) auto-tuned high arithmetic intensity library kernels. Feb 12 2012 - alpha release, working but needs more work Code is in a working state with useful end-to-end functionality. Happy path functionality appears to be reliable. Autotuned GEMM and GEMV work with dynamically generated kernels from the JIT. Single threaded data parallel vectorization works for GPU compute devices. Multi-threaded gather/scatter vectorized scheduling works but is unstable (depends on compute device). Trace continuation (typically loops) with multiple readouts is working. Autotuning cold and warm start times are excessive. Mar 25 2012 - alpha 2 release, refactored JIT and first class integer arrays JIT code is reorganized into several subdirectories. The main new feature is unsigned and signed integer array type support. Mixed integer and floating point calculation is supported (includes autotuned GEMM and GEMV). Generated kernels make better use of private registers in generated code. Gathering now works in the kirch.cpp sample. The md5.cpp sample performs vectorized MD5 hash code calculation on the GPU. May 25 2012 - alpha 3 release, gathering and random number generation Gather operations with constant translation stencils (many image processing filters) are optimized to use images when possible. This takes advantage of the high speed L1 texture cache. Random number generation on the GPU is supported with the Random123 counter based PRNG. This includes uniform and normal distributions (using the Box-Muller transform). June 24 2012 - alpha 4 release, inline OpenCL kernels in managed code Mixed language programming with OpenCL kernels and PeakStream DSL. Very large .cpp files for API, enqueue trace, and memory manager broken up into more manageably sized pieces. Fixed one unhappy path: array variable width can be arbitrary, does not need to be a multiple of the underlying vector length. July 14 2012 - alpha 5 release, extensive refactoring, no functionality changes Source code restructuring and cleanup motivated by future embedded platform support (configurable build in Buildroot style? i.e. Buildchai). Interpreter and JIT are separate paths. Significant false sharing of common code removed - recognizes the interpreter is mostly for debugging as full language API through the JIT is supported.