A "Hello, World!" program is a computer program that outputs or displays the message "Hello, World!".
kernel():
Prints "Hello world!" from GPU.
A kernel function in CUDA is defined with the __global__ qualifier. nvcc picks
it up and generates intermediate GPU code (PTX) for this function. It also
generates a placeholder (CPU code) that can trigger the execution of this
kernel through the CUDA runtime. A kernel's return type is always "void".
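1. Device-side printf is managed by CUDA: output is written to a fixed-size
   buffer in device memory and flushed to the host's stdout at synchronization
   points such as cudaDeviceSynchronize().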
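As an illustration of the points above, here is a minimal sketch of a kernel definition. It is not necessarily identical to main.cu; the single-thread launch configuration is an assumption chosen only to keep the example short.

```cuda
#include <cstdio>

// __global__ marks a function that runs on the GPU and is launched from the
// host. Its return type must be void.
__global__ void kernel() {
  // Device-side printf writes into a device buffer; the host flushes it later.
  printf("GPU: Hello world!\n");
}

int main() {
  // One block with one thread (illustrative choice, not taken from main.cu).
  kernel<<<1, 1>>>();
  // Wait for the kernel to finish; this also flushes the device printf buffer.
  cudaDeviceSynchronize();
  return 0;
}
```

Without the call to cudaDeviceSynchronize(), the program may exit before the buffered GPU output is flushed.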
main():
Prints "Hello world!" from GPU & CPU.
A kernel is launched with the "kernel<<<blocks, threads>>>(arguments...)"
syntax, where "threads" is the number of threads per block. Each execution of
the kernel is called a thread. Threads are grouped into thread blocks, and all
thread blocks of a kernel launch are grouped into a grid. Threads within a
block can communicate and synchronize with each other (through shared memory
and __syncthreads()), but blocks execute independently of each other (though
they can still communicate through global GPU memory).
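The grid/block/thread hierarchy described above can be made visible by having each thread print its own coordinates. The sketch below is illustrative only: the kernel name whoAmI and the 3 x 4 launch configuration are assumptions, not taken from main.cu.

```cuda
#include <cstdio>

// Hypothetical kernel: each thread reports where it sits in the grid.
__global__ void whoAmI() {
  // Built-in variables: blockIdx (block position in the grid), blockDim
  // (threads per block), threadIdx (thread position in its block).
  unsigned int global = blockIdx.x * blockDim.x + threadIdx.x;
  unsigned int total  = gridDim.x * blockDim.x;
  printf("block %u, thread %u -> global index %u of %u\n",
         blockIdx.x, threadIdx.x, global, total);
}

int main() {
  // A grid of 3 blocks with 4 threads each, i.e. 12 threads in total.
  whoAmI<<<3, 4>>>();
  cudaDeviceSynchronize();
  return 0;
}
```

The order in which lines from different blocks appear is not guaranteed, since blocks are scheduled independently.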
1. The kernel is launched with 12 threads.
2. The CPU waits for the GPU to finish executing the kernel.
3. The CPU prints only after the GPU is done.
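The three steps above can be sketched as follows. This is a minimal approximation rather than a copy of main.cu: the split into 1 block of 12 threads is an assumption (the text only states 12 threads), and the check() helper is hypothetical, added to show where launch and execution errors would surface.

```cuda
#include <cstdio>
#include <cstdlib>

__global__ void kernel() {
  printf("GPU: Hello world!\n");
}

// Hypothetical helper (not part of the CUDA API): abort on any CUDA error.
static void check(cudaError_t err, const char* what) {
  if (err != cudaSuccess) {
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }
}

int main() {
  // 1. Launch 12 threads (assumed here to be 1 block of 12 threads).
  kernel<<<1, 12>>>();
  check(cudaGetLastError(), "kernel launch");
  // 2. Block the CPU until the GPU has finished executing the kernel.
  check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
  // 3. The CPU prints only after the GPU work is complete.
  printf("CPU: Hello world!\n");
  return 0;
}
```

Kernel launches are asynchronous with respect to the host, so the synchronization in step 2 is what orders the CPU message after the GPU messages.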
$ nvcc -std=c++17 -Xcompiler -O3 main.cu
$ ./a.out
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# GPU: Hello world!
# CPU: Hello world!
See main.cu for code.