# Julia GPU Support

- https://juliagpu.gitlab.io/CUDA.jl/


# Julia Threads (JULIA_NUM_THREADS)

## Windows 10

### Command Prompt

- C:\Users\UkiDL>set JULIA_NUM_THREADS=6
- C:\Users\UkiDL>setx JULIA_NUM_THREADS 6  # set permanently; applies to newly opened cmd windows, not the current one
- C:\Users\UkiDL>echo %JULIA_NUM_THREADS%
- 6

Unfortunately, this setting is not picked up when Jupyter is started from the Anaconda Prompt. In the notebook,

- Threads.nthreads() 

returns 1

In [2]:
# ! set JULIA_NUM_THREADS=4
Threads.nthreads()

1
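To see what the notebook kernel actually received, one can inspect the environment from inside Julia. The sketch below only reads state. The commented-out lines assume that IJulia is the Jupyter kernel in use and would register a separate kernel that carries the variable itself; the kernel name is arbitrary and chosen only for illustration.

```julia
# What does the running kernel see? The variable must be present in the
# environment of the Julia process at startup; setting it later has no effect.
env_value = get(ENV, "JULIA_NUM_THREADS", "unset")
println("JULIA_NUM_THREADS as seen by this kernel: ", env_value)
println("Threads.nthreads() = ", Threads.nthreads())

# Assumption: IJulia provides the kernel. Uncomment to register a kernel that
# always starts Julia with 6 threads, then select it from the Jupyter menu.
# using IJulia
# installkernel("Julia 6 threads", env = Dict("JULIA_NUM_THREADS" => "6"))
```

On Julia 1.5 or later, starting Julia from a terminal with `julia --threads 6` is another way to set the thread count without touching environment variables.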

# CUDA Package

This package requires the CUDA driver to be already installed.

In [3]:
# Uncomment to install and test CUDA.jl
# using Pkg
# Pkg.add("CUDA")
# Pkg.test("CUDA")
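Once the package is installed, a short check can confirm that it finds a usable driver and device. This is a minimal sketch that assumes CUDA.jl is already installed in the active environment; it only queries the setup and allocates nothing on the GPU.

```julia
using CUDA

# CUDA.functional() returns true when a driver and a compatible GPU are usable.
gpu_ok = CUDA.functional()

if gpu_ok
    CUDA.versioninfo()   # prints driver, toolkit, and device details
else
    println("CUDA.jl is installed, but no functional GPU setup was detected.")
end
```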

In [4]:
N = 2^20                # 1,048,576; named N so that Base.size is not shadowed
#x = fill(1.0f0, N)     # a vector filled with 1.0 (Float32)
x = fill(1.0, N)        # 1048576-element Array{Float64,1}
y = fill(2, N)          # 1048576-element Array{Int64,1}

y .+= x                 # add each element of x to the corresponding element of y, in place

using Test
@test all(y .== 3.0)

Test Passed
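In the cell above, `y` holds `Int64` values while `x` holds `Float64` values. The in-place broadcast writes each sum back into `y`, so the result is converted to `Int64`; this works only because every sum is a whole number. The sketch below illustrates this on a small array (the names `xs`, `ys`, and `n` are chosen for illustration) and shows the failure when a sum is not integral.

```julia
n = 8
xs = fill(1.0, n)        # Float64
ys = fill(2, n)          # Int64

ys .+= xs                # each 2 + 1.0 = 3.0 is converted back to Int64
println(eltype(ys))      # the element type of ys is unchanged by the update

# A non-integral sum cannot be stored in an Int64 array.
failed = try
    ys .+= fill(0.5, n)
    false
catch err
    err isa InexactError
end
println("non-integral update rejected: ", failed)
```

Using the same floating-point type for both arrays, as in the commented-out `Float32` line, avoids the conversion entirely.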

## Parallelization using CPU

In [5]:
function sequential_add!(y, x)
    for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

fill!(y, 2)
sequential_add!(y, x)
@test all(y .== 3.0f0)

Test Passed

### Parallelization using Threads

In [6]:
function parallel_add!(y, x)
    Threads.@threads for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

fill!(y, 2)
parallel_add!(y, x)
@test all(y .== 3.0f0)

Test Passed

In [7]:
using BenchmarkTools
@btime sequential_add!($y, $x)

  1.477 ms (0 allocations: 0 bytes)


In [8]:
@btime parallel_add!($y, $x)

  1.482 ms (6 allocations: 944 bytes)
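The two timings are nearly identical, which is consistent with the session running on a single thread (`Threads.nthreads()` returned 1 earlier). The sketch below repeats the threaded addition and prints the thread count alongside the elapsed time, so the measurement can be read in context. It uses `@elapsed` from Base rather than BenchmarkTools to stay self-contained, and `Float32` arrays are an assumption made for the sketch. Even with several threads, an element-wise addition may be limited by memory bandwidth, so the speedup can be smaller than the thread count.

```julia
using Base.Threads: nthreads, @threads

function parallel_add!(y, x)
    @threads for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

N = 2^20
x = fill(1.0f0, N)
y = fill(2.0f0, N)

parallel_add!(y, x)      # warm-up call so compilation is not timed
fill!(y, 2.0f0)          # reset y before the timed run

elapsed = @elapsed parallel_add!(y, x)
println("threads = ", nthreads(),
        ", elapsed = ", round(elapsed * 1e3, digits = 3), " ms")
```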


In [9]:
;nvidia-smi

Wed Sep 16 21:11:01 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.15       Driver Version: 460.15       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 166... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    20W /  N/A |    329MiB /  6144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memor