diff --git a/docs/ML-handson.pdf b/docs/ML-handson.pdf
index bd82877..ba67806 100644
Binary files a/docs/ML-handson.pdf and b/docs/ML-handson.pdf differ
diff --git a/gpu/README.md b/gpu/README.md
new file mode 100644
index 0000000..0e912f4
--- /dev/null
+++ b/gpu/README.md
@@ -0,0 +1,7 @@
+# How to Calculate FLOPS for CPU and GPU
+Here is an example of how to calculate single-precision FLOPS:
+```bash
+$ jupyter notebook hardware.ipynb
+```
+The GPU part is only for users with a properly configured Nvidia GPU.
+A simple benchmark in the notebook requires PyTorch.
diff --git a/notebooks/README.md b/notebooks/README.md
new file mode 100644
index 0000000..78892fc
--- /dev/null
+++ b/notebooks/README.md
@@ -0,0 +1,5 @@
+## Notebooks
+* Computation Graphs and Backpropagation: `computation_graph.ipynb`
+* Normalizing flow for sampling: `nice.ipynb`
+* Restricted Boltzmann Machine for image restoration: `rbm_generation.ipynb`
+* Deep Neural Network as a Quantum Wave Function Ansatz: `rbm_ansatz.ipynb`
diff --git a/parallelism/README.md b/parallelism/README.md
new file mode 100644
index 0000000..6139bd5
--- /dev/null
+++ b/parallelism/README.md
@@ -0,0 +1,80 @@
+# A Simple Example of C-Level Acceleration Using Parallelism
+To start,
+```bash
+$ make
+$ ./cpu
+$ ./avx2
+$ ./cuda
+```
+Each program computes the saxpy function `y = a*x + y` and prints the elapsed system time.
+
+1. Realization without parallelism: *cpu.cpp*. Do not use the `-O3` flag when compiling this version; otherwise `g++` emits AVX2 instructions automatically. Note that this automatic vectorization works only for simple functions like this one.
+2. CPU parallelism using the AVX2 instruction set: *avx2.cpp* (a minimal sketch of the idea appears below).
+3. GPU parallelism using the CUDA programming model: *cuda.cu*.
+It requires the CUDA toolkit and compiles with `nvcc`.
+Here you will not see any GPU acceleration!
+This is because the data transfer between system memory and GPU memory carries a large overhead, while the complexity of the saxpy function is only $O(N)$.
+To confirm this, time only the kernel execution part of the program; you will then see a dramatic speedup (see the timing sketch below).
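+
+Below is a minimal sketch of the AVX2 idea; it is a hypothetical illustration, not the repository's *avx2.cpp*. It processes eight floats per instruction via compiler intrinsics; compile with `g++ -mavx2 saxpy_avx2_sketch.cpp`:
+```cpp
+// saxpy_avx2_sketch.cpp -- hypothetical illustration; assumes n is a multiple of 8
+#include <immintrin.h>
+#include <cstdio>
+#include <vector>
+
+void saxpy_avx2(int n, float a, const float *x, float *y) {
+    __m256 av = _mm256_set1_ps(a);           // broadcast a into all 8 lanes
+    for (int i = 0; i < n; i += 8) {
+        __m256 xv = _mm256_loadu_ps(x + i);  // load 8 elements of x
+        __m256 yv = _mm256_loadu_ps(y + i);  // load 8 elements of y
+        yv = _mm256_add_ps(_mm256_mul_ps(av, xv), yv);  // y = a*x + y, 8 lanes at once
+        _mm256_storeu_ps(y + i, yv);         // store 8 results
+    }
+}
+
+int main() {
+    const int n = 1 << 20;
+    std::vector<float> x(n, 1.0f), y(n, 2.0f);
+    saxpy_avx2(n, 3.0f, x.data(), y.data());
+    std::printf("y[0] = %.1f\n", y[0]);      // expect 3*1 + 2 = 5.0
+    return 0;
+}
+```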
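+
+And to illustrate the last point, here is a similar hypothetical sketch (again not the repository's *cuda.cu*) that uses CUDA events to time the kernel alone, excluding the host-device copies:
+```cuda
+// saxpy_cuda_timing.cu -- hypothetical illustration; compile with: nvcc saxpy_cuda_timing.cu
+#include <cstdio>
+#include <cuda_runtime.h>
+
+__global__ void saxpy(int n, float a, const float *x, float *y) {
+    int i = blockIdx.x * blockDim.x + threadIdx.x;
+    if (i < n) y[i] = a * x[i] + y[i];
+}
+
+int main() {
+    const int n = 1 << 24;
+    float *x, *y;
+    cudaMalloc(&x, n * sizeof(float));       // device buffers; in a full program,
+    cudaMalloc(&y, n * sizeof(float));       // cudaMemcpy would fill them from the host
+
+    cudaEvent_t start, stop;
+    cudaEventCreate(&start);
+    cudaEventCreate(&stop);
+
+    cudaEventRecord(start);                  // the timer brackets the kernel only,
+    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
+    cudaEventRecord(stop);                   // not the memory transfers
+    cudaEventSynchronize(stop);
+
+    float ms = 0.0f;
+    cudaEventElapsedTime(&ms, start, stop);
+    std::printf("kernel time: %.3f ms\n", ms);
+
+    cudaFree(x);
+    cudaFree(y);
+    return 0;
+}
+```
+Timed this way, the kernel finishes far sooner than the end-to-end program, which is exactly the transfer-overhead effect described above.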