update docs
GiggleLiu committed Feb 23, 2018
1 parent 807c953 commit 87f26bc
Showing 4 changed files with 29 additions and 0 deletions.
Binary file modified docs/ML-handson.pdf
Binary file not shown.
7 changes: 7 additions & 0 deletions gpu/README.md
@@ -0,0 +1,7 @@
# How to Calculate FLOPS for CPU and GPU
Here is an example that calculates single-precision FLOPS:
```bash
$ ipython notebook hardware.ipynb
```
The GPU part is only for users with a properly configured Nvidia GPU.
A simple benchmark in the notebook requires PyTorch.
5 changes: 5 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,5 @@
## Notebooks
* Computation Graphs and Back Propagation: `computation_graph.ipynb`
* Normalizing flow for sampling: `nice.ipynb`
* Restricted Boltzmann Machine for image restoration: `rbm_generation.ipynb`
* Deep Neural Network as a Quantum Wave Function Ansatz: `rbm_ansatz.ipynb`
17 changes: 17 additions & 0 deletions parallelism/README.md
@@ -0,0 +1,17 @@
# A Simple Example of C-Level Acceleration Using Parallelism
To start,
```bash
$ make
$ ./cpu
$ ./avx2
$ ./cuda
```
Each program computes the saxpy function and prints the elapsed system time.

1. Realization without parallelism: *cpu.cpp*. Do not use the `-O3` flag when compiling this version; otherwise g++ vectorizes with AVX2 automatically. Note that this automatic optimization only works for simple functions.
2. CPU parallelism using AVX2 instruction set: *avx2.cpp*.
3. GPU parallelism using CUDA programming model: *cuda.cu*.
It requires the CUDA library and is compiled with `nvcc`.
Here you will not see any GPU acceleration,
because the data transfer between system memory and GPU memory carries a lot of overhead while the complexity of the saxpy function is only $O(N)$.
To confirm this, time only the kernel execution part of the program; there you will see an impressive speedup.
