update docs

GiggleLiu · Feb 23, 2018 · 87f26bc
1 parent 807c953
Showing 4 changed files with 29 additions and 0 deletions.
diff --git a/docs/ML-handson.pdf b/docs/ML-handson.pdf
diff --git a/gpu/README.md b/gpu/README.md
@@ -0,0 +1,7 @@
+# How to Calculate FLOPS for CPU and GPU
+The following notebook gives an example of calculating single-precision FLOPS:
+```bash
+$ ipython notebook hardware.ipynb
+```
+The GPU part is only for users with a properly configured NVIDIA GPU.
+A simple benchmark in the notebook requires PyTorch.
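As a rough illustration of what "calculating FLOPS" means, here is a minimal sketch in plain Python. It is not the code in `hardware.ipynb`. The helper names `peak_flops` and `measured_flops`, the hardware figures, and the timing are all hypothetical and chosen only for illustration. The sketch assumes that each FMA unit completes one fused multiply-add (2 floating-point operations) per SIMD lane per cycle, and that an n x n matrix product costs about 2n^3 operations.

```python
def peak_flops(cores, clock_hz, simd_lanes, fma_units=2):
    """Theoretical peak FLOPS of a CPU.

    Assumes every FMA unit finishes one fused multiply-add
    (2 floating-point operations) per SIMD lane per clock cycle.
    """
    return cores * clock_hz * simd_lanes * 2 * fma_units


def measured_flops(n, seconds):
    """Achieved FLOPS of an n x n matrix product that took `seconds`.

    A dense n x n product needs about 2 * n**3 operations
    (n multiplications and n additions per output element).
    """
    return 2 * n**3 / seconds


# Hypothetical machine: 4 cores at 3.0 GHz, AVX2 (8 float32 lanes per register).
peak = peak_flops(cores=4, clock_hz=3.0e9, simd_lanes=8)

# Hypothetical measurement: a 4096 x 4096 float32 product that took 0.5 s.
achieved = measured_flops(n=4096, seconds=0.5)

print(f"theoretical peak: {peak / 1e9:.1f} GFLOPS")
print(f"achieved:         {achieved / 1e9:.1f} GFLOPS")
print(f"efficiency:       {achieved / peak:.0%}")
```

In practice the elapsed time would come from timing a real single-precision matrix product, and the achieved figure is normally well below the theoretical peak.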
diff --git a/notebooks/README.md b/notebooks/README.md
@@ -0,0 +1,5 @@
+## Notebooks
+* Computation Graphs and Back Propagation: `computation_graph.ipynb`
+* Normalizing flow for sampling: `nice.ipynb`
+* Restricted Boltzmann Machine for image restoration: `rbm_generation.ipynb`
+* Deep Neural Network as a Quantum Wave Function Ansatz: `rbm_ansatz.ipynb`
diff --git a/parallelism/README.md b/parallelism/README.md
@@ -0,0 +1,17 @@
+# A Simple Example of C-Level Acceleration Using Parallelism
+To start,
+```bash
+$ make
+$ ./cpu
+$ ./avx2
+$ ./cuda
+```
+Each program computes the saxpy function and prints the elapsed time.
+
+1. Implementation without parallelism: *cpu.cpp*. Do not compile it with the `-O3` flag; otherwise g++ may vectorize the loop automatically (using AVX2 when that instruction set is enabled). Note that this automatic optimization only works for simple functions.
+2. CPU parallelism using AVX2 instruction set: *avx2.cpp*.
+3. GPU parallelism using CUDA programming model: *cuda.cu*.
+It requires the CUDA toolkit and is compiled with `nvcc`.
+Here, you will not see any GPU speedup,
+because transferring data between system memory and GPU memory carries a lot of overhead, while the saxpy function has only $O(N)$ complexity.
+To confirm this, time only the execution part of the program; you will then see a dramatic speedup.
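The transfer-overhead argument above can be sketched as a back-of-the-envelope model. This is not taken from the repository. The function `saxpy_times` and all three bandwidth figures are hypothetical. The model assumes saxpy is purely bandwidth-bound: it reads `x` and `y` and writes `y`, so it moves about 3N floats whether that traffic goes through CPU memory, GPU memory, or the PCIe bus (copy `x` and `y` to the device, copy `y` back).

```python
def saxpy_times(n, cpu_bw, gpu_bw, pcie_bw, bytes_per_float=4):
    """Estimate saxpy run times in seconds under a bandwidth-bound model.

    All bandwidths are in bytes per second. Returns a tuple
    (cpu_time, gpu_kernel_time, host_device_transfer_time).
    """
    traffic = 3 * n * bytes_per_float   # read x, read y, write y
    t_cpu = traffic / cpu_bw            # saxpy executed on the CPU
    t_kernel = traffic / gpu_bw         # GPU kernel only, data already on device
    t_transfer = traffic / pcie_bw      # x, y to the device and y back to the host
    return t_cpu, t_kernel, t_transfer


# Hypothetical hardware: 30 GB/s CPU memory, 300 GB/s GPU memory, 12 GB/s PCIe.
t_cpu, t_kernel, t_transfer = saxpy_times(
    n=10**8, cpu_bw=30e9, gpu_bw=300e9, pcie_bw=12e9
)

print(f"CPU:                   {t_cpu * 1e3:.1f} ms")
print(f"GPU kernel only:       {t_kernel * 1e3:.1f} ms")
print(f"GPU kernel + transfer: {(t_kernel + t_transfer) * 1e3:.1f} ms")
```

Under these assumed numbers the kernel alone is about ten times faster than the CPU, but once the transfers are included the GPU version is slower, because every byte has to cross the slowest link while only $O(N)$ work is done on it.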