Commit
Showing 4 changed files with 29 additions and 0 deletions.
Binary file not shown.
@@ -0,0 +1,7 @@
# How to Calculate FLOPS for CPU and GPU
To run the example that calculates single-precision FLOPS:
```bash
$ ipython notebook hardware.ipynb
```
The GPU part only works for users with a properly configured NVIDIA GPU.
The simple benchmark in the notebook requires PyTorch.
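As a rough cross-check of a measured result, the theoretical single-precision peak of a processor can be estimated as cores × clock × FLOPs per cycle per core. The sketch below is not taken from `hardware.ipynb`; the function name `peak_flops` and all hardware figures are illustrative assumptions (a hypothetical 4-core 3.0 GHz CPU with two 256-bit FMA units per core, giving 2 units × 8 floats × 2 operations = 32 FLOPs per cycle, and a hypothetical GPU with 2560 CUDA cores at 1.6 GHz).

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak FLOPS = cores * clock (Hz) * FLOPs per cycle per core."""
    return cores * clock_hz * flops_per_cycle


# Hypothetical CPU: 4 cores at 3.0 GHz, 32 single-precision FLOPs per cycle per core.
cpu_peak = peak_flops(cores=4, clock_hz=3.0e9, flops_per_cycle=32)

# Hypothetical GPU: 2560 CUDA cores at 1.6 GHz, one FMA (2 FLOPs) per core per cycle.
gpu_peak = peak_flops(cores=2560, clock_hz=1.6e9, flops_per_cycle=2)

print(f"CPU peak: {cpu_peak / 1e9:.0f} GFLOPS")
print(f"GPU peak: {gpu_peak / 1e12:.2f} TFLOPS")
```

A measured benchmark normally lands below these numbers, since the peak assumes every execution unit issues an FMA on every cycle.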
@@ -0,0 +1,5 @@
## Notebooks
* Computation Graphs and Back Propagation: `computation_graph.ipynb`
* Normalizing flow for sampling: `nice.ipynb`
* Restricted Boltzmann Machine for image restoration: `rbm_generation.ipynb`
* Deep Neural Network as a Quantum Wave Function Ansatz: `rbm_ansatz.ipynb`
@@ -0,0 +1,17 @@
# A Simple Example of C-Level Acceleration Using Parallelism
To start,
```bash
$ make
$ ./cpu
$ ./avx2
$ ./cuda
```
Each program computes the saxpy function (single-precision $y \leftarrow a x + y$) and prints the elapsed system time.

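For reference, saxpy updates each element as `y[i] = a * x[i] + y[i]`, which costs two floating-point operations per element. The pure-Python sketch below only illustrates the computation and the timing pattern; it is not the code in *cpu.cpp*, and the name `saxpy` and the problem size are assumptions chosen for illustration.

```python
import time


def saxpy(a: float, x: list, y: list) -> None:
    """In-place update y[i] = a * x[i] + y[i] (2 FLOPs per element)."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


n = 1_000_000  # illustrative problem size
x = [1.0] * n  # note: Python floats are double precision, unlike the C programs
y = [2.0] * n

start = time.perf_counter()
saxpy(2.0, x, y)
elapsed = time.perf_counter() - start

print(f"saxpy on {n} elements took {elapsed:.4f} s")
```

The interpreter overhead makes this far slower than any of the compiled versions; it is meant only to pin down what is being timed.
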
1. Implementation without parallelism: *cpu.cpp*. Do not compile it with the `-O3` flag; otherwise, g++ may vectorize the loop with AVX2 automatically. Note that this automatic optimization only works for simple functions.
2. CPU parallelism using the AVX2 instruction set: *avx2.cpp*.
3. GPU parallelism using the CUDA programming model: *cuda.cu*.
It requires the CUDA library and is compiled with `nvcc`.
You will not see any GPU acceleration here,
because data transfer between system memory and GPU memory carries a large overhead, while the complexity of the saxpy function is only $O(N)$.
To confirm this, time only the execution part of the program, and you will see a dramatic speedup.
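
A back-of-the-envelope estimate shows why the transfer dominates. saxpy performs 2N FLOPs, while the host must send `x` and `y` to the GPU and read `y` back, moving 3 × 4N bytes across the bus. The function name `saxpy_cost_model` and the bandwidth and throughput figures below are illustrative assumptions, not measurements from this repository.

```python
def saxpy_cost_model(n: int, bus_bytes_per_s: float, gpu_flops: float):
    """Return (transfer_seconds, compute_seconds) for single-precision saxpy.

    The compute time is an idealized arithmetic-only lower bound.
    """
    bytes_moved = 3 * 4 * n  # x and y to the device, y back; 4 bytes per float
    flops = 2 * n            # one multiply and one add per element
    return bytes_moved / bus_bytes_per_s, flops / gpu_flops


# Assumed figures: 12 GB/s host-device bus and an 8 TFLOPS GPU.
transfer_s, compute_s = saxpy_cost_model(
    n=1 << 24, bus_bytes_per_s=12e9, gpu_flops=8e12
)

print(f"transfer: {transfer_s * 1e3:.2f} ms, compute: {compute_s * 1e3:.4f} ms")
print(f"transfer / compute = {transfer_s / compute_s:.0f}x")
```

In practice the kernel is limited by GPU memory bandwidth rather than arithmetic, so the real gap is smaller than this model suggests, but the host-device copy still outweighs the kernel for an $O(N)$ operation.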