<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/my_colab_gpu_topk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# task
Given a large data set of 8.5 million docs, each doc is an integer id vector of up to 128 dimensions, with id values in the range 0-50000.
Given a query, which is also an integer id vector of up to 128 dimensions, find the top-k (k=100) docs ranked by their intersection score with the query. The data set may be spread out (e.g. partitioned) to speed up the search.

The intersection score is defined as follows. Each pair with query[i] == doc[j] (0<=i<query_size, 0<=j<doc_size) counts as one intersection. The score is the number of intersections divided by max(query_size, doc_size).
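A direct transcription of this scoring rule is useful as a reference when checking faster variants. It is a minimal sketch, not the code in `STI2/src`. The function name `intersection_score` is hypothetical, and so is the convention of returning 0 when both vectors are empty. It uses `uint16_t` because ids 0-50000 fit in 16 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Literal implementation of the scoring rule: every (i, j) pair with
// query[i] == doc[j] counts as one intersection, and the count is divided by
// max(query_size, doc_size). Cost is O(query_size * doc_size), i.e. at most
// 128 * 128 comparisons per doc.
float intersection_score(const std::vector<uint16_t>& query,
                         const std::vector<uint16_t>& doc) {
    const size_t denom = std::max(query.size(), doc.size());
    if (denom == 0) return 0.0f;  // assumption: two empty vectors score 0
    size_t hits = 0;
    for (uint16_t q : query)
        for (uint16_t d : doc)
            if (q == d) ++hits;
    return static_cast<float>(hits) / static_cast<float>(denom);
}
```

For example, query `{1, 2, 3}` against doc `{2, 3, 4, 5}` gives 2 intersections / max(3, 4) = 0.5.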

``` shell
./bin/query_doc_scoring <doc_file_name> <query_file_name> <output_filename>
```

# optimize
note: only the stand-alone (single-machine) case is optimized here; in a distributed map/reduce architecture, a scheduler would dispatch multiple such instances.
1. concurrency (CPU thread pool) + parallelism (CPU OpenMP + GPU warp pool): cpu (baseline) -> cpu thread concurrency -> cpu + gpu -> cpu thread concurrency + gpu => distributed
2. find or filter: use a hash table or a bitmap (Bloom filter)
3. top-k sort: heap sort (partial_sort) -> bitonic sort (GPU parallel)
4. search: build an index (inverted lists such as IVF or skip lists, a tree, or a graph) over an ordered structure/model
5. SIMD: use the CPU instruction set extensions (Intel SSE, AVX2, AVX-512, etc.)
6. IO stream pipeline: read the query/doc files (batched per thread, feeding accelerators in parallel), then write the result file
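Items 2 and 3 can be sketched on the CPU as follows. The query ids are marked once in a bitmap, so each doc id costs a single membership test. The top k are then selected with `std::partial_sort` instead of sorting all 8.5 million scores.

This is a hedged sketch, not the author's implementation. `count_hits_bitmap`, `topk_docs` and `kMaxId` are hypothetical names. It assumes ids are unique within each query and doc vector, so the bitmap counts the same pairs as the nested-loop definition. It also breaks score ties by lower doc index, which the task does not specify.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr size_t kMaxId = 50000;  // id range 0-50000 from the task statement

// Item 2: bitmap filter. Each doc id is an O(1) test against the query bitmap.
// Matches the pair-counting definition only if ids are unique per vector.
size_t count_hits_bitmap(const std::bitset<kMaxId + 1>& query_bits,
                         const std::vector<uint16_t>& doc) {
    size_t hits = 0;
    for (uint16_t d : doc)
        if (d <= kMaxId && query_bits.test(d)) ++hits;  // guard: test() throws past N
    return hits;
}

// Item 3: top-k via partial_sort. Returns up to k (score, doc index) pairs,
// highest score first, ties broken by smaller doc index (assumption).
std::vector<std::pair<float, size_t>> topk_docs(
        const std::vector<uint16_t>& query,
        const std::vector<std::vector<uint16_t>>& docs, size_t k) {
    std::bitset<kMaxId + 1> query_bits;  // ~6 KB, built once per query
    for (uint16_t q : query)
        if (q <= kMaxId) query_bits.set(q);

    std::vector<std::pair<float, size_t>> scored;
    scored.reserve(docs.size());
    for (size_t id = 0; id < docs.size(); ++id) {
        const size_t denom = std::max(query.size(), docs[id].size());
        const float score = denom == 0 ? 0.0f
            : static_cast<float>(count_hits_bitmap(query_bits, docs[id])) /
              static_cast<float>(denom);
        scored.emplace_back(score, id);
    }

    const size_t kk = std::min(k, scored.size());
    // Only the first kk elements end up sorted; the rest are left unordered.
    std::partial_sort(scored.begin(), scored.begin() + kk, scored.end(),
        [](const std::pair<float, size_t>& a, const std::pair<float, size_t>& b) {
            return a.first != b.first ? a.first > b.first : a.second < b.second;
        });
    scored.resize(kk);
    return scored;
}
```

`std::partial_sort` is heap-based, roughly O(n log k), which is the "heap sort (partial_sort)" step of item 3. On the GPU, the list above moves to bitonic sort instead.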

# reference
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html
- https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
- https://docs.nvidia.com/cuda/cuda-runtime-api/index.html
- https://docs.nvidia.com/cuda/thrust/index.html
- https://nvlabs.github.io/cub/index.html
- https://stotko.github.io/stdgpu/api/memory.html
- https://www.youtube.com/watch?v=cOBtkPsgkus
- https://www.csd.uwo.ca/~mmorenom/HPC-Slides/Many_core_computing_with_CUDA.pdf
- [Exploring Performance Portability for Accelerators via High-level Parallel Patterns](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=4Ab_NBkAAAAJ&citation_for_view=4Ab_NBkAAAAJ:hqOjcs7Dif8C), [PPT](https://pdfs.semanticscholar.org/b34a/f7c4739d622379fa31a1e88155335061c1b1.pdf)
- https://zhuanlan.zhihu.com/p/52344300
- https://passlab.github.io/OpenMPProgrammingBook/cover.html
  


## code
1. https://github.com/Funatiq/bb_segsort
2. https://github.com/anilshanbhag/gpu-topk
3. https://github.com/heavyai/heavydb/blob/master/QueryEngine/TopKSort.cu

## paper
1. [Fast Segmented Sort on GPUs](https://raw.github.com/weedge/learn/main/gpu/Fast%20Segmented%20Sort%20on%20GPUs.pdf)
2. [Efficient Top-K query processing on massively parallel hardware](https://raw.githubusercontent.com/weedge/learn/main/gpu/Efficient%20Top-K%20Query%20Processing%20on%20Massively%20Parallel%20Hardware.pdf)

In [1]:
!nvidia-smi

Mon Oct 30 12:23:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   59C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!nvidia-smi -q



Timestamp                                 : Sun Oct 29 09:32:19 2023
Driver Version                            : 525.105.17
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:00:04.0
    Product Name                          : Tesla T4
    Product Brand                         : NVIDIA
    Product Architecture                  : Turing
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1561820004565
    GPU UUID                              : GPU-a2a31bfc-37ea-

In [None]:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
!apt update
!apt install ./nsight-systems-2023.2.3_2023.2.3.1001-1_amd64.deb
!apt --fix-broken install


In [3]:
!wget "https://bj.bcebos.com/v1/ai-studio-online/9805dd2d2e8e472693efac637628e16b9f9c5be0fe30438bb4a80de3b386781a?responseContentDisposition=attachment%3B%20filename%3DSTI2_1017.zip&authorization=bce-auth-v1%2F5cfe9a5e1454405eb2a975c43eace6ec%2F2023-10-18T12%3A42%3A27Z%2F-1%2F%2F6b5388dcd9013bc9b340bb1806476afa938ce0c65f2f595e1a75f529e90e4187" -O STI2_1017.zip

--2023-10-30 12:23:48--  https://bj.bcebos.com/v1/ai-studio-online/9805dd2d2e8e472693efac637628e16b9f9c5be0fe30438bb4a80de3b386781a?responseContentDisposition=attachment%3B%20filename%3DSTI2_1017.zip&authorization=bce-auth-v1%2F5cfe9a5e1454405eb2a975c43eace6ec%2F2023-10-18T12%3A42%3A27Z%2F-1%2F%2F6b5388dcd9013bc9b340bb1806476afa938ce0c65f2f595e1a75f529e90e4187
Resolving bj.bcebos.com (bj.bcebos.com)... 103.235.46.61, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to bj.bcebos.com (bj.bcebos.com)|103.235.46.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1005669898 (959M) [application/octet-stream]
Saving to: ‘STI2_1017.zip’


2023-10-30 12:24:09 (47.5 MB/s) - ‘STI2_1017.zip’ saved [1005669898/1005669898]



In [None]:
!rm -rf STI2 && unzip STI2_1017.zip && mv STI2\ 2 STI2

In [5]:
!sh STI2/build.sh

build success


In [None]:
!STI2/bin/query_doc_scoring STI2/translate/docs.txt STI2/translate/querys ./res_2.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [6]:
!nvcc STI2/src/main.cpp STI2/src/topk.cu -o STI2/bin/query_doc_scoring_gpu  \
	-ISTI2/src \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11 \
	-O3 \
	-g


In [31]:
!STI2/bin/query_doc_scoring_gpu STI2/translate/docs.txt STI2/translate/querys ./res_3.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!diff res.txt STI2/translate/res/result.txt
!diff res_1.txt STI2/translate/res/result.txt

1c1
< 2705
---
> 2990
1c1
< 2712
---
> 2990


In [12]:
!nvprof --print-gpu-trace STI2/bin/query_doc_scoring_gpu STI2/translate/docs.txt STI2/translate/querys ./res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

### run topk

In [None]:
!make -C topk/ BUILD_TYPE=Release

make: Entering directory '/content/topk'
mkdir -p bin
g++ ./main.cpp -o ./bin/query_doc_scoring_cpu  -I./ \
	-std=c++11 -Wall -march=native -pthread \
	-O3 \
	-g 
[01m[K./main.cpp:[m[K In function ‘[01m[Kvoid doc_query_scoring_cpu(std::vector<std::vector<short unsigned int> >&, int, std::vector<std::vector<short unsigned int> >&, std::vector<short unsigned int>&, std::vector<std::vector<int> >&, std::vector<std::vector<float> >&)[m[K’:
  160 |         for (int id = 0; [01;35m[Kid < docs.size()[m[K; ++id) {
      |                          [01;35m[K~~~^~~~~~~~~~~~~[m[K
  171 |         for (int id = 0; [01;35m[Kid < docs.size()[m[K; id++) {
      |                          [01;35m[K~~~^~~~~~~~~~~~~[m[K
  174 |             for (int j = 0; [01;35m[Kj < doc.size()[m[K; j++) {
      |                             [01;35m[K~~^~~~~~~~~~~~[m[K
g++ ./main.cpp -o ./bin/query_doc_scoring_cpu_concurency  \
	-I./ \
	-std=c++11 -Wall -march=native -pthread \
	-O3 \
	-DC

In [None]:
!topk/bin/query_doc_scoring_cpu STI2/translate/docs.txt STI2/translate/querys ./cpu_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!diff cpu_res.txt STI2/translate/res/result.txt

1c1
< 92697
---
> 2990


In [None]:
!topk/bin/query_doc_scoring_cpu_concurency STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [None]:
!diff cpu_concurency_res.txt STI2/translate/res/result.txt

1c1
< 67424
---
> 2990


In [63]:
!make -C topk/ build_cpu_gpu BUILD_TYPE=Release

make: Entering directory '/content/topk'
mkdir -p bin
nvcc ./main.cpp ./topk.cu -o ./bin/query_doc_scoring_cpu_gpu  \
	-I./ \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11  \
	-O3 \
	-DGPU \
	-g
make: Leaving directory '/content/topk'


In [64]:
!topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [55]:
!diff cpu_gpu_res.txt STI2/translate/res/result.txt

1c1
< 2647
---
> 2990


In [56]:
!nvprof --print-gpu-trace topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [48]:
!nsys profile  -o report_cpu_gpu.nsys-rep topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt


start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [49]:
!ncu --set full --call-stack --nvtx -o report_cpu_gpu topk/bin/query_doc_scoring_cpu_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_gpu_res_1.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [74]:
!make -C topk/ build_cpu_concurency_gpu BUILD_TYPE=Release

make: Entering directory '/content/topk'
mkdir -p bin
nvcc ./main.cpp ./topk.cu -o ./bin/query_doc_scoring_cpu_concurency_gpu  \
	-I./ \
	-L/usr/local/cuda/lib64 -lcudart -lcuda \
	-std=c++11  \
	-O3 \
	-DCPU_CONCURENCY \
	-DGPU \
	-g
make: Leaving directory '/content/topk'


In [75]:
!topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [76]:
!diff cpu_concurency_gpu_res.txt STI2/translate/res/result.txt

1c1
< 3250
---
> 2990


In [77]:
!nvprof --print-gpu-trace topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [78]:
!nsys profile  -o report_cpu_concurency_gpu.nsys-rep topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt


start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [79]:
!ncu --set full --call-stack --nvtx -o report_cpu_concurency_gpu topk/bin/query_doc_scoring_cpu_concurency_gpu STI2/translate/docs.txt STI2/translate/querys ./cpu_concurency_gpu_res.txt

start get topk
query1.txt:10, 11, 16, 17, 42, 60, 22524, 22546, 22590, 22784, 23212, 23427, 23485, 23525, 23554, 24129, 24133, 24645, 24804, 24875, 25129, 25242, 25502, 25705, 25994, 26000, 26045, 26046, 26077, 26114, 26247, 26338, 26407, 27263, 27468, 27513, 28100, 40111, 40228, 40388, 41700, 45156, 45946, 46367, 47181, 47460, 47672
query2.txt:10, 16, 18, 21, 22, 23, 30, 42, 43, 44, 45, 54, 22497, 22512, 22524, 22533, 22535, 22608, 22624, 22790, 22828, 22836, 22885, 23188, 23381, 23409, 23558, 24103, 24197, 24250, 24496, 24918, 24974, 24987, 25179, 25317, 25827, 25994, 25996, 26009, 26015, 26023, 26030, 26050, 26052, 26082, 26096, 26205, 26247, 27399, 27475, 40029, 40300, 40416, 40504, 40696, 40837, 41166, 41172, 41336, 41407, 41516, 43247, 43309, 44547, 44795, 45101, 48828
query3.txt:11, 12, 13, 14, 21, 22, 23, 33, 42, 53, 61, 1380, 1545, 1546, 1557, 1560, 1566, 1569, 1583, 1646, 1759, 1762, 1787, 1794, 1877, 1882, 1892, 2069, 2120, 2146, 2368, 2670, 2888, 3022, 3327, 3335, 22460, 22

In [57]:
!sleep 864000

^C


### just dev/test with colab vim ❤ 🐑 🐑 🐑

In [None]:
!nvcc STI2/src/main.cpp STI2/src/topk.cu -o ./query_doc_scoring -I./ -ISTI2/src -L/usr/local/cuda/lib64 -lcudart -lcuda  -O3 -g -std=c++17

In [None]:
!./query_doc_scoring ./docs.txt ./querys ./res.txt

In [None]:
!nvcc -g -O3 -o ./threadpool_example threadpool_example.cpp -std=c++11

In [None]:
!./threadpool_example

hardware concurrency:2
result:0
result:1
cost 1004ms
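`threadpool_example.cpp` itself is not shown in this notebook. As a hedged illustration of the "cpu thread concurrency" step of optimization item 1, the sketch below splits the docs into one contiguous batch per hardware thread and scores each batch on its own `std::thread`. On this Colab VM that is 2 threads, per the `hardware concurrency:2` output.

It is not the author's thread pool. The helpers `score_range` and `score_all_concurrent` are hypothetical. The sketch starts fresh threads on every call rather than reusing pooled workers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Scores docs[begin, end) with the nested-loop rule
// (intersections / max(query_size, doc_size)) into out[begin, end).
static void score_range(const std::vector<uint16_t>& query,
                        const std::vector<std::vector<uint16_t>>& docs,
                        size_t begin, size_t end, std::vector<float>& out) {
    for (size_t id = begin; id < end; ++id) {
        const std::vector<uint16_t>& doc = docs[id];
        const size_t denom = std::max(query.size(), doc.size());
        size_t hits = 0;
        for (uint16_t q : query)
            for (uint16_t d : doc)
                if (q == d) ++hits;
        out[id] = denom == 0 ? 0.0f
                             : static_cast<float>(hits) / static_cast<float>(denom);
    }
}

// One contiguous batch per hardware thread. Each thread writes a disjoint
// slice of `scores`, so no locking is needed.
std::vector<float> score_all_concurrent(
        const std::vector<uint16_t>& query,
        const std::vector<std::vector<uint16_t>>& docs) {
    std::vector<float> scores(docs.size(), 0.0f);
    size_t n_threads = std::thread::hardware_concurrency();
    if (n_threads == 0) n_threads = 1;  // hardware_concurrency() may return 0
    const size_t batch = (docs.size() + n_threads - 1) / n_threads;

    std::vector<std::thread> workers;
    for (size_t begin = 0; begin < docs.size(); begin += batch) {
        const size_t end = std::min(begin + batch, docs.size());
        workers.emplace_back(score_range, std::cref(query), std::cref(docs),
                             begin, end, std::ref(scores));
    }
    for (std::thread& t : workers) t.join();
    return scores;
}
```

Build with `-pthread`, as the `g++` commands in the topk Makefile already do. A real pool keeps its worker threads alive across queries and so avoids paying thread start-up cost on each call. The top-k selection would then run on the returned scores, or per batch followed by a merge.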
