# GPU characteristics

This example shows the characteristics of the GPU that Colab makes available.

## List the type of GPU that is active in the notebook:

In [2]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-d7c19abf-9992-15ea-19ff-7d7cccf652ac)
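
The same listing can be turned into structured data from Python. The sketch below is illustrative only: the helper name `parse_gpu_list` is hypothetical, and the regular expression is an assumption inferred from the single output line shown above, not an official description of the format.

```python
import re

# Pattern inferred from the sample line above; the real format may differ
# between driver versions (assumption).
GPU_LINE = re.compile(r"GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)")


def parse_gpu_list(text):
    """Return (index, name, uuid) tuples from `nvidia-smi -L` style text."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in GPU_LINE.finditer(text)
    ]


# Line copied from the cell output above.
sample = "GPU 0: Tesla T4 (UUID: GPU-d7c19abf-9992-15ea-19ff-7d7cccf652ac)"
gpus = parse_gpu_list(sample)
print(gpus)
```

In a notebook, the input text could come from `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` instead of the hard-coded sample.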


## List the maximum number of grids/blocks supported by the GPU:

In [25]:
!rm -rf cuda-samples
!git clone https://github.com/NVIDIA/cuda-samples.git
! cd cuda-samples/Samples/1_Utilities/deviceQuery; nvcc deviceQuery.cpp -I ../../../Common -o deviceQuery
!echo "------------------------------------------------------------------- "
!cuda-samples/Samples/1_Utilities/deviceQuery/deviceQuery

Cloning into 'cuda-samples'...
remote: Enumerating objects: 28487, done.
remote: Counting objects: 100% (13959/13959), done.
remote: Compressing objects: 100% (1522/1522), done.
remote: Total 28487 (delta 13269), reused 12437 (delta 12437), pack-reused 14528 (from 2)
Receiving objects: 100% (28487/28487), 135.47 MiB | 17.31 MiB/s, done.
Resolving deltas: 100% (24982/24982), done.
Updating files: 100% (2241/2241), done.
------------------------------------------------------------------- 
cuda-samples/Samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          12.4 / 12.5
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15095 MBytes (15828320256 bytes)
  (040) Multiprocessors, (064) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                      
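
The figures in this report can be cross-checked with simple arithmetic. The following sketch uses only numbers copied from the output above; the variable names are illustrative, and the byte-to-MByte conversion factor (2**20) is inferred from the two memory values agreeing.

```python
# Values copied from the deviceQuery output above.
multiprocessors = 40
cores_per_mp = 64
global_mem_bytes = 15828320256

# Total CUDA cores = multiprocessors x cores per multiprocessor.
total_cores = multiprocessors * cores_per_mp

# The reported "MBytes" figure matches bytes divided by 2**20 (integer part).
global_mem_mbytes = global_mem_bytes // 2**20

print(total_cores)        # 2560
print(global_mem_mbytes)  # 15095
```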

# Hello World example on the GPU.

## Install the CUDA module for Python

In [21]:
!pip install pycuda

Collecting pycuda
  Downloading pycuda-2025.1.tar.gz (1.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pytools>=2011.2 (from pycuda)
  Downloading pytools-2025.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting siphash24>=1.6 (from pytools>=2011.2->pycuda)
  Downloading siphash24-1.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.3 kB)
Downloading pytools-2025.1.6-py3-none-any.whl (95 kB)
Downloading siphash24-1.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.w

## Clear the Colab platform buffer, where the GPU writes instead of the console.

In [30]:
!>/var/colab/app.log

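## Run the Hello World example

This example demonstrates how threads are scheduled: the launch creates more threads than there are elements, and the surplus threads report that they have nothing to greet. It also shows that kernels now support the `printf()` function.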

In [34]:
#!/usr/bin/env python
# --------------------------------------------
#@title Parámetros de ejecución { vertical-output: true }

cantidad_N =   15#@param {type: "number"}
# --------------------------------------------
import numpy
import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates the context used below
from pycuda.compiler import SourceModule

# CPU - Define the kernel function that will run on the GPU.
module = SourceModule("""
#include <stdio.h>
__global__ void kernel_HolaMundo( int N )
{
  int idx = threadIdx.x + blockIdx.x*blockDim.x;

  if( idx < N )
  {
    printf( "Hola Mundo desde el GPU - idx %d, Bloque id %d, Thread id %d \\n ", idx, blockIdx.x, threadIdx.x );
  }
  else
  {
    printf( "No saludo, porque soy un hilo planificado de mas - idx %d \\n ", idx );
  }

}
""")

# CPU - Get a callable handle to the compiled kernel.
kernel = module.get_function("kernel_HolaMundo")

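dim_hilo = 32
# Ceiling division: the smallest number of blocks that covers cantidad_N threads.
dim_bloque = ( int(cantidad_N) + dim_hilo - 1 ) // dim_hilo

# NOTE: argument types must match the kernel signature (int N -> numpy.int32).
kernel( numpy.int32(cantidad_N),  block=( dim_hilo, 1, 1 ),grid=(dim_bloque, 1,1) )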

cuda.Context.synchronize()

print( "Hola Mundo desde el CPU => Thread x: ", dim_hilo, ", Bloque x:", dim_bloque )



Hola Mundo desde el CPU => Thread x:  32 , Bloque x: 1
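
The launch configuration above can be reproduced on the CPU. This is a plain-Python simulation, not GPU code: it applies the same index formula as the kernel to every (block, thread) pair and separates the threads that pass the `idx < N` check from the surplus ones. The function name `simulate_launch` is hypothetical.

```python
def simulate_launch(n, threads_per_block):
    """Mimic the kernel's index arithmetic for a 1-D launch."""
    # Ceiling division: enough blocks to cover n elements.
    blocks = (n + threads_per_block - 1) // threads_per_block
    greeting, surplus = [], []
    for block_id in range(blocks):
        for thread_id in range(threads_per_block):
            # Same formula as the kernel: threadIdx.x + blockIdx.x*blockDim.x
            idx = thread_id + block_id * threads_per_block
            (greeting if idx < n else surplus).append(idx)
    return blocks, greeting, surplus


blocks, greeting, surplus = simulate_launch(15, 32)
print(blocks, len(greeting), len(surplus))  # 1 15 17
```

With 15 elements and 32 threads per block, one block is launched and 17 of its threads fall into the `else` branch. Unlike this sequential loop, the GPU gives no guarantee about the order in which blocks print.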


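Show the Colab buffer. The messages are not necessarily in index order, because the GPU does not guarantee the order in which blocks run: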

In [32]:
cat /var/colab/app.log

{"pid":7,"type":"jupyter","level":40,"msg":"Hola Mundo desde el GPU - idx 32, Bloque id 1, Thread id 0 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 0, Bloque id 0, Thread id 0 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 1, Bloque id 0, Thread id 1 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 2, Bloque id 0, Thread id 2 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 3, Bloque id 0, Thread id 3 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 4, Bloque id 0, Thread id 4 ","time":"2025-06-04T16:24:15.445Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 5, Bloque id 0, Thread id 5 ","time":"2025-06-04T
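
Each line of the log is a JSON object whose `msg` field carries the kernel's `printf` text, so the messages can be extracted and sorted by index. This is a minimal sketch, assuming the field layout seen in the lines above; `extract_messages` is a hypothetical helper, and lines that fail to parse (for example a cut-off last line) are skipped.

```python
import json
import re


def extract_messages(lines):
    """Return sorted (idx, block, thread) tuples from app.log style lines."""
    pattern = re.compile(r"idx (\d+), Bloque id (\d+), Thread id (\d+)")
    found = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # incomplete or non-JSON line
        if not isinstance(record, dict):
            continue  # valid JSON, but not a log record
        match = pattern.search(str(record.get("msg", "")))
        if match:
            found.append(tuple(int(g) for g in match.groups()))
    return sorted(found)


# Two complete lines and one cut-off line, copied from the log above.
sample = [
    '{"pid":7,"type":"jupyter","level":40,"msg":"Hola Mundo desde el GPU - idx 32, Bloque id 1, Thread id 0 ","time":"2025-06-04T16:24:15.445Z","v":0}',
    '{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 0, Bloque id 0, Thread id 0 ","time":"2025-06-04T16:24:15.445Z","v":0}',
    '{"pid":7,"type":"jupyter","level":40,"msg":" Hola Mundo desde el GPU - idx 5, Bloque id 0, Thread id 5 ","time":"2025-06-04T',
]
messages = extract_messages(sample)
print(messages)
```

On Colab, the input could be the lines of `/var/colab/app.log` read with `open(...)`.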

# Profiling in CUDA.

In [None]:
!cd /usr/local/cuda/samples/0_Simple/vectorAdd/; make >/dev/null
#!/usr/local/cuda/samples/bin/x86_64/linux/release/vectorAdd
!/usr/local/cuda/bin/nvprof --csv --concurrent-kernels on --openmp-profiling on --normalized-time-unit us --print-gpu-trace /usr/local/cuda/samples/bin/x86_64/linux/release/vectorAdd

[Vector addition of 50000 elements]
==346== NVPROF is profiling process 346, command: /usr/local/cuda/samples/bin/x86_64/linux/release/vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==346== Profiling application: /usr/local/cuda/samples/bin/x86_64/linux/release/vectorAdd
==346== Profiling result:
"Start","Duration","Grid X","Grid Y","Grid Z","Block X","Block Y","Block Z","Registers Per Thread","Static SMem","Dynamic SMem","Size","Throughput","SrcMemType","DstMemType","Device","Context","Stream","Name","Correlation_ID"
us,us,,,,,,,,B,B,KB,GB/s,,,,,,,
356140.739000,38.911000,,,,,,,,,,195.312500,4.786937,"Pageable","Device","Tesla K80 (0)","1","7","[CUDA memcpy HtoD]",114
356220.130000,28.767000,,,,,,,,,,195.312500,6.474937,"Pageable","Device","Tesla K80 (0)","1","7","[CUDA memcpy HtoD]",115
356281.793000,5.472000,196,1,1,256,1,1,8,0,0,,,,,"Tesla K
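
The `Throughput` column can be checked against `Size` and `Duration`. The sketch below parses the two complete memcpy rows shown above with Python's `csv` module and recomputes the rate. It assumes, because the numbers agree, that the KB and GB units in this trace are binary (1024 bytes and 1024**3 bytes); the units row is left out of the sample for simplicity.

```python
import csv
import io

# Header and the two complete rows copied from the trace above.
trace = '''"Start","Duration","Grid X","Grid Y","Grid Z","Block X","Block Y","Block Z","Registers Per Thread","Static SMem","Dynamic SMem","Size","Throughput","SrcMemType","DstMemType","Device","Context","Stream","Name","Correlation_ID"
356140.739000,38.911000,,,,,,,,,,195.312500,4.786937,"Pageable","Device","Tesla K80 (0)","1","7","[CUDA memcpy HtoD]",114
356220.130000,28.767000,,,,,,,,,,195.312500,6.474937,"Pageable","Device","Tesla K80 (0)","1","7","[CUDA memcpy HtoD]",115
'''

rows = list(csv.DictReader(io.StringIO(trace)))
recomputed = []
for row in rows:
    size_bytes = float(row["Size"]) * 1024   # KB -> bytes (assumed binary KB)
    seconds = float(row["Duration"]) * 1e-6  # microseconds -> seconds
    recomputed.append(size_bytes / seconds / 1024**3)  # bytes/s -> GB/s

for row, value in zip(rows, recomputed):
    print(row["Name"], "reported:", row["Throughput"], "recomputed:", round(value, 6))
```

The transfer size is consistent with the 50000 four-byte elements of the vector: 50000 × 4 / 1024 = 195.3125 KB. Likewise, 196 blocks of 256 threads is the ceiling of 50000 / 256.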

# Debugging with CUDA.

See the example at:
https://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions/#system-specific-questions


In [None]:
!cuda-gdb

NVIDIA (R) CUDA Debugger
11.0 release
Portions Copyright (C) 2007-2020 NVIDIA Corporation
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(cuda-gdb) help
List of classes of commands:

aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
cuda  -- CUDA commands
data -- Examining data
files -- Specif