# YOLO Object Detection

This notebook implements object detection using YOLO v3.

**Important note**: The code has been updated to be compatible with TensorFlow 2.x and recent versions of Keras.

In [1]:
import cv2
print(cv2.__version__)


4.12.0


In [2]:
# Check library versions
import tensorflow as tf
import numpy as np

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {tf.keras.__version__}")
print(f"NumPy version: {np.__version__}")

2025-11-11 15:58:33.897467: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-11-11 15:58:34.038164: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-11 15:58:37.673845: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.


TensorFlow version: 2.20.0
Keras version: 3.12.0
NumPy version: 2.2.6


## Changes Made for Compatibility

The original code used `keras.backend` in the TensorFlow 1.x style, which has since been deprecated. The following changes were made:

### Updates to `yolo_model.py`:
1. **Updated imports**: switched from `keras.backend` to `tensorflow` and `numpy`
2. **Replaced functions**:
   - `K.reshape()` → `np.array().reshape()`
   - `K.variable()` → removed (unnecessary)
   - `K.sigmoid()` → custom `_sigmoid()` function written with NumPy
   - `K.exp()` → `np.exp()`
   - `K.get_value()` → removed (the code now works directly with arrays)

3. **New function**: `_sigmoid(x)`, implemented with NumPy for the sigmoid activation
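
The body of `_sigmoid()` in `yolo_model.py` is not shown in this notebook, so the following is only a minimal sketch of how a NumPy sigmoid might look. The function name `sigmoid` and the two-branch formulation are assumptions chosen for numerical stability, not the notebook's actual code.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function 1 / (1 + exp(-x)) for array input."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)).
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out


values = sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```

Splitting on the sign of `x` keeps every `np.exp()` argument non-positive, which avoids the overflow warnings a direct `1 / (1 + np.exp(-x))` can raise for large negative inputs.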

### Requirements:
- TensorFlow >= 2.0
- NumPy
- OpenCV (cv2)

In [3]:
import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO

In [4]:
def process_image(img):
    """Resize, normalize, and add a batch axis to an image.

    # Argument:
        img: original image.

    # Returns
        image: ndarray(1, 416, 416, 3), float32 image scaled to [0, 1].
    """
    image = cv2.resize(img, (416, 416),
                       interpolation=cv2.INTER_CUBIC)
    image = np.array(image, dtype='float32')
    image /= 255.
    image = np.expand_dims(image, axis=0)

    return image
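
As a quick check of the output shape, the normalization and batch-axis steps of `process_image` can be sketched with NumPy alone. The random `frame` array is a made-up stand-in for an image that has already been resized to 416x416; the `cv2.resize` call is deliberately left out.

```python
import numpy as np

# Synthetic stand-in for a frame already resized to 416x416 (3 channels, uint8).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)

batch = frame.astype('float32') / 255.0   # scale pixel values to [0, 1]
batch = np.expand_dims(batch, axis=0)     # add the leading batch axis

print(batch.shape, batch.dtype)
```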

In [5]:
def get_classes(file):
    """Read the class names from a text file.

    # Argument:
        file: path to a text file with one class name per line.

    # Returns
        class_names: List, class names in file order.

    """
    with open(file) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]

    return class_names
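
The list index is what ties a predicted class id to its label, so the file order matters. Below is a small self-contained sketch of the same idea. The helper name `read_class_names` and the three-class file are hypothetical, and unlike `get_classes` this variant also skips blank lines.

```python
import os
import tempfile


def read_class_names(path):
    """Return one class name per non-empty line of a text file."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Hypothetical three-class file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('person\nbicycle\ncar\n\n')
    tmp_path = tmp.name

names = read_class_names(tmp_path)
os.remove(tmp_path)

print(names)
print(names[2])  # a predicted class id of 2 would map to this label
```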

In [6]:
def draw(image, boxes, scores, classes, all_classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box

        # Round to the nearest pixel and clip to the image bounds.
        left = max(0, np.floor(x + 0.5).astype(int))
        top = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(all_classes[cl], score),
                    (left, top - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 1,
                    cv2.LINE_AA)

        print('class: {0}, score: {1:.2f}'.format(all_classes[cl], score))
        print('box coordinate x,y,w,h: {0}'.format(box))

    print()
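
The rounding and clipping that `draw` applies to each box can be isolated in a small helper. This is a sketch only: the name `box_to_corners` is hypothetical, the box values are copied from the image-detection output in this notebook, and the `(600, 800, 3)` image shape is a made-up example.

```python
import numpy as np


def box_to_corners(box, image_shape):
    """Convert an (x, y, w, h) box to integer (left, top, right, bottom),
    rounded to the nearest pixel and clipped to the image bounds."""
    x, y, w, h = box
    height, width = image_shape[:2]
    left = max(0, int(np.floor(x + 0.5)))
    top = max(0, int(np.floor(y + 0.5)))
    right = min(width, int(np.floor(x + w + 0.5)))
    bottom = min(height, int(np.floor(y + h + 0.5)))
    return left, top, right, bottom


corners = box_to_corners(
    [202.6034832, 177.55987644, 150.48674941, 336.68389561],
    (600, 800, 3),
)
print(corners)  # (203, 178, 353, 514)

# A box that spills over every edge is clipped to the image rectangle.
clipped = box_to_corners([-5, -5, 2000, 2000], (600, 800, 3))
print(clipped)  # (0, 0, 800, 600)
```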

In [7]:
def detect_image(image, yolo, all_classes):
    """Use YOLO v3 to detect objects in an image.

    # Argument:
        image: original image.
        yolo: YOLO, yolo model.
        all_classes: all classes name.

    # Returns:
        image: the input image with the detected boxes drawn on it.
    """
    pimage = process_image(image)

    start = time.time()
    boxes, classes, scores = yolo.predict(pimage, image.shape)
    end = time.time()

    print('time: {0:.2f}s'.format(end - start))

    if boxes is not None:
        draw(image, boxes, scores, classes, all_classes)

    return image

In [None]:
def detect_video(video, yolo, all_classes):
    """Use yolo v3 to detect video.

    # Argument:
        video: video file.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    """
    video_path = os.path.join("videos", "test", video)
    camera = cv2.VideoCapture(video_path)
    cv2.namedWindow("detection", cv2.WINDOW_AUTOSIZE)

    # Prepare for saving the detected video
    sz = (int(camera.get(cv2.CAP_PROP_FRAME_WIDTH)),
          int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'XVID')

    vout = cv2.VideoWriter()
    vout.open(os.path.join("videos", "res", video), fourcc, 20, sz, True)

    while True:
        res, frame = camera.read()

        if not res:
            break

        image = detect_image(frame, yolo, all_classes)
        cv2.imshow("detection", image)

        # Save the video frame by frame
        vout.write(image)

        # Stop early when Esc (key code 27) is pressed
        if cv2.waitKey(110) & 0xff == 27:
            break

    vout.release()
    camera.release()
    cv2.destroyAllWindows()

In [9]:
yolo = YOLO(0.3, 0.5)
file = 'data/coco_classes.txt'
all_classes = get_classes(file)

2025-11-11 15:58:41.433883: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
2025-11-11 15:58:42.695908: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 18874368 exceeds 10% of free system memory.
2025-11-11 15:58:42.718860: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 18874368 exceeds 10% of free system memory.
2025-11-11 15:58:42.728381: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 18874368 exceeds 10% of free system memory.
2025-11-11 15:58:42.820695: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 18874368 exceeds 10% of free system memory.
2025-11-11 15:58:42.853496: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:84] Allocation of 18874368 exceeds 10% of free system memory.
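
The notebook does not document the two arguments passed to `YOLO(0.3, 0.5)`. If they are an objectness threshold and a non-maximum suppression (NMS) threshold, which is an assumption based on common YOLO v3 wrappers, the second value would control how much two boxes may overlap before the lower-scoring one is dropped. The sketch below illustrates greedy NMS on `(x, y, w, h)` boxes; the function names `iou` and `nms` and the sample boxes are hypothetical and not taken from `yolo_model.py`.

```python
import numpy as np


def iou(box, others):
    """IoU between one (x, y, w, h) box and an array of (x, y, w, h) boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    # Negative widths or heights mean no overlap, so clip them to zero.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union


def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        # Keep only the boxes that do not overlap the current best too much.
        order = rest[overlaps <= iou_threshold]
    return keep


boxes = np.array([[10, 10, 100, 100],
                  [12, 12, 100, 100],
                  [300, 300, 50, 50]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed, while the third box does not overlap at all and is kept.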


In [10]:
# Detect images

In [11]:
f = 'chicos.jpg'
path = 'imagenes/'+f
image = cv2.imread(path)
image = detect_image(image, yolo, all_classes)
cv2.imwrite('imagenes/res/' + f, image)

1/1 ━━━━━━━━━━━━━━━━━━━━ 4s 4s/step
time: 3.94s
class: person, score: 1.00
box coordinate x,y,w,h: [202.6034832  177.55987644 150.48674941 336.68389561]
class: person, score: 0.99
box coordinate x,y,w,h: [544.07720566 158.26020241 198.30849675 217.17637491]
class: person, score: 0.97
box coordinate x,y,w,h: [ 95.17945051 186.38505936 156.90134168 343.45874087]
class: person, score: 0.95
box coordinate x,y,w,h: [430.2491188  178.5872519  142.43227243 322.74839099]
class: person, score: 0.94
box coordinate x,y,w,h: [337.45839596 112.98629344 109.77636163 437.34764594]



True

In [12]:
# Detect videos

In [15]:
detect_video('video.mp4', yolo, all_classes)

OpenCV: FFMPEG: tag 0x6765706d/'mpeg' is not supported with codec id 2 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 711ms/step
time: 0.80s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 782ms/step
time: 0.89s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 792ms/step
time: 0.90s
class: car, score: 0.32
box coordinate x,y,w,h: [1458.15582275  760.99314451  135.93931537   74.96018206]

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 971ms/step
time: 1.38s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 772ms/step
time: 0.89s
class: truck, score: 0.36
box coordinate x,y,w,h: [1594.62238312  922.41105795  253.78778458  127.47312595]

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 990ms/step
time: 1.41s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 800ms/step
time: 0.89s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 944ms/step
time: 1.04s
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 990ms/step
time: 1.10s
class: truck, score