What is the model used for?
The DETR ResNet-101 model performs object detection: it identifies and localizes multiple objects within a single image.

Who, or which company, trained the model?
The model was trained by Facebook AI (now known as Meta AI).

What data was the model trained on?
The model was trained on COCO 2017, a large dataset of labeled images for object detection, segmentation, and other tasks.

What output can the model return?
The model returns a list of the objects detected in the image. Each detection includes a class (for example, "dog" or "bicycle"), a confidence score, and the coordinates of the bounding box that locates the object in the image.
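
To make the shape of one detection concrete, here is a minimal sketch in plain Python. The `Detection` class and the `to_xywh` helper are hypothetical names introduced only for illustration; they are not part of `transformers`. The sketch assumes the post-processed boxes use the corner format `[xmin, ymin, xmax, ymax]` in pixels, and the numbers are copied from the first detection printed by cell In [2] further down.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detected object (hypothetical helper type, not part of transformers)."""
    label: str
    score: float
    box: List[float]  # assumed corner format: [xmin, ymin, xmax, ymax] in pixels


def to_xywh(box: List[float]) -> List[float]:
    """Convert [xmin, ymin, xmax, ymax] to COCO-style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, round(xmax - xmin, 2), round(ymax - ymin, 2)]


# Values copied from the first detection printed by cell In [2]
cat = Detection(label="cat", score=0.998, box=[344.06, 24.85, 640.34, 373.74])
print(cat.label, cat.score, to_xywh(cat.box))
```

The `[x, y, width, height]` form is the one COCO annotation files use, so a conversion like this can help when comparing predictions against ground-truth annotations.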

In [1]:
# Import the required libraries
import torch
from PIL import Image
import requests
from transformers import DetrImageProcessor, DetrForObjectDetection

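# Check whether a GPU is available (note: the cells below never move the model
# or the inputs to this device, so inference runs on the CPU)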
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device}")

Using cuda


In [2]:
# Download a sample image, then load the model and the processor
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# you can specify the revision tag if you don't want the timm dependency
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101", revision="no_timm")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101", revision="no_timm")

inputs = processor(images=image, return_tensors="pt")
# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)

# convert outputs (bounding boxes and class logits) to COCO API
# let's only keep detections with score > 0.9
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

Detected cat with confidence 0.998 at location [344.06, 24.85, 640.34, 373.74]
Detected remote with confidence 0.997 at location [328.13, 75.93, 372.81, 187.66]
Detected remote with confidence 0.997 at location [39.34, 70.13, 175.56, 118.78]
Detected cat with confidence 0.998 at location [15.36, 51.75, 316.89, 471.16]
Detected couch with confidence 0.995 at location [-0.19, 0.71, 639.73, 474.17]
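
Some of the printed coordinates fall slightly outside the image (for example, `-0.19` for the couch and `640.34` for the first cat). Below is a minimal sketch of clipping boxes to the image bounds before drawing or cropping. `clip_box` is a hypothetical helper, not a `transformers` function, and the 640 x 480 size is an assumption; in the notebook it would come from `image.size`.

```python
from typing import List


def clip_box(box: List[float], width: float, height: float) -> List[float]:
    """Clamp an [xmin, ymin, xmax, ymax] box to the range [0, width] x [0, height]."""
    xmin, ymin, xmax, ymax = box
    return [
        min(max(xmin, 0.0), float(width)),
        min(max(ymin, 0.0), float(height)),
        min(max(xmax, 0.0), float(width)),
        min(max(ymax, 0.0), float(height)),
    ]


# Assumed image size (width, height); in the notebook, use image.size instead
WIDTH, HEIGHT = 640, 480

# Boxes copied from the output above
couch = [-0.19, 0.71, 639.73, 474.17]
cat = [344.06, 24.85, 640.34, 373.74]

print(clip_box(couch, WIDTH, HEIGHT))
print(clip_box(cat, WIDTH, HEIGHT))
```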


In [3]:
# Test the model on another image from the COCO val2017 set
url = "http://images.cocodataset.org/val2017/000000281759.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run inference (no gradients are needed)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert the outputs to the COCO API format, keeping detections with score > 0.9
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

# Print the results
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

Detected person with confidence 0.999 at location [381.7, 136.93, 465.25, 417.76]
Detected umbrella with confidence 0.999 at location [194.92, 80.89, 327.29, 211.31]
Detected person with confidence 0.999 at location [207.91, 128.04, 286.73, 406.64]
Detected umbrella with confidence 0.998 at location [54.14, 72.38, 207.28, 207.45]
Detected person with confidence 0.995 at location [298.2, 143.06, 378.06, 413.69]
Detected umbrella with confidence 0.998 at location [346.83, 99.26, 478.11, 222.89]
Detected handbag with confidence 0.979 at location [264.1, 187.96, 284.92, 233.42]
Detected umbrella with confidence 0.99 at location [464.02, 90.11, 603.31, 216.48]
Detected person with confidence 0.999 at location [454.08, 140.33, 534.55, 421.04]
Detected umbrella with confidence 0.994 at location [258.02, 227.61, 426.74, 347.87]
Detected car with confidence 0.99 at location [33.45, 300.07, 105.07, 321.56]
Detected person with confidence 1.0 at location [96.12, 95.84, 197.52, 403.26]
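
With this many detections, a per-class summary is easier to read than the raw list. The sketch below tallies the labels with `collections.Counter`. The `labels` list is copied by hand from the output above; in the notebook it could be built from `results["labels"]` and `model.config.id2label` instead.

```python
from collections import Counter

# Class names copied, in order, from the twelve detections printed above
labels = [
    "person", "umbrella", "person", "umbrella", "person", "umbrella",
    "handbag", "umbrella", "person", "umbrella", "car", "person",
]

# Count how many times each class was detected
counts = Counter(labels)

# Print the classes from most to least frequent
for name, n in counts.most_common():
    print(f"{name}: {n}")
```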
