<a href="https://colab.research.google.com/github/davidlealo/sic_ai_2025_sept/blob/main/6_proyectos/clase_41.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Conversación con grok para ejercicio de detección de victimas y victimarios

https://grok.com/share/bGVnYWN5_c75023c4-13a0-493f-a185-82dadf1d0c80


¡Entendido! Quieres ejemplos de código usando MoveNet para procesar videos y detectar poses, lo cual es útil para identificar agresores y víctimas en tu caso de detección de crímenes en videos. MoveNet es un modelo ligero de estimación de poses de Google, ideal para analizar skeletons (puntos clave del cuerpo) en tiempo real, y puede ayudarte a inferir roles basados en movimientos (ej. un agresor levantando un brazo para golpear, una víctima retrocediendo). A continuación, te doy dos ejemplos prácticos en Python usando MoveNet: uno para procesar un solo frame (imagen) y otro para videos, integrando detección de personas y análisis básico de roles.

### Prerrequisitos
- **Instalaciones**:
  ```bash
  pip install tensorflow tensorflow-hub opencv-python numpy
  ```
- **Hardware**: MoveNet es ligero, funciona en CPU, pero GPU acelera (Colab es buena opción).
- **Modelo**: Usaremos MoveNet.SinglePose (versión "Lightning" para rapidez o "Thunder" para precisión). Disponible en TensorFlow Hub.

### Ejemplo 1: Detección de Poses en una Imagen
Este código detecta puntos clave (keypoints) en una imagen estática, útil para entender cómo MoveNet procesa una persona.

```python
import tensorflow as tf
import tensorflow_hub as hub
import cv2
import numpy as np

# Carga MoveNet desde TensorFlow Hub (Lightning para rapidez)
model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = model.signatures['serving_default']

# Preprocesa imagen
def preprocess_image(image):
    img = cv2.resize(image, (192, 192))  # MoveNet Lightning usa 192x192
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)  # Añade batch dimension
    return img

# Detecta poses
def detect_pose(image):
    input_image = preprocess_image(image)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, confianza)]
    return keypoints

# Visualiza keypoints
def draw_keypoints(image, keypoints, conf_threshold=0.3):
    height, width = image.shape[:2]
    for keypoint in keypoints:
        y, x, confidence = keypoint
        if confidence > conf_threshold:
            x = int(x * width)
            y = int(y * height)
            cv2.circle(image, (x, y), 5, (0, 255, 0), -1)  # Dibuja punto verde
    return image

# Carga y procesa imagen
image_path = 'tu_imagen.jpg'  # Reemplaza con tu imagen
image = cv2.imread(image_path)
keypoints = detect_pose(image)
image_with_keypoints = draw_keypoints(image.copy(), keypoints)

# Guarda o muestra resultado
cv2.imwrite('output_image.jpg', image_with_keypoints)
cv2.imshow('Pose Detection', image_with_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

**Explicación**:
- Carga MoveNet desde TensorFlow Hub.
- Procesa la imagen (resize a 192x192, formato requerido por MoveNet).
- Extrae 17 keypoints (nariz, hombros, codos, muñecas, etc.) con coordenadas (y, x) y confianza.
- Dibuja puntos en la imagen si la confianza es >0.3.
- Útil para pruebas iniciales. Para roles (agresor/víctima), analiza keypoints como distancia entre manos y torso de otra persona.

### Ejemplo 2: Detección de Poses en Video con Análisis de Roles
Este código procesa un video, detecta personas con YOLOv8 (para identificar múltiples actores), aplica MoveNet para estimar poses y hace un análisis básico de movimientos para inferir agresor/víctima.

```python
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO  # Para YOLOv8

# Carga modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Modelo YOLOv8 preentrenado (instala ultralytics: pip install ultralytics)

# Preprocesa frame para MoveNet
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

# Detecta poses
def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, conf)]
    return keypoints

# Analiza movimiento para roles (heurística simple)
def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        # Ejemplo: Si mano derecha (keypoint 10) está alta y cerca de otra persona, posible agresor
        right_wrist = keypoints[10]  # [y, x, conf]
        if right_wrist[2] > 0.3:  # Confianza suficiente
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:  # Evita compararse consigo mismo
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - ox) < 50 and wrist_y * 192 < oy:  # Mano cerca y alta
                        roles.append((i, 'agresor'))
                        roles.append((j, 'victima'))
                        return roles
    return roles

# Procesa video
def process_video(video_path, output_path='output_video.mp4'):
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Detecta personas con YOLOv8
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if box.cls == 0:  # 0 es 'person' en COCO
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Aplica MoveNet a cada persona
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[y:y+h, x:x+w]
            if person_crop.size > 0:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)

        # Analiza roles
        roles = analyze_roles(keypoints_list, person_boxes)

        # Dibuja resultados
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            cv2.putText(frame, role, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
            # Dibuja keypoints
            for kp in keypoints:
                if kp[2] > 0.3:  # Confianza
                    kx, ky = int(kp[1] * w + x), int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 5, (255, 0, 0), -1)

        out.write(frame)

    cap.release()
    out.release()

# Uso
process_video('tu_video.mp4', 'output_video.mp4')
```

**Explicación**:
- **YOLOv8**: Detecta personas y extrae bounding boxes.
- **MoveNet**: Estima poses (17 keypoints) para cada persona detectada.
- **Análisis de Roles**: Heurística simple: si una mano está alta y cerca de otra persona, se etiqueta como "agresor". La otra como "víctima". Puedes mejorar esto con:
  - Modelos ST-GCN para analizar secuencias de poses.
  - Detección de velocidad (optical flow) para confirmar movimientos agresivos.
- **Salida**: Video con bounding boxes, keypoints y etiquetas de roles.

### 5. **Mejoras y Notas**
- **Precisión de Roles**: La heurística es básica. Para mejor precisión:
  - Entrena un modelo (ej. ST-GCN o SlowFast) con datasets como RWF-2000, anotando agresores/víctimas.
  - Usa optical flow (cv2.calcOpticalFlowFarneback) para detectar dirección/velocidad de movimientos.
- **Escalabilidad**: MoveNet es rápido (10-20 FPS en CPU). Para tiempo real, optimiza con TensorRT.
- **Datos**: Fine-tunea con datasets como SCVD o UCF-Crime, anotando roles. Herramientas como CVAT ayudan.
- **Ética**: Anonimiza rostros (ej. con Mediapipe Face Detection) para privacidad.
- **Audio**: Si los videos tienen audio, combina con diarización (Librosa o pyannote.audio) para contexto (ej. gritos).

### 6. **Siguientes Pasos**
- **Fine-Tuning**: Anota un subconjunto de tus videos con CVAT y entrena un modelo custom (ej. en Kinetics o RWF-2000).
- **Modelos Avanzados**: Explora VideoMAE o TimeSformer para acción y roles end-to-end.
- **Código Adicional**: Si quieres integrar optical flow o ST-GCN, puedo dar otro snippet. ¿Detalles sobre tus videos (duración, resolución, tipo de crímenes)? ¿O prefieres enfocarte en algo específico (ej. datasets, entrenamiento)? ¡Dime!

In [1]:
!pip install tensorflow tensorflow-hub opencv-python-headless ultralytics

Collecting ultralytics
  Downloading ultralytics-8.3.225-py3-none-any.whl.metadata (37 kB)
Collecting ultralytics-thop>=2.0.18 (from ultralytics)
  Downloading ultralytics_thop-2.0.18-py3-none-any.whl.metadata (14 kB)
Downloading ultralytics-8.3.225-py3-none-any.whl (1.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m62.2 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading ultralytics_thop-2.0.18-py3-none-any.whl (28 kB)
Installing collected packages: ultralytics-thop, ultralytics
Successfully installed ultralytics-8.3.225 ultralytics-thop-2.0.18


In [2]:
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO  # Para YOLOv8

# Carga modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Modelo YOLOv8 preentrenado (instala ultralytics: pip install ultralytics)

# Preprocesa frame para MoveNet
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

# Detecta poses
def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, conf)]
    return keypoints

# Analiza movimiento para roles (heurística simple)
def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        # Ejemplo: Si mano derecha (keypoint 10) está alta y cerca de otra persona, posible agresor
        right_wrist = keypoints[10]  # [y, x, conf]
        if right_wrist[2] > 0.3:  # Confianza suficiente
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:  # Evita compararse consigo mismo
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - ox) < 50 and wrist_y * 192 < oy:  # Mano cerca y alta
                        roles.append((i, 'agresor'))
                        roles.append((j, 'victima'))
                        return roles
    return roles

# Procesa video
def process_video(video_path, output_path='output_video.mp4'):
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Detecta personas con YOLOv8
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if box.cls == 0:  # 0 es 'person' en COCO
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Aplica MoveNet a cada persona
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[y:y+h, x:x+w]
            if person_crop.size > 0:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)

        # Analiza roles
        roles = analyze_roles(keypoints_list, person_boxes)

        # Dibuja resultados
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            cv2.putText(frame, role, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
            # Dibuja keypoints
            for kp in keypoints:
                if kp[2] > 0.3:  # Confianza
                    kx, ky = int(kp[1] * w + x), int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 5, (255, 0, 0), -1)

        out.write(frame)

    cap.release()
    out.release()

# Uso
process_video('tu_video.mp4', 'output_video.mp4')

Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
[KDownloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 316.7MB/s 0.0s


In [4]:
# Instalar dependencias
!pip install tensorflow tensorflow-hub opencv-python-headless ultralytics

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
import matplotlib.pyplot as plt
from google.colab import files

# Carga modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Modelo YOLOv8 preentrenado

# Preprocesa frame para MoveNet
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

# Detecta poses
def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, conf)]
    return keypoints

# Analiza movimiento para roles (heurística simple)
def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # [y, x, conf]
        if right_wrist[2] > 0.3:  # Confianza suficiente
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:  # Evita compararse consigo mismo
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - ox) < 50 and wrist_y * 192 < oy:  # Mano cerca y alta
                        roles.append((i, 'agresor'))
                        roles.append((j, 'victima'))
                        return roles
    return roles

# Calcula métricas (F1-score para roles)
def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if len(y_true) == len(y_pred):
        f1 = f1_score(y_true, y_pred, labels=['agresor', 'victima'], average='weighted')
        return f1
    return 0.0

# Procesa video
def process_video(video_path, output_path='output_video.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Detecta personas con YOLOv8
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if box.cls == 0:  # 0 es 'person' en COCO
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Aplica MoveNet a cada persona
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[y:y+h, x:x+w]
            if person_crop.size > 0:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)

        # Analiza roles
        roles = analyze_roles(keypoints_list, person_boxes)
        all_pred_roles.append(roles)

        # Dibuja resultados
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            cv2.putText(frame, role, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx, ky = int(kp[1] * w + x), int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 5, (255, 0, 0), -1)

        out.write(frame)
        frame_count += 1

    cap.release()
    out.release()

    # Calcula métricas si hay ground truth
    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0])  # Ejemplo con un frame
        print(f"F1-Score para roles: {f1:.4f}")

    return output_path

# Sube el video
uploaded = files.upload()
video_path = list(uploaded.keys())[0]

# Ground truth de ejemplo (ajusta con tus datos reales)
ground_truth_roles = [(0, 'agresor'), (1, 'victima')]  # Asumiendo 2 personas en el video

# Procesa el video y descarga el resultado
output_path = process_video(video_path, 'output_video.mp4', ground_truth_roles)
files.download(output_path)



Saving people-detection.mp4 to people-detection.mp4

0: 384x640 (no detections), 77.4ms
Speed: 15.4ms preprocess, 77.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 6.4ms
Speed: 3.4ms preprocess, 6.4ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 6.2ms
Speed: 2.7ms preprocess, 6.2ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 6.0ms
Speed: 2.7ms preprocess, 6.0ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 6.0ms
Speed: 2.3ms preprocess, 6.0ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 5.9ms
Speed: 2.3ms preprocess, 5.9ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections), 7.2ms
Speed: 2.5ms preprocess, 7.2ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 (no detections),

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

¡Entendido! Te proporciono un código completo para Google Colab que realiza el proceso completo: descarga el dataset "Hockey Fight Videos" desde Kaggle, procesa un video usando YOLOv8 para detectar personas, MoveNet para estimar poses, y asigna etiquetas de "agresor" y "víctima" basadas en una heurística simple (mano derecha alta y cerca de otra persona). También incluye métricas básicas (F1-score) y visualización de etiquetas claras en el video de salida. El código está optimizado para Colab, con instrucciones para autenticar Kaggle, descargar el dataset, procesar un video y descargar el resultado.

### Características del Código
- **Dataset**: Usa "Hockey Fight Videos" (https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes), con clips de peleas y no-peleas, ideal para detectar interacciones físicas.
- **Modelos**: YOLOv8 para detección de personas, MoveNet para poses.
- **Etiquetado**: Asigna "agresor" (mano alta y cerca de otra persona) y "víctima" (persona cercana al movimiento agresivo). Etiquetas en rojo (agresor) y azul (víctima) en el video.
- **Métricas**: Calcula F1-score para roles, asumiendo un ground truth de ejemplo (puedes ajustarlo con anotaciones reales).
- **Colab**: Incluye instalación de dependencias, carga del dataset, y descarga del video procesado.

### Código Completo para Colab
```python
# 1. Instalar dependencias
!pip install -q tensorflow tensorflow-hub opencv-python-headless ultralytics sklearn
!pip install -q kaggle

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
from google.colab import files
import os

# 2. Configurar Kaggle API
print("Sube tu kaggle.json (descárgalo desde Kaggle > Account > Create API Token)")
uploaded = files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# 3. Descargar y descomprimir Hockey Fight Videos dataset
!kaggle datasets download -d yassershrief/hockey-fight-vidoes
!unzip -q hockey-fight-vidoes.zip

# 4. Cargar modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Modelo YOLOv8 preentrenado

# 5. Preprocesar frame para MoveNet
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

# 6. Detectar poses
def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, conf)]
    return keypoints

# 7. Analizar movimiento para roles (heurística: mano derecha alta y cerca)
def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # Mano derecha: [y, x, conf]
        if right_wrist[2] > 0.3:  # Confianza suficiente
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:  # Evita compararse consigo mismo
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - ox) < 100 and wrist_y * 192 < oy:  # Mano alta y cerca
                        roles.append((i, 'agresor'))
                        roles.append((j, 'víctima'))
                        return roles
    return [(i, 'desconocido') for i in range(len(keypoints_list))]  # Default si no hay detección

# 8. Calcular métricas (F1-score para roles)
def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        f1 = f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
        return f1
    return 0.0

# 9. Procesar video
def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir el video {video_path}")
        return None
    
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
    
    frame_count = 0
    all_pred_roles = []
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        # Reducir resolución para rapidez (opcional)
        frame = cv2.resize(frame, (640, 480))
        width, height = 640, 480
        
        # Detectar personas con YOLOv8
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if box.cls == 0:  # 0 es 'person' en COCO
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))
        
        # Aplicar MoveNet a cada persona
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))  # Dummy si crop falla
        
        # Analizar roles
        roles = analyze_roles(keypoints_list, person_boxes)
        all_pred_roles.append(roles)
        
        # Dibujar resultados
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            # Etiquetas: rojo para agresor, azul para víctima
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx, ky = int(kp[1] * w + x), int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 5, (255, 255, 0), -1)
        
        out.write(frame)
        frame_count += 1
        if frame_count % 10 == 0:
            print(f"Procesados {frame_count} frames")
    
    cap.release()
    out.release()
    
    # Calcular métricas si hay ground truth
    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0])
        print(f"F1-Score para roles: {f1:.4f}")
    
    print(f"Video procesado guardado en: {output_path}")
    return output_path

# 10. Seleccionar un video y procesar
video_path = 'fights/fi001.mp4'  # Ajusta según el nombre exacto (verifica con !ls fights/)
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo, ajusta si tienes CSV
output_path = process_video(video_path, 'output_hockey.mp4', ground_truth_roles)

# 11. Descargar resultado
if output_path and os.path.exists(output_path):
    files.download(output_path)
else:
    print("Error: No se generó el video de salida")
```

### Instrucciones para Usar
1. **Crear Notebook en Colab**:
   - Ve a https://colab.research.google.com/, crea un nuevo notebook.
   - Activa GPU: Entorno de ejecución → Cambiar tipo → T4 GPU.

2. **Subir kaggle.json**:
   - Descarga tu API token desde Kaggle (Account → Create API Token).
   - En la celda de `files.upload()`, sube `kaggle.json`.

3. **Verificar Videos**:
   - Después de descomprimir (`!unzip`), lista los videos con:
     ```python
     !ls fights/  # O !ls no_fights/
     ```
   - Actualiza `video_path = 'fights/fi001.mp4'` con un nombre real (los videos son `fi001.mp4`, `fi002.mp4`, etc.).

4. **Ground Truth**:
   - El código usa un ejemplo: `[(0, 'agresor'), (1, 'víctima')]`. Para métricas reales, anota roles con CVAT o usa el CSV del dataset (si incluye labels). Ejemplo CSV:
     ```csv
     frame_id,person_id,role
     1,0,agresor
     1,1,víctima
     ```
     Carga con `pandas.read_csv()` y pásalo a `ground_truth_roles`.

5. **Ejecutar**:
   - Corre todas las celdas. Descargará el dataset (~100-200 MB), procesará el video y descargará `output_hockey.mp4`.
   - Verás etiquetas: **rojo** para "agresor", **azul** para "víctima", gris para "desconocido".

### Detalles del Código
- **Etiquetado Visual**: Usa colores distintos (rojo/azul) para diferenciar agresor/víctima en el video, con bounding boxes verdes y keypoints amarillos.
- **Heurística de Roles**: Detecta "agresor" si la mano derecha (keypoint 10) está alta (y baja en coordenadas) y cerca (<100 píxeles) de otra persona. Ajusté el threshold de 50 a 100 para captar más movimientos en peleas de hockey.
- **Métricas**: F1-score para roles, comparando con ground truth. Puedes añadir PCK para poses:
  ```python
  def pck(pred_keypoints, gt_keypoints, threshold=0.05):
      distances = np.sqrt(np.sum((pred_keypoints - gt_keypoints)**2, axis=1))
      return np.mean(distances < threshold * 192)
  ```
- **Optimización**: Redimensiona frames a 640x480 para evitar saturar memoria. Limita frames si es necesario:
  ```python
  if frame_count > 300: break  # 10 seg a 30fps
  ```

### Consejos
- **Debugging**: Si el video no carga, verifica el path con `!ls fights/`. Si YOLO falla, usa `yolo = YOLO('yolov8s.pt')` (más robusto pero lento).
- **Dataset**: Los videos de "fights" son ideales para detectar agresor/víctima. Prueba `no_fights/` para verificar falsos positivos.
- **Mejoras**:
  - Usa optical flow (`cv2.calcOpticalFlowFarneback`) para confirmar movimientos agresivos.
  - Fine-tunea MoveNet con RWF-2000 si la heurística es imprecisa.
- **Ética**: Los videos de hockey son públicos, pero anonimiza rostros si usas datos propios (con Mediapipe).

Si no tienes una cuenta de Kaggle o el dataset falla, prueba un video alternativo como `people-detection.mp4` de Intel (https://github.com/intel-iot-devkit/sample-videos). ¿Quieres ayuda con otro video, métricas específicas o ajustar la heurística? ¡Cuéntame cómo sale!

In [6]:
# 1. Instalar dependencias
!pip install -q tensorflow tensorflow-hub opencv-python-headless ultralytics sklearn
!pip install -q kaggle

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
from google.colab import files
import os

# 2. Configurar Kaggle API
print("Sube tu kaggle.json (descárgalo desde Kaggle > Account > Create API Token)")
uploaded = files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# 3. Descargar y descomprimir Hockey Fight Videos dataset
!kaggle datasets download -d yassershrief/hockey-fight-vidoes
!unzip -q hockey-fight-vidoes.zip

# 4. Cargar modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Modelo YOLOv8 preentrenado

# 5. Preprocesar frame para MoveNet
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

# 6. Detectar poses
def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [persona, keypoints, (y, x, conf)]
    return keypoints

# 7. Analizar movimiento para roles (heurística: mano derecha alta y cerca)
def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # Mano derecha: [y, x, conf]
        if right_wrist[2] > 0.3:  # Confianza suficiente
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:  # Evita compararse consigo mismo
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - ox) < 100 and wrist_y * 192 < oy:  # Mano alta y cerca
                        roles.append((i, 'agresor'))
                        roles.append((j, 'víctima'))
                        return roles
    return [(i, 'desconocido') for i in range(len(keypoints_list))]  # Default si no hay detección

# 8. Calcular métricas (F1-score para roles)
def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        f1 = f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
        return f1
    return 0.0

# 9. Procesar video
def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir el video {video_path}")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Reducir resolución para rapidez (opcional)
        frame = cv2.resize(frame, (640, 480))
        width, height = 640, 480

        # Detectar personas con YOLOv8
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if box.cls == 0:  # 0 es 'person' en COCO
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Aplicar MoveNet a cada persona
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))  # Dummy si crop falla

        # Analizar roles
        roles = analyze_roles(keypoints_list, person_boxes)
        all_pred_roles.append(roles)

        # Dibujar resultados
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            # Etiquetas: rojo para agresor, azul para víctima
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx, ky = int(kp[1] * w + x), int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 5, (255, 255, 0), -1)

        out.write(frame)
        frame_count += 1
        if frame_count % 10 == 0:
            print(f"Procesados {frame_count} frames")

    cap.release()
    out.release()

    # Calcular métricas si hay ground truth
    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"Video procesado guardado en: {output_path}")
    return output_path

# 10. Seleccionar un video y procesar
video_path = 'fights/fi001.mp4'  # Ajusta según el nombre exacto (verifica con !ls fights/)
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo, ajusta si tienes CSV
output_path = process_video(video_path, 'output_hockey.mp4', ground_truth_roles)

# 11. Descargar resultado
if output_path and os.path.exists(output_path):
    files.download(output_path)
else:
    print("Error: No se generó el video de salida")

  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m×[0m [32mpython setup.py egg_info[0m did not run successfully.
  [31m│[0m exit code: [1;36m1[0m
  [31m╰─>[0m See above for output.
  
  [1;35mnote[0m: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... [?25l[?25herror
[1;31merror[0m: [1mmetadata-generation-failed[0m

[31m×[0m Encountered error while generating package metadata.
[31m╰─>[0m See above for output.

[1;35mnote[0m: This is an issue with the package mentioned above, not pip.
[1;36mhint[0m: See above for details.
Sube tu kaggle.json (descárgalo desde Kaggle > Account > Create API Token)


Saving kaggle.json to kaggle.json
Dataset URL: https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes
License(s): unknown
Downloading hockey-fight-vidoes.zip to /content
 70% 114M/164M [00:00<00:00, 1.18GB/s]
100% 164M/164M [00:00<00:00, 739MB/s] 
Error: No se pudo abrir el video fights/fi001.mp4
Error: No se generó el video de salida


In [7]:
# Celda 1: Instalar dependencias (con fix para kaggle)
!pip install -q tensorflow tensorflow-hub opencv-python-headless ultralytics sklearn
!pip install --no-cache-dir -q kaggle  # Fix para metadata error

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
from google.colab import files
import os
import shutil

# Celda 2: Configurar Kaggle API (con renombrado automático)
print("Sube tu kaggle.json (desde Kaggle > Account > Create API Token)")
uploaded = files.upload()

# Renombrar si es necesario (maneja 'kaggle (1).json')
json_file = None
for filename in uploaded.keys():
    if 'kaggle' in filename.lower() and filename.endswith('.json'):
        json_file = filename
        break

if json_file:
    !mkdir -p ~/.kaggle
    !cp "{json_file}" ~/.kaggle/kaggle.json  # Usa comillas para espacios
    !chmod 600 ~/.kaggle/kaggle.json
    print(f"Archivo configurado: {json_file}")
else:
    print("Error: No se encontró kaggle.json. Sube el archivo correcto.")
    raise ValueError("Sube kaggle.json")

# Celda 3: Descargar y descomprimir dataset
!kaggle datasets download -d yassershrief/hockey-fight-vidoes
!unzip -q hockey-fight-vidoes.zip -d /content/hockey_dataset/

# Listar videos para verificar
print("Videos en fights/:")
!ls /content/hockey_dataset/fights/ | head -10
print("\nVideos en no_fights/:")
!ls /content/hockey_dataset/no_fights/ | head -10

# Celda 4: Cargar modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')  # Ligero y rápido

# Celda 5: Funciones (preprocesar, detectar, analizar roles)
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]  # [17 keypoints, (y, x, conf)]
    return keypoints

def analyze_roles(keypoints_list, person_boxes):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # Mano derecha
        if right_wrist[2] > 0.3:  # Confianza
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:
                    ox, oy, ow, oh = other_box
                    # Ajustado para hockey: threshold más amplio (100 px) y mano "alta" (baja y en coords)
                    if abs(wrist_x * 192 - (ox + ow/2)) < 100 and wrist_y * height < oy + oh/2:  # Cerca del centro de la otra persona
                        roles.append((i, 'agresor'))
                        roles.append((j, 'víctima'))
                        return roles
    # Default
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        return f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
    return 0.0

# Celda 6: Función para procesar video
def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir {video_path}")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Resize para eficiencia
        frame_resized = cv2.resize(frame, (640, 480))
        r_width, r_height = 640, 480

        # Detectar personas
        results = yolo(frame_resized)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:  # Persona
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Poses
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame_resized[y:y+h, x:x+w]
            if person_crop.size > 0 and w > 50 and h > 100:  # Evita crops pequeños
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        # Roles
        roles = analyze_roles(keypoints_list, person_boxes)
        all_pred_roles.append(roles)

        # Dibujar
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame_resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            # Colores: rojo=agresor, azul=víctima, gris=desconocido
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame_resized, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            # Keypoints
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame_resized, (kx, ky), 3, (255, 255, 0), -1)

        out.write(frame_resized)
        frame_count += 1
        if frame_count % 30 == 0:  # Progreso cada seg
            print(f"Procesados {frame_count} frames...")

        # Limita a 300 frames (~10 seg) para pruebas rápidas
        if frame_count >= 300:
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0] if all_pred_roles else [])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path

# Celda 7: Procesar un video de ejemplo
# Elige uno de fights/ (ver lista arriba). Ejemplo: fi001.mp4 (común en este dataset)
video_path = '/content/hockey_dataset/fights/fi001.mp4'  # Ajusta si el nombre es diferente (ej. 00001.mp4)
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo; ajusta con reales

if os.path.exists(video_path):
    output_path = process_video(video_path, 'output_hockey.mp4', ground_truth_roles)
    files.download(output_path)
else:
    print(f"Video no encontrado: {video_path}. Usa !ls para ver nombres exactos.")

  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m×[0m [32mpython setup.py egg_info[0m did not run successfully.
  [31m│[0m exit code: [1;36m1[0m
  [31m╰─>[0m See above for output.
  
  [1;35mnote[0m: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... [?25l[?25herror
[1;31merror[0m: [1mmetadata-generation-failed[0m

[31m×[0m Encountered error while generating package metadata.
[31m╰─>[0m See above for output.

[1;35mnote[0m: This is an issue with the package mentioned above, not pip.
[1;36mhint[0m: See above for details.
Sube tu kaggle.json (desde Kaggle > Account > Create API Token)


Saving kaggle.json to kaggle (2).json
Archivo configurado: kaggle (2).json
Dataset URL: https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes
License(s): unknown
hockey-fight-vidoes.zip: Skipping, found more recently modified local copy (use --force to force download)
Videos en fights/:
ls: cannot access '/content/hockey_dataset/fights/': No such file or directory

Videos en no_fights/:
ls: cannot access '/content/hockey_dataset/no_fights/': No such file or directory
Video no encontrado: /content/hockey_dataset/fights/fi001.mp4. Usa !ls para ver nombres exactos.


In [8]:
# Celda 1: Instalar dependencias
!pip install -q tensorflow tensorflow-hub opencv-python-headless ultralytics scikit-learn
!pip install --no-cache-dir -q kaggle  # Evita error de metadata

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
from google.colab import files
import os
import shutil

# Celda 2: Configurar Kaggle API
print("Sube tu kaggle.json (desde Kaggle > Account > Create API Token)")
uploaded = files.upload()

json_file = None
for filename in uploaded.keys():
    if 'kaggle' in filename.lower() and filename.endswith('.json'):
        json_file = filename
        break

if json_file:
    !mkdir -p ~/.kaggle
    !cp "{json_file}" ~/.kaggle/kaggle.json
    !chmod 600 ~/.kaggle/kaggle.json
    print(f"Archivo configurado: {json_file}")
else:
    print("Error: No se encontró kaggle.json")
    raise ValueError("Sube kaggle.json")

# Celda 3: Descargar y descomprimir dataset
# Borrar cualquier zip previo para evitar conflictos
!rm -f hockey-fight-vidoes.zip
# Forzar descarga
!kaggle datasets download -d yassershrief/hockey-fight-vidoes --force
# Crear carpeta y descomprimir
!mkdir -p /content/hockey_dataset
!unzip -o hockey-fight-vidoes.zip -d /content/hockey_dataset/

# Verificar archivos
print("Contenido de /content/hockey_dataset/:")
!ls -R /content/hockey_dataset/

# Celda 4: Cargar modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')

# Celda 5: Funciones
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]
    return keypoints

def analyze_roles(keypoints_list, person_boxes, height):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # Mano derecha
        if right_wrist[2] > 0.3:
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:
                    ox, oy, ow, oh = other_box
                    if abs(wrist_x * 192 - (ox + ow/2)) < 100 and wrist_y * height < oy + oh/2:
                        roles.append((i, 'agresor'))
                        roles.append((j, 'víctima'))
                        return roles
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        return f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
    return 0.0

def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir {video_path}")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480))

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.resize(frame, (640, 480))
        r_width, r_height = 640, 480

        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0 and w > 50 and h > 100:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        roles = analyze_roles(keypoints_list, person_boxes, r_height)
        all_pred_roles.append(roles)

        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 3, (255, 255, 0), -1)

        out.write(frame)
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Procesados {frame_count} frames...")
        if frame_count >= 300:
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0] if all_pred_roles else [])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path

# Celda 6: Procesar video
# Ajusta el nombre según la salida de !ls
video_path = '/content/hockey_dataset/fights/fi001.mp4'  # Verifica con !ls
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo

if os.path.exists(video_path):
    output_path = process_video(video_path, 'output_hockey.mp4', ground_truth_roles)
    if output_path:
        files.download(output_path)
else:
    print(f"Video no encontrado: {video_path}")
    print("Verifica nombres con:")
    !ls /content/hockey_dataset/fights/

Sube tu kaggle.json (desde Kaggle > Account > Create API Token)


Saving kaggle.json to kaggle (3).json
Archivo configurado: kaggle (3).json
Dataset URL: https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes
License(s): unknown
Downloading hockey-fight-vidoes.zip to /content
 81% 133M/164M [00:00<00:00, 1.38GB/s]
100% 164M/164M [00:00<00:00, 1.09GB/s]
Archive:  hockey-fight-vidoes.zip
  inflating: /content/hockey_dataset/data/fi100_xvid.avi  
  inflating: /content/hockey_dataset/data/fi101_xvid.avi  
  inflating: /content/hockey_dataset/data/fi102_xvid.avi  
  inflating: /content/hockey_dataset/data/fi103_xvid.avi  
  inflating: /content/hockey_dataset/data/fi104_xvid.avi  
  inflating: /content/hockey_dataset/data/fi105_xvid.avi  
  inflating: /content/hockey_dataset/data/fi106_xvid.avi  
  inflating: /content/hockey_dataset/data/fi107_xvid.avi  
  inflating: /content/hockey_dataset/data/fi108_xvid.avi  
  inflating: /content/hockey_dataset/data/fi109_xvid.avi  
  inflating: /content/hockey_dataset/data/fi10_xvid.avi  
  inflating: /conte

In [9]:
# Celda 1: Instalar dependencias
!pip install -q tensorflow tensorflow-hub opencv-python-headless ultralytics scikit-learn
!pip install --no-cache-dir -q kaggle

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from ultralytics import YOLO
from sklearn.metrics import f1_score
from google.colab import files
import os

# Celda 2: Configurar Kaggle API (ya lo tienes, pero por si acaso)
print("Si ya subiste kaggle.json, salta esto. De lo contrario, sube.")
uploaded = files.upload()

json_file = next((f for f in uploaded.keys() if 'kaggle' in f.lower() and f.endswith('.json')), None)
if json_file:
    !mkdir -p ~/.kaggle
    !cp "{json_file}" ~/.kaggle/kaggle.json
    !chmod 600 ~/.kaggle/kaggle.json

# Celda 3: Verificar dataset (ya descargado)
dataset_dir = '/content/hockey_dataset/data'
if os.path.exists(dataset_dir):
    fight_files = [f for f in os.listdir(dataset_dir) if f.startswith('fi')]
    non_fight_files = [f for f in os.listdir(dataset_dir) if f.startswith('no')]
    print(f"Total videos de peleas (fi*): {len(fight_files)}")
    print(f"Primeros 10: {fight_files[:10]}")
    print(f"Total videos sin peleas (no*): {len(non_fight_files)}")
    print(f"Primeros 10: {non_fight_files[:10]}")
else:
    print("Dataset no encontrado. Ejecuta descarga manual si es necesario.")
    # Si no, fuerza descarga de nuevo
    !rm -f hockey-fight-vidoes.zip
    !kaggle datasets download -d yassershrief/hockey-fight-vidoes --force
    !mkdir -p /content/hockey_dataset
    !unzip -o hockey-fight-vidoes.zip -d /content/hockey_dataset/

# Celda 4: Cargar modelos
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')

# Celda 5: Funciones
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]
    return keypoints

def analyze_roles(keypoints_list, person_boxes, height):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        right_wrist = keypoints[10]  # Mano derecha [y, x, conf]
        if right_wrist[2] > 0.3:
            wrist_y, wrist_x = right_wrist[0], right_wrist[1]
            for j, other_box in enumerate(person_boxes):
                if i != j:
                    ox, oy, ow, oh = other_box
                    # Heurística ajustada para hockey: mano cerca del centro de la otra persona
                    center_other_x = ox + ow / 2
                    center_other_y = oy + oh / 2
                    if abs(wrist_x * 192 - center_other_x) < 100 and wrist_y * height < center_other_y:
                        roles.append((i, 'agresor'))
                        roles.append((j, 'víctima'))
                        return roles
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        return f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
    return 0.0

# Celda 6: Procesar video
def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir {video_path}. Verifica si es AVI válido.")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480))  # Salida en MP4

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.resize(frame, (640, 480))
        r_height = 480  # Altura redimensionada

        # Detectar personas
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Poses
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0 and w > 50 and h > 100:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        # Roles
        roles = analyze_roles(keypoints_list, person_boxes, r_height)
        all_pred_roles.append(roles)

        # Dibujar
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # Box verde
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 3, (255, 255, 0), -1)  # Keypoints amarillos

        out.write(frame)
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Procesados {frame_count} frames...")
        if frame_count >= 300:  # Limite para pruebas
            print("Procesamiento limitado a 300 frames para prueba rápida.")
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0] if all_pred_roles else [])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path

# Celda 7: Procesar un video de ejemplo
video_path = '/content/hockey_dataset/data/fi1_xvid.avi'  # Video de pelea (ajusta si quieres otro, ej. fi2_xvid.avi)
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo; ajusta si tienes anotaciones

if os.path.exists(video_path):
    output_path = process_video(video_path, 'output_hockey.mp4', ground_truth_roles)
    if output_path:
        files.download(output_path)
else:
    print(f"Video no encontrado: {video_path}")
    print("Verifica nombres con:")
    !ls /content/hockey_dataset/data/ | grep fi | head -10  # Primeros 10 fi*

Si ya subiste kaggle.json, salta esto. De lo contrario, sube.


Saving kaggle.json to kaggle (4).json
Total videos de peleas (fi*): 500
Primeros 10: ['fi289_xvid.avi', 'fi43_xvid.avi', 'fi48_xvid.avi', 'fi229_xvid.avi', 'fi20_xvid.avi', 'fi39_xvid.avi', 'fi327_xvid.avi', 'fi142_xvid.avi', 'fi4_xvid.avi', 'fi242_xvid.avi']
Total videos sin peleas (no*): 500
Primeros 10: ['no142__xvid.avi', 'no462_xvid.avi', 'no414_xvid.avi', 'no285_xvid.avi', 'no493_xvid.avi', 'no179_xvid.avi', 'no365_xvid.avi', 'no221_xvid.avi', 'no226_xvid.avi', 'no326_xvid.avi']

0: 480x640 2 persons, 83.8ms
Speed: 1.9ms preprocess, 83.8ms inference, 4.3ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 2 persons, 7.0ms
Speed: 2.0ms preprocess, 7.0ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 2 persons, 6.6ms
Speed: 1.8ms preprocess, 6.6ms inference, 1.9ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 6.4ms
Speed: 1.8ms preprocess, 6.4ms inference, 2.2ms postprocess per image at shape (1, 3, 480, 640)

0: 480x

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [10]:
# Celda 1-3: Asume que ya tienes el dataset descargado. Si no, ejecuta las celdas anteriores para descargar.

# Celda 4: Cargar modelos (si no lo hiciste)
movenet_model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = movenet_model.signatures['serving_default']
yolo = YOLO('yolov8n.pt')

# Celda 5: Funciones con heurística mejorada
def preprocess_frame(frame):
    img = cv2.resize(frame, (192, 192))
    img = tf.convert_to_tensor(img, dtype=tf.int32)
    img = tf.expand_dims(img, axis=0)
    return img

def detect_pose(frame):
    input_image = preprocess_frame(frame)
    outputs = movenet(input_image)
    keypoints = outputs['output_0'].numpy()[0, 0]
    return keypoints

def analyze_roles(keypoints_list, person_boxes, height):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        # Verificar ambas manos (keypoint 9: izquierda, 10: derecha)
        hands = [keypoints[9], keypoints[10]]  # [y, x, conf]
        for hand in hands:
            if hand[2] > 0.3:  # Confianza suficiente
                hand_y, hand_x = hand[0], hand[1]
                if hand_y < 0.5:  # Mano "alta" (y bajo en coordenadas normalizadas)
                    for j, other_box in enumerate(person_boxes):
                        if i != j:
                            ox, oy, ow, oh = other_box
                            center_other_x = ox + ow / 2
                            center_other_y = oy + oh / 2
                            dist_x = abs(hand_x * 192 - center_other_x)
                            dist_y = abs(hand_y * height - center_other_y)
                            if dist_x < 80 and dist_y < 80:  # Cerca en x e y
                                print(f"Detectado agresor {i} (mano en y={hand_y:.2f}, dist_x={dist_x:.2f}, dist_y={dist_y:.2f}) cerca de víctima {j}")
                                roles.append((i, 'agresor'))
                                roles.append((j, 'víctima'))
                                return roles
    print("No se detectó interacción: manos bajas o lejanas.")
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def calculate_metrics(true_roles, pred_roles):
    y_true = [r for _, r in true_roles]
    y_pred = [r for _, r in pred_roles]
    if y_true and y_pred and len(y_true) == len(y_pred):
        return f1_score(y_true, y_pred, labels=['agresor', 'víctima', 'desconocido'], average='weighted')
    return 0.0

# Celda 6: Procesar video
def process_video(video_path, output_path='output_hockey.mp4', ground_truth_roles=None):
    # Opcional: Convertir AVI a MP4 si cv2 falla (descomenta si necesario)
    # !ffmpeg -i "{video_path}" temp.mp4 -y
    # video_path = 'temp.mp4'

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: No se pudo abrir {video_path}. Prueba con ffmpeg para convertir.")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480))

    frame_count = 0
    all_pred_roles = []

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.resize(frame, (640, 480))
        r_height = 480

        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0 and w > 50 and h > 100:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        roles = analyze_roles(keypoints_list, person_boxes, r_height)
        all_pred_roles.append(roles)

        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            role = next((r for idx, r in roles if idx == i), 'desconocido')
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 3, (255, 255, 0), -1)

        out.write(frame)
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Procesados {frame_count} frames...")
        if frame_count >= 300:
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0] if all_pred_roles else [])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path

# Celda 7: Procesar
video_path = '/content/hockey_dataset/data/fi1_xvid.avi'  # Prueba con este o cambia a fi10_xvid.avi si no detecta
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo

output_path = process_video(video_path, 'output_hockey_improved.mp4', ground_truth_roles)
if output_path:
    files.download(output_path)


0: 480x640 2 persons, 9.0ms
Speed: 2.7ms preprocess, 9.0ms inference, 2.5ms postprocess per image at shape (1, 3, 480, 640)
No se detectó interacción: manos bajas o lejanas.

0: 480x640 2 persons, 7.7ms
Speed: 2.3ms preprocess, 7.7ms inference, 2.1ms postprocess per image at shape (1, 3, 480, 640)
No se detectó interacción: manos bajas o lejanas.

0: 480x640 2 persons, 6.8ms
Speed: 1.7ms preprocess, 6.8ms inference, 1.9ms postprocess per image at shape (1, 3, 480, 640)
No se detectó interacción: manos bajas o lejanas.

0: 480x640 3 persons, 6.4ms
Speed: 1.8ms preprocess, 6.4ms inference, 2.3ms postprocess per image at shape (1, 3, 480, 640)
No se detectó interacción: manos bajas o lejanas.

0: 480x640 3 persons, 6.4ms
Speed: 1.3ms preprocess, 6.4ms inference, 2.3ms postprocess per image at shape (1, 3, 480, 640)
No se detectó interacción: manos bajas o lejanas.

0: 480x640 2 persons, 6.4ms
Speed: 1.8ms preprocess, 6.4ms inference, 1.9ms postprocess per image at shape (1, 3, 480, 640)


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [11]:
# Asume que ya tienes las otras funciones (preprocess_frame, detect_pose, calculate_metrics)

def analyze_roles(keypoints_list, person_boxes, height, mag_mean):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        hands = [keypoints[9], keypoints[10]]  # Izquierda y derecha
        for hand in hands:
            if hand[2] > 0.3:
                hand_y, hand_x = hand[0], hand[1]
                if hand_y < 0.6:  # Mano semi-alta
                    for j, other_box in enumerate(person_boxes):
                        if i != j:
                            ox, oy, ow, oh = other_box
                            center_other_x = ox + ow / 2
                            center_other_y = oy + oh / 2
                            dist_x = abs(hand_x * 192 - center_other_x)
                            dist_y = abs(hand_y * height - center_other_y)
                            if dist_x < 100 and dist_y < 100:
                                print(f"Detectado agresor {i} (mano y={hand_y:.2f}, dist_x={dist_x:.2f}, dist_y={dist_y:.2f}, mag={mag_mean:.2f}) cerca de víctima {j}")
                                return [(i, 'agresor'), (j, 'víctima')]
    if mag_mean > 5:  # Movimiento alto: fuerza "agresor" si hay proximidad general
        if len(person_boxes) >= 2:
            dist_between = abs(person_boxes[0][0] - person_boxes[1][0])
            if dist_between < 150:  # Personas cercanas con movimiento
                print(f"Movimiento alto ({mag_mean:.2f}) y proximidad: asignando agresor 0, víctima 1")
                return [(0, 'agresor'), (1, 'víctima')]
    print(f"No se detectó: manos bajas/lejanas, mag={mag_mean:.2f}")
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def process_video(video_path, output_path='output_hockey_persistent.mp4', ground_truth_roles=None, persistence_frames=5):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error al abrir {video_path}")
        return None

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480))

    frame_count = 0
    all_pred_roles = []
    prev_gray = None
    last_roles = None
    persistence_counter = 0

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.resize(frame, (640, 480))
        r_height = 480
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Optical Flow
        mag_mean = 0
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            mag_mean = np.mean(mag)

        prev_gray = gray

        # Detectar personas
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Poses
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0 and w > 50 and h > 100:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        # Roles
        current_roles = analyze_roles(keypoints_list, person_boxes, r_height, mag_mean)

        # Persistencia
        if any(r != 'desconocido' for _, r in current_roles):
            last_roles = current_roles
            persistence_counter = persistence_frames
        elif persistence_counter > 0:
            current_roles = last_roles
            persistence_counter -= 1
            print(f"Persistiendo roles por {persistence_counter} frames más")
        else:
            last_roles = None

        all_pred_roles.append(current_roles)

        # Dibujar
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            role = next((r for idx, r in current_roles if idx == i), 'desconocido')
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 3, (255, 255, 0), -1)

        out.write(frame)
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Procesados {frame_count} frames...")
        if frame_count >= 300:
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, all_pred_roles[0] if all_pred_roles else [])
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path

# Uso: Cambia video si quieres
video_path = '/content/hockey_dataset/data/fi1_xvid.avi'
output_path = process_video(video_path, 'output_hockey_persistent.mp4', ground_truth_roles)
if output_path:
    files.download(output_path)


0: 480x640 2 persons, 31.0ms
Speed: 4.4ms preprocess, 31.0ms inference, 3.4ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 2 persons, 13.8ms
Speed: 3.9ms preprocess, 13.8ms inference, 3.8ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=3.76

0: 480x640 2 persons, 13.3ms
Speed: 2.7ms preprocess, 13.3ms inference, 3.5ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.91

0: 480x640 3 persons, 13.8ms
Speed: 2.0ms preprocess, 13.8ms inference, 4.1ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.59

0: 480x640 3 persons, 19.4ms
Speed: 3.0ms preprocess, 19.4ms inference, 5.6ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.84

0: 480x640 2 persons, 11.2ms
Speed: 2.1ms preprocess, 11.2ms inference, 3.2ms postprocess per image at shape (1, 3, 480, 640)
No se detectó

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [12]:
import os
import cv2
import numpy as np
from google.colab import files

def analyze_roles(keypoints_list, person_boxes, height, mag_mean):
    roles = []
    for i, (keypoints, box) in enumerate(zip(keypoints_list, person_boxes)):
        hands = [keypoints[9], keypoints[10]]  # Izquierda, derecha
        for hand in hands:
            if hand[2] > 0.3:
                hand_y, hand_x = hand[0], hand[1]
                if hand_y < 0.6:
                    for j, other_box in enumerate(person_boxes):
                        if i != j:
                            ox, oy, ow, oh = other_box
                            center_other_x = ox + ow / 2
                            center_other_y = oy + oh / 2
                            dist_x = abs(hand_x * 192 - center_other_x)
                            dist_y = abs(hand_y * height - center_other_y)
                            if dist_x < 120 and dist_y < 120:
                                print(f"Detectado agresor {i} (mano y={hand_y:.2f}, dist_x={dist_x:.2f}, dist_y={dist_y:.2f}, mag={mag_mean:.2f}) cerca de víctima {j}")
                                return [(i, 'agresor'), (j, 'víctima')]
    if mag_mean > 4 and len(person_boxes) >= 2:
        dist_between = abs(person_boxes[0][0] - person_boxes[1][0])
        if dist_between < 150:
            print(f"Movimiento alto (mag={mag_mean:.2f}) y proximidad: asignando agresor 0, víctima 1")
            return [(0, 'agresor'), (1, 'víctima')]
    print(f"No se detectó: manos bajas/lejanas, mag={mag_mean:.2f}")
    return [(i, 'desconocido') for i in range(len(keypoints_list))]

def process_video(video_path, output_path, ground_truth_roles=None, persistence_frames=7):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error al abrir {video_path}")
        return None, 0, 0

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480))

    frame_count = 0
    agresor_count = 0
    victima_count = 0
    prev_gray = None
    last_roles = None
    persistence_counter = 0

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.resize(frame, (640, 480))
        r_height = 480
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Optical Flow
        mag_mean = 0
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            mag_mean = np.mean(mag)

        prev_gray = gray

        # Detectar personas
        results = yolo(frame)
        person_boxes = []
        for result in results:
            for box in result.boxes:
                if int(box.cls) == 0:  # Solo personas
                    x1, y1, x2, y2 = box.xyxy[0].tolist()
                    person_boxes.append((x1, y1, x2 - x1, y2 - y1))

        # Poses
        keypoints_list = []
        for box in person_boxes:
            x, y, w, h = [int(v) for v in box]
            person_crop = frame[max(0, y):y+h, max(0, x):x+w]
            if person_crop.size > 0 and w > 50 and h > 100:
                keypoints = detect_pose(person_crop)
                keypoints_list.append(keypoints)
            else:
                keypoints_list.append(np.zeros((17, 3)))

        # Roles
        current_roles = analyze_roles(keypoints_list, person_boxes, r_height, mag_mean)

        # Persistencia
        if any(r != 'desconocido' for _, r in current_roles):
            last_roles = current_roles
            persistence_counter = persistence_frames
            agresor_count += 1 if any(r == 'agresor' for _, r in current_roles) else 0
            victima_count += 1 if any(r == 'víctima' for _, r in current_roles) else 0
        elif persistence_counter > 0:
            current_roles = last_roles
            persistence_counter -= 1
            agresor_count += 1 if any(r == 'agresor' for _, r in current_roles) else 0
            victima_count += 1 if any(r == 'víctima' for _, r in current_roles) else 0
            print(f"Persistiendo roles por {persistence_counter} frames más")
        else:
            last_roles = None

        # Dibujar
        for i, (box, keypoints) in enumerate(zip(person_boxes, keypoints_list)):
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            role = next((r for idx, r in current_roles if idx == i), 'desconocido')
            color = (0, 0, 255) if role == 'agresor' else (255, 0, 0) if role == 'víctima' else (128, 128, 128)
            cv2.putText(frame, role.upper(), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
            for kp in keypoints:
                if kp[2] > 0.3:
                    kx = int(kp[1] * w + x)
                    ky = int(kp[0] * h + y)
                    cv2.circle(frame, (kx, ky), 3, (255, 255, 0), -1)

        out.write(frame)
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"Procesados {frame_count} frames...")
        if frame_count >= 300:
            break

    cap.release()
    out.release()

    if ground_truth_roles:
        f1 = calculate_metrics(ground_truth_roles, current_roles)
        print(f"F1-Score para roles: {f1:.4f}")

    print(f"Video: {video_path}")
    print(f"Frames con agresor: {agresor_count}")
    print(f"Frames con víctima: {victima_count}")
    print(f"¡Listo! Video guardado en: {output_path}")
    return output_path, agresor_count, victima_count

# Procesar múltiples videos
dataset_dir = '/content/hockey_dataset/data'
videos_to_test = [
    'fi1_xvid.avi',
    'fi2_xvid.avi',
    'fi3_xvid.avi',
    'fi4_xvid.avi',
    'fi5_xvid.avi',
    'no1_xvid.avi'  # Control (sin pelea)
]
ground_truth_roles = [(0, 'agresor'), (1, 'víctima')]  # Ejemplo, ajusta si tienes anotaciones reales

report = []
for video in videos_to_test:
    video_path = os.path.join(dataset_dir, video)
    output_path = f'output_{video.replace(".avi", ".mp4")}'
    if os.path.exists(video_path):
        print(f"\nProcesando {video}...")
        result, agresor_count, victima_count = process_video(video_path, output_path, ground_truth_roles)
        if result:
            report.append((video, agresor_count, victima_count, result))
            files.download(result)
    else:
        print(f"Video no encontrado: {video_path}")

# Resumen
print("\nReporte Final:")
for video, agresor_count, victima_count, output in report:
    print(f"{video}: Agresor en {agresor_count} frames, Víctima en {victima_count} frames, Output: {output}")


Procesando fi1_xvid.avi...

0: 480x640 2 persons, 23.7ms
Speed: 5.5ms preprocess, 23.7ms inference, 3.7ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 2 persons, 25.2ms
Speed: 2.0ms preprocess, 25.2ms inference, 15.0ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=3.76

0: 480x640 2 persons, 24.9ms
Speed: 8.0ms preprocess, 24.9ms inference, 8.4ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.91

0: 480x640 3 persons, 32.1ms
Speed: 5.3ms preprocess, 32.1ms inference, 15.3ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.59

0: 480x640 3 persons, 45.1ms
Speed: 3.8ms preprocess, 45.1ms inference, 11.3ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.84

0: 480x640 2 persons, 36.9ms
Speed: 10.4ms preprocess, 36.9ms inference, 11.1ms postprocess per image at sha

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Procesando fi2_xvid.avi...

0: 480x640 2 persons, 6.6ms
Speed: 2.0ms preprocess, 6.6ms inference, 1.7ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 3 persons, 8.7ms
Speed: 2.4ms preprocess, 8.7ms inference, 2.6ms postprocess per image at shape (1, 3, 480, 640)
Movimiento alto (mag=5.48) y proximidad: asignando agresor 0, víctima 1

0: 480x640 3 persons, 6.8ms
Speed: 2.5ms preprocess, 6.8ms inference, 2.4ms postprocess per image at shape (1, 3, 480, 640)
Movimiento alto (mag=5.98) y proximidad: asignando agresor 0, víctima 1

0: 480x640 2 persons, 8.7ms
Speed: 2.4ms preprocess, 8.7ms inference, 2.2ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=5.22
Persistiendo roles por 6 frames más

0: 480x640 2 persons, 7.2ms
Speed: 1.9ms preprocess, 7.2ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=4.50
Persistiendo roles por 5 frames 

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Procesando fi3_xvid.avi...

0: 480x640 3 persons, 13.8ms
Speed: 1.8ms preprocess, 13.8ms inference, 3.4ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 3 persons, 8.4ms
Speed: 1.9ms preprocess, 8.4ms inference, 2.2ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.82

0: 480x640 3 persons, 8.0ms
Speed: 2.3ms preprocess, 8.0ms inference, 2.1ms postprocess per image at shape (1, 3, 480, 640)
Detectado agresor 1 (mano y=0.35, dist_x=84.01, dist_y=65.48, mag=1.90) cerca de víctima 0

0: 480x640 4 persons, 7.2ms
Speed: 2.2ms preprocess, 7.2ms inference, 2.9ms postprocess per image at shape (1, 3, 480, 640)
Detectado agresor 1 (mano y=0.35, dist_x=83.52, dist_y=63.96, mag=2.00) cerca de víctima 0

0: 480x640 3 persons, 7.2ms
Speed: 2.2ms preprocess, 7.2ms inference, 2.4ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.26
Persistiendo roles por 6 fra

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Procesando fi4_xvid.avi...

0: 480x640 5 persons, 6.9ms
Speed: 2.3ms preprocess, 6.9ms inference, 3.1ms postprocess per image at shape (1, 3, 480, 640)
Detectado agresor 3 (mano y=0.48, dist_x=37.12, dist_y=57.52, mag=0.00) cerca de víctima 1

0: 480x640 4 persons, 6.8ms
Speed: 2.0ms preprocess, 6.8ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.90
Persistiendo roles por 6 frames más

0: 480x640 4 persons, 7.1ms
Speed: 2.3ms preprocess, 7.1ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.97
Persistiendo roles por 5 frames más

0: 480x640 5 persons, 9.6ms
Speed: 2.3ms preprocess, 9.6ms inference, 3.6ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.05
Persistiendo roles por 4 frames más

0: 480x640 4 persons, 7.0ms
Speed: 2.6ms preprocess, 7.0ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
Detectado agr

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Procesando fi5_xvid.avi...

0: 480x640 2 persons, 12.3ms
Speed: 2.0ms preprocess, 12.3ms inference, 3.0ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 2 persons, 9.5ms
Speed: 3.2ms preprocess, 9.5ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.98

0: 480x640 3 persons, 7.3ms
Speed: 2.2ms preprocess, 7.3ms inference, 2.6ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.09

0: 480x640 3 persons, 8.8ms
Speed: 2.0ms preprocess, 8.8ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.12

0: 480x640 2 persons, 7.5ms
Speed: 1.8ms preprocess, 7.5ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=1.51

0: 480x640 3 persons, 1 handbag, 8.2ms
Speed: 2.4ms preprocess, 8.2ms inference, 3.6ms postprocess per image at shape (

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Procesando no1_xvid.avi...

0: 480x640 2 persons, 1 skateboard, 6.9ms
Speed: 1.8ms preprocess, 6.9ms inference, 2.3ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=0.00

0: 480x640 2 persons, 7.8ms
Speed: 2.2ms preprocess, 7.8ms inference, 2.1ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.22

0: 480x640 2 persons, 7.7ms
Speed: 2.3ms preprocess, 7.7ms inference, 2.1ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.32

0: 480x640 2 persons, 7.3ms
Speed: 2.2ms preprocess, 7.3ms inference, 2.2ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.34

0: 480x640 3 persons, 7.4ms
Speed: 2.1ms preprocess, 7.4ms inference, 2.4ms postprocess per image at shape (1, 3, 480, 640)
No se detectó: manos bajas/lejanas, mag=2.48

0: 480x640 3 persons, 7.8ms
Speed: 2.1ms preprocess, 7.8ms inference, 2.4ms postprocess per image at shape 

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


Reporte Final:
fi1_xvid.avi: Agresor en 21 frames, Víctima en 21 frames, Output: output_fi1_xvid.mp4
fi2_xvid.avi: Agresor en 29 frames, Víctima en 29 frames, Output: output_fi2_xvid.mp4
fi3_xvid.avi: Agresor en 35 frames, Víctima en 35 frames, Output: output_fi3_xvid.mp4
fi4_xvid.avi: Agresor en 39 frames, Víctima en 39 frames, Output: output_fi4_xvid.mp4
fi5_xvid.avi: Agresor en 13 frames, Víctima en 13 frames, Output: output_fi5_xvid.mp4
no1_xvid.avi: Agresor en 16 frames, Víctima en 16 frames, Output: output_no1_xvid.mp4
