<a href="https://colab.research.google.com/github/djsg2021utec/TESIS_MAESTRIA/blob/main/1_Exploraci%C3%B3n_de_los_datasets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<h1 align=center><font size = 6> 1. Exploration of the Real-World Fighting (RWF2000) and Smart-City CCTV Violence Detection Dataset (SCVD) datasets </font></h1>

---

## Objective of this Notebook
Download and explore the *datasets* (descriptive statistical analysis and preprocessing functions), then extract and export the features that will be used in the project *Lightweight Vision Transformer apply to human violence recognition in Surveillance videos*.

## Activities

1.1. Read the reference *(benchmark)* data: [RWF2000 (2021)](https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection/tree/master) and [SCVD (2022)](https://www.kaggle.com/datasets/toluwaniaremu/smartcity-cctv-violence-detection-dataset-scvd)

1.2. Perform the descriptive statistical analysis of the data.
  - Number of videos.
  - Average duration per video.
  - Average number of frames per video.
  - Video resolution.
  - Number of samples per label.

1.3. Preprocess the videos.
  - Video normalization.
  - Video rotation.
  - Video scaling.

1.4. Extract and export the video features.
  - Optical flow.
  - Patch embedding of frames.
  - Tubelet embedding of videos.
  - Export the features of both *datasets*.

In [1]:
## Import libraries
import glob
import os
import shutil
import sys
import time
from base64 import b64encode
from random import shuffle

import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from IPython.display import HTML
from tqdm.notebook import tqdm  # notebook-friendly progress bars

In [4]:
import tensorflow as tf
import timeit
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [5]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

def cpu():
  with tf.device('/cpu:0'):
    random_image_cpu = tf.random.normal((100, 100, 100, 3))
    net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
    return tf.math.reduce_sum(net_cpu)

def gpu():
  with tf.device('/device:GPU:0'):
    random_image_gpu = tf.random.normal((100, 100, 100, 3))
    net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
    return tf.math.reduce_sum(net_gpu)

# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Run the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
1.065427064000005
GPU (s):
0.04981163299999025
GPU speedup over CPU: 21x


## 1.1. Read the reference *(benchmark)* data: [RWF2000 (2021)](https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection/tree/master) and [SCVD (2022)](https://www.kaggle.com/datasets/toluwaniaremu/smartcity-cctv-violence-detection-dataset-scvd)

* The videos are downloaded from their repositories
* The format is unified to .mp4
* The videos are displayed


### Read RWF2000

In [6]:
# Link to the repository where the data is hosted
# https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection

#%pip install gdown (uncomment if it needs to be installed)

# ********************************************************************
# Download the fragments of the RWF-2000.zip archive
# ********************************************************************
download_links = [
    "https://drive.google.com/uc?id=1nQ9IR3cGc4NEDOXhPQ89id8je8Uj2VUc",
    "https://drive.google.com/uc?id=1w9G_Z7gkXZzK4DImdI8wanyjs22fQARO",
    "https://drive.google.com/uc?id=15LhjavoUsLS01CPkc3qav0rJxBc9d4nl"
]

for link in tqdm(download_links, desc="Descargando fragmentos"):
    !gdown {link}
# ********************************************************************
# Join the fragments of the RWF-2000.zip archive
# ********************************************************************
!cat RWF-2000.zip.001 RWF-2000.zip.002 RWF-2000.zip.003 > RWF-2000.zip

# Delete the fragments
for fragment in tqdm(["RWF-2000.zip.001", "RWF-2000.zip.002", "RWF-2000.zip.003"], desc="Eliminando fragmentos"):
    !rm /content/{fragment}

# Unzip the files into the RWF-2000 directory
!unzip "/content/RWF-2000.zip" -d "/content/"

# Delete RWF-2000.zip
!rm /content/RWF-2000.zip

# Make sure the RAM has been released
import gc
gc.collect()

time.sleep(10)

Descargando fragmentos:   0%|          | 0/3 [00:00<?, ?it/s]

Downloading...
From: https://drive.google.com/uc?id=1nQ9IR3cGc4NEDOXhPQ89id8je8Uj2VUc
To: /content/RWF-2000.zip.001
100% 4.29G/4.29G [01:01<00:00, 69.6MB/s]
Downloading...
From: https://drive.google.com/uc?id=1w9G_Z7gkXZzK4DImdI8wanyjs22fQARO
To: /content/RWF-2000.zip.002
100% 4.29G/4.29G [01:01<00:00, 70.3MB/s]
Downloading...
From: https://drive.google.com/uc?id=15LhjavoUsLS01CPkc3qav0rJxBc9d4nl
To: /content/RWF-2000.zip.003
100% 3.74G/3.74G [00:39<00:00, 94.1MB/s]


Eliminando fragmentos:   0%|          | 0/3 [00:00<?, ?it/s]

Archive:  /content/RWF-2000.zip
   creating: /content/RWF-2000/
   creating: /content/RWF-2000/train/
   creating: /content/RWF-2000/train/Fight/
  inflating: /content/RWF-2000/train/Fight/-1l5631l3fg_0.avi  
  inflating: /content/RWF-2000/train/Fight/-1l5631l3fg_1.avi  
  inflating: /content/RWF-2000/train/Fight/-1l5631l3fg_2.avi  
  inflating: /content/RWF-2000/train/Fight/0H2s9UJcNJ0_0.avi  
  inflating: /content/RWF-2000/train/Fight/0H2s9UJcNJ0_2.avi  
  inflating: /content/RWF-2000/train/Fight/0H2s9UJcNJ0_3.avi  
  inflating: /content/RWF-2000/train/Fight/0H2s9UJcNJ0_4.avi  
  inflating: /content/RWF-2000/train/Fight/0H2s9UJcNJ0_5.avi  
  inflating: /content/RWF-2000/train/Fight/0lHQ2f0d_0.avi  
  inflating: /content/RWF-2000/train/Fight/0lHQ2f0d_1.avi  
  inflating: /content/RWF-2000/train/Fight/0lHQ2f0d_2.avi  
  inflating: /content/RWF-2000/train/Fight/0lHQ2f0d_3.avi  
  inflating: /content/RWF-2000/train/Fight/0NWz-01A2yk_0.avi  
  inflating: /content/RWF-2000/train/Fight/0NWz
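
The `cat` step in the cell above concatenates the fragments byte for byte, in order. A minimal Python sketch of the same operation follows, for environments without a shell. `join_fragments` is a hypothetical helper, not part of the original notebook, and it assumes the caller passes the fragments in the right order.

```python
import shutil
from pathlib import Path

def join_fragments(fragment_paths, output_path, chunk_size=1024 * 1024):
    """Concatenate split-archive fragments, in the given order, into one file.

    Returns the size in bytes of the joined file.
    """
    with open(output_path, "wb") as out:
        for fragment in fragment_paths:
            with open(fragment, "rb") as src:
                # Stream in chunks so multi-gigabyte fragments never sit in RAM.
                shutil.copyfileobj(src, out, chunk_size)
    return Path(output_path).stat().st_size
```

For this dataset the call would look like `join_fragments(["RWF-2000.zip.001", "RWF-2000.zip.002", "RWF-2000.zip.003"], "RWF-2000.zip")`.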

In [2]:
# ******************************************************************
# Build a DataFrame to handle the RWF-2000 data easily
# ******************************************************************
carpeta = 'RWF-2000'
lista_carpetas_avi = [
    {'ruta': f'/content/{carpeta}/train/Fight', 'data': 'train', 'etiqueta':'Fight'},
    {'ruta': f'/content/{carpeta}/train/NonFight', 'data': 'train', 'etiqueta':'NonFight'},
    {'ruta': f'/content/{carpeta}/val/Fight', 'data': 'val', 'etiqueta':'Fight'},
    {'ruta': f'/content/{carpeta}/val/NonFight', 'data': 'val', 'etiqueta':'NonFight'}
]
lista_archivos_avi = []
for carpeta in lista_carpetas_avi:
  ruta_carpeta=carpeta['ruta']
  tipo_data =carpeta['data']
  etiqueta_data =carpeta['etiqueta']
  for filename in os.listdir(ruta_carpeta):
      if filename.endswith(".avi"):
        registro_archivo = {'ruta': f'{ruta_carpeta}/{filename}', 'data': f'{tipo_data}', 'etiqueta':f'{etiqueta_data}'}
        lista_archivos_avi.append(registro_archivo)

In [3]:
videos_RWF2000_df = pd.DataFrame(lista_archivos_avi)
videos_RWF2000_df.describe()

| | ruta | data | etiqueta |
|---|---|---|---|
| count | 2000 | 2000 | 2000 |
| unique | 2000 | 2 | 2 |
| top | `/content/RWF-2000/train/Fight/OAfV0xPIhZw_4.avi` | train | Fight |
| freq | 1 | 1600 | 1000 |


In [7]:
# **********************************************************************
# Create the folders that will store the MP4 files
# **********************************************************************
carpeta = 'RWF-2000-mp4'
lista_carpetas_mp4 = [
    {'ruta': f'/content/{carpeta}/train/Fight', 'data': 'train', 'etiqueta':'Fight'},
    {'ruta': f'/content/{carpeta}/train/NonFight', 'data': 'train', 'etiqueta':'NonFight'},
    {'ruta': f'/content/{carpeta}/val/Fight', 'data': 'val', 'etiqueta':'Fight'},
    {'ruta': f'/content/{carpeta}/val/NonFight', 'data': 'val', 'etiqueta':'NonFight'}
]

for carpeta in lista_carpetas_mp4:
  os.makedirs(carpeta['ruta'], exist_ok=True)

In [5]:
# **********************************************************************
# Take a sample containing a fraction of the rows of the DataFrame
# **********************************************************************

# Return a random subset of a group
def grupo_de_muestras(group, frac=0.5):
    return group.sample(frac=frac)

# Split the DataFrame by the 'data' and 'etiqueta' columns and sample each group (stratified sampling)
muestra_RWF2000_df = videos_RWF2000_df.groupby(['data', 'etiqueta']).apply(grupo_de_muestras).reset_index(drop=True)
# videos_RWF2000_df is kept (not deleted) because section 1.2 uses it to count the full dataset

muestra_RWF2000_df.describe()

| | ruta | data | etiqueta |
|---|---|---|---|
| count | 1000 | 1000 | 1000 |
| unique | 1000 | 2 | 2 |
| top | `/content/RWF-2000/train/Fight/vQKHxtrdEHM_0.avi` | train | Fight |
| freq | 1 | 800 | 500 |


In [6]:
muestra_RWF2000_df.groupby(['data', 'etiqueta']).count().reset_index()

| | data | etiqueta | ruta |
|---|---|---|---|
| 0 | train | Fight | 400 |
| 1 | train | NonFight | 400 |
| 2 | val | Fight | 100 |
| 3 | val | NonFight | 100 |


In [None]:
# ******************************************************************
# Convert the AVI videos to MP4
# ******************************************************************
import gc
import subprocess

# Records of the converted files
lista_archivos_mp4 = []
df=muestra_RWF2000_df

# Process each row of the DataFrame
for index, row in df.iterrows():
    ruta_archivo_avi = row['ruta']
    nombre_archivo = ruta_archivo_avi.split('/')[-1].split('.')[0]
    tipo_data = row['data']
    etiqueta = row['etiqueta']

    ruta_archivo_mp4 = f'/content/RWF-2000-mp4/{tipo_data}/{etiqueta}/{nombre_archivo}.mp4'
    registro_archivo = {'ruta': ruta_archivo_mp4, 'data': tipo_data, 'etiqueta': etiqueta}

    # subprocess.run waits for ffmpeg to finish. os.popen returned immediately,
    # so the existence check below could run before the file had been written.
    # -y overwrites an existing output instead of waiting for confirmation.
    subprocess.run(
        ['ffmpeg', '-y', '-i', ruta_archivo_avi, '-ac', '2', '-b:v', '2000k',
         '-c:a', 'aac', '-c:v', 'libx264', '-b:a', '160k', '-vprofile', 'high',
         '-bf', '0', '-strict', 'experimental', '-f', 'mp4', ruta_archivo_mp4],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
    )

    lista_archivos_mp4.append(registro_archivo)
    if os.path.exists(ruta_archivo_mp4):
      print(f"Created {ruta_archivo_mp4} (data: {tipo_data}, etiqueta: {etiqueta})")

# Free memory once the whole batch is done
gc.collect()

In [16]:
RWF2000_mp4_df = pd.DataFrame(lista_archivos_mp4)
RWF2000_mp4_df.describe()

| | ruta | data | etiqueta |
|---|---|---|---|
| count | 1000 | 1000 | 1000 |
| unique | 1000 | 2 | 2 |
| top | `/content/RWF-2000-mp4/train/Fight/vQKHxtrdEHM_...` | train | Fight |
| freq | 1 | 800 | 500 |


In [53]:
from base64 import b64encode

mp4 = open('/content/RWF-2000-mp4/train/Fight/PFw7SeFOD04_2.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=600 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [None]:
%%capture
!pip install moviepy
!pip install pillow

In [None]:
from moviepy.editor import VideoFileClip

# Note: the resulting GIF file is very large
def convert_mp4_to_gif(mp4_path, gif_path):
    clip = VideoFileClip(mp4_path)
    clip.write_gif(gif_path)

convert_mp4_to_gif(RWF2000_mp4_df['ruta'][0], "/content/video.gif")


MoviePy - Building file /content/video.gif with imageio.




In [None]:
from IPython.display import display, Image

# Display the GIF in Jupyter Notebook
display(Image(filename="/content/video.gif"))

### Read SCVD

In [17]:
enlace_SCVD = 'https://drive.google.com/uc?id=122VxKmwQL13wQx339PZMQATk1g428_1P'
!gdown {enlace_SCVD}

# Unzip the files into /content/ (the archive creates the SCVD directory)
!unzip "/content/SmartCity CCTV Violence Detection Dataset (SCVD).zip" -d "/content/"

!rm "/content/SmartCity CCTV Violence Detection Dataset (SCVD).zip"

Downloading...
From: https://drive.google.com/uc?id=122VxKmwQL13wQx339PZMQATk1g428_1P
To: /content/SmartCity CCTV Violence Detection Dataset (SCVD).zip
100% 1.01G/1.01G [00:12<00:00, 81.9MB/s]
Archive:  /content/SmartCity CCTV Violence Detection Dataset (SCVD).zip
  inflating: /content/SCVD/videos/Non-Violence Videos/nv1.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv10.mp4  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv100.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv101.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv102.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv103.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv104.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv105.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv106.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv107.mov  
  inflating: /content/SCVD/videos/Non-Violence Videos/nv108.mp4  
  inflating:

In [43]:
# ******************************************************************
# Build a DataFrame to handle the SCVD data easily
# ******************************************************************
carpeta = 'SCVD/videos'
lista_carpetas = [
    {'ruta': f'/content/{carpeta}/Non-Violence Videos', 'etiqueta':'NonViolence'},
    {'ruta': f'/content/{carpeta}/Weapon Violence', 'etiqueta':'WeaponViolence'},
    {'ruta': f'/content/{carpeta}/violence video cleaned', 'etiqueta':'Violence'},

]
lista_archivos_mp4 = []
for carpeta in lista_carpetas:
  ruta_carpeta=carpeta['ruta']
  etiqueta_data =carpeta['etiqueta']
  for filename in os.listdir(ruta_carpeta):
      # Only .mp4 files are listed; the archive also contains .mov files, which are skipped here
      if filename.endswith(".mp4"):
        registro_archivo = {'ruta': f'{ruta_carpeta}/{filename}', 'etiqueta':f'{etiqueta_data}'}
        lista_archivos_mp4.append(registro_archivo)

In [44]:
SCVD_mp4_df = pd.DataFrame(lista_archivos_mp4)
SCVD_mp4_df.describe()

| | ruta | etiqueta |
|---|---|---|
| count | 307 | 307 |
| unique | 307 | 3 |
| top | `/content/SCVD/videos/Non-Violence Videos/nv133...` | WeaponViolence |
| freq | 1 | 124 |
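
The unzip log above shows that the SCVD archive mixes `.mov` and `.mp4` files, while the DataFrame only keeps `.mp4`. A quick way to see how many files each extension accounts for is sketched below. `count_by_extension` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter
from pathlib import Path

def count_by_extension(root):
    """Count the files under `root` (recursively) by lowercase extension."""
    return Counter(
        path.suffix.lower() for path in Path(root).rglob("*") if path.is_file()
    )
```

Calling `count_by_extension('/content/SCVD/videos')` would return a `Counter` keyed by extension, which makes it easy to check how many videos the `.mp4` filter leaves out.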


In [None]:
from base64 import b64encode

mp4 = open('/content/SCVD/videos/violence video cleaned/V1.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=600 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

## 1.2. Perform the descriptive statistical analysis of the data.


### Number of videos.

In [None]:
conteo_RWF2000 = videos_RWF2000_df.groupby(['data', 'etiqueta']).count().reset_index()
print(conteo_RWF2000)

    data  etiqueta  ruta
0  train     Fight   800
1  train  NonFight   800
2    val     Fight   200
3    val  NonFight   200


In [None]:
conteo_SCVD = SCVD_mp4_df.groupby(['etiqueta']).count().reset_index()
print(conteo_SCVD)

         etiqueta  ruta
0     NonViolence    71
1        Violence   112
2  WeaponViolence   124


In [None]:
import cv2

# Load the video


video = cv2.VideoCapture('/content/SCVD/videos/violence video cleaned/V1.mp4')

# Get the resolution
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"Resolution: {width}x{height}")

# Get the total number of frames
total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"Total number of frames: {total_frames}")

# Get the frame rate (frames per second)
fps = int(video.get(cv2.CAP_PROP_FPS))
print(f"FPS: {fps}")

# Compute the duration of the video
duration = total_frames / fps
print(f"Duration: {duration} seconds")

# Release the video capture object
video.release()


Resolution: 1280x720
Total number of frames: 193
FPS: 25
Duration: 7.72 seconds


In [None]:
### Average duration per video.
### Average number of frames per video.
### Video resolution.
### Number of samples per label.
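
The four statistics listed above can be aggregated once the per-video metadata has been collected, for example with the `cv2.VideoCapture` properties read in the previous cell. The sketch below assumes that metadata is already available as a list of dictionaries. The `summarize_videos` helper and the record keys (`frames`, `fps`, `width`, `height`, `label`) are illustrative names, not part of the original notebook.

```python
from collections import Counter
from statistics import mean

def summarize_videos(records):
    """Summarize per-video metadata.

    Each record is a dict with the keys: frames, fps, width, height, label.
    """
    # Duration in seconds is the frame count divided by the frame rate.
    durations = [r["frames"] / r["fps"] for r in records if r["fps"] > 0]
    return {
        "num_videos": len(records),
        "mean_frames": mean(r["frames"] for r in records),
        "mean_duration_s": mean(durations),
        "resolutions": Counter((r["width"], r["height"]) for r in records),
        "labels": Counter(r["label"] for r in records),
    }
```

The `resolutions` and `labels` counters answer the "resolution" and "samples per label" items directly, while the two means cover duration and frame count.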


1.3. Preprocess the videos.
  - Video normalization.
  - Video rotation.
  - Video scaling.
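
The three preprocessing steps listed above can be illustrated on a video stored as a NumPy array of shape `(frames, height, width, channels)`. This is a minimal NumPy-only sketch under that assumption. The function names are hypothetical, rotation is limited to multiples of 90 degrees, and scaling uses nearest-neighbour sampling rather than the interpolation a library such as OpenCV or Keras would apply.

```python
import numpy as np

def normalize(video):
    """Scale uint8 pixel values to floats in [0, 1]."""
    return video.astype("float32") / 255.0

def rotate(video, quarter_turns=1):
    """Rotate every frame counter-clockwise by multiples of 90 degrees."""
    return np.rot90(video, k=quarter_turns, axes=(1, 2))

def rescale(video, size):
    """Resize every frame to (size, size) with nearest-neighbour sampling."""
    _, height, width, _ = video.shape
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return video[:, rows[:, None], cols[None, :], :]
```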

1.4. Extract and export the video features.
  - Optical flow.
  - Patch embedding of frames.
  - Tubelet embedding of videos.
  - Export the features of both *datasets*.
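
Patch and tubelet embeddings both start by cutting the input into non-overlapping blocks and flattening each block into a vector. A learned linear projection, omitted here, would then map each vector to the embedding dimension. The sketch below shows only the cutting step with NumPy. The function names and the requirement that dimensions divide evenly are assumptions made for illustration.

```python
import numpy as np

def extract_patches(frame, patch):
    """Split an (H, W, C) frame into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    h, w, c = frame.shape
    assert h % patch == 0 and w % patch == 0, "frame must tile evenly"
    grid = frame.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)

def extract_tubelets(video, t, patch):
    """Split a (T, H, W, C) video into non-overlapping spatio-temporal tubelets.

    Returns an array of shape (num_tubelets, t * patch * patch * C).
    """
    frames, h, w, c = video.shape
    assert frames % t == 0 and h % patch == 0 and w % patch == 0
    grid = video.reshape(frames // t, t, h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 4, 1, 3, 5, 6)  # (T', rows, cols, t, patch, patch, C)
    return grid.reshape(-1, t * patch * patch * c)
```

With 128x128 RGB frames and 16x16 patches, each frame yields 64 patches of 768 values. A 20-frame clip cut into tubelets of 2 frames yields 640 tubelets of 1536 values.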

In [18]:
%pip install tensorflow_docs
%pip install tensorflow
from tensorflow_docs.vis import embed
from tensorflow.keras import layers
from tensorflow import keras

import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
import numpy as np
import imageio
import cv2
import os

Collecting tensorflow_docs
  Downloading tensorflow_docs-2023.5.24.56664-py3-none-any.whl (183 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/183.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━[0m [32m174.1/183.6 kB[0m [31m5.3 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m183.6/183.6 kB[0m [31m4.4 MB/s[0m eta [36m0:00:00[0m
Collecting astor (from tensorflow_docs)
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Installing collected packages: astor, tensorflow_docs
Successfully installed astor-0.8.1 tensorflow_docs-2023.5.24.56664


## Hyperparameters

In [90]:
MAX_SEQ_LENGTH = 20
NUM_FEATURES = 1024
IMG_SIZE = 128  # 128x128 frames, as stated in the data preparation notes and shown in the printed frame shapes

EPOCHS = 15

## Data Preparation
We mostly follow the same data preparation steps as the reference example, with the following changes:

- We reduce the image size to 128x128 instead of 224x224 to speed up computation.
- Instead of a pretrained InceptionV3 network, we use a pretrained DenseNet121 network for feature extraction.
- We directly pad the shorter videos up to the length MAX_SEQ_LENGTH.
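
The padding rule in the last bullet can be sketched in isolation: clips longer than the target length are cut, and shorter clips are completed with all-zero frames. `pad_or_truncate` is a hypothetical helper written for illustration. The notebook performs the equivalent steps inside `prepare_all_videos`.

```python
import numpy as np

def pad_or_truncate(frames, max_len):
    """Return exactly `max_len` frames: cut long clips, zero-pad short ones."""
    frames = frames[:max_len]
    missing = max_len - len(frames)
    if missing > 0:
        padding = np.zeros((missing, *frames.shape[1:]), dtype=frames.dtype)
        frames = np.concatenate([frames, padding], axis=0)
    return frames
```

All-zero frames are convenient because the sequence model can later recognise and mask them.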

In [20]:
train_df=RWF2000_mp4_df[RWF2000_mp4_df['data']=='train']
test_df = RWF2000_mp4_df[RWF2000_mp4_df['data']=='val']

In [21]:
train_df.describe()

| | ruta | data | etiqueta |
|---|---|---|---|
| count | 800 | 800 | 800 |
| unique | 800 | 1 | 2 |
| top | `/content/RWF-2000-mp4/train/Fight/vQKHxtrdEHM_...` | train | Fight |
| freq | 1 | 800 | 400 |


In [22]:
test_df.describe()

| | ruta | data | etiqueta |
|---|---|---|---|
| count | 200 | 200 | 200 |
| unique | 200 | 1 | 2 |
| top | `/content/RWF-2000-mp4/val/Fight/RIXaF_TkLlU_1.mp4` | val | Fight |
| freq | 1 | 200 | 100 |


In [37]:
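# Resize each frame to IMG_SIZE x IMG_SIZE and scale pixel values to [0, 1].
# Caveat: build_feature_extractor() also applies densenet.preprocess_input,
# which expects raw pixel values in [0, 255], so frames end up scaled twice.
resize_and_rescale_layer = tf.keras.Sequential([
  layers.Resizing(IMG_SIZE, IMG_SIZE),
  layers.Rescaling(1./255)
])

def resize_and_rescale(frame):
  with tf.device('/device:GPU:0'):
    # Add a batch axis for the Keras layers, then drop it again
    resized_rescaled = resize_and_rescale_layer(frame[None, ...])
    resized_rescaled = resized_rescaled.numpy().squeeze()
    return resized_rescaled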


In [24]:
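def load_video(path, max_frames=0):
    """Read a video and return its frames as an array (frames, IMG_SIZE, IMG_SIZE, 3).

    max_frames=0 means no limit: every frame is read.
    """
    cap = cv2.VideoCapture(path)
    frames = []
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            frame = resize_and_rescale(frame)
            frame = frame[:, :, [2, 1, 0]]  # OpenCV decodes as BGR; reorder to RGB
            frames.append(frame)

            if len(frames) == max_frames:
                break
    finally:
        cap.release()
    return np.array(frames)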

In [48]:
# Tests
video=SCVD_mp4_df['ruta'][0]
load_video(video).shape

(147, 128, 128, 3)

In [25]:
def build_feature_extractor():
    feature_extractor = keras.applications.DenseNet121(
        weights="imagenet",
        include_top=False,
        pooling="avg",
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
    )
    preprocess_input = keras.applications.densenet.preprocess_input

    inputs = keras.Input((IMG_SIZE, IMG_SIZE, 3))
    preprocessed = preprocess_input(inputs)

    outputs = feature_extractor(preprocessed)
    return keras.Model(inputs, outputs, name="feature_extractor")

In [26]:
# Tests
feature_extractor = build_feature_extractor()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5


In [27]:
# Label preprocessing with StringLookup.
label_processor = keras.layers.StringLookup(
    num_oov_indices=0, vocabulary=np.unique(train_df["etiqueta"]), mask_token=None
)
print(label_processor.get_vocabulary())

['Fight', 'NonFight']
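
With `num_oov_indices=0` and `mask_token=None`, the lookup layer maps each label to its position in the vocabulary printed above, so `Fight` becomes 0 and `NonFight` becomes 1. The same mapping can be reproduced with NumPy alone, which is handy for checking the exported label arrays. The sample labels below are made up for illustration.

```python
import numpy as np

# Hypothetical sample of labels, in the same format as the 'etiqueta' column.
labels = np.array(["Fight", "NonFight", "Fight", "NonFight", "NonFight"])

# np.unique returns the sorted vocabulary and, with return_inverse=True,
# the index of each original label within that vocabulary.
vocabulary, indices = np.unique(labels, return_inverse=True)
```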


In [35]:
train_df

| | ruta | data | etiqueta |
|---|---|---|---|
| 0 | `/content/RWF-2000-mp4/train/Fight/vQKHxtrdEHM_...` | train | Fight |
| 1 | `/content/RWF-2000-mp4/train/Fight/Cce4ogHp_1.mp4` | train | Fight |
| 2 | `/content/RWF-2000-mp4/train/Fight/ч╜СхРзщБнх░П...` | train | Fight |
| 3 | `/content/RWF-2000-mp4/train/Fight/EtRfZ2KP_2.mp4` | train | Fight |
| 4 | `/content/RWF-2000-mp4/train/Fight/uGGFPY75IYQ_...` | train | Fight |
| ... | ... | ... | ... |
| 795 | `/content/RWF-2000-mp4/train/NonFight/Km94kW0O_...` | train | NonFight |
| 796 | `/content/RWF-2000-mp4/train/NonFight/IynvwHe9W...` | train | NonFight |
| 797 | `/content/RWF-2000-mp4/train/NonFight/hbTlgnTw_...` | train | NonFight |
| 798 | `/content/RWF-2000-mp4/train/NonFight/PqB7UvdW_...` | train | NonFight |


In [63]:
# ruta_archivo_avi='/content/RWF-2000/train/Fight/PFw7SeFOD04_2.avi'
# ruta_archivo_mp4='/content/RWF-2000-mp4/train/Fight/PFw7SeFOD04_2'
# os.popen("ffmpeg -i '{input}' -ac 2 -b:v 2000k -c:a aac -c:v libx264 -b:a 160k -vprofile high -bf 0 -strict experimental -f mp4 '{output}.mp4'".format(input=ruta_archivo_avi, output=ruta_archivo_mp4))

# mp4 = open('/content/RWF-2000-mp4/train/Fight/PFw7SeFOD04_1.mp4','rb').read()
# data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
# HTML("""
# <video width=600 controls>
#       <source src="%s" type="video/mp4">
# </video>
# """ % data_url)
# os.remove('/content/RWF-2000-mp4/train/Fight/PlHzsigAFas_7.mp4')

ruta_archivo_avi='/content/RWF-2000/train/Fight/PlHzsigAFas_7.avi'
ruta_archivo_mp4='/content/RWF-2000-mp4/train/Fight/PlHzsigAFas_7'
os.popen("ffmpeg -i '{input}' -ac 2 -b:v 2000k -c:a aac -c:v libx264 -b:a 160k -vprofile high -bf 0 -strict experimental -f mp4 '{output}.mp4'".format(input=ruta_archivo_avi, output=ruta_archivo_mp4))


<os._wrap_close at 0x7bb59d5a1fc0>

In [82]:
import subprocess

def volver_crear_mp4(ruta):
  """Delete a broken MP4 (if present) and regenerate it from its source AVI."""
  path = ruta
  if os.path.exists(ruta):
    print("Deleting file: ",path)
    os.remove(ruta)
  root_destino = os.path.dirname(path)
  root_origen = root_destino.replace('/RWF-2000-mp4/','/RWF-2000/')
  nombre=path.split('/')[-1:][0].split('.')[0]
  ruta_archivo_avi=f'{root_origen}/{nombre}.avi'
  ruta_archivo_mp4=f'{root_destino}/{nombre}.mp4'

  # subprocess.run blocks until ffmpeg exits, so no sleep is needed afterwards
  subprocess.run(
      ['ffmpeg', '-y', '-i', ruta_archivo_avi, '-ac', '2', '-b:v', '2000k',
       '-c:a', 'aac', '-c:v', 'libx264', '-b:a', '160k', '-vprofile', 'high',
       '-bf', '0', '-strict', 'experimental', '-f', 'mp4', ruta_archivo_mp4],
      stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
  )
  if os.path.exists(ruta_archivo_mp4):
    print("Re-created file: ",path)

def prepare_all_videos(df):
    num_samples = len(df)
    video_paths = df["ruta"].values.tolist()
    labels = df["etiqueta"].values
    labels = label_processor(labels[..., None]).numpy()

    # `frame_features` are what we will feed to our sequence model.
    frame_features = np.zeros(
        shape=(num_samples, MAX_SEQ_LENGTH, NUM_FEATURES), dtype="float32"
    )

    # For each video.
    for idx, path in enumerate(video_paths):
        # Gather all its frames and add a batch dimension.
        frames = load_video(os.path.join(path))
        print(f'{idx} {path}:',"frame dimension",frames.shape,"\n")
        # Regenerate the MP4 of any video whose conversion failed (no frames read)
        while frames.shape[0]==0:
          volver_crear_mp4(path)
          frames = load_video(os.path.join(path))
          print(f'{idx} {path}:',"frame dimension",frames.shape,"\n")
        # Pad shorter videos.
        if len(frames) < MAX_SEQ_LENGTH:
            diff = MAX_SEQ_LENGTH - len(frames)
            padding = np.zeros((diff, IMG_SIZE, IMG_SIZE, 3))
            frames = np.concatenate((frames, padding), axis=0)

        frames = frames[None, ...]

        # Initialize placeholder to store the features of the current video.
        temp_frame_features = np.zeros(
            shape=(1, MAX_SEQ_LENGTH, NUM_FEATURES), dtype="float32"
        )

        # Extract features from the frames of the current video.
        for i, batch in enumerate(frames):
            video_length = batch.shape[0]
            length = min(MAX_SEQ_LENGTH, video_length)
            for j in range(length):
                if np.mean(batch[j, :]) > 0.0:
                    temp_frame_features[i, j, :] = feature_extractor.predict(
                        batch[None, j, :]
                    )

                else:
                    temp_frame_features[i, j, :] = 0.0

        frame_features[idx,] = temp_frame_features.squeeze()

    return frame_features, labels


In [75]:
os.path.exists('/content/RWF-2000-mp4/train/Fight/GafFu4IZtIA_2.mp4')

False

In [83]:
train_data, train_labels = prepare_all_videos(train_df)
test_data, test_labels = prepare_all_videos(test_df)

Streaming output truncated to the last 5000 lines.
773 /content/RWF-2000-mp4/train/NonFight/GhZRf4GIw0w_0.mp4: frame dimension (150, 128, 128, 3) 

774 /content/RWF-2000-mp4/train/NonFight/gR4OsEPl894_0.mp4: frame dimension (150, 128, 128, 3) 

775 /content/RWF-2000-mp4/train/NonFight/x7iljhbM_0.mp4: frame dimension (150, 128, 128, 3) 

776 /content/RWF-2000-mp4/train/NonFight/RbpXgk5M1S0_2.mp4: frame dimension (150, 128, 128, 3) 

777 /content/RWF-2000-mp4/train/NonFight/B6DO4tXb_0.mp4: frame dimension (150, 128, 128, 3) 

778 /content/RWF-2000-mp4/train/NonFight/DTq6Gu30-uA_0.mp4: frame dimension (150, 128, 128, 3) 

779 /content/RWF-2000-mp4/train/NonFight/Am9mUxPTvK4_0.mp4: frame dimension (150, 128, 128, 3) 

780 /content/RWF-2000-mp4/train/NonFight/iRnSE1TL_0.mp4: frame dimension (150, 128, 128, 3) 

781 /content/RWF-2000-mp4/train/NonFight/VXLdrn0b_0.mp4: frame dimension (150, 128, 128, 3) 

782 /content/RWF-2000-mp4/train/NonFight/i5xJxtv4gXs_0.mp4: fram

In [84]:
## Save the features and labels extracted from train_df
np.save('train_data_RWF200.npy', train_data)
np.save('train_labels_RWF200.npy', train_labels)

## Save the features and labels extracted from test_df
np.save('test_data_RWF200.npy', test_data)
np.save('test_labels_RWF200.npy', test_labels)


In [85]:
os.path.exists('/content/train_data_RWF200.npy')

True

In [86]:
class PositionalEmbedding(layers.Layer):
    def __init__(self, sequence_length, output_dim, **kwargs):
      with tf.device('/device:GPU:0'):
        super().__init__(**kwargs)
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )
        self.sequence_length = sequence_length
        self.output_dim = output_dim

    def call(self, inputs):
        # The inputs are of shape: `(batch_size, frames, num_features)`
        with tf.device('/device:GPU:0'):
          length = tf.shape(inputs)[1]
          positions = tf.range(start=0, limit=length, delta=1)
          embedded_positions = self.position_embeddings(positions)
          return inputs + embedded_positions

    def compute_mask(self, inputs, mask=None):
      with tf.device('/device:GPU:0'):
        mask = tf.reduce_any(tf.cast(inputs, "bool"), axis=-1)
        return mask
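
The `compute_mask` method above marks a frame as valid when at least one of its features is non-zero, which is how the zero-padded frames are told apart from real ones. A NumPy sketch of the same rule follows. `frame_mask` is a hypothetical name used only for this illustration.

```python
import numpy as np

def frame_mask(features):
    """True for frames with at least one non-zero feature, False for padding.

    `features` has shape (batch, frames, num_features).
    """
    return np.any(features != 0, axis=-1)
```

For a clip with two real frames followed by two padded ones, the mask is `[True, True, False, False]`.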


In [97]:
class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
      with tf.device('/device:GPU:0'):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim, dropout=0.2
        )
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation=tf.nn.gelu),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()

    def call(self, inputs, mask=None):
      with tf.device('/device:GPU:0'):
        if mask is not None:
            mask = mask[:, tf.newaxis, :]

        attention_output = self.attention(inputs, inputs, attention_mask=mask)
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)
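
To make the role of `attention_mask` concrete, the sketch below implements single-head scaled dot-product attention in NumPy and hides the padded frames from every query. It is a simplified illustration, not what `layers.MultiHeadAttention` does internally: there are no learned projections, no multiple heads and no dropout, and `masked_attention` is a hypothetical name.

```python
import numpy as np

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention for one sequence.

    queries, keys, values: arrays of shape (frames, dim).
    mask: boolean array of shape (frames,), True for real frames
    and False for padded frames.
    """
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores = np.where(mask[None, :], scores, -1e9)  # hide padded keys
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

Each row of `weights` sums to 1, and the columns that correspond to padded frames receive a weight of effectively zero, so padding does not influence the output.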



In [98]:
def get_compiled_model():
  with tf.device('/device:GPU:0'):
    sequence_length = MAX_SEQ_LENGTH
    embed_dim = NUM_FEATURES
    dense_dim = 10
    num_heads = 10
    classes = len(label_processor.get_vocabulary())

    inputs = keras.Input(shape=(None, None))
    x = PositionalEmbedding(
        sequence_length, embed_dim, name="frame_position_embedding"
    )(inputs)
    x = TransformerEncoder(embed_dim, dense_dim, num_heads, name="transformer_layer")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)

    model.compile(
        optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
    )
    return model


In [99]:
def run_experiment():
    filepath = "/tmp/video_classifier"
    checkpoint = keras.callbacks.ModelCheckpoint(
        filepath, save_weights_only=True, save_best_only=True, verbose=1
    )

    model = get_compiled_model()
    history = model.fit(
        train_data,
        train_labels,
        validation_split=0.1,
        epochs=100,
        callbacks=[checkpoint],
    )

    model.load_weights(filepath)
    _, accuracy = model.evaluate(test_data, test_labels)
    print(f"Test accuracy: {round(accuracy * 100, 2)}%")

    return model

In [100]:
trained_model = run_experiment()

Epoch 1/100
Epoch 1: val_loss improved from inf to 0.30390, saving model to /tmp/video_classifier
Epoch 2/100
Epoch 2: val_loss did not improve from 0.30390
Epoch 3/100
Epoch 3: val_loss did not improve from 0.30390
Epoch 4/100
Epoch 4: val_loss did not improve from 0.30390
Epoch 5/100
Epoch 5: val_loss did not improve from 0.30390
Epoch 6/100
Epoch 6: val_loss did not improve from 0.30390
Epoch 7/100
Epoch 7: val_loss did not improve from 0.30390
Epoch 8/100
Epoch 8: val_loss did not improve from 0.30390
Epoch 9/100
Epoch 9: val_loss did not improve from 0.30390
Epoch 10/100
Epoch 10: val_loss did not improve from 0.30390
Epoch 11/100
Epoch 11: val_loss did not improve from 0.30390
Epoch 12/100
Epoch 12: val_loss did not improve from 0.30390
Epoch 13/100
Epoch 13: val_loss did not improve from 0.30390
Epoch 14/100
Epoch 14: val_loss did not improve from 0.30390
Epoch 15/100
Epoch 15: val_loss did not improve from 0.30390
Epoch 16/100
Epoch 16: val_loss did not improve from 0.30390
Epo
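
The log above shows `val_loss` failing to improve after the first epoch, so most of the 100 epochs add training time without producing a better checkpoint. Keras provides `keras.callbacks.EarlyStopping` for this situation. The plain-Python sketch below only illustrates the patience rule that such a callback applies. `early_stopping_epoch` is a hypothetical helper, and apart from the first value (0.30390, taken from the log) the losses in the usage line are invented.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

# Hypothetical losses: only the first value comes from the training log.
stop_epoch = early_stopping_epoch([0.30390, 0.35, 0.36, 0.34, 0.40, 0.37, 0.33])
```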

https://keras.io/examples/vision/image_classification_with_vision_transformer/