# Tech Challenge 4 – Video Analysis with Computer Vision
Luís Felipe Alves - RM: 363734
Individual submission, FIAP | Pós IA para Devs

This notebook:
- Uses **InsightFace** to detect faces (robust on real-world video)
- Uses an **emotion CNN (FER-2013)** and automatically **resizes** each face crop to the input size the model expects
- Generates `output/video_output.mp4`, `output/summary.txt`, and `output/summary.json`
- Displays and downloads the results

> Tip: `Runtime → Change runtime type → GPU` (improves speed).  
> Without a GPU the notebook runs on CPU (slower, but it works).


## 1) Install dependencies

In [1]:
import sys, subprocess

def pip_install(pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install(["opencv-python", "numpy", "tensorflow", "insightface", "onnxruntime"])
print("OK: dependencies installed")

OK: dependencies installed
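
A quiet `pip install` prints nothing on success, so it can be useful to confirm which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Same distribution names that are passed to pip_install above.
required = ["opencv-python", "numpy", "tensorflow", "insightface", "onnxruntime"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Recording these versions next to the results makes a later re-run easier to compare, since Colab images change over time.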


## 2) Download the emotion model (FER-2013)

In [2]:
import os, subprocess
from pathlib import Path

os.makedirs("models", exist_ok=True)
os.makedirs("output", exist_ok=True)

model_path = Path("models/emotion_model.hdf5")
if not model_path.exists():
    url = "https://github.com/oarriaga/face_classification/raw/master/trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5"
    subprocess.check_call(["wget", "-q", "-O", str(model_path), url])
    print("Downloaded:", model_path)
else:
    print("Already exists:", model_path)

Downloaded: models/emotion_model.hdf5
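
The cell above only checks that the file exists. If a download is interrupted, `wget -O` can leave an empty or partial file behind, and the next run would then skip the download. A hedged sketch of a sanity check follows: it compares the first bytes of the file with the HDF5 format signature. The helper name `looks_like_hdf5` is hypothetical, and the check only inspects offset 0, which is an assumption that holds for typical Keras `.hdf5` files.

```python
from pathlib import Path

# 8-byte signature that starts an HDF5 file (checked at offset 0 only).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file exists and begins with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


model_file = Path("models/emotion_model.hdf5")
if model_file.exists() and not looks_like_hdf5(model_file):
    print("File does not look like HDF5; delete it and download again:", model_file)
```

This does not validate the model weights; it only catches the common case of an empty file or an HTML error page saved under the model's name.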


## 3) Upload the video (MP4)

In [3]:
from google.colab import files
uploaded = files.upload()
video_filename = next(iter(uploaded.keys()))
print("Video uploaded:", video_filename)

Saving Unlocking Facial Recognition_ Diverse Activities Analysis (1).mp4 to Unlocking Facial Recognition_ Diverse Activities Analysis (1).mp4
Video uploaded: Unlocking Facial Recognition_ Diverse Activities Analysis (1).mp4
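
The upload cell takes the first key of the returned dictionary, whatever its extension. If several files are selected at once, a small filter avoids handing a non-video file to OpenCV. This is only a sketch: `pick_video` and the extension list are assumptions made for illustration, not part of the original notebook.

```python
from pathlib import Path

# Assumed set of accepted extensions; the notebook itself only mentions MP4.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def pick_video(filenames):
    """Return the first filename whose extension looks like a video."""
    for name in filenames:
        if Path(name).suffix.lower() in VIDEO_EXTENSIONS:
            return name
    raise ValueError("No video file found among: " + ", ".join(filenames))


# In the notebook this would be: video_filename = pick_video(list(uploaded.keys()))
print(pick_video(["notes.txt", "clip (1).mp4"]))
```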


## 4) Process the video (InsightFace + emotions)

In [4]:
import cv2, json, numpy as np
from insightface.app import FaceAnalysis
from tensorflow.keras.models import load_model
from collections import Counter

# Initialize InsightFace (face detection)
app = FaceAnalysis(name="buffalo_l")
try:
    app.prepare(ctx_id=0, det_size=(640, 640))  # try GPU first
except Exception:
    app.prepare(ctx_id=-1, det_size=(640, 640))  # fall back to CPU

# Load the emotion model
emotion_model = load_model("models/emotion_model.hdf5", compile=False)

# Read the input size the model expects: (None, H, W, C)
input_shape = emotion_model.input_shape
if isinstance(input_shape, list):
    input_shape = input_shape[0]

H, W = int(input_shape[1]), int(input_shape[2])
C = int(input_shape[3]) if len(input_shape) > 3 else 1

print("Emotion model expects:", (H, W, C))

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

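cap = cv2.VideoCapture(video_filename)
if not cap.isOpened():
    # Fail early instead of silently writing an empty output video
    raise RuntimeError(f"Could not open video: {video_filename}")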
fps = cap.get(cv2.CAP_PROP_FPS) or 30
frame_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter(
    "output/video_output.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps,
    (frame_w, frame_h)
)

activity_counter = Counter()
emotion_counter = Counter()
total_frames = 0
total_faces = 0
prev_gray = None

while True:
    ret, frame = cap.read()
    if not ret:
        break
    total_frames += 1

    # Estimate activity from the mean absolute difference between
    # consecutive grayscale frames (global motion, not per person)
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev_gray is None:
        activity = "still"
    else:
        diff = cv2.absdiff(gray_frame, prev_gray)
        mean_diff = float(np.mean(diff))
        if mean_diff < 5:
            activity = "still"
        elif mean_diff < 20:
            activity = "light motion"
        else:
            activity = "intense motion"
    prev_gray = gray_frame
    activity_counter[activity] += 1

    faces = app.get(frame)

    for face in faces:
        x1, y1, x2, y2 = map(int, face.bbox)
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(frame_w - 1, x2), min(frame_h - 1, y2)

        if (x2 - x1) < 30 or (y2 - y1) < 30:
            continue

        total_faces += 1
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        face_img = frame[y1:y2, x1:x2]
        if face_img.size == 0:
            continue

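        gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
        gray = cv2.equalizeHist(gray)
        gray = cv2.resize(gray, (W, H), interpolation=cv2.INTER_AREA)
        # The reference face_classification code scales inputs to [-1, 1]
        # (x / 255, then (x - 0.5) * 2), so the same scaling is applied here.
        gray = (gray.astype("float32") / 255.0 - 0.5) * 2.0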

        if C == 1:
            inp = gray.reshape(1, H, W, 1)
        else:
            inp = np.stack([gray, gray, gray], axis=-1).reshape(1, H, W, 3)

        preds = emotion_model.predict(inp, verbose=0)[0]
        emo = EMOTIONS[int(np.argmax(preds))]
        emotion_counter[emo] += 1

        cv2.rectangle(frame, (x1, max(0, y1 - 45)), (x2, max(0, y1 - 10)), (0, 0, 0), -1)
        cv2.putText(frame, emo, (x1, max(0, y1 - 15)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2, cv2.LINE_AA)

    cv2.putText(frame, f"Activity: {activity}", (10, frame_h - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

    out.write(frame)

cap.release()
out.release()

summary = {
    "total_frames": total_frames,
    "total_faces_analyzed": total_faces,
    "activities": dict(activity_counter),
    "emotions": dict(emotion_counter),
}

with open("output/summary.json", "w", encoding="utf-8") as f:
    json.dump(summary, f, indent=4, ensure_ascii=False)

with open("output/summary.txt", "w", encoding="utf-8") as f:
    f.write("TECH CHALLENGE 4\n")
    f.write("======================================\n\n")
    f.write(f"Total frames analyzed: {total_frames}\n")
    f.write(f"Total faces analyzed: {total_faces}\n\n")
    f.write("Detected activities (count):\n")
    for k, v in activity_counter.items():
        f.write(f" - {k}: {v}\n")
    f.write("\nDetected emotions (count):\n")
    for k, v in emotion_counter.items():
        f.write(f" - {k}: {v}\n")

print("OK: processing finished. Files are in output/")

download_path: /root/.insightface/models/buffalo_l
Downloading /root/.insightface/models/buffalo_l.zip from https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip...


100%|██████████| 281857/281857 [00:02<00:00, 101055.10KB/s]


Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /root/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /root/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /root/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /root/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /root/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Emotion mod
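
The processing loop makes two small decisions on every frame: it labels the frame's activity from the mean absolute difference with the previous frame, and it clamps each detected bounding box to the frame while discarding boxes smaller than 30 pixels on a side. The sketch below isolates both as pure functions so the thresholds (5 and 20) can be tried on tiny inputs without a video. The function names are hypothetical, plain Python lists stand in for grayscale frames, and the labels are written in English.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized 2-D frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def classify_activity(mean_diff, still_below=5.0, light_below=20.0):
    """Map a mean frame difference to one of three activity labels."""
    if mean_diff < still_below:
        return "still"
    if mean_diff < light_below:
        return "light motion"
    return "intense motion"


def clamp_box(bbox, frame_w, frame_h, min_side=30):
    """Clamp a box to the frame; return None when it is too small to analyze."""
    x1, y1, x2, y2 = (int(v) for v in bbox)
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(frame_w - 1, x2), min(frame_h - 1, y2)
    if (x2 - x1) < min_side or (y2 - y1) < min_side:
        return None
    return x1, y1, x2, y2


# Two 2x2 "frames" whose pixels all differ by 10 gray levels.
previous = [[0, 0], [0, 0]]
current = [[10, 10], [10, 10]]
print(classify_activity(mean_abs_diff(previous, current)))

# A detection that spills past the left and right edges of a 640x480 frame.
print(clamp_box((-10.5, 5.2, 700.9, 200.0), 640, 480))
```

Because the difference is averaged over the whole frame, camera movement or a scene cut raises the score just as a moving person does, so the labels describe global motion rather than a specific activity.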

## 5) View the video and download the results

In [5]:
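from IPython.display import Video, display
# embed=True inlines the file, since Colab may not serve local paths to the player.
# The "mp4v" stream written by OpenCV may still not play in the browser;
# the downloaded file can be opened in a desktop player.
display(Video("output/video_output.mp4", embed=True))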

from google.colab import files
files.download("output/video_output.mp4")
files.download("output/summary.txt")
files.download("output/summary.json")
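
The summary files store raw counts. To read them at a glance, the counts can be turned into percentage shares. The sketch below assumes the `emotions` and `activities` keys written to `output/summary.json` in section 4; the helper names `shares` and `load_summary` are hypothetical.

```python
import json
from pathlib import Path


def shares(counts):
    """Convert a {label: count} mapping into percentages, largest first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {label: round(100.0 * n / total, 1) for label, n in ordered}


def load_summary(path="output/summary.json"):
    """Load the JSON summary produced by the processing cell."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Only runs when the summary file from section 4 is present.
if Path("output/summary.json").exists():
    summary = load_summary()
    print("Emotions (%):", shares(summary.get("emotions", {})))
    print("Activities (%):", shares(summary.get("activities", {})))
```

Emotion shares are computed per analyzed face and activity shares per frame, so the two sets of percentages are not directly comparable.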
