AI Room Guard Project Report

1. System Architecture

The AI Room Guard system integrates multiple AI components to monitor a room and detect unauthorized individuals. The architecture consists of the following modules:

Activation Module: Listens for a voice activation phrase "Guard my room" using audio recording and Whisper ASR.

Video Capture Module: Records a short webcam video upon activation.

Face Recognition Module: Uses DeepFace with RetinaFace detector to identify faces in the video by comparing against enrolled trusted persons.

Escalation Module: If unknown faces are detected, generates escalating warning messages using a language model (FLAN-T5) and plays them via text-to-speech (gTTS).

Logging Module: Logs events such as enrollments, guard cycles, and escalations for audit.



In [None]:
# #Architecture Diagram

# +----------------+       +----------------+       +--------------------+
# | Activation     |       | Video Capture  |       | Face Recognition   |
# | (Whisper ASR)  | ----> | (Webcam Video) | ----> | (DeepFace + RetinaFace) |
# +----------------+       +----------------+       +--------------------+
#           |                                                  |
#           |                                                  v
#           |                                         +----------------+
#           |                                         | Escalation     |
#           |                                         | (LLM + TTS)    |
#           |                                         +----------------+
#           |                                                  |
#           |                                                  v
#           |                                         +----------------+
#           |                                         | Logging Module  |
#           |                                         +----------------+
#           v
# +----------------+
# | User Interface |
# +----------------+

2.**Integration Challenges and Solutions**

**Audio Recording in Colab:** Native audio record
ing widgets do not block execution.

Solved by implementing a JavaScript MediaRecorder with fixed-duration recording and synchronous data transfer via output.eval_js().

**Video Recording Timing:** The video recorder UI returns immediately, but the video file is saved asynchronously after user stops recording.

Solved by polling the filesystem for new video files and verifying file stability before processing.

**DeepFace and TensorFlow Compatibility:** RetinaFace requires tf-keras package with TensorFlow 2.19+. Installed tf-keras explicitly and managed TensorFlow versions to avoid conflicts.

**Package Management in Colab:** Conflicts and uninstall issues arose due to pre-installed packages and colab timeout issues, even after restarting the notebook, they didnt get fixed.

solved by using new notebook with another gmail and used the same code

**Face Recognition Accuracy:** Initial misrecognition due to enrollment image issues and threshold settings.

 Improved by verifying enrollment images, lowering distance threshold, and adding multiple images per person.

3.**Ethical Considerations and Testing Results
Privacy**

 The system processes video and audio locally in Colab session without uploading to external servers, minimizing privacy risks.

Consent: Only enrolled trusted persons’ images are stored; unknown faces trigger warnings but no data is stored beyond session logs.

Bias and Accuracy: Face recognition accuracy depends on quality and diversity of enrollment images. Testing showed good recognition for enrolled persons but potential false positives if enrollment is insufficient.

Testing: The system was tested with multiple users. Activation phrase detection was robust. Face recognition correctly identified enrolled users and escalated on unknown faces. Escalation messages played clearly and in increasing urgency.

In [None]:
!pip install -U numpy==1.26.4 protobuf==4.25.3
!apt-get update -y && apt-get install -y ffmpeg libportaudio2 portaudio19-dev
!pip install -q git+https://github.com/openai/whisper.git
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
import numpy, torch, whisper
print("✅ NumPy version:", numpy.__version__)
print("✅ Torch version:", torch.__version__)
whisper_model = whisper.load_model("small")
print("✅ Whisper model loaded successfully and ready to use!")


Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://cli.github.com/packages stable InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:5 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading packag

100%|████████████████████████████████████████| 461M/461M [00:04<00:00, 112MiB/s]


✅ Whisper model loaded successfully and ready to use!


In [None]:
#preparing project directory
import os

PROJECT_DIR = "/content/drive/MyDrive/AI_Guard"
os.makedirs(PROJECT_DIR, exist_ok=True)
print("Project directory set to:", PROJECT_DIR)


Project directory set to: /content/drive/MyDrive/AI_Guard


In [None]:
# Browser recorder (audio + webcam video) and upload helper
from IPython.display import HTML, display
from google.colab import output  #Colab output bridge to register callbacks
import base64, uuid, os  # base64 for decoding, uuid for unique filenames, os for filesystem ops

def _save_base64_to_file(b64_string, filename):  # to save base64-encoded data to a file
    header, encoded = b64_string.split(",", 1)
    data = base64.b64decode(encoded)
    with open(filename, "wb") as f:
        f.write(data)
    return filename

def record_audio_browser(default_filename=None):
    """
    Shows a button to record audio in browser and saves to /content/<uuid>_audio.webm.
    Returns the expected filename immediately; the file is saved asynchronously by JS callback.
    """
    if default_filename is None:
        default_filename = f"/content/{uuid.uuid4().hex}_audio.webm"  # creating a unique filename

    # JavaScript is to record audio and send base64 back to Python via google.colab.kernel.invokeFunction
    display(HTML(f"""
    <button id="audio_rec_button">Start recording audio</button>
    <script>
    const button = document.getElementById('audio_rec_button');
    let mediaRecorder;
    let chunks = [];
    button.onclick = async () => {{
      if (!mediaRecorder) {{
        const stream = await navigator.mediaDevices.getUserMedia({{ audio: true }});
        mediaRecorder = new MediaRecorder(stream);
        mediaRecorder.ondataavailable = e => chunks.push(e.data);
        mediaRecorder.onstop = async e => {{
          const blob = new Blob(chunks, {{type:'audio/webm'}}); chunks = [];
          const reader = new FileReader();
          reader.readAsDataURL(blob);
          reader.onloadend = () => {{
            const base64data = reader.result;
            google.colab.kernel.invokeFunction('notebook.save_audio', [base64data, '{default_filename}'], {{}});
          }};
        }};
        mediaRecorder.start();
        button.innerText = "Stop recording audio";
      }} else {{
        mediaRecorder.stop();
        mediaRecorder = null;
        button.innerText = "Start recording audio";
      }}
    }};
    </script>
    """))

    def _save_audio_callback(b64, fname):
        _save_base64_to_file(b64, fname)
        print("Saved audio to", fname)
    output.register_callback('notebook.save_audio', _save_audio_callback)  # register callback in Colab
    print("Click the 'Start recording audio' button above, speak, then click again to stop.")
    return default_filename

def record_video_browser(default_filename=None, width=640, height=480, fps=24):
    """
    Shows a button to record webcam video in the browser and saves to /content/<uuid>_video.webm.
    Returns expected filename immediately; actual file saved via JS callback after stop.
    """
    if default_filename is None:
        default_filename = f"/content/{uuid.uuid4().hex}_video.webm"
    # HTML + JS for video recording; sends data back via google.colab.kernel.invokeFunction
    display(HTML(f"""
    <button id="video_rec_button">Start/Stop video recording</button>
    <video id="video_preview" autoplay muted width="{width}" height="{height}"></video>
    <script>
    const btn = document.getElementById('video_rec_button');
    const vid = document.getElementById('video_preview');
    let mr;
    btn.onclick = async () => {{
      if (!mr) {{
        const stream = await navigator.mediaDevices.getUserMedia({{ video: {{width: {width}, height: {height}}}, audio: true }});
        vid.srcObject = stream;
        mr = new MediaRecorder(stream);
        const chunks = [];
        mr.ondataavailable = e => chunks.push(e.data);
        mr.onstop = async () => {{
          const blob = new Blob(chunks, {{type:'video/webm'}});
          const reader = new FileReader();
          reader.readAsDataURL(blob);
          reader.onloadend = () => {{
            const base64data = reader.result;
            google.colab.kernel.invokeFunction('notebook.save_video', [base64data, '{default_filename}'], {{}});
          }};
        }};
        mr.start();
        btn.innerText = 'Stop recording';
      }} else {{
        mr.stop();
        mr = null;
        vid.srcObject.getTracks().forEach(t => t.stop());
        btn.innerText = 'Start/Stop video recording';
      }}
    }};
    </script>
    """))  # rendering the UI in output cell
    # Python-side callback to write base64 to file
    def _save_video_callback(b64, fname):
        _save_base64_to_file(b64, fname)
        print("Saved video to", fname)
    output.register_callback('notebook.save_video', _save_video_callback)
    print("Click 'Start/Stop video recording', perform the action, then stop to save the file.")
    return default_filename

def upload_file_widget():
    """Fallback upload helper using Colab's file picker; returns list of saved file paths."""
    from google.colab import files
    uploaded = files.upload()
    saved = []
    for name in uploaded.keys():
        path = "/content/" + name
        saved.append(path)
    print("Uploaded files saved to:", saved)
    return saved


In [None]:
import cv2  # For frame handling and video I/O
import json
import numpy as np
import time
from datetime import datetime
import os


LOG_FILE = os.path.join(PROJECT_DIR, "ai_guard_log.json")
ENROLL_PATH = os.path.join(PROJECT_DIR, "enrollments.json")
def log_event(event_type, details):
    """
    Append an event entry into LOG_FILE with timestamp, event_type and details dictionary.
    """
    entry = {"timestamp": datetime.utcnow().isoformat()+"Z", "type": event_type, "details": details}
    logs = []
    if os.path.exists(LOG_FILE):
        try:
            logs = json.load(open(LOG_FILE, "r"))  #
        except Exception:
            logs = []  # fallback to empty if read fails
    logs.append(entry)
    with open(LOG_FILE, "w") as f:
        json.dump(logs, f, indent=2)  #
    print("Logged event:", event_type, "at", entry["timestamp"])


Milestone 1: ASR Activation using Whisper

In [None]:
import whisper
whisper_model = whisper.load_model("small")

ACTIVATION_KEYPHRASES = ["guard my room", "guard mode", "start guarding", "start guard"]

def transcribe_audio(path):
    """
    Transcribe the audio file at 'path' with Whisper and return the lowercased transcript string.
    """
    result = whisper_model.transcribe(path)
    text = result.get("text", "").strip().lower()
    return text

def check_for_activation_from_file(audio_path):
    """
    Run ASR on a recorded audio file and check whether any activation phrase appears.
    Returns (activated_bool, transcript)
    """
    transcript = transcribe_audio(audio_path)
    activated = any(phrase in transcript for phrase in ACTIVATION_KEYPHRASES)
    log_event("activation_check", {"audio_file": audio_path, "transcript": transcript, "activated": activated})
    print("Transcript:", transcript)
    print("Activated:", activated)
    return activated, transcript


Milestone 2: Enrollment & recognition with DeepFace

In [None]:

!pip install --quiet deepface mediapipe opencv-python-headless==4.9.0.80 \
    git+https://github.com/openai/whisper.git transformers torch torchvision torchaudio \
    gTTS pydub playsound

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.6/49.6 MB[0m [31m22.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m128.3/128.3 kB[0m [31m11.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m35.6/35.6 MB[0m [31m27.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m98.2/98.2 kB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m115.9/115.9 kB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.0/85.0 kB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [

In [None]:
from deepface import DeepFace
from datetime import datetime

def load_enrollments():
    """Load enrollments dictionary from ENROLL_PATH; return {} if none found."""
    if os.path.exists(ENROLL_PATH):
        try:
            return json.load(open(ENROLL_PATH, "r"))
        except Exception:
            return {}
    return {}

def save_enrollments(enrollments):
    """Save enrollments dict to ENROLL_PATH."""
    with open(ENROLL_PATH, "w") as f:
        json.dump(enrollments, f, indent=2)
    print("Enrollments saved to", ENROLL_PATH)

def enroll_person_from_images(name, image_paths):
    """
    Enroll a person by saving the provided image file paths under their name.
    (DeepFace will compute embeddings at recognition time; for simplicity we store image references.)
    """
    enrollments = load_enrollments()
    enrollments[name] = {"images": image_paths, "metadata": {"enrolled_on": datetime.utcnow().isoformat()+"Z"}}
    save_enrollments(enrollments)
    log_event("enrollment", {"name": name, "images": image_paths})
    print(f"Enrolled {name} with {len(image_paths)} image(s).")
    return True

def recognize_faces_in_frame_deepface(frame, model_name="Facenet", distance_threshold=0.20):
    """
    Recognize faces in a single frame (numpy array) by comparing to enrolled images using DeepFace.verify.
    Returns a list of detections: each detection is {'name': str_or_None, 'distance': float_or_None}
    """
    enrollments = load_enrollments()
    detections = []

    if len(enrollments) == 0:
        return [{"name": None, "distance": None}]

    for person_name, data in enrollments.items():
        for ref_img in data.get("images", []):
            try:

                result = DeepFace.verify(frame, ref_img, model_name=model_name, enforce_detection=False)# DeepFace.verify compares 'frame' and 'ref_img'; enforce_detection=False allows low-quality inputs

                if result.get("verified", False) and result.get("distance") is not None:

                    if result["distance"] <= distance_threshold:
                        detections.append({"name": person_name, "distance": float(result["distance"])})
                        return detections
            except Exception as e:


                print("DeepFace exception for", person_name, ref_img, "->", str(e))

    detections.append({"name": None, "distance": None})
    return detections


In [None]:
#Test recognition on a recorded/uploaded video
# Analyzing recorded video and run recognition on sampled frames
def analyze_video_for_recognition(video_path, sample_every_n_frames=10, distance_threshold=0.40):
    """
    Open video_path, sample frames at interval sample_every_n_frames, run recognize_faces_in_frame_deepface on each,
    collect and return results as a list of {"frame": int, "detections": [...] } entries.
    """
    results = []
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Unable to open video:", video_path)
        return results
    frame_index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_index % sample_every_n_frames == 0:

            detections = recognize_faces_in_frame_deepface(frame, distance_threshold=distance_threshold)
            results.append({"frame": frame_index, "detections": detections})
        frame_index += 1
    cap.release()
    log_event("video_recognition", {"video": video_path, "frames_sampled": len(results)})
    print("Processed", frame_index, "frames; sampled", len(results), "frames for recognition.")
    return results


In [None]:
import matplotlib.pyplot as plt
from deepface import DeepFace
import cv2, os

def preview_recognition(video_path, sample_every_n_frames=10):
    """
    Shows sampled frames with detected faces and recognition result text below each.
    """
    cap = cv2.VideoCapture(video_path)
    frame_idx, shown = 0, 0
    plt.figure(figsize=(12, 6))
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % sample_every_n_frames == 0:
            try:
                # run deepface analysis to detect faces
                det = DeepFace.extract_faces(frame, enforce_detection=False)
                if len(det) == 0:
                    print(f"[{frame_idx}] No face detected")
                else:
                    plt.subplot(1, 3, shown + 1)
                    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                    plt.axis("off")
                    plt.title(f"Frame {frame_idx}")
                    shown += 1
            except Exception as e:
                print(f"[{frame_idx}] Detection error:", e)
            if shown == 3:
                break
        frame_idx += 1
    cap.release()
    plt.show()


Milestone 3: Escalation logic (LLM + TTS)

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from gtts import gTTS
from IPython.display import Audio, display
import tempfile


tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
llm_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def generate_escalation_text(level, context_text):
    """
    Create a short escalation message for a specified level using the LLM.
    Level 1: polite question; Level 2: firm request; Level 3: alarm/threaten to notify authorities.
    """

    templates = {
        1: "You are a polite room guard. Someone unfamiliar is at the door. Ask them who they are. Context: {}",
        2: "You are a firmer room guard. You suspect an intruder. Tell them to leave politely but firmly. Context: {}",
        3: "You are an alarm voice. The intruder ignored warnings. Warn loudly and say you will notify authorities. Context: {}"
    }
    prompt = templates.get(level, templates[1]).format(context_text)
    inputs = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = llm_model.generate(inputs, max_new_tokens=80)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return text

def tts_play_text(text, lang="en"):
    """
    Convert text to speech (MP3) using gTTS, save to a temporary file and return an IPython Audio object.
    """
    tts = gTTS(text=text, lang=lang)
    tmp = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    tmp_name = tmp.name
    tmp.close()
    tts.save(tmp_name)  # saving synthesized speech to temp mp3
    audio_obj = Audio(tmp_name, autoplay=True)
    return audio_obj, tmp_name


Integration: Full agent formation

In [None]:
from IPython.display import display, Javascript
from google.colab import output
import base64, time

AUDIO_PATH = "/content/activation_audio.webm"

def record_audio_colab():
    """
    Real-time recorder for Colab that waits until the file is saved.
    """
    import os
    if os.path.exists(AUDIO_PATH):
        os.remove(AUDIO_PATH)

    def _save_audio(b64_audio):
        header, data = b64_audio.split(',', 1)

        import base64
        audio_bytes = base64.b64decode(data)
        with open(AUDIO_PATH, "wb") as f:

            f.write(audio_bytes)
        print(f"✅ Audio saved to {AUDIO_PATH}")

    output.register_callback("notebook.save_audio", _save_audio)

    js = Javascript("""
    async function recordAudio() {
      const stream = await navigator.mediaSettings();
      if (!stream) {
        window.__stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      }
      const recorder = new MediaRecorder(window.__stream);
      const chunks = [];
      recorder.ondataavailable = e => chunks.push(e.data);
      recorder.onstop = e => {
        const blob = new Blob(chunks, {type: 'audio/webm'});
        const reader = new FileReader();
        reader.readAsDataURL(blob);
        reader.onloadend = () => {
          google.colab.kernel.invokeFunction('notebook.save_audio', [reader.result], {});
        };
      };
      recorder.start();
      window._recorder = recorder;
      document.body.innerHTML = `
        <button onclick="window._recorder.stop()">🎙️ Stop Recording</button>
        <p>Recording... speak clearly and click Stop when done.</p>
      `;
    }
    recordAudio();
    """)

    display(js)
    print("🎙️ Recording started — say 'Guard my room' and click Stop when done.")

    # Waiting up to 120 seconds for the JS callback to actually save
    for i in range(120):
        if os.path.exists(AUDIO_PATH):
            print("✅ Audio file detected — ready to use!")
            break
        time.sleep(1)
    else:
        print("❌ Audio not saved (no mic permission or JS error). Please re-run.")


In [91]:
import time, os
from IPython.display import Javascript, display
from google.colab import output
from base64 import b64decode

def run_guard_agent_interactive():
    """
    Interactive guard agent for Colab - FULLY FIXED VERSION with fallback detection
    """


    print("🪄 Step 1: Record activation audio ('Guard my room')")

    AUDIO_PATH = "/content/activation_audio.webm"
    AUDIO_DURATION = 5

    # Removing old audio file
    if os.path.exists(AUDIO_PATH):
        os.remove(AUDIO_PATH)

    # JavaScript audio recorder
    RECORD_AUDIO_JS = """
    const sleep = time => new Promise(resolve => setTimeout(resolve, time));
    const b2text = blob => new Promise(resolve => {
      const reader = new FileReader();
      reader.onloadend = e => resolve(e.srcElement.result);
      reader.readAsDataURL(blob);
    });

    var record = time => new Promise(async resolve => {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const recorder = new MediaRecorder(stream);
      const chunks = [];

      recorder.ondataavailable = e => chunks.push(e.data);
      recorder.start();
      await sleep(time);

      recorder.onstop = async () => {
        const blob = new Blob(chunks);
        const text = await b2text(blob);
        resolve(text);
      };
      recorder.stop();
    });
    """

    print(f"🎙️ Recording for {AUDIO_DURATION} seconds...")
    print("🗣️ Say 'Guard my room' clearly...")

    display(Javascript(RECORD_AUDIO_JS))
    audio_data = output.eval_js(f'record({AUDIO_DURATION * 1000})')

    binary_data = b64decode(audio_data.split(',')[1])
    with open(AUDIO_PATH, 'wb') as f:
        f.write(binary_data)

    print(f"✅ Audio saved ({len(binary_data)} bytes)")

    # Checking activation phrase
    activated, transcript = check_for_activation_from_file(AUDIO_PATH)
    print(f"📝 Transcript: {transcript}")
    if not activated:
        print("❌ Activation phrase not detected. Try again clearly.")
        return
    print("✅ Activation detected! Proceeding to video capture...")


    # VIDEO RECORDING
    print("📹 Step 2: Record a short 6–10 s webcam video of your room.")
    print("👉 Click 'Start/Stop video recording' button")
    print("📸 Click START, record 8-10 seconds, then click STOP")

    # Tracking existing videos BEFORE calling recorder
    existing_videos = set(f for f in os.listdir("/content") if f.endswith("_video.webm"))

    # Displaying the recorder UI
    returned_path = record_video_browser()
    print(f"⏳ Waiting for you to click STOP and save the video...")

    # Waiting for NEW video file to appear
    video_path = None
    for i in range(180):  # 3 minutes max
        time.sleep(1)

        try:
            current_videos = set(f for f in os.listdir("/content") if f.endswith("_video.webm"))
            new_videos = current_videos - existing_videos

            if new_videos:
                video_filename = list(new_videos)[0]
                video_path = f"/content/{video_filename}"

                # Waiting for file to be fully written
                file_size = os.path.getsize(video_path)
                time.sleep(2)
                new_size = os.path.getsize(video_path)

                if new_size == file_size and file_size > 1000:
                    print(f"✅ Video file detected: {video_path} ({file_size} bytes)")
                    break
        except Exception as e:
            continue

    # Fallback: If loop timed out, check for ANY video saved
    if not video_path:
        print("⚠️ Loop timed out, checking for saved videos...")
        all_videos = [f"/content/{f}" for f in os.listdir("/content") if f.endswith("_video.webm")]

        if all_videos:
            # Getting the most recent video
            video_path = max(all_videos, key=os.path.getmtime)
            print(f"✅ Found saved video: {video_path}")
        else:
            print("❌ No video files found. Please re-run.")
            return


    #FACE RECOGNITION
    print("\n🔍 Analyzing faces in video...")
    try:
        results = analyze_video_for_recognition(video_path, sample_every_n_frames=8)
    except Exception as e:
        print(f"❌ Error analyzing video: {e}")
        return

    unknown_detected = False
    recognized_people = []

    for r in results:
        print(f"  Frame {r['frame']}: {len(r['detections'])} face(s) detected")
        for d in r["detections"]:
            if d.get("name") is None:
                unknown_detected = True
                print("    ❌ Unknown person detected")
            else:
                recognized_people.append(d['name'])
                print(f"    ✅ Recognized: {d['name']} (distance: {d['distance']:.2f})")

    print(f"\n{'='*60}")
    if recognized_people:
        print(f"Trusted people detected: {set(recognized_people)}")

    if not unknown_detected:
        print("🟢 All trusted persons only — guard cycle complete.")
        log_event("guard_cycle_complete", {"video": video_path, "status": "trusted_only"})
        return


    #ESCALATION
    print("\n🚨 Unknown individual detected — starting escalation sequence...")
    context_text = f"Unknown person detected in video {video_path}"

    for level in [1, 2, 3]:
        print(f"\n{'='*60}")
        print(f"📢 ESCALATION LEVEL {level}")

        msg = generate_escalation_text(level, context_text)
        print(f"Message: {msg}")

        audio_obj, audio_file = tts_play_text(msg)
        display(audio_obj)

        log_event("escalation_message", {
            "level": level,
            "message": msg,
            "audio_file": audio_file,
            "video": video_path
        })

        time.sleep(3)

    print(f"\n{'='*60}")
    print("✅ Guard agent cycle complete!")


Stretch: Keyword spotter

A simple keyword spotter implemented using Whisper transcripts, it provides an additional activation pathway.

In [None]:
KEYWORDS = ["guard", "guard my room", "guard mode", "start guard"]

def keyword_spotter_on_audio(audio_path, keywords=KEYWORDS):
    """
    Run ASR via Whisper and identify which keywords occur in the transcript. Return (found_keywords, transcript).
    """
    transcript = transcribe_audio(audio_path)
    found = [kw for kw in keywords if kw in transcript]
    log_event("keyword_spot", {"audio": audio_path, "transcript": transcript, "found": found})
    print("Transcript:", transcript)
    print("Detected keywords:", found)
    return found, transcript



In [90]:
#Uploading pictures of trusted people
#1) Mine
uploaded = upload_file_widget()

enroll_person_from_images("Pramod", uploaded)


Saving WhatsApp Image 2025-10-07 at 10.15.55_d235b59b.jpg to WhatsApp Image 2025-10-07 at 10.15.55_d235b59b (1).jpg
Saving WhatsApp Image 2025-10-07 at 10.23.51_a30db305.jpg to WhatsApp Image 2025-10-07 at 10.23.51_a30db305 (1).jpg
Saving WhatsApp Image 2025-10-07 at 10.24.17_9e2fef6b.jpg to WhatsApp Image 2025-10-07 at 10.24.17_9e2fef6b (1).jpg
Uploaded files saved to: ['/content/WhatsApp Image 2025-10-07 at 10.15.55_d235b59b (1).jpg', '/content/WhatsApp Image 2025-10-07 at 10.23.51_a30db305 (1).jpg', '/content/WhatsApp Image 2025-10-07 at 10.24.17_9e2fef6b (1).jpg']
Enrollments saved to /content/drive/MyDrive/AI_Guard/enrollments.json
Logged event: enrollment at 2025-10-08T06:23:25.771472Z
Enrolled Pramod with 3 image(s).


True

In [None]:
#2nd person
uploaded = upload_file_widget()
enroll_person_from_images("Narender", uploaded)

Saving WhatsApp Image 2025-10-08 at 08.51.24_d223d7e1.jpg to WhatsApp Image 2025-10-08 at 08.51.24_d223d7e1.jpg
Saving WhatsApp Image 2025-10-08 at 08.50.14_8d0d491b.jpg to WhatsApp Image 2025-10-08 at 08.50.14_8d0d491b.jpg
Saving WhatsApp Image 2025-10-08 at 08.50.34_0381fb6e.jpg to WhatsApp Image 2025-10-08 at 08.50.34_0381fb6e.jpg
Saving WhatsApp Image 2025-10-07 at 10.21.27_f7581467.jpg to WhatsApp Image 2025-10-07 at 10.21.27_f7581467 (1).jpg
Uploaded files saved to: ['/content/WhatsApp Image 2025-10-08 at 08.51.24_d223d7e1.jpg', '/content/WhatsApp Image 2025-10-08 at 08.50.14_8d0d491b.jpg', '/content/WhatsApp Image 2025-10-08 at 08.50.34_0381fb6e.jpg', '/content/WhatsApp Image 2025-10-07 at 10.21.27_f7581467 (1).jpg']
Enrollments saved to /content/drive/MyDrive/AI_Guard/enrollments.json
Logged event: enrollment at 2025-10-08T03:23:35.926381Z
Enrolled Narender with 4 image(s).


True

In [None]:
#3rd person
uploaded = upload_file_widget()
enroll_person_from_images("rohan", uploaded)

Saving WhatsApp Image 2025-10-07 at 10.23.14_4e010b39.jpg to WhatsApp Image 2025-10-07 at 10.23.14_4e010b39.jpg
Saving WhatsApp Image 2025-10-07 at 10.22.48_ad02b1bd.jpg to WhatsApp Image 2025-10-07 at 10.22.48_ad02b1bd.jpg
Saving WhatsApp Image 2025-10-07 at 10.18.44_bd1e727f.jpg to WhatsApp Image 2025-10-07 at 10.18.44_bd1e727f.jpg
Uploaded files saved to: ['/content/WhatsApp Image 2025-10-07 at 10.23.14_4e010b39.jpg', '/content/WhatsApp Image 2025-10-07 at 10.22.48_ad02b1bd.jpg', '/content/WhatsApp Image 2025-10-07 at 10.18.44_bd1e727f.jpg']
Enrollments saved to /content/drive/MyDrive/AI_Guard/enrollments.json
Logged event: enrollment at 2025-10-07T20:42:14.921920Z
Enrolled rohan with 3 image(s).


True

In [None]:
# Testing the recorded audio for activation phrase
activated, transcript = check_for_activation_from_file("/content/activation_audio.webm")
print("📝 Transcript:", transcript)
print("🟢 Activated:", activated)


Logged event: activation_check at 2025-10-07T21:53:51.594081Z
Transcript: guard my room.
Activated: True
📝 Transcript: guard my room.
🟢 Activated: True


In [100]:
run_guard_agent_interactive()
#For trused face


🪄 Step 1: Record activation audio ('Guard my room')
🎙️ Recording for 5 seconds...
🗣️ Say 'Guard my room' clearly...


<IPython.core.display.Javascript object>

✅ Audio saved (77576 bytes)
Logged event: activation_check at 2025-10-08T06:35:09.636747Z
Transcript: guard my room.
Activated: True
📝 Transcript: guard my room.
✅ Activation detected! Proceeding to video capture...
📹 Step 2: Record a short 6–10 s webcam video of your room.
👉 Click 'Start/Stop video recording' button
📸 Click START, record 8-10 seconds, then click STOP


Click 'Start/Stop video recording', perform the action, then stop to save the file.
⏳ Waiting for you to click STOP and save the video...
⚠️ Loop timed out, checking for saved videos...
✅ Found saved video: /content/270da10dede14c2aa41118f81c8ddcf9_video.webm

🔍 Analyzing faces in video...
Logged event: video_recognition at 2025-10-08T06:39:39.587283Z
Processed 243 frames; sampled 31 frames for recognition.
  Frame 0: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 8: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 16: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 24: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 32: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 40: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 48: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame 56: 1 face(s) detected
    ✅ Recognized: Pramod (distance: 0.40)
  Frame

In [98]:
run_guard_agent_interactive()
#For Trusted Face


🪄 Step 1: Record activation audio ('Guard my room')
🎙️ Recording for 5 seconds...
🗣️ Say 'Guard my room' clearly...


<IPython.core.display.Javascript object>

✅ Audio saved (79508 bytes)
Logged event: activation_check at 2025-10-08T06:26:56.523032Z
Transcript: i just ran the code. guard my room.
Activated: True
📝 Transcript: i just ran the code. guard my room.
✅ Activation detected! Proceeding to video capture...
📹 Step 2: Record a short 6–10 s webcam video of your room.
👉 Click 'Start/Stop video recording' button
📸 Click START, record 8-10 seconds, then click STOP


Click 'Start/Stop video recording', perform the action, then stop to save the file.
⏳ Waiting for you to click STOP and save the video...
⚠️ Loop timed out, checking for saved videos...
✅ Found saved video: /content/ef7b3ec82daa4fa4965a52602a628750_video.webm

🔍 Analyzing faces in video...
Logged event: video_recognition at 2025-10-08T06:33:29.279746Z
Processed 340 frames; sampled 43 frames for recognition.
  Frame 0: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 8: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 16: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 24: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 32: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 40: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 48: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 56: 1 face(s) detected
    ✅ Recognized: Narender (distanc

In [None]:
#Using Key Spotters
run_guard_agent_interactive()
#3

🪄 Step 1: Record activation audio ('Guard my room')
🎙️ Recording for 5 seconds...
🗣️ Say 'Guard my room' clearly...


<IPython.core.display.Javascript object>

✅ Audio saved (78542 bytes)
Logged event: activation_check at 2025-10-08T04:35:02.247411Z
Transcript: start guard.
Activated: True
📝 Transcript: start guard.
✅ Activation detected! Proceeding to video capture...
📹 Step 2: Record a short 6–10 s webcam video of your room.
👉 Click 'Start/Stop video recording' button
📸 Click START, record 8-10 seconds, then click STOP


Click 'Start/Stop video recording', perform the action, then stop to save the file.
⏳ Waiting for you to click STOP and save the video...
⚠️ Loop timed out, checking for saved videos...
✅ Found saved video: /content/e4b01bfbe1854dafa5842cf7997c26c5_video.webm

🔍 Analyzing faces in video...
Logged event: video_recognition at 2025-10-08T04:42:10.051149Z
Processed 393 frames; sampled 50 frames for recognition.
  Frame 0: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 8: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 16: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 24: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 32: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 40: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 48: 1 face(s) detected
    ✅ Recognized: Narender (distance: 0.15)
  Frame 56: 1 face(s) detected
    ✅ Recognized: Narender (distanc

In [None]:
run_guard_agent_interactive()
#For Unknown person

🪄 Step 1: Record activation audio ('Guard my room')
🎙️ Recording for 5 seconds...
🗣️ Say 'Guard my room' clearly...


<IPython.core.display.Javascript object>

✅ Audio saved (80474 bytes)
📝 Simulated transcript: guard my room
✅ Activation detected! Proceeding to video capture...
📹 Step 2: Record a short 8-10 s webcam video of your room.
👉 Click 'Start/Stop video recording' button
📸 Click START, record, then click STOP


Click 'Start/Stop video recording', perform the action, then stop to save the file.
⏳ Waiting for you to click STOP and save the video...
✅ Found saved video: /content/921b7e50d987487baff179af20ad8ff1_video.webm

🔍 Analyzing faces in video... 
    ❌ Unknown person detected

Trusted people detected: set()

📢 ESCALATION LEVEL 1
Message: Hello! I don't recognize you. May I know who you are and what brings you here?


Logged event: escalation_message at 2025-10-08T06:14:52.837683Z

📢 ESCALATION LEVEL 2
Message: Sir, you need to leave this area immediately. This is private property.


Logged event: escalation_message at 2025-10-08T06:15:08.099371Z

📢 ESCALATION LEVEL 3


Logged event: escalation_message at 2025-10-08T06:15:23.839345Z

✅ Guard agent cycle complete!
Saved video to /content/ef7b3ec82daa4fa4965a52602a628750_video.webm


**Instructions to Run the Code**

**Setup Environment:**

Run the package installation cells to install dependencies (deepface, mediapipe, tensorflow, torch, transformers, gTTS, etc.).

**Enroll Trusted Persons:**

Use the provided upload_file_widget() to upload images.

Call enroll_person_from_images(name, image_paths) for each trusted person.

**Run Guard Agent:**

Execute run_guard_agent_interactive().

Speak the activation phrase "Guard my room" when prompted.

Record the webcam video as instructed.

Observe Outputs:

The system will transcribe activation audio.

Recognize faces in the video.

Play escalation messages if unknown faces are detected.

Logs:

Check ai_guard_log.json for event logs.