Install the necessary libraries

# Setup Libraries 

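Install the necessary libraries:
    - OpenCV: (Open Source Computer Vision Library)
    - NumPy: (Numerical Python)
    - gTTS: (Google Text-to-Speech)
    - Pillow: (fork of PIL, the Python Imaging Library)
    - pyttsx3: (Python Text-to-Speech version 3)
    - SpeechRecognition: (speech-to-text, imported as speech_recognition)
    - PyAudio: (microphone input for SpeechRecognition)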

Later:
    - MediaPipe (e.g. for real-time actions, landmarks, etc.)

In [None]:
pip install opencv-python numpy gTTS Pillow pyttsx3 SpeechRecognition PyAudio


In [10]:
import cv2
import numpy as np
import pyttsx3
import os
import time
import random
from PIL import Image
from gtts import gTTS
import speech_recognition as sr



# Setup Array, Variables and Directories

The gaze directions below are distinguished, and each is assigned a corresponding label.

The eight gaze directions that are part of the driving procedure:

| Label            | Gaze Direction   |
|------------------|------------------|
| forward          | Forward          |
| left             | Left             |
| right            | Right            |
| mirror_interior  | Interior Mirror  |
| mirror_right     | Right Side Mirror|
| mirror_left      | Left Side Mirror |
| shoulder_right   | Right Shoulder   |
| shoulder_left    | Left Shoulder    |

The four additional gaze directions:

| Label                     | Gaze Direction                        |
|---------------------------|---------------------------------------|
| dashboard_straight_down   | Dashboard Straight Down               |
| dashboard_down_right      | Dashboard Down Towards Center Console |
| forward_right             | Forward Right                         |
| forward_left              | Forward Left                          |

## Setup Array

This could also be parameterized with a configuration file. For now, the table is defined directly in the code.

In [11]:
# Create an array containing the table data with an index
table_data = [
    ["Label", "Gaze Direction", "Sentence"],
    ["forward", "Forward", "Look forward."],
    ["left", "Left", "Look to the left."],
    ["right", "Right", "Look to the right."],
    ["mirror_interior", "Interior Mirror", "Look at the interior mirror."],
    ["mirror_right", "Right Side Mirror", "Look at the right side mirror."],
    ["mirror_left", "Left Side Mirror", "Look at the left side mirror."],
    ["shoulder_right", "Right Shoulder", "Look over your right shoulder."],
    ["shoulder_left", "Left Shoulder", "Look over your left shoulder."],
    ["dashboard_straight_down", "Dashboard Straight Down", "Look straight down at the dashboard."],
    ["dashboard_down_right", "Dashboard Down Towards Center Console", "Look down towards the center console."],
    ["forward_right", "Forward Right", "Look forward and to the right."],
    ["forward_left", "Forward Left", "Look forward and to the left."]
]

# To access, for example, the row for "forward_right" and its Sentence column:
# Index 0 holds the header row, so forward_right (the 11th label) is at index 11
# print(table_data[11][2])  # Output: Look forward and to the right.
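
The parameterization mentioned above could look roughly like the sketch below. It assumes a JSON layout with a `labels` list; the file name `gaze_labels.json`, the key names and the helper `load_table_data` are hypothetical and not part of this notebook.

```python
import json
import os
import tempfile

def load_table_data(config_path):
    """Read gaze labels from a JSON file and return them in the table_data layout."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Keep the header row so existing code that skips index 0 keeps working
    rows = [["Label", "Gaze Direction", "Sentence"]]
    for entry in config["labels"]:
        rows.append([entry["label"], entry["direction"], entry["sentence"]])
    return rows

# Hypothetical example configuration with two of the labels used in this notebook
example_config = {
    "labels": [
        {"label": "forward", "direction": "Forward", "sentence": "Look forward."},
        {"label": "left", "direction": "Left", "sentence": "Look to the left."},
    ]
}

# Write the example to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "gaze_labels.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(example_config, f, indent=2)
    loaded_table = load_table_data(config_path)

print(loaded_table[1][2])  # Look forward.
```

Because the header row is preserved, code that iterates over `table_data[1:]` would work unchanged with the loaded table.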


## Setup Global Variables

In [3]:
import speech_recognition as sr

# Names assigned at module level are already global, so no `global` declarations are needed here.

# Initialize the variables
recording_duration = 5  # Duration of the recording in seconds
base_path = "C:/GazeDetection"  # Base path for video storage
recording_name = "recording_10042024_Wim_v1"
current_save_path = None # Current save path, dynamically updated
number_of_cameras = 2
number_of_subfolders = 10  # Number of numbered subfolders per label
max_selecties_per_label = 10  # Maximum number of selections per label

# Initialize the speech recognizer
r = sr.Recognizer()

# Initialize a dictionary to track how many times each label has been selected, excluding the header.
# Start every label's count at 0.
label_counts = {data[0]: 0 for data in table_data[1:]}  # Skip the header row


## Setup Directory 

1. Create a folder in C:\GazeDetection\ and name it recording_<date><name><sequence_number>
2. Within this folder, create folders named after the label of the gaze direction(s).
3. In each label folder, create subfolders per camera.
4. In each camera folder, create numbered subfolders, starting at 1 and ending at <number>.

(The variables for name, sequence number, gaze direction(s), number of cameras and number can be parameterized. The cell below does not yet create the per-camera level: it creates the numbered subfolders directly inside each label folder.)

In [4]:
# Create the main directory
recording_path = os.path.join(base_path, recording_name)
os.makedirs(recording_path, exist_ok=True)

# Create directories for each label and within those, subdirectories
for data in table_data[1:]:  # Skip the header row
    label = data[0]
    label_path = os.path.join(recording_path, label)
    os.makedirs(label_path, exist_ok=True)  # Create a directory for the label
    
    for j in range(1, number_of_subfolders + 1):
        subfolder_name = str(j)
        subfolder_path = os.path.join(label_path, subfolder_name)
        os.makedirs(subfolder_path, exist_ok=True)  # Create subdirectories numbered 1 through number_of_subfolders within each label directory
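
The four steps listed above, including the per-camera level, could be sketched as follows. This is only an illustration under assumptions: the helper `create_recording_tree`, the folder name pattern `camera_<n>` and the example values are hypothetical, and the tree is built in a temporary directory rather than in `C:/GazeDetection`.

```python
import tempfile
from pathlib import Path

def create_recording_tree(root, recording, labels, n_cameras, n_takes):
    """Create root/recording/<label>/camera_<c>/<take> and return the recording path."""
    recording_dir = Path(root) / recording
    for label in labels:
        for camera in range(1, n_cameras + 1):
            for take in range(1, n_takes + 1):
                # parents=True creates the label and camera levels as needed
                (recording_dir / label / f"camera_{camera}" / str(take)).mkdir(
                    parents=True, exist_ok=True
                )
    return recording_dir

# Build a small example tree: 2 labels, 2 cameras, 3 numbered subfolders each
with tempfile.TemporaryDirectory() as tmp_root:
    tree = create_recording_tree(tmp_root, "recording_example", ["forward", "left"], 2, 3)
    created = sorted(p.relative_to(tree).as_posix() for p in tree.rglob("*") if p.is_dir())

print(len(created))  # 18
```

The numbering starts at 1 here, matching the loop in the cell above.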


# Define Procedures

## Text to Speech

The following still needs further investigation: this works for a "standard" English voice. When I switch it to Dutch (which is installed on Windows), it does not work, so for now it stays in English. To adjust this on Windows 10/11: check and install language packs via Settings > Time & Language > Language. The dependency on the local operating system is also tricky here; this will not simply work on every laptop.

In [5]:
def say_sentence(sentence):
    """
    This procedure initializes the text-to-speech converter, adjusts the speaking rate,
    and speaks out the given sentence.
    
    :param sentence: The text sentence to be spoken by the computer.
    """
    # Initialize the converter
    engine = pyttsx3.init()

    # Retrieve the current speaking rate
    rate = engine.getProperty('rate')

    # Decrease the speaking rate (adjust the value as desired)
    engine.setProperty('rate', rate - 80)

    # Set the text to be spoken
    text = sentence

    # Make the computer speak
    engine.say(text)

    # Wait until the speaking is complete
    engine.runAndWait()
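
Selecting a specific installed voice (for example a Dutch one) might be approached as in the sketch below. It relies on the `voices` and `voice` properties of pyttsx3 and assumes that the language shows up in the voice name or id, which depends on the operating system. The helpers `pick_voice_id` and `say_with_voice` and the stand-in voice objects are hypothetical.

```python
from types import SimpleNamespace

def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name or id contains keyword, else None."""
    keyword = keyword.lower()
    for voice in voices:
        searchable = f"{getattr(voice, 'name', '')} {getattr(voice, 'id', '')}".lower()
        if keyword in searchable:
            return voice.id
    return None

def say_with_voice(sentence, keyword):
    """Speak sentence with a matching installed voice, or the default voice if none matches."""
    import pyttsx3  # imported here so pick_voice_id can be tried without audio hardware

    engine = pyttsx3.init()
    voice_id = pick_voice_id(engine.getProperty('voices'), keyword)
    if voice_id is not None:
        engine.setProperty('voice', voice_id)
    engine.say(sentence)
    engine.runAndWait()

# Stand-in voice objects for illustration; real ones come from engine.getProperty('voices')
fake_voices = [
    SimpleNamespace(id="voice.en.example", name="Example English Voice"),
    SimpleNamespace(id="voice.nl.example", name="Example Dutch Voice"),
]
print(pick_voice_id(fake_voices, "dutch"))  # voice.nl.example
```

Printing the `name` and `id` of every entry in `engine.getProperty('voices')` on the target laptop shows which keyword, if any, identifies the Dutch voice there.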

# Make and save (random) videos

Of course, this can be improved as well.

## Update the global variables

In [8]:
def select_label_and_prepare_recording():
    """Pick a random label that still needs recordings, announce it and set the save path."""
    global current_save_path
    # Labels that have not yet reached the maximum number of recordings
    available_labels = [label for label, count in label_counts.items()
                        if count < max_selecties_per_label]
    if available_labels:
        chosen_label = random.choice(available_labels)
        chosen_index = next(index for index, data in enumerate(table_data) if data[0] == chosen_label)
        _, _, chosen_sentence = table_data[chosen_index]

        # Say the instruction, wait 2 seconds, then give the start signal
        say_sentence(chosen_sentence)
        time.sleep(2)
        say_sentence("Start")

        # Recordings for a label go into its next numbered subfolder
        current_save_path = [os.path.join(base_path, recording_name, chosen_label, str(label_counts[chosen_label]+1))]

        label_counts[chosen_label] += 1
    else:
        print(f"Each label has been recorded {max_selecties_per_label} times.")
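
The selection logic can be tried in isolation, without speech output or cameras. The sketch below uses a hypothetical helper `pick_label` and a small made-up set of labels; the fixed seed only makes the run repeatable.

```python
import random

def pick_label(counts, max_per_label, rng=random):
    """Return a random label that has not reached max_per_label yet, or None when all are done."""
    available = [label for label, count in counts.items() if count < max_per_label]
    if not available:
        return None
    return rng.choice(available)

# Small illustrative run with three labels and at most two recordings per label
demo_counts = {"forward": 0, "left": 0, "right": 0}
rng = random.Random(0)  # fixed seed so the run is repeatable
order = []
while (label := pick_label(demo_counts, 2, rng)) is not None:
    demo_counts[label] += 1
    order.append(label)

print(demo_counts)  # {'forward': 2, 'left': 2, 'right': 2}
print(len(order))   # 6
```

Returning `None` when nothing is left lets the caller decide how to end the session, instead of printing from inside the selection function.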

 

## Check if the recording is correct or needs to be redone (under construction)

This works only partly: Google very often returns `UnknownValueError`. It might help to record the audio with the webcams, merge it, and use that as input.

In [None]:
def ask_and_process():
    """Ask by voice whether the last recording should be deleted and act on the answer."""
    question = "Should this recording be deleted? Say Yes or No!"

    # Create a recognizer object
    r = sr.Recognizer()

    # Keep asking until a usable answer is heard (a loop avoids unbounded recursion)
    while True:
        say_sentence(question)

        # Use the default microphone as the audio source
        with sr.Microphone() as source:
            audio = r.listen(source)

        # Use Google Speech Recognition to convert audio to text
        try:
            answer = r.recognize_google(audio, language='en-US').strip().lower()
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
            continue  # Ask the question again
        except sr.RequestError as e:
            print(f"Could not request results from Google Speech Recognition service; {e}")
            return  # Exit the function if the request fails

        if answer == 'yes':
            print("You chose 'yes', the recording will be deleted.")
            # Delete both camera files of the last recording
            for camera_file in ('camera_1.avi', 'camera_2.avi'):
                os.remove(os.path.join(current_save_path[0], camera_file))
            # The label is the parent folder of the numbered recording folder
            chosen_label = os.path.basename(os.path.dirname(current_save_path[0]))
            # Decrease the count so this recording is made again
            label_counts[chosen_label] -= 1
            return
        if answer == 'no':
            print("You chose 'no', the recording is kept.")
            return
        print("Invalid answer, please say 'yes' or 'no'.")
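
An exact comparison with 'yes' or 'no' rejects transcripts such as "yes please". A more tolerant interpretation of the recognized text could be sketched as below; the helper `interpret_answer` and the accepted word lists are assumptions for illustration, not part of the notebook.

```python
def interpret_answer(transcript):
    """Map a recognized phrase to True (yes), False (no) or None (unclear)."""
    words = set(transcript.lower().replace(",", " ").replace(".", " ").split())
    said_yes = bool(words & {"yes", "yeah", "yep"})
    said_no = bool(words & {"no", "nope"})
    if said_yes == said_no:
        # Either neither word was heard or both were, so the answer is ambiguous
        return None
    return said_yes

print(interpret_answer("Yes please"))     # True
print(interpret_answer("no, keep it"))    # False
print(interpret_answer("maybe later"))    # None
```

A `None` result would be the signal to repeat the question.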


## Open the cameras and start the recording loop

- Things that still need to be investigated:
    - Zoom
    - fps, resolution
    - Storage format (avi, mp4, etc.)
  
  Also check which settings give the best result.

In [9]:
# Assumptions: base_path, recording_name, table_data, number_of_cameras are defined

# Adjust this part to your setup
cap1 = cv2.VideoCapture(1)  # Camera index 1
cap2 = cv2.VideoCapture(2)  # Camera index 2

# Create the VideoWriter variables outside the loop; they are reopened for every new recording
out1 = None
out2 = None

while True:
        
    # Check if all labels have been selected 10 times
    if min(label_counts.values()) >= 10:
        print("All labels have been processed 10 times. Ending recording.")
        break  # End the loop
        
    # Check if it's time to start a new recording or if no recording is currently active
    if current_save_path is None or time.time() - start_time >= recording_duration:
        select_label_and_prepare_recording()  # Update the path and prepare for recording
        start_time = time.time()  # Reset the start time for the new recording

        if out1 is not None:
            out1.release()
            out2.release()

        # Re-initialize the VideoWriter for a new file
        fourcc = cv2.VideoWriter_fourcc(*'XVID')
        # This one works; it is unclear whether the quality is sufficient
        out1 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_1.avi'), fourcc, 20.0, (640, 480))
        out2 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_2.avi'), fourcc, 20.0, (640, 480))
      
        # These do not work: opening the resulting video gives an error
        # out1 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_1.avi'), fourcc, 0, (0, 0))
        # out1 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_1.avi'), fourcc, 30.0, (1920, 1080))
        # out2 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_2.avi'), fourcc, 0, (0, 0))
        # out2 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_2.avi'), fourcc, 30.0, (1920, 1080))
        
        # Define the desired resolution and frame rate
        # resolution = (1920, 1080)  # Resolution of 1920x1080 (1080p)
        # fps = 30  # Frame rate of 30 frames per second

        # Initialize the VideoWriter objects
        # fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for the MP4 format
        # out1 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_1.mp4'), fourcc, fps, resolution)
        # out2 = cv2.VideoWriter(os.path.join(current_save_path[0], 'camera_2.mp4'), fourcc, fps, resolution)

    ret1, frame1 = cap1.read()
    ret2, frame2 = cap2.read()
    
    if ret1:
        out1.write(frame1)
        # Keep imshow disabled; it is only for debugging
        #cv2.imshow('Camera 1', frame1)
    if ret2:
        out2.write(frame2)
        # Keep imshow disabled; it is only for debugging
        #cv2.imshow('Camera 2', frame2)

    # This step is on hold for now because the speech recognition does not work reliably
    # ask_and_process()
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

if out1 is not None:
    out1.release()
    out2.release()
cap1.release()
cap2.release()
cv2.destroyAllWindows()


All labels have been processed 10 times. Ending recording.
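
One possible reason the 1920x1080 attempts produce files that cannot be opened is a mismatch between the frame size given to the writer and the size of the frames the camera actually delivers. This is an assumption about the cause, not something confirmed in this notebook. The sketch below derives the writer settings from what the capture device reports; `writer_settings` and `open_matching_writer` are hypothetical helpers.

```python
def writer_settings(reported_width, reported_height, reported_fps, fallback_fps=20.0):
    """Turn values reported by a capture device into (fps, (width, height)) for a writer."""
    # Some drivers report 0 for the frame rate, so fall back to a fixed value
    fps = float(reported_fps) if reported_fps and reported_fps > 0 else fallback_fps
    return fps, (int(reported_width), int(reported_height))

def open_matching_writer(cap, path, codec="XVID"):
    """Open a cv2.VideoWriter whose frame size matches what cap reports."""
    import cv2  # imported here so writer_settings can be tried without OpenCV installed

    fps, size = writer_settings(
        cap.get(cv2.CAP_PROP_FRAME_WIDTH),
        cap.get(cv2.CAP_PROP_FRAME_HEIGHT),
        cap.get(cv2.CAP_PROP_FPS),
    )
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*codec), fps, size)
    if not writer.isOpened():
        raise RuntimeError(f"Could not open a video writer for {path}")
    return writer

print(writer_settings(1920.0, 1080.0, 0.0))  # (20.0, (1920, 1080))
```

In the recording loop, `open_matching_writer(cap1, path)` could then replace the hard-coded `20.0, (640, 480)` arguments, after the desired resolution has been requested from the camera with `cap1.set(...)`.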


## Code for positioning the cameras
### The cameras are shown on the screen. This is a check of how this script displays the camera image (after setting the zoom and position via the Logitech camera app).

In [6]:
# Open the cameras
cap1 = cv2.VideoCapture(1)  # Camera index 1
cap2 = cv2.VideoCapture(2)  # Camera index 2

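# Check if the cameras are opened correctly
if not cap1.isOpened() or not cap2.isOpened():
    cap1.release()
    cap2.release()
    # Raise instead of calling exit(), which would shut down the notebook kernel
    raise RuntimeError("Unable to open one of the cameras.")

# Read frames from the cameras and display them on the screen
while True:
    ret1, frame1 = cap1.read()
    ret2, frame2 = cap2.read()

    # Stop if either camera fails to deliver a frame
    if not ret1 or not ret2:
        print("Failed to read a frame from one of the cameras.")
        break

    # Display the frames on the screen
    cv2.imshow('Camera 1', frame1)
    cv2.imshow('Camera 2', frame2)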
    
    # Wait for a key press and check if 'q' is pressed to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Close all windows and release the resources
cap1.release()
cap2.release()
cv2.destroyAllWindows()

## To try: MJPG (Motion JPEG) format

In [None]:
import cv2

camera = cv2.VideoCapture(0, cv2.CAP_DSHOW)

camera.set(cv2.CAP_PROP_FPS, 30.0)
# Request Motion JPEG frames from the camera
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('M','J','P','G'))
camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)

while True:
    retval, im = camera.read()
    # Stop if the camera fails to deliver a frame
    if not retval:
        print("Failed to read a frame from the camera.")
        break
    cv2.imshow("image", im)

    # Press Esc (key code 27) to stop
    k = cv2.waitKey(1) & 0xff
    if k == 27:
        break

camera.release()
cv2.destroyAllWindows()