# NVIDIA Maxine Eye Contact - Python Notebook

This notebook demonstrates how to use the NVIDIA Maxine Eye Contact service with Python. The Eye Contact feature estimates gaze angles in video and redirects them to create natural, frontal eye contact.

## Overview

The Eye Contact service processes MP4 video files and outputs enhanced videos with corrected gaze direction. This implementation provides:

- **Service Integration**: Connect to Maxine Eye Contact services
- **Default Configuration**: Uses standard parameters with easy customization
- **Streaming Support**: Bi-directional gRPC streaming

## Requirements

- **Input**: MP4 files with H.264 video codec (audio optional). Videos with Variable Frame Rate (VFR) are not supported.
- **Output**: MP4 files with H.264 video codec (preserves original audio)
- **Service**: Access to a running Maxine Eye Contact service instance
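
The input requirements above can be checked before uploading. The following is a minimal sketch, assuming `ffprobe` (part of FFmpeg) is installed. The helper names `probe_video` and `looks_supported` are hypothetical and not part of the Eye Contact client. Comparing the nominal and average frame rates is only a common heuristic for spotting VFR content, and the sketch does not verify the container format.

```python
import json
import subprocess
from fractions import Fraction


def probe_video(path: str) -> dict:
    """Return ffprobe's JSON description of the first video stream."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=codec_name,r_frame_rate,avg_frame_rate",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def looks_supported(probe: dict) -> bool:
    """Heuristic: H.264 video whose nominal and average frame rates agree."""
    streams = probe.get("streams", [])
    if not streams:
        return False
    stream = streams[0]
    if stream.get("codec_name") != "h264":
        return False
    try:
        nominal = Fraction(stream["r_frame_rate"])
        average = Fraction(stream["avg_frame_rate"])
    except (KeyError, ValueError, ZeroDivisionError):
        return False
    # A noticeable gap between the two rates usually indicates VFR content.
    return average > 0 and abs(nominal - average) / average < 0.01


# Hand-written probe result, so no ffprobe call is needed for this demo.
sample = {"streams": [{"codec_name": "h264",
                       "r_frame_rate": "30/1",
                       "avg_frame_rate": "30/1"}]}
print(looks_supported(sample))
```

For a real file, call `looks_supported(probe_video("video.mp4"))` and treat a `False` result as a prompt to inspect the file, not as a definitive rejection.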


## Installation

**Requirements:**
- Python 3.10+ 
- pip package manager
- gRPC dependencies from the requirements.txt file

```bash
pip install -r ../requirements.txt
```

## Service Configuration

Configure the connection to your NVIDIA Maxine Eye Contact NIM service. The service can be running on your machine or on a remote server accessible from your environment.
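
Before running the cells below, it can help to confirm that the configured host and port accept TCP connections at all. This is a minimal sketch using only the standard library. The helper `is_port_open` is hypothetical, and an open port does not prove that the Eye Contact service itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


host, port = "localhost", 8001  # Same defaults as the configuration cell below
print(f"{host}:{port} reachable: {is_port_open(host, port)}")
```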


In [None]:
import os
import sys
import pathlib

# Setup paths for Eye Contact modules
SCRIPT_PATH = str(pathlib.Path().resolve())
sys.path.append(os.path.join(SCRIPT_PATH, "../scripts"))
sys.path.append(os.path.join(SCRIPT_PATH, "../interfaces"))
sys.path.append(os.path.join(SCRIPT_PATH, "../../.."))

# Service connection configuration
SERVICE_HOST = "localhost"  # Update to your service host
SERVICE_PORT = 8001         # Update to your service port
SERVICE_TARGET = f"{SERVICE_HOST}:{SERVICE_PORT}"

print(f"Service target configured: {SERVICE_TARGET}")
print("Python paths configured for Eye Contact modules")


## Import Libraries


In [None]:
import grpc
import time
from typing import Iterator
from tqdm import tqdm

# Import Eye Contact modules
from config import EyeContactConfig
from constants import DATA_CHUNK_SIZE
import eyecontact_pb2
import eyecontact_pb2_grpc

print("All libraries imported successfully!")


## Helper Functions

Define functions for processing video data and communicating with the Eye Contact service. These functions handle the bi-directional gRPC streaming protocol used by the Eye Contact service.

### Function Overview

The Eye Contact service uses bi-directional gRPC streaming over a channel. The implementation consists of two main functions:

1. **Request Generator**: A Python iterator that yields request chunks to stream to the service
2. **Response Processor**: A function that processes the incoming gRPC data stream and writes the output file

### Streaming Protocol Details

- **First Stream Item**: Configuration object that sets the Eye Contact feature parameters
- **Subsequent Items**: Video data chunks (64KB each) containing the input MP4 file
- **Response Stream**: Video data chunks containing the processed output MP4 file
- **Configuration Echo**: If parameters are sent, the first response item is an echo that should be skipped
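
The ordering described above can be illustrated without a running service. The sketch below uses a hypothetical `FakeRequest` dataclass as a stand-in for the generated `RedirectGazeRequest` message and assumes a 64KB chunk size. It only shows the message order, not the real protobuf API.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Optional

CHUNK_SIZE = 64 * 1024  # Assumed to match the 64KB chunks described above


@dataclass
class FakeRequest:
    """Stand-in for RedirectGazeRequest: one field is set per message."""
    config: Optional[dict] = None
    video_file_data: Optional[bytes] = None


def request_stream(stream: BinaryIO, config: Optional[dict]) -> Iterator[FakeRequest]:
    # The configuration, when present, always travels first.
    if config:
        yield FakeRequest(config=config)
    # The file contents follow as fixed-size chunks; the last may be shorter.
    while chunk := stream.read(CHUNK_SIZE):
        yield FakeRequest(video_file_data=chunk)


payload = bytes(150 * 1024)  # 150 KiB of dummy data standing in for an MP4 file
messages = list(request_stream(io.BytesIO(payload), {"detect_closure": 1}))
print(len(messages))  # 1 configuration message + 3 data chunks
```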


In [None]:
def generate_request_for_inference(
    input_filepath: str, 
    config_params: dict | None = None
) -> Iterator[eyecontact_pb2.RedirectGazeRequest]:
    """Generate streaming requests for the Eye Contact service.

    Args:
        input_filepath: Path to the input MP4 video file (H.264 codec recommended)
        config_params: Dictionary of Eye Contact configuration parameters. If None or empty,
                      default values will be used by the service.

    Yields:
        RedirectGazeRequest messages containing either:
        - Configuration object (first yield, only if config_params provided)
        - Video file data chunks (64KB each)
        
    Raises:
        IOError: If the input file cannot be read due to permissions or I/O errors
        FileNotFoundError: If the input file doesn't exist at the specified path
        
    Example:
        config = {"eye_size_sensitivity": 4, "detect_closure": 1}
        for request in generate_request_for_inference("video.mp4", config):
            # Process each request chunk
            pass
    """
    
    # Send configuration first
    if config_params:
        print("Sending configuration parameters")
        yield eyecontact_pb2.RedirectGazeRequest(
            config=eyecontact_pb2.RedirectGazeConfig(**config_params)
        )
    
    # Send video data in chunks with progress tracking
    print("Sending video data")
    try:
        file_size = os.path.getsize(input_filepath)
        chunk_count = 0
        bytes_sent = 0
        
        with open(input_filepath, "rb") as fd:
            with tqdm(total=file_size, unit='B', unit_scale=True, 
                     desc="Uploading", leave=False) as pbar:
                
                while True:
                    buffer = fd.read(DATA_CHUNK_SIZE)
                    if not buffer:
                        break
                        
                    chunk_count += 1
                    bytes_sent += len(buffer)
                    pbar.update(len(buffer))
                    
                    yield eyecontact_pb2.RedirectGazeRequest(video_file_data=buffer)
                    
        print(f"Upload complete: {chunk_count} chunks ({bytes_sent / (1024*1024):.1f} MB)")
        
    except IOError as e:
        print(f"Error reading input file: {e}")
        raise


#### Create the configuration for Eye Contact service request processing

In [None]:
def create_eye_contact_config() -> dict:
    """Create Eye Contact configuration with default parameters.
    
    Uses the standard default values from the Eye Contact service configuration.
    Modify these values as needed for your use case. The parameters are
    documented in the "Configuration Parameters" section of this notebook.
    
    Returns:
        Configuration dictionary for the Eye Contact service
    """
    
    # Default configuration using values from constants.py
    config = {
        "temporal": 0xFFFFFFFF,                    # Enable temporal filtering
        "detect_closure": 0,                       # Eye closure detection (0=disabled, 1=enabled)
        "eye_size_sensitivity": 3,                 # Eye size sensitivity (range: 2-6)
        "enable_lookaway": 0,                      # Natural look-away (0=disabled, 1=enabled)
        "lookaway_max_offset": 5,                  # Max gaze offset for look-away (range: 1-10)
        "lookaway_interval_min": 3,                # Min frames between look-aways (range: 1-600)
        "lookaway_interval_range": 8,              # Look-away timing range (range: 1-600)
        "gaze_pitch_threshold_low": 25.0,          # Gaze pitch correction start (range: 10-35)
        "gaze_pitch_threshold_high": 30.0,         # Gaze pitch full correction (range: 10-35)
        "gaze_yaw_threshold_low": 20.0,            # Gaze yaw correction start (range: 10-35)
        "gaze_yaw_threshold_high": 30.0,           # Gaze yaw full correction (range: 10-35)
        "head_pitch_threshold_low": 20.0,          # Head pitch correction start (range: 10-35)
        "head_pitch_threshold_high": 25.0,         # Head pitch full correction (range: 10-35)
        "head_yaw_threshold_low": 25.0,            # Head yaw correction start (range: 10-35)
        "head_yaw_threshold_high": 30.0,           # Head yaw full correction (range: 10-35)
        "output_video_encoding": eyecontact_pb2.OutputVideoEncoding(
            lossy=eyecontact_pb2.LossyEncoding(
                bitrate=3000000,                   # Video bitrate (3 Mbps)
                idr_interval=8                     # IDR frame interval
            )
        )
    }
    
    return config

### Function Usage Details

#### Request Generation Process

The `generate_request_for_inference` function implements a Python iterator that yields gRPC request chunks:

1. **Configuration Phase**: If `config_params` is provided, the first yielded item is a `RedirectGazeRequest` that wraps a `RedirectGazeConfig` object
2. **Data Streaming Phase**: The input MP4 file is read in 64KB chunks and yielded as `RedirectGazeRequest` messages
3. **Completion**: The iterator completes when the entire file has been streamed

#### Response Processing

The `write_output_file_from_response` function handles the service response:

1. **Configuration Echo**: If configuration was sent, the first response item is an echo. `process_video` skips it before passing the stream to this function
2. **Data Reception**: Subsequent responses contain `video_file_data` chunks
3. **File Assembly**: Chunks are written sequentially to reconstruct the output MP4 file
4. **Progress Tracking**: Real-time feedback shows chunks received and total data downloaded

#### Error Handling

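The functions in this notebook handle errors as follows:
- File validation: `process_video` checks that the input file exists before connecting
- I/O error detection and reporting: both helper functions print the `IOError` and re-raise it
- Progress tracking: `tqdm` progress bars show how much data was sent and received, which helps with debugging
- Cleanup on failures: files and progress bars are closed through `with` and `finally` blocks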
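
The output function in this notebook writes directly to the destination path, so an interrupted stream can leave a partial MP4 behind. One possible way to avoid that is sketched below: write to a temporary file and move it into place only on success. The `atomic_output` helper is hypothetical and is not used by the notebook code.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Iterator


@contextmanager
def atomic_output(path: str) -> Iterator[BinaryIO]:
    """Write to a temporary file and move it to `path` only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as handle:
            yield handle
        os.replace(tmp_path, path)  # Atomic when both paths share a filesystem
    except BaseException:
        # Remove the partial file so a failed run leaves nothing behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo in a throwaway directory: the file appears only after a clean exit.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "output.mp4")
    with atomic_output(target) as out:
        for chunk in (b"first-chunk", b"second-chunk"):
            out.write(chunk)
    print(os.path.getsize(target))
```

In the response-processing function, the `open(output_filepath, "wb")` call could be swapped for `atomic_output(output_filepath)` if this behavior is wanted.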


In [None]:
def write_output_file_from_response(
    response_iter: Iterator[eyecontact_pb2.RedirectGazeResponse],
    output_filepath: str,
) -> None:
    """Write output video file from the incoming gRPC data stream.

    Args:
        response_iter: Iterator of RedirectGazeResponse messages from the gRPC service.
                      Each message may contain video_file_data chunks.
        output_filepath: Path where the output MP4 video file will be saved.
                        The file will be created or overwritten if it exists.
        
    Raises:
        IOError: If the output file cannot be written due to permissions or disk space issues
        
    Example:
        responses = stub.RedirectGaze(request_stream)
        write_output_file_from_response(responses, "processed_video.mp4")
    """
    print(f"Writing output: {output_filepath}")
    
    chunk_count = 0
    total_bytes = 0
    
    try:
        with open(output_filepath, "wb") as fd:
            # Progress bar for receiving video chunks
            pbar = tqdm(desc="Receiving video chunks", unit="chunks", unit_scale=False, 
                       dynamic_ncols=True, leave=False,
                       bar_format="{desc}: {n} chunks | {rate_fmt} | {postfix}")
            
            try:
                for response in response_iter:
                    if response.HasField("video_file_data"):
                        chunk_data = response.video_file_data
                        fd.write(chunk_data)
                        
                        chunk_count += 1
                        total_bytes += len(chunk_data)
                        
                        pbar.update(1)
                        pbar.set_postfix_str(f"{total_bytes / (1024*1024):.1f} MB received")
            finally:
                pbar.close()
        
        print(f"Output complete: {chunk_count} chunks ({total_bytes / (1024*1024):.1f} MB total)")
        
    except IOError as e:
        print(f"Error writing output file: {e}")
        raise


## Video Processing Example

Process a video file with the Eye Contact service. Configure parameters based on your specific requirements.


In [None]:
# Configuration
input_filepath = "../assets/sample_transactional.mp4"      # Update with your input video path
output_filepath = "output.mp4"     # Desired output video path

# Create default configuration for Eye Contact service request processing
config_params = create_eye_contact_config()

# Process the video
def process_video(input_path: str, output_path: str, config: dict) -> bool:
    """Process video with Eye Contact service.
    
    Returns:
        True if processing succeeded, False otherwise
    """
    try:
        print(f"\nProcessing: {input_path}")
        print(f"Connecting to service: {SERVICE_TARGET}")
        # Validate that the input file exists
        if not os.path.exists(input_path):
            raise FileNotFoundError(f"Input file not found: {input_path}")
    
        # Connect to service (use secure channel if needed)
        with grpc.insecure_channel(SERVICE_TARGET) as channel:
            stub = eyecontact_pb2_grpc.MaxineEyeContactServiceStub(channel)
            
            start_time = time.time()
            
            # Process video
            responses = stub.RedirectGaze(
                generate_request_for_inference(input_path, config)
            )
            
            # Skip the configuration echo, which is sent only when a configuration was provided
            if config:
                next(responses)
            
            # Write output with progress tracking
            write_output_file_from_response(responses, output_path)
            
            end_time = time.time()
            processing_time = end_time - start_time
            
            print(f"Processing complete in {processing_time:.1f}s")
            print(f"Output saved: {output_path}")
            
            return True
            
    except FileNotFoundError:
        print(f"Input file not found: {input_path}")
        print("   Please update 'input_filepath' with a valid video file path")
        return False
    except grpc.RpcError as e:
        print(f"Service connection failed: {e}")
        print(f"   Ensure the Eye Contact service is running at {SERVICE_TARGET}")
        return False
    except Exception as e:
        print(f"Processing failed: {e}")
        return False

# Execute processing
success = process_video(input_filepath, output_filepath, config_params)


## Configuration Parameters

### Encoding Options

- `lossless`: Enables lossless video encoding. This setting overrides any bitrate configuration to ensure maximum quality output, although it results in larger file sizes. Use this mode when quality is the top priority.
   ```bash
   python eye-contact.py --target 127.0.0.1:8001 --lossless
   ```

- `bitrate`: Sets the target bitrate for video encoding in bits per second (bps). Higher bitrates give better video quality but larger files. The default is 3,000,000 bps (3 Mbps). For example, `--bitrate 5000000` targets 5 Mbps encoding.
   ```bash
   python eye-contact.py --target 127.0.0.1:8001 --bitrate 5000000
   ```

- `idr-interval`: Sets the interval between instantaneous decoding refresh (IDR) frames in the encoded video. IDR frames are special I-frames that clear all reference buffers, allowing the video to be decoded from that point without needing previous frames. Lower values improve seeking accuracy, random access, and overall encoding quality but increase file size, while higher values reduce file size but may impact seeking performance and quality. The default is 8 frames.
   ```bash
   python eye-contact.py --target 127.0.0.1:8001 --idr-interval 10
   ```

- `custom-encoding-params`: Passes custom encoding parameters as a JSON string, giving expert users fine-grained control. These parameters configure properties of the GStreamer nvvideo4linux2 encoder plugin, allowing direct control over the underlying hardware encoder settings.
   ```bash
   python eye-contact.py --custom-encoding-params '{"idrinterval": 20, "maxbitrate": 3000000}'
   ```

**Note:** <span style="color:red">Custom encoding parameters are for expert users who need fine-grained control over video encoding. Incorrect values can cause encoding failures or poor-quality output. To configure the nvenc encoder, refer to [Gst properties of the Gst-nvvideo4linux2 encoder plugin](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvvideo4linux2.html#:~:text=The%20following%20table%20summarizes%20the%20Gst%20properties%20of%20the%20Gst%2Dnvvideo4linux2%20encoder%20plugin).</span>
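
Hand-writing the JSON string for `--custom-encoding-params` is easy to get wrong because of shell quoting. The sketch below builds the same kind of command from a Python dictionary, using the flags and encoder keys shown in the examples above. Whether `--target` and `--custom-encoding-params` are combined this way in your setup is an assumption.

```python
import json
import shlex

# Encoder properties taken from the example above.
encoder_params = {"idrinterval": 20, "maxbitrate": 3000000}

command = [
    "python", "eye-contact.py",
    "--target", "127.0.0.1:8001",
    "--custom-encoding-params", json.dumps(encoder_params),
]

# shlex.join quotes the JSON argument so it can be pasted into a shell.
print(shlex.join(command))
```

Passing `command` as a list to `subprocess.run` avoids shell quoting entirely, since each argument is delivered to the script unchanged.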

### Arguments to Control Feature Behavior

The following arguments affect the overall behavior of the feature, such as enabling or disabling temporal filtering or gaze redirection:

- `temporal` - (UINT32) Flag to control temporal filtering (default `0xffffffff`). When set to true, the landmark computation for eye contact is temporally optimized.

- `detect_closure` - (UINT32) Flag to toggle detection of eye closure and occlusion. If turned off, blink and occlusion detection turns off. This might be desirable during estimation-only mode if you still want to obtain gaze estimation in case of occlusion. Not recommended for gaze redirection. Value is either 0 or 1 (default 0).

- `eye_size_sensitivity` - (UINT32) Eye size sensitivity parameter that modifies the blending parameters to use a larger region around the eyes for blending. Integer value from 2 to 6 (default 3).

### Randomized Look Away Parameters

A continuous redirection of gaze to look at the camera might give a perception of staring. Some users might find this effect unnatural or undesired. To occasionally break eye contact, you can enable randomized look away in gaze redirection. Although the gaze is always expected to redirect toward the camera within the range of operation, enabling look away makes the user occasionally break gaze lock to the camera with a micro-movement of the eyes at randomly chosen time intervals. The `enable_lookaway` parameter must be set to 1 to enable this feature. Additionally, you can use the optional parameters `lookaway_max_offset`, `lookaway_interval_min`, and `lookaway_interval_range` to tune the extent and frequency of look away.

- `enable_lookaway` - (UINT32) Flag to toggle look away. If set to on, the eyes are redirected to look away for a random period occasionally to avoid staring. Value is either 0 or 1 (default 0).

- `lookaway_max_offset` - (UINT32) Maximum value of gaze offset angle (degrees) during a random look away when look away is enabled. Requires `enable_lookaway` to be set to 1. Integer value from 1 to 10 (default 5).

- `lookaway_interval_min` - (UINT32) Minimum limit for the number of frames at which random look away occurs when look away is enabled. Requires `enable_lookaway` to be set to 1. Integer value from 1 to 600 (default 3).

- `lookaway_interval_range` - (UINT32) Range for picking the number of frames at which random look away occurs when look away is enabled. Requires `enable_lookaway` to be set to 1. Integer value from 1 to 600 (default 8).
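
To build intuition for how `lookaway_interval_min` and `lookaway_interval_range` interact, the sketch below simulates one plausible reading of the descriptions above: each gap between look-aways is the minimum plus a random number of frames up to the range. This is an assumption for illustration only. The service's actual sampling is not documented here, and `sample_lookaway_frames` is a hypothetical helper.

```python
import random
from typing import List, Optional


def sample_lookaway_frames(
    total_frames: int,
    interval_min: int = 3,
    interval_range: int = 8,
    seed: Optional[int] = None,
) -> List[int]:
    """Return frame indices at which a simulated look-away would start."""
    rng = random.Random(seed)
    frames: List[int] = []
    frame = 0
    while True:
        # Assumed model: gap = minimum + uniform offset within the range.
        frame += interval_min + rng.randint(0, interval_range)
        if frame >= total_frames:
            return frames
        frames.append(frame)


frames = sample_lookaway_frames(total_frames=300, seed=0)
print(f"{len(frames)} look-aways in 300 frames, first few at {frames[:5]}")
```

Under this model, raising `interval_min` spaces look-aways further apart, while raising `interval_range` makes their timing less regular.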

### Range Control

The gaze redirection feature redirects the eyes to look at the camera within a certain range of head and eye motion in which eye contact is desired and looks natural. Beyond this range, the feature gradually transitions away from looking at the camera toward the estimated gaze and eventually turns off in a seamless manner. To provide for various use cases and user preferences, we provide range parameters for the user to control the range of gaze angles and head poses in which gaze redirection occurs and the range in which transition occurs before the redirection is turned off. These are optional parameters.

`gaze_pitch_threshold_low` and `gaze_yaw_threshold_low` define the parameters for the pitch and yaw angles of the estimated gaze within which gaze is redirected toward the camera. Beyond these angles, redirected gaze transitions away from the camera and toward the estimated gaze, turning off redirection beyond `gaze_pitch_threshold_high` and `gaze_yaw_threshold_high` respectively.

Similarly, `head_pitch_threshold_low` and `head_yaw_threshold_low` define the parameters for pitch and yaw angles of the head pose within which gaze is redirected toward the camera. Beyond these angles, redirected gaze transitions away from the camera and toward the estimated gaze, turning off redirection beyond `head_pitch_threshold_high` and `head_yaw_threshold_high`.
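
The low and high thresholds can be pictured as a ramp: full redirection below the low threshold, none above the high threshold, and a gradual transition in between. The sketch below assumes a linear ramp and combines the four angles by taking the smallest weight. Both choices are illustrative assumptions, not the service's documented behavior, and the function names are hypothetical.

```python
from typing import Dict

# (low, high) pairs in degrees, using the default values listed in this notebook.
DEFAULT_THRESHOLDS = {
    "gaze_pitch": (25.0, 30.0),
    "gaze_yaw": (20.0, 30.0),
    "head_pitch": (20.0, 25.0),
    "head_yaw": (25.0, 30.0),
}


def redirection_weight(angle_deg: float, low: float, high: float) -> float:
    """Return 1.0 for full redirection, 0.0 for none (linear ramp assumed)."""
    magnitude = abs(angle_deg)
    if magnitude <= low:
        return 1.0
    if magnitude >= high:
        return 0.0
    return (high - magnitude) / (high - low)


def combined_weight(angles: Dict[str, float]) -> float:
    """Assumed combination rule: the most restrictive angle wins."""
    return min(
        redirection_weight(angles[name], low, high)
        for name, (low, high) in DEFAULT_THRESHOLDS.items()
    )


# Gaze yaw of 25 degrees sits halfway between its thresholds (20 and 30).
print(combined_weight(
    {"gaze_pitch": 5.0, "gaze_yaw": 25.0, "head_pitch": 0.0, "head_yaw": 10.0}
))  # 0.5
```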

- `gaze_pitch_threshold_low` - (FP32) Gaze pitch threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 25).

- `gaze_pitch_threshold_high` - (FP32) Gaze pitch threshold (degrees) at which the redirection is equal to estimated gaze and the gaze redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).

- `gaze_yaw_threshold_low` - (FP32) Gaze yaw threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 20).

- `gaze_yaw_threshold_high` - (FP32) Gaze yaw threshold (degrees) at which the redirection is equal to estimated gaze and the gaze redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).

- `head_pitch_threshold_low` - (FP32) Head pose pitch threshold (degrees) of the estimated head pose at which redirection starts transitioning away from camera and toward the estimated gaze. Float value from 10 to 35 (default 20).

- `head_pitch_threshold_high` - (FP32) Head pose pitch threshold (degrees) of the estimated head pose at which redirection equals the estimated gaze and redirection is turned off beyond this angle. Float value from 10 to 35 (default 25).

- `head_yaw_threshold_low` - (FP32) Head pose yaw threshold (degrees) at which the redirection starts transitioning away from camera toward estimated gaze. Float value from 10 to 35 (default 25).

- `head_yaw_threshold_high` - (FP32) Head pose yaw threshold (degrees) of the estimated head pose at which redirection equals the estimated gaze and redirection is turned off beyond this angle. Float value from 10 to 35 (default 30).
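
Since every parameter above has a documented range, a configuration dictionary can be checked locally before it is sent. The following is a minimal sketch: `validate_config` is a hypothetical helper, the ranges are copied from the lists in this section, and the rule that each low threshold should not exceed its high threshold is an assumption inferred from the descriptions.

```python
from typing import Dict, List, Tuple

# Documented (min, max) ranges, copied from the parameter lists in this section.
RANGES: Dict[str, Tuple[float, float]] = {
    "eye_size_sensitivity": (2, 6),
    "lookaway_max_offset": (1, 10),
    "lookaway_interval_min": (1, 600),
    "lookaway_interval_range": (1, 600),
}
THRESHOLD_PREFIXES = [
    f"{part}_{axis}_threshold" for part in ("gaze", "head") for axis in ("pitch", "yaw")
]
for prefix in THRESHOLD_PREFIXES:
    RANGES[f"{prefix}_low"] = (10.0, 35.0)
    RANGES[f"{prefix}_high"] = (10.0, 35.0)


def validate_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    for name, (lowest, highest) in RANGES.items():
        if name in config and not lowest <= config[name] <= highest:
            problems.append(f"{name}={config[name]} is outside [{lowest}, {highest}]")
    for flag in ("detect_closure", "enable_lookaway"):
        if config.get(flag, 0) not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    # Assumption: a low threshold should not exceed its matching high threshold.
    for prefix in THRESHOLD_PREFIXES:
        low, high = config.get(f"{prefix}_low"), config.get(f"{prefix}_high")
        if low is not None and high is not None and low > high:
            problems.append(f"{prefix}_low exceeds {prefix}_high")
    return problems


print(validate_config({
    "eye_size_sensitivity": 9,
    "gaze_yaw_threshold_low": 32.0,
    "gaze_yaw_threshold_high": 30.0,
}))
```

Keys that are not listed in `RANGES`, such as `temporal` or `output_video_encoding`, are ignored by this check.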

More details on customizing Eye Contact behavior for your specific use case can be found in the [Advanced Usage Section](https://docs.nvidia.com/nim/maxine/eye-contact/latest/advanced-usage.html).
