## Demo
Here is a demo of VeriLight's capabilities. When starting this notebook, ensure you select the ```verilight``` kernel, corresponding to the conda environment created according to the Requirements and Installation section of the README.

NOTE: Depending on your editor, the videos may not play with audio. To view the videos with audio, you can directly open their corresponding files in a video viewer.

© 2025 The Trustees of Columbia University in the City of New York.  
This work may be reproduced, distributed, and otherwise exploited for academic non-commercial purposes only.  
To obtain a license to use this work for commercial purposes, 
please contact Columbia Technology Ventures at techventures@columbia.edu.

In [None]:
from IPython.display import HTML
from base64 import b64encode

# embed video with base64 encoding
def embed_video(video_path):
    # read the raw bytes; the context manager closes the file afterwards
    with open(video_path, "rb") as f:
        video = f.read()
    video_encoded = b64encode(video).decode()
    return HTML(f"""
        <p>Depending on your editor, the videos may not play with audio.<br>To view the videos with audio, you can directly open their corresponding files in a video viewer.</p>
        <video width="960" height="540" controls>
            <source src="data:video/mp4;base64,{video_encoded}" type="video/mp4">
            Your browser does not support the video tag.
        </video>
    """)

### Core unit deployed at a speech
First, the VeriLight core unit is deployed at the speech site. The speaker reads aloud our paper abstract. For each 4.5-second window of speech, the core unit extracts two visual feature vectors:
1. An identity feature vector, used to verify the speaker's identity in a published video and protect against identity swap falsifications.
2. A dynamic feature vector, used to protect against falsifications of delivered content by ensuring that a speaker’s face and lip motion have not been modified.

The feature vectors are compressed using locality-sensitive hashing, appended with additional provenance metadata (e.g., date and time), and cryptographically secured to form a speech video **signature**.
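To give a feel for the compression step, here is a minimal sketch of locality-sensitive hashing. This notebook does not state which LSH family VeriLight uses, so the sketch assumes random-hyperplane (sign-of-projection) hashing purely for illustration. The function names, vector dimension, and bit count are all hypothetical and are not taken from the VeriLight code.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (assumed LSH family, see note above)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_bits(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side, else 0."""
    return [
        1 if sum(w * x for w, x in zip(plane, vector)) > 0 else 0
        for plane in hyperplanes
    ]

def hamming_distance(a, b):
    """Number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 8-dimensional feature vector compressed to 16 bits.
planes = make_hyperplanes(dim=8, n_bits=16, seed=0)
feature = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, -2.0, 1.0]
bits = lsh_bits(feature, planes)

# Positive scaling never changes the sign of a projection, so the hash is unchanged.
print(hamming_distance(bits, lsh_bits([2 * x for x in feature], planes)))
# Negating the vector flips the sign of every projection, so every bit differs.
print(hamming_distance(bits, lsh_bits([-x for x in feature], planes)))
```

The useful property is that similar vectors tend to produce hashes with a small Hamming distance, so a verifier can compare compact bit strings instead of full feature vectors.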

The signature data is embedded into the scene via our adaptive embedding algorithm, which continuously adjusts the projected modulated light to blend into the scene. For demonstration purposes, we configure the projected light to start off as a visible green. Once the adaptive embedding algorithm performs its first iteration, the modulated light becomes imperceptible.

In [None]:
embed_video("assets/deployment_demo.mp4")


### Verify a real video
An authentic recording from this speech event is provided at ```data/authentic.mp4```. Input it to the verification software to confirm its authenticity. Pass the `-vis` flag to produce a video visualizing the results. The script will take ~8 minutes to complete (~5 minutes for verification and ~3 minutes for generation of the visualization video).

In [None]:
!python verify.py data/authentic.mp4 authentic_verification -vis

The verification results (shown below alongside the expected visualization video) confirm the video has not been tampered with. You can find your visualization video as well as intermediate outputs from the verification process in the newly created ```authentic_verification``` folder. As a reference for the expected content of this folder, see [```data/expected_outputs```](data/expected_outputs/).

In [None]:
embed_video("authentic_verification/visualization.mp4") # view the visualization video created by running verify.py above

Suppose an attacker produces a **lipsync deepfake**, like ```data/lipsync.mp4```, which portrays the speaker delivering a completely different speech. Running this falsified video through the verification software shows that it is flagged for containing manipulated facial and lip motions, indicating that the speech content has been altered.

In [None]:
!python verify.py data/lipsync.mp4 lipsync_verification -vis

In [None]:
embed_video("lipsync_verification/visualization.mp4") # view the visualization video created by running verify.py above

An attacker might **change only a few key words of a speech**. ```data/pinpointed_lipsync.mp4``` shows the speaker calling VeriLight a "high-overhead and obtrusive system" rather than a "low-overhead and unobtrusive system," with the remaining portions of the video left intact. This pinpointed falsification is made at window 2 of the video, which VeriLight correctly identifies. Additionally, we can see that the minor change *cascades* throughout the rest of the video. Because the replaced phrase is slightly shorter than its original version, all subsequent frames in the video are shifted; this has the effect of modifying content in subsequent windows as well. VeriLight is intentionally designed with this sensitivity to the synchronization of embedded signatures and portrayed content. This enables it to detect a variety of temporal edits (e.g., clip splicing, speed modifications, and lipsyncs).
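The cascade can be reasoned about with simple window arithmetic. The sketch below is illustrative only: it uses the 4.5-second window length stated earlier, but the zero-based window indexing, the example timestamps, and the function name are assumptions rather than details of the VeriLight implementation.

```python
import math

WINDOW_SECONDS = 4.5  # window length stated earlier in this notebook

def affected_windows(edit_time_s, video_duration_s, changes_length=True):
    """Return zero-based indices (an assumption) of windows whose content no
    longer lines up with the embedded signatures after an edit at edit_time_s."""
    n_windows = math.ceil(video_duration_s / WINDOW_SECONDS)
    first = int(edit_time_s // WINDOW_SECONDS)
    if not changes_length:
        # An edit confined to one window that keeps the duration touches only that window.
        return [first]
    # A length-changing edit shifts every later frame, so all later windows are affected.
    return list(range(first, n_windows))

# Hypothetical example: an edit 10 s into a 27 s video.
print(affected_windows(10.0, 27.0))                        # [2, 3, 4, 5]
print(affected_windows(10.0, 27.0, changes_length=False))  # [2]
```

Under these assumptions, shortening a phrase in one window is enough to desynchronize every window after it, which matches the behavior described above.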

In [None]:
!python verify.py data/pinpointed_lipsync.mp4 pinpointed_lipsync_verification -vis

In [None]:
embed_video("pinpointed_lipsync_verification/visualization.mp4") # view the visualization video created by running verify.py above

Let's also take a look at an **identity swap deepfake**. In ```data/identityswap.mp4```, the speaker's delivered content (and thus their lip/facial motion) remains the same, but their identity has been swapped. This is reflected in the verification output.

In [None]:
!python verify.py data/identityswap.mp4 identityswap_verification -vis

In [None]:
embed_video("identityswap_verification/visualization.mp4") # view the visualization video created by running verify.py above
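If you would rather run all four verifications in one go, the following is a minimal sketch that builds the same `verify.py` commands used in the cells above. The helper names `build_verify_command` and `run_all` are hypothetical conveniences, not part of the VeriLight code, and the sketch assumes it is executed from the repository root with the `verilight` environment active.

```python
import subprocess
import sys

# Video names taken from the cells above (data/<name>.mp4).
DEMO_VIDEOS = ["authentic", "lipsync", "pinpointed_lipsync", "identityswap"]

def build_verify_command(name, visualize=True):
    """Build the argument list for one verify.py run, mirroring the cells above."""
    cmd = [sys.executable, "verify.py", f"data/{name}.mp4", f"{name}_verification"]
    if visualize:
        cmd.append("-vis")  # also produce the visualization video
    return cmd

def run_all(names=DEMO_VIDEOS):
    """Run the verifications sequentially; stops at the first failing run."""
    for name in names:
        subprocess.run(build_verify_command(name), check=True)

# Inspect the commands without running them.
for video_name in DEMO_VIDEOS:
    print(" ".join(build_verify_command(video_name)[1:]))
```

Calling `run_all()` executes the runs back to back; given the ~8 minute estimate per video quoted above, expect the full batch to take correspondingly longer.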