# 7 Visualise activity in a video.

We have extracted all the features we plan to use. Overlaying these on the video was useful.
But watching annotated videos is inefficient and not always informative.. 

To help with understanding we build a few tools that let's see at a glance what happens over time.

In [1]:
import os
import utils
import calcs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ultralytics

In [2]:
videos_in = os.path.join("..","LookitLaughter.test")
demo_data = os.path.join("..","data", "demo")
temp_out = os.path.join("..","data","0_temp")
data_out = os.path.join("..","data","1_interim")
videos_out = os.path.join("..","data","2_final")

metadata_file = "_LookitLaughter.xlsx"

processedvideos = utils.getProcessedVideos(data_out)
processedvideos.head()

Found existing processedvideos.xlsx with 54 rows.


Unnamed: 0,VideoID,ChildID,JokeType,Joke.Label,JokeNum,JokeRep,JokeTake,HowFunny,LaughYesNo,Frames,...,Objects.file,Objects.when,Understand.file,Understand.when,Faces.normed,Keypoints.normed,annotatedVideo,annotated.when,Diary.file,Diary.when
0,2UWdXP.joke1.rep2.take1.Peekaboo.mp4,2UWdXP,Peekaboo,2,1,2,1,Slightly funny,No,217,...,,,,,../data/1_interim/2UWdXP.joke1.rep2.take1.Peek...,../data/1_interim/2UWdXP.joke1.rep2.take1.Peek...,../data/2_final/2UWdXP.joke1.rep2.take1.Peekab...,2024-02-16 11:03:50,..\data\1_interim\2UWdXP.joke1.rep2.take1.Peek...,2024-09-11 10:13:58
1,2UWdXP.joke1.rep3.take1.Peekaboo.mp4,2UWdXP,Peekaboo,2,1,3,1,Slightly funny,No,152,...,,,,,../data/1_interim/2UWdXP.joke1.rep3.take1.Peek...,../data/1_interim/2UWdXP.joke1.rep3.take1.Peek...,../data/2_final/2UWdXP.joke1.rep3.take1.Peekab...,2024-02-16 11:03:51,..\data\1_interim\2UWdXP.joke1.rep3.take1.Peek...,2024-09-11 10:13:59
2,2UWdXP.joke2.rep1.take1.NomNomNom.mp4,2UWdXP,NomNomNom,1,2,1,1,Funny,No,95,...,,,,,../data/1_interim/2UWdXP.joke2.rep1.take1.NomN...,../data/1_interim/2UWdXP.joke2.rep1.take1.NomN...,../data/2_final/2UWdXP.joke2.rep1.take1.NomNom...,2024-02-16 11:03:52,..\data\1_interim\2UWdXP.joke2.rep1.take1.NomN...,2024-09-11 10:14:00
3,2UWdXP.joke2.rep2.take1.NomNomNom.mp4,2UWdXP,NomNomNom,1,2,2,1,Slightly funny,No,97,...,,,,,../data/1_interim/2UWdXP.joke2.rep2.take1.NomN...,../data/1_interim/2UWdXP.joke2.rep2.take1.NomN...,../data/2_final/2UWdXP.joke2.rep2.take1.NomNom...,2024-02-16 11:03:53,..\data\1_interim\2UWdXP.joke2.rep2.take1.NomN...,2024-09-11 10:14:01
4,2UWdXP.joke2.rep3.take1.NomNomNom.mp4,2UWdXP,NomNomNom,1,2,3,1,Slightly funny,No,133,...,,,,,../data/1_interim/2UWdXP.joke2.rep3.take1.NomN...,../data/1_interim/2UWdXP.joke2.rep3.take1.NomN...,../data/2_final/2UWdXP.joke2.rep3.take1.NomNom...,2024-02-16 11:03:54,..\data\1_interim\2UWdXP.joke2.rep3.take1.NomN...,2024-09-11 10:14:02


In [3]:
#a couple of files for testing
VIDEO_FILE  = os.path.join(videos_in, "2UWdXP.joke1.rep2.take1.Peekaboo.mp4")
VIDEO_FILE2 = os.path.join(videos_in, "2UWdXP.joke2.rep1.take1.NomNomNom.mp4")
AUDIO_FILE = os.path.join(data_out, "2UWdXP.joke1.rep2.take1.Peekaboo.wav")
SPEECH_FILE = os.path.join(data_out, "2UWdXP.joke1.rep2.take1.Peekaboo.json")

testset = [VIDEO_FILE, VIDEO_FILE2] 

# 7.1 Use Voxel51 and PytorchVideo for examining videos

Voxel51 seems to be a useful tool for looking at training data (and trained predictions).

Let's start with the minimal implementation. Just viewing videos.

https://docs.voxel51.com/user_guide/dataset_creation/index.html



In [4]:
import fiftyone as fo

### 7.1.1 Is dataset already created?

FiftyOne may aleady have a dataset created. Let's check. And reload it. 

In [5]:
datasets = fo.list_datasets()
if len(datasets) == 0:
    print("No datasets found. Load in step 7.1.2")
else:
    print("Loading saved datasets: ", datasets[0])
    dataset = fo.load_dataset(datasets[0])

Loading saved datasets:  LookitLaughter.test


### 7.1.2 Populate a FiftyOne dataset with our videos and labels.

Either there is no existing dataset or we want to rebuild it.

### Either  

In [6]:
fo.delete_datasets("*")

In [7]:
# Create a dataset from a directory of videos
dataset = fo.Dataset.from_videos_dir("../LookitLaughter.test")
dataset.ensure_frames()
dataset.compute_metadata()

dataset.name = 'LookitLaughter.test'


dataset.add_sample_field("JokeType", fo.StringField, description="What joke is being told?")
dataset.add_sample_field("HowFunny", fo.StringField, description="How funny is the joke?")
dataset.add_sample_field("LaughYesNo",  fo.BooleanField, description="Did the child laugh?")
dataset.add_sample_field("ChildSide",  fo.IntField, description="Is the child on left (-1) or right (1) of adult or on lap (0)?")


 100% |███████████████████| 54/54 [26.6ms elapsed, 0s remaining, 2.0K samples/s]   
Computing metadata...
 100% |███████████████████| 54/54 [459.5ms elapsed, 0s remaining, 117.5 samples/s]     


Now let's see if we can add our metadata classifications. Recalling that each video demos one joke type `[Peekaboo,TearingPaper,NomNomNom,ThatsNotAHat,ThatsNotACat]` and has rating of how funny the baby found it `[Not Funny, Slightly Funny, Funny, Extremely Funny]` and whether they laughed `[Yes, No]`.


In [8]:
# add the joke type, how funny and laugh yes/no for each sample in the dataset
for sample in dataset:
    #split the filepath to get the video name, system independent
    videoname = os.path.basename(sample.filepath)
    phrase = processedvideos[processedvideos["VideoID"]==videoname]
    if len(phrase) == 0:
        print(f"Video {videoname} not found in processed videos.")
        continue
    sample["VideoID"]  = phrase["VideoID"].values[0]
    sample["JokeType"]  = phrase["JokeType"].values[0]
    sample["HowFunny"]  = phrase["HowFunny"].values[0]
    sample["LaughYesNo"]  = (phrase["LaughYesNo"].values[0] == "Yes")
    sample.save()

In [9]:
def idx2person(idx):
    idx = int(idx)
    if idx == 0:
        return "Child"
    elif idx == 1:
        return "Adult"
    else:
        return "Unknown"

Let's add the frame by frame annotations directly onto the videos inside fiftyone

In [10]:
#Let's start with people bounding boxes

for sample in dataset:
    #retrieve people bounding boxes from the keypoints file
    keypoints = utils.readKeyPointsFromCSV(processedvideos,sample.filepath,normed= True)    

    for framenumber, frame in sample.frames.items():
        rows = keypoints[keypoints["frame"]==framenumber -1] #framenumbver is 1 based in fiftyone!!
        dets = []
        for index, row in rows.iterrows():
            person = idx2person(row["person"])    
            bbox = [row["bbox.x1"], row["bbox.y1"], row["bbox.x2"], row["bbox.y2"]]
            bbox51 = calcs.xyxy2ltwh(bbox)
            det = fo.Detection(label=person, bounding_box=bbox51)
            dets.append(det)
        frame["People"] = fo.Detections(detections=dets)
        sample.save()
        
dataset.save()

#### Add the speech as temporal annotations 

In [30]:
def framerange_from_timestamps(timestamps, fps, max_frames):
    start = max(int(timestamps[0]*fps)+1 ,1)
    end =  min(int(timestamps[1]*fps)+1, max_frames )
    return start, end


In [33]:
for sample in dataset:
    videoname = os.path.basename(sample.filepath)
    fps = sample.metadata["frame_rate"]
    max_frames = sample.metadata["total_frame_count"]
    print(fps)
    speechdata = utils.getSpeechData(processedvideos,videoname)
    if speechdata is None:
        print(f"Speech data not found for {videoname}")
        continue
    phrases = []
    for phrase in speechdata["segments"]:
        start, end = framerange_from_timestamps([phrase["start"],phrase["end"]], fps, max_frames)
        print (start, end)
        phrases.append(fo.TemporalDetection(label=phrase["text"],
                                        support=[start,end]))
        print(phrase["text"])
        
    sample["Speech"] = fo.TemporalDetections(detections=phrases)
    sample["Speech"] = phrases
    sample.save()

dataset.save()

14.29889298892989
We have a speech data file for 2UWdXP.joke1.rep2.take1.Peekaboo.mp4
1 58
 Hey, excuse me. Look.
58 101
 Ah, I can't handle this.
101 129
 I'm just going to put it on.
129 158
 You know, peek-a-boo!
172 186
 Hey.


ValidationError: Only lists and tuples may be used in a list field

In [23]:
sample = dataset.first()


dets =[
        fo.TemporalDetection(label="meeting", support=[10, 20]),
        fo.TemporalDetection(label="party", support=[30, 60]),
    ]

sample["events"] = fo.TemporalDetections(
    detections= dets
)

print(sample)

<Sample: {
    'id': '66e1c1ba6baa1cbe4fe48642',
    'media_type': 'video',
    'filepath': 'C:\\Users\\caspar\\OneDrive\\LegoGPI\\babyjokes\\LookitLaughter.test\\2UWdXP.joke1.rep2.take1.Peekaboo.mp4',
    'tags': [],
    'metadata': <VideoMetadata: {
        'size_bytes': 1209336,
        'mime_type': 'video/mp4',
        'frame_width': 640,
        'frame_height': 480,
        'frame_rate': 14.29889298892989,
        'total_frame_count': 217,
        'duration': 15.176,
        'encoding_str': 'avc1',
    }>,
    'JokeType': 'Peekaboo',
    'HowFunny': 'Slightly funny',
    'LaughYesNo': False,
    'ChildSide': None,
    'VideoID': '2UWdXP.joke1.rep2.take1.Peekaboo.mp4',
    'Speech': [
        <TemporalDetection: {
            'id': '66e1ca686baa1cbe4fe4cd55',
            'tags': [],
            'label': ' Hey, excuse me. Look.',
            'support': [1, 58],
            'confidence': None,
        }>,
        <TemporalDetection: {
            'id': '66e1ca686baa1cbe4fe4cd56',
   

In [24]:
sample.save()

In [None]:
for sample in dataset:
    videoname = os.path.basename(sample.filepath)
    speechdata = utils.getSpeechData(processedvideos,videoname)
    if speechdata is None:
        print(f"Speech data not found for {videoname}")
        continue
    
    subtitles = speechdata["segments"]
    # Create a list of text annotations
    text_annotations = [
        fo.Detection(
            text=sub["text"],
            start_time=sub["start"],
            end_time =sub["end"]
        )
        for sub in subtitles
    ]
    sample["subtitles"] = fo.Detections(detections=text_annotations)    
    sample.save()

14.299


## 7.2 View dataset in Voxel51 GUI

In [25]:
session = fo.launch_app(dataset)
# in docker launch fiftiy needs port
# session = fo.launch_app(dataset, address="0.0.0.0", port=5151)

In [None]:
print(session.selected)

In [None]:
#session.selected contains the indices of the dataset samples clicked on in the UI.
if len(session.selected) == 0:
    print("No samples selected. Click the checkbox in the top left of each video to select it.")
else:
    print(dataset[session.selected[0]])

# 7.2 Draw annotated timeline for a select video 

A group of visualisations to see what happens in a video. 

In each frame let's find the `centre of gravity` for each person (the average of all the high-confidence marker points). This is handy for time series visualisation. For example plotting the cog.x for each person over time shows how they move closer and further from each other. 

Let's get the keypoint data and calculate

In [8]:
emotionColors = {"angry":{"color":"red","arousal":0.9,"valence":-0.2},
                 "fear":{"color":"orange","arousal":0.2,"valence":-0.9},
                 "happy":{"color":"yellow","arousal":0.2,"valence":0.9},
                 "neutral":{"color":"grey","arousal":0,"valence":0},
                 "sad":{"color":"blue","arousal":-0.2,"valence":-0.9},
                 "surprise":{"color":"green","arousal":0.9,"valence":0.2},
                 "disgust":{"color":"purple","arousal":-0.7,"valence":-0.7}}
who = ["child", "adult"]

In [None]:
plotCoGrav = True
plotStDev = True
plotSpeech = True
plotEmotions = True

#numerical sum of boolean flags
subplots = sum([plotCoGrav, plotStDev, plotSpeech, plotEmotions])

if len(session.selected) == 0:
    print("No video selected")
    exit()

VideoID = dataset[session.selected[0]]["VideoID"]
keypoints = utils.readKeyPointsFromCSV(processedvideos,VideoID)
FPS = utils.getVideoProperty(processedvideos, VideoID, "FPS")
xmax = keypoints["frame"].max()
#this bit of pandas magic calculates average x and y for all the rows.
keypoints[["cogx","cogy"]] = keypoints.apply(lambda row: calcs.rowcogs(row.iloc[8:59]), axis=1, result_type='expand')
keypoints[["stdx","stdy"]] = keypoints.apply(lambda row: calcs.rowstds(row.iloc[8:59]), axis=1, result_type='expand')

#going to add a subplot foe each of the above flags
plt.figure(figsize=(20, 5*subplots))
plt.suptitle("Video Time Line Plots")
pltidx = 0
if plotCoGrav:
    ax = plt.subplot(subplots, 1, pltidx + 1)
    pltidx += 1
    ax.set_xlabel("Time (seconds)")
    ax.set_ylabel("Horizontal Position")
    ax.set_xlim(0, xmax/FPS)
    child = keypoints[keypoints["person"]=="child"]
    adult = keypoints[keypoints["person"]=="adult"]
    #a plot of child's centre of gravity frame by frame
    childplot = ax.plot(child["frame"], child["cogx"], c="red", alpha=0.5)
    ## add line of adult's centre of gravity
    adultplot = ax.plot(adult["frame"], adult["cogx"], c="blue", alpha=0.5)
    #add legend
    ax.legend(['child', 'adult'], loc='upper left')

if plotStDev:
    ax = plt.subplot(subplots, 1, pltidx + 1)
    pltidx += 1
    ax.set_xlabel("Time (seconds)")
    ax.set_ylabel("Horizontal Position")
    ax.set_xlim(0, xmax/FPS)
    child = keypoints[keypoints["person"]=="child"]
    adult = keypoints[keypoints["person"]=="adult"]
    #a plot of child's centre of gravity frame by frame
    childplot = ax.plot(child["frame"], child["stdx"], c="red", alpha=0.5)
    ## add line of adult's centre of gravity
    adultplot = ax.plot(adult["frame"], adult["stdx"], c="blue", alpha=0.5)
    #add legend
    ax.legend(['child', 'adult'], loc='upper left')

if plotSpeech:
    ax2 = plt.subplot(subplots, 1, pltidx + 1)
    pltidx += 1
    ax2.set_xlabel("Time (seconds)")
    ax2.set_ylabel("Identified Speech")
    speechjson = utils.getSpeechData(processedvideos,VideoID)
    if speechjson is not None:
        nsegs = len(speechjson["segments"])
        ax2.set_xlim(0, xmax/FPS)
        ax2.set_ylim(0, nsegs)
        #let's plot the speech segments as boxes
        #label each one with the text
        for idx, seg in enumerate(speechjson["segments"]):
            # #rectangle with the start and end times as x coordinates and nsegs - idx as y coordinates
            #fill the rectangle
            ax2.fill([seg["start"], seg["end"], seg["end"], seg["start"]], [nsegs - idx - 1, nsegs - idx - 1, nsegs - idx, nsegs - idx], 'r', alpha=0.5)
            ax2.text(seg["start"], nsegs- idx -.5 , seg["text"])

if plotEmotions:
    ax3 = plt.subplot(subplots, 1, pltidx + 1)
    pltidx += 1
    ax3.set_xlabel("Time (seconds)")
    ax3.set_ylim(0, 2)
    ax3.set_xlim(0, xmax/FPS)  
    emotions = utils.getFaceData(processedvideos,VideoID)
    emotions["ticker"] = 1
    for index in range(2):
        ems = emotions[emotions["index"]==index]
        #who is the person we are plotting
        # key gives the emotion name, data gives the actual values (also labels)
        for key, data in ems.groupby('emotion'):
            #plot scatter plot of emotion occurances
            ax3.scatter(data["frame"], data["ticker"] + index, label=key, c=emotionColors[key]["color"], alpha=0.5, s=100)

        
    #show legend with emotion colours
    plt.legend(loc='best')



plt.show()




Let's plot the captions.
Go through the speechjson. For each speech segment add a horizotal line with the text. Start and End times from the speechjson.

Now let's do a timeline for the emotions of the participants.
We'll experiment to find best visualisation. 
Note this assumes that faces are correctly assigned to correct indviduals. 
TODO - Code that uses bounding boxes to assign faces to individuals.

First we will try a 'scatter' graph. Color coded for each emotion. 