# 0 Demonstrating the main function calls we use during feature extraction 

This notebook is a set of simple examples showing how to call the models and parse the data they return.

## 0.1 Demo data

Where will we find videos, images and audio for our examples? Two videos, their associated audio files and a set of images are available in the `data\demo` directory.

In [None]:
import os

#build the path with os.path.join so it works on Windows, macOS and Linux
demo_data = os.path.join("..", "data", "demo")

#a couple of videos for testing
VIDEO_FILE  = os.path.join(demo_data, "2UWdXP.joke1.rep2.take1.Peekaboo.mp4")
VIDEO_FILE2 = os.path.join(demo_data, "2UWdXP.joke2.rep1.take1.NomNomNom.mp4")

AUDIO_FILE = os.path.join(demo_data, "2UWdXP.joke1.rep2.take1.Peekaboo.mp3")
AUDIO_FILE2 = os.path.join(demo_data, "2UWdXP.joke2.rep1.take1.NomNomNom.mp3")

IMAGE1 = os.path.join(demo_data, "mother-and-baby.jpg")
IMAGE2 = os.path.join(demo_data, "peekaboo.png")
IMAGE3 = os.path.join(demo_data, "twopeople.jpg")

videoset = [VIDEO_FILE, VIDEO_FILE2] 
audioset = [AUDIO_FILE, AUDIO_FILE2] 
photoset = [IMAGE1, IMAGE2, IMAGE3]
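
Before running the models it can help to confirm that the demo files are where we expect them. The snippet below is a minimal sketch; `missing_files` is a hypothetical helper, not part of `utils.py`, and the single path it checks is just an illustration.

```python
import os

def missing_files(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

# Illustrative path only; in the notebook we would pass
# videoset + audioset + photoset instead.
demo_paths = [os.path.join("..", "data", "demo", "peekaboo.png")]
missing = missing_files(demo_paths)
print(f"{len(missing)} of {len(demo_paths)} demo files missing")
```

An empty list means every file was found.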


## 0.1 YOLOv8

Go to [docs.ultralytics.com](https://docs.ultralytics.com/) for detailed documentation and lots of examples. We just demo a few here.


In [None]:
import cv2
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ultralytics import YOLO
import utils   # local utils.py contains some helper functions
import display # local display.py contains display helper functions

### 0.1.1 Pose estimation

In [None]:
#use a yolo model with pose estimation
model = YOLO('yolov8n-pose.pt')

#the results will contain object detection and pose estimation data.
results = model(IMAGE3)
print(results)

In [None]:
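#automatically display image overlaid with keypoints, skeleton and bounding boxes
#plot() returns a BGR array (OpenCV convention), so convert to RGB for matplotlib
labelledimage = results[0].plot()
plt.imshow(cv2.cvtColor(labelledimage, cv2.COLOR_BGR2RGB))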
plt.show()

#get the keypoints as numpy arrays of x,y coordinates, each with a confidence score.
#note yolo returns tensors so we have to convert to numpy
keypoints = results[0].keypoints.cpu().numpy()
print(keypoints.xy)
print(keypoints.conf)
print(keypoints.data)

In [None]:
#yolo returns keypoints for each person as a 17 x 3 tensor of x,y,confidence; we typically flatten it to a 51 element list to store in dataframes
xyc = keypoints.data[0].flatten().tolist()
print(xyc)
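
To make the flattened layout concrete, here is a small sketch that rebuilds the 17 x 3 array from a 51 element list and labels each row with its COCO keypoint name. The helper names `unflatten_xyc` and `named_keypoints` are made up for this example, and the input is synthetic rather than real model output.

```python
import numpy as np

# COCO keypoint order used by the YOLOv8 pose models.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def unflatten_xyc(xyc):
    """Turn a flat 51 element list back into a 17 x 3 array of (x, y, conf)."""
    return np.asarray(xyc, dtype=float).reshape(17, 3)

def named_keypoints(xyc):
    """Map each COCO keypoint name to its (x, y, conf) tuple."""
    rows = unflatten_xyc(xyc).tolist()
    return {name: tuple(row) for name, row in zip(COCO_KEYPOINTS, rows)}

# Synthetic stand-in for keypoints.data[0]: 17 rows of (x, y, conf).
fake = np.arange(51, dtype=float).reshape(17, 3)
flat = fake.flatten().tolist()
print(named_keypoints(flat)["nose"])
```

Because `flatten()` works row by row, elements 0 to 2 of the flat list belong to the nose, elements 3 to 5 to the left eye, and so on.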

### 0.1.2 YOLOv8 video -> keypoints dataframe

If we pass the model a video rather than an image, we must iterate over the results object to get the results for each frame.

We extract the movement data and save it to a dataframe using our own helper functions:

* `utils.createkeypointsdf` - initialises an empty keypoints dataframe
* `utils.addkeypointstodf` - adds keypoints to the dataframe
* `utils.videotokeypoints` - extracts keypoints from a video and saves them to a dataframe



In [None]:
results = model(VIDEO_FILE, stream=True)
df = utils.createkeypointsdf()
frame = 0
for r in results:
    #print(torch.flatten(r.keypoints.xy[0]).tolist())
    df = utils.addkeypointstodf(df,frame,r.boxes.xywh,r.boxes.conf,r.keypoints.data)  
    frame += 1

print(f"Video {VIDEO_FILE} has {frame} frames and {len(df)} rows of data")
df.head()

Our keypoints dataframe has the following structure

![keypoints dataframe](../docs/keypointsdf.png)

For each video `frame`, we have one row per person, identified by `person` and `index`. The next five columns describe the bounding box for that person, marked by its top-left `(x1,y1)` and bottom-right `(x2,y2)` corners. This is followed by 51 columns representing the 17 COCO pose points, each given as an `(x,y)` coordinate and a confidence `c` between 0 and 1.
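
As a rough illustration of working with this layout, the sketch below pulls the keypoints for one person in one frame back out as a 17 x 3 array. It relies only on the keypoints being the last 51 columns; the bounding box and keypoint column names, the demo values and the helper `row_keypoints` are all assumptions made for this example and may not match the names used in `utils.py`.

```python
import pandas as pd

# Hypothetical column names: 3 id columns, 5 bounding box columns, 51 keypoint columns.
id_cols = ["frame", "person", "index"]
bbox_cols = ["bbox.x1", "bbox.y1", "bbox.x2", "bbox.y2", "bbox.c"]
kp_cols = [f"k{i}.{axis}" for i in range(17) for axis in ("x", "y", "c")]

def row_keypoints(df, frame, person):
    """Return the 17 x 3 (x, y, conf) array for one person in one frame."""
    row = df[(df["frame"] == frame) & (df["person"] == person)].iloc[0]
    # The keypoints are assumed to be the final 51 columns of the row.
    return row.iloc[-51:].to_numpy(dtype=float).reshape(17, 3)

# Synthetic two-person frame standing in for a real keypoints dataframe.
data = [
    [0, 0, 0, 10, 20, 110, 220, 0.9] + list(range(51)),
    [0, 1, 1, 30, 40, 130, 240, 0.8] + list(range(100, 151)),
]
demo_df = pd.DataFrame(data, columns=id_cols + bbox_cols + kp_cols)

kpts = row_keypoints(demo_df, frame=0, person=1)
print(kpts.shape)
```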

In [None]:
df = utils.videotokeypoints(model, VIDEO_FILE, track=False)
df.head()

In [None]:
stemname = os.path.splitext(VIDEO_FILE)[0]
csvpath = stemname + ".csv"
df.to_csv(csvpath, index=False)

df = pd.read_csv(csvpath, index_col=None)
df.head()

## Displaying data 

Two functions help display keypoints and other data overlaid on a frame.

* `utils.getframekpts` takes a keypoints dataframe and a frame number and returns lists of all bounding boxes, their labels and the corresponding keypoints.
* `display.drawOneFrame` takes these outputs and draws them on the frame.

In [None]:
framenumber = 34
bboxlabels, bboxes, xycs = utils.getframekpts(df, framenumber)

print(bboxlabels)
print(bboxes)
print(xycs)

video = cv2.VideoCapture(VIDEO_FILE)
video.set(cv2.CAP_PROP_POS_FRAMES, framenumber)
success, image = video.read()
video.release()

image = display.drawOneFrame(image, bboxlabels, bboxes, xycs)

plt.imshow(image)

### 0.1.3 model.track()

YOLOv8 also comes with a `model.track` method, which aims to keep a consistent ID for each detected person (or, with a detection model, any other detected object) over the course of a video.

It is easy to use. Instead of calling
`results = model(video_path, stream=True)`

we can call
`results = model.track(video_path, stream=True)`

See https://docs.ultralytics.com/modes/track/#persisting-tracks-loop for details.

Here's an inline example of it working.

In [None]:
# Open the video file
video_path = VIDEO_FILE
cap = cv2.VideoCapture(video_path)

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 tracking on the frame, persisting tracks between frames
        results = model.track(frame, persist=True)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLOv8 Tracking", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
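
When tracking is on, each result carries the track IDs in `results[0].boxes.id`, which is `None` for frames where the tracker has no confirmed tracks. The sketch below shows one way to collect a per-ID history of box centres while guarding against that case. The helpers `ids_to_list` and `update_track_history` are hypothetical and the inputs are synthetic arrays, not real tracker output.

```python
import numpy as np

def ids_to_list(ids):
    """Convert tracker IDs to a list of ints; `None` (no tracks) becomes []."""
    if ids is None:
        return []
    return np.asarray(ids).astype(int).tolist()

def update_track_history(history, frame_number, ids, centres):
    """Append each tracked box centre to a per-ID list of (frame, x, y)."""
    centres = np.asarray(centres, dtype=float).tolist()
    for track_id, (x, y) in zip(ids_to_list(ids), centres):
        history.setdefault(track_id, []).append((frame_number, x, y))
    return history

# Synthetic frames: two people, then a frame with no tracks, then one person.
history = {}
update_track_history(history, 0, np.array([1.0, 2.0]), [[50.0, 60.0], [200.0, 80.0]])
update_track_history(history, 1, None, np.empty((0, 2)))
update_track_history(history, 2, np.array([1.0]), [[55.0, 62.0]])
print(history)
```

In the notebook the inputs would probably come from something like `r.boxes.cpu().numpy()`, passing its `id` and the first two columns of its `xywh` (the box centre), with the `None` check applied before converting the IDs.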

## 0.2 Extracting Speech

We extract the audio and then use off-the-shelf speech recognition to extract the text.

### 0.2.1 Extracting audio with moviepy

MoviePy is a basic movie-editing tool that wraps ffmpeg and allows us to extract audio from video.


In [None]:
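import moviepy.editor as mp  # moviepy 1.x; moviepy 2.x dropped the editor module (use `from moviepy import VideoFileClip`)

video_path = VIDEO_FILE
output_ext = "wav"  # or "mp3"

#save the audio next to the video, with the same name but a different extension
filename = os.path.splitext(video_path)[0]
audio_file = f"{filename}.{output_ext}"

clip = mp.VideoFileClip(video_path)
clip.audio.write_audiofile(audio_file)
clip.close()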


In [None]:
#playback the audio file
from IPython.display import Audio
Audio(audio_file)
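
A quick sanity check on an extracted `.wav` file is to read its header with Python's standard `wave` module. This is only a sketch: `wav_info` is a hypothetical helper, it assumes the file is uncompressed PCM (it will not read `.mp3`), and the example writes a one-second synthetic tone rather than using the demo audio.

```python
import math
import os
import struct
import tempfile
import wave

def wav_info(path):
    """Return channel count, sample rate and duration of a PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }

# Write one second of a 440 Hz mono tone so the example runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(16000)
    samples = (int(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000))
    wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))

info = wav_info(demo_path)
print(info)
```

Comparing `duration_s` with the length of the source video is a simple way to spot a truncated extraction.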

## 0.7 Visualising data over time

Some of the calculations we use when visualising the data are demonstrated here.


In [1]:
import numpy as np
import calcs   # local calcs.py with calculation helpers (assumed to sit alongside utils.py)

#function that calculates the average x and y coordinates of a set of keypoints (where confidence score is above a threshold)
xycs = np.array([[1,2,0.9],
                 [2,3,0.8],
                 [3,4,0.7],
                 [4,5,0.6],
                 [5,6,0.5],
                 [6,7,0.4],
                 [7,8,0.3],
                 [8,9,0.2],
                 [9,10,0.1]])

avgx, avgy = calcs.avgxys(xycs, threshold = 0.5)

print(avgx, avgy)
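
For readers without the local `calcs` module to hand, the sketch below shows one plausible way such a thresholded average could be computed. It is not the actual `calcs.avgxys` implementation: the name `avgxys_sketch`, the strict `>` comparison and the NaN return for "no confident keypoints" are all assumptions made for illustration.

```python
import numpy as np

def avgxys_sketch(xycs, threshold=0.5):
    """Average x and y over keypoints whose confidence is strictly above `threshold`."""
    xycs = np.asarray(xycs, dtype=float)
    keep = xycs[:, 2] > threshold          # assumption: strict comparison
    if not keep.any():
        # Assumption: signal "no confident keypoints" with NaN rather than raising.
        return float("nan"), float("nan")
    avgx, avgy = xycs[keep, :2].mean(axis=0)
    return float(avgx), float(avgy)

demo = np.array([[1, 2, 0.9],
                 [2, 3, 0.8],
                 [3, 4, 0.7],
                 [4, 5, 0.6],
                 [5, 6, 0.5],
                 [6, 7, 0.4],
                 [7, 8, 0.3],
                 [8, 9, 0.2],
                 [9, 10, 0.1]])

print(avgxys_sketch(demo, threshold=0.5))  # (2.5, 3.5): only the first four rows pass
```

With a strict comparison the row with confidence exactly 0.5 is excluded; a `>=` comparison would include it and shift both averages to 3.0 and 4.0.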