# Estimating workload and task duration in a Machine Learning project

... is very difficult. Let's say we have a clear task at hand that can be achieved using Machine Learning.
##### Data engineering:
- is the data ready at-hand, or does it need to be transported/extracted from external sources? Do we have to wait for it?
- on-premises: how much processing power do we have? Cloud: how much money can we spend?
- do we need any changes to the infrastructure?
- how much effort and cost will it take to deploy the solution? What's the end goal: a Python script saving the results to a data lake, or a web service with an API and thousands of users?
##### Data quality:
- do we have enough data to achieve the desired performance?
- is manual labeling necessary?
- can we train on the original data, or does it have to be anonymized?
##### Team members:
- what knowledge and experience do we have as a team?
- lack of experience = higher chances of an inaccurate estimation
##### Machine learning problem:
- do we know what level of performance we should be able to reach?
- are we going to use a single model, or compare the performance of different models?
- how difficult will it be to develop the code, given our architecture? What are the bottlenecks? (The data could be hosted in a relational database. It might be possible to load all of the data into RAM. Can we run something in parallel?)
##### Levels of complexity:
1. The task is extremely typical. An off-the-shelf pre-trained model can be used
2. It's possible to transform the data (select predictors, run metrics) so an off-the-shelf pre-trained model can be used
3. We can train/fine-tune an off-the-shelf model
4. The task is not typical OR we want to create a proprietary model. No existing solution can be applied. A brand new model will be designed
##### Spike:
- we don't have the knowledge to estimate some of the bullet points above. How much time do we need to acquire it?
- what's the risk that the project turns out to be worthless along the way?

After the estimation, a business perspective should be considered:
- what's the value that the model generates? Is it even useful?
- given the data we have, can we do something else?

Surprisingly often, the client just has some money to spend on a machine learning model, and the goal of such a model can be up for negotiation.

<img src="plots/estimating_workload/frame140760.jpg" alt="drawing" width="1500"/>

# Let's say we want to detect shot types in movies

<img src="plots/estimating_workload/types_of_shots.webp" alt="drawing" width="1000"/>

We have an annotated dataset of four films, one frame per second.

How should we approach this problem?

Certainly not by training a CNN from scratch. Let's try and see why.

In [4]:
DRY_RUN = True  # run without data, show example results from files

In [5]:
import os
import cv2
import PIL
import bokeh, bokeh.io, bokeh.layouts, bokeh.plotting, bokeh.palettes, bokeh.transform
import ipywidgets
import IPython.display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision, torchvision.transforms, torchvision.io, torch.utils.data, torchvision.datasets, torchvision.models
import time

from facenet_pytorch import MTCNN


In [6]:
bokeh.plotting.output_notebook()
bokeh.io.output_file('plots/estimating_workload/temp_bokeh.html')

Let's ignore the case when there are no people in the shot for now

In [7]:
int_to_shot = ['Closeup', 'Medium closeup', 'Medium shot', 'Cowboy shot', 'Full shot', 'Long shot']
shot_to_int = {int_to_shot[i]: i for i in range(len(int_to_shot))}

Show class distribution

In [8]:
def plot_class_distributions(class_labels, title):
    labels, counts = np.unique(class_labels, return_counts=True)
    X = {l: c for l, c in sorted(zip(labels, counts), key=lambda x: shot_to_int[x[0]])}

    data = pd.Series(X).reset_index(name='value').rename(columns={'index': 'shottype'})
    data['angle'] = data['value'] / data['value'].sum() * 2 * np.pi  # share of the full circle, in radians
    data['color'] = bokeh.palettes.Category20[len(X.keys())]

    p = bokeh.plotting.figure(
        height=350,
        title=title,
        toolbar_location=None,
        tools="hover",
        tooltips="@shottype: @value",
        x_range=(-0.5, 1.0)
        )

    p.wedge(x=0, y=1, radius=0.4,
            start_angle=bokeh.transform.cumsum('angle', include_zero=True), end_angle=bokeh.transform.cumsum('angle'),
            line_color="white", fill_color='color', legend_field='shottype', source=data)

    p.axis.axis_label = None
    p.axis.visible = False
    p.grid.grid_line_color = None
    return p

if DRY_RUN:
    gridplot = open('./plots/estimating_workload/class_distribution.html', 'r').read()
    display(IPython.display.HTML(gridplot))
else:
    datasources = [
        ('DMCpredictions.csv', 'Drive my car'),
        ('OUATIHpredictions.csv', 'Once Upon A Time ... In Hollywood'),
        ('BCATSKpredictions.csv', 'Butch Cassidy And The Sundance Kid'),
        ('BRpredictions.csv', 'Blade Runner (1982)'),
    ]
    figures = []
    for filename, title in datasources:
        df = pd.read_csv(filename)
        labels = df[df.iloc[:, 1].isin(int_to_shot)].iloc[:, 1]
        figures.append(plot_class_distributions(labels, title))
    gridplot = bokeh.layouts.gridplot([[figures[0], figures[1]], [figures[2], figures[3]]])
    bokeh.plotting.save(gridplot)
    gridplot = open('./plots/estimating_workload/temp_bokeh.html', 'r').read()
    display(IPython.display.HTML(gridplot))

Create a balanced dataset for each film

In [9]:
class CustomShotTypesBalancedDataset(torch.utils.data.Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_labels = self.img_labels[self.img_labels.iloc[:, 1].isin(int_to_shot)]
        smallest_class_count = min(np.unique(self.img_labels.iloc[:, 1], return_counts=True)[1])
        self.img_labels = self.img_labels.sample(frac=1).groupby(self.img_labels.columns[1]).apply(lambda x: x.sample(n=min(smallest_class_count, len(x))))
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform
        self.shot_to_int = shot_to_int

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, f'frame{self.img_labels.iloc[idx, 0]}.jpg')
        image = PIL.Image.open(img_path)
        label = self.shot_to_int[self.img_labels.iloc[idx, 1]]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
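
The balancing step above undersamples every class down to the size of the rarest one. A minimal pure-Python sketch of the same idea, on made-up annotations, is shown below; the helper name `undersample` and the toy rows are assumptions for illustration and are not part of the notebook.

```python
import random
from collections import defaultdict

def undersample(rows, seed=0):
    """Keep the same number of rows per label: the size of the rarest label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for frame_id, label in rows:
        by_label[label].append((frame_id, label))
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Draw `smallest` rows without replacement from each class
        balanced.extend(rng.sample(group, smallest))
    return balanced

# Hypothetical annotations: (frame number, shot type)
rows = (
    [(i, 'Closeup') for i in range(50)]
    + [(i, 'Medium shot') for i in range(50, 60)]
    + [(i, 'Long shot') for i in range(60, 64)]
)
balanced = undersample(rows)
counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

The price of balancing this way is that most frames of the frequent classes are thrown away, which matters when the dataset is already small.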


Initialize datasets and dataloaders

In [10]:
if DRY_RUN:
    print(f'7092 training samples')
    print(f'552 validation samples')
else:
    transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize((64, 64)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    batch_size = 4
    dataset_dmc = CustomShotTypesBalancedDataset('DMCpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\DMC', transform=transform)
    dataset_ouatih = CustomShotTypesBalancedDataset('OUATIHpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\OUATIH', transform=transform)
    dataset_bcatsk = CustomShotTypesBalancedDataset('BCATSKpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\BCATSK', transform=transform)
    dataset_br = CustomShotTypesBalancedDataset('BRpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\BR', transform=transform)

    trainset = torch.utils.data.ConcatDataset([dataset_dmc, dataset_ouatih, dataset_bcatsk])
    valset = dataset_br

    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
    valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size, shuffle=True)

    dataloaders = {'train': trainloader, 'val': valloader}
    dataset_sizes = {'train': len(trainset), 'val': len(valset)}

    print(f'{dataset_sizes["train"]} training samples')
    print(f'{dataset_sizes["val"]} validation samples')

7092 training samples
552 validation samples


Visualize a few samples

In [26]:
def imshow(img):
    img = img / 2 + 0.5
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

if DRY_RUN:
    display(IPython.display.HTML(
        '<img src="./plots/estimating_workload/dataloaders_1.png" style="width:50%">'
    ))
    print(' '.join(['Medium shot', 'Medium closeup', 'Medium closeup', 'Medium shot']))
else:
    dataiter = iter(trainloader)
    images, labels = next(dataiter)
    imshow(torchvision.utils.make_grid(images))
    print(' '.join(f'{int_to_shot[labels[j]]:5s}' for j in range(batch_size)))

Medium shot Medium closeup Medium closeup Medium shot


Define the network

In [27]:
# https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 13 * 13, 120) # ((64-4)/2-4)/2=13
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(int_to_shot))  # one output per shot type (6 classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = SimpleCNN()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
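# Note: with its default arguments (factor=1/3, total_iters=5), ConstantLR scales the learning rate by 1/3 for the first 5 epochs
lr_scheduler = torch.optim.lr_scheduler.ConstantLR(optimizer)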
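
The input size of `fc1` follows from the layer arithmetic summarized in the comment `((64-4)/2-4)/2=13`. A small sketch that reproduces it step by step is shown below; the helpers `conv_out` and `pool_out` are hypothetical names introduced here, and they assume no padding and no dilation, as in the network above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output side length of a max-pooling layer along one spatial dimension."""
    return (size - kernel) // stride + 1

side = 64                                  # images are resized to 64x64
side = pool_out(conv_out(side, kernel=5))  # conv1 + pool: 64 -> 60 -> 30
side = pool_out(conv_out(side, kernel=5))  # conv2 + pool: 30 -> 26 -> 13
flattened = 16 * side * side               # 16 channels after conv2
print(side, flattened)
```

If the resize in the transform changes, this number changes too, so `fc1` has to be adjusted together with the input resolution.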


Define the training method

In [28]:
# https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

device = torch.device("cpu")

def train_cnn(model, criterion, optimizer, scheduler, num_epochs, best_model_params_path):
    history = {'train': [], 'val': []}
    since = time.time()
    # Save the initial weights to the path passed by the caller
    torch.save(model.state_dict(), best_model_params_path)
    best_acc = 0.0
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)
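        for phase in ['train', 'val']:
            # BatchNorm and Dropout layers behave differently in training and evaluation mode
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0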
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
            history[phase].append((epoch_loss, epoch_acc))

            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                torch.save(model.state_dict(), best_model_params_path)

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    model.load_state_dict(torch.load(best_model_params_path))
    return model, history

In [29]:
if not DRY_RUN:
    model, learning_history = train_cnn(model, criterion, optimizer, lr_scheduler, num_epochs=30, best_model_params_path='SimpleCNNBestParams.pt')

Visualize training history

In [30]:
# plotting function

def plot_learning_history(learning_history):
    train_loss_history = [loss for loss, acc in learning_history['train']]
    val_loss_history = [loss for loss, acc in learning_history['val']]
    train_acc_history = [acc for loss, acc in learning_history['train']]
    val_acc_history = [acc for loss, acc in learning_history['val']]

    fig1 = bokeh.plotting.figure(
        tools="pan,wheel_zoom,box_zoom,reset",
        width=500,
        height=500,
        x_axis_label='epoch',
        y_axis_label='loss',
        title='Custom CNN learning history: loss'
    )

    fig1.line(list(range(len(train_loss_history))), train_loss_history, line_width=3.5, line_color="red", alpha=1, legend_label='train')
    fig1.line(list(range(len(val_loss_history))), val_loss_history, line_width=3.5, line_color="blue", alpha=1, legend_label='val')


    fig2 = bokeh.plotting.figure(
        tools="pan,wheel_zoom,box_zoom,reset",
        width=500,
        height=500,
        x_axis_label='epoch',
        y_axis_label='accuracy',
        title='Custom CNN learning history: accuracy'
    )

    fig2.line(list(range(len(train_acc_history))), train_acc_history, line_width=3.5, line_color="red", alpha=1, legend_label='train')
    fig2.line(list(range(len(val_acc_history))), val_acc_history, line_width=3.5, line_color="blue", alpha=1, legend_label='val')

    gridplot = bokeh.layouts.gridplot([[fig1, fig2]])
    bokeh.plotting.save(gridplot)
    gridplot = open('./plots/estimating_workload/temp_bokeh.html', 'r').read()
    display(IPython.display.HTML(gridplot))

In [31]:
if DRY_RUN:
    display(IPython.display.HTML(
        '<img src="./plots/estimating_workload/learninghistory_1.png" style="width:70%">'
    ))
else:
    plot_learning_history(learning_history)

The model fails to generalize to the validation dataset

<div class="row">
  <div class="column">
    <img src="./plots/estimating_workload/frame21024.jpg" style="width:40%">
    <img src="./plots/estimating_workload/frame129528.jpg" style="width:40%">
  </div>
  <div class="column">
    <img src="./plots/estimating_workload/frame80808.jpg" style="width:40%">
    <img src="./plots/estimating_workload/frame66864.jpg" style="width:40%">
  </div>
  <div class="column">
    <img src="./plots/estimating_workload/frame64944.jpg" style="width:40%">
    <img src="./plots/estimating_workload/frame65616.jpg" style="width:40%">
  </div>
</div>
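
One way to put a number on "fails to generalize" is to compare training and validation accuracy at the best validation epoch, and to compare both against chance level (1/6 for six balanced classes). The sketch below assumes a history dictionary shaped like the one returned by `train_cnn`; the function name `summarize_history` is hypothetical and the values in `toy_history` are made up for illustration, they are not the results of the run above.

```python
def summarize_history(history, num_classes):
    """Report the best validation epoch, the train/val gap there, and the chance level."""
    train_acc = [float(acc) for _, acc in history['train']]
    val_acc = [float(acc) for _, acc in history['val']]
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        'best_epoch': best_epoch,
        'val_acc': val_acc[best_epoch],
        'gap': train_acc[best_epoch] - val_acc[best_epoch],
        'chance': 1.0 / num_classes,  # valid only for a balanced dataset
    }

# Made-up (loss, accuracy) pairs per epoch, for illustration only
toy_history = {
    'train': [(1.70, 0.30), (1.20, 0.55), (0.60, 0.80)],
    'val': [(1.75, 0.25), (1.80, 0.28), (2.40, 0.26)],
}
summary = summarize_history(toy_history, num_classes=6)
print(summary)
```

A large gap combined with a validation accuracy close to the chance level suggests the network memorizes the training films rather than learning what distinguishes the shot types.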


Let's try transfer learning instead

In [32]:
if DRY_RUN:
    print(f'7092 training samples')
    print(f'552 validation samples')
else:
    transform = torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    batch_size = 4
    dataset_dmc = CustomShotTypesBalancedDataset('DMCpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\DMC', transform=transform)
    dataset_ouatih = CustomShotTypesBalancedDataset('OUATIHpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\OUATIH', transform=transform)
    dataset_bcatsk = CustomShotTypesBalancedDataset('BCATSKpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\BCATSK', transform=transform)
    dataset_br = CustomShotTypesBalancedDataset('BRpredictions.csv', 'C:\\Projects\\iml\\data\\films\\frames\\BR', transform=transform)

    trainset = torch.utils.data.ConcatDataset([dataset_dmc, dataset_ouatih, dataset_bcatsk])
    valset = dataset_br

    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
    valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size, shuffle=True)

    dataloaders = {'train': trainloader, 'val': valloader}
    dataset_sizes = {'train': len(trainset), 'val': len(valset)}

    print(f'{dataset_sizes["train"]} training samples')
    print(f'{dataset_sizes["val"]} validation samples')

7092 training samples
552 validation samples


In [33]:

def imshow(inp, title=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)

if DRY_RUN:
    display(IPython.display.HTML(
        '<img src="./plots/estimating_workload/dataloaders_2.png" style="width:50%">'
    ))
else:
    inputs, classes = next(iter(dataloaders['train']))
    out = torchvision.utils.make_grid(inputs)
    imshow(out, title=[int_to_shot[x] for x in classes])

Load a pretrained model, freeze the weights and add a single fully connected layer at the end

In [34]:
device = torch.device("cpu")

model = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model.parameters():
    param.requires_grad = False

num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, len(int_to_shot))

model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

In [35]:
if not DRY_RUN:
    model, learning_history = train_cnn(model, criterion, optimizer, lr_scheduler, num_epochs=30, best_model_params_path='TransferLearningBestParams.pt')

In [36]:
if DRY_RUN:
    display(IPython.display.HTML(
        '<img src="./plots/estimating_workload/learninghistory_2.png" style="width:70%">'
    ))
else:
    plot_learning_history(learning_history)

It's better, but still not good

We have an insufficient dataset. Maybe there is a way to avoid training altogether?

Let's try to detect faces in frames using a pretrained model

In [37]:
def detect_faces(img):
    mtcnn = MTCNN(image_size=160, margin=0)
    detected_faces_coord, detected_faces_prob = mtcnn.detect(img)
    return detected_faces_coord, detected_faces_prob

We'll probably need a threshold for face detection. Let's visualize a few examples

In [38]:
good_samples_for_face_detection = [
    'plots/estimating_workload/face_detection_examples/BCATSK/frame24432.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame21384.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame27024.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame43104.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame46632.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame131904.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame137232.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame136416.jpg'
]

In [43]:
# some unimportant interactive plot code here

button_previous = ipywidgets.Button(
    description='Previous',
)
button_next = ipywidgets.Button(
    description='Next',
)
slider_threshold = ipywidgets.FloatSlider(
    min=0.0, max=1.0, step=0.05, description="threshold", continuous_update=False
)
selected_image_index = [0]
out = ipywidgets.Output()

@out.capture(clear_output=True, wait=True)
def plot_image_and_detected_faces(change):
    IPython.display.clear_output()
    out.clear_output()
    img = PIL.Image.open(good_samples_for_face_detection[selected_image_index[0]])
    detected_faces_coord, detected_faces_prob = detect_faces(img)
    img = cv2.imread(good_samples_for_face_detection[selected_image_index[0]])
    if detected_faces_coord is not None:
        for (x1, y1, x2, y2), prob in zip(detected_faces_coord, detected_faces_prob):
            if prob < slider_threshold.value:
                continue
            cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
            cv2.putText(img, f"{prob:.4f}", (int(x1), int(y1)), 0, 1, (0, 255, 0))

    cv2.imwrite('./plots/estimating_workload/cv2outputtemp.jpg', img)
    IPython.display.display(IPython.display.Image(filename='./plots/estimating_workload/cv2outputtemp.jpg'))


plot_image_and_detected_faces(None)

@button_previous.on_click
def plot_on_click(b):
    selected_image_index[0] -= 1
    if selected_image_index[0] == -1:
        selected_image_index[0] = len(good_samples_for_face_detection) - 1
    plot_image_and_detected_faces(None)

@button_next.on_click
def plot_on_click(b):
    selected_image_index[0] += 1
    if selected_image_index[0] == len(good_samples_for_face_detection):
        selected_image_index[0] = 0
    plot_image_and_detected_faces(None)

slider_threshold.observe(plot_image_and_detected_faces, names='value')

IPython.display.display(ipywidgets.VBox([ipywidgets.HBox(children=[button_previous, button_next]), slider_threshold, out]))


VBox(children=(HBox(children=(Button(description='Previous', style=ButtonStyle()), Button(description='Next', …

A threshold of 0.85 seems good enough. Let's select the biggest face in the image and measure the amount of space below it

We can use it to roughly estimate the type of the shot

In [40]:
THRESHOLD = 0.85

def predict_shot_type(img_path):
    img = PIL.Image.open(img_path)
    mtcnn = MTCNN(image_size=1000, margin=0)
    detected_faces_coord, detected_faces_prob = mtcnn.detect(img)

    if detected_faces_coord is None:
        return 'no face detected', None, None, None

    biggest_face_coord = None
    biggest_face_prob = None
    biggest_face_area = 0
    for (x1, y1, x2, y2), prob in zip(detected_faces_coord, detected_faces_prob):
        if prob < THRESHOLD:
            continue
        face_area = (x2 - x1) * (y2 - y1)
        if face_area > biggest_face_area:
            biggest_face_coord = (x1, y1, x2, y2)
            biggest_face_prob = prob
            biggest_face_area = face_area

    if biggest_face_coord is None:
        return 'no face detected', None, None, None

    (x1, y1, x2, y2) = biggest_face_coord
    space_below = img.size[1] - y2
    space_below_num_heads = float(space_below) / (y2 - y1)

    if space_below_num_heads < 1:
        predicted_shot_type = 'Closeup'
    elif space_below_num_heads < 2:
        predicted_shot_type = 'Medium closeup'
    elif space_below_num_heads < 4:
        predicted_shot_type = 'Medium shot'
    elif space_below_num_heads < 5:
        predicted_shot_type = 'Cowboy shot'
    elif space_below_num_heads < 7:
        predicted_shot_type = 'Full shot'
    else:
        predicted_shot_type = 'Long shot'
    return predicted_shot_type, (x1, y1, x2, y2), biggest_face_prob, space_below_num_heads
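
To make the heuristic concrete, here is a worked example on a single hypothetical bounding box, with the same thresholds written as a lookup table. The function name `shot_type_from_face`, the 1080-pixel frame height, and the box coordinates are assumptions for illustration; no face detector is involved in this sketch.

```python
from bisect import bisect_right

SHOT_TYPES = ['Closeup', 'Medium closeup', 'Medium shot', 'Cowboy shot', 'Full shot', 'Long shot']
# Exclusive upper bounds, in head heights of free space below the face,
# for every shot type except the last one
UPPER_BOUNDS = [1, 2, 4, 5, 7]

def shot_type_from_face(face_box, image_height):
    """Map a face bounding box (x1, y1, x2, y2) to a shot type."""
    x1, y1, x2, y2 = face_box
    heads_below = (image_height - y2) / (y2 - y1)
    return SHOT_TYPES[bisect_right(UPPER_BOUNDS, heads_below)], heads_below

# Hypothetical face: 200 px tall, its lower edge 780 px above the bottom of the frame
label, heads = shot_type_from_face((800, 100, 950, 300), image_height=1080)
print(label, round(heads, 2))
```

Here 780 / 200 = 3.9 head heights fit below the face, which falls into the "Medium shot" range. Keeping the bounds in a list makes it easier to tune them against the annotated frames later.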


In [44]:
# some more plot code

random_frames = [
    'plots/estimating_workload/face_detection_examples/BCATSK/frame70368.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame67368.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame83136.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame123144.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame121416.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame144648.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame109752.jpg',
    'plots/estimating_workload/face_detection_examples/BCATSK/frame74328.jpg',

]

button_previous = ipywidgets.Button(
    description='Previous',
)
button_next = ipywidgets.Button(
    description='Next',
)
selected_image_index = [0]

out = ipywidgets.Output()

@out.capture(clear_output=True, wait=True)
def plot_shot_type_prediction(change):
    IPython.display.clear_output()
    out.clear_output()
    frame_filename = random_frames[selected_image_index[0]]
    predicted_shot_type, predicted_coord, prob, space_below_num_heads = predict_shot_type(
        frame_filename
    )
    img = cv2.imread(frame_filename)
    if predicted_shot_type != 'no face detected':
        (x1, y1, x2, y2) = predicted_coord
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
        cv2.putText(img, f"{prob:.4f}", (int(x1), int(y1)), 0, 1, (0, 255, 0))
        for head_space_index in range(int(space_below_num_heads)):
            cv2.rectangle(img, (int(x1), int(head_space_index*(y2-y1)+y2+10)), (int(x2), int((head_space_index+1)*(y2-y1)+y2-10)), (0, 0, 255))

    cv2.putText(img, predicted_shot_type, (20, 20), 0, 1, (0, 255, 0))
    cv2.imwrite('./plots/estimating_workload/cv2outputtemp.jpg', img)
    IPython.display.display(IPython.display.Image(filename='./plots/estimating_workload/cv2outputtemp.jpg'))

plot_shot_type_prediction(None)

@button_previous.on_click
def plot_on_click(b):
    selected_image_index[0] -= 1
    if selected_image_index[0] == -1:
        selected_image_index[0] = len(random_frames) - 1
    plot_shot_type_prediction(None)

@button_next.on_click
def plot_on_click(b):
    selected_image_index[0] += 1
    if selected_image_index[0] == len(random_frames):
        selected_image_index[0] = 0
    plot_shot_type_prediction(None)

IPython.display.display(ipywidgets.VBox([ipywidgets.HBox(children=[button_previous, button_next]), out]))

VBox(children=(HBox(children=(Button(description='Previous', style=ButtonStyle()), Button(description='Next', …

Let's try it on the validation set as well

In [45]:
# some more plot code

random_frames = [
    'plots/estimating_workload/face_detection_examples/BR/frame17688.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame74232.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame153984.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame126528.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame121224.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame53904.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame49704.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame52992.jpg',
    'plots/estimating_workload/face_detection_examples/BR/frame84888.jpg',
]

button_previous = ipywidgets.Button(
    description='Previous',
)
button_next = ipywidgets.Button(
    description='Next',
)
selected_image_index = [0]

out = ipywidgets.Output()

@out.capture(clear_output=True, wait=True)
def plot_shot_type_prediction(change):
    IPython.display.clear_output()
    out.clear_output()
    frame_filename = random_frames[selected_image_index[0]]
    predicted_shot_type, predicted_coord, prob, space_below_num_heads = predict_shot_type(
        frame_filename
    )
    img = cv2.imread(frame_filename)
    if predicted_shot_type != 'no face detected':
        (x1, y1, x2, y2) = predicted_coord
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
        cv2.putText(img, f"{prob:.4f}", (int(x1), int(y1)), 0, 1, (0, 255, 0))
        for head_space_index in range(int(space_below_num_heads)):
            cv2.rectangle(img, (int(x1), int(head_space_index*(y2-y1)+y2+10)), (int(x2), int((head_space_index+1)*(y2-y1)+y2-10)), (0, 0, 255))

    cv2.putText(img, predicted_shot_type, (20, 20), 0, 1, (0, 255, 0))
    cv2.imwrite('./plots/estimating_workload/cv2outputtemp.jpg', img)
    IPython.display.display(IPython.display.Image(filename='./plots/estimating_workload/cv2outputtemp.jpg'))

plot_shot_type_prediction(None)

@button_previous.on_click
def plot_on_click(b):
    selected_image_index[0] -= 1
    if selected_image_index[0] == -1:
        selected_image_index[0] = len(random_frames) - 1
    plot_shot_type_prediction(None)

@button_next.on_click
def plot_on_click(b):
    selected_image_index[0] += 1
    if selected_image_index[0] == len(random_frames):
        selected_image_index[0] = 0
    plot_shot_type_prediction(None)

IPython.display.display(ipywidgets.VBox([ipywidgets.HBox(children=[button_previous, button_next]), out]))

VBox(children=(HBox(children=(Button(description='Previous', style=ButtonStyle()), Button(description='Next', …

It's by no means a perfect solution, but it works better than anything we've tried so far

Now, we have two options - gather more data and try transfer learning once more, or use the manual solution. We can make it better by detecting poses instead of faces, adding a moving average to consecutive frames, rotating the images, etc.
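
As an illustration of the moving-average idea, the sketch below smooths the per-frame "space below, in head heights" value over neighbouring frames before any thresholds are applied. The function name `moving_average`, the window size, and the per-frame values are assumptions made up for this example; frames without a detected face are represented by `None` and skipped.

```python
def moving_average(values, window=3):
    """Centered moving average that ignores frames with no detection (None)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbours = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        smoothed.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return smoothed

# Hypothetical measurements for six consecutive frames of the same shot.
# The third frame crosses the 2.0 boundary on its own; the fifth has no face.
per_frame = [1.8, 1.7, 2.3, 1.6, None, 1.7]
smoothed = moving_average(per_frame, window=3)
print([round(v, 2) for v in smoothed])
```

After smoothing, every frame in this toy sequence stays below 2.0, so the predicted label no longer flickers between two shot types within one shot. A median filter or a majority vote over labels would be reasonable alternatives when single frames are far off.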

We went through three levels of complexity mentioned earlier:


2. Transforming the data (selecting predictors, running metrics) so an off-the-shelf pre-trained model can be used: <font color="green">Face detection</font>

3. The data is typical, but the task is not. We train/fine-tune an off-the-shelf model: <font color="green">Transfer learning</font>

4. The task is not typical OR we want to create a proprietary model. No existing solution can be applied. A brand new model is designed: <font color="green">Custom CNN</font>
