Thank you for sharing such awesome work! How can I reproduce the [Active Speaker Detection demo](https://www.youtube.com/watch?v=A5tmRjpxHvA&feature=emb_logo) shown on the [project page](https://www.robots.ox.ac.uk/~vgg/research/avobjects/)? I found the function [`viz_boxes_with_scores`](https://github.com/afourast/avobjects/blob/4a9d0d5af373d682be29487e68b9233809552e08/viz_utils.py#L121), but I couldn't find the function that computes the scores it visualizes.