# Video Clip Sampling
This notebook explores the *video clip sampling* strategy available in the `frame_sampling` package.

## Setup
First, we need to get everything set up.

In [None]:
# check for colab
if "google.colab" in str(get_ipython()):
  # install colab dependencies
  !pip install tqdm git+https://github.com/DiogenesAnalytics/frame-sampling

## Get Data
To have data for the demo, we will download the first *10 videos* listed in the *validation subset* CSV of the [WebVid-10M dataset](https://maxbain.com/webvid-dataset/).

In [None]:
# load necessary libs
import pathlib
from urllib.request import urlretrieve
import pandas as pd
from tqdm.auto import tqdm

# get csv data from WebVid dataset: https://maxbain.com/webvid-dataset/
webvid_csv = pd.read_csv("http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv")

# videos for testing purposes
DWNLD_LNKS = webvid_csv["contentUrl"].to_list()[:10]

# set path to data
WEBVID_VAL_DATA = pathlib.Path("/usr/local/src/frame-sampling/tests/data/video/demo")

# create directory
WEBVID_VAL_DATA.mkdir(parents=True, exist_ok=True)

# notify of downloading
print("Downloading videos ...")

# download test videos
for video_url in tqdm(DWNLD_LNKS):
    # get video file name
    vid_file_name = pathlib.Path(video_url).name
    
    # create new output path for video file
    video_output_path = WEBVID_VAL_DATA / vid_file_name
    
    # download to demo_data path
    if not video_output_path.exists():
        _ = urlretrieve(video_url, video_output_path)
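
The loop above derives each file name by passing the whole URL to `pathlib.Path`. That works when the URL ends in the file name, but any query string would become part of the name. As a minimal sketch, assuming some URLs might carry a query string or fragment (the WebVid links may not), the name can be taken from the parsed URL path instead. The helper `filename_from_url` and the example URL are hypothetical and are not part of `frame_sampling`.

```python
import pathlib
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path component of a URL, ignoring query and fragment."""
    # urlparse separates the path from any "?query" or "#fragment" suffix
    url_path = urlparse(url).path

    # PurePosixPath splits on "/" regardless of the host operating system
    return pathlib.PurePosixPath(url_path).name


# hypothetical URL, used only for illustration
example_url = "https://example.com/videos/clip_01.mp4?token=abc"
example_name = filename_from_url(example_url)
print(example_name)
```

In the download loop, `vid_file_name = filename_from_url(video_url)` would be a drop-in replacement for the `pathlib.Path(video_url).name` call.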

## Frame Sampling
Now we are ready to use the `VideoClipSampler` class to *sample frames* from the downloaded WebVid-10M videos. Here we implement a function that selects *frames* whose grayscale *entropy* is above a fixed threshold of 7.0.

In [None]:
# get necessary libs
import secrets
from PIL.Image import Image
from frame_sampling.dataset import VideoDataset
from frame_sampling.strategy import VideoClipSampler

# get instance object
data = VideoDataset(WEBVID_VAL_DATA)

# set frame sample output directory
FRAME_SAMPLE_OUTPUT = pathlib.Path(f"./frame_samples/{secrets.token_hex(8)}")

# set chosen frame sample interval
FRAME_SAMPLE_INTERVAL = 30

# custom filter function passed to the sampler
def calculate_entropy_threshold(image: Image) -> bool:
    """Return True if the grayscale entropy of a PIL image exceeds the threshold."""
    # convert the image to grayscale
    image_gray = image.convert('L')

    # calculate entropy
    entropy_value = image_gray.entropy()

    # check threshold
    return entropy_value > 7.0

# get instance
frame_sampler = VideoClipSampler(FRAME_SAMPLE_INTERVAL, calculate_entropy_threshold) 

# ... now sample
frame_sampler.sample(data, FRAME_SAMPLE_OUTPUT)
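
To give a feel for the `7.0` threshold used above, here is a minimal sketch of Shannon entropy computed from a 256-bin grayscale histogram, in bits. It uses only the standard library and is meant to illustrate the quantity being thresholded, not to reproduce Pillow's implementation exactly. The helper `shannon_entropy` and the synthetic histograms are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(histogram: Sequence[int]) -> float:
    """Compute Shannon entropy (in bits) of a histogram of pixel counts."""
    total = sum(histogram)

    # an empty histogram carries no information
    if total == 0:
        return 0.0

    entropy = 0.0
    for count in histogram:
        # empty bins contribute nothing (p * log2(p) -> 0 as p -> 0)
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# synthetic histograms (illustrative only, not taken from real frames)
single_level = [0] * 256
single_level[128] = 1000            # every pixel has the same gray value
half_levels = [10] * 128 + [0] * 128  # pixels spread evenly over 128 gray levels
all_levels = [10] * 256             # pixels spread evenly over all 256 gray levels

for name, hist in [
    ("single_level", single_level),
    ("half_levels", half_levels),
    ("all_levels", all_levels),
]:
    print(name, shannon_entropy(hist))
```

For an 8-bit grayscale image the entropy is bounded by 8 bits, so a cutoff of `7.0` selects only frames whose gray values are spread across a large part of the range. A frame using exactly half of the gray levels evenly sits right at 7 bits and would not pass a strict `> 7.0` check.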