---
title: "Auto Clipping"
author: "Ali Zaidi"
date: "2025-11-13"
categories: [Data Engineering]
description: "End-to-end clipping and saving of multi-swing videos into their component swing clips"
format:
  html:
    code-fold: true
jupyter: python3
---

## Now that we can find each swing in a video, let's clip a longer video

### To develop this functionality, we want to use:
1) Our auto-detection functions from the previous notebook to find each swing
2) Core conditional logic to save and store these individual clips for future use

### With this in place, we can efficiently generate swing datasets for modeling, and the code stays modular so it is easy to tweak

In [1]:
#| include: false
from fastai.vision.all import *
from swing_detect import *
from swing_data import *
from video_utils import *
from labeler import *

In [2]:
#| include: false
base_path = '../../../data/full_videos'
swing_days = ['jun8', 'aug9', 'sep14']
files = get_files(f'{base_path}/{swing_days[-1]}', extensions='.MOV')

In [3]:
parent_dir = files[0].parent
parent_dir

Path('../../../data/full_videos/sep14')

In [4]:
#| code-fold: true
#| echo: false
fname = files[0].name.split('.')[0]
frames, fps = get_frames(files[0],
                         per_second=False,  # if True, grab only every fps-th frame
                         # start_idx=600,   # e.g. start 10 seconds in
                         num_frames=250,    # only pull ~4 seconds of video; None pulls every frame
                         resize_dim=(256, 256),
                         show_progress=True,
                        )
save_frames(frames=frames, fps=fps, 
            parent_dir=f'{parent_dir}/{fname}',
            output_filename='full_clip.mp4')
print(f'The video {files[0].name} is in numpy array with shape: {frames.shape}')

100%|█████████████████████████████████████████████████████████████████████████████| 250/250 [00:00<00:00, 350.69it/s]


The video IMG_1090.MOV is in numpy array with shape: (250, 256, 256, 3)
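
The comments in these cells assume the source videos run at 60 fps (900 frames is 15 seconds, 90 frames is 1.5 seconds). A small pair of hypothetical helpers, not part of `video_utils`, makes those conversions explicit; in practice the `fps` returned by `get_frames` should be passed in rather than the assumed constant.

```python
# Assumption: 60 fps, inferred from the notebook's "900 frames is 15 seconds" comments.
# Prefer the fps value returned by get_frames when it is available.
FPS = 60


def frames_to_seconds(n_frames: int, fps: float = FPS) -> float:
    """Duration in seconds of n_frames at the given frame rate (hypothetical helper)."""
    return n_frames / fps


def seconds_to_frames(seconds: float, fps: float = FPS) -> int:
    """Nearest whole number of frames spanning `seconds` at the given frame rate."""
    return round(seconds * fps)


print(frames_to_seconds(250))  # length of the 250-frame preview clip above
print(frames_to_seconds(900))  # skip_frames: gap enforced between detected swings
print(seconds_to_frames(1.5))  # frame_increment: padding on each side of a swing
```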


In [5]:
#| code-fold: true
# ~11 min 48 s for one ~3-minute video, so run this once and reuse the saved keypoints
process_label_video(f'{parent_dir}/{fname}/full_clip.mp4',
                    out_dir=f'{parent_dir}/{fname}/keypoints')

Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192-e32adcd4_20230314.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.cls_token



Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth


249it [00:14, 17.77it/s]

11/14 16:38:40 - mmengine - INFO - the output video has been saved at ../../../data/full_videos/sep14/IMG_1090/keypoints/full_clip.mp4


250it [00:14, 17.12it/s]


In [4]:
#| code-fold: true
kps = KpExtractor(f'keypoints/{fname}_full_clip.pkl').keypoint_data.kps
higher_idxs = find_all_higher_wrist_idxs(kps, conf_threshold=0.7)
frame_increment = 90  # keep 90 frames (1.5 s) before and after the first backswing frame with both wrists raised
highest_idxs = find_each_first_higher_wrist(higher_idxs, skip_frames=900) # 900 frames is 15 seconds
all_idx_bounds = get_all_idx_bounds(highest_idxs, frame_increment=frame_increment)
final_frames = np.vstack([frames[start:end] for start, end in all_idx_bounds])
print(f'the highest (first) indexes where both wrists are above elbow and shoulder are:\n{highest_idxs}')
print(f'the start and end indexes based on an increment of {frame_increment} are:\n{all_idx_bounds}')
print(f'Our final combined video has shape: {final_frames.shape}')

the highest (first) indexes where both wrists are above elbow and shoulder are:
[1750, 3273, 5298, 6901, 8773, 10945]
the start and end indexes based on an increment of 90 are:
[(1660, 1840), (3183, 3363), (5208, 5388), (6811, 6991), (8683, 8863), (10855, 11035)]
Our final combined video has shape: (1080, 256, 256, 3)
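
The cell above relies on `find_each_first_higher_wrist` and `get_all_idx_bounds` from the previous notebook. The sketch below is a hypothetical re-implementation of the logic as described here (keep the first frame where both wrists are raised, ignore flagged frames for the next `skip_frames` frames, then pad each hit by `frame_increment`). The real helpers may differ in details such as how the skip window is measured or whether windows are clipped to the video length. On toy indexes it reproduces the first three windows printed above.

```python
from typing import List, Optional, Sequence, Tuple


def first_idx_per_swing(flagged_idxs: Sequence[int], skip_frames: int = 900) -> List[int]:
    """Hypothetical: keep only the first flagged frame of each swing.

    Once a frame is accepted, any flagged frame fewer than `skip_frames`
    frames after it is treated as part of the same swing and ignored.
    """
    firsts: List[int] = []
    for idx in sorted(flagged_idxs):
        if not firsts or idx - firsts[-1] >= skip_frames:
            firsts.append(idx)
    return firsts


def idx_bounds(first_idxs: Sequence[int], frame_increment: int = 90,
               n_frames: Optional[int] = None) -> List[Tuple[int, int]]:
    """Hypothetical: (start, end) window of +/- frame_increment around each index."""
    bounds = []
    for idx in first_idxs:
        start = max(0, idx - frame_increment)  # never start before frame 0
        end = idx + frame_increment
        if n_frames is not None:
            end = min(n_frames, end)           # never run past the last frame
        bounds.append((start, end))
    return bounds


# Toy input: several consecutive "wrists raised" frames per swing
flagged = [1750, 1751, 1752, 1800, 3273, 3274, 3300, 5298, 5299]
firsts = first_idx_per_swing(flagged, skip_frames=900)
print(firsts)
print(idx_bounds(firsts, frame_increment=90))
```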


In [6]:
#| echo: false
save_frames(fname='final_frames.mp4', frames=final_frames)

### Each swing isolated, with about 1.5 seconds (90 frames) before and after the first backswing frame where both wrists are above the elbows and shoulders

{{< video final_frames.mp4 width="400" height="300" >}}

- To capture more context, we can raise `frame_increment` to 120 (2 seconds) on each side.
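
Each window is `2 * frame_increment` frames long, so six swings at 90 frames per side give the 1080 frames in the output above, and 120 per side would give 1440. The NumPy-only sketch below checks this on a dummy array (`stack_clips` is a hypothetical name). It also shows a pitfall: slicing past the end of an array does not raise, it silently returns fewer frames, so the stacked clips are only full length when `frames` holds the whole video rather than the 250-frame preview.

```python
import numpy as np


def stack_clips(frames: np.ndarray, bounds) -> np.ndarray:
    """Concatenate frames[start:end] for every (start, end) window along the time axis."""
    return np.concatenate([frames[start:end] for start, end in bounds], axis=0)


# Dummy 12,000-frame "video" of tiny 4x4 RGB frames, only used to check shapes
dummy_frames = np.zeros((12_000, 4, 4, 3), dtype=np.uint8)
peaks = [1750, 3273, 5298, 6901, 8773, 10945]  # detected indexes from the cell above

for inc in (90, 120):
    windows = [(p - inc, p + inc) for p in peaks]
    print(inc, stack_clips(dummy_frames, windows).shape)

# Slicing beyond the array's length silently yields an empty (or short) clip
print(stack_clips(dummy_frames[:250], [(1660, 1840)]).shape)
```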

In [10]:
#| echo: false
print(f'We have {len(all_idx_bounds)} total swings from our original video {fname}')

We have 6 total swings from our original video IMG_1090


### Some functions to walk through generated keypoints to find all swing start and end indexes

In [18]:
#| code-fold: true
def save_idx_df(fname, all_idx_bounds, out_dir):
    """Save one row per swing (index, start frame, end frame) to {out_dir}/{fname}.csv."""
    df = pd.DataFrame({
        'swing_idx': range(len(all_idx_bounds)),
        'start_idx': [start for start, _ in all_idx_bounds],
        'end_idx': [end for _, end in all_idx_bounds],
    })
    df.to_csv(f'{out_dir}/{fname}.csv', index=False)
    return df
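
Because `save_idx_df` writes each video's swing windows to a CSV without the index, the clips can later be re-cut (for example with different padding or codec settings) without rerunning pose estimation. `load_idx_df` below is a hypothetical reader that checks for the expected columns; the demo round-trips a throwaway file laid out the same way.

```python
import tempfile
from pathlib import Path

import pandas as pd

EXPECTED_COLS = ['swing_idx', 'start_idx', 'end_idx']


def load_idx_df(csv_path) -> pd.DataFrame:
    """Hypothetical reader for a swing-index CSV written by save_idx_df."""
    df = pd.read_csv(csv_path)
    missing = set(EXPECTED_COLS) - set(df.columns)
    if missing:
        raise ValueError(f'{csv_path} is missing columns: {sorted(missing)}')
    return df[EXPECTED_COLS].astype(int)


# Round-trip a throwaway CSV laid out like save_idx_df's output (no index column)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'IMG_0000.csv'
    pd.DataFrame({'swing_idx': [0, 1],
                  'start_idx': [1660, 3183],
                  'end_idx': [1840, 3363]}).to_csv(csv_path, index=False)
    idx_df = load_idx_df(csv_path)

# Each row is one swing window; end - start is the clip length in frames
for row in idx_df.itertuples(index=False):
    print(row.swing_idx, row.start_idx, row.end_idx, row.end_idx - row.start_idx)
```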

In [17]:
#| code-fold: true
def get_swing_idx_df(kps_fpath,
                     fname,
                     out_dir,
                     conf_threshold=0.7,
                     frame_increment=90,  # frames kept before and after each detected idx (1.5 s)
                     skip_frames=900,     # frames skipped between swings (15 s)
                     ):
    kps = KpExtractor(kps_fpath).keypoint_data.kps
    higher_idxs = find_all_higher_wrist_idxs(kps, conf_threshold=conf_threshold)
    highest_idxs = find_each_first_higher_wrist(higher_idxs, skip_frames=skip_frames)
    all_idx_bounds = get_all_idx_bounds(highest_idxs, frame_increment=frame_increment)
    df = save_idx_df(fname, all_idx_bounds, out_dir)
    return df

In [21]:
#| code-fold: true
def ensure_out_dir(out_dir_fpath):
    os.makedirs(out_dir_fpath, exist_ok=True)

def find_each_swing(video_path,
                    per_second=False,    # if True, grab only every fps-th frame
                    num_frames=None,     # None pulls every frame of the video
                    start_idx=None,      # None starts from frame 0
                    resize_dim=(256, 256),
                    show_progress=True,
                    model_type='vit',    # currently not passed to process_label_video
                   ):
    parent_dir = video_path.parent
    fname = video_path.name.split('.')[0]
    out_dir = f'{parent_dir}/{fname}'
    ensure_out_dir(out_dir)
    frames, fps = get_frames(video_path,
                             start_idx=start_idx,
                             per_second=per_second,
                             num_frames=num_frames,
                             resize_dim=resize_dim,
                             show_progress=show_progress,
                            )
    output_filename = 'full_video.mp4'
    out_fpath = f'{out_dir}/{output_filename}'
    kp_fpath = f'{out_dir}/keypoints/{output_filename.split(".")[0]}.pkl'

    # Save the resized frames, run pose estimation on them, then locate each swing
    save_frames(frames=frames, fps=fps,
                parent_dir=out_dir,
                output_filename=output_filename)
    process_label_video(out_fpath, out_dir=f'{out_dir}/keypoints')
    df = get_swing_idx_df(kps_fpath=kp_fpath, fname=fname, out_dir=out_dir)
    return df
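
The comment in the labeling cell notes that pose estimation on one ~3-minute video took nearly 12 minutes, yet `find_each_swing` re-runs `process_label_video` on every call. A cache guard like the hypothetical `run_unless_cached` below would skip that step when the keypoint file already exists. The demo uses a fake labeler that only writes the expected file.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_unless_cached(output_path, compute: Callable[[], None], force: bool = False) -> bool:
    """Hypothetical: run `compute()` only when `output_path` is missing (or force=True).

    Returns True if `compute` ran and False if the existing file was reused.
    """
    output_path = Path(output_path)
    if output_path.exists() and not force:
        return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    compute()
    return True


# Demo with a fake labeler that just writes the file the real one would produce
calls = []
with tempfile.TemporaryDirectory() as tmp:
    kp_file = Path(tmp) / 'keypoints' / 'full_video.pkl'

    def fake_labeler():
        calls.append('labeled')
        kp_file.write_bytes(b'keypoints')

    first = run_unless_cached(kp_file, fake_labeler)   # file missing, so it runs
    second = run_unless_cached(kp_file, fake_labeler)  # file exists, so it is skipped
print(first, second, calls)
```

Inside `find_each_swing`, the labeling line could then become `run_unless_cached(kp_fpath, lambda: process_label_video(out_fpath, out_dir=f'{out_dir}/keypoints'))`, assuming `process_label_video` writes its `.pkl` to the same `kp_fpath` that `get_swing_idx_df` reads.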

In [22]:
df = find_each_swing(files[1], start_idx=None, num_frames=1500)

100%|███████████████████████████████████████████████████████████████████████████| 1500/1500 [00:04<00:00, 368.81it/s]


Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192-e32adcd4_20230314.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.cls_token



Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth


1500it [01:26, 17.36it/s]

11/14 16:58:47 - mmengine - INFO - the output video has been saved at ../../../data/full_videos/sep14/IMG_1087/keypoints/full_video.mp4


1500it [01:26, 17.35it/s]


### Now everything is set up, so we can extract all the swing frames with one command
- The entire pipeline is parameterized, so we can make small tweaks to how many frames we keep around each peak
- This should make runs easier to repeat and reproduce
    - For example, if we later decide to preprocess the videos before all of this, that step can be added to the pipeline as a single function call
- Everything stays organized: each video gets its own folder holding its resized video, keypoints, swing-index CSV and clips
- Scaling up to full-frame video shouldn't be a problem: we take the swing frame indexes, clip the original video with ffmpeg, then label the clips with the labeler. Seven swings add up to only about 20-25 seconds of footage, and the clips can be labeled in parallel
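
Labeling the clips in parallel could look like the sketch below. `label_all` and `fake_label` are hypothetical; in the notebook, `label_fn` would wrap `process_label_video`. Threads are used so the sketch runs anywhere, but since every labeler call loads a pose checkpoint (as the logs show), the right number of workers, or whether separate processes suit a single GPU better, is an assumption to measure rather than a given.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def label_all(clip_paths: Iterable[Path], label_fn: Callable[[Path], object],
              max_workers: int = 2) -> Dict[str, object]:
    """Hypothetical: apply label_fn to every clip concurrently.

    Returns {clip file name: result}, storing the exception instead of a result
    for any clip that failed, so one bad clip does not stop the batch.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path.name: pool.submit(label_fn, path) for path in clip_paths}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = exc
    return results


def fake_label(path: Path) -> str:
    # Stand-in for a wrapper around process_label_video; one clip fails on purpose
    if 'swing_6' in path.name:
        raise RuntimeError('simulated failure')
    return f'labeled {path.stem}'


clips = [Path(f'IMG_1091_swing_{i}_score_None.mp4') for i in range(7)]
label_results = label_all(clips, fake_label)
print(sum(isinstance(r, Exception) for r in label_results.values()), 'of', len(label_results), 'clips failed')
```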

## Next up:
- Process individual swings and apply the analysis framework
    - Further build out functionality: add x-torque and other metrics
- Update the plotting functionality to make it more modular and flexible
- Ultimately, we want to point to a folder of videos and output all the plots of interest
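
Pointing at a folder of videos starts with a combined swing index across all of them. `index_folder` and `fake_detect` below are hypothetical; in the notebook, `detect_fn` would be something like `end_to_end_detect`, which returns one row per swing. Adding a `video` column keeps every window traceable to its source file.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def index_folder(folder, detect_fn: Callable[[Path], pd.DataFrame],
                 pattern: str = '*.MOV') -> pd.DataFrame:
    """Hypothetical: run detect_fn on each matching video and stack the results."""
    videos = sorted(Path(folder).glob(pattern))
    tables = [detect_fn(video).assign(video=video.stem) for video in videos]
    if not tables:
        return pd.DataFrame(columns=['video', 'swing_idx', 'start_idx', 'end_idx'])
    combined = pd.concat(tables, ignore_index=True)
    cols = ['video'] + [c for c in combined.columns if c != 'video']
    return combined[cols]


def fake_detect(video: Path) -> pd.DataFrame:
    # Stand-in for end_to_end_detect: two fixed swings per video
    return pd.DataFrame({'swing_idx': [0, 1],
                         'start_idx': [405, 2210],
                         'end_idx': [585, 2390]})


with tempfile.TemporaryDirectory() as tmp:
    for name in ['IMG_1090.MOV', 'IMG_1091.MOV', 'notes.txt']:
        (Path(tmp) / name).touch()
    swing_table = index_folder(tmp, fake_detect)
print(swing_table)
```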

In [24]:
df.head()

|   | swing_idx | start_idx | end_idx |
|---|-----------|-----------|---------|
| 0 | 0         | 1369      | 1549    |


In [40]:
#| code-fold: true
import os
import ffmpeg

def make_output_filename(fname, swing_idx, score=None):
    return f'{fname}_swing_{swing_idx}_score_{score}'

def make_clip(input_file_path,
              output_folder_path,
              row,
              crf='18',
              vcodec='libx264'):  # re-encode rather than 'copy', since the trim filter needs decoded frames
    fname = input_file_path.name.split('.')[0]
    swing_idx, start_frame, end_frame = row.values
    output_file_name = make_output_filename(fname, swing_idx)
    clip_dir = f'{output_folder_path}/{fname}'
    output_file_path = f'{clip_dir}/{output_file_name}.mp4'
    os.makedirs(clip_dir, exist_ok=True)  # make sure the folder the clip is written to exists

    # Use the trim filter for frame-accurate cutting (keeps frames start_frame..end_frame-1)
    (
        ffmpeg.input(str(input_file_path))
        .trim(start_frame=start_frame,
              end_frame=end_frame)
        .setpts('PTS-STARTPTS')  # reset timestamps so the clip starts at t=0
        .output(output_file_path,
                vcodec=vcodec,
                crf=crf,
                acodec='aac',
                movflags='+faststart')  # per-output option; global_args would place it after the output file
        .overwrite_output()
        .run()
    )
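
For debugging, or for cutting a clip outside Python, the same frame-accurate cut can be written as a plain ffmpeg command. `trim_command` is a hypothetical helper that builds an argument list mirroring the `trim`/`setpts` chain in `make_clip`. Per the ffmpeg filter documentation, `end_frame` is the first frame that is dropped, so a 405-585 window keeps 180 frames, matching the Python slice `frames[405:585]`. The `-an` flag drops audio explicitly, since the trim chain only processes video.

```python
import shlex
from typing import List


def trim_command(src: str, dst: str, start_frame: int, end_frame: int,
                 crf: int = 18, vcodec: str = 'libx264') -> List[str]:
    """Hypothetical: ffmpeg argv keeping frames [start_frame, end_frame) of src's video."""
    vf = f'trim=start_frame={start_frame}:end_frame={end_frame},setpts=PTS-STARTPTS'
    return ['ffmpeg', '-y', '-i', src,
            '-vf', vf,                       # cut by frame number, then reset timestamps
            '-an',                           # no audio stream in the clip
            '-c:v', vcodec, '-crf', str(crf),
            '-movflags', '+faststart',       # move the moov atom up front for streaming
            dst]


cmd = trim_command('IMG_1091.MOV', 'IMG_1091_swing_0_score_None.mp4', 405, 585)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it.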

In [43]:
# for x in range(len(df)):
#     make_clip(input_file_path=files[1], 
#               output_folder_path=parent_dir,
#               row = df.iloc[x]
#              )

In [52]:
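def end_to_end_detect(fpath, start_idx=None, num_frames=None):
    df = find_each_swing(fpath, start_idx=start_idx, num_frames=num_frames)
    parent_dir = fpath.parent
    # Swing indexes are relative to the first frame read (start_idx), but the clips are
    # cut from the original video, so shift them back by start_idx before clipping
    clip_df = df.copy()
    clip_df[['start_idx', 'end_idx']] += start_idx or 0
    for x in range(len(clip_df)):
        make_clip(input_file_path=fpath,
                  output_folder_path=parent_dir,
                  row=clip_df.iloc[x])
    return df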

In [1]:
from fastai.vision.all import *
from auto_clipper import *

In [2]:
base_path = '../../../data/full_videos'
swing_days = ['jun8', 'aug9', 'sep14']
files = get_files(f'{base_path}/{swing_days[-1]}', extensions='.MOV')

In [3]:
files[6]

Path('../../../data/full_videos/sep14/IMG_1091.MOV')

In [4]:
end_to_end_detect(files[6], start_idx=1800)

 87%|███████████████████████████████████████████████████████████████▏         | 11665/13465 [00:30<00:04, 379.58it/s]


Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192-e32adcd4_20230314.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.cls_token



Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth


13465it [13:01, 17.47it/s]

11/14 23:54:24 - mmengine - INFO - the output video has been saved at ../../../data/full_videos/sep14/IMG_1091/keypoints/full_video.mp4


13465it [13:01, 17.24it/s]

|   | swing_idx | start_idx | end_idx |
|---|-----------|-----------|---------|
| 0 | 0         | 405       | 585     |
| 1 | 1         | 2210      | 2390    |
| 2 | 2         | 4151      | 4331    |
| 3 | 3         | 6580      | 6760    |
| 4 | 4         | 8240      | 8420    |
| 5 | 5         | 10762     | 10942   |


In [None]:
lbl_clips(files[6])

In [10]:
#end_to_end_detect(files[2])
#end_to_end_detect(files[7])

In [40]:
def lbl_clips(fpath):
    """Run the pose labeler on every swing clip cut from the video at fpath."""
    fname = fpath.name.split('.')[0]
    parent_dir = fpath.parent
    clips_folder_path = f'{parent_dir}/{fname}'
    # recurse=False: ignore the labeled copies the labeler writes into the keypoints/ subfolder
    clipped_videos = [x.name for x in get_files(clips_folder_path, extensions='.mp4', recurse=False)
                      if x.name.startswith('IMG')]
    for video in clipped_videos:
        clip_video_path = f'{clips_folder_path}/{video}'
        process_label_video(clip_video_path,
                            out_dir=f'{clips_folder_path}/keypoints')
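
The labeler saves its output video into `keypoints/` under the same name as its input (see the `full_clip.mp4` log line earlier), so after one labeling pass that subfolder also holds `IMG_*.mp4` files. fastai's `get_files` recurses into subfolders by default, so a recursive listing would pick those copies up on a second run. A pathlib-only way to list just the top-level clips, with `list_swing_clips` as a hypothetical name:

```python
import tempfile
from pathlib import Path
from typing import List


def list_swing_clips(clips_dir) -> List[Path]:
    """Hypothetical: swing clips directly inside clips_dir, not in subfolders like keypoints/."""
    return sorted(p for p in Path(clips_dir).glob('IMG*.mp4') if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'keypoints').mkdir()
    for name in ['IMG_1091_swing_0_score_None.mp4',
                 'IMG_1091_swing_1_score_None.mp4',
                 'full_video.mp4',                               # not a swing clip
                 'keypoints/IMG_1091_swing_0_score_None.mp4']:   # labeled copy in a subfolder
        (root / name).touch()
    clip_names = [p.name for p in list_swing_clips(root)]
print(clip_names)
```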

In [43]:
#[lbl_clips(fpath) for fpath in files]

In [44]:
files

(#8) [Path('../../../data/full_videos/sep14/IMG_1090.MOV'),Path('../../../data/full_videos/sep14/IMG_1087.MOV'),Path('../../../data/full_videos/sep14/IMG_1088.MOV'),Path('../../../data/full_videos/sep14/IMG_1092.MOV'),Path('../../../data/full_videos/sep14/IMG_1086.MOV'),Path('../../../data/full_videos/sep14/IMG_1093.MOV'),Path('../../../data/full_videos/sep14/IMG_1091.MOV'),Path('../../../data/full_videos/sep14/IMG_1089.MOV')]