---
title: "Swing Normalization"
author: "Ali Zaidi"
date: "2025-11-18"
categories: [Data Engineering]
description: "Now that we have an appropriate dataclass for our swings, let's work on normalizing them so that we can handle camera perturbations"
format:
  html:
    code-fold: true
jupyter: python3
---

```python
#| include: false
from fastai.vision.all import *
#from swing_data import *
```

```python
#| include: false
base_path = '../../../data/full_videos/ymirza'
swing_days = ['jun8', 'aug9', 'sep14']
parent_dir = f'{base_path}/{swing_days[-1]}'
files = [file for file in get_files(parent_dir, extensions='.pkl') if file.name[:3] == 'IMG']
file_names = [file.name.split('.')[0] for file in files]
len(file_names)
```

```
47
```

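```python
# kp_0: keypoints for one swing, loaded through the swing dataclass
l_sh = kp_0.l_sh
r_sh = kp_0.r_sh
l_wr = kp_0.l_wr
r_wr = kp_0.r_wr
```

```python
# One row per frame; columns 0 and 1 are x and y
l_sh.shape, l_sh[:, 0].shape
```

```
((180, 3), (180,))
```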

## Let's flesh out some functions to derive useful distance and angle relationships

**Must keep in mind the following:**
- x increases to the right, same as normal.
- y usually increases downward, the opposite of standard Cartesian coordinates.
- If we want “upward” to be a positive angle (as in usual math diagrams), note that `np.arctan2` angles computed on raw image coordinates are flipped vertically compared to that intuition.
- To get a more “math-like” angle where up is positive y, negate dy before calling `np.arctan2`.
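A minimal sketch of the dy-negation trick, assuming (N, 2) arrays of pixel (x, y) coordinates. The helper `angle_image_vs_math` is hypothetical and only exists to compare the two conventions side by side: a point directly above the start comes out at -90° in raw image coordinates and +90° once dy is negated.

```python
import numpy as np

def angle_image_vs_math(start_xy, end_xy):
    """Return (image_angle, math_angle) in degrees for the vector start -> end.

    Hypothetical helper: start_xy and end_xy are (N, 2) arrays of pixel (x, y).
    image_angle uses y-down coordinates as-is; math_angle negates dy so that
    "up" in the frame comes out as a positive angle.
    """
    dx = end_xy[:, 0] - start_xy[:, 0]
    dy = end_xy[:, 1] - start_xy[:, 1]
    image_angle = np.degrees(np.arctan2(dy, dx))
    math_angle = np.degrees(np.arctan2(-dy, dx))
    return image_angle, math_angle

# Frame 0: end point 10 px ABOVE the start (smaller y); frame 1: 10 px to the right
start = np.array([[100.0, 200.0], [100.0, 200.0]])
end = np.array([[100.0, 190.0], [110.0, 200.0]])
img, math_ = angle_image_vs_math(start, end)   # img ≈ [-90, 0], math_ ≈ [90, 0]
```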

```python
def get_kp_xy_components(kps):
    # Column [0] is the horizontal value (x) and column [1] is the vertical value (y)
    x_component = kps[:, 0]
    y_component = kps[:, 1]
    return x_component, y_component

def get_angle_degree(first_kps, second_kps):
    """Per-frame angle (degrees, in [-180, 180]) of the vector pointing from
    second_kps to first_kps, in image coordinates (y grows downward)."""
    first_x_arr, first_y_arr = get_kp_xy_components(first_kps)
    second_x_arr, second_y_arr = get_kp_xy_components(second_kps)
    dx = first_x_arr - second_x_arr
    dy = first_y_arr - second_y_arr
    angle_radians = np.arctan2(dy, dx)
    angle_degree = np.degrees(angle_radians)
    return angle_degree

def get_kps_distance(first_kps, second_kps):
    """Per-frame Euclidean distance between two keypoint tracks (x, y only)."""
    first_x_arr, first_y_arr = get_kp_xy_components(first_kps)
    second_x_arr, second_y_arr = get_kp_xy_components(second_kps)
    dx = first_x_arr - second_x_arr
    dy = first_y_arr - second_y_arr
    distance = np.sqrt(dx**2 + dy**2)
    return distance
```

From Claude: <br>
**Image Coordinate System** <br>
In video/image coordinates, the y-axis increases downward (not upward as in the mathematical convention). This means:
- An angle of 0° points right
- An angle of 90° points down (not up)
- An angle of -90° points up (not down)

**Angle Range** <br>
The function returns angles in [-180, 180]. Because y points down, positive angles represent clockwise rotation from the positive x-axis as seen on screen, and negative angles represent counter-clockwise rotation. This is the reverse of the usual math convention.
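A quick check of the direction table, using the same `np.arctan2(dy, dx)` formula as `get_angle_degree` on unit steps in image coordinates: right gives 0°, down 90°, left 180°, up -90°. The `np.mod(angle, 360)` column is an optional extra, shown as one way to move onto a [0, 360) scale if that is more convenient downstream.

```python
import numpy as np

# Unit steps in image coordinates (x right, y down), listed clockwise as seen on screen
directions = {"right": (1, 0), "down": (0, 1), "left": (-1, 0), "up": (0, -1)}

# Same formula as get_angle_degree: arctan2(dy, dx), converted to degrees
angles = {name: float(np.degrees(np.arctan2(dy, dx))) for name, (dx, dy) in directions.items()}

for name, angle in angles.items():
    # np.mod maps [-180, 180] onto [0, 360) if a non-negative scale is preferred
    print(f"{name:>5}: {angle:6.1f}  ->  {np.mod(angle, 360):5.1f}")
```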

### We need to account for potential perturbations, such as camera orientation and camera shake
A simple way to manage these is to normalize the coordinates relative to something within the body:
- This way absolute position values don't matter: if the camera moves up, the feet shift in the frame even though the player did not move vertically
- Subtracting a body anchor (such as the hip center) removes that shift, and dividing all keypoint values by a reference distance that stays anatomically stable from frame to frame removes differences in scale
- The authors of this paper tried a range of normalization techniques, and several worked well:
    - https://www.mdpi.com/1424-8220/22/11/4245
    - Look at how the points separate into their own space once normalized (right)!

![](paper_img.png)
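To see why subtracting a body anchor handles camera movement, here is a small synthetic check. It assumes a (frames, keypoints, 2) array and COCO-style hip indices (11, 12); the poses are random and the "camera move" is simulated by shifting every point down 30 px. Raw coordinates move by 30 px, while the hip-centered pose stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)
pose = rng.uniform(100, 400, size=(5, 17, 2))   # 5 frames, 17 keypoints, (x, y) in pixels

# Simulated camera move: all frame content shifts down by 30 px (y grows downward)
shifted = pose + np.array([0.0, 30.0])

def center_on_hips(kps, l_hip=11, r_hip=12):
    """Subtract the per-frame hip center (COCO-style indices are an assumption)."""
    hip_center = (kps[:, l_hip] + kps[:, r_hip]) / 2.0   # (frames, 2)
    return kps - hip_center[:, None, :]

raw_gap = np.abs(shifted - pose).max()                                       # raw coordinates moved
centered_gap = np.abs(center_on_hips(shifted) - center_on_hips(pose)).max()  # anchored pose unchanged
print(raw_gap, centered_gap)
```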

### Why is this important?

Camera distance, focal length, and resolution produce different pixel-space scales for the same physical movement. Normalization converts absolute pixel distances into scale-invariant relative proportions, which makes measurements comparable across videos. For example, torso-based normalization keeps body proportions consistent while still letting limb movements vary naturally.

After normalization, calling `get_kps_distance()` on the normalized keypoints returns values like 0.99 (99% of torso length) instead of 158 pixels, enabling direct comparison between different recording setups.
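A worked version of the 158-pixel example. The 160 px torso length and the zoom factors are made-up numbers for illustration: whatever the zoom, dividing by the torso length gives the same proportion (158 / 160 = 0.9875, i.e. about 99%).

```python
def normalized_distance(dist_px, torso_px):
    """Express a pixel distance as a fraction of torso length."""
    return dist_px / torso_px

torso_px, keypoint_dist_px = 160.0, 158.0   # hypothetical measurements from one video
ratios = {}
for zoom in (0.5, 1.0, 2.0):                 # same pose filmed at different camera distances / zoom
    ratios[zoom] = normalized_distance(keypoint_dist_px * zoom, torso_px * zoom)
    print(f"zoom {zoom}: {keypoint_dist_px * zoom:.0f} px -> {ratios[zoom]:.4f} of torso length")
```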

```python
# Use one diagonal to normalize
def normalize_by_torso_diagonal(kps, l_sh_to_r_hip=True):
    if l_sh_to_r_hip:  # left shoulder to right hip
        shoulder = kps.l_sh
        hip = kps.r_hi
    else:  # right shoulder to left hip
        shoulder = kps.r_sh
        hip = kps.l_hi

    # Torso length from (x, y) only; column 2 (confidence) must not enter the distance
    torso_diagonal = np.sqrt(np.sum((shoulder[:, :2] - hip[:, :2])**2, axis=1))

    # Scale x, y per frame ((Frames, 1, 1) broadcast) and leave confidence as-is
    normalized_kps = kps.kps.astype(float)  # astype returns a copy
    normalized_kps[..., :2] /= torso_diagonal[:, np.newaxis, np.newaxis]
    return normalized_kps

# Use BOTH diagonals to normalize
def normalize_by_average_torso(kps):
    left_shoulder = kps.l_sh[:, :2]
    right_shoulder = kps.r_sh[:, :2]
    left_hip = kps.l_hi[:, :2]
    right_hip = kps.r_hi[:, :2]

    diagonal1 = np.sqrt(np.sum((left_shoulder - right_hip)**2, axis=1))
    diagonal2 = np.sqrt(np.sum((right_shoulder - left_hip)**2, axis=1))
    avg_torso = (diagonal1 + diagonal2) / 2.0

    normalized_kps = kps.kps.astype(float)  # astype returns a copy
    normalized_kps[..., :2] /= avg_torso[:, np.newaxis, np.newaxis]
    return normalized_kps

#normalize_by_average_torso(kp_0).shape
```
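A quick sanity check of average-torso normalization on synthetic data. `SyntheticKps` is a hypothetical stand-in for the swing dataclass that only exposes what the function reads (`kps`, `l_sh`, `r_sh`, `l_hi`, `r_hi`), with COCO-style indices and a third confidence column assumed. This version divides only the x, y columns. The same poses at 3x scale should normalize to identical values, and the average torso diagonal should come out as 1 in every frame.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SyntheticKps:
    """Hypothetical stand-in for the swing dataclass: kps has shape (frames, 17, 3)."""
    kps: np.ndarray

    @property
    def l_sh(self): return self.kps[:, 5]
    @property
    def r_sh(self): return self.kps[:, 6]
    @property
    def l_hi(self): return self.kps[:, 11]
    @property
    def r_hi(self): return self.kps[:, 12]

def normalize_by_average_torso(kps):
    # Average of both torso diagonals, using (x, y) only
    d1 = np.linalg.norm(kps.l_sh[:, :2] - kps.r_hi[:, :2], axis=1)
    d2 = np.linalg.norm(kps.r_sh[:, :2] - kps.l_hi[:, :2], axis=1)
    avg_torso = (d1 + d2) / 2.0
    out = kps.kps.astype(float)
    out[..., :2] /= avg_torso[:, None, None]
    return out

rng = np.random.default_rng(1)
raw = np.concatenate([rng.uniform(0, 500, (4, 17, 2)), rng.uniform(0, 1, (4, 17, 1))], axis=-1)
small = SyntheticKps(raw)
big = SyntheticKps(raw * np.array([3.0, 3.0, 1.0]))   # same poses at 3x scale, confidence unchanged

norm_small = normalize_by_average_torso(small)
norm_big = normalize_by_average_torso(big)
print(np.allclose(norm_small, norm_big))
```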

### One more normalization technique.....

Imagine you took a picture of your friend, but you held the camera crooked. Now your friend looks like they are leaning sideways, even though they were standing straight.

If you wanted to measure how tall they are or where their hands are, it's hard to do because everything is tilted.

This code fixes the picture.

The Pin (Anchor Point): First, it takes a pin and sticks it through a specific spot on the photo (like their hip or the center of their chest) and pins it to the wall.

The Shoulders: Then, it looks at their left shoulder and their right shoulder. It draws a line between them.

The Spin: It grabs the picture and spins it on the wall until that shoulder line is perfectly flat (horizontal).

The Result: Now, no matter how crooked the camera was originally, your friend is standing perfectly straight up and down in the data. This makes it much easier for the computer to compare this swing to the next swing.
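The pin-and-spin idea in a few lines, on a made-up three-point "pose" (left shoulder, right shoulder, hip center) in image coordinates, tilted by 20° and shifted as if the camera were crooked. `pin_and_spin` is a hypothetical, single-frame simplification: it pins the hip center at the origin, measures the shoulder line's angle, and rotates by minus that angle.

```python
import numpy as np

def rotation(deg):
    """2x2 rotation matrix for `deg` degrees (counter-clockwise in y-up terms)."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def pin_and_spin(points, l_sh_idx=0, r_sh_idx=1, pin_idx=2):
    """Pin one point at the origin, then rotate so the shoulder line is horizontal."""
    pinned = points - points[pin_idx]                  # the pin
    vec = pinned[l_sh_idx] - pinned[r_sh_idx]          # right shoulder -> left shoulder
    angle = np.arctan2(vec[1], vec[0])                 # current tilt of the shoulder line
    return pinned @ rotation(-np.degrees(angle)).T     # the spin (row vectors)

# Upright pose: shoulders level, hip center 150 px below them (image coordinates)
upright = np.array([[60.0, 0.0], [-60.0, 0.0], [0.0, 150.0]])
crooked = upright @ rotation(20).T + np.array([320.0, 240.0])   # crooked camera plus an offset

fixed = pin_and_spin(crooked)
print(np.round(fixed, 6))
```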

```python
def align_to_body_frame(keypoints, left_shoulder, right_shoulder, anchor_point=None):
    """Rotate so shoulders are horizontal. Creates rotation (+ optional translation) invariance.

    keypoints: (K, C) for one frame or (F, K, C) for a clip; only columns 0-1 (x, y) are moved.
    left_shoulder, right_shoulder, anchor_point: (C,) or (F, C).

    Returns keypoints in a body-centric frame where:
    - X-axis: along the shoulder line (right shoulder -> left shoulder)
    - Y-axis: perpendicular to the shoulders
    """
    out = np.asarray(keypoints, dtype=float).copy()
    xy = out[..., :2]
    l_sh = np.asarray(left_shoulder, dtype=float)[..., :2]
    r_sh = np.asarray(right_shoulder, dtype=float)[..., :2]

    if anchor_point is not None:
        # Translate so the anchor sits at the origin (broadcast over keypoints)
        anchor = np.asarray(anchor_point, dtype=float)[..., :2]
        xy = xy - anchor[..., None, :]
        # The shoulder vector is a difference, so translation does not change it

    shoulder_vec = l_sh - r_sh
    angle = np.arctan2(shoulder_vec[..., 1], shoulder_vec[..., 0])

    # Rotation matrix for -angle, shape (2, 2) or (F, 2, 2)
    cos_a, sin_a = np.cos(-angle), np.sin(-angle)
    R = np.stack([np.stack([cos_a, -sin_a], axis=-1),
                  np.stack([sin_a, cos_a], axis=-1)], axis=-2)

    # Apply R to every keypoint vector: v_new = R @ v (same direction for 1 or F frames)
    out[..., :2] = np.einsum('...ij,...kj->...ki', R, xy)
    return out
```

### We now have distance and angle functions and a way to normalize our keypoints, so let's add some additional geometric relationships

#### [1] Vector-level features between two points, using a unit vector
   - tells you the direction of the second point relative to the first
   - **independent of distance**

```python
def get_unit_vector(first_kps, second_kps):
    """
    Calculates the unit vector from first_kps to second_kps (A to B), per frame.
    Shape: Inputs (N, 2) or (N, 3) -> Output (N, 2); only x, y are used.
    """
    vec = second_kps[..., :2] - first_kps[..., :2]

    # Euclidean norm of the (x, y) components
    norm = np.sqrt(vec[..., 0]**2 + vec[..., 1]**2)

    # Avoid division by zero when the two points coincide (returns a zero vector)
    norm = np.where(norm == 0, 1, norm)

    # Reshape norm to (N, 1) to allow broadcasting
    return vec / norm[..., None]

print('"get_unit_vector()" is useful because models often handle separate x and y \
unit-vector components better than a single angle value')
#get_unit_vector(kp_0.l_sh, kp_0.l_el).shape
```

```
"get_unit_vector()" is useful because models often handle separate x and y unit-vector components better than a single angle value
```
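One concrete reason for preferring (x, y) unit-vector components over a raw angle: angles wrap around at ±180°, so two nearly identical directions can look far apart as numbers. A small illustration with two made-up directions only 2° apart:

```python
import numpy as np

angles_deg = np.array([179.0, -179.0])           # two directions only 2 degrees apart
unit_vecs = np.stack([np.cos(np.radians(angles_deg)),
                      np.sin(np.radians(angles_deg))], axis=-1)

angle_gap = abs(angles_deg[0] - angles_deg[1])              # 358: looks huge as a scalar
vector_gap = np.linalg.norm(unit_vecs[0] - unit_vecs[1])    # about 0.035: correctly tiny
print(angle_gap, vector_gap)
```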


#### [2] Angle between segments: use the joint triplet A, B, C to find the angle at B
   - elbow angle (shoulder, elbow, wrist)
   - wrist cock (forearm - club shaft angle)
   - spine angle (hip - shoulder - head)

```python
def get_angle_fast(A, B, C):
    """
    Vectorized function to calculate angle at point B given A, B, C.
    Inputs A, B, C can be shape (2,) or (N, 2).
    Returns angle in degrees (0-180).
    """
    # Create vectors BA and BC
    ba = A - B
    bc = C - B

    # Calculate angle using arctan2 (returns radians between -pi and pi)
    # arctan2(y, x) handles the quadrants correctly
    ang_ba = np.arctan2(ba[..., 1], ba[..., 0])
    ang_bc = np.arctan2(bc[..., 1], bc[..., 0])

    # Calculate relative angle and convert to degrees
    angle = np.abs(np.degrees(ang_ba - ang_bc))

    # Normalize to 0-180 (inner angle)
    # E.g., if angle is 270, inner angle is 90.
    angle = np.where(angle > 180, 360 - angle, angle)

    return angle

#get_angle_fast(kp_0.l_sh, kp_0.l_el, kp_0.l_wr).shape
```
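An equivalent way to get the inner angle at B, shown as a cross-check rather than a replacement: `arctan2(|cross|, dot)` of the vectors BA and BC lands directly in [0, 180] without the wrap-around step. `get_angle_cross_dot` is a hypothetical name, and the two frames are made up (an elbow bent at 90° and a straight arm).

```python
import numpy as np

def get_angle_cross_dot(A, B, C):
    """Inner angle at B in degrees (0-180) for (2,) or (N, 2+) inputs; uses x, y only."""
    A, B, C = (np.asarray(p, dtype=float)[..., :2] for p in (A, B, C))
    ba, bc = A - B, C - B
    cross = ba[..., 0] * bc[..., 1] - ba[..., 1] * bc[..., 0]
    dot = ba[..., 0] * bc[..., 0] + ba[..., 1] * bc[..., 1]
    return np.degrees(np.arctan2(np.abs(cross), dot))

# shoulder, elbow, wrist for two frames: bent at 90 degrees, then fully straight
shoulder = np.array([[0.0, 0.0], [0.0, 0.0]])
elbow = np.array([[0.0, 10.0], [0.0, 10.0]])
wrist = np.array([[10.0, 10.0], [0.0, 20.0]])
print(get_angle_cross_dot(shoulder, elbow, wrist))   # [ 90. 180.]
```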

```python
def elbow_angle(shoulder, elbow, wrist):
    '''elbow angle (shoulder, elbow, wrist) '''
    return get_angle_fast(shoulder, elbow, wrist)

# def wrist_cock_angle(elbow, wrist, club_end):
#     '''wrist cock (forearm - club shaft angle)'''
#     return get_angle_fast(elbow, wrist, club_end)

def spine_angle(hip, shoulder, head):
    '''spine angle (hip - shoulder - head)'''
    return get_angle_fast(hip, shoulder, head)
```

#### [3] Pose-relative / frame-relative geometry
   - Instead of absolute coordinates, make everything relative to some anchor:
     - subtract the pelvis/hip center from all keypoints
     - OPTIONALLY: rotate coordinates so the shoulders define a horizontal “body x-axis”

<b>This gives us features that are:</b>
   - translation-invariant (location in the frame is irrelevant)
   - closer to the “true” body pose, especially if the camera jitters

```python
def normalize_keypoints(kps,
                        hip_idxs=(11, 12),
                        shoulder_idxs=(5, 6),
                        align_shoulders=False):
    """
    Normalizes pose keypoints to be translation and (optionally) rotation invariant.

    Args:
        kps (np.ndarray): Input keypoints of shape (Frames, K, 2) or (Frames, K, 3).
                          Expects [x, y] or [x, y, confidence].
        hip_idxs (tuple): Indices for (Left Hip, Right Hip). Default COCO: (11, 12).
        shoulder_idxs (tuple): Indices for (Left Shoulder, Right Shoulder). Default COCO: (5, 6).
        align_shoulders (bool): If True, rotates poses so the shoulder line is horizontal.

    Returns:
        np.ndarray: Normalized keypoints of the same shape as input.
    """
    # Float copy to avoid modifying the original data (and integer-dtype surprises)
    norm_kps = np.array(kps, dtype=float)

    # 1. CENTERING: Subtract Pelvis/Hip Center
    # Average of Left and Right Hip (x, y)
    # Shape: (Frames, 2)
    hip_centers = (kps[:, hip_idxs[0], :2] + kps[:, hip_idxs[1], :2]) / 2.0

    # Broadcast subtraction across all keypoints (N, K, 2) - (N, 1, 2)
    norm_kps[:, :, :2] -= hip_centers[:, None, :]

    if not align_shoulders:
        return norm_kps

    # 2. ROTATION: Align Shoulders to Horizontal Axis
    # Vector from Left Shoulder to Right Shoulder
    # (vectors are translation-invariant, so the centered coordinates give the same result)
    left_s = norm_kps[:, shoulder_idxs[0], :2]
    right_s = norm_kps[:, shoulder_idxs[1], :2]

    shoulder_vecs = right_s - left_s  # Shape: (Frames, 2)

    # theta is the current angle of the shoulder line relative to the global X-axis
    thetas = np.arctan2(shoulder_vecs[:, 1], shoulder_vecs[:, 0])

    # Rotating by -theta flattens the line.
    # Rotation matrix for counter-clockwise rotation by alpha:
    # [[cos(alpha), -sin(alpha)],
    #  [sin(alpha),  cos(alpha)]]
    # With alpha = -theta: cos(-t) = cos(t), sin(-t) = -sin(t)
    c, s = np.cos(thetas), np.sin(thetas)

    # Construct rotation matrices: (Frames, 2, 2)
    # R = [[c, s], [-s, c]] represents rotation by -theta
    R = np.stack([
        np.stack([c, s], axis=-1),
        np.stack([-s, c], axis=-1)
    ], axis=-2)

    # Apply rotation to all keypoints: v_new = R @ v_old for every frame and keypoint
    # Einsum indices: n = frames, k = keypoints, j = old coords (x, y), i = new coords (x, y)
    # R is (N, 2, 2) and the keypoints are (N, K, 2), so the output is (N, K, 2)
    norm_kps[:, :, :2] = np.einsum('nij,nkj->nki', R, norm_kps[:, :, :2])

    return norm_kps
```
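A quick numeric check of the rotation used in `normalize_keypoints`: with theta the angle of the shoulder vector v, R = [[c, s], [-s, c]] should map v to (|v|, 0), and the same `'nij,nkj->nki'` einsum applies it per frame. The three shoulder vectors below are made up (lengths 100, 100, and 50).

```python
import numpy as np

# Made-up shoulder vectors (right - left) for three frames, in pixels
shoulder_vecs = np.array([[100.0, 0.0], [80.0, 60.0], [-30.0, 40.0]])

thetas = np.arctan2(shoulder_vecs[:, 1], shoulder_vecs[:, 0])
c, s = np.cos(thetas), np.sin(thetas)
R = np.stack([np.stack([c, s], axis=-1),
              np.stack([-s, c], axis=-1)], axis=-2)   # (frames, 2, 2), rotation by -theta

# Treat each vector as a single "keypoint" per frame: shape (frames, 1, 2)
rotated = np.einsum('nij,nkj->nki', R, shoulder_vecs[:, None, :])[:, 0]
print(np.round(rotated, 6))   # every row should be (length, 0)
```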

#### [4] Signed area / “rotation” type features (points ABC)
Signed area + cross product tells us:
 - How "curved" or rotated the triplet is
 - On which side one point lies relative to a line<br>
--> **Will help when detecting:**
- open vs closed shoulders/hips
- left vs right tilt etc

```python
def get_signed_area(p1, p2, p3):
    """
    Calculates the signed area of the triangle formed by triplets (p1, p2, p3).
    This serves as a proxy for 'rotation' or curvature.

    Mathematical Property (standard y-up axes):
        - Positive (+): p3 is to the LEFT of the line p1->p2 (Counter-Clockwise)
        - Negative (-): p3 is to the RIGHT of the line p1->p2 (Clockwise)
        - Zero (0): Points are collinear
    In image coordinates (y pointing down) the visual sense flips:
    a positive value means p3 appears to the RIGHT of p1->p2 on screen.

    Args:
        p1, p2, p3 (np.ndarray): Arrays of shape (N, 2) representing (x, y) coordinates.
                                 Can also handle shape (2,) for single points.

    Returns:
        np.ndarray: Signed area values.
    """
    # Ensure inputs are numpy arrays
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))

    # Vector A: p1 -> p2
    vec_a = p2 - p1

    # Vector B: p1 -> p3
    vec_b = p3 - p1

    # 2D Cross Product (Determinant): x1*y2 - x2*y1
    # This calculates the z-component of the cross product in 3D space
    cross_prod = vec_a[..., 0] * vec_b[..., 1] - vec_a[..., 1] * vec_b[..., 0]

    # Signed Area is 0.5 * Cross Product
    return 0.5 * cross_prod
```
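A small check of the sign convention using the same cross-product formula on made-up points. The triangle is evaluated once in y-up coordinates and once with y flipped, as it would be stored in image coordinates: the sign flips, which is worth remembering when reading "left" and "right" off keypoints that come straight from video.

```python
import numpy as np

def signed_area(p1, p2, p3):
    # Same formula as get_signed_area: 0.5 * cross(p2 - p1, p3 - p1)
    a, b = np.subtract(p2, p1), np.subtract(p3, p1)
    return 0.5 * (a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0])

p1, p2 = np.array([0.0, 0.0]), np.array([10.0, 0.0])
p3_left = np.array([5.0, 4.0])            # above (left of) the line p1->p2 in y-up coordinates

flip_y = np.array([1.0, -1.0])            # y-up -> y-down (image) convention
area_math = signed_area(p1, p2, p3_left)                                # +20
area_image = signed_area(p1 * flip_y, p2 * flip_y, p3_left * flip_y)    # -20
collinear = signed_area(p1, p2, np.array([20.0, 0.0]))                  # 0
print(area_math, area_image, collinear)
```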

ELI5 from Gemini:
Imagine you are holding a clock. <br>
The "signed area" (or cross product) tells you which way the clock hands are moving and how big the clock is. <br>
We have three points: A, B, and C. <br>
A is the center of the clock. <br>
B is the number 12. <br>
C is where the hour hand is pointing. <br>
The math checks where C is, compared to the line from A to B.

1. The "Which Way" (Sign)

- Positive (+): The hand moved backwards (Counter-Clockwise). Point C is to the Left.
- Negative (-): The hand moved forwards (Clockwise). Point C is to the Right.
- Zero (0): The hand is pointing straight at 12 (or 6). All three points are in a straight line.

2. The "How Big" (Number)
- Small Number: The hand is very close to the 12 (or the 6). It's barely a triangle; it's almost a straight line.
- Big Number: The hand is far away from that line (like at 3 o'clock or 9 o'clock). The triangle is big and "open."
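The "how big" part is half the base times the height: with the hand length fixed, the area grows with how far the hand has swung away from the 12-to-6 line. A clock-face sketch in y-up coordinates (hand length 1, `clock_point` is a hypothetical helper) gives negative values at 1 and 3 o'clock (clockwise), positive values at 9 and 11, roughly zero at 6, and the largest magnitudes at 3 and 9.

```python
import numpy as np

def clock_point(hour, length=1.0):
    """Tip of a clock hand at `hour`, in y-up coordinates (12 o'clock points up)."""
    theta = np.radians(90.0 - 30.0 * hour)   # 30 degrees per hour, moving clockwise
    return np.array([length * np.cos(theta), length * np.sin(theta)])

def signed_area(p1, p2, p3):
    # 0.5 * cross(p2 - p1, p3 - p1), same formula as get_signed_area
    a, b = p2 - p1, p3 - p1
    return 0.5 * (a[0] * b[1] - a[1] * b[0])

center, twelve = np.zeros(2), clock_point(12)
areas = {hour: float(signed_area(center, twelve, clock_point(hour))) for hour in (1, 3, 6, 9, 11)}
for hour, area in areas.items():
    print(hour, round(area, 3))
```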

--> Why is this useful for bodies? <br>
Imagine A and B are points along your spine, say your hip center and your shoulder center, so together they make a line. <br>
If C (your head) sits directly on that line, the number is Zero. You are stacked up straight. <br>
If your head leans to one side of that line, the number becomes Positive.

If it leans to the other side, the number becomes Negative.

It gives you a single number that tells you how far a point sits off a body line and which side it is on.
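A sketch of the body version with made-up image coordinates: A is the hip center and B the shoulder center, so AB is a spine line, and C is the head. Because these are image coordinates (y down), which lean comes out positive is flipped relative to the clock picture; what matters is that a stacked head gives zero and the two leans get opposite signs.

```python
import numpy as np

def signed_area(p1, p2, p3):
    # 0.5 * cross(p2 - p1, p3 - p1), same formula as get_signed_area
    a, b = p2 - p1, p3 - p1
    return 0.5 * (a[0] * b[1] - a[1] * b[0])

hip_center = np.array([320.0, 400.0])
shoulder_center = np.array([320.0, 250.0])      # directly above the hips (smaller y)

heads = {
    "stacked": np.array([320.0, 180.0]),        # on the spine line
    "lean_img_left": np.array([300.0, 185.0]),  # head toward smaller x
    "lean_img_right": np.array([340.0, 185.0]), # head toward larger x
}
areas = {name: float(signed_area(hip_center, shoulder_center, head)) for name, head in heads.items()}
print(areas)
```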