This is our attempt to reproduce the model described in https://github.com/YoungSeng/DiffuseStyleGesture
It won the reproducibility award in the 2023 GENEA gesture animation challenge, which is why we chose it.


More specifically, we are reproducing it to improve our understanding of the field and of existing approaches, with the goal of eventually building a system of our own.

Learning goals:
- We want to understand the structure of the data and how to work with it
- We want to understand how to preprocess the data
- We want to build a gesture synthesis ML model from scratch
- We want to train an advanced ML model, which we have never done before

We are unsure what the best starting point is, but we will begin by converting the data from bvh to a numpy representation that we can do machine learning with, and back to bvh. DiffuseStyleGesture extracts additional features from the animation data, such as velocity, acceleration, and rotational acceleration. We should probably try to do some of the same.
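
To make the idea of derived features concrete, here is a minimal sketch of how velocity and acceleration could be computed from a rotation array with finite differences. It is not how DiffuseStyleGesture does it; the function name `add_motion_derivatives`, the `(frames, joints, 3)` layout and the `fps` argument are our own assumptions.

```python
import numpy as np


def add_motion_derivatives(rotations: np.ndarray, fps: float) -> np.ndarray:
    """Append finite-difference velocity and acceleration to each joint.

    rotations: array of shape (frames, joints, 3), assumed layout.
    Returns an array of shape (frames, joints, 9) holding
    [rotation, velocity, acceleration] along the last axis.
    """
    dt = 1.0 / fps  # time between two frames in seconds
    # np.gradient uses central differences inside the sequence and
    # one-sided differences at the first and last frame.
    velocity = np.gradient(rotations, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return np.concatenate([rotations, velocity, acceleration], axis=-1)
```

Note that differencing raw Euler angles ignores wrap-around (for example a jump from 179 to -179 degrees), so this is only a starting point.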

In [115]:
from typing import List, Optional
import bvh

class Bone:
    def __init__(self, name: str, offset: List[float], channels: List[str], parent: Optional["Bone"]):
        self.name = name
        self.offset = offset
        self.channels = channels
        self.parent = parent

        self.children: List["Bone"] = []  # Initialize an empty list for children

    def add_child(self, child: "Bone"):
        self.children.append(child)

    def print(self, indent: int = 0):
        print("  " * indent + self.name + "   --    " + str(self.offset) + " " + str(self.channels))
        for child in self.children:
            child.print(indent + 1)

    # @classmethod
    # def from_bvh_node_recursive(cls, bvh_node: bvh.BvhNode, parent: Optional["Bone"] = None):
    #     # Construct the Bone object from the bvhNode (assuming bvh_node is a dictionary or similar structure)
    #     if len(bvh_node.value) > 0:
    #         if bvh_node.value[0] == 'JOINT':
    #             name=bvh_node.value[1]
    #             # Create the Bone object
    #             bone = cls(name=name, offset=[], channels=[], parent=parent)
    #             parent.add_child(bone)
    #         if bvh_node.value[0] == 'OFFSET':
    #             parent.offset=[float(bvh_node.value[1]), float(bvh_node.value[2]), float(bvh_node.value[3])]
    #         if bvh_node.value[0] == 'CHANNELS':
    #             parent.channels=[bvh_node.value[1], bvh_node.value[2], bvh_node.value[3], bvh_node.value[4], bvh_node.value[5], bvh_node.value[6]]
    #     for child_node in bvh_node.children:
    #         cls.from_bvh_node_recursive(child_node, bone)
    
    # @classmethod
    # def from_bvh_node(cls, bvh_node: bvh.BvhNode):
    #     root = cls(name="root", offset=[], channels=[], parent=None)
    #     cls.from_bvh_node_recursive(bvh_node, root)
    #     return root


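    @classmethod
    def parse_hierarchy(cls, lines):
        """Build a Bone tree from the HIERARCHY section of a BVH file."""
        stack = []  # currently open blocks; None marks an "End Site" block
        root = None

        for line in lines:
            line = line.strip()
            if not line or line == "{":
                continue  # Ignore blank lines and opening brackets
            if line.startswith("MOTION"):
                break  # The hierarchy ends where the motion data begins

            current_bone = stack[-1] if stack else None

            if line.startswith(("ROOT", "JOINT")):
                new_bone = cls(line.split()[1], [], [], current_bone)
                if current_bone is not None:
                    current_bone.add_child(new_bone)
                if root is None:
                    root = new_bone
                stack.append(new_bone)
            elif line.startswith("End Site"):
                # End sites have their own { OFFSET } block. Track it so its
                # OFFSET and closing bracket do not affect the parent joint.
                stack.append(None)
            elif line.startswith("OFFSET") and current_bone is not None:
                current_bone.offset = list(map(float, line.split()[1:]))
            elif line.startswith("CHANNELS") and current_bone is not None:
                # Skip the keyword and the channel count
                current_bone.channels = line.split()[2:]
            elif line == "}" and stack:
                stack.pop()
        return root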

In [None]:
# import bvh

# Load BVH file
# with open("./genea2023_dataset/trn/main-agent/bvh/trn_2023_v0_000_main-agent.bvh") as f:
#     bvh_data = bvh.Bvh(f.read())

# Example usage
# with open("./genea2023_dataset/trn/main-agent/bvh/trn_2023_v0_000_main-agent.bvh") as f:
#     lines = f.readlines()

# root_bone = Bone.parse_hierarchy(lines)
# if root_bone:
#     root_bone.print()


import bvh_loader
bvh_data = bvh_loader.load("./genea2023_dataset/trn/main-agent/bvh/trn_2023_v0_000_main-agent.bvh")

print(bvh_data)

# def print_hierarchy(bone: bvh.BvhNode, depth=0, parent: bvh.BvhNode = None):
#     if len(bone.value) > 0:
#         if bone.value[0] == 'JOINT':
#             print("  " * depth + bone.value[1])
#         if bone.value[0] == 'OFFSET':
#             
#         # print(bone.name)
#         # print("  " * depth + str(bone))
#     for child in bone.children:
#         print_hierarchy(child, depth + 1, bone)

# print_hierarchy(bvh_data.root)


{'rotations': array([[[ 0.00000e+00,  0.00000e+00, -0.00000e+00],
        [-1.34330e+01, -1.37752e+01, -8.50880e+00],
        [ 3.59529e+00,  3.97883e+00, -5.75931e-01],
        ...,
        [ 7.54916e-14, -7.00000e-01,  1.78601e-12],
        [-4.37326e-15,  0.00000e+00,  1.59028e-15],
        [-1.37610e+01,  1.55862e+01,  1.81960e+01]],

       [[ 0.00000e+00,  0.00000e+00, -0.00000e+00],
        [-1.33363e+01, -1.38614e+01, -8.62397e+00],
        [ 3.49525e+00,  3.97982e+00, -5.62715e-01],
        ...,
        [ 3.19639e-12, -7.00000e-01,  4.03006e-12],
        [-3.57812e-15,  1.11319e-14, -0.00000e+00],
        [-1.36274e+01,  1.55278e+01,  1.81644e+01]],

       [[ 0.00000e+00,  0.00000e+00, -0.00000e+00],
        [-1.32492e+01, -1.38521e+01, -8.71572e+00],
        [ 3.49525e+00,  3.97982e+00, -5.62715e-01],
        ...,
        [-2.72549e-12, -7.00000e-01,  2.78637e-12],
        [ 3.97569e-15,  4.77083e-15, -4.77083e-15],
        [-1.35232e+01,  1.54228e+01,  1.83335e+01]],

     

We have found and downloaded pymo (https://omid.al/projects/pymo/), a library for doing machine learning with motion capture data. We are using files prepared for the GENEA challenge that use this library to convert from bvh to features (a numpy array).

The script for the GENEA challenge extends the pipeline with a root normalisation step, which is intended to make sure all subjects face the same way. For now we skip this step, since we will start by training on the main subject only and not take the conversation partner into account.
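
As a reminder of what we are skipping, here is a minimal sketch of the idea behind root normalisation. It is not the GENEA script's implementation. It assumes a Y-up coordinate system and that we are given the subject's initial facing direction on the ground plane; the names `rotate_about_y`, `normalise_facing` and `forward_xz` are our own.

```python
import numpy as np


def rotate_about_y(points: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate (n, 3) points about the vertical Y axis by angle_deg degrees."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T


def normalise_facing(root_positions: np.ndarray, forward_xz) -> np.ndarray:
    """Move the first frame to the ground-plane origin and face +Z.

    root_positions: (frames, 3) root trajectory, assumed Y-up.
    forward_xz: (x, z) facing direction in the first frame (assumed input).
    """
    # Yaw is measured from +Z towards +X.
    yaw = np.degrees(np.arctan2(forward_xz[0], forward_xz[1]))
    # Remove the horizontal start offset but keep the height.
    centred = root_positions - root_positions[0] * np.array([1.0, 0.0, 1.0])
    # Undo the yaw so the facing direction lands on +Z.
    return rotate_about_y(centred, -yaw)
```

A full version would also have to rotate the root orientation channel by the same angle, not only the positions.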

One idea we have for data augmentation is to mirror the dataset and add the mirrored copy, which would double its size.
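
A minimal sketch of what mirroring could look like on a rotation array, under several assumptions we have not verified against the dataset: the array has shape `(frames, joints, 3)`, the last axis holds Euler angles about (x, y, z), the skeleton is symmetric, and left/right joints are distinguished by a name prefix. The function `mirror_rotations` and the default prefixes are hypothetical.

```python
import numpy as np


def mirror_rotations(rotations, joint_names,
                     left_prefix="Left", right_prefix="Right"):
    """Mirror a clip across the plane x = 0.

    rotations: (frames, joints, 3) Euler angles about (x, y, z), assumed.
    joint_names: one name per joint; the prefixes are placeholders.
    """
    index = {name: i for i, name in enumerate(joint_names)}
    partner = []
    for name in joint_names:
        if name.startswith(left_prefix):
            other = right_prefix + name[len(left_prefix):]
        elif name.startswith(right_prefix):
            other = left_prefix + name[len(right_prefix):]
        else:
            other = name
        # Fall back to the joint itself if no counterpart exists.
        partner.append(index.get(other, index[name]))

    # Swap left and right joints (fancy indexing returns a copy).
    mirrored = rotations[:, partner, :]
    # Reflecting across x = 0 keeps rotations about x and flips the sign
    # of rotations about y and z.
    mirrored[..., 1:] *= -1.0
    return mirrored
```

The mirrored clip would be stored as an additional training clip next to the original. The root translation would also need its x component negated, which this sketch does not handle.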
