Go to the mesh-and-bones-to-rig directory.

In [12]:
import os
os.chdir('D:/mesh-and-bones-to-rig') # Change to the right directory.

## Install the dependencies

Run this cell first when setting up the virtual environment.

In [2]:
!pip install --upgrade pip setuptools wheel
!pip install --upgrade build
!pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
!pip install torch_geometric==2.6.1 torch_scatter==2.1.2 torch_sparse==0.6.18 torch_cluster==1.6.3 -f https://data.pyg.org/whl/torch-2.6.0+cu124.html

Looking in indexes: https://download.pytorch.org/whl/cu124
Looking in links: https://data.pyg.org/whl/torch-2.6.0+cu124.html


After that, install the requirements.

In [3]:
!pip install -r requirements.txt



In [13]:
import os
import gdown
import zipfile
import shutil
import random
import numpy as np
from joblib import Parallel, delayed

from data.utils import load_obj, parse_rig_info
from preprocess_utils.surface_geo import compute_surface_geodesic_features
from preprocess_utils.volumetric_geo import compute_volumetric_geodesic

#### Note: This notebook is divided into subsections depending on what you want to do. If you don't want to download a lot of data, I encourage you not to run every cell in it.

In [14]:
# Folder in which to store the data. Run this cell first.
root_output_path = "data/model_data/"

### Download the reduced preprocessed data (including what I computed)

As I said in the previous notebook (0.), I will be working with a reduced subset of the dataset because I don't have the time to use all of it. You therefore don't need to download many gigabytes of data: I have prepared a smaller version of the preprocessed data with a 100-9-9 train-val-test split (https://drive.google.com/file/d/1vqbsFsrwykWwydWjpjiKAY6G87TFV4Lx/view?usp=drive_link). The code that builds this subset is included later in this notebook, for anyone who has both the original preprocessed dataset and the preprocessed data which I computed.

In [37]:
# URL of the Google Drive file
url = "https://drive.google.com/uc?id=1vqbsFsrwykWwydWjpjiKAY6G87TFV4Lx"

# Path where you want to save the downloaded file
output_zip = os.path.join(root_output_path, "downloaded_file.zip")

# Target extraction path
extract_path = os.path.join(root_output_path, "ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset")

In [38]:
# Create the output folder if it doesn't exist
os.makedirs(root_output_path, exist_ok=True)

# Download the file
gdown.download(url, output_zip, quiet=False)

# Unzip the whole archive
with zipfile.ZipFile(output_zip, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

# Delete the zip file
os.remove(output_zip)

print("Download, extraction, and cleanup completed.")

Downloading...
From (original): https://drive.google.com/uc?id=1vqbsFsrwykWwydWjpjiKAY6G87TFV4Lx
From (redirected): https://drive.google.com/uc?id=1vqbsFsrwykWwydWjpjiKAY6G87TFV4Lx&confirm=t&uuid=ad4d0336-56b7-4cc5-a7c8-a2c3c605a7e7
To: D:\mesh-and-bones-to-rig\data\model_data\downloaded_file.zip
100%|██████████| 1.87G/1.87G [00:30<00:00, 61.3MB/s]


Download, extraction, and cleanup completed.


### If you want to download the original preprocessed data

This step might take a while. We are downloading the original preprocessed data from Google Drive https://drive.google.com/uc?id=1-B6hJ4423rw1LrTForHp7oaG5qRAbJx3.

In [4]:
# URL of the Google Drive file
url = "https://drive.google.com/uc?id=1-B6hJ4423rw1LrTForHp7oaG5qRAbJx3"

# Path where you want to save the downloaded file
output_zip = os.path.join(root_output_path, "downloaded_file.zip")

# Target extraction path
extract_path = os.path.join(root_output_path, "ModelResource_RigNetv1_preproccessed")

In [7]:
# Create the output folder if it doesn't exist
os.makedirs(root_output_path, exist_ok=True)

# List of files to extract
# We will only need remeshed meshes and rig data for them
files_to_extract = [
    "obj_remesh/*",
    "rig_info_remesh/*",
    "test_final.txt",
    "train_final.txt",
    "val_final.txt"
]

# Download the file
gdown.download(url, output_zip, quiet=False)

# Unzip only the specified files and folders
with zipfile.ZipFile(output_zip, 'r') as zip_ref:
    all_files = zip_ref.namelist()  # List all contents of the zip file

    for item in files_to_extract:
        if item.endswith("/"):  # Extract entire folder (including subfolders)
            for file in all_files:
                if file.startswith(item):  # Check if the file belongs to the folder
                    zip_ref.extract(file, extract_path)

        elif item.endswith("/*"):  # Extract only direct files inside a folder (no subfolders)
            folder_path = item[:-2]  # Remove "/*" to get the folder name
            for file in all_files:
                # Match on "folder/" so that a sibling such as "folder_extra/" is not picked up
                if file.startswith(folder_path + "/") and "/" not in file[len(folder_path) + 1:]:
                    zip_ref.extract(file, extract_path)

        else:  # Extract individual files
            if item in all_files:
                zip_ref.extract(item, extract_path)

# Delete the zip file
os.remove(output_zip)

print("Download, selective extraction, and cleanup completed.")

Downloading...
From (original): https://drive.google.com/uc?id=1-B6hJ4423rw1LrTForHp7oaG5qRAbJx3
From (redirected): https://drive.google.com/uc?id=1-B6hJ4423rw1LrTForHp7oaG5qRAbJx3&confirm=t&uuid=372e5224-8422-48dc-be31-890274cd8e72
To: D:\mesh-and-bones-to-rig\data\model_data\downloaded_file.zip
100%|██████████| 1.06G/1.06G [00:24<00:00, 42.7MB/s]


Download, selective extraction, and cleanup completed.
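
The selection rule used in the cell above can be tried out without downloading anything. The following is a minimal sketch, not the notebook's exact code: `select_members` is a hypothetical helper, and the member names in the tiny in-memory archive are made up for illustration. It assumes the same two kinds of patterns as `files_to_extract` (a folder pattern ending in `/*` and an exact file name).

```python
import io
import zipfile


def select_members(names, patterns):
    """Return the archive member names that match any of the patterns.

    A pattern ending in "/*" matches only files directly inside that folder
    (no subfolders); any other pattern must equal the member name exactly.
    """
    selected = []
    for name in names:
        if name.endswith("/"):  # skip pure directory entries
            continue
        for pattern in patterns:
            if pattern.endswith("/*"):
                folder = pattern[:-1]  # keep the slash, e.g. "obj_remesh/"
                rest = name[len(folder):]
                if name.startswith(folder) and rest and "/" not in rest:
                    selected.append(name)
                    break
            elif name == pattern:
                selected.append(name)
                break
    return selected


# Build a small in-memory archive with made-up member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for member in ["obj_remesh/1.obj", "obj_remesh/sub/2.obj",
                   "obj/3.obj", "train_final.txt"]:
        zf.writestr(member, "dummy")

with zipfile.ZipFile(buffer) as zf:
    wanted = select_members(zf.namelist(), ["obj_remesh/*", "train_final.txt"])

print(wanted)  # ['obj_remesh/1.obj', 'train_final.txt']
```

Each name in `wanted` could then be passed to `zip_ref.extract(name, extract_path)`, as the cell above does.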


If you want the original dataset, which consists of meshes in FBX format, the link is https://drive.google.com/file/d/1yojBwl5eHPqgXZ1Uh4j26S-yKK-2loPu/view.

### My preprocessing of the original preprocessed data (the code for it and a link)

You can download the preprocessed data which I generated for the whole dataset from https://drive.google.com/file/d/1XxYvhjHMr7yN6kYINDhLv4p3CZqxu7eZ/view?usp=drive_link. It is ~100GB (40GB compressed), mainly because of the full surface geodesic matrices.
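
To see why the full surface geodesic matrices dominate the size, here is a rough back-of-the-envelope sketch. It assumes each matrix is a dense N×N array stored as 8-byte floats (NumPy's default `float64`); the actual dtype and the vertex counts below are assumptions, not values taken from the dataset.

```python
def dense_matrix_megabytes(num_vertices, bytes_per_value=8):
    """Approximate size in MB of a dense num_vertices x num_vertices matrix."""
    return num_vertices * num_vertices * bytes_per_value / 1e6


# Hypothetical vertex counts, chosen only to show the quadratic growth.
for n in (1_000, 3_000, 5_000):
    print(f"{n} vertices -> {dense_matrix_megabytes(n):.0f} MB per mesh")
```

Because the size grows with the square of the vertex count, a few thousand meshes with a few thousand vertices each can plausibly add up to tens of gigabytes.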

In [2]:
def process_mesh(base_name, obj_dir, rig_dir, cache_dir):
    """
    Process a single mesh: load the mesh and rig info, compute the volumetric geodesic distances
    and the full surface geodesic matrix, and cache the results as .npy files.

    Parameters:
        base_name (str): The base name of the mesh (without extension).
        obj_dir (str): Directory containing OBJ files.
        rig_dir (str): Directory containing rig info text files.
        cache_dir (str): Directory to save cached files.

    Returns:
        A tuple (base_name, status) where status is one of "skipped", "error", or "done".
    """
    rig_path = os.path.join(rig_dir, base_name + ".txt")
    obj_path = os.path.join(obj_dir, base_name + ".obj")

    vol_geo_path = os.path.join(cache_dir, base_name + "_volgeo.npy")
    surf_geo_path = os.path.join(cache_dir, base_name + "_surfgeo.npy")

    # If both cached files exist, skip processing.
    if os.path.exists(vol_geo_path) and os.path.exists(surf_geo_path):
        print(f"[{base_name}] Cache exists, skipping computation.")
        return base_name, "skipped"

    try:
        # Load the mesh.
        mesh, vertices_np, faces_np, normals_np = load_obj(obj_path)
        N_obj = vertices_np.shape[0]

        # Parse rig info.
        bone_positions_np, root_joint, bone_hierarchy, skin_weights_dict, bone_names = parse_rig_info(rig_path)

        # Compute volumetric geodesic distances (shape: (N_obj, B)).
        vol_geo_np = compute_volumetric_geodesic(vertices_np, faces_np, bone_positions_np)
        # Compute full surface geodesic matrix (shape: (N_obj, N_obj)).
        surf_geo_np = compute_surface_geodesic_features(obj_path)

        # Save the computed features.
        np.save(vol_geo_path, vol_geo_np)
        np.save(surf_geo_path, surf_geo_np)
        print(f"[{base_name}] Done with file.")
        return base_name, "done"

    except Exception as e:
        print(f"[{base_name}] Error processing mesh: {e}. Skipping this mesh.")
        return base_name, "error"


In [9]:
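def process_all_meshes(base_names, obj_dir, rig_dir, cache_dir, n_jobs=4):
    """Run process_mesh on every mesh using n_jobs parallel workers and print each mesh's final status."""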
    results = Parallel(n_jobs=n_jobs)(delayed(process_mesh)(base_name, obj_dir, rig_dir, cache_dir)
                                       for base_name in base_names)
    for base_name, status in results:
        print(f"[{base_name}] Status: {status}")
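
With many meshes, a per-status count is easier to read than one line per mesh. Below is a minimal sketch of such a summary. It assumes you have the list of `(base_name, status)` tuples at hand (the function above only prints them, so it would need to return `results` for this to apply); `summarize_statuses` and the example names are hypothetical.

```python
from collections import Counter


def summarize_statuses(results):
    """Count how many meshes ended in each status and list the failed ones."""
    counts = Counter(status for _, status in results)
    failed = [name for name, status in results if status == "error"]
    return counts, failed


# Made-up results in the same (base_name, status) shape that process_mesh returns.
example_results = [("mesh_a", "skipped"), ("mesh_b", "done"),
                   ("mesh_c", "error"), ("mesh_d", "skipped")]

counts, failed = summarize_statuses(example_results)
print(dict(counts))  # {'skipped': 2, 'done': 1, 'error': 1}
print(failed)        # ['mesh_c']
```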

In [None]:
# Root of the original preprocessed data (the folder it was extracted to in the download section).
root_dir = os.path.join(root_output_path, "ModelResource_RigNetv1_preproccessed")
# Directories needed to preprocess the mesh data (compute the surface geodesic distance for each pair of vertices and the volumetric geodesic distance for each vertex-bone pair).
obj_dir = os.path.join(root_dir, "obj_remesh/")
rig_dir = os.path.join(root_dir, "rig_info_remesh/")
cache_dir = os.path.join(root_dir, "precomputed")
os.makedirs(cache_dir, exist_ok=True)

In [11]:
# List all OBJ files and get their base names.
base_names = [os.path.splitext(f)[0] for f in os.listdir(obj_dir) if f.lower().endswith(".obj")]

process_all_meshes(base_names, obj_dir, rig_dir, cache_dir, n_jobs=4)



[10000] Status: skipped
[10006] Status: skipped
[10009] Status: skipped
[10017] Status: skipped
[10021] Status: skipped
[10029] Status: skipped
[10034] Status: skipped
[10047] Status: skipped
[10057] Status: skipped
[10058] Status: skipped
[10059] Status: skipped
[10061] Status: skipped
[10062] Status: skipped
[10063] Status: skipped
[10065] Status: skipped
[10067] Status: skipped
[10068] Status: skipped
[10069] Status: skipped
[10070] Status: skipped
[10071] Status: skipped
[10072] Status: skipped
[10073] Status: skipped
[10074] Status: skipped
[10075] Status: skipped
[10076] Status: skipped
[10077] Status: skipped
[10078] Status: skipped
[10079] Status: skipped
[10080] Status: skipped
[10082] Status: skipped
[10083] Status: skipped
[10084] Status: skipped
[10085] Status: skipped
[10087] Status: skipped
[10088] Status: skipped
[10089] Status: skipped
[10090] Status: skipped
[10091] Status: skipped
[10092] Status: skipped
[10093] Status: skipped
[10094] Status: skipped
[10095] Status: 

### How to create the 100-9-9 train-val-test split of the preprocessed data

You need both the original preprocessed data and the data which I computed from it in order to run this successfully.

In [15]:
# Set a random seed for reproducibility
RANDOM_SEED = 37
random.seed(RANDOM_SEED)

In [16]:
# Define source and destination folders
source_folder = os.path.join(root_output_path, "ModelResource_RigNetv1_preproccessed")
destination_folder = os.path.join(root_output_path, "ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset")
os.makedirs(destination_folder, exist_ok=True)

In [19]:
with open(os.path.join(source_folder, "train_final.txt"), 'r') as f:
    train_names = [line.strip() for line in f.readlines()]
with open(os.path.join(source_folder, "val_final.txt"), 'r') as f:
    val_names = [line.strip() for line in f.readlines()]
with open(os.path.join(source_folder, "test_final.txt"), 'r') as f:
    test_names = [line.strip() for line in f.readlines()]

In [20]:
# Randomly sample with reproducibility
selected_train_names = random.sample(train_names, 100)
selected_val_names = random.sample(val_names, 9)
selected_test_names = random.sample(test_names, 9)
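
Note that with the global `random.seed`, the draws depend on the order in which the sampling cells run: re-running only this cell without reseeding gives a different subset. A minimal sketch of an alternative, assuming you prefer each draw to be reproducible on its own, is to use a local `random.Random` instance. The helper `sample_split` and the example names are hypothetical, and this would not reproduce the exact subset drawn above.

```python
import random


def sample_split(names, k, seed):
    """Draw k names with a private generator so the result depends only on the inputs."""
    rng = random.Random(seed)
    # Sorting first makes the draw independent of the order the names were read in.
    return rng.sample(sorted(names), k)


names = [str(i) for i in range(20)]  # made-up mesh names
first = sample_split(names, 5, seed=37)
second = sample_split(names, 5, seed=37)
print(first == second)  # True
```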

In [21]:
# Combine selected files
selected_files = selected_train_names + selected_val_names + selected_test_names

In [23]:
len(selected_files)

118

In [24]:
# Create the destination folders. The last folder is repeated twice because my preprocessing code puts both kinds of computed files in the same folder (my bad).
folders = ["obj_remesh", "rig_info_remesh", "precomputed", "precomputed"]
for folder in folders:
    os.makedirs(os.path.join(destination_folder, folder), exist_ok=True)

In [25]:
# File suffixes
suffixes = [".obj", ".txt", "_volgeo.npy", "_surfgeo.npy"]

In [26]:
for folder, suffix in zip(folders, suffixes):
    for file_name in selected_files:
        source_path = os.path.join(source_folder, folder, file_name + suffix)
        if os.path.exists(source_path):  # Ensure file exists before moving
            destination_path = os.path.join(destination_folder, folder, file_name + suffix)
            shutil.move(source_path, destination_path)
            print(f"Moved: {source_path} → {destination_path}")

Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\13173.obj → data/model_data/ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset\obj_remesh\13173.obj
Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\9811.obj → data/model_data/ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset\obj_remesh\9811.obj
Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\1112.obj → data/model_data/ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset\obj_remesh\1112.obj
Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\3624.obj → data/model_data/ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset\obj_remesh\3624.obj
Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\708.obj → data/model_data/ModelResource_MeshAndBonesToRig_preproccessed_reduced_subset\obj_remesh\708.obj
Moved: data/model_data/ModelResource_RigNetv1_preproccessed\obj_remesh\9999.obj → data/model_data/ModelRe

In [30]:
txt_file_names = ["test_final.txt", "train_final.txt", "val_final.txt"]
selected_files_lists = [selected_test_names, selected_train_names, selected_val_names]

In [32]:
# Save the selected files to a text file
for txt_file_name, selected_files_list in zip(txt_file_names, selected_files_lists):
    with open(os.path.join(destination_folder, txt_file_name), 'w') as f:
        for file_name in selected_files_list:
            f.write(file_name + "\n")
    print(f"Selected files saved to {txt_file_name}")

Selected files saved to test_final.txt
Selected files saved to train_final.txt
Selected files saved to val_final.txt


In [43]:
# Collect the unique base names that have precomputed files
filenames_with_normals = {filename.split('_')[0] for filename in os.listdir(os.path.join(destination_folder, "precomputed"))}
len(filenames_with_normals)


117

We had rotten luck: out of the 118 chosen meshes, 1 is without its normals, and hence no surface geodesic or volumetric geodesic data was computed for it. We will have to skip it. I should have foreseen that the probability of drawing such a mesh was high, and should have added a check to sample only from meshes that have normals.
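
The missing check could look roughly like the sketch below: keep only the names whose two cached files exist, and sample from that list. It assumes the `_volgeo.npy` and `_surfgeo.npy` naming used by `process_mesh`; the helper `names_with_cache` and the file names in the temporary folder are hypothetical.

```python
import os
import tempfile


def names_with_cache(names, cache_dir, suffixes=("_volgeo.npy", "_surfgeo.npy")):
    """Return the names for which every expected cached file exists in cache_dir."""
    return [
        name for name in names
        if all(os.path.exists(os.path.join(cache_dir, name + suffix))
               for suffix in suffixes)
    ]


# Demonstrate on a temporary folder with made-up files:
# "a" has both files, "b" only one, "c" none.
with tempfile.TemporaryDirectory() as cache_dir:
    for file_name in ["a_volgeo.npy", "a_surfgeo.npy", "b_volgeo.npy"]:
        open(os.path.join(cache_dir, file_name), "w").close()
    eligible = names_with_cache(["a", "b", "c"], cache_dir)

print(eligible)  # ['a']
```

Calling something like `random.sample(names_with_cache(train_names, cache_dir), 100)` would then only ever pick meshes that have their precomputed data.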

In [44]:
set(selected_files) - filenames_with_normals


{'1112'}

In [45]:
selected_train_names.index("1112")

2

We will have to ignore that mesh when training...
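
One simple way to do that is to filter each split list against the set of names that have precomputed data before building the training set. This is a minimal sketch with illustrative values only; `drop_missing` is a hypothetical helper, and the short lists below stand in for the real `selected_train_names` and `filenames_with_normals`.

```python
def drop_missing(split_names, available):
    """Keep only the names that have precomputed data, preserving their order."""
    return [name for name in split_names if name in available]


# Illustrative stand-ins for the real variables in this notebook.
selected_train_names = ["13173", "9811", "1112", "3624"]
filenames_with_normals = {"13173", "9811", "3624"}

usable_train_names = drop_missing(selected_train_names, filenames_with_normals)
print(usable_train_names)  # ['13173', '9811', '3624']
```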
