Custom data #1

Open
weihaosky opened this issue Nov 30, 2023 · 10 comments

Comments

@weihaosky

Congratulations on this excellent work!
I wonder how to run this method on my own data. For example, after capturing a monocular video, how do I run your method, and how should I process the data for training?
Many thanks!

@JiahuiLei
Owner

Thanks for your interest. Our code was developed following InstantAvatar https://github.com/tijiang13/InstantAvatar, and we already have a data loader for their preprocessed data format. Our data loading subroutine for UBC data uses the InstantAvatarWildDataset class, which you can modify slightly and reuse for data produced by the InstantAvatar preprocessing pipeline. (It may take some time to compile OpenPose etc. for their preprocessing.)
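
For reference, an InstantAvatar-style preprocessed sequence (as produced by the conversion script shared later in this thread) contains an images/ folder, a masks/ folder, a poses_optimized.npz, and a cameras.npz. Below is a minimal sketch for inspecting such a sequence; the path is a hypothetical placeholder and this is not the repository's actual loader.

# Minimal sketch: peek at an InstantAvatar-style preprocessed sequence folder.
# Layout assumed from later in this thread:
#   <seq>/images/*.png, <seq>/masks/*.png, <seq>/poses_optimized.npz, <seq>/cameras.npz
import numpy as np
import os.path as osp

seq_dir = "../data/insav_wild/some_sequence"  # hypothetical path

poses = dict(np.load(osp.join(seq_dir, "poses_optimized.npz")))
# expected keys: betas (10,), global_orient (N, 3), body_pose (N, 69), transl (N, 3)
for k, v in poses.items():
    print(k, v.shape, v.dtype)

cams = dict(np.load(osp.join(seq_dir, "cameras.npz"), allow_pickle=True))
# expected keys: intrinsic (3, 3), extrinsic (4, 4), height, width
for k, v in cams.items():
    print(k, getattr(v, "shape", v))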

@weihaosky
Author

weihaosky commented Dec 1, 2023

Thanks for your reply. In the paper you write that you use ReFit to obtain the human pose. May I ask how to process the results from ReFit for training GART?

@JiahuiLei
Owner

We tried both the InstantAvatar preprocessing to estimate video poses (with temporal optimization) and ReFit https://yufu-wang.github.io/refit_humans/ to estimate per-frame poses. It turns out that on the challenging UBC sequences, both ours and InstantAvatar work better with ReFit poses. So we first estimate per-frame poses with ReFit and then manually convert them into the same format as the InstantAvatar preprocessing; the data loader therefore still loads the InstantAvatar preprocessing format.
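
The full conversion script is shared later in this thread; the core step is turning ReFit's per-frame rotation matrices into the axis-angle global_orient/body_pose that the InstantAvatar format stores. A minimal sketch of that step only, assuming a hypothetical ReFit output file with a pred_rotmat array of shape (N, 24, 3, 3):

# Minimal sketch (not the full conversion): SMPL rotation matrices -> axis-angle.
import numpy as np
import torch
from pytorch3d.transforms import matrix_to_axis_angle

pred_rotmat = np.load("refit_output.npz")["pred_rotmat"]          # hypothetical file, (N, 24, 3, 3)
aa = matrix_to_axis_angle(torch.from_numpy(pred_rotmat)).numpy()  # (N, 24, 3)
global_orient = aa[:, 0]                                          # (N, 3)
body_pose = aa[:, 1:].reshape(-1, 69)                             # (N, 69), InstantAvatar-style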

@weihaosky
Author

> We tried both the InstantAvatar preprocessing to estimate video poses (with temporal optimization) and ReFit https://yufu-wang.github.io/refit_humans/ to estimate per-frame poses. It turns out that on the challenging UBC sequences, both ours and InstantAvatar work better with ReFit poses. So we first estimate per-frame poses with ReFit and then manually convert them into the same format as the InstantAvatar preprocessing; the data loader therefore still loads the InstantAvatar preprocessing format.

Thanks. Could you share the script for converting the ReFit poses into the InstantAvatar format?

@weihaosky
Author

weihaosky commented Dec 4, 2023

I have tried to convert the ReFit results to the InstantAvatar format, but the converted data does not work. The conversion code is as follows:

# (runs inside the dataset's __getitem__; self.smpl_poses holds the ReFit results)
# self.K is built with focal = np.sqrt(height**2 + width**2) and the principal point at the image center
from transforms3d.axangles import mat2axangle  # assumed source of mat2axangle (rotation matrix -> axis, angle)

pose = []
for x in self.smpl_poses['pred_rotmat'][idx]:  # (24, 3, 3) per-joint rotation matrices
    d, angle = mat2axangle(x)
    pose.append(d * angle)
pose = np.stack(pose).astype(np.float32)  # (24, 3) axis-angle
# pose[0] = -pose[0]

ret = {
    "rgb": img.astype(np.float32),
    "mask": msk,
    "K": self.K.copy(),
    "smpl_beta": self.smpl_poses['pred_shape'][idx],  # ! use the first beta ???
    "smpl_pose": pose,
    "smpl_trans": self.smpl_poses["trans_full"][idx, 0],
    "idx": idx,
}

May I ask where the problem is? Many thanks!

@JiahuiLei
Owner

# convert our mono-pose estimation to ubc fashion dataset
import numpy as np
import os, os.path as osp
import imageio
from pytorch3d.transforms import matrix_to_axis_angle
import torch
from tqdm import tqdm
from pycocotools import mask as masktool
import cv2


def process(seq):
    img_src = f"../data/ubcfashion/train_frames/{seq}/"
    msk_fn = f"../data/ubcfashion/train_mask/{seq}.npy"
    pose_fn = f"../data/ubcfashion/train_smpl/{seq}.npz"
    dst = f"../data/insav_wild/ourpose_ubc_{seq}/"
    os.makedirs(dst, exist_ok=True)

    pose_data = np.load(pose_fn)
    smpl_shape = pose_data["pred_shape"].mean(0)  # Use the average shape
    smpl_pose_list, smpl_global_trans = (
        pose_data["pred_rotmat"],
        pose_data["pred_trans"],
    )
    smpl_pose_list = matrix_to_axis_angle(torch.from_numpy(smpl_pose_list))
    smpl_pose_list = smpl_pose_list.numpy()

    focal, center = pose_data["img_focal"], pose_data["img_center"]
    K = np.eye(3)
    K[0, 0], K[1, 1] = focal, focal
    K[0, 2], K[1, 2] = center[0], center[1]

    pose_save_dict = {
        "betas": smpl_shape,
        "global_orient": smpl_pose_list[:, 0],
        "body_pose": smpl_pose_list[:, 1:].reshape(-1, 69),
        "transl": smpl_global_trans.squeeze(1),
    }
    np.savez_compressed(osp.join(dst, "poses_optimized.npz"), **pose_save_dict)

    image_save_dst = osp.join(dst, "images")
    mask_save_dst = osp.join(dst, "masks")
    os.makedirs(image_save_dst, exist_ok=True)
    os.makedirs(mask_save_dst, exist_ok=True)

    mask_data = np.load(msk_fn, allow_pickle=True)
    masks = [masktool.decode(m).astype(bool).astype(np.float32) for m in mask_data]  # np.bool is removed in newer NumPy

    for img_fn in tqdm(sorted(os.listdir(img_src))):
        image_id = int(img_fn.split(".")[0])
        mask = masks[image_id]
        img = cv2.imread(osp.join(img_src, img_fn))
        cv2.imwrite(osp.join(image_save_dst, img_fn), img)
        cv2.imwrite(osp.join(mask_save_dst, img_fn), mask * 255)

    cam_save_dict = {
        "intrinsic": K,
        "extrinsic": np.eye(4),
        "height": img.shape[0],
        "width": img.shape[1],
    }
    np.savez_compressed(osp.join(dst, "cameras.npz"), **cam_save_dict)


if __name__ == "__main__":
    # seqs = sorted(os.listdir("../data/ubcfashion/train_frames"))
    seqs = ["91+bCFG1jOS"]

    for seq in seqs:
        process(seq)

# def load_insav_smpl_param(path):
#     smpl_params = dict(np.load(str(path)))
#     if "thetas" in smpl_params:
#         smpl_params["body_pose"] = smpl_params["thetas"][..., 3:]
#         smpl_params["global_orient"] = smpl_params["thetas"][..., :3]
#     return {
#         "betas": smpl_params["betas"].astype(np.float32).reshape(1, 10),
#         "body_pose": smpl_params["body_pose"].astype(np.float32),
#         "global_orient": smpl_params["global_orient"].astype(np.float32),
#         "transl": smpl_params["transl"].astype(np.float32),
#     }


# insav_pose_fn = "../data/insav_wild/91+20mY7UJS/poses_optimized.npz"
# load_insav_smpl_param(insav_pose_fn)
# insav_cam_fn = "../data/insav_wild/91+20mY7UJS/cameras.npz"
# insav_cam = dict(np.load(insav_cam_fn, allow_pickle=True))
# for k, v in insav_cam.items():
#     print(k, v.shape, v.dtype)

# print()

This is the script I used to convert the poses into the InstantAvatar format; I hope it helps.

@uniBruce

Hi, I am trying to use ReFit to get SMPL and camera parameters from monocular videos, and it seems that the focal length estimation relies heavily on some assumptions and cannot be used directly. The result is shown below. Could you please provide some advice on this case, or explain how you estimated the focal length for the UBC dataset?
[image: example]

@yufu-wang

yufu-wang commented Dec 21, 2023

@uniBruce Hi. When the ground-truth focal length is unavailable, we estimate it from the dimensions of the image as $\sqrt{h^2+w^2}$, as done here. Your image seems to be cropped from a larger image, so the focal estimate won't be accurate (this may not affect GART much, though). However, your ReFit result looks as expected in my experience, because ReFit is not trained with crop augmentation, so accuracy drops a bit when the human is cropped like this.
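
A minimal sketch of that fallback intrinsic matrix, with placeholder image dimensions:

# Fallback intrinsics when the true focal is unknown:
# focal = sqrt(h^2 + w^2), principal point at the image center.
import numpy as np

h, w = 1024, 768  # placeholder image size
focal = np.sqrt(h ** 2 + w ** 2)
K = np.array([
    [focal, 0.0, w / 2.0],
    [0.0, focal, h / 2.0],
    [0.0, 0.0, 1.0],
])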

I also added a script to the ReFit repo (here) that runs on a folder of images and saves pose results compatible with GART. Please give it a try.

@muximuxi

muximuxi commented Jan 30, 2024

> We tried both the InstantAvatar preprocessing to estimate video poses (with temporal optimization) and ReFit https://yufu-wang.github.io/refit_humans/ to estimate per-frame poses. It turns out that on the challenging UBC sequences, both ours and InstantAvatar work better with ReFit poses. So we first estimate per-frame poses with ReFit and then manually convert them into the same format as the InstantAvatar preprocessing; the data loader therefore still loads the InstantAvatar preprocessing format.

Hi! When I use data preprocessed by InstantAvatar and run ./scripts/fit.sh, the result is bad, like this. Can you give me some suggestions?
[image: test]

@anhnb206110

anhnb206110 commented Sep 4, 2024

Hi, when I use a real-person video with the da_pose canonical pose, preprocess the data like InstantAvatar, and train GART for more than 25,000 steps, the result is quite good. But if you zoom in on the image below, you can see RGB noise in the pants and shoes (the pants should be solid black, and the sleeves and shirt should be smooth). How should I edit the config or the loss function to make the image sharper, or does the problem lie in my ground-truth images? There is also a problem in the animation stage, shown in the second image.

This is my config in ubc_mlp.yaml

TOTAL_steps: 50000 #15000 #30000
SEED: 12345

VIZ_INTERVAL: 500

CANO_POSE_TYPE: da_pose #da_pose #t_pose #da_pose
VOXEL_DEFORMER_RES: 64 #128 #64 #64 #128 #64 #128 #64

W_CORRECTION_FLAG: True
W_REST_DIM: 32 #0 #16
W_REST_MODE: pose-mlp #delta-list #pose-mlp
W_MEMORY_TYPE: voxel #voxel #point

F_LOCALCODE_DIM: 0

MAX_SCALE: 1.0
MIN_SCALE: 0.0 #0.0003 #0.003 #3
MAX_SPH_ORDER: 4
INCREASE_SPH_STEP: [3000, 5000, 6000, 7000] #[3000, 5000, 6000, 7000] #[1000, 2000, 3000]

INIT_MODE: on_mesh #near_mesh #near_mesh
OPACITY_INIT_VALUE: 0.99

ONMESH_INIT_SUBDIVIDE_NUM: 1
ONMESH_INIT_SCALE_FACTOR: 1.0
ONMESH_INIT_THICKNESS_FACTOR: 0.5

NEARMESH_INIT_NUM: 10000
NEARMESH_INIT_STD: 0.1
SCALE_INIT_VALUE: 0.01 # only used for random init

###########################

LR_P: 0.00016
LR_P_FINAL: 0.0000016
LR_Q: 0.001
LR_S: 0.005
LR_O: 0.05

LR_SPH: 0.0025
LR_SPH_REST: 0.0005

W_START_STEP: 500 #1000 #500 #2000 #300 #2000
LR_W: 0.0002 # 1 # 0.00001
LR_W_FINAL: 0.00002

LR_W_REST: 0.0002
LR_W_REST_FINAL: 0.00002
LR_W_REST_BONES: 0.0003 # for mlp

LR_F_LOCAL: 0.0

# Pose Optimize
POSE_R_BASE_LR: 0.0001
POSE_R_BASE_LR_FINAL: 0.00001
POSE_R_REST_LR: 0.0003
POSE_R_REST_LR_FINAL: 0.00001
POSE_T_LR: 0.0001
POSE_T_LR_FINAL: 0.00001
POSE_OPTIMIZE_START_STEP: 500 #1000

# Reg Terms
LAMBDA_MASK: 0.0 #0.01
MASK_LOSS_PAUSE_AFTER_RESET: 100

# other optim
N_POSES_PER_STEP: 1 #50 #1 #3 # increasing this does not help
RAND_BG_FLAG: True #True #True #True
# DEFAULT_BG: [0.0, 0.0, 0.0]
NOVEL_VIEW_PITCH: 0.0
IMAGE_ZOOM_RATIO: 1.0
VIEW_BALANCE_FLAG: True #True # True #True #False
BOX_CROP_PAD: 50

# GS Control
# densify
MAX_GRAD: 0.0002 #0.0003 #0.0005 #0.0006 # 0.0002
PERCENT_DENSE: 0.005 #0.01
DENSIFY_START: 500
DENSIFY_INTERVAL: 100 #300 #500 #1000 #300
DENSIFY_END: 9000 #10000 #15000
# prune
PRUNE_START: 500
PRUNE_INTERVAL: 300
OPACIT_PRUNE_TH: 0.01
RESET_OPACITY_STEPS: [3000, 5000] #[3000, 5000] #5000 #3000
OPACIT_RESET_VALUE: 0.01
# regaussian
REGAUSSIAN_STD: 0.015 #0.02 #0.02 #0.01
REGAUSSIAN_STEPS: [7000, 14000]

CANONICAL_SPACE_REG_K: 6
LAMBDA_STD_Q: 0.01
LAMBDA_STD_S: 0.01

LAMBDA_STD_O: 0.01
LAMBDA_STD_CD: 0.03
LAMBDA_STD_CH: 0.03
# LAMBDA_STD_W: 0.3
# LAMBDA_STD_W_REST: 0.3
LAMBDA_STD_W: 0.3
LAMBDA_STD_W_REST: 0.1
LAMBDA_KNN_DIST: 0.00

LAMBDA_W_NORM: 0.01
LAMBDA_W_REST_NORM: 0.1

START_END_SKIP: [0, 400, 1]

[image: gart]

The walking animation is shown below. How can I fix the error when animating?

[image: novel_pose]
