Issues about DataPreprocessor when running the code #22

Closed
Hao-tianWang opened this issue Mar 11, 2024 · 9 comments

Comments

@Hao-tianWang

When I run

python scripts/train_expressive.py --config=config/pose_diffusion_expressive.yml

An error occurred in DataPreprocessor:

class DataPreprocessor:
    def __init__(self, clip_lmdb_dir, out_lmdb_dir, n_poses, subdivision_stride,
                 pose_resampling_fps, mean_pose, mean_dir_vec, disable_filtering=False):
        self.n_poses = n_poses
        self.subdivision_stride = subdivision_stride
        self.skeleton_resampling_fps = pose_resampling_fps
        self.mean_pose = mean_pose
        self.mean_dir_vec = mean_dir_vec
        self.disable_filtering = disable_filtering

        self.src_lmdb_env = lmdb.open(clip_lmdb_dir, readonly=True, lock=False)
        with self.src_lmdb_env.begin() as txn:
            self.n_videos = txn.stat()['entries']
TypeError: Transaction.stat() missing 1 required positional argument: 'db'

Please help me solve this issue. Thanks a lot.

@Advocate99
Owner

Hi, maybe it is due to the version of the package.
Is your lmdb version 0.96?
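To answer the version question, something like the following can report the interpreter and package versions in the active environment. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it relies on `importlib.metadata`, which only exists on Python 3.8 and later, so on older interpreters it returns None.

```python
import sys


def installed_version(package_name):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        # importlib.metadata is part of the standard library from Python 3.8 on.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print("python", sys.version.split()[0])
print("lmdb", installed_version("lmdb"))
```

Running `pip show lmdb` in the same environment gives the same information from the command line.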

@Hao-tianWang
Author

I found that it may be due to the Python version. I downgraded Python to 3.7.12, which solved the problem, and the dataset now loads successfully. But another problem occurred when I create the dataloader:

train_dataset = SpeechMotionDataset(args.train_data_path[0],
                                    n_poses=args.n_poses,
                                    subdivision_stride=args.subdivision_stride,
                                    pose_resampling_fps=args.motion_resampling_framerate,
                                    mean_dir_vec=mean_dir_vec,
                                    mean_pose=args.mean_pose,
                                    remove_word_timing=(args.input_context == 'text')
                                    )
train_loader = DataLoader(dataset=train_dataset, batch_size=args.batch_size,
                          shuffle=True, drop_last=True, num_workers=args.loader_workers, pin_memory=True,
                          collate_fn=collate_fn
                          )
ValueError: num_samples should be a positive integer value, but got num_samples=0

This no longer looks like a version problem. Can you help me fix it? Thanks.

@Hao-tianWang
Author

I also found that the error goes away when I set the argument "shuffle" to False, but is it right to set "shuffle" to False?

@Advocate99
Owner

Hi, enabling or disabling shuffle does not change the number of samples.
I guess the data path is wrong. Can you check your path to the data? The data must be set up as described in the README, or you need to verify the layout yourself.
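One way to check the data path is to inspect the directories on disk. The sketch below assumes the dataset and its cache are LMDB environments in the default directory layout (a folder containing a `data.mdb` file), which matches the `lmdb.open(clip_lmdb_dir, ...)` call in the traceback above. `describe_lmdb_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def describe_lmdb_dir(path):
    """Report whether an LMDB directory exists and how large its data.mdb file is."""
    directory = Path(path)
    data_file = directory / "data.mdb"
    has_data = data_file.is_file()
    return {
        "exists": directory.is_dir(),
        "has_data_file": has_data,
        "data_bytes": data_file.stat().st_size if has_data else 0,
    }


# Paths used in this thread; adjust them to the local setup.
for d in ("data/ted_expressive_dataset/train",
          "data/ted_expressive_dataset/train_cache"):
    print(d, describe_lmdb_dir(d))
```

A directory that exists but has a missing or unexpectedly small `data.mdb` would point to an incomplete download rather than a wrong path.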

@Hao-tianWang
Author

Hi, I have checked the data path and it is correct, as shown in the following log:

2024-03-11 18:32:48,213: Reading data 'data/ted_expressive_dataset/train'...
2024-03-11 18:32:48,214: Found the cache data/ted_expressive_dataset/train_cache

But if I set shuffle=True, the problem still occurs, so I am a bit confused.

@Advocate99
Owner

Hi, if the structure inside that directory is correct, maybe the data file is damaged. Can you follow the README steps again?
Also, you should manually check the number of samples when you set shuffle=False, because shuffling does not change the number of samples. Otherwise, training will not run properly.
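The sample count can be checked before the `DataLoader` is built. With `shuffle=True`, PyTorch creates a `RandomSampler`, which rejects an empty dataset when it is constructed. With `shuffle=False` the loader is created but simply yields no batches, so the error disappearing does not mean any data was loaded. The sketch below uses a hypothetical helper, `require_non_empty`, that works with any object supporting `len()`.

```python
def require_non_empty(dataset, name="dataset"):
    """Return len(dataset), raising a descriptive error if it is empty."""
    n_samples = len(dataset)
    if n_samples == 0:
        raise ValueError(
            f"{name} contains 0 samples; check the data path and "
            "whether the data or its cache is damaged."
        )
    return n_samples


# Example with a plain list standing in for a dataset object.
print(require_non_empty([0, 1, 2], name="demo_dataset"))
```

In the training script this would be called as `require_non_empty(train_dataset, "train_dataset")` right after the dataset is constructed.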

@Hao-tianWang
Author

Thanks a lot! I will re-download the data file and try again.

@Hao-tianWang
Author

Hi, sorry to bother you again. It seems my original data file was corrupted. I re-downloaded the ted-expressive dataset, and I now get this warning when I start creating the training data cache:

RuntimeWarning: invalid value encountered in true_divide 
v1_u = v1 / np.linalg.norm(v1)

Is this normal, and will it harm the training process? Thank you.
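For context, NumPy emits this warning when a division produces NaN, for example when `v1` is a zero vector and the expression becomes 0 / 0, or when `v1` already contains NaN. The sketch below shows one common guard. `unit_vector` and the `eps` threshold are illustrative assumptions, not the repository's code, and whether zero-length vectors matter for training here depends on how the downstream code treats them.

```python
import numpy as np


def unit_vector(v, eps=1e-8):
    """Normalize v, returning a zero vector instead of NaN when the norm is near zero."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        # Avoid 0 / 0, which would yield NaN and trigger the RuntimeWarning.
        return np.zeros_like(v)
    return v / norm


print(unit_vector([3.0, 4.0]))
print(unit_vector([0.0, 0.0, 0.0]))
```

Counting how often the zero-norm branch is taken while building the cache would show whether the warning comes from a handful of degenerate frames or from a larger part of the data.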

@Advocate99
Owner

Hi, I am not sure about this warning, and I am not sure how the data got corrupted. Maybe finish the training and check the performance first?
