Hi all, I have a question about the deterministic batch input of image_data_layer when doing parallel training. Suppose we have a dataset that contains only four batches, named A, B, C, and D, and we train with 4 solvers (S1, S2, S3, S4) on 4 GPUs. Also suppose the dataset is not randomly shuffled during training. I have checked the implementation of BasePrefetchingDataLayer and found that it only guarantees that the solvers get their input batches sequentially, not in a fixed order. So I wonder whether we may encounter the following problem: at the T-th iteration the input batches for S1–S4 may be A, B, C, D, respectively, but at the next iteration the input batches for S1–S4 might well become B, C, A, D or something else. Such non-deterministic behavior may be dangerous in some cases. Could anyone kindly tell me whether this doubt is correct?
Besides, could anyone please explain why the declarations "using Params<Dtype>::size_; using Params<Dtype>::data_; using Params<Dtype>::diff_;" appear in the definitions of the classes GPUParams and P2PSync (defined in parallel.hpp)? In my understanding, using-declarations are generally used when base-class members are hidden by names in the derived class, which does not seem to be the case for GPUParams and P2PSync. So I wonder whether these declarations are necessary.
Thanks in advance!
Currently only the LMDB and LevelDB layers are deterministic, because they go through data_reader. We are trying to fix the other layers by switching to a skip approach like in #4563. The using statements are only there to avoid having to type this-> every time: C++ normally allows direct access to protected base-class fields, but for class templates it does not, because unqualified name lookup does not search a base class that depends on a template parameter.
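For illustration, here is a minimal sketch of that lookup rule (illustrative only; GPUParamsSketch is a made-up name, the real classes live in parallel.hpp): in a class template whose base depends on a template parameter, unqualified names are not looked up in that dependent base, so an inherited protected member has to be reached via this->, an explicit Params<Dtype>:: qualifier, or a using-declaration like the ones in question.

```cpp
#include <cstddef>

// Stand-in for the real Params<Dtype> in parallel.hpp.
template <typename Dtype>
class Params {
 protected:
  size_t size_;
  Dtype* data_;
  Dtype* diff_;
};

template <typename Dtype>
class GPUParamsSketch : public Params<Dtype> {  // hypothetical name, mirrors GPUParams
 protected:
  // Without these, an unqualified `size_` or `data_` below would not compile,
  // because the dependent base Params<Dtype> is not searched by unqualified lookup.
  using Params<Dtype>::size_;
  using Params<Dtype>::data_;
  using Params<Dtype>::diff_;

 public:
  // Both forms work; the using-declarations just avoid writing `this->`
  // at every use site.
  size_t size() const { return size_; }
  Dtype* data_via_this() const { return this->data_; }
};
```

So the declarations are not about shadowed members at all; they only re-expose the inherited names to unqualified lookup inside the derived template.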