Concurrency Issues on Data Downloading [BUG] #102

@SamuelGong

Describe the bug
Datasets that are not shipped with torch and therefore have to be downloaded manually (e.g., cinic10, multimodal_base, pascal_voc, and tiny_imagenet) are currently downloaded in full (i.e., both the training and testing sets) in the constructors of their respective DataSource instances.

While this design may work well in the testing environment, where the server and all the clients are colocated on one machine, it may run into severe concurrency issues in other situations, such as the one described in Deploying a Plato Federated Learning Server in a Production Environment, which Plato also aims to support.

To see that, consider the two cases separately:

  • In the former case, the server always calls its configure() method first, and only when that call returns does the server spawn clients on the same machine. As a result, when the clients call their configure() independently, none of them needs to download the dataset again, because it was already fully prepared during the initialization of the server.
  • In the latter case, however, the server may not be colocated with the clients. If a remote machine (where there is no server) hosts multiple clients and these clients are initialized concurrently, then under the current design it is possible that all of these clients (1) conclude that the desired data is not yet available in local storage, and thus (2) download and preprocess (at least "unzip") the data concurrently. If this happens,
    1. network bandwidth, CPU cycles, and memory will be wasted on redundant work,
    2. program runtime will be prolonged for the same reason, and more importantly,
    3. unexpected stalls or faults may be caused by concurrent creation of the dataset in the file system.
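
To make the second case concrete, here is a minimal sketch of the check-then-download race. It is not Plato's code: `simulate_clients`, `client_init`, and the temporary `dataset` directory are hypothetical names, threads stand in for the client processes, and a barrier is used only to force the worst-case interleaving in which every client checks for the data before any client has written it.

```python
import os
import tempfile
import threading


def simulate_clients(num_clients: int = 4) -> int:
    """Return how many simulated clients ended up 'downloading' the dataset."""
    # Hypothetical dataset location; it does not exist when the clients start.
    data_dir = os.path.join(tempfile.mkdtemp(), "dataset")
    all_checked = threading.Barrier(num_clients)
    downloads = []  # list.append is thread-safe in CPython

    def client_init(client_id: int) -> None:
        # Time of check: is the data already in local storage?
        data_ready = os.path.exists(data_dir)
        # Force the worst case: every client checks before any client writes.
        all_checked.wait()
        # Time of use: each client acts on its own, now stale, observation.
        if not data_ready:
            downloads.append(client_id)  # stands in for download + unzip
            os.makedirs(data_dir, exist_ok=True)

    threads = [
        threading.Thread(target=client_init, args=(i,))
        for i in range(num_clients)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(downloads)


if __name__ == "__main__":
    print(simulate_clients(4))  # prints 4: every client repeats the download
```

In this sketch the redundant work is harmless because `os.makedirs(..., exist_ok=True)` tolerates concurrent creation. A real download followed by an unzip into the same directory has no such guarantee, which is where the stalls or faults in item 3 could come from.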

To Reproduce
This bug should conceptually make sense. We may provide the steps for reproducing it later, if necessary.

Additional context
We spotted this bug while developing a new feature, FEMNIST. Since the solution looks like a non-trivial design problem, we would prefer to seek the authors' help before working out an immature solution.
