Multi-GPU #2870
Merged
Commits on Aug 9, 2015

Commit d94ca3f
Commit 45d792e

Commit 73b3d13: Change the way threads are started and stopped
- Interrupt the thread before waiting on join
- Provide a method for looping threads to exit on demand
- CHECK that start and stop succeed instead of returning an error
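
The commit above describes the internal-thread lifecycle: signal the loop to exit, then wait on join, and CHECK the outcome of start/stop rather than returning an error. Below is a minimal sketch of that pattern using std::thread and an atomic flag; Caffe's InternalThread is built on boost::thread interruption, so the class and member names here (StoppableThread, must_stop) are illustrative, not the actual API.

```cpp
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>

class StoppableThread {
 public:
  void Start() {
    stop_requested_ = false;
    thread_ = std::thread([this] { Loop(); });
    // CHECK-style behavior: fail hard if the thread did not start.
    if (!thread_.joinable()) { std::abort(); }
  }

  void Stop() {
    stop_requested_ = true;                    // signal before waiting on join
    if (thread_.joinable()) { thread_.join(); }
  }

  // The looping work checks this to exit on demand.
  bool must_stop() const { return stop_requested_; }

 private:
  void Loop() {
    while (!must_stop()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));  // do work
    }
  }

  std::thread thread_;
  std::atomic<bool> stop_requested_{false};
};

int main() {
  StoppableThread t;
  t.Start();
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  t.Stop();
  std::cout << "stopped cleanly\n";
}
```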

Commit ddcdc9d

Commit bcc8f50: Add DataReader for parallel training with one DB session
- Make sure each solver accesses a different subset of the data
- Sequential reading of the DB for performance
- Prefetch a configurable amount of data to host memory
- Distribute data to solvers round-robin for determinism
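
A rough sketch of the round-robin idea in this commit: a single sequential pass over the DB fills one bounded queue per solver, so each solver reads a disjoint, deterministic subset (solver i gets records i, i+N, i+2N, ...). The container and record types below are placeholders, not Caffe's actual DataReader API; a real reader would block when a queue reaches its prefetch limit rather than skip records.

```cpp
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

int main() {
  const int num_solvers = 2;
  const std::size_t prefetch_limit = 4;  // configurable host-memory prefetch
  std::vector<std::deque<std::string>> queues(num_solvers);

  // Stand-in for sequential cursor reads from the DB.
  std::vector<std::string> db = {"rec0", "rec1", "rec2", "rec3",
                                 "rec4", "rec5", "rec6", "rec7"};

  int next = 0;
  for (const std::string& record : db) {
    if (queues[next].size() < prefetch_limit) {  // real code blocks when full
      queues[next].push_back(record);
    }
    next = (next + 1) % num_solvers;             // round-robin distribution
  }

  for (int i = 0; i < num_solvers; ++i) {
    std::printf("solver %d:", i);
    for (const std::string& r : queues[i]) std::printf(" %s", r.c_str());
    std::printf("\n");
  }
}
```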

Commit d2f0457: Allocate host memory through cudaMallocHost
Thanks to discussion with @thatguymike and @flx42.
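
The sketch below shows the general pattern this commit refers to: request page-locked (pinned) host memory with cudaMallocHost, which allows faster and asynchronous host/device transfers, and fall back to ordinary malloc when CUDA is unavailable. The helper names (HostAlloc, HostFree) are illustrative, not Caffe's own wrappers.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Try pinned allocation first; report through *pinned which path was taken.
void* HostAlloc(std::size_t size, bool* pinned) {
  void* ptr = nullptr;
  *pinned = (cudaMallocHost(&ptr, size) == cudaSuccess);
  if (!*pinned) { ptr = std::malloc(size); }
  return ptr;
}

void HostFree(void* ptr, bool pinned) {
  if (pinned) { cudaFreeHost(ptr); } else { std::free(ptr); }
}

int main() {
  bool pinned = false;
  void* buf = HostAlloc(1 << 20, &pinned);
  std::printf("allocated 1 MiB of %s host memory\n",
              pinned ? "pinned" : "pageable");
  HostFree(buf, pinned);
}
```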

Commit e5575cf
- Parallelize batches among GPUs and tree-reduce the gradients
- The effective batch size scales with the number of devices (batch size is multiplied by the device count)
- Detect machine topology (twin-GPU boards, P2P connectivity)
- Track device in syncedmem (thanks @thatguymike)
- Insert a callback in the solver for minimal code change
- Accept a list for the gpu flag of the caffe tool, e.g. '-gpu 0,1' or '-gpu all'; run on the default GPU if no ID is given
- Add multi-GPU solver test
- Deterministic architecture for reproducible runs
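
As an illustration of the '-gpu 0,1' / '-gpu all' flag mentioned above, here is a small sketch of how such a flag string could be expanded into a list of device IDs. ParseGpuFlag is a made-up helper for this sketch; the actual parsing lives in the caffe command-line tool.

```cpp
#include <cuda_runtime.h>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<int> ParseGpuFlag(const std::string& flag) {
  std::vector<int> gpus;
  if (flag == "all") {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;
    for (int i = 0; i < count; ++i) gpus.push_back(i);   // every visible GPU
  } else if (!flag.empty()) {
    std::stringstream ss(flag);
    std::string id;
    while (std::getline(ss, id, ',')) gpus.push_back(std::stoi(id));
  }
  // An empty result means: run on the default device.
  return gpus;
}

int main() {
  for (int id : ParseGpuFlag("0,1")) std::cout << "using GPU " << id << "\n";
}
```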

Commit 335bee7: Detect topology corner cases and improve broadcast order
- Start with distant nodes in the broadcast
- Fix the outer loop to run for the full tree depth
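
To illustrate the "full tree depth" fix, the sketch below prints a binomial-tree broadcast schedule over N devices: the outer loop must run for all ceil(log2(N)) rounds, otherwise the most distant leaves never receive the data. The distance-aware ordering (starting with distant nodes, twin-GPU boards, P2P links) described in the commit is not modeled here; this only shows the loop structure.

```cpp
#include <cstdio>

int main() {
  const int n = 6;  // e.g. six GPUs, with device 0 as the broadcast root
  // Each round doubles the set of devices that already hold the data,
  // so the loop runs ceil(log2(n)) times in total.
  for (int step = 1; step < n; step *= 2) {
    for (int src = 0; src < step && src + step < n; ++src) {
      std::printf("round(step=%d): copy %d -> %d\n", step, src, src + step);
    }
  }
}
```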

Commit 8771d0f