Training and Resuming
Pages 35
- Home
- AWS EC2 GPU enabled Caffe AMI
- Borrowing Weights from a Pretrained Network
- Caffe installing script for ubuntu 16.04 support Cuda 8
- Caffe on EC2 Ubuntu 14.04 Cuda 7
- Caffe Output: .caffemodel .solverstate
- Contributing
- Development
- Excluding Layers: Train and Test Phase
- Faster Caffe Training
- Fine Tuning or Training Certain Layers Exclusively
- GeForce GTX 1080, CUDA 8.0, Ubuntu 16.04, Caffe
- IDE Nvidia’s Eclipse Nsight
- Image Format: BGR not RGB
- Install Caffe on EC2 from scratch (Ubuntu, CUDA 7, cuDNN 3)
- Installation
- Installation (OSX)
- Making Prototxt Nets with Python
- Model Zo
- Model Zoo
- Models accuracy on ImageNet 2012 val
- OpenCV 3.2 Installation Guide on Ubuntu 16.04
- Python Layer Unit Tests
- Related Projects
- Reporting Bugs and Other Issues
- Simple Example: Sin Layer
- Solver Prototxt
- The Data Layer
- The Datum Object
- Training and Resuming
- Ubuntu 14.04 ec2 instance
- Ubuntu 14.04 VirtualBox VM
- Ubuntu 16.04 or 15.10 Installation Guide
- Using a Trained Network: Deploy
- Working with Blobs
- Show 20 more pages…
Clone this wiki locally
Training and Resuming
In caffe, during training files defining the state of the network will be output: .caffemodel and .solverstate. These two files define the current state of the network at a given iteration, and with this information we are able to continue training our network in the case of a hiccup, pause for diagnosis, or a system crash.
Training
To begin training, we simply need to call the caffe binary and supply a solver:
caffe train -solver solver.prototxt
Stopping
Number of Iterations Limit
We can have our network stop after a specified number of iterations with a parameter in the solver.prototxt named max_iter.
For example, we can specify that we would like our network to stop after 60,000 iteration, thus we set the parameter accordingly: max_iter: 600000.
Manually Stopping
It is possible to manually stop a network from training by pressing the Ctrl+C key combination. When the stop signal is sent, the network will halt the forward and backwards pass, and output the current state of the network in a .caffemodel and .solverstate titled with the current iteration number.
Resuming
When a network as stopped training, either due to manual halting or by reaching the maximum iterations, we may continue training our network by telling caffe to train from where we left off. This is as simple as supplying the snapshot flag with the current .solverstate file. For example:
caffe train -solver solver.prototxt -snapshot train_190000.solverstate
In this case we will continue training from iteration 190000.