
Make unnecessary computations optional #368

Merged — 1 commit merged on Dec 8, 2015

Conversation

gheinrich
Contributor

Original (20 epochs LeNet on MNIST): 187s
Now: 142s

Helps with bug #339

@lukeyeager
Member

Dang your machine is slow. My results:

Original (20 epochs LeNet on MNIST): 30s
Now: 30s

And now I'm seeing extra output in the log. I thought this PR turned OFF confusion matrices?

2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: ConfusionMatrix:
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [[     242       0       0       0       2       0       1       0       0       0]   98.776%   [class: 0]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       0     280       0       0       1       0       0       1       2       0]   98.592%    [class: 1]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       1       0     254       0       1       0       0       2       0       0]   98.450%    [class: 2]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       0       1       2     243       0       3       0       2       0       1]   96.429%    [class: 3]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       0       1       0       0     236       0       1       1       0       6]   96.327%    [class: 4]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       1       2       0       2       0     213       4       1       0       0]   95.516%    [class: 5]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       1       1       0       0       0       0     237       0       0       0]   99.163%    [class: 6]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       0       1       4       1       2       0       0     247       0       2]   96.109%    [class: 7]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       4       0       1       1       1       4       1       2     226       3]   93.004%    [class: 8]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: [       0       1       0       1       4       2       1       1       0     242]]  96.032%    [class: 9]
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: + average row correct: 96.839545369148%
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: + average rowUcol correct (VOC measure): 93.902345299721%
2015-10-15 11:17:51 [20151015-111721-72f4] [WARNING] Train Torch Model unrecognized output: + global correct: 96.877502001601%

@gheinrich
Contributor Author

Perhaps our datasets aren't the same size? My MNIST dataset has 45k training samples and 15k validation samples. So you're saying the patch doesn't provide any speedup? I'll double-check on my end.
Before the patch, accuracy and the confusion matrix were computed during both training and validation. This patch makes both optional in the Lua wrapper, and the caller in torch_train.py enables them for validation only (so we can draw the validation accuracy curve, and we get the validation confusion matrix for free).
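The mechanism described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual DIGITS/Torch code: the function name `run_epoch` and the `compute_stats` flag are hypothetical stand-ins for the optional-computation flag added to the Lua wrapper.

```python
# Hypothetical sketch: accuracy and confusion-matrix bookkeeping are
# gated behind a flag that the caller enables only for validation
# passes, so training epochs skip the extra work entirely.
from collections import defaultdict

def run_epoch(batches, compute_stats=False):
    """Process (predictions, labels) batches; gather stats only when asked."""
    confusion = defaultdict(int)  # (true_label, predicted_label) -> count
    correct = total = 0
    for preds, labels in batches:
        if not compute_stats:
            continue  # training pass: no accuracy/confusion bookkeeping
        for p, y in zip(preds, labels):
            confusion[(y, p)] += 1
            correct += int(p == y)
            total += 1
    accuracy = correct / total if total else None
    return accuracy, dict(confusion)

batches = [([0, 1, 1], [0, 1, 0])]
train_acc, _ = run_epoch(batches, compute_stats=False)  # stats skipped
val_acc, conf = run_epoch(batches, compute_stats=True)  # stats gathered
```

The caller (here, the role torch_train.py plays) decides per-pass whether the flag is on, which is why validation still produces the accuracy curve and confusion matrix while training avoids the cost.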

@lukeyeager
Member

Oh, that was careless of me. I misnamed my dataset and didn't notice. Whoops!

         | 20 epochs | 5 epochs
---------|-----------|---------
Original | 202s      | 56s
Now      | 204s      | 52s

I'm definitely picking up the new code because I'm seeing the confusion matrix in my log.

@gheinrich
Contributor Author

It looks like I need to review my patch!

@gheinrich
Contributor Author

Closing, as the patch needs to be revisited. May re-open later.

@gheinrich gheinrich closed this Oct 16, 2015
Training accuracy is not displayed in DIGITS (not for Caffe
either) so it is not necessary to compute training accuracy
and confusion matrix.

Disabling those computations speeds training up:

LeNet (MNIST, 30 epochs): 1m54s -> 1m40s
Alexnet (CIFAR10, 2 epochs): 5m14s -> 4m38s
GoogLeNet (reduced CIFAR10, 1 epoch): 2m4s -> 2m2s
@gheinrich gheinrich reopened this Dec 4, 2015
@gheinrich
Contributor Author

Re-opening with a new patch. These are the numbers I get:

         | LeNet (MNIST, 30 epochs) | Alexnet (CIFAR10, 2 epochs) | GoogLeNet (reduced CIFAR10, 1 epoch)
---------|--------------------------|-----------------------------|-------------------------------------
original | 1m54s                    | 5m14s                       | 2m4s
now      | 1m40s                    | 4m38s                       | 2m2s

@lukeyeager
Member

I've verified similar results on my machine. LGTM!

lukeyeager added a commit that referenced this pull request Dec 8, 2015
@lukeyeager lukeyeager merged commit 240eb85 into NVIDIA:master Dec 8, 2015
@gheinrich gheinrich deleted the dev/torch-optional-confusion branch April 14, 2016 13:24