Add text classification example #684

Merged (1 commit) on May 11, 2016

Conversation

gheinrich
Contributor

No description provided.

@TimZaman
Contributor

TimZaman commented Apr 21, 2016

Hi Greg. This is awesome. I read the paper earlier and thought it was great too. Thanks man, this is very cool!
I tried the regular approach (and am now working on the alternative one).
I double-checked that I followed the instructions exactly, but I still get an error.
The error is:

/usr/share/lua/5.1/cudnn/TemporalConvolution.lua:92: bad argument #1 to 'size' (out of range)

And the verbose output from Torch is:

Last output:
Running initial validation before first train epoch..
Validation (epoch 0): loss = 3.2712374114555, accuracy = 0.071214285714286
Training (epoch 0.0005): loss = 4.3866891860962, lr = 0.01
/usr/share/lua/5.1/cudnn/TemporalConvolution.lua:92: bad argument #1 to 'size' (out of range)

Weird. I might try recreating the dataset to make absolutely sure I built it correctly. However, it did pass the initial validation (7.12% accuracy at epoch 0), which tells me that the model and the data are probably fine.

update
Tried the LMDB approach. Same thing:

ERROR: /usr/share/lua/5.1/cudnn/TemporalConvolution.lua:92: bad argument #1 to 'size' (out of range)
Last output:
Model weights will be saved as snapshot_<EPOCH>_Weights.t7
started training the model
Running initial validation before first train epoch..
Validation (epoch 0): loss = 2.9317825446471
Training (epoch 0.0002): loss = 4.5439367294312, lr = 0.01
/usr/share/lua/5.1/cudnn/TemporalConvolution.lua:92: bad argument #1 to 'size' (out of range)

update
It has nothing to do with the dataset, because I can run a normal 32x32 grayscale model just fine.
Reinstalling torch-nv and luarocks didn't help.

update 2
Found it. If I disable cudnn in your model and use nn as the backend, it works.

@gheinrich
Contributor Author

Hi @TimZaman, thanks for the feedback! I used cudnn.TemporalConvolution to train the network (I found it to be 17 times faster than the regular nn.TemporalConvolution!). Just to check: did you remember to set mean subtraction to none when you created the classification model?

@gheinrich
Contributor Author

Hi @TimZaman, are you still having an issue with this example?

@TimZaman
Contributor

TimZaman commented May 2, 2016

Hi @gheinrich, I did not use mean subtraction, as instructed.
Last week I swapped the (slightly) outdated torch-nv for the regular Torch installation, and that did the trick. Other than the easy install, are there any advantages to using torch-nv over building from source?
(So I guess this can be merged, nice one.)

@lukeyeager
Member

I tested the dataset creation and import. When I tried to test the model, I got this error:

[FAIL] ...5-02/install/share/lua/5.1/cudnn/TemporalConvolution.lua:92: bad argument #1 to 'size' (out of range)

Maybe it's because I don't have this commit: soumith/cudnn.torch#152? I had to start with a non-standard fork (because of 16.04 issues), so I'm having trouble updating just the cudnn module. Sorry for my noobery! I'll find some time to figure out the Torch build system and keep testing this, but it looks good so far.

@gheinrich
Contributor Author

Oh, so it's the same error @TimZaman got. You could update just cudnn by doing something like:

$ cd .../torch/extra/cudnn
$ git fetch origin && git checkout origin/master
$ luarocks make cudnn-scm-1.rockspec 

@gheinrich
Contributor Author

> Other than the easy install of torch-nv, are there any advantages in using torch-nv wrt the build from source?

I don't know of other advantages...

@TimZaman
Contributor

TimZaman commented May 2, 2016

Yeah, so the cudnn problem is not DIGITS-related. @gheinrich, I would recommend dropping cudnn and switching to nn, even though it is 17x slower (it worked fine for me), adding a note that cudnn is faster but requires an up-to-date cudnn.torch, and then just merging.
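
For readers following along, the backend switch discussed here could be sketched roughly as below. This is only an illustration, not the code from this pull request: `backend_name` and `temporal_conv` are hypothetical helper names, and the layer sizes in the usage comment are placeholders. It relies on `nn.TemporalConvolution` and `cudnn.TemporalConvolution` both taking the input frame size, output frame size, kernel width and optional stride as their first arguments.

```lua
-- Hypothetical helper: name of the package that should provide
-- TemporalConvolution. 'nn' is the slower but more forgiving choice;
-- 'cudnn' is assumed to need an up-to-date cudnn.torch.
local function backend_name(use_cudnn)
  return use_cudnn and 'cudnn' or 'nn'
end

-- Hypothetical helper: load the chosen package and build one temporal
-- convolution layer. All sizes are supplied by the caller.
local function temporal_conv(use_cudnn, inputFrameSize, outputFrameSize, kW, dW)
  local backend = require(backend_name(use_cudnn))
  return backend.TemporalConvolution(inputFrameSize, outputFrameSize, kW, dW)
end

-- Usage (placeholder sizes, requires a Torch install):
-- local conv = temporal_conv(false, 64, 128, 7)
```

Keeping the choice behind a single flag would let the example default to nn while still allowing the faster cudnn path once the cudnn bindings have been updated.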

@lukeyeager
Member

Got it working. Cool tutorial!

[Image: text-classification]

@lukeyeager lukeyeager merged commit 18c6873 into NVIDIA:master May 11, 2016
@gheinrich gheinrich deleted the dev/text-classification branch November 30, 2016 16:51
SlipknotTN pushed a commit to cynnyx/DIGITS that referenced this pull request Mar 30, 2017