Support Word2Vec tasks #1361

pl8787 · 2014-10-26T02:44:22Z

Does Caffe support Word2Vec tasks, or other tasks on text?

zyfnhct · 2014-11-12T02:00:53Z

I'm trying to use Caffe to model sentences. Do you have any suggestions?

pl8787 · 2014-11-12T02:19:43Z

@zyfnhct That's great. I'm doing the same thing.
I think the most important part is the data input layer.

zyfnhct · 2014-11-14T07:20:16Z

@pl8787 I'm going to implement the model introduced in "A Convolutional Neural Network for Modelling Sentences". I want to combine the model from the paper with the skip-gram model. I don't know of any other open-source project that makes this easy, so I want to try Caffe. If you know of a better project, please tell me. Thanks.

pl8787 · 2014-11-20T09:52:52Z

@zyfnhct I don't know of one either. But I think implementing some layers in Caffe is the easiest way to do it.

cNikolaou · 2014-11-27T13:05:47Z

I think that it is possible to use Caffe for tasks other than vision (as stated in the Caffe paper). You just need to find a way to represent your data as a blob.

I have not worked on such tasks, but there might be others who have done something like that, so have a look at the Caffe users mailing list. Also, the Caffe Tutorial might be useful if you try to do that by yourself.

If you manage to define a network for word tasks, you might want to update the tutorial (or contribute an example) showing how to use Caffe for this kind of task.

shelhamer · 2015-01-16T06:37:53Z

> I think that it is possible to use Caffe for tasks other than vision (as stated in the Caffe paper). You just need to find a way to represent your data as a blob.

That's the key. There are speech and haptics projects that encode their data as blobs and run Caffe networks.
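
To make "represent your data as a blob" concrete for text, here is a minimal sketch in plain Python. It packs tokenized sentences into a nested list shaped like a classic 4-D Caffe blob (N x C x H x W). The toy embedding table, the helper names `sentences_to_blob` and `shape`, and the choice of one channel with height = sentence length and width = vector size are all assumptions made for illustration, not something prescribed in this thread.

```python
# Toy embedding table; the words and 3-dimensional vectors are invented.
EMBEDDINGS = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.7, 0.8, 0.9],
}
DIM = 3


def sentences_to_blob(sentences, max_len):
    """Pack tokenized sentences into a nested list shaped (N, 1, max_len, DIM).

    Short sentences are zero-padded, long ones are truncated, and
    unknown words map to a zero vector.
    """
    zero = [0.0] * DIM
    blob = []
    for tokens in sentences:
        rows = [list(EMBEDDINGS.get(tok, zero)) for tok in tokens[:max_len]]
        rows += [list(zero) for _ in range(max_len - len(rows))]
        blob.append([rows])  # wrap in a single "channel"
    return blob


def shape(nested):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


batch = sentences_to_blob([["the", "cat", "sat"], ["the", "dog"]], max_len=4)
print(shape(batch))
```

A real input pipeline would write such arrays to whatever data source the network reads from; this sketch covers only the layout step.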

buriy · 2015-02-20T08:58:18Z

I wouldn't store the data as blobs, because that adds roughly 100x storage overhead, so you won't be able to train on large amounts of data.
A transform applied at load time, such as
word (10-50 bytes) -> word2vec (4 KB for a 1024-dimensional w2v vector) -> ...
would be much better (and the word2vec table could be backed by something like LMDB or an in-memory database).
At 4 KB per word, 1 million words already need about 4 GB of disk.
What about training on a 6-billion-word corpus?
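
The storage concern can be checked with a quick back-of-the-envelope calculation. This sketch assumes 4-byte float components (which matches 4 KB for 1024 dimensions) and, for comparison, a 4-byte integer index per word; the function names are hypothetical.

```python
BYTES_PER_FLOAT = 4  # assumption: float32 components
DIM = 1024           # vector size quoted in the comment above


def dense_bytes(num_words, dim=DIM):
    """Bytes needed to store every word occurrence as a dense vector."""
    return num_words * dim * BYTES_PER_FLOAT


def index_bytes(num_words, bytes_per_index=4):
    """Bytes needed to store every word occurrence as an integer index."""
    return num_words * bytes_per_index


for n in (10**6, 6 * 10**9):
    print(
        f"{n:>13,} words: "
        f"{dense_bytes(n) / 1e9:,.1f} GB dense vs "
        f"{index_bytes(n) / 1e9:,.3f} GB as indices"
    )
```

Under these assumptions a million dense vectors come to roughly 4 GB and six billion to roughly 24.6 TB, while the same corpus stored as indices stays near 24 GB.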

shelhamer · 2015-02-20T09:11:50Z

See the EmbedLayer in #1872 for tasks that are modeled as one-hot vectors like some language models. The layer works on the indices instead of the explicit vectors.
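
Why working on indices is enough can be shown with a small plain-Python illustration. This is not Caffe's implementation; the weight table and function names are made up. Multiplying a one-hot vector by a weight matrix selects exactly one row, so a direct row lookup by index gives the same result without ever materializing the one-hot vector.

```python
# Hypothetical 4-word vocabulary with 2-dimensional embeddings.
WEIGHTS = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]


def one_hot(index, size):
    """Explicit one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed_via_one_hot(index):
    """One-hot vector times weight matrix: O(vocab * dim) work."""
    vec = one_hot(index, len(WEIGHTS))
    dim = len(WEIGHTS[0])
    return [
        sum(vec[i] * WEIGHTS[i][j] for i in range(len(WEIGHTS)))
        for j in range(dim)
    ]


def embed_via_index(index):
    """Direct row lookup: O(dim) work, no one-hot vector needed."""
    return list(WEIGHTS[index])


word_ids = [2, 0, 3]
dense = [embed_via_one_hot(i) for i in word_ids]
looked_up = [embed_via_index(i) for i in word_ids]
print(dense == looked_up)
```

With a realistic vocabulary of hundreds of thousands of words, the lookup form avoids both the storage and the arithmetic of the explicit one-hot product.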

buriy · 2015-02-20T09:27:05Z

@shelhamer Thanks, that looks like what I need.
Are you going to add and document a standard way to load external data from a separate file (e.g. a postprocessed w2v file into EmbedLayer)?

shelhamer closed this as completed Jan 16, 2015
