Investigate using text and sparse input in TensorFlow #747
Most text models in TensorFlow (and in DNN platforms in general) use an embedding layer to handle text. This contrasts with the bag-of-words approach, where a single vector is formed for the words/characters in the text. The indices of the vector refer to the words/characters, and the values hold the TF, TF-IDF, or any other score computed for each word/character.
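To make the bag-of-words representation concrete, here is a minimal sketch in plain Python. The helper names `build_vocabulary` and `bag_of_words` are hypothetical and not part of TensorFlow or of this project; the tokenizer is a plain whitespace split and the score is a raw term frequency, both assumed only for illustration.

```python
from collections import Counter


def build_vocabulary(texts):
    """Assign each distinct lowercase token an integer index (hypothetical helper)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Return a sparse term-frequency vector as {index: count}.

    Only non-zero entries are stored; tokens missing from vocab are skipped.
    """
    counts = Counter(tok for tok in text.lower().split() if tok in vocab)
    return {vocab[tok]: count for tok, count in counts.items()}


texts = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(texts)
sparse_vec = bag_of_words(texts[0], vocab)
print(sparse_vec)
```

The conceptual vector has one slot per vocabulary entry, but only the slots for words that actually occur in the text are stored, which is what a sparse format buys.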
The bag-of-words model requires vectors to be represented in sparse format: the vector has one entry per word/character in the vocabulary, which is very large, while any single text uses only a small fraction of those entries. With embedding layers, however, sparse format is not needed because the input to an embedding layer is typically not that large, so a dense format is fine.
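For comparison, a rough sketch of what an embedding layer consumes: a short, fixed-length dense list of token ids rather than a vocabulary-sized vector. The functions `encode` and `embed`, the reserved ids `pad_id` and `unk_id`, and the tiny lookup table are all assumptions made for illustration; in TensorFlow the lookup itself would normally be done by something like `tf.keras.layers.Embedding`.

```python
def encode(text, vocab, max_len, pad_id=0, unk_id=1):
    """Turn text into a dense, fixed-length list of token ids (hypothetical helper)."""
    ids = [vocab.get(tok, unk_id) for tok in text.lower().split()][:max_len]
    # Pad on the right so every input has exactly max_len entries.
    return ids + [pad_id] * (max_len - len(ids))


def embed(ids, table):
    """Look up one embedding row per token id."""
    return [table[i] for i in ids]


# Ids 0 and 1 are reserved here for padding and unknown tokens (an assumption).
vocab = {"the": 2, "cat": 3, "sat": 4}
table = [
    [0.0, 0.0],   # 0: padding
    [0.1, 0.1],   # 1: unknown token
    [0.2, -0.1],  # 2: "the"
    [0.5, 0.3],   # 3: "cat"
    [-0.4, 0.7],  # 4: "sat"
]

ids = encode("The cat sat down", vocab, max_len=6)
vectors = embed(ids, table)
print(ids)
```

The input length is bounded by `max_len`, not by the vocabulary size, which is why a dense tensor is adequate here.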
However, when working with text models, I found the following issues.
I currently don't see any issue with retrieving outputs from TensorFlow. I will write more if I encounter other issues.