I am trying to use this code with my own dataset (training is now running).
For this I am trying to replicate the structure of the Ubuntu Dialog Corpus (UDC) from https://arxiv.org/pdf/1506.08909v3.pdf
Your article states that "The training data consists of 1,000,000 examples, 50% positive (label 1) and 50% negative (label 0)" (http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/).
So I made a copy of the dataset with Pandas and, in the copy, paired each Context with a randomly chosen Utterance and set the flag to 0.
The result is a doubled dataset: each Context and Utterance appears twice, once in a correct pair and once in an incorrect pair.
Is that right for training?
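For reference, here is a minimal sketch of the sampling step I mean. The column names `Context`, `Utterance` and `Label`, the function name `add_negative_samples` and the `n_negatives` parameter are my own assumptions for illustration; they are not taken from the article's code.

```python
import numpy as np
import pandas as pd


def add_negative_samples(df, n_negatives=1, seed=0):
    """Return the positive pairs plus n_negatives random wrong pairs per Context.

    Assumes df has one row per correct (Context, Utterance) pair.
    """
    n = len(df)
    if n < 2:
        raise ValueError("need at least two rows to sample a wrong Utterance")

    rng = np.random.default_rng(seed)
    contexts = df["Context"].to_numpy()
    utterances = df["Utterance"].to_numpy()

    positives = df[["Context", "Utterance"]].copy()
    positives["Label"] = 1
    frames = [positives]

    for _ in range(n_negatives):
        # An offset in [1, n-1] guarantees the sampled row is never the row itself.
        # If the same Utterance text occurs in several rows, a "wrong" pair can
        # still be textually correct; this sketch does not filter that case.
        offsets = rng.integers(1, n, size=n)
        wrong_idx = (np.arange(n) + offsets) % n
        frames.append(pd.DataFrame({
            "Context": contexts,
            "Utterance": utterances[wrong_idx],
            "Label": 0,
        }))

    # Shuffle so that positive and negative examples are interleaved.
    combined = pd.concat(frames, ignore_index=True)
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)
```

With `n_negatives=1` this gives the 50% positive / 50% negative split; a larger value adds that many wrong pairs per Context.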
The paper https://arxiv.org/pdf/1506.08909v3.pdf states: "In our experiments below, we consider both the case of 1 wrong response and 10 wrong responses." This is a completely different approach.
Should I add 11 copies of each Context to the training set, paired with randomly selected Utterances, to get more accurate results?
Also, I don't have the 'EOS' tag in my Context, so the context is not a merged dialog; it is one big problem post. How can this influence the results?