Quora recently released a dataset of 400,000 potential near-duplicate sentence pairs, with labels indicating whether each pair is indeed a near-duplicate. There is also some existing work on this dataset using deep learning approaches.
It could be interesting to use this dataset as ground truth to evaluate to what extent near-duplicates can be detected with FreeDiscovery, and to estimate the optimal similarity threshold.
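As a rough illustration of the threshold estimation, here is a minimal sketch that sweeps candidate thresholds over labeled pairs and keeps the one with the highest F1. It does not call FreeDiscovery: the word-level Jaccard similarity is only a stand-in for whatever similarity score the near-duplicate detection would produce, and the function names (`jaccard`, `f1_at_threshold`, `best_threshold`) and the toy pairs are hypothetical, not taken from the Quora dataset.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity (assumption: stand-in for the real score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def f1_at_threshold(scores, labels, threshold):
    """F1 when pairs with score >= threshold are predicted as near-duplicates."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a threshold; return the (threshold, F1) with highest F1."""
    candidates = sorted(set(scores))
    return max(
        ((t, f1_at_threshold(scores, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical toy data: (sentence 1, sentence 2, 1 if near-duplicate else 0).
pairs = [
    ("how do I learn python", "how do I learn python quickly", 1),
    ("what is machine learning", "what is deep learning", 0),
    ("best way to lose weight", "best way to lose weight fast", 1),
    ("how to cook rice", "who won the world cup", 0),
]
scores = [jaccard(s1, s2) for s1, s2, _ in pairs]
labels = [y for _, _, y in pairs]

threshold, f1 = best_threshold(scores, labels)
print(f"best threshold = {threshold:.3f}, F1 = {f1:.3f}")
```

With real data, `scores` would come from the similarity reported for each labeled pair, and the threshold should be chosen on one split of the dataset and checked on a held-out split to avoid an optimistic estimate.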