Discourse analysis for DBpedia chatbot: http://chat.dbpedia.org/
Description of notebooks:
data_exploration.ipynb contains code for grouping chats by user_id and for preliminary analysis, such as finding the average conversation length and the number of users.
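As a rough illustration of the grouping step, the sketch below uses only the standard library. The record layout (the "user_id" and "text" fields) and the sample messages are assumptions made for the example, not the notebook's actual schema or data.

```python
from collections import defaultdict

# Hypothetical chat log: one record per user message.
# The field names are assumed for illustration only.
messages = [
    {"user_id": "u1", "text": "Who is Albert Einstein?"},
    {"user_id": "u1", "text": "Where was he born?"},
    {"user_id": "u2", "text": "What is DBpedia?"},
    {"user_id": "u1", "text": "Thanks"},
]


def group_by_user(records):
    """Collect each user's messages, preserving their original order."""
    conversations = defaultdict(list)
    for record in records:
        conversations[record["user_id"]].append(record["text"])
    return dict(conversations)


conversations = group_by_user(messages)

# Number of distinct users and mean number of messages per user.
num_users = len(conversations)
avg_length = sum(len(turns) for turns in conversations.values()) / num_users

print(num_users, avg_length)
```

Here conversation length is counted in messages per user; the notebook may define it differently (for example, per session).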
analysis.ipynb computes the following:
- the most used channel (web, Slack, or Facebook Messenger)
- the number of failed responses per conversation and the number of questions that did not satisfy users
- the conversation length after negative feedback
- the character length of user requests
- commonly asked topics, found via named entity recognition (NER)
- whether coreferences exist
- the language of user requests
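A few of the simpler measures in this list can be sketched in plain Python. Everything below is illustrative: the field names ("text", "feedback"), the feedback values, and the sample data are assumptions, and the notebook may compute these quantities differently.

```python
from collections import Counter

# Hypothetical data: the channel used by each conversation.
channels = ["web", "slack", "web", "facebook messenger"]

# Hypothetical single conversation as an ordered list of user turns.
# "feedback" is assumed to be None, "positive", or "negative".
conversation = [
    {"text": "Who is Ada Lovelace?", "feedback": None},
    {"text": "That is wrong", "feedback": "negative"},
    {"text": "Tell me about Ada Lovelace", "feedback": None},
    {"text": "ok", "feedback": None},
]


def most_used_channel(channel_list):
    """Return the channel that occurs most often."""
    return Counter(channel_list).most_common(1)[0][0]


def turns_after_first_negative(turns):
    """Count the turns that follow the first negative feedback.

    Returns None if the conversation has no negative feedback.
    """
    for index, turn in enumerate(turns):
        if turn["feedback"] == "negative":
            return len(turns) - index - 1
    return None


def request_lengths(turns):
    """Character length of each user request."""
    return [len(turn["text"]) for turn in turns]


top_channel = most_used_channel(channels)
remaining = turns_after_first_negative(conversation)
lengths = request_lengths(conversation)
```

NER, coreference detection, and language identification need dedicated NLP libraries and are left out of this sketch.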
Use dependency_parsing.ipynb to estimate the number of complex questions asked and to prepare the input (candidate pairs) for intent clustering.
The clustering folder contains two implementations (KMeans and HDBSCAN) for finding latent intents in utterance representations. Use get_sentence_embeddings.ipynb, preferably on Google Colab, to fetch the sentence embeddings used to cluster user requests by their semantics.
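To show the idea behind clustering embeddings, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It is not the implementation in the clustering folder, which is likely library-based. The toy 2-D vectors stand in for real sentence embeddings, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D vectors standing in for sentence embeddings (assumed data).
embeddings = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

Unlike k-means, HDBSCAN does not need the number of clusters fixed in advance and can mark outlying utterances as noise, which may suit user requests with no clear intent.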