Source code and dataset for "Phenomena Explanation from Text: Unsupervised Learning of Interpretable and Statistically Significant Knowledge".
Abstract: Learning knowledge from text is becoming increasingly important as the amount of unstructured content on the Web rapidly grows. Despite recent breakthroughs in natural language understanding, the explanation of phenomena from textual documents is still a difficult and poorly addressed problem. Additionally, current NLP solutions often require labeled data, are domain-dependent, and based on black box models. In this paper, we introduce POIROT, a new descriptive text mining methodology for phenomena explanation. POIROT is designed to provide accurate and interpretable results in unsupervised settings, quantifying them based on their statistical significance. We evaluated POIROT on a medical case study, with the aim of learning the "voice of patients" from short social posts. Taking Esophageal Achalasia as a reference, we automatically derived scientific correlations with an F1 score of about 79% and built useful explanations on the patients' point of view on topics such as symptoms, treatments, drugs, and foods.
- DATA 2020, Learning Interpretable and Statistically Significant Knowledge from Unlabeled Corpora of Social Text Messages: A Novel Methodology of Descriptive Text Mining. Best Paper Award.
- KDIR 2020, Unsupervised Descriptive Text Mining for Knowledge Graph Learning.
All the instructions to fully replicate the results reported in our papers are in the notebook POIROT.ipynb.
For help or issues using POIROT, please contact Giacomo Frisoni (giacomo.frisoni[at]unibo.it) or Gianluca Moro (gianluca.moro[at]unibo.it).