This repository contains the source code for the paper: Alimova, I., Tutubalina, E.: Automated detection of adverse drug reactions from social media posts with machine learning. In: Proceedings of the International Conference on Analysis of Images, Social Networks and Texts (2017)
The CADEC corpus is taken from Karimi, S., Metke-Jimenez, A., Kemp, M., Wang, C.: Cadec: A corpus of adverse drug event annotations. Journal of biomedical informatics 55 (2015) 73-81, and can be downloaded from https://data.csiro.au
The Twitter corpus is taken from Sarker, A., Gonzalez, G.: Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of biomedical informatics 53 (2015) 196-207, and can be downloaded from http://diego.asu.edu/index.php?downloads=yes
Sentiment lexicons
Opinion lexicon
article - Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM (2004) 168-177
download - https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
SentiWordnet 3.0
article - Baccianella, S., Esuli, A., Sebastiani, F.: Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In: LREC. Volume 10. (2010) 2200-2204
download - http://sentiwordnet.isti.cnr.it/
MPQA Subjectivity Lexicon
article - Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for Computational Linguistics (2005) 347-354
download - https://github.com/kuitang/Markovian-Sentiment/blob/master/data/subjclueslen1-HLTEMNLP05.tff
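For orientation, here is a minimal sketch of reading the MPQA file into a dictionary. It assumes each line is a series of space-separated `key=value` fields (`type`, `len`, `word1`, `pos1`, `stemmed1`, `priorpolarity`). The function names `parse_mpqa_line` and `load_mpqa` are hypothetical and are not taken from this repository.

```python
def parse_mpqa_line(line):
    """Split one 'key=value key=value ...' entry into a dict."""
    entry = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            entry[key] = value
    return entry


def load_mpqa(lines):
    """Map each word to a list of (pos, subjectivity type, prior polarity)."""
    lexicon = {}
    for line in lines:
        entry = parse_mpqa_line(line)
        if "word1" not in entry:
            continue  # skip blank or malformed lines
        lexicon.setdefault(entry["word1"], []).append(
            (entry.get("pos1"), entry.get("type"), entry.get("priorpolarity"))
        )
    return lexicon


# One line in the assumed format, used here only as an illustration.
sample = [
    "type=weaksubj len=1 word1=abandoned pos1=adj stemmed1=n priorpolarity=negative"
]
mpqa = load_mpqa(sample)
```

With the real file, the same function can be fed an open file handle, for example `load_mpqa(open(path, encoding="utf-8"))`, where `path` points to the downloaded `.tff` file.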
ADR lexicons
DIEGO Lab ADR Lexicon
article - Sarker, A., Gonzalez, G.: Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of biomedical informatics 53 (2015) 196-207
download - http://diego.asu.edu/Publications/ADRSMReview/ADRSMReview.html
SIDER
download - http://sideeffects.embl.de/
COSTART
download - https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CST/
Word vector representation
Word2vec model
article - Miftahutdinov, Z. S., Tutubalina, E. V., Tropsha, A. E.: Identifying disease-related expressions in reviews using conditional random fields. Komp’juternaja Lingvistika i Intellektual’nye Tehnologii, Vol. 1, No. 16 (2017) 155-166
download - https://github.com/dartrevan/ChemTextMining/tree/master/word2vec/Health_2.5mreviews.s200.w10.n5.v15.cbow.bin
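The `.bin` extension suggests the model is stored in the binary word2vec format, though this is an assumption and not something stated here. Under that assumption, the sketch below reads such a file using only the standard library: a text header with the vocabulary size and dimensionality, then each word followed by a space and its little-endian float32 values. The function name `read_word2vec_bin` is hypothetical, and the two vectors in the demo are made up.

```python
import io
import struct


def read_word2vec_bin(stream):
    """Read word vectors from a binary stream in the word2vec C format."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch in (b" ", b""):  # a space ends the word; b"" means end of data
                break
            if ch != b"\n":  # skip the newline left over from the previous entry
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        # Each vector is stored as `dim` little-endian 32-bit floats.
        vectors[word] = struct.unpack("<%df" % dim, stream.read(4 * dim))
    return vectors


# Tiny in-memory stand-in for the real file (two made-up 3-dimensional vectors).
fake_file = io.BytesIO(
    b"2 3\n"
    + b"pain " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + b"rash " + struct.pack("<3f", 0.5, 0.0, -1.0) + b"\n"
)
demo_vectors = read_word2vec_bin(fake_file)
```

For the downloaded model, open the file with `open(path, "rb")` and pass the handle to the function. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=True)` is a common alternative for files in this format.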
Brown clusters
article - Miftahutdinov, Z. S., Tutubalina, E. V., Tropsha, A. E.: Identifying disease-related expressions in reviews using conditional random fields. Komp’juternaja Lingvistika i Intellektual’nye Tehnologii, Vol. 1, No. 16 (2017) 155-166
download - https://raw.githubusercontent.com/dartrevan/ChemTextMining/master/clustered_words/brown_clusters/brown_input-150/paths
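A `paths` file from Brown clustering tools typically has one word per line in the form `bit-string<TAB>word<TAB>count`. The sketch below assumes that layout and shows one common way to turn a cluster path into prefix features. The function names, the prefix lengths, and the sample lines are all hypothetical, and this is not necessarily how the paper's code uses the clusters.

```python
def load_brown_paths(lines):
    """Map each word to its cluster bit-string path."""
    word_to_path = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        path, word = parts[0], parts[1]
        word_to_path[word] = path
    return word_to_path


def cluster_features(word, word_to_path, prefix_lengths=(4, 6, 10)):
    """Return path-prefix features for a word, or an empty list if unknown."""
    path = word_to_path.get(word)
    if path is None:
        return []
    # Shorter prefixes correspond to coarser clusters in the hierarchy.
    return ["brown_%d=%s" % (n, path[:n]) for n in prefix_lengths]


# Made-up lines in the assumed format, for illustration only.
sample_lines = ["0010\theadache\t120\n", "00110101\tnausea\t87\n"]
clusters = load_brown_paths(sample_lines)
features = cluster_features("nausea", clusters)
```

A word missing from the clusters yields an empty feature list, so unseen tokens contribute no cluster features.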