Predicting the Gender and Age of an Author Using Semantic Features
This project predicts personal attributes of authors, such as gender and age, by training classifiers on content-based and semantic features extracted from a knowledge base (KB) such as Wikipedia. Blogs written by different people differ considerably in context, and this project explores those contextual differences to predict the age and gender of the author of a text. It consists of two phases:
- Semantic representation of documents. This can be done by linking the entities in a document to Wikipedia and mapping semantically related words to the Wikipedia Category Network.
- Age and gender prediction. This can be done with any ML classifier, such as an SVM or KNN.
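
The two phases above can be sketched end to end in a few lines of Python. This is only a minimal illustration under stated assumptions, not the project's implementation: the `TOY_CATEGORIES` dictionary is a hypothetical stand-in for a real entity linker and the Wikipedia Category Network, the training texts and the age labels (`"10s"`, `"30s"`) are made up, and a hand-written cosine-similarity KNN is used in place of a library classifier.

```python
from collections import Counter
import math

# Hypothetical toy stand-in for entity linking plus the Wikipedia Category
# Network: each known word maps to one or more category names.
TOY_CATEGORIES = {
    "homework": ["Education"],
    "exam": ["Education"],
    "prom": ["Education", "Youth culture"],
    "mortgage": ["Finance"],
    "pension": ["Finance", "Retirement"],
    "retirement": ["Retirement"],
}


def category_vector(text):
    """Phase 1: represent a document as counts of the categories its words map to."""
    counts = Counter()
    for token in text.lower().split():
        token = token.strip(".,!?;:")
        for category in TOY_CATEGORIES.get(token, []):
            counts[category] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Phase 2: majority vote among the k training documents most similar to the text."""
    query = category_vector(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training data; the labels are illustrative age groups.
TRAIN_TEXTS = [
    ("I have so much homework before the exam", "10s"),
    ("Prom is next week and the exam is tomorrow", "10s"),
    ("We finally paid off the mortgage", "30s"),
    ("Thinking about my pension and retirement", "30s"),
]
TRAIN = [(category_vector(text), label) for text, label in TRAIN_TEXTS]

print(knn_predict(TRAIN, "Another exam and more homework", k=1))
```

In a real pipeline the dictionary lookup would be replaced by actual entity linking against Wikipedia, and the same category vectors could be fed to an SVM instead of KNN; gender prediction would reuse the feature vectors with a different label set.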

- Recurrent Convolutional Neural Networks for Text Classification.
- Varma et al. 2013. Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors. Notebook for PAN at CLEF 2013.
- Smith et al. 2011. Author Age Prediction from Text using Linear Regression. In Proceedings of the ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LATECH 2011), Portland, OR, June 2011.
- Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and Giacomo Inches. Overview of the Author Profiling Task at PAN 2013. Proceedings of PAN at CLEF 2013.
- Argamon et al. 2009. Automatically profiling the author of an anonymous text. Communications of the ACM 52 (2): 119-123.
- Schler et al. 2006. Effects of Age and Gender on Blogging. In Proceedings of AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, March 2006.
- Meina et al. 2013. Ensemble-based Classification for Author Profiling Using Various Features. Notebook for PAN at CLEF 2013.