T22sri/Personality_Recognition_NLP

Personality Recognition

Overview

The project builds a ‘neuroticism’ classifier, using simple and interpretable logistic regression as a baseline, and investigates which feature sets, resources and learning techniques are useful for extracting personality from text (including social media data). It also provides a sandbox for exploring natural language processing (NLP) techniques that improve on the baseline model.
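As a rough illustration of what such a baseline can look like, here is a minimal sketch of a bag-of-words logistic regression trained with stochastic gradient descent, using only the Python standard library. It is not the code from this repository: the toy statuses, their labels, the function names and the hyperparameters (`epochs`, `lr`, `l2`) are all made up for the example.

```python
import math
from collections import Counter

# Toy, made-up statuses for illustration only (1 = neurotic, 0 = not).
# These are NOT taken from the Essays or social media datasets.
TRAIN = [
    ("i am so worried and anxious about tomorrow", 1),
    ("feeling stressed and nervous again", 1),
    ("cannot sleep so worried about everything", 1),
    ("great relaxed day at the beach", 0),
    ("calm evening with friends feeling happy", 0),
    ("happy and relaxed after a great run", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(texts):
    """Map every token seen in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse bag-of-words counts: {feature index: count}; unknown tokens are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}


def predict_proba(x, weights, bias):
    """P(label = 1 | x) under the logistic model."""
    z = bias + sum(weights[i] * v for i, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(data, vocab, epochs=200, lr=0.1, l2=0.01):
    """Fit weights by SGD on the log loss, with L2 shrinkage on active features."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    examples = [(featurize(text, vocab), label) for text, label in data]
    for _ in range(epochs):
        for x, y in examples:
            err = predict_proba(x, weights, bias) - y  # gradient of log loss w.r.t. z
            for i, v in x.items():
                weights[i] -= lr * (err * v + l2 * weights[i])
            bias -= lr * err
    return weights, bias


vocab = build_vocab(text for text, _ in TRAIN)
weights, bias = train(TRAIN, vocab)


def classify(text):
    """Return 1 if the model's probability of 'neurotic' is at least 0.5, else 0."""
    return int(predict_proba(featurize(text, vocab), weights, bias) >= 0.5)


def top_features(n=5):
    """Tokens with the largest positive weights, i.e. those pushing towards label 1."""
    by_weight = sorted(vocab, key=lambda tok: weights[vocab[tok]], reverse=True)
    return by_weight[:n]
```

Calling `classify("so anxious and worried")` returns a 0/1 prediction, and `top_features()` lists the tokens the model associates most strongly with the positive class. Being able to read the learned weights this way is one reason logistic regression is a convenient baseline when comparing feature sets.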

Project Background

The report and the problem are based on the article “Workshop on Computational Personality Recognition: Shared Task”, which discusses personality trait values and social media statuses in terms of the well-known Big Five personality traits (also known as OCEAN, after the five traits it defines: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism). The article describes two datasets (Essays and myPersonality) with gold-standard labels and user information, which 8 different groups used for personality recognition. The goal of the workshop was for each group to develop personality recognition solutions on the same datasets in whatever way it saw fit. The idea originated from the observation that much of the research in personality recognition relied on varying resources and techniques, which did not permit an adequate comparison between researchers. At the conclusion of the workshop, the groups would therefore present their work along with the performance increase or decrease they obtained, and these results would serve as a benchmark against which future projects in similar fields could compare themselves.