
Vega-daniel/Predicting_personality


Personality Predictor

by Daniel Vega


Introduction

The Myers–Briggs Type Indicator (MBTI) is an introspective self-report questionnaire with the purpose of indicating differing psychological preferences in how people perceive the world around them and make decisions. The MBTI was constructed by Katharine Cook Briggs and her daughter Isabel Briggs Myers. It is based on the conceptual theory proposed by Carl Jung, who had speculated that humans experience the world using four principal psychological functions – sensation, intuition, feeling, and thinking – and that one of these four functions is dominant for a person most of the time.

Goals:

  • Learn more about the correlations and differences between each personality type.
  • Derive visuals and compare the personality types against each other.
  • Given sufficient text, predict the personality type of the individual.

Motivation:

I find psychology very interesting. I believe that the more information people have, in this case about personality types, the easier it will be for them to understand each other. And once we understand an individual's personality, we can help create an environment where they will succeed.


Overview of the Data

First Dataset:

This data was collected from the PersonalityCafe forum, which provides a large selection of people along with their MBTI personality types and what they have written.

  • There are 8,675 observations (rows)
  • Each row has one individual's personality type and their last 50 posts
  • The personality type is self-reported by the user, although the forum links to a test for members who do not know which type they belong to.
   type  posts
0  INFJ  'http://www.youtube.com/watch?v=qsXHcwe3krw
1  ENTP  'I'm finding the lack of me in these posts ver..'
2  INTP  'Good one _____ https://www.youtube.com/wat...'
3  INTJ  'Dear INTP, I enjoyed our conversation the o... '
4  ENTJ  'You're fired.

Second Dataset:

This dataset comes from the "MBTI Manual" published by CPP.

  • Shows the frequency of each personality type in the population
   Type  Frequency
0  ISFJ  13.8%
1  ESFJ  12.3%
2  ISTJ  11.6%
3  ISFP   8.8%
4  ESTJ   8.7%


Exploratory Data Analysis

Performing EDA on our data set revealed a few things. They are summarized by the graphs below:

Figures: Data Unbalanced; Questions per post; Links per post; Words per post

For further EDA please look at the summary here
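As a rough sketch of how the per-post statistics in these graphs could be computed, the function below averages links, question marks and words over one user's posts. It assumes the 50 posts in each row are joined into one string separated by `|||`, which is the layout of the public Kaggle copy of this data. The separator, the link pattern and the choice to strip links before counting are assumptions, not taken from the project's code.

```python
import re
from statistics import mean

# Assumption: each row's posts are joined into one string with "|||" between them.
SEPARATOR = "|||"
LINK_RE = re.compile(r"https?://\S+")


def per_post_stats(joined_posts: str) -> dict:
    """Average links, questions and words per post for one user."""
    posts = [p for p in joined_posts.split(SEPARATOR) if p.strip()]
    links = [len(LINK_RE.findall(p)) for p in posts]
    # Strip links first so a "?" inside a URL is not counted as a question
    # and a URL is not counted as a word.
    texts = [LINK_RE.sub(" ", p) for p in posts]
    return {
        "links_per_post": mean(links),
        "questions_per_post": mean(t.count("?") for t in texts),
        "words_per_post": mean(len(t.split()) for t in texts),
    }


sample = "http://www.youtube.com/watch?v=qsXHcwe3krw|||What do you think?|||I agree with that"
print(per_post_stats(sample))
```

Running this over every row gives one value per user for each statistic. Those values can then be grouped by type and plotted.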


Data Pipeline

Let's create a data pipeline that does the following:

  • Standardize the text to ASCII
  • Remove weblinks
  • Tokenize the words
  • Use a stemmer on the words
  • Decode HTML entities
  • Remove punctuation
  • Remove stopwords

The code to do this can be found here
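The pipeline steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's code. `STOPWORDS` is a tiny hypothetical list and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as NLTK's `PorterStemmer`. Lowercasing is also an added assumption.

The steps run in a slightly different order than listed:
- HTML entities are decoded first, so `&amp;` is not left behind as the word "amp".
- Links are removed before punctuation, so the URL pattern still matches.
- Stopwords are removed before stemming, so a stemmed form like "thi" does not slip past the list.

```python
import html
import re
import string
import unicodedata

# Hypothetical stopword list kept tiny for illustration; a real pipeline would use
# a full list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "and", "the", "i", "is", "it", "of", "to", "in", "this", "that", "was", "my"}
LINK_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(word: str) -> str:
    """Hypothetical rough suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def clean_text(text: str) -> list[str]:
    text = html.unescape(text)  # decode HTML entities, e.g. "&#39;" -> "'"
    # Standardize to ASCII: split accented letters apart, then drop non-ASCII marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = LINK_RE.sub(" ", text)  # remove web links
    text = text.lower().translate(PUNCT_TABLE)  # lowercase (assumption) and remove punctuation
    tokens = text.split()  # simple whitespace tokenization
    # Drop stopwords before stemming so stemmed forms don't slip past the list.
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(clean_text("I&#39;m loving this café!!! https://example.com/x"))  # ['im', 'lov', 'cafe']
```

The crude stemmer's "lov" shows why a proper stemmer is worth the dependency; Porter's algorithm would produce "love".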


Model Selection

We tried several machine learning algorithms to find a model that can predict personality type. Random guessing among the 16 types would be right 1/16 = 0.0625 of the time. That is very low, so let's aim for an accuracy above 50%. The code for this can be found here.
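The 1/16 figure is the expected accuracy of guessing uniformly at random, and it holds no matter how unbalanced the classes are. Suppose type $k$ makes up a fraction $p_k$ of the test set, and each guess picks one of the $K = 16$ types with probability $1/K$. Then:

```latex
\mathbb{E}[\text{accuracy}]
  = \sum_{k=1}^{K} p_k \cdot \frac{1}{K}
  = \frac{1}{K} \sum_{k=1}^{K} p_k
  = \frac{1}{16}
  = 0.0625
```

Because the forum data is unbalanced, always predicting the most common type would score its share of the data. That share is higher than 1/16, which makes it a tougher baseline to beat.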

We compared the following models (accuracy rounded to three decimal places):

  • Random Forest - Accuracy = 0.361
  • Gradient Boosting Classifier - Accuracy = 0.651
  • Naive Bayes - Accuracy = 0.221
  • Logistic Regression - Accuracy = 0.630
  • Support Vector Machine - Accuracy = 0.670
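The conclusion mentions a support vector machine trained with stochastic gradient descent. In scikit-learn, one common way to get that is `SGDClassifier(loss="hinge")`, which fits a linear SVM by SGD. The sketch below pairs it with TF-IDF features on a hypothetical four-post toy corpus. The features and hyperparameters are assumptions, since the README does not list them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for each user's cleaned posts.
texts = [
    "love meeting new people party tonight",
    "quiet evening reading alone thinking",
    "debate ideas argue theory logic",
    "plan schedule organize tasks deadline",
]
labels = ["ENFP", "INFJ", "ENTP", "ESTJ"]

# TF-IDF features feeding a linear SVM (hinge loss) fitted with stochastic gradient descent.
model = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3, random_state=42),
)
model.fit(texts, labels)

predictions = model.predict(["reading alone in the quiet evening"])
print(predictions)
```

On the real data, one would hold out a test split (for example with `train_test_split`) and report `accuracy_score` on it, as in the list above.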


Deep Learning

Creating a neural network gives us a much higher accuracy score, although the network turned out to be overfit to the over-represented classes (see the conclusion). The code for this can be found here.

Accuracy = 0.987
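A high overall accuracy can hide the overfitting mentioned in the conclusion. On unbalanced data, a model that mostly predicts the over-represented types still scores well. Comparing plain accuracy with balanced accuracy makes this visible. Balanced accuracy is the mean of per-class recall, also available as scikit-learn's `balanced_accuracy_score`. The labels below are a made-up toy example, not results from this project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every type counts equally, however rare."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, unbalanced test set: 8 INFP users and 2 ESTJ users.
y_true = ["INFP"] * 8 + ["ESTJ"] * 2
y_pred = ["INFP"] * 10  # a model that always predicts the majority type

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A large gap between the two numbers is a quick signal that the model is leaning on the majority classes.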


Emotional Analysis

Next, let's dive into the emotions expressed by each personality type. The code for this can be found here.

Figures: Extroverted; Introverted
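The README does not say how emotions were scored. One common approach is a word-emotion lexicon, such as the NRC Emotion Lexicon: count how often each emotion's words appear, pooled per type or per group. The sketch below uses a hypothetical six-word `EMOTION_LEXICON` and a hypothetical `profiles_by_group` helper. Its `key` argument can split the data into Extroverted and Introverted, as in the figures above.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real analysis would load a
# full word-emotion lexicon instead of these six entries.
EMOTION_LEXICON = {
    "happy": {"joy"},
    "love": {"joy", "trust"},
    "afraid": {"fear"},
    "angry": {"anger"},
    "hate": {"anger", "disgust"},
    "sad": {"sadness"},
}


def emotion_profile(tokens):
    """Fraction of all emotion-word hits that fall into each emotion."""
    counts = Counter(e for t in tokens for e in EMOTION_LEXICON.get(t, ()))
    total = sum(counts.values())
    return {e: n / total for e, n in counts.items()} if total else {}


def profiles_by_group(rows, key=lambda mbti_type: mbti_type):
    """rows: iterable of (mbti_type, tokens). Pools tokens per group, then profiles them."""
    pooled = {}
    for mbti_type, tokens in rows:
        pooled.setdefault(key(mbti_type), []).extend(tokens)
    return {group: emotion_profile(toks) for group, toks in pooled.items()}


rows = [
    ("ENFP", ["so", "happy", "love", "this"]),
    ("INTJ", ["hate", "small", "talk"]),
    ("ENFP", ["sad", "today"]),
]
by_type = profiles_by_group(rows)
by_attitude = profiles_by_group(
    rows, key=lambda t: "Extroverted" if t.startswith("E") else "Introverted"
)
print(by_type)
print(by_attitude)
```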


WordClouds

Now let's go back to the data and see what we can derive:

  • Created another dictionary of high-frequency words for each personality type
  • This can help us make word clouds, but first we need to clean our data
  • Created a list of the 30 most common words among all personality types
  • Removed the words in that list from our dataset
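The word-frequency steps above can be sketched with `collections.Counter`. One assumption: "the 30 most common words among all personality types" is read as the 30 most frequent words across the whole dataset. `top_words_by_type` is a hypothetical helper name, not the project's.

```python
from collections import Counter


def top_words_by_type(rows, n_common=30, n_top=100):
    """rows: iterable of (mbti_type, tokens) with already-cleaned tokens.

    Returns {mbti_type: [(word, count), ...]} after dropping the n_common words
    that are most frequent across all types combined.
    """
    per_type = {}
    overall = Counter()
    for mbti_type, tokens in rows:
        per_type.setdefault(mbti_type, Counter()).update(tokens)
        overall.update(tokens)
    # Words that everyone uses heavily say little about any single type.
    shared = {word for word, _ in overall.most_common(n_common)}
    return {
        mbti_type: Counter(
            {w: c for w, c in counts.items() if w not in shared}
        ).most_common(n_top)
        for mbti_type, counts in per_type.items()
    }


rows = [
    ("INTP", ["think", "theory", "think"]),
    ("ENTP", ["think", "debate", "debate"]),
]
result = top_words_by_type(rows, n_common=1)
print(result)  # {'INTP': [('theory', 1)], 'ENTP': [('debate', 2)]}
```

Each type's list can then be turned into a dict and passed to the `wordcloud` package's `WordCloud.generate_from_frequencies`. Its `mask` parameter accepts an image array for shaped clouds.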

Let's get a bit fancy: instead of the default rectangular word clouds, we can use a shape template. Since we are talking about the mind, let's use a head.

Figures (head-shaped word clouds): ENTP (Extroverted); INTP (Introverted)


Conclusion and Next Steps

  • Took the datasets and performed Exploratory Data Analysis
  • Created a data pipeline
  • Built several models and picked the support vector machine trained with stochastic gradient descent due to its high accuracy and precision
  • Built a neural network that gave high accuracy but was overfit to the over-represented classes
  • Performed emotional analysis for each personality type
  • Created word clouds based on the frequency of words used by each personality type
  • A next step would be to gather data from another source, such as Twitter or Facebook, and see whether we can predict personalities from that text
