
Machine Learning Prediction with the Myers-Briggs Type Indicator


Description:

I. How to Install?

Using 'git clone', clone the project into your development environment of choice (such as VS Code or Google Colab). Make sure to install the required libraries (scipy, wordcloud, nltk, seaborn, and scikit-learn) with either pip or conda before running the code.
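The text cleaning also depends on NLTK data that is not bundled with the library itself. A minimal one-time setup sketch, assuming a standard Python or notebook environment:

```python
# One-time NLTK data setup: the notebooks use the stopword list and
# WordNetLemmatizer, which need these corpora downloaded locally.
import nltk

nltk.download("stopwords")  # stopword list used during text cleaning
nltk.download("wordnet")    # WordNet data used by WordNetLemmatizer
nltk.download("omw-1.4")    # extra WordNet data; harmless if already present
```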

II. How to Run?

  1. Download the datasets from the folder linked here: https://lsu.box.com/s/n58ia30eouwszswydrxkn6zejdd7co6y

mbti_cleaned.csv - used by 3730TorchClassifierBinary.ipynb and smaller_ds_ml.ipynb

MBTI500.csv - used by Data500_prediction.ipynb

  2. In each Jupyter notebook, point the dataset paths to their locations on your local machine. Double-check that any dataset not included in the code folder is referenced with the correct local path. The additional cleaned datasets are described in the Folder Breakdown section below.

  3. The last step is the prediction: run the preprocessed_text function, assign the sentence you want to classify to the variable trial_sentence, and run all the cells below it. The output is the set of letters that make up the predicted type. A minimal sketch of this step follows below.
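The names preprocessed_text and trial_sentence come from the notebooks; the sketch below is only an illustration of what that step can look like, and the notebooks' actual cleaning rules may differ.

```python
# Minimal sketch of the text-preprocessing step: lowercase, strip non-letters,
# remove English stopwords, and lemmatize. The notebooks' exact rules may differ.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocessed_text(text):
    text = re.sub(r"[^a-zA-Z ]", " ", text.lower())
    tokens = [lemmatizer.lemmatize(word) for word in text.split() if word not in stop_words]
    return " ".join(tokens)

trial_sentence = "I enjoy spending quiet evenings planning my next project."
print(preprocessed_text(trial_sentence))
```

The cells below trial_sentence in the notebook then feed this cleaned string to the trained classifiers, which output the predicted MBTI letters.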


File Breakdown (images.docx):

wordcloud - creates a word cloud for all the words

wordcloud_removed - creates a word cloud with certain common words removed due to redundancy (removed words: think, like, one, people, know)

topTen_bar - top ten words in every MBTI type
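For reference, a sketch of how the word-cloud images above could be produced. The dataset path and the "posts" column name are assumptions and may differ from the notebooks:

```python
# Sketch of generating the word clouds described above; the "posts" column name
# and the input file are assumptions about mbti_cleaned.csv's layout.
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

df = pd.read_csv("mbti_cleaned.csv")
all_text = " ".join(df["posts"].astype(str))

# wordcloud_removed: drop the common words listed above in addition to the defaults.
removed = STOPWORDS.union({"think", "like", "one", "people", "know"})
wc = WordCloud(width=1200, height=600, background_color="white", stopwords=removed)
wc.generate(all_text)

plt.figure(figsize=(12, 6))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```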

Folder Breakdown:

datasets_types - cleaned mbti_cleaned.csv split into the corresponding MBTI types

datasets_letters - cleaned mbti_cleaned.csv split into the corresponding MBTI dimensions (E, I, N, S, F, T, P, J)

big_datasets_types - cleaned MBTI500.csv split into the corresponding MBTI types

big_datasets_letters - MBTI500.csv split into the corresponding MBTI dimensions (E, I, N, S, F, T, P, J)
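As an illustration, a per-dimension split like datasets_letters can be derived from the cleaned dataset as sketched below. The "type" column name and the output filenames are assumptions for illustration only:

```python
# Sketch of splitting the cleaned dataset by MBTI dimension; the "type" column
# name and the per-letter output filenames are assumptions.
import os
import pandas as pd

df = pd.read_csv("mbti_cleaned.csv")
os.makedirs("datasets_letters", exist_ok=True)

# Each four-letter type (e.g. "INTJ") contributes one letter per dimension.
for position, pair in enumerate(["EI", "NS", "FT", "PJ"]):
    for letter in pair:
        subset = df[df["type"].str[position] == letter]
        subset.to_csv(f"datasets_letters/{letter}.csv", index=False)
```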


Packages Used:

  1. For Data Cleaning and Analyzing

    import pandas as pd
    import re
    import nltk  # used to download stopwords and lemmatizing functionality
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    import numpy as np

  2. For Feature Extraction

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

  3. For Machine Learning Algorithms (a minimal usage sketch follows this list)

    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn import metrics
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, r2_score, mean_squared_error
    import torch
    import math
    from torchtext.data.utils import get_tokenizer
    from torchtext.vocab import build_vocab_from_iterator
    from torch.utils.data import DataLoader
    from torch import nn
    import time
    from torch.utils.data.dataset import random_split
    from torchtext.data.functional import to_map_style_dataset

  4. For Data Visualization

    import wordcloud
    import matplotlib.pyplot as plt
    import seaborn as sns
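To show how the scikit-learn pieces above fit together, here is a minimal, self-contained sketch of training a CountVectorizer + TfidfTransformer + MultinomialNB pipeline for one MBTI dimension. It is an illustration under assumptions, not the notebooks' exact code: the "type" and "posts" column names of mbti_cleaned.csv are assumed, and hyperparameters are left at their defaults.

```python
# Minimal sketch: bag-of-words + TF-IDF + Naive Bayes for the E/I dimension.
# Column names ("type", "posts") are assumptions about mbti_cleaned.csv.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("mbti_cleaned.csv")
X = df["posts"].astype(str)
y = df["type"].str[0]  # first letter of the MBTI type: "E" or "I"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf_EI = Pipeline([
    ("vect", CountVectorizer(stop_words="english")),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
clf_EI.fit(X_train, y_train)

y_pred = clf_EI.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Predict the E/I letter for a new sentence (cleaned with preprocessed_text above).
print(clf_EI.predict(["I enjoy spending quiet evenings planning my next project."])[0])
```

Repeating the same pattern for the N/S, T/F, and J/P letters (or swapping MultinomialNB for LogisticRegression) yields one classifier per dimension; concatenating their outputs gives the four-letter prediction.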
