Machine Learning Prediction with Myers-Briggs Type Indicator

Description:

I. How to Install?

Use 'git clone' to copy the project to your machine, then open it in your coding environment (such as VSCode or Google Colab). Before running the code, install the required libraries, including scipy, wordcloud, nltk, seaborn and scikit-learn, with either pip install or conda install.
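As a quick way to confirm the installation step worked, the sketch below checks whether each library named above can be found in the current environment. The function name missing_packages and the mapping REQUIRED are hypothetical helpers written for this illustration; they are not part of the project.

```python
import importlib.util

# pip package name -> import name (scikit-learn is imported as sklearn).
# The list mirrors the libraries named in the install step; extend it as needed.
REQUIRED = {
    "scipy": "scipy",
    "wordcloud": "wordcloud",
    "nltk": "nltk",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Running this in the same environment as the notebooks (for example in the first Colab cell) shows which packages still need to be installed.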

II. How to Run?

Download the datasets from the folder linked here: https://lsu.box.com/s/n58ia30eouwszswydrxkn6zejdd7co6y

mbti_cleaned.csv (used in 3730TorchClassifierBinary.ipynb and smaller_ds_ml.ipynb)

MBTI500.csv (used in Data500_prediction.ipynb)

In each Jupyter notebook, point the dataset paths to the files on your local machine. The datasets are not stored in the code folder, so check that every path matches the file's actual location on your computer. The additional cleaned datasets are described in the Folder Breakdown section below.
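A small check like the following can catch a wrong path before a notebook fails partway through. It is only a sketch: the folder name in DATA_DIR is a placeholder assumption and find_missing is a hypothetical helper, while the two file names are the ones listed in the download step.

```python
from pathlib import Path

# Placeholder location of the downloaded files; change this to your own folder.
DATA_DIR = Path("data")

# Dataset file names from the download step.
DATASETS = ["mbti_cleaned.csv", "MBTI500.csv"]


def find_missing(data_dir, names=DATASETS):
    """Return the dataset file names that are not present in data_dir."""
    return [name for name in names if not (Path(data_dir) / name).is_file()]


if __name__ == "__main__":
    for name in find_missing(DATA_DIR):
        print(f"Not found: {DATA_DIR / name}")
```

Once nothing is reported missing, a path such as DATA_DIR / "mbti_cleaned.csv" can be passed to pd.read_csv in the notebook.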
The last step is the prediction: run the function preprocessed_text, put the sentence you want to classify into the variable trial_sentence, and then run all the cells below it. The output is the letters of the predicted MBTI type.
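To make the prediction step more concrete, here is a minimal sketch of what text preprocessing and letter assembly could look like. It is not the notebook's code: preprocess_sketch and combine_letters are hypothetical names, the tiny stopword set stands in for nltk's stopword list, and the idea that one letter is predicted per dimension and then joined is an assumption based on the description above.

```python
import re

# Small illustrative stopword set; the project lists nltk's stopwords instead.
STOPWORDS = {"a", "an", "the", "and", "or", "i", "to", "of", "is", "it"}

# Assumed dimension order: one letter is chosen from each pair.
DIMENSIONS = [("I", "E"), ("N", "S"), ("F", "T"), ("J", "P")]


def preprocess_sketch(text):
    """Lowercase, drop URLs and non-letters, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def combine_letters(predictions):
    """Join one predicted letter per dimension into a four-letter type."""
    if len(predictions) != len(DIMENSIONS):
        raise ValueError("expected exactly four letters")
    for letter, pair in zip(predictions, DIMENSIONS):
        if letter not in pair:
            raise ValueError(f"{letter!r} is not one of {pair}")
    return "".join(predictions)


trial_sentence = "I love reading about the history of science: https://example.com"
cleaned = preprocess_sketch(trial_sentence)
```

In the notebooks the cleaned sentence would be passed to the trained models; here combine_letters(["I", "N", "T", "J"]) simply shows how four per-dimension letters would form a type string.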

File Breakdown (images.docx):

wordcloud - creates a word cloud for all the words

wordcloud_removed - creates a word cloud with certain common words removed due to redundancy (removed words: think, like, one, people, know)

topTen_bar - top ten words in every MBTI type
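The counting behind a top-ten chart, with the redundant words filtered out, can be sketched with the standard library alone. This is an illustration under assumptions: top_words is a hypothetical helper, words are split on whitespace only, and the project's own notebooks may compute the counts differently.

```python
from collections import Counter

# Words listed above as removed from the second word cloud.
REMOVED_WORDS = {"think", "like", "one", "people", "know"}


def top_words(posts, n=10, exclude=frozenset()):
    """Count whitespace-separated words across posts and return the n most common."""
    counts = Counter(
        word
        for post in posts
        for word in post.lower().split()
        if word not in exclude
    )
    return counts.most_common(n)
```

Calling top_words(posts_for_one_type, exclude=REMOVED_WORDS) on the posts of a single MBTI type would give the (word, count) pairs that a bar chart for that type could plot.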

Folder Breakdown:

datasets_types - mbti_cleaned.csv split into the corresponding MBTI types

datasets_letters - mbti_cleaned.csv split into the corresponding MBTI dimensions (E, I, N, S, F, T, P, J)

big_datasets_types - cleaned MBTI500.csv split into the corresponding MBTI types

big_datasets_letters - MBTI500.csv split into the corresponding MBTI dimensions (E, I, N, S, F, T, P, J)
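The two kinds of split described above (by full type and by individual letter) can be illustrated as follows. This is a sketch, not the code that produced the folders: split_rows is a hypothetical helper, and the assumption that each row is a (type, text) pair is made only for the example.

```python
from collections import defaultdict


def split_rows(rows):
    """Group (mbti_type, text) rows by full type and by each letter in the type."""
    by_type = defaultdict(list)
    by_letter = defaultdict(list)
    for mbti_type, text in rows:
        mbti_type = mbti_type.upper()
        by_type[mbti_type].append(text)
        for letter in mbti_type:  # e.g. "INTJ" contributes to I, N, T and J
            by_letter[letter].append(text)
    return dict(by_type), dict(by_letter)
```

Each row lands in exactly one type group but in four letter groups, which is why the letter-based folders overlap with one another while the type-based folders do not.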

Packages Used:

For Data Cleaning and Analyzing:

pandas as pd

re

nltk: to install stopwords and lemmatizing functionality

nltk.corpus import stopwords

nltk.stem import WordNetLemmatizer

numpy as np

For Feature Extraction:

sklearn.pipeline import Pipeline

sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

For Machine Learning Algorithms:

sklearn.model_selection import train_test_split, GridSearchCV

sklearn.naive_bayes import MultinomialNB

sklearn import metrics

sklearn.linear_model import LogisticRegression

sklearn.metrics import accuracy_score, classification_report, confusion_matrix, r2_score, mean_squared_error

torch

math

torchtext.data.utils import get_tokenizer

torchtext.vocab import build_vocab_from_iterator

torch.utils.data import DataLoader

torch import nn

time

torch.utils.data.dataset import random_split

torchtext.data.functional import to_map_style_dataset

For Data Visualization:

wordcloud

matplotlib.pyplot as plt

seaborn as sns
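To show how the feature extraction and machine learning packages listed above can fit together, here is a minimal sketch that chains CountVectorizer, TfidfTransformer and MultinomialNB in a Pipeline. The four toy posts and their labels are made up for illustration, and the choice to classify a single dimension (I vs E) is an assumption; the notebooks may configure and train their models differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy posts labeled on one dimension (I vs E), for illustration only.
texts = [
    "quiet evening reading alone at home",
    "reading books alone in my room",
    "party with friends all night",
    "meeting new friends at the party",
]
labels = ["I", "I", "E", "E"]

model = Pipeline([
    ("counts", CountVectorizer()),   # raw token counts per post
    ("tfidf", TfidfTransformer()),   # reweight counts by TF-IDF
    ("clf", MultinomialNB()),        # Naive Bayes classifier on the weighted counts
])
model.fit(texts, labels)

# Predict the letter for a new, unseen sentence.
prediction = model.predict(["reading alone tonight"])[0]
```

Wrapping the steps in a Pipeline means the same vectorizer vocabulary is reused at prediction time, and the whole object can be passed to GridSearchCV when tuning parameters.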
