# Overview
We found four datasets on Kaggle that cover different aspects of the MBTI test:
1. How the MBTI types are distributed around the world: [MBTI-TYPES Data](https://www.kaggle.com/datasets/yamaerenay/mbtitypes-full)
2. Posts by people of different MBTI types: [(MBTI) Myers-Briggs Personality Type Dataset](https://www.kaggle.com/datasets/datasnaek/mbti-type)
3. MBTI types of movie characters: [Movie Character MBTI Dataset](https://www.kaggle.com/datasets/subinium/movie-character-mbti-dataset)
4. MBTI types and birthdays: [MBTI and Birthdays](https://www.kaggle.com/datasets/dakotagravitt/mbti-and-birthdays)

First, we explore each dataset to check whether it is suitable for visualization.

In [1]:
import pandas as pd
import plotly.express as px

# Distribution dataset
This dataset contains 2 files:

1. countries.csv: the distribution of MBTI types in 158 countries
2. types.csv: contains the following information:
    - Type: type (as index)
    - Description: a to-the-point description of the type
    - Nickname: nickname associated with the personality
    - Definition: a formal definition of the type
    - Celebrities: celebrities of that type
    - E/I/T/F/S/N/J/P: single-letter abbreviations of traits such as "extroverted", "sensing", and "judging"
    - and other features such as strengths and weaknesses, romantic relationships…

We are only interested in the first file, countries.csv.

In [2]:
distribution_df = pd.read_csv('distribution/countries.csv')
distribution_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 33 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Country  158 non-null    object 
 1   ESTJ-A   158 non-null    float64
 2   ESFJ-A   158 non-null    float64
 3   INFP-T   158 non-null    float64
 4   ESFJ-T   158 non-null    float64
 5   ENFP-T   158 non-null    float64
 6   ENFP-A   158 non-null    float64
 7   ESTJ-T   158 non-null    float64
 8   ISFJ-T   158 non-null    float64
 9   ENFJ-A   158 non-null    float64
 10  ESTP-A   158 non-null    float64
 11  ISTJ-A   158 non-null    float64
 12  INTP-T   158 non-null    float64
 13  INFJ-T   158 non-null    float64
 14  ISFP-T   158 non-null    float64
 15  ENTJ-A   158 non-null    float64
 16  ESTP-T   158 non-null    float64
 17  ISTJ-T   158 non-null    float64
 18  ESFP-T   158 non-null    float64
 19  ENTP-A   158 non-null    float64
 20  ESFP-A   158 non-null    float64
 21  INTJ-T   158 non

In [8]:
# Find the most common type in each country
distribution_df_common = distribution_df.set_index('Country')
distribution_df_common['common'] = distribution_df_common.idxmax(axis="columns")
distribution_df_common = distribution_df_common.reset_index(level=0)

fig = px.choropleth(distribution_df_common, locations="Country", color="common", locationmode="country names", projection='natural earth')
fig.show()
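To make the `idxmax` step above easier to check, here is a minimal sketch on toy data. The country names and shares are made up for illustration and are not taken from countries.csv; only the column layout (one `Country` column plus one column per type) mirrors the real file.

```python
import pandas as pd

# Toy stand-in for countries.csv: the values are invented for illustration.
toy_df = pd.DataFrame({
    "Country": ["A", "B"],
    "ESTJ-A": [0.10, 0.05],
    "INFP-T": [0.07, 0.12],
    "ENFP-T": [0.09, 0.08],
})

# With Country as the index, every remaining column is a numeric share.
shares = toy_df.set_index("Country")

# idxmax over the columns returns, per row, the label of the largest value.
most_common = shares.idxmax(axis="columns").rename("common").reset_index()
print(most_common)
```

Keeping the result in its own two-column frame leaves the share columns untouched, which may be convenient if the same frame is reused for other plots.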

In [6]:
# Show proportion of each MBTI type in each country
distribution_df_melt = distribution_df.melt(id_vars=['Country'])

fig = px.line_polar(distribution_df_melt, r="value", theta="variable", line_close=True, animation_frame='Country')
fig.show()
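The `melt` call above reshapes the table from wide (one column per type) to long (one row per country and type). A small sketch on invented data shows the resulting shape; the names `mbti_type` and `share` are our own choice here, whereas the cell above keeps the pandas defaults `variable` and `value`.

```python
import pandas as pd

# Toy stand-in for countries.csv: the values are invented for illustration.
toy_df = pd.DataFrame({
    "Country": ["A", "B"],
    "ESTJ-A": [0.10, 0.05],
    "INFP-T": [0.07, 0.12],
})

# Every non-id column becomes a (type, share) pair for each country.
long_df = toy_df.melt(id_vars=["Country"], var_name="mbti_type", value_name="share")
print(long_df)
```

Two countries and two types give four rows, so the real file with 158 countries and 32 type columns should yield 158 × 32 = 5056 rows.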

# Post dataset

# Movie dataset

In [19]:
movie_df = pd.read_csv('movie/mbti.csv')
movie_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18741 entries, 0 to 18740
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   stat       18476 non-null  object
 1   mbti       18741 non-null  object
 2   enneagram  12066 non-null  object
 3   role       18741 non-null  object
 4   movie      18741 non-null  object
 5   img_url    18741 non-null  object
dtypes: object(6)
memory usage: 878.6+ KB
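According to the `info()` output above, `stat` is missing in 18741 - 18476 = 265 rows and `enneagram` in 18741 - 12066 = 6675 rows. One possible way to get these counts directly is `isna()`, sketched here on a toy frame whose values are placeholders and not real rows from mbti.csv.

```python
import pandas as pd

# Toy frame with placeholder values; only the column names follow mbti.csv.
toy_movie_df = pd.DataFrame({
    "stat": ["x", None, "y"],
    "mbti": ["INTJ", "ENFP", "ISTP"],
    "enneagram": [None, None, "z"],
})

# Number and share of missing values per column.
missing_counts = toy_movie_df.isna().sum()
missing_share = toy_movie_df.isna().mean()
print(missing_counts)
```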


In [69]:
movie_df['movie'].unique()

array(['   Marvel Cinematic Universe', '   Star Wars',
       '   Harry Potter (franchise)', ..., '   The 6th Day (2000)',
       '   The Switch (2010)', '   Dignitate (2020)'], dtype=object)
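The titles in the output above start with three spaces. Substring matching still works with that padding, but exact comparisons and plot labels would probably benefit from stripping it first. A minimal sketch, using two of the titles shown above and made-up role names:

```python
import pandas as pd

# Titles copied from the output above; the role names are placeholders.
toy_movie_df = pd.DataFrame({
    "movie": ["   Marvel Cinematic Universe", "   Star Wars", "   Marvel Cinematic Universe"],
    "role": ["Role 1", "Role 2", "Role 3"],
})

# Remove leading and trailing whitespace, then count the rows per title.
toy_movie_df["movie"] = toy_movie_df["movie"].str.strip()
counts = toy_movie_df["movie"].value_counts()
print(counts)
```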

In [73]:
# Show the characters of each MBTI type in the Marvel movies
# .copy() makes the subset independent of movie_df, so adding a column does not trigger a chained-assignment warning
marvel_characters = movie_df[movie_df['movie'].str.contains('Marvel')].drop_duplicates(subset=['role']).copy()
marvel_characters['value'] = 1  # each character counts once in the sunburst
marvel_characters = marvel_characters[['mbti', 'role', 'value']]

fig = px.sunburst(marvel_characters, path=['mbti', 'role'], values='value')
fig.show()

In [71]:
# Show the characters of each MBTI type in the Harry Potter movies
# .copy() makes the subset independent of movie_df, so adding a column does not trigger a chained-assignment warning
hp_characters = movie_df[movie_df['movie'].str.contains('Harry Potter')].drop_duplicates(subset=['role']).copy()
hp_characters['value'] = 1  # each character counts once in the sunburst
hp_characters = hp_characters[['mbti', 'role', 'value']]

fig = px.sunburst(hp_characters, path=['mbti', 'role'], values='value')
fig.show()
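The two cells above differ only in the title they search for, so the filtering could be factored into a helper. The sketch below is one possible version; `characters_by_type` is a hypothetical name of our own, and the toy rows are invented (only the two franchise titles come from the output shown earlier).

```python
import pandas as pd

def characters_by_type(df, title_fragment):
    """Return one row per unique role for movies whose title contains title_fragment."""
    # regex=False treats the fragment literally; na=False skips missing titles.
    mask = df["movie"].str.contains(title_fragment, regex=False, na=False)
    subset = df.loc[mask, ["mbti", "role"]].drop_duplicates(subset=["role"]).copy()
    subset["value"] = 1  # each character counts once in a sunburst
    return subset

# Toy rows with placeholder roles and types, not real entries from mbti.csv.
toy_movie_df = pd.DataFrame({
    "movie": ["   Marvel Cinematic Universe", "   Marvel Cinematic Universe",
              "   Harry Potter (franchise)", "   Marvel Cinematic Universe"],
    "role": ["Role A", "Role B", "Role C", "Role A"],
    "mbti": ["ENTP", "ISFJ", "INFJ", "ENTP"],
})

marvel_toy = characters_by_type(toy_movie_df, "Marvel")
hp_toy = characters_by_type(toy_movie_df, "Harry Potter")
print(marvel_toy)
```

The returned frame has the `mbti`, `role`, and `value` columns that the `px.sunburst` calls above expect, so it could be passed to them in the same way.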

# Birthday dataset