# Title: What is the Myers–Briggs Type Indicator (MBTI) Associated with the Type of Music?

Link to the dataset: https://www.kaggle.com/datasets/xtrnglc/spotify-mbti-playlists

In [1]:
library(repr)
library(tidyverse)
library(tidymodels)
options(repr.matrix.max.rows = 6)


### Introduction:
The Myers-Briggs Type Indicator (MBTI), based on the theory of psychological types described by C. G. Jung, describes personality along four axes, giving sixteen distinct type combinations (The Myers & Briggs Foundation). Research suggests that musical taste is related to personality, but there is little evidence on how well personality type can be predicted from musical features. This study attempts to classify the MBTI type associated with a playlist from its musical characteristics using a K-nearest neighbor classification model. The dataset, titled "Spotify MBTI Playlists", contains aggregated information on eleven different musical features, with one Spotify playlist per row.

### Preliminary exploratory data analysis:

The dataset can be read directly from a CSV file and is already tidy. However, to reduce the number of predictor variables, the per-key counts are combined into a single major column and a single minor column (see Methods for more information).

In [2]:
# Assumes combine_mbti_df.csv has been downloaded into the working directory
spotify_mbti <- read_csv("combine_mbti_df.csv") |>
    mutate(mbti = as_factor(mbti))
spotify_mbti


In [None]:
# Drop function_pair and collapse the per-key counts into major_count and minor_count
spotify_mbti_cleaned <- spotify_mbti |>
    select(-function_pair) |>
    mutate(major_count = CMajor_count + `C#/DbMajor_count` + DMajor_count +
               `D#_EbMajor_count` + EMajor_count + FMajor_count +
               `F#/GbMajor_count` + GMajor_count + `G#/AbMajor_count` +
               AMajor_count + `A#/BbMajor_count` + BMajor_count,
           minor_count = Cminor_count + `C#/Dbminor_count` + Dminor_count +
               `D#_Ebminor_count` + Eminor_count + Fminor_count +
               `F#/Gbminor_count` + Gminor_count + `G#/Abminor_count` +
               Aminor_count + `A#/Bbminor_count` + Bminor_count) |>
    select(mbti:instrumentalness_stdev, major_count, minor_count)
spotify_mbti_cleaned
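
The key aggregation could also be written without listing every column by hand. The following is a minimal sketch, assuming that every major-key column name ends in `Major_count` and every minor-key column name ends in `minor_count`, and that no other column shares those suffixes. The `toy` tibble and its values are invented purely for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the playlist data (values invented for illustration).
toy <- tibble(
  mbti = c("INFP", "ESTJ"),
  CMajor_count = c(3, 1),
  `C#/DbMajor_count` = c(2, 0),
  Aminor_count = c(4, 2),
  `F#/Gbminor_count` = c(1, 5)
)

# ends_with() ignores case by default, so ignore.case = FALSE keeps the
# "Major" and "minor" suffixes from matching each other's helper columns.
toy_keys <- toy |>
  mutate(
    major_count = rowSums(across(ends_with("Major_count", ignore.case = FALSE))),
    minor_count = rowSums(across(ends_with("minor_count", ignore.case = FALSE)))
  ) |>
  select(mbti, major_count, minor_count)
toy_keys
```

Selecting by suffix is shorter and less prone to a missed column, but it depends on the naming assumption above holding for the real dataset.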

In [None]:
glimpse(spotify_mbti_cleaned)

In [None]:
set.seed(1)

spotify_split <- initial_split(spotify_mbti_cleaned, prop = 0.75, strata = mbti)  
spotify_mbti_train <- training(spotify_split)   
spotify_mbti_test <- testing(spotify_split)

Using only the training data, we summarize the number of observations (playlists) in each MBTI class.

In [None]:
observation_count <- spotify_mbti_train |>
    group_by(mbti) |>
    summarize(count = n())
observation_count
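
A training-set summary could also report the mean of each predictor per class and the number of rows with missing data. The sketch below shows one possible way to do this. The `toy_train` tibble is a hypothetical stand-in with invented values, and only two of the predictor columns are included.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the training set (values invented for illustration).
toy_train <- tibble(
  mbti = factor(c("INFP", "INFP", "ESTJ", "ESTJ")),
  danceability_mean = c(0.50, 0.60, 0.70, NA),
  energy_mean = c(0.40, 0.60, 0.80, 0.60)
)

# Per-class counts and predictor means (missing values are ignored in the means).
class_summary <- toy_train |>
  group_by(mbti) |>
  summarize(
    count = n(),
    across(c(danceability_mean, energy_mean), \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
  )
class_summary

# Number of rows with at least one missing value.
rows_with_na <- sum(!complete.cases(toy_train))
rows_with_na
```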

Using only the training data, we visualize the relationship between mean danceability and mean energy, coloured by MBTI type.

In [None]:
visualization <- spotify_mbti_train |>  
    ggplot(aes(x = danceability_mean, 
               y = energy_mean, 
               colour = mbti)) +
        labs(x = "Danceability",
             y = "Energy",
             colour = "MBTI") +
        geom_point() +
        theme(text = element_text(size = 20))
visualization

### Methods:
#### Explain how you will conduct your data analysis and which variables/columns you will use. Note - you do not need to use all variables/columns that exist in the raw data set. In fact, that's often not a good idea. For each variable think: is this a useful variable for prediction?
We will use K-nearest neighbor classification to predict MBTI type from the audio features of Spotify playlists. We will use all variables from danceability_mean to instrumentalness_stdev, dropping function_pair because it is redundant. To reduce the number of predictor variables, we will sum all the major-key counts into one column and all the minor-key counts into another.
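
A K-nearest neighbor analysis of this kind might be set up roughly as follows. This is only a sketch under stated assumptions: the proposal does not say how predictors will be scaled or how the number of neighbors will be chosen, so the standardization steps, the 5-fold cross-validation, and the grid of candidate values are illustrative choices. The `toy_train` data are synthetic, and the `kknn` package is assumed to be installed.

```r
library(tidymodels)  # the kknn package must also be installed for the "kknn" engine

set.seed(1)

# Synthetic stand-in for the training data (invented values, two classes only).
toy_train <- tibble(
  mbti = factor(rep(c("INFP", "ESTJ"), each = 30)),
  danceability_mean = c(rnorm(30, 0.50, 0.05), rnorm(30, 0.70, 0.05)),
  energy_mean = c(rnorm(30, 0.45, 0.05), rnorm(30, 0.75, 0.05))
)

# Standardize predictors so that no single feature dominates the distance.
knn_recipe <- recipe(mbti ~ ., data = toy_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

# K-nearest neighbors with the number of neighbors left to be tuned.
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

# 5-fold cross-validation over a small, hypothetical grid of k values.
folds <- vfold_cv(toy_train, v = 5, strata = mbti)
k_grid <- tibble(neighbors = c(1, 3, 5, 7))

knn_results <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  tune_grid(resamples = folds, grid = k_grid) |>
  collect_metrics() |>
  filter(.metric == "accuracy") |>
  arrange(desc(mean))
knn_results
```

The first row of `knn_results` gives the candidate number of neighbors with the highest cross-validated accuracy on the toy data.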

#### Describe at least one way that you will visualize the results.
One way we will visualize the results is to summarize the confusion matrix as a bar plot showing, for each MBTI type, the number of correctly classified playlists next to the number of incorrectly classified playlists.
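
A bar plot of correct versus incorrect predictions could be built along these lines. This is a sketch only: `toy_preds` holds invented predictions for three MBTI types, whereas in the actual analysis the predictions would come from applying the fitted model to the test set.

```r
library(tidymodels)

# Toy predictions (invented values for illustration).
mbti_levels <- c("ENTP", "ESTJ", "INFP")
toy_preds <- tibble(
  mbti = factor(c("INFP", "INFP", "INFP", "ESTJ", "ESTJ", "ENTP"),
                levels = mbti_levels),
  .pred_class = factor(c("INFP", "ESTJ", "INFP", "ESTJ", "ESTJ", "INFP"),
                       levels = mbti_levels)
)

# Full confusion matrix from yardstick, for reference.
confusion <- conf_mat(toy_preds, truth = mbti, estimate = .pred_class)

# Count correct and incorrect predictions for each true MBTI type.
outcome_counts <- toy_preds |>
  mutate(outcome = if_else(mbti == .pred_class, "Correct", "Incorrect")) |>
  count(mbti, outcome)

confusion_plot <- ggplot(outcome_counts, aes(x = mbti, y = n, fill = outcome)) +
  geom_col(position = "dodge") +
  labs(x = "True MBTI type", y = "Number of playlists", fill = "Prediction")
confusion_plot
```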

### Expected outcomes and significance:
##### What do you expect to find?

It is difficult to predict with confidence what we will find by analyzing music types and MBTI because music preference is complex and highly individual. However, we may be able to find correlations between certain MBTI types and specific musical factors, such as energy level, danceability, key, and loudness. Using these trends, we hope to predict an individual’s MBTI type from the features of the music they enjoy.

##### What impact could such findings have?

Although the MBTI system is controversial in the psychology community because it cannot fully represent the intricacies of human personality, our project may prove useful in developing a general understanding of the relationship between music and personality. This knowledge may be valuable in many areas, including but not limited to:

* Personal development: Individuals may use a better understanding of the relationship between their personality and music types to make well-informed decisions about the music they choose, or to explore new types of music that may better align with their personality type.
* Advertisement: A better understanding of the relationship between personality and music type may allow companies to design better-targeted marketing strategies, such as personalized advertisements for musical merchandise or specially made playlists.
* Music recommendation algorithms: Streaming platforms may be able to follow personality and music trends to suggest music more suitable to the listener’s personality type.

##### What future questions could this lead to?

There are several questions that our project could lead to, such as:
* Can musical preferences be used to suggest personality traits beyond MBTI types?
* Is there a difference in musical preference depending on mood/situation?
* How do environmental factors influence the relationship between music preferences and personality types?

Sources:
* https://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/
* https://www.verywellmind.com/music-and-personality-2795424