## Introduction

The exact definition of a music genre is often hard to pin down, and people tend to classify songs by 'feel' and loose characteristics. Is there a more concrete and systematic way to perform this classification? This project aims to build a model that, given a song's numeric characteristics, predicts its genre. We use an open-source dataset from Kaggle that contains song characteristics such as duration, danceability, popularity, and acousticness. The dataset was last updated in December 2021.

Question: How are the numerical characteristics of a song related to its genre?

## Methods & Results

In [3]:
### Loading Libraries into R
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
source('tests.R')
source('cleanup.R')

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

“package ‘ggplot2’ was built under R version 4.0.1”
“package ‘tibble’ was built under R version 4.0.2”
“package ‘tidyr’ was built under R version 4.0.2”
“package ‘dplyr’ was built under R version 4.0.2”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

“package ‘tidymodels’ was built under R version 4.0.2”
── Attaching packages ────────────────────────────────────── tidymodels 0.1.1 ──

ERROR: Error in file(filename, "r", encoding = encoding): cannot open the connection


In [None]:
### Read the data from a local copy of the Kaggle CSV file
data <- read_csv(file = "data/music_genre.csv")
data

In [None]:
### Remove the columns we don't need, then convert column types
data_select <- data %>%
            select(-instance_id, -key, -obtained_date)
### mode and music_genre become factors; tempo becomes numeric
data_factor <- mutate(data_select,
                  mode = as.factor(mode),
                  music_genre = as.factor(music_genre),
                  tempo = as.double(tempo))
data_factor
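One thing to watch in the conversion step above: `as.double()` turns any entry that is not a valid number into `NA` and emits a coercion warning. The following is a minimal sketch on hypothetical toy data, assuming a placeholder string such as `"?"` appears in a text column; whether the real `tempo` column contains such values is an assumption that should be checked against the data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: "?" stands in for a non-numeric placeholder value
toy <- tibble(
  tempo = c("120.5", "?", "98.0"),
  mode  = c("Major", "Minor", "Major")
)

toy_clean <- toy %>%
  mutate(
    # suppressWarnings() hides the "NAs introduced by coercion" warning
    tempo = suppressWarnings(as.double(tempo)),
    mode  = as.factor(mode)
  )

# Count how many missing values the conversion produced
n_missing <- sum(is.na(toy_clean$tempo))
n_missing
```

Counting the `NA` values right after the conversion makes it easier to decide whether to drop or impute those rows before modelling.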

In [None]:
### Split the data into a training set (75%) and a testing set (25%), stratified by genre
data_split <- initial_split(data_factor, prop = 0.75, strata = music_genre)
data_train <- training(data_split)
data_test <- testing(data_split)
data_train
data_test
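Because `initial_split()` samples rows at random, the split changes on every run unless a seed is set first. The sketch below uses hypothetical toy data and an arbitrary seed value (2021 is an assumption, not a value from this project) to show that fixing the seed reproduces the same training set, and how to check the genre shares after a stratified split.

```r
library(dplyr)
library(tibble)
library(rsample)

# Hypothetical toy data: two genres with 200 songs each
toy <- tibble(
  music_genre = factor(rep(c("Jazz", "Rock"), each = 200)),
  energy = seq(0, 1, length.out = 400)
)

# Helper (hypothetical name): set the seed, then make a stratified 75/25 split
split_once <- function(data, seed) {
  set.seed(seed)
  initial_split(data, prop = 0.75, strata = music_genre)
}

train_a <- training(split_once(toy, seed = 2021))
train_b <- training(split_once(toy, seed = 2021))

# Same seed, same rows in the training set
same_split <- identical(train_a, train_b)
same_split

# Share of each genre in the training set
genre_share <- train_a %>%
  count(music_genre) %>%
  mutate(share = n / sum(n))
genre_share
```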

In [None]:
### Table for each Genre
Electronic <- data_train %>%
                filter(music_genre == "Electronic") %>%
                group_by(music_genre) %>%
                summarize(Electronic_popularity = mean(popularity, na.rm = TRUE),
                         Electronic_acousticness = mean(acousticness, na.rm = TRUE),
                         Electronic_danceability = mean(danceability, na.rm = TRUE),
                         Electronic_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Electronic_energy = mean(energy, na.rm = TRUE),
                         Electronic_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Electronic_liveness = mean(liveness, na.rm = TRUE),
                         Electronic_loudness = mean(loudness, na.rm = TRUE),
                         Electronic_speechiness = mean(speechiness, na.rm = TRUE),
                         Electronic_tempo = mean(tempo, na.rm = TRUE),
                         Electronic_valence = mean(valence, na.rm = TRUE))
Electronic

Anime <- data_train %>%
                filter(music_genre == "Anime") %>%
                group_by(music_genre) %>%
                summarize(Anime_popularity = mean(popularity, na.rm = TRUE),
                         Anime_acousticness = mean(acousticness, na.rm = TRUE),
                         Anime_danceability = mean(danceability, na.rm = TRUE),
                         Anime_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Anime_energy = mean(energy, na.rm = TRUE),
                         Anime_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Anime_liveness = mean(liveness, na.rm = TRUE),
                         Anime_loudness = mean(loudness, na.rm = TRUE),
                         Anime_speechiness = mean(speechiness, na.rm = TRUE),
                         Anime_tempo = mean(tempo, na.rm = TRUE),
                         Anime_valence = mean(valence, na.rm = TRUE))
Anime

Jazz <- data_train %>%
                filter(music_genre == "Jazz") %>%
                group_by(music_genre) %>%
                summarize(Jazz_popularity = mean(popularity, na.rm = TRUE),
                         Jazz_acousticness = mean(acousticness, na.rm = TRUE),
                         Jazz_danceability = mean(danceability, na.rm = TRUE),
                         Jazz_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Jazz_energy = mean(energy, na.rm = TRUE),
                         Jazz_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Jazz_liveness = mean(liveness, na.rm = TRUE),
                         Jazz_loudness = mean(loudness, na.rm = TRUE),
                         Jazz_speechiness = mean(speechiness, na.rm = TRUE),
                         Jazz_tempo = mean(tempo, na.rm = TRUE),
                         Jazz_valence = mean(valence, na.rm = TRUE))
Jazz

Alternative <- data_train %>%
                filter(music_genre == "Alternative") %>%
                group_by(music_genre) %>%
                summarize(Alternative_popularity = mean(popularity, na.rm = TRUE),
                         Alternative_acousticness = mean(acousticness, na.rm = TRUE),
                         Alternative_danceability = mean(danceability, na.rm = TRUE),
                         Alternative_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Alternative_energy = mean(energy, na.rm = TRUE),
                         Alternative_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Alternative_liveness = mean(liveness, na.rm = TRUE),
                         Alternative_loudness = mean(loudness, na.rm = TRUE),
                         Alternative_speechiness = mean(speechiness, na.rm = TRUE),
                         Alternative_tempo = mean(tempo, na.rm = TRUE),
                         Alternative_valence = mean(valence, na.rm = TRUE))
Alternative

Country <- data_train %>%
                filter(music_genre == "Country") %>%
                group_by(music_genre) %>%
                summarize(Country_popularity = mean(popularity, na.rm = TRUE),
                         Country_acousticness = mean(acousticness, na.rm = TRUE),
                         Country_danceability = mean(danceability, na.rm = TRUE),
                         Country_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Country_energy = mean(energy, na.rm = TRUE),
                         Country_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Country_liveness = mean(liveness, na.rm = TRUE),
                         Country_loudness = mean(loudness, na.rm = TRUE),
                         Country_speechiness = mean(speechiness, na.rm = TRUE),
                         Country_tempo = mean(tempo, na.rm = TRUE),
                         Country_valence = mean(valence, na.rm = TRUE))
Country

Rap <- data_train %>%
                filter(music_genre == "Rap") %>%
                group_by(music_genre) %>%
                summarize(Rap_popularity = mean(popularity, na.rm = TRUE),
                         Rap_acousticness = mean(acousticness, na.rm = TRUE),
                         Rap_danceability = mean(danceability, na.rm = TRUE),
                         Rap_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Rap_energy = mean(energy, na.rm = TRUE),
                         Rap_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Rap_liveness = mean(liveness, na.rm = TRUE),
                         Rap_loudness = mean(loudness, na.rm = TRUE),
                         Rap_speechiness = mean(speechiness, na.rm = TRUE),
                         Rap_tempo = mean(tempo, na.rm = TRUE),
                         Rap_valence = mean(valence, na.rm = TRUE))
Rap

Blues <- data_train %>%
                filter(music_genre == "Blues") %>%
                group_by(music_genre) %>%
                summarize(Blues_popularity = mean(popularity, na.rm = TRUE),
                         Blues_acousticness = mean(acousticness, na.rm = TRUE),
                         Blues_danceability = mean(danceability, na.rm = TRUE),
                         Blues_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Blues_energy = mean(energy, na.rm = TRUE),
                         Blues_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Blues_liveness = mean(liveness, na.rm = TRUE),
                         Blues_loudness = mean(loudness, na.rm = TRUE),
                         Blues_speechiness = mean(speechiness, na.rm = TRUE),
                         Blues_tempo = mean(tempo, na.rm = TRUE),
                         Blues_valence = mean(valence, na.rm = TRUE))
Blues

Rock <- data_train %>%
                filter(music_genre == "Rock") %>%
                group_by(music_genre) %>%
                summarize(Rock_popularity = mean(popularity, na.rm = TRUE),
                         Rock_acousticness = mean(acousticness, na.rm = TRUE),
                         Rock_danceability = mean(danceability, na.rm = TRUE),
                         Rock_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Rock_energy = mean(energy, na.rm = TRUE),
                         Rock_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Rock_liveness = mean(liveness, na.rm = TRUE),
                         Rock_loudness = mean(loudness, na.rm = TRUE),
                         Rock_speechiness = mean(speechiness, na.rm = TRUE),
                         Rock_tempo = mean(tempo, na.rm = TRUE),
                         Rock_valence = mean(valence, na.rm = TRUE))
Rock

Classical <- data_train %>%
                filter(music_genre == "Classical") %>%
                group_by(music_genre) %>%
                summarize(Classical_popularity = mean(popularity, na.rm = TRUE),
                         Classical_acousticness = mean(acousticness, na.rm = TRUE),
                         Classical_danceability = mean(danceability, na.rm = TRUE),
                         Classical_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Classical_energy = mean(energy, na.rm = TRUE),
                         Classical_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Classical_liveness = mean(liveness, na.rm = TRUE),
                         Classical_loudness = mean(loudness, na.rm = TRUE),
                         Classical_speechiness = mean(speechiness, na.rm = TRUE),
                         Classical_tempo = mean(tempo, na.rm = TRUE),
                         Classical_valence = mean(valence, na.rm = TRUE))
Classical

Hip_Hop <- data_train %>%
                filter(music_genre == "Hip-Hop") %>%
                group_by(music_genre) %>%
                summarize(Hip_Hop_popularity = mean(popularity, na.rm = TRUE),
                         Hip_Hop_acousticness = mean(acousticness, na.rm = TRUE),
                         Hip_Hop_danceability = mean(danceability, na.rm = TRUE),
                         Hip_Hop_duration_ms = mean(duration_ms, na.rm = TRUE),
                         Hip_Hop_energy = mean(energy, na.rm = TRUE),
                         Hip_Hop_instrumentalness = mean(instrumentalness, na.rm = TRUE),
                         Hip_Hop_liveness = mean(liveness, na.rm = TRUE),
                         Hip_Hop_loudness = mean(loudness, na.rm = TRUE),
                         Hip_Hop_speechiness = mean(speechiness, na.rm = TRUE),
                         Hip_Hop_tempo = mean(tempo, na.rm = TRUE),
                         Hip_Hop_valence = mean(valence, na.rm = TRUE))
Hip_Hop
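The per-genre tables above all compute the same summary, so one possible alternative is a single grouped call that returns one row per genre. This is a sketch on hypothetical toy data, not the notebook's own code; it assumes dplyr 1.0.0 or later for `across()`, and it yields one combined table rather than ten separate tables with prefixed column names.

```r
library(dplyr)
library(tibble)

# Hypothetical toy training data with one missing value
toy_train <- tibble(
  music_genre = factor(c("Jazz", "Jazz", "Rock", "Rock")),
  popularity  = c(40, 60, 70, NA),
  energy      = c(0.2, 0.4, 0.8, 0.6)
)

# Mean of every numeric column per genre; na.rm = TRUE skips missing values
genre_means <- toy_train %>%
  group_by(music_genre) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")
genre_means
```

A single table like this also makes it easier to compare genres side by side, because every genre shares the same columns.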

In [None]:
# Proportion of each music genre within each mode (major, minor)
mode_proportion_plot <- data_train %>%
    ggplot(aes(x = mode, fill = music_genre)) +
    geom_bar(position = "fill") +
    labs(x = "Mode", y = "Proportion of songs", fill = "Music genre") +
    ggtitle("Proportion of music genres by mode (major, minor)")
mode_proportion_plot
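A bar chart with `position = "fill"` rescales each bar to a total of 1, so every segment shows a within-mode proportion. A minimal sketch of the same numbers as a table, using hypothetical toy data, could look like this:

```r
library(dplyr)
library(tibble)

# Hypothetical toy training data
toy_train <- tibble(
  mode        = factor(c("Major", "Major", "Major", "Minor", "Minor")),
  music_genre = factor(c("Jazz", "Rock", "Rock", "Jazz", "Jazz"))
)

# Count songs per (mode, genre) pair, then divide by the total for that mode
mode_props <- toy_train %>%
  count(mode, music_genre) %>%
  group_by(mode) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
mode_props
```

Reading the table alongside the plot helps when several genres have similar segment heights that are hard to tell apart by eye.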

## Discussion

<ul>
    <li>Summarize what you found.</li>
    <li>Discuss whether this is what you expected to find.</li>
    <li>Discuss what impact such findings could have.</li>
    <li>Discuss what future questions this could lead to.</li>
</ul>
