Day12-Multivariate/Day12-Multivariate-Solutions.Rmd

---
title: "Day Twelve: Multivariate"
subtitle: "SDS 192: Introduction to Data Science"
author: |
  Lindsay Poirier<br/>
  <span style = 'font-size: 70%;'>
  [Statistical & Data Sciences](http://www.smith.edu/sds), Smith College<br/>
  </span>
date: |
  Spring 2022<br/>
output: pdf_document
---

# Introduction

The goal of this lab is to provide you with practice in producing data visualizations that help to answer a research question. Topics that will be covered include:

1. Practice producing and interpreting univariate plots.
2. Practice producing and interpreting multivariate plots.
3. Reordering a categorical axis
4. Practice labeling plots
5. Practice in visualization aesthetics.

Today, we are prioritizing **joy**! If you are a regular Spotify user (Option 1), today's research question will be: How joyful are my Spotify playlists? If you are not a regular Spotify user (Option 2), today's research question will be: How joyful are popular Spotify playlists in my favorite music genre?

The music feature from Spotify's data that serves as a measure of joy is called *valence*. This is the description from their API documentation for valence:

> A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

(Pretty vague if you ask me, but today we'll go with it.) 

# Step 1: Load packages

```{r setup}
#Load packates
library(tidyverse)
library(spotifyr)
```

# Step 2: Add your Spotify credentials

Copy client id and secret from your previous lab into the chunk below, and then run the code chunk.

```{r creds, include=FALSE}
id <- Sys.getenv("SPOTIFY_CLIENT_ID")
secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
Sys.setenv(SPOTIFY_CLIENT_ID = id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = secret)
access_token <- get_spotify_access_token()
```

# Step 3, Option 1: Analyze your own Spotify data!

1. Navigate to your SDS 192 project in the Spotify developer account you created on Monday.
2. Click Edit Settings. Under the heading **Redirect URIs** copy and paste this URL: https://localhost:1410/ Click Save. This is going to allow us to authenticate our Spotify accounts through our local computers. 
3. Below replace `FILL USER NAME HERE` with your Spotify username. This is the ID that appears in the upper right hand corner when you log into your Spotify account (not your developer account.) Run the code chunk below. You will be redirected a web browser window confirming authentication. Return back to RStudio and run this code chunk again to load the data. 

```{r getdata1, include=FALSE}
spotify_playlists <- get_user_audio_features(
  username = "poiril",
  authorization = get_spotify_access_token()
) %>%
  select(-c(images, 
            track.available_markets, 
            track.artists, 
            track.album.artists, 
            track.album.available_markets, 
            track.album.images))
```

This will create a data frame with the songs your user account has stored in Spotify playlists. 

# Step 3, Option 2: Analyze playlist Spotify data!

1. Navigate to Spotify.com and create an account.
2. Navigate to your SDS 192 project in the Spotify developer account you created on Monday.
3. Click Edit Settings. Under the heading **Redirect URIs** copy and paste this URL: https://localhost:1410/ Click Save. 
4. Below replace `FILL USER NAME HERE` with your Spotify username. This is the ID that appears in the upper right hand corner when you log into your Spotify account (not your developer account.)
5. Search Spotify for your favorite music genre and select three *playlists* from the search. Playlists may be a ways down in the search results. 
6. When you click on a playlist, notice the URL in the navigation bar of your web browser. It should look something like spotify.com/playlist/LONG_STRING_OF_CHARACTERS. Copy the long string of characters at the end of the URL, and paste it into the function below where it says `FILL LONG STRING OF CHARACTERS FROM URL`. Repeat this for the other two playlists.
7. Run the code chunk below. You will be redirected a web browser window confirming authentication. Return back to RStudio and run this code chunk again to load the data. 

```{r getdata2, include=FALSE}
spotify_playlists <- get_playlist_audio_features(
  username = "poiril",
  playlist_uris = c("6Zjz0tu37mciuxwASHLZWp",
                    "2tYBsYfEo7Lxi3CiVjI2L1",
                    "40NO1fdNj3ny6VseywtmZe"),
  authorization = get_spotify_access_token()
) %>%
  select(-c(track.artists, 
            track.available_markets, 
            track.album.artists, 
            track.album.available_markets, 
            track.album.images))
```

Today we will be mostly practicing plots you've already learned. However, we will learn one new skill - reordering a *categorical* axis based on numeric values. We will do this with the `reorder()` function. 

The `reorder()` function has three arguments:
* the vector of categorical values to be reordered,
* a vector that will serve as the basis for reordering, 
* and a function to determine how values will be reordered. 

So let's say I created the following grouped boxplots visualizing the distribution of energy across key names, and I wanted to reorder the categorical axis so that the key name with the highest median energy would appear first and lowest median energy would appear last (and all other medians ordered accordingly in between).

```{r}
spotify_playlists %>%
  ggplot(aes(x = key_name, y = energy)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Distribution of Song Energy per Key Name in Spotify Wedding Playlists, 2022",
       x = "Key Name", 
       y = "Energy")
```

I want to reorder my x-axis, so I will place the `reorder()` function around my x aesthetic, and assign the following arguments:

* the vector of categorical values to be reordered: `key_name`
* a vector that will serve as the basis for reordering: `energy`
* and a function to determine how values will be reordered: `median`

```{r}
spotify_playlists %>%
  ggplot(aes(x = reorder(key_name, energy, median), y = energy)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Distribution of Song Energy per Key Name in Spotify Wedding Playlists, 2022",
       x = "Key Name", 
       y = "Energy")
```

What about reordering based on the height of a bar plot? 

```{r}
spotify_playlists %>%
  ggplot(aes(x = key_name)) +
  geom_bar(color = "white") +
  coord_flip() +
  labs(title = "Count Song Key Modes in Three Spotify Wedding Playlists, 2022",
       x = "Key Mode", 
       y = "Count of Songs")
```

Notice how here we don't have a separate vector to serve as the basis for reordering. Actually, we want to reorder based on the `length` of the *same vector* we want reordered:

I want to reorder my x-axis, so I will place the `reorder()` function around my x aesthetic, and assign the following arguments:

* the vector of categorical values to be reordered: `key_name`
* a vector of that will serve as the basis for reordering: `key_name`
* and a function to determine how values will be reordered: `length`

```{r}
spotify_playlists %>%
  ggplot(aes(x = reorder(key_name, key_name, length))) +
  geom_bar(color = "white") +
  coord_flip() +
  labs(title = "Count Song Key Modes in Three Spotify Wedding Playlists, 2022",
       x = "Key Mode", 
       y = "Count of Songs")
```

# Step Four: How many songs are in each playlist? Create a plot to visualize this, and order the results by the number of songs. 

Be sure to give it a descriptive title and labels covering all 5 essential components of data context.

```{r plot1}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = reorder(playlist_name, playlist_name, length))) +
  geom_bar(color = "white") +
  coord_flip() +
  labs(title = "Count of Songs in Three Spotify Wedding Playlists, 2022",
       x = "Playlist Name", 
       y = "Count of Songs")
```

# Step Five: What is the distribution of valence across all of the songs (in intervals of 0.1 valence)? Create a plot to visualize this.

Be sure to give it a descriptive title and labels covering all 5 essential components of data context.

```{r plot2}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = valence)) +
  geom_histogram(binwidth = 0.1, color = "white") +
  coord_flip() +
  labs(title = "Distribution of Valence of Songs in Spotify Wedding Playlists, 2022",
       x = "Valence", 
       y = "Count of Songs")
```

# Step Six: What is the distribution of valence across all of the songs (in intervals of 0.1 valence) *in each playlist*? Create a plot to visualize this.

Be sure to give it a descriptive title and labels covering all 5 essential components of data context.

```{r plot3}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = valence)) +
  geom_histogram(binwidth = 0.1, color = "white") +
  coord_flip() +
  labs(title = "Distribution of Valence of Songs in Spotify Wedding Playlists, 2022",
       x = "Valence", 
       y = "Count of Songs") +
  facet_wrap(vars(playlist_name))

# OR, if you have fewer than 8 playlists...

spotify_playlists %>%
  ggplot(aes(x = valence, fill = playlist_name)) +
  geom_histogram(binwidth = 0.1) +
  coord_flip() +
  labs(title = "Distribution of Valence of Songs in Spotify Wedding Playlists, 2022",
       x = "Valence", 
       y = "Count of Songs") +
  scale_fill_brewer(palette = "Dark2")
```

# Step Seven: What are differences in the summary statistics (max, min, median, etc.) of the valence of songs in each playlist? Create a plot to visualize this, and order the results by the median of valence. 

Be sure to give it a descriptive title and labels covering all 5 essential components of data context.

```{r plot4}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = reorder(playlist_name, valence, median), y = valence)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Distribution of Valence of Songs in Spotify Wedding Playlists, 2022",
       x = "Playlist Name", 
       y = "Valence")
```

# Step Eight: Do happier songs tend to be more danceable *in each playlist*? Create a plot to visualize this.

Be sure to give it a descriptive title and labels covering all 5 essential components of data context. Also be sure to adjust your plot to address overplotting. 

```{r plot5}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = valence, y = danceability)) +
  geom_point(alpha = 0.3, size = 0.5) +
  labs(title = "Relationship between Danceability and Valence of Songs in Spotify Wedding Playlists, 2022",
       x = "Valence", 
       y = "Danceability") +
  facet_wrap(vars(playlist_name)) +
  geom_smooth(method = "lm")
```

> Note: You may wish to add a trend line to your plot with method="lm"!

# Step Nine: Do songs composed in the minor or major mode tend to be happier *in each playlist*? Create a plot to visualize this.

```{r plot6}
# Create plot here

spotify_playlists %>%
  ggplot(aes(x = mode_name, y = valence)) +
  geom_boxplot(alpha = 0.3, size = 0.5) +
  labs(title = "Comparison of Valence in Major/Minor Mode Songs in Spotify Wedding Playlists, 2022",
       x = "Mode Name", 
       y = "Valence") +
  facet_wrap(vars(playlist_name))
```

# Step Ten: Do happier songs tend to have a higher tempo across all playlists? What role might the song's mode play? Create a plot to visualize this.  

Be sure to give it a descriptive title and labels covering all 5 essential components of data context. Also be sure to adjust your plot to address overplotting. 

```{r plot7}
#Create boxplot here

spotify_playlists %>%
  ggplot(aes(x = tempo, y = valence, col = mode_name)) +
  geom_point(alpha = 0.3, size = 0.5) +
  labs(title = "Relationship between Tempo, Valence, and Mode of Songs in Spotify Wedding Playlists, 2022",
       x = "Tempo", 
       y = "Valence",
       col = "Song Mode") +
  facet_wrap(vars(playlist_name)) +
  geom_smooth(method = "lm")
```

> Note: You may wish to add a trend line to your plot with method="lm"!

> What did you learn about joy across these playlists?