# **Caltech - Machine Learning Course**

## LAB:  🎸 SPOTIFY - Cohorts of Songs

### NOTEBOOK:  Styling functions
### Project Statement
Problem Scenario:
Customers look forward to personalized treatment, whether they are shopping on an e-commerce website or watching Netflix. They want content that aligns with their preferences, so to keep customers engaged, companies must consistently provide the most relevant content.

Spotify, a Swedish audio streaming and media service provider, had over 456 million monthly active users, including more than 195 million paid subscribers, as of September 2022. The company aims to create cohorts of different songs to enhance song recommendations. These cohorts will be based on various relevant features, so that each group contains similar types of songs.

Problem Objective:
As a data scientist, you should perform exploratory data analysis and cluster analysis to create cohorts of songs. The goal is to better understand the various factors that create a cohort of songs.

Data Description:
The dataset comprises information from Spotify's API about all albums by the Rolling Stones available on Spotify. Note that each song has a unique ID.


| Variable | Description |
| -------- | ----------- |
| name | The name of the song. |
| album | The name of the album. |
| release_date | The day, month, and year the album was released. |
| track_number | The order in which the song appears on the album. |
| id | The Spotify ID for the song. |
| uri | The Spotify URI for the song. |
| acousticness | A confidence measure from 0.0 to 1.0 of whether the track is acoustic; 1.0 represents high confidence that the track is acoustic. |
| danceability | Describes how suitable a track is for dancing based on a combination of musical elements, including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is the least danceable, and 1.0 is the most danceable. |
| energy | A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| instrumentalness | Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context; rap or spoken-word tracks are clearly "vocal." The closer the value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. |
| liveness | Detects the presence of an audience in the recording. Higher values represent an increased probability that the track was performed live; a value above 0.8 provides a strong likelihood that the track is live. |
| loudness | The overall loudness of a track in decibels (dB), averaged across the entire track; useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| speechiness | Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g., talk show, audiobook, poetry), the closer the value is to 1.0. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. |
| tempo | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
| valence | A measure from 0.0 to 1.0 describing the musical positivity conveyed by a track. Tracks with high valence sound more positive (e.g., happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g., sad, depressed, angry). |
| popularity | The popularity of the song, from 0 to 100, with 100 being the most popular. |
| duration_ms | The duration of the track in milliseconds. |
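The thresholds quoted in the table can be turned into simple derived columns. The following is a minimal sketch, assuming the column names match the table; the toy rows are hypothetical, `label_tracks` is a hypothetical helper name, and assigning a value of exactly 0.33 or 0.66 to the lower speechiness band is an assumption because the descriptions leave the boundaries open.

```python
import pandas as pd


def label_tracks(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add columns derived from the documented thresholds."""
    out = df.copy()
    # Speechiness bands: <= 0.33 music, (0.33, 0.66] music and speech, > 0.66 spoken word
    # (boundary handling is an assumption)
    out["speech_type"] = pd.cut(
        out["speechiness"],
        bins=[0.0, 0.33, 0.66, 1.0],
        labels=["music", "music_and_speech", "spoken_word"],
        include_lowest=True,
    )
    # "Values above 0.5 are intended to represent instrumental tracks"
    out["likely_instrumental"] = out["instrumentalness"] > 0.5
    # "A value above 0.8 provides a strong likelihood that the track is live"
    out["likely_live"] = out["liveness"] > 0.8
    # Milliseconds to minutes for readability
    out["duration_min"] = out["duration_ms"] / 60_000
    return out


# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "name": ["track_a", "track_b", "track_c"],
    "speechiness": [0.05, 0.45, 0.90],
    "instrumentalness": [0.01, 0.70, 0.00],
    "liveness": [0.95, 0.10, 0.30],
    "duration_ms": [180_000, 240_000, 90_000],
})
labeled = label_tracks(toy)
print(labeled)
```

Derived flags like these are handy for sanity-checking clusters later (for example, whether one cohort is dominated by live recordings), but they are only as reliable as the thresholds Spotify documents.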


### Notebook Objective
Data Inspection and Cleaning

a. Ensure that the data is clean and free from missing or incorrect entries. Inspect the data manually to identify missing or incorrect information using the functions `isna()` and `notna()`.

b. Based on your knowledge of data analytics, include your recommendations for treating missing and incorrect data (dropping the null values or filling them).

c. Choose a suitable data wrangling technique, either data standardization or normalization. Apply the chosen method and present the resulting data. (Normalization is the preferred approach for this problem.)

d. Share your insights on using the pandas `groupby()` function for either data chunking or merging, and offer a recommendation based on your analysis.
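The four objectives map onto a handful of pandas calls. The sketch below runs them on a small hypothetical DataFrame (made-up song names and values, not the Rolling Stones data). Filling with the column median in step b and min-max scaling in step c are one reasonable choice each, not the course's required answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
songs = pd.DataFrame({
    "name": ["song_1", "song_2", "song_3", "song_4"],
    "album": ["album_x", "album_x", "album_y", "album_y"],
    "energy": [0.9, np.nan, 0.4, 0.6],
    "tempo": [120.0, 95.0, np.nan, 140.0],
})
features = ["energy", "tempo"]

# a. isna() marks missing cells, notna() marks present ones
missing_per_column = songs.isna().sum()
complete_rows = songs.notna().all(axis=1)

# b. Option 1: drop incomplete rows; option 2: fill gaps with each column's median
dropped = songs.dropna()
filled = songs.fillna(songs[features].median())

# c. Min-max normalization: x' = (x - min) / (max - min), rescaling each feature to [0, 1]
col_min = filled[features].min()
span = (filled[features].max() - col_min).replace(0, 1)  # avoid dividing by zero for constant columns
normalized = filled.copy()
normalized[features] = (filled[features] - col_min) / span

# d. groupby() splits the rows into per-album chunks, then aggregates each chunk
album_summary = normalized.groupby("album")[features].mean()
print(missing_per_column, dropped, normalized, album_summary, sep="\n\n")
```

Dropping rows is simplest but discards half of this toy table; filling keeps every song at the cost of inventing plausible values, which is usually the better trade-off when only a few cells are missing.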

<hr/>


### Spotify Lab - Visual Styling Configuration

## Overview
This notebook contains the visual styling configuration for the Spotify Rolling Stones analysis project. It defines custom color palettes, plot styles, and styling functions inspired by the Rolling Stones aesthetic.

## Module Components

### 1. **Color Palette Definition**
- Defines `STONES_PALETTE` dictionary with Rolling Stones-inspired colors
- Includes reds, blues, yellows, greens, and neutral tones
- Each color has thematic significance related to the band

### 2. **Figure Size Standards**
- `FIGURE_SIZES` dictionary defining consistent sizing
- Small (8x6), Medium (12x8), Large (16x10), Wide (20x8)

### 3. **Plot Style Configuration**
- `PLOT_STYLES` dictionary for consistent visual elements
- Background, grid, text, title, and axis colors
- Legend and figure styling parameters

### 4. **Styling Functions**
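- `set_stones_style()`: Sets global matplotlib/seaborn styling and registers a custom `stones` colormap for heatmaps and other visualizations
- `apply_stones_style()`: Applies custom styling to individual plots

### 5. **Audio Feature List**
- `AUDIO_FEATURES` list naming the nine audio-feature columns (acousticness through valence) used in the analysis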

---


## Detailed Component Documentation

### Color Palette (`STONES_PALETTE`)
The color palette is inspired by the Rolling Stones' iconic imagery and album art:
- **Red (`#ED2939`)**: The famous tongue logo color - primary brand color
- **Dark Red (`#8B0000`)**: Deep passion, early blues roots
- **Orange (`#FF4500`)**: High-energy performances, "Some Girls" era
- **Yellow (`#FFD700`)**: Golden hits and commercial success
- **Green (`#355E3B`)**: British racing green, representing UK origins
- **Blue (`#1E90FF`)**: Electric performances and vibrant stage presence
- **Navy (`#000080`)**: Sophisticated, classic appeal
- **Purple (`#4B0082`)**: Psychedelic era, "Their Satanic Majesties Request"
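- **Brown (`#8B4513`)**: Earthy roots-rock tone, used for legend borders
- **Neutral tones**: White, beige, silver, grey, charcoal, and black, used for backgrounds, grid lines, legends, and text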

### Figure Sizes (`FIGURE_SIZES`)
Standardized dimensions for consistent visualization:
- **Small**: 8×6 inches - single plots, simple visualizations
- **Medium**: 12×8 inches - standard analysis plots
- **Large**: 16×10 inches - complex visualizations, detailed plots
- **Wide**: 20×8 inches - side-by-side comparisons

### Plot Styles (`PLOT_STYLES`)
Consistent styling elements across all visualizations:
- **Background**: Clean white background for readability
- **Grid**: Silver grid lines for subtle reference
- **Text**: Charcoal for high contrast and readability
- **Titles**: Dark red for emphasis and brand consistency
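- **Axes**: Navy blue for professional appearance
- **Legend**: Beige background with a brown border
- **Accents**: Red as the primary and blue as the secondary data color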

### Functions

#### `set_stones_style()`
**Purpose**: Sets global matplotlib and seaborn styling preferences
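- Applies matplotlib's `seaborn-v0_8-whitegrid` style as the base
- Sets a custom color cycle from the Stones palette (via `sns.set_palette` and `axes.prop_cycle`)
- Updates rcParams for consistent colors and font sizes
- Registers a custom diverging `stones` colormap (blue → white → red) for heatmaps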
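To illustrate what the colormap step does, here is a minimal sketch, assuming matplotlib 3.6 or later (for the `matplotlib.colormaps` registry and its `register` method). It rebuilds the blue-white-red `stones` colormap from the palette's hex codes, registers it behind a name check so that a second call does not raise, and samples the map at 0, 0.5, and 1. The helper `register_once` is hypothetical and not part of the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# Hex codes copied from STONES_PALETTE
BLUE, WHITE, RED = "#1E90FF", "#FFFFFF", "#ED2939"

# Diverging map with evenly spaced anchors: 0.0 -> blue, 0.5 -> white, 1.0 -> red
stones_cmap = LinearSegmentedColormap.from_list("stones", [BLUE, WHITE, RED])


def register_once(cmap):
    """Hypothetical helper: register a colormap only if its name is not taken yet."""
    if cmap.name not in matplotlib.colormaps:
        matplotlib.colormaps.register(cmap=cmap)


register_once(stones_cmap)
register_once(stones_cmap)  # no-op; an unguarded second register() raises ValueError

# Sample the map; each call returns an RGBA tuple
low, mid, high = stones_cmap(0.0), stones_cmap(0.5), stones_cmap(1.0)
print(low, mid, high)
```

Once registered, the map can be requested by name, for example `sns.heatmap(corr, cmap="stones")` for a hypothetical correlation matrix `corr`. The midpoint sample is only approximately white because the default 256-entry lookup table has no entry exactly at 0.5.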

#### `apply_stones_style(fig, ax, title, show_grid=True)`
**Purpose**: Applies Rolling Stones styling to individual plots
- **Parameters**:
  - `fig`: matplotlib figure object
  - `ax`: matplotlib axes object  
  - `title`: plot title string
  - `show_grid`: boolean to show/hide grid
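- **Functionality**: Sets the figure and axes background colors, shows a dashed silver grid (or hides it), draws the title in bold dark red, colors the bottom and left spines navy, and colors the tick marks and labels navy. Returns nothing; the figure and axes are modified in place.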

### Usage Example
```python
# Run the styling notebook; its imports (plt, sns, pd, np) and definitions become available here
%run "./00_Py_Spotify_Styling.ipynb"

# Apply global styling
set_stones_style()

# Create plot
fig, ax = plt.subplots(figsize=FIGURE_SIZES['medium'])
# ... plotting code ...

# Apply custom styling
apply_stones_style(fig, ax, "My Plot Title")
plt.show()
```
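For a version of this usage example that runs without the styling notebook, the sketch below copies the few `PLOT_STYLES` entries and the body of `apply_stones_style()` it needs, draws a bar chart in place of the `# ... plotting code ...` placeholder, and uses the off-screen Agg backend. The bar heights are made-up values for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display required
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# The PLOT_STYLES entries used below, copied from the styling notebook
PLOT_STYLES = {
    "figure_facecolor": "#FFFFFF",
    "background": "#FFFFFF",
    "grid_color": "#C0C0C0",
    "title_color": "#8B0000",
    "axis_color": "#000080",
    "primary_color": "#ED2939",
}


def apply_stones_style(fig, ax, title, show_grid=True):
    """Copy of the notebook's apply_stones_style()."""
    fig.patch.set_facecolor(PLOT_STYLES["figure_facecolor"])
    ax.set_facecolor(PLOT_STYLES["background"])
    if show_grid:
        ax.grid(color=PLOT_STYLES["grid_color"], linestyle="--", linewidth=0.5, alpha=0.7)
    else:
        ax.grid(False)
    ax.set_title(title, color=PLOT_STYLES["title_color"], fontweight="bold")
    ax.spines["bottom"].set_color(PLOT_STYLES["axis_color"])
    ax.spines["left"].set_color(PLOT_STYLES["axis_color"])
    ax.tick_params(colors=PLOT_STYLES["axis_color"])


fig, ax = plt.subplots(figsize=(12, 8))  # FIGURE_SIZES['medium']
ax.bar(["energy", "danceability", "valence"], [0.8, 0.5, 0.6],  # made-up values
       color=PLOT_STYLES["primary_color"])
apply_stones_style(fig, ax, "Mean Audio Features (toy data)")
plt.close(fig)  # free the figure; in a notebook you would call plt.show() instead
```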


```python
import typing as tp
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy as sp
import os

AUDIO_FEATURES: tp.List[str] = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'valence']
# Custom color palette inspired by the Rolling Stones
STONES_PALETTE = {
    # Reds
    'red': '#ED2939',        # Tongue red, iconic color from the Rolling Stones logo
    'dark_red': '#8B0000',   # Deep red, reminiscent of their early blues roots and passion

    # Oranges
    'orange': '#FF4500',     # Bright orange, energetic like their performances and album "Some Girls"

    # Yellows
    'yellow': '#FFD700',     # Gold, representing their golden hits and success

    # Greens
    'green': '#355E3B',      # British racing green, nod to their British roots and "Green Lady" artwork

    # Blues
    'blue': '#1E90FF',       # Electric blue, vibrant like their electric performances
    'navy': '#000080',       # Navy blue, sophisticated and classic, like their enduring appeal

    # Purples
    'purple': '#4B0082',     # Deep purple, inspired by their psychedelic era and "Their Satanic Majesties Request"

    # Browns
    'brown': '#8B4513',      # Saddle brown, earthy tone reminiscent of their roots rock sound

    # Whites and Greys
    'white': '#FFFFFF',      # White, clean and iconic, like their "Sticky Fingers" album cover
    'beige': '#F5DEB3',      # Beige, inspired by their "Exile on Main St." album cover
    'silver': '#C0C0C0',     # Silver, for their "Steel Wheels" era and longevity
    'grey': '#808080',       # Steel grey, representing their gritty, urban sound
    'charcoal': '#36454F',   # Charcoal, for a gritty, rock n' roll feel, like "Paint It Black"

    # Black
    'black': '#000000',      # Classic black, timeless and rebellious like the band itself
}

FIGURE_SIZES = {
    'small': (8, 6),
    'medium': (12, 8),
    'large': (16, 10),
    'wide': (20, 8),
}

PLOT_STYLES = {
    'background': STONES_PALETTE['white'],
    'grid_color': STONES_PALETTE['silver'],
    'text_color': STONES_PALETTE['charcoal'],
    'title_color': STONES_PALETTE['dark_red'],
    'axis_color': STONES_PALETTE['navy'],
    'legend_facecolor': STONES_PALETTE['beige'],
    'legend_edgecolor': STONES_PALETTE['brown'],
    'figure_facecolor': STONES_PALETTE['white'],
    'primary_color': STONES_PALETTE['red'],
    'secondary_color': STONES_PALETTE['blue'],
}

def set_stones_style():
    """Set the plot style using the Stones-inspired palette."""
    plt.style.use('seaborn-v0_8-whitegrid')
    sns.set_palette([STONES_PALETTE[color] for color in ['red', 'blue', 'yellow', 'green', 'purple', 'orange']])
    
    plt.rcParams.update({
        'figure.facecolor': PLOT_STYLES['figure_facecolor'],
        'axes.facecolor': PLOT_STYLES['background'],
        'axes.grid': True,
        'axes.edgecolor': PLOT_STYLES['axis_color'],
        'axes.labelcolor': PLOT_STYLES['axis_color'],
        'grid.color': PLOT_STYLES['grid_color'],
        'text.color': PLOT_STYLES['text_color'],
        'xtick.color': PLOT_STYLES['axis_color'],
        'ytick.color': PLOT_STYLES['axis_color'],
        'legend.facecolor': PLOT_STYLES['legend_facecolor'],
        'legend.edgecolor': PLOT_STYLES['legend_edgecolor'],
        'figure.titlesize': 16,
        'axes.titlesize': 14,
        'axes.labelsize': 12,
        'font.size': 10,
        'legend.fontsize': 10,
    })
    
    # Custom colormaps
    from matplotlib.colors import LinearSegmentedColormap
    stones_cmap = LinearSegmentedColormap.from_list("stones", 
        [STONES_PALETTE['blue'], STONES_PALETTE['white'], STONES_PALETTE['red']])
    # Register only once: calling this function again (or %run-ing the notebook twice)
    # would otherwise raise a ValueError because "stones" is already registered
    if "stones" not in plt.colormaps:
        plt.colormaps.register(cmap=stones_cmap)
    
    # Set default color cycle
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=[STONES_PALETTE[color] for color in 
        ['red', 'blue', 'yellow', 'green', 'purple', 'orange', 'dark_red', 'navy']])

def apply_stones_style(fig, ax, title, show_grid=True):
    """Apply additional Stones-inspired styling to a figure and axis."""
    fig.patch.set_facecolor(PLOT_STYLES['figure_facecolor'])
    ax.set_facecolor(PLOT_STYLES['background'])
    if show_grid:
        ax.grid(color=PLOT_STYLES['grid_color'], linestyle='--', linewidth=0.5, alpha=0.7)
    else:
        ax.grid(False)
    ax.set_title(title, color=PLOT_STYLES['title_color'], fontweight='bold')
    ax.spines['bottom'].set_color(PLOT_STYLES['axis_color'])
    ax.spines['left'].set_color(PLOT_STYLES['axis_color'])
    ax.tick_params(colors=PLOT_STYLES['axis_color'])
```