# Milestone 1

## Part I: Initial Exploration

### Data Abstraction

| Variable Name        | Semantics                                                       | Type             | Cardinality [min, max] |
| -------------------- | -------------------------------------------------------------- | ----------------- | ----------- |
| track_name           | Name of the song.                                               | Nominal           |   943          |
| artist(s)_name       | Name of the artist(s) of the song.                               | Nominal           |    645         |
| artist_count         | Number of artists contributing to the song.                     | Quantitative      | 8 [1, 8]     |
| released_year        | Year when the song was released.                                | Temporal          | 94 [1930, 2023] |
| released_month       | Month when the song was released.                               | Temporal          | 12 [1, 12]   |
| released_day         | Day of the month when the song was released.                    | Temporal          | 31 [1, 31]   |
| in_spotify_playlists | Number of Spotify playlists the song is included in.            | Quantitative      | 52868 [31, 52898] |
| in_spotify_charts    | Presence and rank of the song on Spotify charts.               | Ordinal   | 148 [0, 147]  |
| streams              | Total number of streams on Spotify.                            | Quantitative      |  3703892312   [2762, 3703895074]         |
| in_apple_playlists   | Number of Apple Music playlists the song is included in.       | Quantitative      | 673 [0, 672]  |
| in_apple_charts      | Presence and rank of the song on Apple Music charts.           | Ordinal   | 276 [0, 275]  |
| in_deezer_playlists  | Number of Deezer playlists the song is included in.            | Quantitative      | 59 [0, 58]    |
| in_deezer_charts     | Presence and rank of the song on Deezer charts.                | Ordinal   | 59  [0, 58]          |
| in_shazam_charts     | Presence and rank of the song on Shazam charts.                | Ordinal   |  954   [0, 953]        |
| bpm                  | Beats per minute, a measure of song tempo.                     | Quantitative      | 142 [65, 206] |
| key                  | Key of the song.                                                | Ordinal           |  11           |
| mode                 | Mode of the song (major or minor).                              | Nominal           |    2         |
| danceability_%       | Percentage indicating how suitable the song is for dancing.    | Quantitative      | 74 [23, 96]  |
| valence_%            | Positivity of the song's musical content.                       | Quantitative      | 94 [4, 97]   |
| energy_%             | Perceived energy level of the song.                            | Quantitative      | 89 [9, 97]   |
| acousticness_%       | Amount of acoustic sound in the song.                           | Quantitative      | 98 [0, 97]   |
| instrumentalness_%   | Amount of instrumental content in the song.                     | Quantitative      | 98 [0, 97]   |
| liveness_%           | Presence of live performance elements.                          | Quantitative      | 92 [0, 91]   |
| speechiness_%        | Amount of spoken words in the song.                             | Quantitative      | 62 [3, 64]   |
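
Several numeric entries in the table match the size of the integer span `max - min + 1` (for example, 2023 - 1930 + 1 = 94 for `released_year`) rather than a count of distinct observed values. The sketch below shows one way both numbers could be computed with pandas. The toy rows and the helper name `abstraction_row` are hypothetical; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV.
toy = pd.DataFrame({
    "artist_count": [1, 2, 1, 3],
    "mode": ["Major", "Minor", "Major", "Major"],
    "bpm": [125, 92, 138, 92],
})

def abstraction_row(series):
    """Return the distinct-value count and, for integer columns, the range and its span."""
    row = {"distinct": int(series.nunique())}
    if pd.api.types.is_integer_dtype(series):
        low, high = int(series.min()), int(series.max())
        row["range"] = [low, high]
        row["span"] = high - low + 1  # number of integers between min and max, inclusive
    return row

summary = {name: abstraction_row(toy[name]) for name in toy.columns}
for name, row in summary.items():
    print(name, row)
```

Reporting both `distinct` and `span` makes it explicit which notion of cardinality a table entry refers to.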


### Exploratory Data Analysis

#### Importing the dataset

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import numpy as np
import altair as alt

spotify_data = pd.read_csv('data/spotify_2023.csv', delimiter=',')

# These features were stored with the wrong data type in the original dataset, so we convert them to numeric here before creating visuals
spotify_data['streams'] = pd.to_numeric(spotify_data['streams'], errors='coerce')
spotify_data['in_shazam_charts'] = pd.to_numeric(spotify_data['in_shazam_charts'], errors='coerce')
spotify_data['in_deezer_playlists'] = pd.to_numeric(spotify_data['in_deezer_playlists'], errors='coerce')

spotify_data.head()

Unnamed: 0,track_name,artist(s)_name,artist_count,released_year,released_month,released_day,in_spotify_playlists,in_spotify_charts,streams,in_apple_playlists,...,bpm,key,mode,danceability_%,valence_%,energy_%,acousticness_%,instrumentalness_%,liveness_%,speechiness_%
0,Seven (feat. Latto) (Explicit Ver.),"Latto, Jung Kook",2,2023,7,14,553,147,141381703.0,43,...,125,B,Major,80,89,83,31,0,8,4
1,LALA,Myke Towers,1,2023,3,23,1474,48,133716286.0,48,...,92,C#,Major,71,61,74,7,0,10,4
2,vampire,Olivia Rodrigo,1,2023,6,30,1397,113,140003974.0,94,...,138,F,Major,51,32,53,17,0,31,6
3,Cruel Summer,Taylor Swift,1,2019,8,23,7858,100,800840817.0,116,...,170,A,Major,55,58,72,11,0,11,15
4,WHERE SHE GOES,Bad Bunny,1,2023,5,18,3133,50,303236322.0,84,...,144,A,Minor,65,23,80,14,63,11,6
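
Because `errors='coerce'` replaces every unparseable value with `NaN` without raising, it can be worth checking how many values were lost in the conversion. The following is a minimal sketch on made-up strings; the malformed examples (free text and a thousands separator) are assumptions about what such a column might contain, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical raw column as it might be read from a CSV (all strings).
raw = pd.Series(["141381703", "133716286", "not-a-number", "1,021"], name="streams")

# Same conversion as in the notebook: unparseable entries become NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but are NaN after coercion.
failed = raw[numeric.isna() & raw.notna()]
print(f"{len(failed)} value(s) could not be parsed: {failed.tolist()}")
```

If the list of failed values is not empty, cleaning the strings first (for example, removing separators) may preserve more rows than coercion alone.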


#### Data size

In [3]:
num_rows, num_columns = spotify_data.shape
print(f"The data has {num_rows} rows and {num_columns} columns.")

The data has 953 rows and 24 columns.


#### Numeric summaries
We created _frequency tables_ for the non-quantitative attributes because they show how often each categorical value occurs. For the quantitative features, we built a _central tendency table_ that summarizes each feature's mean, median, and mode, which gives a sense of its typical values.

In [4]:
# Exclude every numeric dtype (np.number covers all integer and float widths)
non_numeric_features = spotify_data.select_dtypes(exclude=[np.number])

# Removing track_name since it is not very informative to explore its frequency table
non_numeric_features = non_numeric_features.drop(columns=['track_name'])

frequency_tables = {}

for column in non_numeric_features.columns:
    frequency_table = non_numeric_features[column].value_counts().reset_index()
    frequency_table.columns = ['Value', 'Frequency']

    frequency_tables[column] = frequency_table

for column, frequency_df in frequency_tables.items():
    print("Frequency table for column:", column)
    display(frequency_df)
    print("\n")

Frequency table for column: artist(s)_name


Unnamed: 0,Value,Frequency
0,Taylor Swift,34
1,The Weeknd,22
2,Bad Bunny,19
3,SZA,19
4,Harry Styles,17
...,...,...
640,"Karol G, Ovy On The Drums",1
641,"Coolio, L.V.",1
642,Kordhell,1
643,Kenia OS,1




Frequency table for column: key


Unnamed: 0,Value,Frequency
0,C#,120
1,G,96
2,G#,91
3,F,89
4,B,81
5,D,81
6,A,75
7,F#,73
8,E,62
9,A#,57




Frequency table for column: mode


Unnamed: 0,Value,Frequency
0,Major,550
1,Minor,403






In [5]:
numeric_features = spotify_data.select_dtypes(include=[np.number])

pd.options.display.float_format = '{:.2f}'.format  

central_tendency = pd.DataFrame({
    'Mean': numeric_features.mean(),
    'Median': numeric_features.median(),
    'Mode': numeric_features.mode().iloc[0]  # In case of multiple modes, we are selecting the first one
})

central_tendency

Unnamed: 0,Mean,Median,Mode
artist_count,1.56,1.0,1.0
released_year,2018.24,2022.0,2022.0
released_month,6.03,6.0,1.0
released_day,13.93,13.0,1.0
in_spotify_playlists,5200.12,2224.0,86.0
in_spotify_charts,12.01,3.0,0.0
streams,514137424.94,290530915.0,156338624.0
in_apple_playlists,67.81,34.0,0.0
in_apple_charts,51.91,38.0,0.0
in_deezer_playlists,109.74,36.5,0.0
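
The `Mode` column keeps only the first row of `DataFrame.mode()`. pandas returns the modes of each column in sorted order, so when a column has several modes the smallest one is reported. A small sketch with hypothetical values illustrates this:

```python
import pandas as pd

# Hypothetical values: released_month has two modes (1 and 5), artist_count has one.
toy = pd.DataFrame({
    "released_month": [5, 1, 5, 1, 9],
    "artist_count": [1, 1, 2, 1, 3],
})

all_modes = toy.mode()          # one row per mode, padded with NaN where a column has fewer modes
first_mode = all_modes.iloc[0]  # what the central tendency table reports

print(all_modes)
print(first_mode)
```

Inspecting `all_modes` before taking `.iloc[0]` shows whether a reported mode is unique or only the smallest of several tied values.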


#### Visual summaries
We created histograms for the quantitative features and bar charts of value counts for the categorical features to visualize their distributions. These views help us spot patterns worth exploring further in the project.

##### Histograms for quantitative features

In [6]:
features = numeric_features.columns

charts = []
for feature in features:
    chart = alt.Chart(numeric_features).mark_bar().encode(
        alt.X(feature, bin=alt.Bin(maxbins=20), title=feature),
        y='count()',
    ).properties(
        width=180,
        height=150,
        title=feature
    )
    charts.append(chart)

faceted_histograms1 = alt.hconcat(*charts[:5])
faceted_histograms2 = alt.hconcat(*charts[5:10])
faceted_histograms3 = alt.hconcat(*charts[10:15])
faceted_histograms4 = alt.hconcat(*charts[15:])  # open-ended slice so the last numeric feature is not dropped

all_faceted_histograms = alt.vconcat(faceted_histograms1, faceted_histograms2, faceted_histograms3, faceted_histograms4)
all_faceted_histograms

##### Histograms for qualitative features
Observation: with over 600 different artists in the dataset, we chose to filter the data to focus on the top 50 most frequent artists. This approach helps us gain a clearer understanding of the distribution of artists, particularly where variations in artist counts are most noticeable.

In [7]:
artist_counts = spotify_data['artist(s)_name'].value_counts()
top_50_artists = artist_counts.head(50).index
filtered_df = spotify_data[spotify_data['artist(s)_name'].isin(top_50_artists)]

artist_chart = alt.Chart(filtered_df).mark_bar(color = 'coral').encode(
        alt.X('artist(s)_name:N', title='Artist name', sort = '-y'),
        y='count()',
        tooltip= alt.Tooltip(['artist(s)_name', 'count()'])
    ).properties(
        width=650,  
        height=150,  
        title='Artist name'
    )

key_chart = alt.Chart(spotify_data).mark_bar(color = 'coral').encode(
        alt.X('key:N', title='Key'),
        y='count()',
    ).properties(
        width=180,  
        height=150,  
        title='Key'
    )

mode_chart = alt.Chart(spotify_data).mark_bar(color = 'coral').encode(
        alt.X('mode:N', title='Mode'),
        y='count()',
    ).properties(
        width=180,  
        height=150,  
        title='Mode'
    )
    
faceted_quali_histograms = alt.hconcat(artist_chart, key_chart, mode_chart)
faceted_quali_histograms

#### Multivariate visual summaries

**Graph 1:** Exploring the top 20 streamed songs on Spotify and categorizing them by key:

In [8]:
most_streamed_songs = spotify_data.nlargest(20, 'streams')

chart = alt.Chart(most_streamed_songs).mark_bar().encode(
    alt.X('streams:Q', title='Streams'),
    alt.Y('track_name:N', title='Track Name', sort='-x'),
    alt.Color('key:N', title='Key'),
    alt.Tooltip(['artist(s)_name', 'danceability_%', 'liveness_%'])
).properties(width=400,
             height=300,
             title='Top 20 Streamed Songs on Spotify')

chart

**Graph 2:** Exploring the relationship between danceability and energy levels of songs:

In [9]:
scatter_plot = alt.Chart(spotify_data).mark_circle().encode(
    x='danceability_%:Q',
    y='energy_%:Q',
    tooltip=['track_name', 'danceability_%', 'energy_%']
).properties(width=500, 
             height=250, 
             title='Song Distribution based on Danceability and Energy')

scatter_plot
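
A numeric check can complement the scatter plot. The sketch below computes the Pearson correlation between the two features with pandas; it uses only the five rows shown in the `head()` output above, so the resulting number is illustrative and says nothing reliable about the full dataset.

```python
import pandas as pd

# The five rows displayed by spotify_data.head() earlier in the notebook.
sample = pd.DataFrame({
    "danceability_%": [80, 71, 51, 55, 65],
    "energy_%": [83, 74, 53, 72, 80],
})

# Pearson correlation (the pandas default) between danceability and energy.
pearson = sample["danceability_%"].corr(sample["energy_%"])
print(f"Pearson r on the sample rows: {pearson:.2f}")
```

Running the same call on `spotify_data` would give the coefficient for all songs.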

**Graph 3:** Investigating the relationship between release date and streaming performance of songs:

In [10]:
heatmap = alt.Chart(spotify_data).mark_rect().encode(
    y='released_month:N',
    x='released_year:N',
    color='sum(streams):Q'  # aggregate, since many songs share the same year/month cell
).properties(width=700, 
             height=200, 
             title='Relationship between Release Time and Streaming Performance')

heatmap
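
Several songs can share the same release year and month, so each heatmap cell stands for a group of songs rather than a single one. One way to inspect the per-cell values is to aggregate in pandas first. This is a sketch with hypothetical stream counts; summing is an assumption, and a mean or median per cell may suit the question better.

```python
import pandas as pd

# Hypothetical rows: two songs released in July 2023 fall into the same cell.
toy = pd.DataFrame({
    "released_year": [2023, 2023, 2023, 2019],
    "released_month": [7, 7, 3, 8],
    "streams": [100.0, 50.0, 30.0, 80.0],
})

# One total per (year, month) cell.
cell_totals = toy.groupby(["released_year", "released_month"], as_index=False)["streams"].sum()

# Month-by-year grid, matching the heatmap layout; empty cells are NaN.
grid = cell_totals.pivot(index="released_month", columns="released_year", values="streams")
print(grid)
```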

**Graph 4:** Understanding patterns of streaming over the months based on release date of the song:

In [11]:
time_series_chart = alt.Chart(spotify_data).mark_line().encode(
    x='released_month',
    y='sum(streams):Q',
    tooltip=['sum(streams)']
).properties(width=600, 
             height=200, 
             title='Time Series of Streams Over the Months')

time_series_chart

**Graph 5:** Exploring popularity of songs by their musical characteristics (danceability and energy):

In [12]:
bubble_chart = alt.Chart(spotify_data).mark_circle().encode(
    x=alt.X('danceability_%:Q', title='Danceability %'),
    y=alt.Y('streams:Q', title='Streams'),
    color=alt.Color('energy_%:Q', title='Energy %', scale=alt.Scale(scheme='blues')),
    tooltip=['artist(s)_name', 'streams', 'danceability_%', 'energy_%']
).properties(width=600, 
             height=200, 
             title='Bubble Chart of Artist Popularity vs. Musical Characteristics')

bubble_chart

## Part II: Project Scope
### Introduction
**Title of the Project**: "Analysis of Streaming Data Across Apple Music and Spotify"

**Description**: In this project, we extract insights from a dataset of music streaming information to uncover correlations and trends among chart performance, release timing, artist success, and the musical characteristics of songs. The results should be useful to people in a variety of music-related fields as a real-world application of data science. For example, artists can gain a deeper understanding of the factors that contribute to a song's popularity.

**Intended Audience**: Musical artists, producers, record labels, streaming platforms. For each of these audiences, there are different expectations:
- _Musical Artists_: Artists can see which song characteristics contribute to the number of streams their songs receive. This could help them understand what their audience likes.
- _Producers_: Producers can identify emerging trends and genres to guide their production efforts.
- _Record Labels_: Labels could identify emerging artists whose music styles have the most success on different streaming platforms. They could also see which time of year is best for a successful release.
- _Streaming Platforms_: Platforms can determine which artists have the best streaming performance, which could help them decide which artists to pursue contracts with.

### Task Analysis
To ensure diversity in our tasks, we will use Stasko’s taxonomy of low-level tasks.
- At least five distinct tasks pertaining to at least four distinct attributes. (if your group size is 4 then you should have 7 distinct tasks, that pertain to at least 6 distinct attributes)
- Sufficient overall complexity that visualizing multiple attributes using multiple views is necessary.
- All tasks must be plausible and must be addressable by the dataset(s) you are using.
- Your tasks must be complex enough that the design requirements in the Overview are met.

## Part III: Visualization Ideas

### Preliminary Sketches
Write out each task and below each task, include the following
- Three sketches (low fidelity) suited for the task
- A critique of all three
- Sketch (high fidelity) of the final one selected
- How the sketch you selected adheres to theoretical principles you have been exposed to this term.

## Part IV: Next Steps

We have decided to assign the following roles between our group members to better organize ourselves for the next phase of the project:
- Responsible for submission and for making sure the timeline is respected: Angela
- Text proofreading and formatting reviewer: Divya
- Code proofreader: Ece
- Rubric enforcer: everyone (we will do three rounds of rubric check to ensure everything has been covered)

Our team has selected the following tasks as our next five steps in the project:
1. Split the work of Milestone 2 into tasks and assign them to team members based on preference and ability.
2. Learn more about custom Altair visualizations and explore how to achieve visuals similar to our high-fidelity mock-ups.
3. Ensure 100% of the tasks devised in the **Task Analysis** section of Milestone 1 have a corresponding visualization and description.
4. Implement the two required interactive visualization elements outlined in the Milestone 2 instructions.
5. Finalize the report, review it, and submit it two days before the official deadline to account for possible delays.

The tasks above have been distributed according to the following timeline:

| Month and Week | Tasks |
|----------------|--------------------------------------------|
| November, week 1     | Exploration of creating custom Altair visualizations and aligning them with mock-ups; assign tasks to each person based on their preferences and abilities.             |
| November, week 2     | Completion of 3 out of 5 tasks; finalization of task descriptions.     |
| November, week 3     | Completion of 100% of tasks and task descriptions; Implementation of interactions.         |
| November, week 4     | Rounds of review, finalization of the report, and submission according to internal deadline (Nov. 26th). |


## Task 1

In [13]:
import math

In [14]:
# Sort by streams so that the most-streamed songs come first
sorted_df = spotify_data.sort_values(by='streams', ascending=False)

# Rescale the percentage features from [0, 100] to [0, pi] radians so they can be used as arc angles
for feature in ['danceability_%', 'acousticness_%', 'energy_%', 'valence_%']:
    sorted_df[feature] = sorted_df[feature].div(100).round(2) * math.pi

# Keep the 10 most-streamed songs (already in descending order) and renumber the rows
filtered_df = sorted_df.head(10).reset_index()
filtered_df

Unnamed: 0,index,track_name,artist(s)_name,artist_count,released_year,released_month,released_day,in_spotify_playlists,in_spotify_charts,streams,...,bpm,key,mode,danceability_%,valence_%,energy_%,acousticness_%,instrumentalness_%,liveness_%,speechiness_%
0,55,Blinding Lights,The Weeknd,1,2019,11,29,43899,69,3703895074.0,...,171,C#,Major,1.57,1.19,2.51,0.0,0,9,7
1,179,Shape of You,Ed Sheeran,1,2017,1,6,32181,10,3562543890.0,...,96,C#,Minor,2.61,2.92,2.04,1.82,0,9,8
2,86,Someone You Loved,Lewis Capaldi,1,2018,11,8,17836,53,2887241814.0,...,110,C#,Major,1.57,1.41,1.29,2.36,0,11,3
3,620,Dance Monkey,Tones and I,1,2019,5,10,24529,0,2864791672.0,...,98,F#,Minor,2.58,1.7,1.85,2.17,0,18,10
4,41,Sunflower - Spider-Man: Into the Spider-Verse,"Post Malone, Swae Lee",2,2018,10,9,24094,78,2808096550.0,...,90,D,Major,2.39,2.86,1.57,1.7,0,7,5
5,162,One Dance,"Drake, WizKid, Kyla",3,2016,4,4,43257,24,2713922350.0,...,104,C#,Major,2.42,1.13,1.98,0.03,0,36,5
6,84,STAY (with Justin Bieber),"Justin Bieber, The Kid Laroi",2,2021,7,9,17050,36,2665343922.0,...,170,C#,Major,1.85,1.51,2.39,0.13,0,10,5
7,140,Believer,Imagine Dragons,1,2017,1,31,18986,23,2594040133.0,...,125,A#,Minor,2.42,2.32,2.45,0.13,0,23,11
8,725,Closer,"The Chainsmokers, Halsey",2,2016,5,31,28032,0,2591224264.0,...,95,G#,Major,2.36,2.01,1.63,1.29,0,11,3
9,48,Starboy,"The Weeknd, Daft Punk",2,2016,9,21,29536,79,2565529693.0,...,186,G,Major,2.14,1.54,1.85,0.5,0,13,28
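
The scaled columns above are angles in radians: a value of 50% becomes 0.5 × π ≈ 1.57, which is the danceability shown for "Blinding Lights". The sketch below restates that mapping, together with the ring geometry used in the arc cells that follow (outer radius 165 for the top song, shrinking by 15 per rank, ring width 12). The function names are hypothetical helpers for illustration.

```python
import math

def percent_to_angle(percent):
    """Map a 0-100 percentage onto an angle between 0 and pi radians."""
    return round(percent / 100, 2) * math.pi

def ring_radii(rank):
    """Return (outer, inner) radius for the song at the given 0-based rank."""
    outer = 165 - 15 * rank
    return outer, outer - 12

print(round(percent_to_angle(50), 2))  # half of the available half-turn
print(ring_radii(0), ring_radii(9))    # outermost and innermost rings
```

Because the percentages are capped at π rather than 2π, each feature can occupy at most half of its ring.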


In [15]:
# One ring per song: rank 0 (the most-streamed song) is the outermost ring,
# each following ring is 15 px smaller, and every ring is 12 px wide.
dance_arcs = []
for rank in range(len(filtered_df)):
    outer_radius = 165 - 15 * rank
    dance_arc = alt.Chart(filtered_df.iloc[[rank]]).mark_arc(
        radius=outer_radius, radius2=outer_radius - 12,
        # Danceability, already rescaled to radians in the previous cell
        theta=filtered_df['danceability_%'].iloc[rank],
        stroke="white", strokeWidth=2
    ).encode(
        color=alt.Color(field="key", type="nominal"),
        tooltip=[
            alt.Tooltip("artist(s)_name:N", title="Artist"),
            alt.Tooltip("track_name:N", title="Song"),
            alt.Tooltip("energy_%:Q", title="Energy"),
            alt.Tooltip("danceability_%:Q", title="Danceability")
        ]
    )
    dance_arcs.append(dance_arc)

layered_dance = alt.layer(*dance_arcs)

layered_dance

In [16]:
# Same ring layout as the danceability arcs: outer radius 165 for rank 0,
# shrinking by 15 px per rank, with a ring width of 12 px.
energy_arcs = []
for rank in range(len(filtered_df)):
    outer_radius = 165 - 15 * rank
    energy_arc = alt.Chart(filtered_df.iloc[[rank]]).mark_arc(
        radius=outer_radius, radius2=outer_radius - 12,
        # theta2 is a full turn minus the energy angle (energy was rescaled to radians earlier)
        theta2=((2 * math.pi) - filtered_df['energy_%'].iloc[rank]),
        stroke="white", strokeWidth=2
    ).encode(
        color=alt.Color(field="key", type="nominal"),
        tooltip=[
            alt.Tooltip("artist(s)_name:N", title="Artist"),
            alt.Tooltip("track_name:N", title="Song"),
            alt.Tooltip("energy_%:Q", title="Energy"),
            alt.Tooltip("danceability_%:Q", title="Danceability")
        ]
    )
    energy_arcs.append(energy_arc)

layered_energy = alt.layer(*energy_arcs)

layered_energy

In [17]:
layered_all = alt.layer(layered_energy, layered_dance)
layered_all

### Top of the dashboard

1. Distribution of songs by number of streams
2. Number of releases per release month

In [34]:
# Histogram: number of songs (records) in each streams bin
histogram_streams = alt.Chart(spotify_data).mark_bar().encode(
        alt.X('streams', bin=alt.Bin(maxbins=40), title='Number of streams'),
        y='count()',
    ).properties(
        width=600,  
        height=250,  
        title='Distribution of songs per stream bracket'
    )

histogram_streams

In [39]:
# Line chart: number of releases per release month
time_series = alt.Chart(spotify_data).mark_line().encode(
    alt.X('released_month', title='Month of release'),
    y='count()',
).properties(width=600,  
             height=250, 
             title='Time series of release per month'
)

time_series