## Streamed vs. Radio

## Introduction

This project examines popularity trends in the music industry and the potential disparity between corporate choice and user choice. Specifically, it compares radio and streaming music preferences from 2013 through 2021. Using Billboard chart data, I measure how much the top streamed songs overlap with the top radio songs each year. Then, using the Genius API, I compare the number of unique words in the lyrics of radio songs and streamed songs. I am interested in whether the vocabulary of radio music and streamed music has changed at a similar rate over the last decade.
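
To make "number of unique words" concrete, here is a minimal sketch of the measure. The first function mirrors the `len(set(text.split()))` expression used in this notebook. The second, `normalized_unique_words`, is a hypothetical variant that is not part of the original analysis: it lowercases the text and ignores punctuation, so that `Love` and `love,` count as one word. The sample line is made up for illustration.

```python
import re

def unique_words(text):
    """Count distinct whitespace-separated tokens, as the notebook does."""
    return len(set(text.split()))

def normalized_unique_words(text):
    """Hypothetical variant: case-insensitive, punctuation ignored."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return len(set(tokens))

sample = "Love me, love me. Say that you love me!"
print(unique_words(sample))             # 8: "me,", "me." and "me!" all differ
print(normalized_unique_words(sample))  # 5: love, me, say, that, you
```

The raw count is sensitive to capitalization and punctuation, which is worth keeping in mind when comparing vocabulary sizes across songs.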

## Hypothesis

My hypothesis is that streaming preferences once aligned closely with the songs played on the radio, but that this alignment has weakened over time. One factor that could be responsible is the growing availability of streaming, which makes all types of music more accessible to audiences.

## APIs

This project uses the `billboard.py` and `lyricsgenius` Python packages to access Billboard chart data and Genius lyrics.

In [17]:
import sys
!{sys.executable} -m pip install numpy billboard.py lyricsgenius

Collecting billboard.py
  Downloading billboard.py-7.0.0-py2.py3-none-any.whl (7.0 kB)
Collecting lyricsgenius
  Downloading lyricsgenius-3.0.1-py3-none-any.whl (59 kB)
Installing collected packages: lyricsgenius, billboard.py
Successfully installed billboard.py-7.0.0 lyricsgenius-3.0.1


In [18]:


# Import the Billboard and Genius API wrappers
import billboard
import lyricsgenius

# Import Python tools for handling numbers and statistics
import numpy
import statistics

# Authenticate with my Genius API key.
genius = lyricsgenius.Genius("R68yfEAxTTnQh-X7dkseqDysn96StA92nZJQn6gqy5JrZhmQbm6kWZvT7WH22s1s")

print("Percent of top streaming songs matching top radio songs")

# start of a delimited output report, header
report = "year\tpercent radio matching streaming\tstreaming avg vocab\tradio avg vocab\tstreaming median vocab\tradio median vocab\n"

# the Billboard API has data for 2013-2022; range() excludes its end value, so this loop covers 2013-2021
for year in range(2013, 2022):
    # get the chart of top streaming songs
    chartStreaming = billboard.ChartData('streaming-songs', year=year)
    
    # get the chart of top radio songs
    chartRadio = billboard.ChartData('radio-songs', year=year)
    
    # keep track of unique words per song
    wordsStreaming = []
    wordsRadio = []

    songsStreaming = []
    print("processing streaming ", year)
    # get the lyrics for each of the songs returned above, retry if API fails
    for song in chartStreaming: # for each song in chartStreaming
        lyric = None 
        keepTryingGenius = True
        retryCount = 20
        while keepTryingGenius and retryCount > 0:
            try:
                lyric = genius.search_song(song.title, song.artist) # get the lyrics to the song
                keepTryingGenius = False # stop even if lyric is empty (not found)
            except Exception: # e.g. a timeout or network error; a bare except would also swallow Ctrl-C
                print("We need to retry ", song.title, " by ", song.artist)
                retryCount = retryCount - 1
        if lyric is not None:
            count = len(set(lyric.lyrics.split())) # count the unique words in song lyrics
            wordsStreaming.append(count) # append the count to the list of streaming words
        songsStreaming.append(song.title) # in theory titles with different artists could match; in practice, not

    # same as above, but for radio
    # a generic function could be written to reduce duplication
    songsRadio = []
    print("processing radio ", year)
    for song in chartRadio:
        lyric = None 
        keepTryingGenius = True
        retryCount = 20
        if song.title.startswith("2017"): # the 2017 Grammys entry is not a single song; ignore it
            keepTryingGenius = False # skip this grouped entry (special case, not relevant)
        while keepTryingGenius and retryCount > 0:
            try:
                lyric = genius.search_song(song.title, song.artist)
                keepTryingGenius = False # stop even if lyric is empty (not found)
            except Exception: # e.g. a timeout or network error; a bare except would also swallow Ctrl-C
                print("We need to retry ", song.title, " by ", song.artist)
                retryCount = retryCount - 1
        if lyric is not None:
            count = len(set(lyric.lyrics.split()))
            wordsRadio.append(count)
        songsRadio.append(song.title)

    d = [value for value in songsStreaming if value in songsRadio] # d is the list of streaming titles that also appear on the radio chart this year
    percentMatch = len(d) / len(songsRadio) * 100 # express the matches as a percent of the radio chart instead of an absolute number
    # add to report
    report = report + str(year) + "\t" + str(percentMatch) + "%\t" +  str(numpy.average(wordsStreaming)) + "\t" + str(numpy.average(wordsRadio)) + "\t" + str(statistics.median(wordsStreaming)) + "\t" + str(statistics.median(wordsRadio)) + "\n"

# when loop is complete processing all years, print the report
print(report)





Percent of top streaming songs matching top radio songs
processing streaming  2013
Searching for "Harlem Shake" by Baauer...
Done.
Searching for "Gangnam Style" by PSY...
Done.
Searching for "Thrift Shop" by Macklemore & Ryan Lewis Featuring Wanz...
Done.
Searching for "Radioactive" by Imagine Dragons...
Done.
Searching for "We Can't Stop" by Miley Cyrus...
Done.
Searching for "Wrecking Ball" by Miley Cyrus...
Done.
Searching for "Blurred Lines" by Robin Thicke Featuring T.I. + Pharrell...
Done.
Searching for "Can't Hold Us" by Macklemore & Ryan Lewis Featuring Ray Dalton...
Done.
Searching for "Started From The Bottom" by Drake...
Done.
Searching for "Sail" by AWOLNATION...
Done.
Searching for "Get Lucky" by Daft Punk Featuring Pharrell Williams...
Done.
Searching for "When I Was Your Man" by Bruno Mars...
Done.
Searching for "Roar" by Katy Perry...
Done.
Searching for "The Way" by Ariana Grande Featuring Mac Miller...
Done.
Searching for "Mirrors" by Justin Timberlake...
Done.
Search
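
The comment in the code above notes that a generic function could reduce the duplication between the streaming and radio loops. A minimal sketch of one way to do that follows. The function name `unique_word_counts`, its parameters, and the `search` callable are assumptions for illustration, not part of the original notebook; in the notebook, `search` would be `genius.search_song` and `chart` would be a `billboard.ChartData` object.

```python
def unique_word_counts(chart, search, max_retries=20, skip_prefixes=()):
    """Return (titles, counts) for one chart.

    chart: iterable of entries with .title and .artist attributes
    search: callable(title, artist) returning an object with a .lyrics
            string, or None when no match is found
    skip_prefixes: title prefixes that should not be looked up
    """
    titles, counts = [], []
    for song in chart:
        lyric = None
        # str.startswith accepts a tuple; an empty tuple never matches
        if not song.title.startswith(tuple(skip_prefixes)):
            for _ in range(max_retries):
                try:
                    lyric = search(song.title, song.artist)
                    break  # stop even if nothing was found (lyric is None)
                except Exception:
                    print("We need to retry", song.title, "by", song.artist)
        if lyric is not None:
            # unique whitespace-separated tokens, same measure as the notebook
            counts.append(len(set(lyric.lyrics.split())))
        titles.append(song.title)
    return titles, counts
```

With a helper like this, each chart could be processed in one call, for example `songsRadio, wordsRadio = unique_word_counts(chartRadio, genius.search_song, skip_prefixes=("2017",))`.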



| Year | Percent of radio songs matching streaming | Streaming avg. unique words | Radio avg. unique words | Streaming median unique words | Radio median unique words |
| ---- |:----:| ----:| ----:| ----:| ----:|
|2013 | 60% | 269 | 256 | 148.5 | 139 |
|2014 | 68% | 194 | 147 | 138 | 137.5 |
|2015 | 64% | 178 | 155 | 156.5 | 143 |
|2016 | 66% | 210 | 362 | 156 | 148 |
|2017 | 56% | 217 | 185 | 202 | 145 |
|2018 | 46% | 222 | 171 | 194 | 149 |
|2019 | 58% | 204 | 174 | 179 | 144.5 |
|2020 | 56% | 233 | 201 | 176 | 144 |
|2021 | 44% | 247 | 184 | 212.5 | 149 |
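
The match percentage in the table is the number of streaming titles that also appear on the radio chart, divided by the number of radio titles. A minimal sketch of that calculation using a set for the lookups follows. `percent_match` is a hypothetical helper name, and the titles are made-up placeholders rather than chart data.

```python
def percent_match(streaming_titles, radio_titles):
    """Streaming titles found on the radio chart, as a percent of the radio chart."""
    radio_set = set(radio_titles)  # set membership avoids rescanning the list
    matches = [title for title in streaming_titles if title in radio_set]
    return len(matches) / len(radio_titles) * 100

streaming = ["Song A", "Song B", "Song C", "Song D"]
radio = ["Song B", "Song D", "Song E", "Song F"]
print(percent_match(streaming, radio))  # 50.0
```

Like the notebook, this compares titles only, so two different songs with the same title would count as a match.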




![Radio Streaming Percent Match](radio_streaming_match.jpg) ![Streaming_Vs_Radio_Vocab](streaming_vs_radio_vocab.jpg)
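
One way to summarize the trends in the table and charts is a least-squares slope for each column. The sketch below uses only values copied from the results table. The helper `slope_per_year` is a hypothetical name, and a straight-line fit over nine points is a rough summary, not a significance test.

```python
from statistics import mean

def slope_per_year(years, values):
    """Ordinary least-squares slope of values against years."""
    x_bar, y_bar = mean(years), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    denominator = sum((x - x_bar) ** 2 for x in years)
    return numerator / denominator

years = list(range(2013, 2022))  # 2013 through 2021
# columns copied from the results table
percent_match = [60, 68, 64, 66, 56, 46, 58, 56, 44]
streaming_median = [148.5, 138, 156.5, 156, 202, 194, 179, 176, 212.5]
radio_median = [139, 137.5, 143, 148, 145, 149, 144.5, 144, 149]

for label, column in [
    ("percent match", percent_match),
    ("streaming median vocab", streaming_median),
    ("radio median vocab", radio_median),
]:
    print(label, round(slope_per_year(years, column), 2))
```

On these values the fit gives roughly -2.2 percentage points per year for the match rate, about +7.6 unique words per year for the streaming median, and about +1.1 for the radio median.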


## Conclusion

These results indicate that listeners once streamed much of the same music that was played on the radio, but that this overlap shrank between 2013 and 2021: the match rate was 60% in 2013, peaked at 68% in 2014, and fell to 44% in 2021. The vocabulary of streamed and radio music also started out similar, but the median number of unique words per streamed song rose from 148.5 in 2013 to 212.5 in 2021, while the radio median stayed between 137.5 and 149.

These results raise further questions, such as why the number of unique words in streamed music continues to grow. Perhaps radio is programmed for a wide general audience, which makes it impossible to reflect the preferences of every listener. Did pandemic lockdowns reduce the time people spend listening to the radio, making listeners more likely to stream new songs? The gap between streaming and radio preferences began to widen around 2014, well before the pandemic, so lockdowns cannot be the only factor behind the changing relationship between radio and streaming.
