![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Finteresting-problems&branch=main&subPath=notebooks/song-lyrics-pie-charts.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Song Lyrics Pie Charts

Inspired by [this tweet](https://twitter.com/SalMathGuy/status/1285585953155715072), this is a Jupyter notebook that allows you to create word frequency visualizations from song lyrics. If you are viewing it on [GitHub](https://github.com), it won't be interactive; click [Open in Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Finteresting-problems&branch=main&subPath=notebooks/song-lyrics-pie-charts.ipynb&depth=1) to run the code.

Find the lyrics for a song you are interested in, and paste the text on the lines between the `"""` in the cell below. 

Then click the `▶Run` button to import the lyrics, or the `▶▶` button to run all of the cells in this notebook.

In [None]:
song_title = "Hey Jude"

lyrics = """
Replace this with your song lyrics or other text

"""

print('Lyrics imported for '+song_title)

`▶Run` the next cell to create a pie chart.

You can also change the number in the `how_many_words = 5` line to see more or fewer words in the chart.

In [None]:
how_many_words = 5

import pandas as pd
from collections import Counter
import plotly.express as px
# replace some punctuation
lyrics = lyrics.replace('\n',' ').replace('-',' ')
for punctuation in ['(',')','?','!','.',',']:
    lyrics = lyrics.replace(punctuation, '')
# convert to a list of lowercase words
words = lyrics.lower().split(' ')
# drop any empty values
words = [word for word in words if word != '']
# create a dataframe of word frequencies
df = pd.DataFrame.from_dict(Counter(words), orient='index', columns=['Frequency'])
df = df.reset_index().sort_values(by='Frequency', ascending=False)
# calculate the frequency of words that are not in the top how_many_words
other_words_frequency = len(words) - df.head(how_many_words)['Frequency'].sum()
new_row = pd.DataFrame({'index':'other words', 'Frequency':other_words_frequency}, index=[0])
df = pd.concat([df, new_row])
# create a percent column
df['Percent'] = df['Frequency']/len(words)*100
# rename the index column to Word
df = df.rename(columns={'index':'Word'})
# sort the values
df = df.sort_values(by='Frequency', ascending=False)
# create a pie chart
px.pie(df.head(how_many_words+1), values='Percent', names='Word', title='Top '+str(how_many_words)+' Words in '+song_title)
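
The counting behind the pie chart can be tried on its own with a short piece of text. This is a minimal sketch using only the standard library; `sample_text` is a made-up example rather than real lyrics, and `top_words` and `other_words` are names chosen here for illustration.

```python
from collections import Counter

# made-up sample text, used only for illustration
sample_text = "na na na hey jude na na hey"

# convert to a list of lowercase words
words = sample_text.lower().split()

# count each word and keep the two most frequent
top_words = Counter(words).most_common(2)

# everything outside the top two goes into a single "other words" slice
other_words = len(words) - sum(count for _, count in top_words)

print(top_words)    # [('na', 5), ('hey', 2)]
print(other_words)  # 1
```

The "other words" value is what keeps the pie chart slices adding up to 100 percent.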

`▶Run` the next cell to create a horizontal bar chart for the 10 most frequent words, not including the "other words" row.

In [None]:
most_frequent = 10

df2 = df[df['Word']!='other words'].sort_values(by='Frequency').tail(most_frequent)
# put Percent on the x-axis so the bars are drawn horizontally
px.bar(df2, x='Percent', y='Word', orientation='h', title=str(most_frequent) + ' Most Frequent Words in '+song_title)

## Next Steps

You can run this notebook again using the lyrics from another song, or with any other text.
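
If you want to compare several texts, one option is to collect the cleaning and counting steps into a function. The sketch below assumes the same punctuation rules used earlier in this notebook; `count_words` is a hypothetical helper name, not part of the notebook or of any library.

```python
from collections import Counter

def count_words(text):
    """Hypothetical helper: clean text the way this notebook does, then count words."""
    # replace line breaks and hyphens with spaces
    text = text.replace('\n', ' ').replace('-', ' ')
    # remove the same punctuation marks the notebook removes
    for punctuation in ['(', ')', '?', '!', '.', ',']:
        text = text.replace(punctuation, '')
    # split() with no argument also drops empty strings left by repeated spaces
    return Counter(text.lower().split())

counts = count_words("To be, or not to be.\nThat is the question!")
print(counts['to'], counts['be'], counts['question'])  # 2 2 1
```

The returned `Counter` can be passed to `pd.DataFrame.from_dict` in the same way as in the pie chart cell.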

If you are interested in more text analysis, check out [Shakespeare and Statistics](https://github.com/callysto/curriculum-notebooks/blob/master/EnglishLanguageArts/ShakespeareStatistics/shakespeare-and-statistics.ipynb).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)