![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=holiday-songs-linegraphs-scatterplots/holiday-songs-linegraph-scatterplot.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto's Weekly Data Visualization
## Christmas Tunes Popularity
### Recommended grade level: 6-12

### Instructions:
#### Step 1 (your only step): “Run” the cells to see the graphs
Click “Cell” and select “Run All.” This will import the data and run all the code to make this week's data visualizations (scroll to the top after you’ve run the cells). **You don’t need to do any coding.**

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

After a code cell runs, a number appears in the top left corner. If the code cell runs into a technical error, some red text will appear below the cell.

### About This Notebook:

Callysto's Weekly Data Visualization is a learning resource that helps Grades 5-12 teachers and students grow and develop data literacy skills. We do this by providing a data visualization, like a graph, and asking teachers and students to interpret it. This companion resource walks learners through how the data visualization is created and interpreted using the data science process. The steps of this process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need.
3. Organize - Arrange the data so that you can easily explore it.
4. Explore - Examine the data to look for evidence to answer our question. This includes creating visualizations.
5. Interpret - Explain how the evidence answers our question.
6. Communicate - Reflect on the interpretation.

## 1. Question
#### Which pop songs compete with "All I Want for Christmas Is You" in terms of holiday popularity?

You’ve likely heard this holiday tune before. Mariah Carey’s *All I Want for Christmas is You* is one of the [best-selling singles of all time](https://en.wikipedia.org/wiki/All_I_Want_for_Christmas_Is_You). 

There are many ways to decide what makes a song "popular". We will use [Google Trends](https://trends.google.com/trends/?geo=US) data, which measures how often people search Google for a given topic. In our case, the topics are a small collection of popular Christmas pop songs.


## 2. Gather

The code below imports Python libraries, which are collections of prewritten code that help us gather and organize the data we need to answer our question. This notebook will attempt to collect new data from Google. If anything goes wrong with the connection between the notebook and Google, the notebook will use backup data instead. The backup data was sourced from [Google Trends](https://trends.google.com/trends/?geo=US) on November 16, 2020.

First, we will import the Python libraries we need.

In [None]:
%pip install -q pyodide_http plotly nbformat
import pyodide_http
pyodide_http.patch_all()
# Import python libraries
import pandas as pd
import plotly.graph_objects as go
import os

Then, we try to connect to Google to grab current information. If that fails, we need to use the backup Google Trends data (.csv) that is saved alongside this notebook.

In [None]:
try:
    #import additional libraries needed
    from pytrends.request import TrendReq
    from datetime import datetime
    from datetime import date
    
    #set up the connection to google
    pytrends = TrendReq(hl='en-US', tz=360)
    
    #create string holding todays date
    today = date.today()
    today = str(today)

    #keys found by manually finding the 'mid' matching the desired search term using 'pytrends.suggestions()'
    mariah_key = '/g/1s05ybgbj'
    wham_key = '/g/1q5jd15tj'
    kelly_key = '/g/1q5j6dsqk'

    #call pytrends to grab google trends data
    pytrends.build_payload(kw_list=[mariah_key, kelly_key, wham_key],
                      timeframe='2013-11-01 '+today)
    
    #from the trend data collected from google, save the interest over time data to a dataframe
    df = pytrends.interest_over_time()
    
    #organize the data to look a bit more like the data grabbed from google manually
    df.reset_index(inplace=True)
    df.drop(['isPartial'], axis=1, inplace=True)
    
    #convert the 'datetime' object to a string so it can be more easily manipulated 
    df.date = df.date.map(lambda x: x.strftime("%Y-%m"))
    print("Notebook successfully connected to Google Trends.\nUsing current data.")
    
except Exception:
    # Create a pandas dataframe from our saved data
    path = os.path.join('datasets', 'christmas-songs.csv')
    df = pd.read_csv(path, skiprows=1)
    print("Notebook could not connect to Google Trends.\nUsing backup csv.")

#show the first 5 rows of the data
df.head()

## 3. Organize
This data is already fairly organized, but the column names could be clearer, so we will rename them. Additionally, if the backup data was used, there are '<1' scores that need to be replaced.

Our first step in organizing will be creating string variables for the song titles and renaming the columns to something easier to work with.

In [None]:
# Create variables to hold the column/song names
mariah = 'All I Want For Christmas Is You'
wham = 'Last Christmas'
kelly = 'Underneath the Tree'
# rename the columns
df.columns=['Month', mariah, kelly, wham]

Next, we are going to replace any '<1' score with '0' so we can deal with all scores as numeric data.

In [None]:
# replace any cell showing '<1' with 0, then make sure every score is stored as a number
for song in [mariah, wham, kelly]:
    df[song] = pd.to_numeric(df[song].map(lambda x: 0 if x == '<1' else x))
# show the first 5 rows of data
df.head()

Finally, we create a second data set only looking at December data points since those are likely the most interesting to us.

In [None]:
# Create a separate dataframe only showing rows for December
# So, only show rows where the month part of the Month column (format 'YYYY-MM') is '12'
df_december = df[df['Month'].map(lambda x: x[5:7] == '12')]
df_december.head()

## 4. Explore
The code below creates a line graph and a scatter plot to explore the question: "Which pop songs compete with All I Want for Christmas Is You in terms of holiday popularity?"

The code cell below creates two plots: 

* A line graph that looks at our more complete data set 
* A scatter plot that looks at some details of the December data set. 

The cell below does not show the plots. It just creates them and they will be displayed when '.show()' is called in a later cell.

In [None]:
# This code creates a line graph using the three song columns for the y-axis and uses the Month column as the x-axis
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['Month'], y=df[mariah],
                        mode='lines', name='All I Want for Christmas is You by Mariah Carey'))
fig.add_trace(go.Scatter(x=df['Month'], y = df[wham],
                        mode='lines', name='Last Christmas by Wham!'))
fig.add_trace(go.Scatter(x=df['Month'], y=df[kelly],
             mode='lines', name='Underneath The Tree by Kelly Clarkson'))

# Label and format the interactive plot:
fig.update_layout(title='Popular Christmas Songs According to Google Trends',
                 xaxis_title="Date", yaxis_title='Relative Popularity',
                 legend=dict(orientation="h",
                             yanchor="bottom", y=1.02,
                             xanchor="right", x=1),
                 hovermode='x')


# This code creates a scatterplot showing the difference in score between Wham! and Mariah Carey
fig2 = go.Figure()
fig2.add_trace(go.Scatter(x=df_december['Month'], y=df_december[mariah]-df_december[wham],
                          name='Test', 
                          hovertemplate='Score difference: Mariah - Wham! = %{y}<extra></extra>',
                          marker=dict(
                                size=16,
                                cmax=39,
                                cmin=0,
                                color=df_december[mariah]-df_december[wham],
                                colorbar=dict(title="Score Difference"),
                                colorscale=['#ff7f0e', 'blue']),
                            mode="markers"))

# Label and format the interactive plot
fig2.update_layout(title='Mariah and Wham! Google Trends December Score Differences',
                 xaxis_title="Date", yaxis_title='\'All I want for Christmas is You\' minus \'Last Christmas\'',
                  hovermode='x');

We can now show the first plot.

In [None]:
# show the line graph
fig.show()

The line graph shows how popular the three songs are over time. Google Trends uses the number of Google searches to measure popularity and assigns the most popular data point a score of 100. Every other score is measured relative to that peak. So, a score of 50 means a search was half as popular as the search that scored 100, which was the search for Mariah Carey's song in December 2014.
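
To make the idea of a relative score more concrete, here is a small sketch that scales some made-up search counts so that the largest one becomes 100. The counts and the `to_relative_scores` function are invented for illustration. Google's real calculation is not shown in this notebook and may involve extra steps.

```python
# Hypothetical monthly search counts (made-up numbers, not real Google data)
raw_counts = {"2014-10": 1200, "2014-11": 15000, "2014-12": 30000, "2015-01": 1500}


def to_relative_scores(counts):
    """Scale the counts so the largest one becomes 100, rounded to whole numbers."""
    peak = max(counts.values())
    return {month: round(100 * value / peak) for month, value in counts.items()}


scores = to_relative_scores(raw_counts)
print(scores)
# {'2014-10': 4, '2014-11': 50, '2014-12': 100, '2015-01': 5}
```

In this made-up example, November has half as many searches as December, so it gets a score of 50.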

The second plot is shown below.

In [None]:
fig2.show()

This scatterplot has one dot for each December in the data. With the backup data, that is 7 dots: one for each December from 2013 to 2019. The y-axis shows the difference between the score of *All I Want for Christmas Is You* and the score of *Last Christmas*. So the higher the dot, the higher Mariah scored relative to Wham!, and the lower the dot, the higher Wham! scored relative to Mariah.
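
As a quick illustration of how to read a score difference, the sketch below subtracts one score from the other and labels which song was ahead. The scores and the column names `mariah_score` and `wham_score` are made up for this example and are not the notebook's data.

```python
import pandas as pd

# Hypothetical December scores (illustrative values only)
example = pd.DataFrame({
    "Month": ["2017-12", "2018-12", "2019-12"],
    "mariah_score": [100, 90, 80],
    "wham_score": [70, 85, 88],
})

# A positive difference means Mariah scored higher; a negative one means Wham! did
example["difference"] = example["mariah_score"] - example["wham_score"]
example["leader"] = example["difference"].map(
    lambda d: "Mariah Carey" if d > 0 else "Wham!" if d < 0 else "tie"
)
print(example)
```

With these made-up numbers the differences are 30, 5 and -8, so only the last December would put Wham! ahead.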

## 5. Interpret
### Answering Our Question
Wham! released *Last Christmas* in 1984. Mariah Carey's *All I Want For Christmas is You* came out in 1994. Since *Last Christmas* is the older song, it is interesting that Google searches for it were higher than searches for *All I Want For Christmas is You* in December 2019. The second plot, the scatterplot, seems to show *Last Christmas* gaining in popularity as a December search term over the last few years.

Kelly Clarkson's 2013 hit, *Underneath the Tree*, never even made it to double digits. This means the song never reached 10% of the popularity that *All I Want For Christmas is You* had at its December 2013 peak.

None of the songs seemed relatively popular outside of the holiday season.


## 6. Communicate
Below are some writing prompts to help you reflect on the new information presented by the data. When you look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?
- "I used to think __ but now I know __."
- "I wish I knew more about __."
- "These data visualizations remind me of __."
- "I really like __."
- "Other data sources that would be interesting include ___."
- "Another popular song to look at could be ___ ."
- "I think the song by Wham! that came out in the 80s was popular last year because ___."
- "I think the most popular song in December 2020 will be ___."

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)