# Nested Loops Lab

### Introduction

In this lesson, we'll review working with lists of dictionaries.  Let's get started. 

> Do for one, then do for all.

### Loading our Data

In [53]:
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_most-streamed_songs_on_Spotify"

# read_html returns a list of DataFrames, one per HTML table found on the page
dfs = pd.read_html(url)

In [54]:
df = dfs[0]

In [55]:
songs = df.to_dict('records')

### Working with our data

Now that we've downloaded our data, let's start exploring it.  Begin by selecting the first element from our list of songs, and assign it to the variable `first_song`.

In [56]:
first_song = songs[0]

first_song

{'Rank': '1',
 'Song': '"Shape of You"',
 'Artist(s)': 'Ed Sheeran',
 'Album': '÷',
 'Streams(Millions)': '2943',
 'Date published': '6 January 2017',
 'Ref.': '[11]'}

Now take a look at the last three songs in the list.

In [57]:
songs[-3:]

[{'Rank': '99',
  'Song': '"Cheerleader (Felix Jaehn Remix)"',
  'Artist(s)': 'Omi and Felix Jaehn',
  'Album': 'Me 4 U',
  'Streams(Millions)': '1217',
  'Date published': '19 May 2014',
  'Ref.': nan},
 {'Rank': '100',
  'Song': '"Can\'t Feel My Face"',
  'Artist(s)': 'The Weeknd',
  'Album': 'Beauty Behind the Madness',
  'Streams(Millions)': '1210',
  'Date published': '8 June 2015',
  'Ref.': '[12]'},
 {'Rank': 'As of 19 October 2021',
  'Song': 'As of 19 October 2021',
  'Artist(s)': 'As of 19 October 2021',
  'Album': 'As of 19 October 2021',
  'Streams(Millions)': 'As of 19 October 2021',
  'Date published': 'As of 19 October 2021',
  'Ref.': 'As of 19 October 2021'}]

It looks like the last entry is not a song at all, so select everything except the last entry and assign it to a list called `selected_songs`.

In [58]:
selected_songs = songs[:-1]

In [59]:
len(selected_songs)

100
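The two slices used so far can be checked on a small stand-in list. This is a minimal sketch; `rows` is a hypothetical list of strings, not the Spotify data.

```python
# Hypothetical stand-in for the scraped list: three rows plus a footer row.
rows = ['song a', 'song b', 'song c', 'footer']

last_three = rows[-3:]      # the final three elements
without_footer = rows[:-1]  # everything except the final element

print(last_three)           # ['song b', 'song c', 'footer']
print(without_footer)       # ['song a', 'song b', 'song c']
print(len(rows))            # 4, slicing builds a new list and leaves rows alone
```

Because slicing returns a new list, the original `songs` list still contains the footer row after `selected_songs` is created.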

And let's confirm that the last song is in fact a song.  Select the last song.

In [60]:
last_song = selected_songs[-1]
last_song

{'Rank': '100',
 'Song': '"Can\'t Feel My Face"',
 'Artist(s)': 'The Weeknd',
 'Album': 'Beauty Behind the Madness',
 'Streams(Millions)': '1210',
 'Date published': '8 June 2015',
 'Ref.': '[12]'}

Ok, now let's select data.  From the above list of dictionaries, create a list of just the name of each of the songs.

In [65]:
names = []

for song in selected_songs:
    names.append(song['Song'])

In [66]:
names[:3]

['"Shape of You"', '"Blinding Lights"', '"Dance Monkey"']
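The same selection can also be written as a list comprehension. The sketch below uses a hypothetical two-song sample (`sample_songs`) in the same shape as the lab's dictionaries to show that both forms produce the same list.

```python
# Hypothetical sample in the same shape as the lab's song dictionaries.
sample_songs = [
    {'Song': '"Shape of You"', 'Artist(s)': 'Ed Sheeran'},
    {'Song': '"Blinding Lights"', 'Artist(s)': 'The Weeknd'},
]

# Loop version, as in the lab.
names_loop = []
for song in sample_songs:
    names_loop.append(song['Song'])

# Equivalent list comprehension.
names_comp = [song['Song'] for song in sample_songs]

print(names_loop == names_comp)  # True
```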

Now use what we know about lists and sets to find the *number* of repeated album listings on the top 100 songs list, that is, how many entries name an album that already appeared earlier in the list.

In [None]:
album_names = []

for song in selected_songs:
    album_names.append(song['Album'])
    
len(album_names) - len(set(album_names))

# 17
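The subtraction above counts every entry after the first for each album, which is not quite the same as the number of distinct albums that appear more than once. A rough sketch of the difference, using `collections.Counter` and a hypothetical list of album names (`sample_albums`) rather than the real chart:

```python
from collections import Counter

# Hypothetical album names, not the real chart data.
sample_albums = ['After Hours', 'Starboy', 'After Hours',
                 'Scorpion', 'Starboy', 'After Hours']

# Entries beyond the first appearance of each album.
repeat_entries = len(sample_albums) - len(set(sample_albums))

# Distinct albums that appear more than once.
counts = Counter(sample_albums)
repeated_albums = [album for album, n in counts.items() if n > 1]

print(repeat_entries)   # 3
print(repeated_albums)  # ['After Hours', 'Starboy']
```

Here three entries are repeats, but only two albums are responsible for them, because one album appears three times.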

Ok, now if we return to our original list of dictionaries, some of the data does not look like it's of the correct type.

Change the `Rank` and `Streams(Millions)` values to integers. It also looks like each song name has an extra quotation mark at the beginning and end.  Remove these extra quotation marks from each of the song names.

Assign this new list of songs to the variable `coerced_songs`.

In [72]:
coerced_songs = []
for song in selected_songs:
    copied_song = song.copy()
    copied_song['Rank'] = int(copied_song['Rank'])
    copied_song['Streams(Millions)'] = int(copied_song['Streams(Millions)'])
    copied_song['Song'] = copied_song['Song'][1:-1]
    coerced_songs.append(copied_song)

In [73]:
coerced_songs[:2]

[{'Rank': 1,
  'Song': 'Shape of You',
  'Artist(s)': 'Ed Sheeran',
  'Album': '÷',
  'Streams(Millions)': 2943,
  'Date published': '6 January 2017',
  'Ref.': '[11]'},
 {'Rank': 2,
  'Song': 'Blinding Lights',
  'Artist(s)': 'The Weeknd',
  'Album': 'After Hours',
  'Streams(Millions)': 2578,
  'Date published': '29 November 2019',
  'Ref.': '[12]'}]

Now if we select the `Rank` and `Streams(Millions)` from any of the dictionaries, we should see that they are of type integer.

In [74]:
first_coerced = coerced_songs[0]

type(first_coerced['Rank'])

int

In [75]:
type(first_coerced['Streams(Millions)'])

int

And if we view the title of even the last song, we should see that the first character is no longer a quotation mark but a letter.

In [83]:
coerced_songs[-1]['Song'][:1]

'C'
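The slice `[1:-1]` assumes every title is wrapped in exactly one quotation mark on each side, and `int` assumes every value is numeric. A slightly more defensive sketch is shown below; `clean_title` and `to_int` are hypothetical helpers, not part of the lab's solution.

```python
def clean_title(title):
    """Remove any single or double quotes from both ends of the title."""
    return title.strip('"\'')


def to_int(text):
    """Return int(text), or None when text is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


print(clean_title('"Shape of You"'))         # Shape of You
print(clean_title('"Can\'t Feel My Face"'))  # Can't Feel My Face
print(to_int('2943'))                        # 2943
print(to_int('As of 19 October 2021'))       # None
```

Note that `strip` would also remove a quote character that legitimately ends a title, so the plain slice may still be the better choice when the data is known to be uniform.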

Now that we have this list of `coerced_songs`, let's update our list of dictionaries even further.  If we look at one of the dictionaries, we'll see that the date is a single string, which is hard to work with.

In [84]:
coerced_songs[0]['Date published']

'6 January 2017'

We'd like to create three new keys on each of the dictionaries: `day`, `month` and `year`.  Also remove the `'Date published'` key, as the information in this key would then be redundant.

> You can delete a key from a dictionary with the pop method. 

In [86]:
blinding_lights = {'Rank': 2,
  'Song': 'Blinding Lights',
  'Artist(s)': 'The Weeknd',
  'Album': 'After Hours',
  'Streams(Millions)': 2578,
  'Date published': '29 November 2019',
  'Ref.': '[12]'}

blinding_lights.pop('Date published')

'29 November 2019'

In [87]:
blinding_lights

{'Rank': 2,
 'Song': 'Blinding Lights',
 'Artist(s)': 'The Weeknd',
 'Album': 'After Hours',
 'Streams(Millions)': 2578,
 'Ref.': '[12]'}
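Two details of `pop` are worth seeing in isolation: it returns the value it removes, and it raises a `KeyError` for a missing key unless a default is supplied. A minimal sketch on a small hypothetical dictionary:

```python
song = {'Song': 'Blinding Lights', 'Date published': '29 November 2019'}

removed = song.pop('Date published')  # returns the removed value
print(removed)                        # 29 November 2019
print('Date published' in song)       # False

# Popping the same key again would raise KeyError; a default avoids that.
missing = song.pop('Date published', None)
print(missing)                        # None
```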

Ok, so create three new keys of `day`, `month` and `year` for each of our `coerced_songs` and then remove the `'Date published'` key.

Assign the new list to the variable `dated_songs`.

In [91]:
coerced_songs[0]['Date published'].split()

['6', 'January', '2017']

In [94]:
dated_songs = []
for song in coerced_songs:
    copied_song = song.copy()
    day, month, year = copied_song['Date published'].split()
    copied_song['month'] = month
    copied_song['day'] = int(day)
    copied_song['year'] = int(year)
    
    copied_song.pop('Date published')
    dated_songs.append(copied_song)

In [95]:
dated_songs[:2]

[{'Rank': 1,
  'Song': 'Shape of You',
  'Artist(s)': 'Ed Sheeran',
  'Album': '÷',
  'Streams(Millions)': 2943,
  'Ref.': '[11]',
  'month': 'January',
  'day': 6,
  'year': 2017},
 {'Rank': 2,
  'Song': 'Blinding Lights',
  'Artist(s)': 'The Weeknd',
  'Album': 'After Hours',
  'Streams(Millions)': 2578,
  'Ref.': '[12]',
  'month': 'November',
  'day': 29,
  'year': 2019}]
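As an alternative to splitting the string by hand, the standard library's `datetime.strptime` can parse this date format directly. This is only a sketch of another approach, not the lab's solution, and it assumes the default English month names (the `%B` directive is locale dependent).

```python
from datetime import datetime

date_text = '6 January 2017'

# %d = day of month, %B = full month name, %Y = four-digit year.
parsed = datetime.strptime(date_text, '%d %B %Y')

print(parsed.day, parsed.month, parsed.year)  # 6 1 2017
```

The parsed object already exposes the month as a number, which would cover the conversion step as well.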

### Bonus

Ok, now remember that we like to convert as many values as possible to numbers.  One of the attributes that perhaps should be a number is the month.  We'd like to convert `January` to `1` and `November` to `11` for example.  We'll get you started with this by creating a dictionary that has the keys and corresponding value for each month.  

In [96]:
month_nums = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6, 'July': 7, 'August': 8, 'September': 9, 'October': 10, 'November': 11, 'December': 12} 

And now notice that if we pass any month name as a key, we are returned the corresponding number.

In [97]:
month_nums['January']

1
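A mapping like this does not have to be typed by hand. One possible sketch builds it from the standard library's `calendar.month_name`, assuming the default English locale; `generated_month_nums` is a hypothetical name used here to avoid clashing with `month_nums`.

```python
import calendar

# calendar.month_name[0] is an empty string, so the `if name` filter skips it.
generated_month_nums = {
    name: number
    for number, name in enumerate(calendar.month_name)
    if name
}

print(generated_month_nums['January'])   # 1
print(generated_month_nums['November'])  # 11
print(len(generated_month_nums))         # 12
```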

So use the above to convert the `month` value of each of the `dated_songs` to the corresponding number.  Assign the result to the list `formatted_songs`.

In [99]:
formatted_songs = []
for dated_song in dated_songs:
    copied_song = dated_song.copy()
    month_text = copied_song['month']
    month_num = month_nums[month_text]
    copied_song['month'] = month_num
    formatted_songs.append(copied_song)

And now we can see that each song's month is represented by a number.

In [101]:
formatted_songs[0]['month']

1

In [102]:
formatted_songs[1]['month']

11
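In the spirit of "do for one, then do for all", the separate passes above could be folded into one function that cleans a single song and is then applied to the whole list. The sketch below is one possible arrangement; `format_song`, `MONTH_NUMS` and `raw` are hypothetical names, and `raw` holds only a subset of the real keys.

```python
MONTH_NUMS = {'January': 1, 'February': 2, 'March': 3, 'April': 4,
              'May': 5, 'June': 6, 'July': 7, 'August': 8,
              'September': 9, 'October': 10, 'November': 11, 'December': 12}


def format_song(raw_song):
    """Return a cleaned copy of one raw song dictionary."""
    song = raw_song.copy()  # shallow copy, so the input is left unchanged
    song['Rank'] = int(song['Rank'])
    song['Streams(Millions)'] = int(song['Streams(Millions)'])
    song['Song'] = song['Song'][1:-1]  # drop the surrounding quotes
    day, month, year = song.pop('Date published').split()
    song['day'] = int(day)
    song['month'] = MONTH_NUMS[month]
    song['year'] = int(year)
    return song


# Do for one...
raw = {'Rank': '1', 'Song': '"Shape of You"', 'Artist(s)': 'Ed Sheeran',
       'Streams(Millions)': '2943', 'Date published': '6 January 2017'}
formatted = format_song(raw)
print(formatted['Rank'], formatted['month'], formatted['year'])  # 1 1 2017

# ...then do for all.
all_formatted = [format_song(song) for song in [raw]]
print(raw['Date published'])  # 6 January 2017
```

Working on a copy inside the function is what keeps `raw` intact, which mirrors the `song.copy()` calls in the loops above.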

### Summary

In this lesson, we practiced...