
# Albums and Songs Lab

### Introduction

In this lesson, we'll use the skills we have learned over the past several lessons to answer questions about the top songs, artists and albums over the past fifty years.

### Working with Albums

Let's start with data on the top 500 albums according to Rolling Stone magazine.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/data.csv"
df = pd.read_csv(url)
albums = df.to_dict('records')

In [3]:
albums[0]

{'number': 1,
 'year': 1967,
 'album': "Sgt. Pepper's Lonely Hearts Club Band",
 'artist': 'The Beatles',
 'genre': 'Rock',
 'subgenre': 'Rock & Roll, Psychedelic Rock'}

In [4]:
len(albums)

478

> Although the source is a top 500 list, the dataset contains only 478 albums.
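Because the source is nominally a top 500 list, it can help to see which ranks are absent. A minimal sketch, assuming each record's `number` field is its rank from 1 to 500; `missing_ranks` is a hypothetical helper, and every record except the first is placeholder data:

```python
def missing_ranks(albums, max_rank=500):
    """Return the ranks from 1 to max_rank that have no album record."""
    present = {album['number'] for album in albums}
    return sorted(set(range(1, max_rank + 1)) - present)

# Placeholder sample with ranks 1, 2 and 4 only.
sample = [
    {'number': 1, 'album': "Sgt. Pepper's Lonely Hearts Club Band"},
    {'number': 2, 'album': 'Album B'},
    {'number': 4, 'album': 'Album D'},
]
print(missing_ranks(sample, max_rank=5))  # [3, 5]
```

On the real data, `missing_ranks(albums)` would list the ranks that account for the gap between 478 and 500, provided every record has a distinct rank.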

Let's write some functions to help us better explore the data.

* `all_albums` - Takes a list of albums and returns a list of the album names.

* `all_artists` - Takes a list of albums and returns a list of artist names (each a string), with no artist repeated.

* `find_by_name` - Takes one argument, `album_name`. Returns the dictionary for the matching album, or `None` if no album matches.

* `find_by_ranks` - Takes `begin_rank` and `end_rank` as arguments and returns a list of dictionaries for the albums ranked between them, inclusive. Either argument can be omitted; if neither is provided, the entire list of albums is returned.

* `find_by_years` - Takes `begin_year` and `end_year` as arguments and returns a list of dictionaries for the albums released between those years, inclusive. Either argument can be omitted.

In [8]:
def all_albums(albums):
  return [album['album'] for album in albums]

In [12]:
def all_artists(albums):
  # dict.fromkeys drops repeated artists while keeping first-appearance order
  return list(dict.fromkeys(album['artist'] for album in albums))
# all_artists(albums)[:5]
# all_artists(albums)[-5:]
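Repeated artists can also be removed with a `set`, but a set does not keep the order in which artists first appear. A small sketch comparing the two approaches; `unique_in_order` is a hypothetical helper, and it relies on dictionaries preserving insertion order, which Python guarantees from 3.7 on:

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order."""
    # Dictionary keys are unique and remember insertion order.
    return list(dict.fromkeys(items))

artists = ['The Beatles', 'Frank Sinatra', 'The Beatles', 'Bob Dylan']
print(unique_in_order(artists))  # ['The Beatles', 'Frank Sinatra', 'Bob Dylan']
print(sorted(set(artists)))      # ['Bob Dylan', 'Frank Sinatra', 'The Beatles']
```

Either result satisfies "no artist is repeated"; the first keeps the albums' rank order, the second gives an alphabetical list.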

In [17]:
def find_by_name(album_name):
  # Search the module-level albums list for the first exact name match.
  for album in albums:
    if album['album'] == album_name:
      return album
  return None
# find_by_name('The Dark Side of the Moon')
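`find_by_name` scans every album on each call. If you look up many names, one alternative is to build a dictionary keyed by album name once and use `dict.get`, which also returns `None` for a missing key. A sketch with a hypothetical `build_name_index` helper and two records taken from the outputs in this notebook; `setdefault` keeps the first record for a repeated name, matching what `find_by_name` returns:

```python
def build_name_index(albums):
    """Map each album name to its first matching record, like find_by_name."""
    index = {}
    for album in albums:
        # setdefault only stores a record the first time a name is seen.
        index.setdefault(album['album'], album)
    return index

sample = [
    {'number': 1, 'album': "Sgt. Pepper's Lonely Hearts Club Band", 'artist': 'The Beatles'},
    {'number': 101, 'album': 'In the Wee Small Hours', 'artist': 'Frank Sinatra'},
]
index = build_name_index(sample)
print(index.get('In the Wee Small Hours')['artist'])  # Frank Sinatra
print(index.get('Not an Album'))                       # None
```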

In [25]:
def find_by_ranks(begin_rank=1, end_rank=500):
  return [album for album in albums if begin_rank <= album['number'] <= end_rank]
# len(find_by_ranks(400, end_rank=399))
len(find_by_ranks())

478
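The commented-out call `find_by_ranks(400, end_rank=399)` returns an empty list, because the chained comparison `begin_rank <= number <= end_rank` can never hold when `begin_rank > end_rank`. If you would rather fail loudly, one option is to validate the bounds first. A sketch with a hypothetical `find_by_ranks_checked` that takes the album list as an argument, using placeholder records:

```python
def find_by_ranks_checked(albums, begin_rank=1, end_rank=500):
    """Return albums ranked from begin_rank to end_rank, inclusive."""
    if begin_rank > end_rank:
        # A reversed range would silently match nothing, so reject it instead.
        raise ValueError(f"begin_rank ({begin_rank}) is greater than end_rank ({end_rank})")
    return [album for album in albums if begin_rank <= album['number'] <= end_rank]

sample = [{'number': 1}, {'number': 101}, {'number': 400}]
print(len(find_by_ranks_checked(sample, 100, 400)))  # 2
try:
    find_by_ranks_checked(sample, 400, 399)
except ValueError as err:
    print(err)  # begin_rank (400) is greater than end_rank (399)
```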

In [32]:
def find_by_years(begin_year=float('-inf'), end_year=float('inf')):
  return [album for album in albums if begin_year <= album['year'] <= end_year]
find_by_years(end_year=1955)

[{'number': 101,
  'year': 1955,
  'album': 'In the Wee Small Hours',
  'artist': 'Frank Sinatra',
  'genre': 'Jazz, Pop',
  'subgenre': 'Big Band, Ballad'}]
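`find_by_ranks` and `find_by_years` differ only in the field they filter on and in their default bounds. A sketch of a shared helper, hypothetical and not part of the lab, that uses `None` to mean "no bound" instead of infinities or a fixed 500; the records are taken from the outputs above:

```python
def find_between(records, field, low=None, high=None):
    """Return records whose field lies in [low, high]; None means unbounded."""
    return [
        record for record in records
        if (low is None or record[field] >= low)
        and (high is None or record[field] <= high)
    ]

sample = [
    {'number': 1, 'year': 1967, 'album': "Sgt. Pepper's Lonely Hearts Club Band"},
    {'number': 101, 'year': 1955, 'album': 'In the Wee Small Hours'},
]
print([r['album'] for r in find_between(sample, 'year', high=1955)])  # ['In the Wee Small Hours']
print(len(find_between(sample, 'number')))  # 2
```

With such a helper, `find_by_years(begin, end)` could reduce to `find_between(albums, 'year', begin, end)`.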

### Working with Songs

Next, let's load data on the top songs, along with data that connects albums to their tracks.

In [33]:
import pandas as pd
songs_url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/top-500-songs.txt"
songs_df = pd.read_csv(songs_url, sep='\t', header=None, names=['rank', 'song', 'artist', 'year'])
songs = songs_df.to_dict('records')

track_url = "https://raw.githubusercontent.com/eng-6-22/mod-1-a-data-structures/master/6-top-songs/track_data.json"
albums_and_tracks = pd.read_json(track_url)
albums_tracks = albums_and_tracks.to_dict('records')
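The songs file is read as tab-separated text with no header row: `header=None` stops pandas from treating the first song as column names, and `names` supplies them. For comparison, a sketch of the same parse with the standard library's `csv` module. The two rows are copied from the `songs[:5]` output below, and the column order is assumed from the `names` argument; `parse_songs` is a hypothetical helper:

```python
import csv
import io

# Two rows in the assumed layout: rank, song, artist, year, separated by tabs.
raw = "1\tLike a Rolling Stone\tBob Dylan\t1965\n2\tSatisfaction\tThe Rolling Stones\t1965\n"

def parse_songs(text):
    """Parse tab-separated song rows (no header line) into a list of dicts."""
    fields = ['rank', 'song', 'artist', 'year']
    songs = []
    for row in csv.reader(io.StringIO(text), delimiter='\t'):
        record = dict(zip(fields, row))
        # csv yields strings, so convert the numeric columns explicitly.
        record['rank'] = int(record['rank'])
        record['year'] = int(record['year'])
        songs.append(record)
    return songs

print(parse_songs(raw)[0])
# {'rank': 1, 'song': 'Like a Rolling Stone', 'artist': 'Bob Dylan', 'year': 1965}
```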

In [36]:
songs[:5]

[{'rank': 1,
  'song': 'Like a Rolling Stone',
  'artist': 'Bob Dylan',
  'year': 1965},
 {'rank': 2,
  'song': 'Satisfaction',
  'artist': 'The Rolling Stones',
  'year': 1965},
 {'rank': 3, 'song': 'Imagine', 'artist': 'John Lennon', 'year': 1971},
 {'rank': 4, 'song': "What's Going On", 'artist': 'Marvin Gaye', 'year': 1971},
 {'rank': 5, 'song': 'Respect', 'artist': 'Aretha Franklin', 'year': 1967}]

In [37]:
albums_tracks[0]

{'artist': 'The Beatles',
 'album': "Sgt. Pepper's Lonely Hearts Club Band",
 'tracks': ["Sgt. Pepper's Lonely Hearts Club Band - Remix",
  'With A Little Help From My Friends - Remix',
  'Lucy In The Sky With Diamonds - Remix',
  'Getting Better - Remix',
  'Fixing A Hole - Remix',
  "She's Leaving Home - Remix",
  'Being For The Benefit Of Mr. Kite! - Remix',
  'Within You Without You - Remix',
  "When I'm Sixty-Four - Remix",
  'Lovely Rita - Remix',
  'Good Morning Good Morning - Remix',
  "Sgt. Pepper's Lonely Hearts Club Band (Reprise) - Remix",
  'A Day In The Life - Remix',
  "Sgt. Pepper's Lonely Hearts Club Band - Take 9 And Speech",
  'With A Little Help From My Friends - Take 1 / False Start And Take 2 / Instrumental',
  'Lucy In The Sky With Diamonds - Take 1',
  'Getting Better - Take 1 / Instrumental And Speech At The End',
  'Fixing A Hole - Speech And Take 3',
  "She's Leaving Home - Take 1 / Instrumental",
  'Being For The Benefit Of Mr. Kite! - Take 4',
  'Within You
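
Track titles in `track_data.json` can carry version notes such as ` - Remix` or ` - Take 1`, as in the first album above, so an exact comparison against names from the songs list would not match them to a plain title. A sketch of normalizing titles before matching, assuming the version note always follows the first ` - ` (a title that legitimately contains ` - ` would be cut short). `base_title` and `matching_tracks` are hypothetical helpers, and the top-songs list is a placeholder:

```python
def base_title(track):
    """Strip a trailing ' - <version note>' and normalize case."""
    # Assumption: any version note starts at the first ' - ' in the title.
    return track.split(' - ', 1)[0].strip().casefold()

def matching_tracks(tracks, song_names):
    """Return the tracks whose base title matches one of song_names."""
    targets = {name.casefold() for name in song_names}
    return [track for track in tracks if base_title(track) in targets]

tracks = ['A Day In The Life - Remix', 'Lucy In The Sky With Diamonds - Take 1', 'Getting Better - Remix']
hypothetical_top_titles = ['A Day in the Life']
print(matching_tracks(tracks, hypothetical_top_titles))  # ['A Day In The Life - Remix']
```

Normalizing this way would likely change the counts reported below, so treat it as an optional refinement rather than the lab's matching rule.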

Next, write the two functions specified at the end of this section: `album_most_top_songs` and `top_ten_albums_by_songs`.

In [53]:
# A set gives fast membership checks when matching track titles against top songs.
song_names = {song['song'] for song in songs}

def song_in_songs_other(songs_a, songs_b):
  # Return the items of songs_a that also appear in songs_b (exact title match).
  return [song for song in songs_a if song in songs_b]

def album_most_top_songs():
  max_album = None
  max_count = float('-inf')

  for album in albums_tracks:
    tracks = album['tracks']
    count = len(song_in_songs_other(tracks, song_names))
    # Strict '>' keeps the first album found when counts tie.
    if count > max_count:
      max_count = count
      max_album = album

  return max_album['artist'], max_album['album'], max_count
album_most_top_songs()

('Elvis Presley', 'Elvis Presley', 8)
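The loop in `album_most_top_songs` is a hand-written maximum search. Python's built-in `max` with a `key` function does the same job and, like the strict `>` comparison, returns the first album when counts tie (it raises `ValueError` on an empty list). A sketch that takes its inputs as arguments; `album_with_most_top_songs` is a hypothetical name and the data are placeholders:

```python
def album_with_most_top_songs(albums_tracks, top_titles):
    """Return (artist, album, count) for the album with the most top-song tracks."""
    top = set(top_titles)

    def count_top(album):
        # Number of this album's tracks whose exact title is on the top list.
        return sum(1 for track in album['tracks'] if track in top)

    best = max(albums_tracks, key=count_top)
    return best['artist'], best['album'], count_top(best)

sample = [
    {'artist': 'Artist A', 'album': 'Album A', 'tracks': ['Song 1', 'Song 2']},
    {'artist': 'Artist B', 'album': 'Album B', 'tracks': ['Song 2', 'Song 3', 'Song 4']},
]
print(album_with_most_top_songs(sample, ['Song 2', 'Song 3']))  # ('Artist B', 'Album B', 2)
```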

In [64]:
def top_ten_albums_by_songs():
  albums_with_top_tracks = []
  top_ten_albums = {}

  # Keep only albums with at least one track on the top songs list.
  for album in albums_tracks:
    top_tracks = song_in_songs_other(album['tracks'], song_names)
    if top_tracks:
      albums_with_top_tracks.append({
          'tracks': top_tracks,
          'album_name': album['album']
      })

  # Sort by number of matching tracks, most first; ties keep their original order.
  top_albums = sorted(albums_with_top_tracks, key=lambda x: len(x['tracks']), reverse=True)

  # Map each of the ten leading album names to its count of top songs.
  for album in top_albums[:10]:
    top_ten_albums[album['album_name']] = len(album['tracks'])

  return top_ten_albums

top_ten_albums_by_songs()

{'Elvis Presley': 8,
 'The Sun Records Collection': 6,
 'Are You Experienced': 4,
 'Portrait of a Legend 1951-1964': 4,
 'Highway 61 Revisited': 3,
 'Bringing It All Back Home': 3,
 'Star Time': 3,
 'Led Zeppelin II': 3,
 'I Never Loved a Man the Way I Love You': 3,
 'All the Young Dudes': 3}
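Sorting the whole filtered list to keep ten works fine at this size. `heapq.nlargest` expresses the same "top n by key" step directly, and the Python documentation defines it as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so ties come out in the same order. A sketch with a hypothetical `top_albums_by_songs` and placeholder data. One caveat applies to both versions: because album names are dictionary keys, two different albums with the same name would collapse into one entry.

```python
import heapq

def top_albums_by_songs(albums_tracks, top_titles, n=10):
    """Return {album name: count} for the n albums with the most top-song tracks."""
    top = set(top_titles)
    counts = []
    for album in albums_tracks:
        count = sum(1 for track in album['tracks'] if track in top)
        if count > 0:  # skip albums with no top songs, as the lab version does
            counts.append((album['album'], count))
    best = heapq.nlargest(n, counts, key=lambda pair: pair[1])
    return dict(best)

sample = [
    {'album': 'Album A', 'tracks': ['Song 1', 'Song 2']},
    {'album': 'Album B', 'tracks': ['Song 2', 'Song 3', 'Song 4']},
    {'album': 'Album C', 'tracks': ['Song 5']},
]
print(top_albums_by_songs(sample, ['Song 2', 'Song 3', 'Song 4'], n=2))  # {'Album B': 3, 'Album A': 1}
```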

* `album_most_top_songs` - Returns the artist and album name of the album with the most songs featured on the top 500 songs list.

* `top_ten_albums_by_songs` - Returns a dictionary of the 10 albums with the most songs on the top songs list. The album names are the keys, and each value is the number of that album's songs that appear on the top 500 list.