# Instructions:

## Getting Started
In this exercise, we will use data from Rolling Stone's top 500 albums, stored in the `data.csv` file. We will build the following functions to answer questions about this data and interact with it.

> **remember:** reading from a csv file in python looks like the following:

```python
import csv

with open(file_name) as f:
    # we are using DictReader because we want each row in dictionary format
    reader = csv.DictReader(f)
    # some more code
```

Once our `reader` is reading the file as dictionaries, we want our data to be a list of dictionaries. So, we need to loop through our `reader` and create a list. *Hint: list comprehensions and for loops are your friends.*

```python
# our data will look something like this once we have read it and turned it into a list of `OrderedDict`s
# don't worry, the ordered dicts look different but we can interact with them the same way we do normal dicts
[OrderedDict([('number', '1'), ('year', '1967'), ('album', "Sgt. Pepper's Lonely Hearts Club Band"), ('artist', 'The Beatles'), ('genre', 'Rock'), ('subgenre', 'Rock & Roll, Psychedelic Rock')]), OrderedDict([('number', '2'), ('year', '1966'), ('album', 'Pet Sounds'), ('artist', 'The Beach Boys'), ('genre', 'Rock'), ('subgenre', 'Pop Rock, Psychedelic Rock')]), OrderedDict([('number', '3'), ('year', '1966'), ('album', 'Revolver'), ('artist', 'The Beatles'), ('genre', 'Rock'), ('subgenre', 'Psychedelic Rock, Pop Rock')])]
```
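
As a minimal sketch of that step, the snippet below reads a few rows into a list of dictionaries. It uses an in-memory `io.StringIO` holding three sample rows in place of `data.csv`, so the sample text and the helper name `load_rows` are assumptions made for illustration only.

```python
import csv
import io

# Three sample rows standing in for data.csv (illustrative only).
SAMPLE_CSV = """number,year,album,artist,genre,subgenre
1,1967,Sgt. Pepper's Lonely Hearts Club Band,The Beatles,Rock,"Rock & Roll, Psychedelic Rock"
2,1966,Pet Sounds,The Beach Boys,Rock,"Pop Rock, Psychedelic Rock"
3,1966,Revolver,The Beatles,Rock,"Psychedelic Rock, Pop Rock"
"""


def load_rows(file_obj):
    """Return every row of an open CSV file as a list of dictionaries."""
    reader = csv.DictReader(file_obj)
    # dict(row) gives plain dictionaries whether the reader yields
    # OrderedDicts or regular dicts.
    return [dict(row) for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["album"])
```

With the real file, the same helper could be called inside `with open('data.csv', newline='') as f:`. Note that every value, including `number` and `year`, comes back as a string.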

Once our data is formatted the way we want it, we can begin defining our functions.

In [1]:
%load_ext autoreload
%autoreload 2 

from functions import album_name

ImportError: cannot import name 'album_name'

In [None]:
import csv  #imports Python's csv module, which we use to read the csv file

In [None]:
top_500_albums = []      #creates top 500 albums list

with open('data.csv') as read_file:      #open the data as a read file
    albums = csv.DictReader(read_file)   #sets the data file as albums 

    for album in albums:                  #for each album in the data file
        top_500_albums.append(album)      #add it to the top 500 albums list 

In [None]:
top_500_albums[:25] #return the first 25 entries from our list


### Functions to build-out:

Each of the following functions can be defined in the `functions.py` file. 

* **Searching functions**
  * Find by name - Takes in a string that represents the name of an album. Should return a dictionary with the correct album, or return `None`.
  * Find by rank - Takes in a number that represents the rank in the list of top albums and returns the album with that rank. If there is no album with that rank, it returns `None`.
  * Find by year - Takes in a number for the year in which an album was released and returns a list of albums that were released in that year. If there are no albums released in the given year, it returns an empty list.
  * Find by years - Takes in a start year and end year. Returns a list of all albums that were released on or between the start and end years. If no albums are found for those years, then an empty list is returned. 
  * Find by ranks - Takes in a start rank and end rank. Returns a list of albums that are ranked between the start and end ranks. If no albums are found for those ranks, then an empty list is returned.
* **All functions**
  * All titles - Returns a list of titles for each album.
  * All artists - Returns a list of artist names for each album.
* **Questions to answer / functions**
  * Artists with the most albums - Returns the artist with the largest number of albums on the list of top albums
  * Most popular word - Returns the word used most often across all album titles
  * Histogram of albums by decade - Returns a histogram with each decade pointing to the number of albums released during that decade.
  * Histogram by genre - Returns a histogram with each genre pointing to the number of albums that are categorized as being in that genre.
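
As a rough sketch of how the searching functions described above might look, the example below implements three of them against a tiny hand-written list. The function names, the `data` parameter, and the sample rows are assumptions for illustration; they are not the required signatures for `functions.py`. The values are strings, as `csv.DictReader` would produce, so ranks and years are converted with `int` before comparing.

```python
# A tiny stand-in for the album list (illustrative only).
albums = [
    {"number": "1", "year": "1967", "album": "Sgt. Pepper's Lonely Hearts Club Band", "artist": "The Beatles"},
    {"number": "2", "year": "1966", "album": "Pet Sounds", "artist": "The Beach Boys"},
    {"number": "3", "year": "1966", "album": "Revolver", "artist": "The Beatles"},
]


def find_by_name(data, name):
    """Return the first album whose title equals name, or None."""
    for album in data:
        if album["album"] == name:
            return album
    return None


def find_by_rank(data, rank):
    """Return the album with the given numeric rank, or None."""
    for album in data:
        if int(album["number"]) == rank:
            return album
    return None


def find_by_years(data, start_year, end_year):
    """Return all albums released on or between the two years."""
    return [a for a in data if start_year <= int(a["year"]) <= end_year]


print(find_by_name(albums, "Revolver")["artist"])
print(find_by_rank(albums, 99))
print(len(find_by_years(albums, 1966, 1966)))
```

Passing the list in as `data` rather than reading a global keeps the functions usable with any list of dictionaries that has the same keys.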

In [None]:
def album_name(name):                 #defines a function called album_name
    for album in top_500_albums:      #loops over each album in our top_500_albums list
        if album['album'] == name:    #if that album title is equal to the name that is entered
            return album              #return the matching album dictionary
    return None                       #no title matched exactly, so return None

album_name('Revolve') #calls the function on a test entry; returns None unless a title matches exactly

In [None]:
def album_rank(rank):                 #defines a function called album_rank
    for album in top_500_albums:      #loops over each album in our top 500
        if album['number'] == rank:   #if that album's rank is equal to the rank entered
            return album              #return the album
    return None                       #no album has that rank, so return None

album_rank('2')    #calls the function on a test entry (ranks are stored as strings)

In [None]:
def album_year(year):                     #creates a function called album year 
    album_x_years=[]                      # creates a new empty list 
    for album in top_500_albums:          #calls a for loop that for each album in our top 500 list
        if album['year'] == year:         #if the album year matches the one entered
            album_x_years.append(album)   #adds that album to our new list
    return album_x_years                  #and return that new list now appended

album_year('1955')  #calls this function

In [None]:
def album_year_range(startyear, endyear):     #creates a function called album_year_range 
    album_rangex_years=[]
    for album in top_500_albums:
        if int(album['year']) >= startyear and int(album['year']) <= endyear:
            album_rangex_years.append(album)
    return album_rangex_years
album_year_range(1966, 1969)

In [None]:
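def album_start_rank(startrank, endrank):   #defines a function that finds albums ranked within a range
    albums_in_range=[]                      #new list; named differently from the function to avoid shadowing it
    for album in top_500_albums:
        if int(album['number']) >= startrank and int(album['number']) <= endrank:
            albums_in_range.append(album)
    return albums_in_range
album_start_rank(2, 4)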

In [None]:
#All title - return a list of titles for each album  
list_titles = []
for d in top_500_albums: 
    list_titles.append(d['album']) 
list_titles[:5]

In [None]:
#All artist - return a list of all artist for each album 
list_o_artists = [d['artist'] for d in top_500_albums]
list_o_artists[:5]

In [None]:
# Artists with the most albums - Returns the artist with the highest amount of albums on the list of top albums 
top_artist = {}          #create an empty dictionary
for lartist in list_o_artists:  #starts a for loop over each artist (known as lartist) in the list of artists
    if lartist not in top_artist: 
        top_artist[lartist]=1 
    else: 
        top_artist[lartist]+=1 

sorted_artist = sorted(top_artist.items(), key= lambda x: x[1], reverse=True) #sorts the (artist, count) pairs by count in desc order

#def top_top_artist(List):  #this returns one artist and it was something we tried
#    return max(set(List), key = List.count)
#top_top_artist(list_o_artists) 

sorted_artist[:5] #calls the top 5 from the list

In [None]:
highest = max(top_artist.values()) #finds the max value (the highest album count)

print([k for k, v in top_artist.items() if v == highest]) 

In [None]:
list_of_words=[] # creates a list 
for album in top_500_albums: #splits the album names into individual word 
    list_of_words.extend([word.lower() for word in album['album'].split()]) 

list_of_words

In [None]:
flat_list = list(list_of_words) #list_of_words is already a flat list of words, so we copy it; looping over each word would split it into single characters

flat_list


In [None]:
flatlist_lower = [word.lower() for word in flat_list]

In [None]:
most_pop_word = {}          #create an empty dictionary
for pword in list_of_words:  #starts a for loop that counts how often each word appears
    if pword not in most_pop_word: 
        most_pop_word[pword]=1 
    else: 
        most_pop_word[pword]+=1 

sorted_pwords = sorted(most_pop_word.items(), key= lambda x: x[1], reverse=True) #sorts the (word, count) pairs by count in desc order

sorted_pwords

In [None]:
def most_popular_word(words):  #defines a function; the parameter is named words so it does not shadow the built-in list
    return max(set(words), key = words.count) #returns the item that occurs most often in words
most_popular_word(flat_list)

In [None]:
highest2 = max(most_pop_word.values()) #finds the max value (the highest word count)

print([k for k, v in most_pop_word.items() if v == highest2]) #prints the key for key, value combo in given list if the value is the highest number as defined prev 

In [None]:
album_years = [int(d['year']) for d in top_500_albums] #pull all the years from all albums, turn them into an integer, and return them to a single list

years_decades = [ (year//10)*10 for year in album_years ] #take the year and divide by 10 and times it by 10 for each year in our years list 

In [None]:
import matplotlib.pyplot as plt  #imports matplotlib
%matplotlib inline

fig, ax = plt.subplots(figsize=(10,5))     #creates a figure subplot with stated size 

ax.hist(years_decades, bins=7, edgecolor='black') #draws a histogram with 7 bins and a black outline
ax.set_title('Number of Albums by Decade')  #sets the histogram title
ax.set_xlabel('Decades')  #sets the label for the x axis
ax.set_ylabel('Number of Albums') #sets the label for the y axis


In [None]:
album_genre = [(d['genre']) for d in top_500_albums]  #compiles the genres for each album into one list 

album_genre_bins = {}          #create an empty dictionary
for genre in album_genre:  #starts a for loop over each genre in our data set 
    if genre not in album_genre_bins: #if the genre is not in our list
        album_genre_bins[genre]=1  #add it with a count of 1 
    else: 
        album_genre_bins[genre]+=1 #otherwise increase the count by 1 
        
sorted_genre_bins = sorted(album_genre_bins.items(), key= lambda x: x[1], reverse=True) #sorts the (genre, count) pairs by count in desc order
        
sorted_genre_bins[:10] #calls the top 10 of sorted genre bins list

In [None]:
#Histogram by genre - Returns a histogram with each genre pointing to the number of albums that are categorized as being in that genre. 

fig, ax = plt.subplots(figsize=(18,8))     #creates a new fig with an explicit size

genre_hist1=ax.hist(album_genre, bins=7, edgecolor='black') #draws a histogram with 7 bins and a black edge color using the full list of genres created earlier
ax.set_title('Number of Albums by Genre') #sets the title for this figure
ax.set_xlabel('Genre') #sets the x axis label
ax.set_ylabel('Number of Albums') #sets the y axis label


In [None]:
sorted_album_genre = sorted(album_genre_bins.items(), key=lambda x: x[1], reverse=True) #sorts our album genre bins and put in desc order 
S = sorted_album_genre[:15] #assigns our sorted list to a variable
top_15_genre = [k for k,v in S] #for every key in the key,value combo return the key 
top_15_genre  #calls the list created above

In [None]:
top_15_count = [v for k,v in S] #for every value for in the key,value pair return the value
top_15_count     #calls the above list

In [None]:
fig, ax = plt.subplots(figsize =(18,8)) #create a figure and sets an explicit size
ax.barh(top_15_genre, top_15_count, color='magenta', edgecolor ='black') #designates our figure as a bar chart with our top 15 genres list as the y values and top 15 count as the x values also sets a main and edge color
ax.set_title('Top 15 Genres of Top 500 Albums',size = 18) #sets our bar chart title and title size
ax.set_xlabel('Count', size = 15) #labels our bar chart x axis 
ax.set_ylabel('Genre', size = 15) #labels our bar chart y axis

## Next Steps

In [None]:
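with open('top-500-songs.txt', 'r') as text_file:   #opens the text file for reading and closes it when the block ends
    lines = text_file.readlines()                   #reads every line into a list of strings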

print(lines[:4])

In [None]:
#list_lines=[line.split('\t') for line in lines] # splits the line at the 't' and returns the list in that line
#list_lines[:5] #old code not being used left here for ref

In [None]:
list_line_new = []       #creates a new list 
for line in lines:       #for each line (a string) in our list of lines
    list_line_new.append(line.strip()) #removes the trailing \n (the newline character) and any surrounding whitespace
list_lines=[line.split('\t') for line in list_line_new] #splits each line at every \t (the tab character), giving a list of lists
list_lines[:5] #shows the first 5 lists

In [None]:
top_500_songs = []    #Creates an empty list

for line in list_lines:         #Creates a for loop where for all lists (in our list of lists) in the given list it assign them to a dictionary as follows:
    list_line_dict = {'rank':line[0],            #the first item in the given list as the rank
                        'name':line[1],          #the second item as the name 
                        'artist':line[2],        #the third item as the artist
                        'year': int(line[3])}          # the fourth item as year as an integer
    top_500_songs.append(list_line_dict)    #adds the newly created dictionaries to our new list
    
top_500_songs[:3]       #returns the first 3 dictionaries from our list of dictionaries

Now that we have our functions querying our Album data, let's compare that data with the top 500 songs. We have another file -- a text file! -- that contains the data we need for the top 500 songs. Reading a text file is pretty similar to reading a csv file, however, it tends to need a bit more massaging to get your data formatted the way you want it.

```python
# open the text file in read mode; the with block closes it for us
with open('top-500-songs.txt', 'r') as text_file:
    # read each line of the text file
    # here is where you can print out the lines to your terminal and get an idea
    # for how you might think about re-formatting the data
    lines = text_file.readlines()

print(lines)
# the output will look something like this:
['1\tLike a Rolling Stone\tBob Dylan\t1965\n', '2\tSatisfaction\tThe Rolling Stones\t1965\n', '3\tImagine\tJohn Lennon\t1971\n', "4\tWhat's Going On\tMarvin Gaye\t1971\n", '5\tRespect\tAretha Franklin\t1967\n', '6\tGood Vibrations\tThe Beach Boys\t1966\n', '7\tJohnny B. Goode\tChuck Berry\t1958\n', '8\tHey Jude\tThe Beatles\t1968\n', ...]
```

It looks like `\t` is how the text file is separating each element on a line. So, we need a function that can separate a string into a list. Then we can tell this function on what to split our string (i.e. `\t`). From there we will have a list of lists that are formatted like the following:
```python
["RANK", "NAME", "ARTIST", "YEAR"]
```
We need to use our knowledge of iterating to go through each of these elements and turn them into dictionaries with the keys, "rank", "name", "artist", "year", pointing to the appropriate values. So, for song number 1, we want it to look like:
```python
{'rank': 1, 'name': "Like a Rolling Stone", 'artist': "Bob Dylan", 'year': 1965}
```
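
The two steps just described (split each line on `\t`, then build a dictionary) can be sketched as follows. The sample lines are copied from the output shown earlier; the helper name `parse_song_line` is an assumption for illustration. This sketch converts both rank and year to integers to match the target format above.

```python
# Sample lines in the same tab-separated shape as the readlines() output.
sample_lines = [
    "1\tLike a Rolling Stone\tBob Dylan\t1965\n",
    "2\tSatisfaction\tThe Rolling Stones\t1965\n",
    "3\tImagine\tJohn Lennon\t1971\n",
]


def parse_song_line(line):
    """Turn one tab-separated line into a song dictionary."""
    # Drop the trailing newline, then split the four fields on tabs.
    rank, name, artist, year = line.rstrip("\n").split("\t")
    return {"rank": int(rank), "name": name, "artist": artist, "year": int(year)}


songs = [parse_song_line(line) for line in sample_lines]
print(songs[0])
# {'rank': 1, 'name': 'Like a Rolling Stone', 'artist': 'Bob Dylan', 'year': 1965}
```

Unpacking into exactly four names means a malformed line raises a `ValueError` instead of silently producing a partial dictionary, which may or may not be the behavior you want for the real file.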

Once we have a list of songs that are formatted like the above, we can move on to figuring out which songs are from the top albums and which albums and artists have the most 'top songs'.

In [None]:
def song_name(name):          #defines a function that
    for song in top_500_songs:   #for each song in our song list
        if song['name'] == name: #compares the name of the song in the song list to the entered song name
            return song        #and if it matches returns the entire song dictionary
    return None                #no song matched, so return None
song_name('Hey Jude')   #calls this function

In [None]:
def song_rank(rank):                  #defines a function that
    for song in top_500_songs:        #for each song in our song list 
        if song['rank'] == rank:      #compares the ranks in our songs list to the rank given
            return song               #and returns a match if one is found
    return None                       #no song has that rank, so return None

song_rank('2')         #calls this function (ranks are stored as strings)

In [None]:
def song_year(year):                    #defines a function that
    song_x_years=[]                     #create a new list
    for song in top_500_songs:          #and uses a for loop to iterate thru all songs in song list
        if song['year'] == year:        #comparing all the years in our list to the given year
            song_x_years.append(song)   #and if there is a match, adds it to our new list 
    return song_x_years                 #and then returns our new song list once all relevant songs have been added

song_year(1966)      #calls this function

In [None]:
def song_year_range(startyear, endyear):    #defines a function that
    song_rangex_years=[]                    #creates a new list
    for song in top_500_songs:              #for each song in our song list
        if song['year'] >= startyear and song['year'] <= endyear: #checks whether the song's year falls on or between the given start year and end year
            song_rangex_years.append(song)   #if it does, adds the song to our newly created list
    return song_rangex_years                 #returns our new list 
song_year_range(1966, 1968)       #calls this function

In [None]:
def song_start_rank(startrank, endrank): #defines a function 
    songs_in_range=[]      #that creates a new list (named differently from the function to avoid shadowing it)
    for song in top_500_songs:    #where for each song in our songs list
        if int(song['rank']) >= startrank and int(song['rank']) <= endrank: #compares the song rank to the given start and end rank 
            songs_in_range.append(song) #if the rank is the same as or between the start and end rank, adds the song to our new list
    return songs_in_range #returns our new song list
song_start_rank(2, 7) #calls this function

In [None]:
list_song_titles = []                    #creates a new list
for d in top_500_songs:                  #creates a for loop where for each variable 'd' in our song list 
    list_song_titles.append(d['name'])   #adds the song name only to our new list
list_song_titles[:5]    #calls our new list

In [None]:
list_song_artist = []                      #creates a new list
for a in top_500_songs:                    #creates a for loop where for each variable 'a' in our song list
    list_song_artist.append(a['artist'])   #adds the artist name to our new list
list_song_artist[:5] #calls our new list

In [None]:
list_songs = [song['name'] for song in top_500_songs]   #does the same as above in a list comprehension for the song name
list_songs[:5]      

In [None]:
list_artist = [song['artist'] for song in top_500_songs] #does the same as above in a list comprehension for the artist name
list_artist[:5]

In [None]:
top_song_artist = {}                           #creates a new dictionary 
for artist in list_artist:                     #creates a for loop that for each artist in our list_artist list
    if artist not in top_song_artist:          #if an artist is not on the list
        top_song_artist[artist]=1              #creates an entry for the artist with a count of 1
    else: 
        top_song_artist[artist]+=1             #otherwise if the artist is on the list increases the count by 1


top_song_artist_sorted = sorted(top_song_artist.items(), key= lambda x: x[1], reverse=True) #sorts our list by the count value in desc order


top_song_artist_sorted[:3] #calls our list

In [None]:
highest = max(top_song_artist.values())   #defines the variable "highest" as the number equal to the highest number in our top number of songs artist list
print([k for k, v in top_song_artist.items() if v == highest]) #prints the name of the artist(s) who name match the number value of "highest"

In [None]:
list_of_songs=[]       #creates a new list 
for word in top_500_songs:    #a for loop that for each variable[word] that is in our top songs list
    list_of_songs.append(word['name'].split()) #splits the words into separate objects in a list and add those list to our new list
list_of_songs[4]     #calls the list

In [None]:
flat_list_song = []   #creates a new list
for word in list_of_songs: #for loop that for each variable[word] that is in of our list of songs
    for item in word:      #pulls each word in individual lists 
        flat_list_song.append(item) #and adds them to our new list
flat_list_song[:5] #calls a list 

In [None]:
top_word_song = {}     #creates a new dictionary
for word in flat_list_song:  #where for each word in our flat list of songs
    if word not in top_word_song: #if the word is not in our top list of words
        top_word_song[word]=1    #adds them to our new dictionary with a count value of 1
    else: 
        top_word_song[word]+=1 #otherwise adds 1 to the count value
top_word_song #calls dictionary

In [None]:
highest2 = max(top_word_song.values())  #defines the variable highest2 as the max value from our song word count list
print([key for key, value in top_word_song.items() if value == highest2]) #prints the word whos frequency count matches our highest2 value

### Working with the top 500 songs

If we can't already re-use our searching functions (i.e. Find by name, Find by rank, Find by year, Find by years, Find by ranks), all functions (i.e. all titles, all artists), and questions-to-answer functions (i.e. Artists with the most albums (or songs), Most popular word, Histogram by decade, Histogram by genre) with the song data we just formatted, then refactor these functions so that they can be used with either set of data. This is a good practice for ensuring that our code is as reusable and modular as possible, which is important when writing code for any project, especially when it comes time to scale a project. Things are easier to read, and there is less code to worry about (and more importantly there is less code to debug when something goes wrong).
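
One possible way to approach that refactor is to pass both the data and the key name into each function, so nothing depends on whether a title lives under `'album'` or `'name'`. The sketch below is only an illustration: the function names and the two small sample lists are assumptions, and `collections.Counter` is swapped in for the hand-written counting loops.

```python
from collections import Counter

# Small stand-ins for the two data sets (illustrative only).
albums = [
    {"album": "Revolver", "artist": "The Beatles", "year": "1966"},
    {"album": "Pet Sounds", "artist": "The Beach Boys", "year": "1966"},
    {"album": "Abbey Road", "artist": "The Beatles", "year": "1969"},
]
songs = [
    {"name": "Hey Jude", "artist": "The Beatles", "year": 1968},
    {"name": "Imagine", "artist": "John Lennon", "year": 1971},
    {"name": "Respect", "artist": "Aretha Franklin", "year": 1967},
]


def find_by(data, key, value):
    """Return every item whose value under key equals value."""
    return [item for item in data if item[key] == value]


def most_common(data, key):
    """Return the value(s) that occur most often under key (ties included)."""
    counts = Counter(item[key] for item in data)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


def by_decade(data):
    """Map each decade to the number of items released in it."""
    # int() accepts both the string years from the CSV and the integer
    # years stored in the song dictionaries.
    return dict(Counter((int(item["year"]) // 10) * 10 for item in data))


print(most_common(albums, "artist"))
print(by_decade(songs))
print(find_by(songs, "name", "Imagine"))
```

The same three functions work on either list because the only data-set-specific detail, the key name, is an argument.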

Once we have our functions working for both sets of data, we can start writing new functions!

Luckily for us, this next dataset is already made for us. We were curious to find out which songs on the top 500 songs overlapped with the top albums and vice versa. So, we created a data set that is a list of dictionaries in JSON format. Each dictionary contains the name of the artist, the album, and the tracks (songs) on that given album. We can use this data to check which songs on the top 500 list are featured on the albums on the top albums list.

To load our JSON file we will write:

```python
import json

with open('track_data.json', 'r') as file:
    json_data = json.load(file)

print(json_data)
# output will look like this:
[{'artist': 'The Beatles', 'album': "Sgt. Pepper's Lonely Hearts Club Band", 'tracks': ["Sgt. Pepper's Lonely Hearts Club Band - Remix", 'With A Little Help From My Friends - Remix', 'Lucy In The Sky With Diamonds - Remix', 'Getting Better - Remix', 'Fixing A Hole - Remix', "She's Leaving Home - Remix", 'Being For The Benefit Of Mr. Kite! - Remix', 'Within You Without You - Remix', "When I'm Sixty-Four - Remix", 'Lovely Rita - Remix', 'Good Morning Good Morning - Remix', "Sgt. Pepper's Lonely Hearts Club Band (Reprise) - Remix", 'A Day In The Life - Remix', "Sgt. Pepper's Lonely Hearts Club Band - Take 9 And Speech", 'With A Little Help From My Friends - Take 1 / False Start And Take 2 / Instrumental', 'Lucy In The Sky With Diamonds - Take 1', 'Getting Better - Take 1 / Instrumental And Speech At The End', 'Fixing A Hole - Speech And Take 3', "She's Leaving Home - Take 1 / Instrumental", 'Being For The Benefit Of Mr. Kite! - Take 4', 'Within You Without You - Take 1 / Indian Instruments', "When I'm Sixty-Four - Take 2", 'Lovely Rita - Speech And Take 9', 'Good Morning Good Morning - Take 8', "Sgt. 
Pepper's Lonely Hearts Club Band (Reprise) - Speech And Take 8", 'A Day In The Life - Take 1 With Hums', 'Strawberry Fields Forever - Take 7', 'Strawberry Fields Forever - Take 26', 'Strawberry Fields Forever - Stereo Mix 2015', 'Penny Lane - Take 6 / Instrumental', 'Penny Lane - Stereo Mix 2017']}, {'artist': 'The Beach Boys', 'album': 'Pet Sounds', 'tracks': ["Wouldn't It Be Nice - Digitally Remastered 96", 'You Still Believe In Me - Digitally Remastered 96', "That's Not Me - 1996 Digital Remaster", "Don't Talk (Put Your Head On My Shoulder) - 1996 Digital Remaster", "I'm Waiting For The Day - Digitally Remastered 96", "Let's Go Away For Awhile - Digitally Remastered 96", 'Sloop John B - 1996 - Remaster', 'God Only Knows - 1997 - Remaster', "I Know There's An Answer - Digitally Remastered 96", 'Here Today - Digitally Remastered 96', "I Just Wasn't Made For These Times - Digitally Remastered 96", 'Pet Sounds - Digitally Remastered 96', 'Caroline, No - 1996 Digital Remaster', 'Hang On To Your Ego', "Wouldn't It Be Nice - 2000 - Remaster", 'You Still Believe In Me - 1996 Digital Remaster', "That's Not Me - 1996 Digital Remaster", "Don't Talk (Put Your Head On My Shoulder) - 1996 Digital Remaster", "I'm Waiting For The Day - 1996 Digital Remaster", "Let's Go Away For Awhile - 1996 Digital Remaster", 'Sloop John B - 1996 Digital Remaster', 'God Only Knows - 1996 Digital Remaster', "I Know There's An Answer - 1996 Digital Remaster", 'Here Today - 1996 Digital Remaster', "I Just Wasn't Made For These Times - 1996 Digital Remaster", 'Pet Sounds - 1996 Digital Remaster', 'Caroline, No - 1996 Digital Remaster']}, {'artist': 'The Beatles', 'album': 'Revolver', 'tracks': ['Taxman - Remastered', 'Eleanor Rigby - Remastered', "I'm Only Sleeping - Remastered", 'Love You To - Remastered', 'Here, There And Everywhere - Remastered', 'Yellow Submarine - Remastered', 'She Said She Said - Remastered', 'Good Day Sunshine - Remastered', 'And Your Bird Can Sing - Remastered', 'For No 
One - Remastered', 'Doctor Robert - Remastered', 'I Want To Tell You - Remastered', 'Got To Get You Into My Life - Remastered', 'Tomorrow Never Knows - Remastered']}, {'artist': 'Bob Dylan', 'album': 'Highway 61 Revisited', 'tracks': ['Like a Rolling Stone', 'Tombstone Blues', 'It Takes a Lot to Laugh, It Takes a Train to Cry', 'From a Buick 6', 'Ballad of a Thin Man', 'Queen Jane Approximately', 'Highway 61 Revisited', "Just Like Tom Thumb's Blues", 'Desolation Row']}, ...]
```
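
As a hedged sketch of the overlap check, the example below matches track names against song names for the same artist. Many track names in the sample output carry suffixes such as `- Remastered`, so the sketch keeps only the text before the first `" - "`. That is a heuristic assumption and would mangle any title that legitimately contains `" - "`. The sample data, `base_title`, and `albums_with_top_songs` are illustrative names, not part of the exercise files.

```python
import json

# A trimmed sample in the same shape as track_data.json (illustrative only).
SAMPLE_JSON = """[
  {"artist": "The Beatles", "album": "Revolver",
   "tracks": ["Taxman - Remastered", "Eleanor Rigby - Remastered"]},
  {"artist": "Bob Dylan", "album": "Highway 61 Revisited",
   "tracks": ["Like a Rolling Stone", "Tombstone Blues"]}
]"""
track_data = json.loads(SAMPLE_JSON)

# A few songs in the list-of-dictionaries shape (sample values only).
top_songs = [
    {"name": "Like a Rolling Stone", "artist": "Bob Dylan"},
    {"name": "Satisfaction", "artist": "The Rolling Stones"},
    {"name": "Eleanor Rigby", "artist": "The Beatles"},
]


def base_title(track):
    """Drop a trailing ' - ...' suffix and lowercase the title (heuristic)."""
    return track.split(" - ")[0].strip().lower()


def albums_with_top_songs(track_data, top_songs):
    """Map each album to the sorted titles of its tracks found in top_songs."""
    top_pairs = {(s["artist"].lower(), s["name"].lower()) for s in top_songs}
    matches = {}
    for entry in track_data:
        artist = entry["artist"].lower()
        hits = sorted({base_title(t) for t in entry["tracks"]
                       if (artist, base_title(t)) in top_pairs})
        if hits:
            matches[entry["album"]] = hits
    return matches


print(albums_with_top_songs(track_data, top_songs))
```

Comparing `(artist, title)` pairs rather than titles alone avoids counting a different artist's song that happens to share a name.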

In [None]:
import json

with open('track_data.json', 'r') as file:
    json_data = json.load(file)

print(json_data)

In [None]:
json_data[0]

### Define the following functions:

**albumWithMostTopSongs** - returns the name of the artist and album that has the most songs featured on the top 500 songs list

**albumsWithTopSongs** - returns a list with the name of only the albums that have tracks featured on the list of top 500 songs

**songsThatAreOnTopAlbums** - returns a list with the name of only the songs featured on the list of top albums

**top10AlbumsByTopSongs** - returns a histogram with the 10 albums that have the most songs that appear in the top songs list. The album names should point to the number of songs that appear on the top 500 songs list.

**topOverallArtist** - Artist featured with the most songs and albums on the two lists. This means that if Britney Spears had 3 of her albums featured on the top albums list and 10 of her songs featured on the top songs list, she would have a total of 13. The artist with the highest aggregate score would be the top overall artist.
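
The aggregate score described for **topOverallArtist** could be sketched like this, assuming both lists are lists of dictionaries with an `'artist'` key. The snake_case name `top_overall_artist` and the sample rows are assumptions for illustration. When two artists tie, this sketch simply returns whichever `Counter.most_common` lists first.

```python
from collections import Counter

# Small stand-ins for the two lists (illustrative only).
albums = [
    {"album": "Revolver", "artist": "The Beatles"},
    {"album": "Pet Sounds", "artist": "The Beach Boys"},
    {"album": "Abbey Road", "artist": "The Beatles"},
]
songs = [
    {"name": "Hey Jude", "artist": "The Beatles"},
    {"name": "Like a Rolling Stone", "artist": "Bob Dylan"},
]


def top_overall_artist(albums, songs):
    """Return (artist, total) for the artist with the most albums plus songs."""
    totals = Counter(album["artist"] for album in albums)
    # update() adds the song counts on top of the album counts.
    totals.update(song["artist"] for song in songs)
    if not totals:
        return None
    return totals.most_common(1)[0]


print(top_overall_artist(albums, songs))
# ('The Beatles', 3)
```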

In [None]:
#Returns the name of the artist and album if the song is featured on the top 500 list

for track in top_500_songs: #for each track in our top 500 songs list
    if track in #if that track is in in th