Task: Greatest Albums
Load the all-time best-selling albums database into Pandas.
https://www.officialcharts.com/chart-news/the-best-selling-albums-of-all-time-on-the-official-uk-chart__15551/

1. Change the column headers to their Polish equivalents: ['TYTUŁ','ARTYSTA','ROK','MAX POZ']
2. How many solo artists are on the list?
3. Which bands appear most frequently on the list?
4. Change the column headers so that each one starts with a capital letter and the rest are in lowercase.
5. Remove the column ‘Max Poz’ from the table.
6. In which year were the most albums on the list released?
7. How many albums released between 1960 and 1990 inclusive are on the list?
8. In which year was the most recent album on the list released?
9. Prepare a list of the earliest released album by each artist who is on the list.
10. Save the list to a CSV file.

In [90]:
# import python modules
import numpy as np
import pandas as pd

In [91]:
# Importing the tables from the HTML source (read_html returns a list of DataFrames) and assigning them to a variable
greatest_albums = pd.read_html('https://www.officialcharts.com/chart-news/the-best-selling-albums-of-all-time-on-the-official-uk-chart__15551/', header=0)

In [92]:
df = greatest_albums[0]

In [93]:
# 1. Changing column header names
df = df.rename(columns={'TITLE':'TYTUŁ', 'ARTIST':'ARTYSTA', 'YEAR':'ROK', 'HIGH POSN':'MAX POZ'})

In [95]:
# 2. Checking how many unique artists are on the list
number_of_artists = df['ARTYSTA'].nunique()
print(f"There are {number_of_artists} artists on the 60-album list.")

There are 47 artists on the 60-album list.
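
Note that `nunique()` counts every distinct artist, bands included, while the task asks about solo artists. The table has no column that separates the two, so one possible approach is to filter against a hand-maintained set of band names. The sketch below uses made-up sample rows, and `known_bands` is a hypothetical set that would have to be filled in manually for the real list.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped chart table.
sample = pd.DataFrame(
    {"ARTYSTA": ["QUEEN", "ADELE", "COLDPLAY", "MICHAEL JACKSON", "ADELE"]}
)

# Assumption: bands are identified by a manually curated set of names,
# because the source table does not say who is a solo artist.
known_bands = {"QUEEN", "COLDPLAY"}

# Keep each artist once, then drop everything listed as a band.
unique_artists = sample["ARTYSTA"].drop_duplicates()
solo_artists = unique_artists[~unique_artists.isin(known_bands)]

number_of_solo_artists = len(solo_artists)
print(f"There are {number_of_solo_artists} solo artists in the sample.")
```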


In [96]:
# 3. Checking which artist appears the most times on the list
most_common_artist = df['ARTYSTA'].mode()
print('The following artist(s) appear on the list the most times:')
print(most_common_artist)

The following artist(s) appear on the list the most times:
0     COLDPLAY
1    TAKE THAT
dtype: object
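
`mode()` returns the tied artists but not how often each one appears. As an alternative sketch, `value_counts()` gives the counts as well. The sample rows below are made up for illustration and are not taken from the chart.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped chart table.
sample = pd.DataFrame(
    {"ARTYSTA": ["COLDPLAY", "TAKE THAT", "COLDPLAY", "QUEEN", "TAKE THAT", "ADELE"]}
)

# Count appearances per artist, then keep every artist tied for the maximum.
counts = sample["ARTYSTA"].value_counts()
top_artists = counts[counts == counts.max()]
print(top_artists)
```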


In [97]:
# 4. Changing column headers to Capitalized
df = df.rename(str.capitalize, axis='columns')

In [98]:
# 5. Dropping column 'Max poz' from the dataframe
df = df.drop('Max poz', axis=1)

In [99]:
# 6. Checking in which year the most albums from the list were released
best_year = df['Rok'].mode()
print("The year(s) in which the most albums were released:")
print(best_year)

The year(s) in which the most albums were released:
0    1987
1    2000
dtype: int64


In [106]:
# 7. Checking which albums were released between 1960 and 1990 inclusive
albums_from_period = df[(df['Rok']>=1960) & (df['Rok']<=1990)]
print('The following albums were released between 1960 and 1990 inclusive:')
print(albums_from_period)

The following albums were released between 1960 and 1990 inclusive:
    Pos                                 Tytuł                   Artysta   Rok
0     1                         GREATEST HITS                     QUEEN  1981
2     3  SGT PEPPER'S LONELY HEARTS CLUB BAND                   BEATLES  1967
5     6                              THRILLER           MICHAEL JACKSON  1982
6     7             THE DARK SIDE OF THE MOON                PINK FLOYD  1973
7     8                      BROTHERS IN ARMS              DIRE STRAITS  1985
8     9                                   BAD           MICHAEL JACKSON  1987
10   11                               RUMOURS             FLEETWOOD MAC  1977
11   12             THE IMMACULATE COLLECTION                   MADONNA  1990
15   16                                LEGEND  BOB MARLEY & THE WAILERS  1984
18   19                       BAT OUT OF HELL                 MEAT LOAF  1977
20   21            BRIDGE OVER TROUBLED WATER         SIMON & GARFUNKEL  1970
2
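
The cell above prints the matching rows, while the task asks for a number. A minimal sketch of getting the count directly, using made-up years rather than the real table, could look like this. `Series.between` is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical release years standing in for the 'Rok' column.
sample = pd.DataFrame({"Rok": [1959, 1960, 1977, 1990, 1991, 2011]})

# Boolean mask: True where the year falls in 1960..1990, both ends included.
in_period = sample["Rok"].between(1960, 1990)

# Summing a boolean mask counts the True values.
album_count = int(in_period.sum())
print(f"{album_count} albums were released between 1960 and 1990 inclusive.")
```

On the real dataframe the same count is also available as `len(albums_from_period)`.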

In [110]:
# 8. Checking in which year was the newest album released
newest_year_of_release = df['Rok'].max()
print(f"The newest album on the list was released in year {newest_year_of_release}.")

The newest album on the list was released in year 2015.


In [114]:
# 9. Preparing a list of earliest released albums per artist
earliest_albums = df.loc[df.groupby('Artysta')['Rok'].idxmin()]
print("List of the earliest released album by each artist:")
print(earliest_albums[['Artysta', "Tytuł", "Rok"]])

List of the earliest released album by each artist:
                     Artysta                                   Tytuł   Rok
45                      ABBA                           GREATEST HITS  1975
3                      ADELE                                      21  2011
40         ALANIS MORISSETTE                      JAGGED LITTLE PILL  1995
12             AMY WINEHOUSE                           BACK TO BLACK  2006
2                    BEATLES    SGT PEPPER'S LONELY HEARTS CLUB BAND  1967
15  BOB MARLEY & THE WAILERS                                  LEGEND  1984
44                  COLDPLAY                              PARACHUTES  2000
27                     CORRS                         TALK ON CORNERS  1997
25                DAVID GRAY                            WHITE LADDER  1998
24                      DIDO                                NO ANGEL  2000
7               DIRE STRAITS                        BROTHERS IN ARMS  1985
48                ED SHEERAN                    

In [116]:
# 10. Exporting list of the earliest released albums per artist to .csv file
earliest_albums[['Artysta', 'Tytuł', 'Rok']].to_csv('earliest_albums_per_artist.csv', index=False)
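
To check that an export like this reads back cleanly, a round trip through `read_csv` is one option. The sketch below is an illustration on made-up rows and writes to an in-memory buffer instead of a file. Passing `dtype` for the title column is a precaution, an assumption on my part, so that purely numeric titles such as "21" are kept as text.

```python
import io

import pandas as pd

# Hypothetical sample rows with the same columns as the exported list.
sample = pd.DataFrame(
    {
        "Artysta": ["ABBA", "ADELE"],
        "Tytuł": ["GREATEST HITS", "21"],
        "Rok": [1975, 2011],
    }
)

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Rewind and read the CSV back, forcing titles to stay strings.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"Tytuł": str})
print(restored)
```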