## Scraping non-tabular, multipage sites
Scrape the top 500 <a href="https://bestsellingalbums.org/decade/2010">best-selling albums of the 2010s</a>. Your data must include the following data points:

- Name of album
- Name of artist
- Number of albums sold 
- The link to the page that breaks down sales by country (found by clicking album title)



In [70]:
## create cells as needed
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [71]:
link_fl = []
first_link = 'https://bestsellingalbums.org/decade/2010'
link_fl.append(first_link)

In [72]:
link = 'https://bestsellingalbums.org/decade/2010-'

In [73]:
for link_number in range(2, 11):
    full_link = f"{link}{link_number}"
    link_fl.append(full_link)
    print(full_link)

https://bestsellingalbums.org/decade/2010-2
https://bestsellingalbums.org/decade/2010-3
https://bestsellingalbums.org/decade/2010-4
https://bestsellingalbums.org/decade/2010-5
https://bestsellingalbums.org/decade/2010-6
https://bestsellingalbums.org/decade/2010-7
https://bestsellingalbums.org/decade/2010-8
https://bestsellingalbums.org/decade/2010-9
https://bestsellingalbums.org/decade/2010-10


In [74]:
link_fl

['https://bestsellingalbums.org/decade/2010',
 'https://bestsellingalbums.org/decade/2010-2',
 'https://bestsellingalbums.org/decade/2010-3',
 'https://bestsellingalbums.org/decade/2010-4',
 'https://bestsellingalbums.org/decade/2010-5',
 'https://bestsellingalbums.org/decade/2010-6',
 'https://bestsellingalbums.org/decade/2010-7',
 'https://bestsellingalbums.org/decade/2010-8',
 'https://bestsellingalbums.org/decade/2010-9',
 'https://bestsellingalbums.org/decade/2010-10']

In [75]:
top_albums = []

In [76]:
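albums = []
for link in link_fl:
    response = requests.get(link)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extend rather than reassign, so cards from earlier pages are not overwritten
    albums.extend(soup.find_all('div', class_='album_card'))
    # top_albums is only filled in the next cell, so this count is still 0 here
    print(f"Scraped page: {link}, Total albums collected so far: {len(top_albums)}")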

Scraped page: https://bestsellingalbums.org/decade/2010, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-2, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-3, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-4, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-5, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-6, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-7, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-8, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-9, Total albums collected so far: 0
Scraped page: https://bestsellingalbums.org/decade/2010-10, Total albums collected so far: 0
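
When several pages are scraped, the parsed results have to be accumulated across iterations rather than reassigned on each pass. Below is a minimal sketch of that pattern, with the network call injected so it can be exercised offline. `scrape_pages`, `fetch`, and `parse` are hypothetical names used for illustration, not part of `requests` or BeautifulSoup, and the sample pages are made up.

```python
from typing import Callable, Dict, List


def scrape_pages(
    urls: List[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Fetch each URL, parse it, and accumulate rows from every page."""
    rows: List[Dict[str, str]] = []
    for url in urls:
        html = fetch(url)         # in the notebook: lambda u: requests.get(u).text
        rows.extend(parse(html))  # extend, so rows from earlier pages are kept
    return rows


# Offline demonstration with stand-in pages (made-up data, not from the site).
fake_pages = {
    "page-1": "Album A|Artist A",
    "page-2": "Album B|Artist B",
}


def fake_parse(html: str) -> List[Dict[str, str]]:
    title, artist = html.split("|")
    return [{"Album Title": title, "Artist": artist}]


rows = scrape_pages(list(fake_pages), fake_pages.__getitem__, fake_parse)
print(len(rows))  # 2
```

Passing `fetch` and `parse` as arguments is only one possible design. It keeps the accumulation logic separate from the HTTP request and the HTML selectors, so each part can be checked on its own.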


In [77]:
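from urllib.parse import urljoin

for album in albums:
    album_title = album.find('div', class_='album').text.strip()
    artist = album.find('div', class_='artist').text.strip()
    total_sales = album.find('div', class_='sales').text.strip()
    # Keep only the digits, e.g. "Sales: 1,110,000" -> "1110000"
    sales_number = ''.join(filter(str.isdigit, total_sales))
    # Assumption: the first <a> inside the card is the album-title link
    # that leads to the sales-by-country page; adjust the selector if not
    album_href = album.find('a')['href']
    # urljoin handles both relative and already-absolute hrefs
    sales_by_country = urljoin('https://bestsellingalbums.org', album_href)
    top_albums.append({
        'Album Title': album_title,
        'Artist': artist,
        'Total Sales': sales_number,
        'Sales By Country': sales_by_country
    })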
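
The sales text and the album link both need some normalisation before they are stored. Here is a small sketch using only the standard library. `parse_sales` and `absolute_album_url` are hypothetical helper names, and the `"Sales: 1,110,000"` format is assumed from the values that appear in this notebook's output.

```python
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://bestsellingalbums.org"


def parse_sales(text: str) -> Optional[int]:
    """Turn a string such as 'Sales: 1,110,000' into the integer 1110000."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None  # None when no digits are present


def absolute_album_url(href: str) -> str:
    """Resolve a relative or already-absolute href against the site root."""
    return urljoin(BASE_URL, href)


print(parse_sales("Sales: 1,110,000"))     # 1110000
print(absolute_album_url("/album/16039"))  # https://bestsellingalbums.org/album/16039
```

Storing the sales figure as an integer makes later sorting and arithmetic straightforward. Returning `None` for text without digits avoids a `ValueError` on an unexpected card.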

In [78]:
df = pd.DataFrame(top_albums)

In [79]:
df

|   | Album Title | Artist | Total Sales | Sales By Country |
|---|---|---|---|---|
| 0 | I AM > I WAS | 21 SAVAGE | Sales: 1,110,000 | https://bestsellingalbums.org/album/16039 |
| 1 | LIFE OF A DARK ROSE | LIL SKIES | Sales: 1,110,000 | https://bestsellingalbums.org/album/16039 |
| 2 | 4 YOUR EYEZ ONLY | J. COLE | Sales: 1,110,000 | https://bestsellingalbums.org/album/16039 |
| 3 | LATE NIGHTS: THE ALBUM | JEREMIH | Sales: 1,110,000 | https://bestsellingalbums.org/album/16039 |
| 4 | GET CLOSER | KEITH URBAN | Sales: 1,110,000 | https://bestsellingalbums.org/album/16039 |
| 5 | IN AND OUT OF CONSCIOUSNESS - GREATEST HITS 19... | ROBBIE WILLIAMS | Sales: 1,104,500 | https://bestsellingalbums.org/album/16039 |
| 6 | TANGLED (SOUNDTRACK) | ALAN MENKEN | Sales: 1,102,208 | https://bestsellingalbums.org/album/16039 |
| 7 | DRIP HARDER | GUNNA & LIL BABY | Sales: 1,100,000 | https://bestsellingalbums.org/album/16039 |
| 8 | JUNGLE RULES | FRENCH MONTANA | Sales: 1,100,000 | https://bestsellingalbums.org/album/16039 |
| 9 | H.E.R. | H.E.R. | Sales: 1,100,000 | https://bestsellingalbums.org/album/16039 |


In [80]:
len(df)

50
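
A row count alone does not show whether every page was captured or whether each row carries its own link. The following is a rough sanity check over the collected records, assuming the 500-row target from the task and the column names used in this notebook. `check_rows` is a hypothetical helper, and the sample rows are made up.

```python
from typing import Dict, List


def check_rows(rows: List[Dict[str, str]], expected: int, link_key: str) -> List[str]:
    """Return human-readable problems found in the scraped rows."""
    problems = []
    if len(rows) != expected:
        problems.append(f"expected {expected} rows, got {len(rows)}")
    links = [row[link_key] for row in rows]
    if len(set(links)) != len(links):
        problems.append(f"only {len(set(links))} distinct values in {link_key!r}")
    return problems


# Made-up sample: two rows that share one link, where three rows were expected.
sample = [
    {"Album Title": "Album A", "Sales By Country": "https://example.com/album/1"},
    {"Album Title": "Album B", "Sales By Country": "https://example.com/album/1"},
]
for problem in check_rows(sample, expected=3, link_key="Sales By Country"):
    print(problem)
```

Against the notebook's data, the same check could be run as `check_rows(df.to_dict('records'), 500, 'Sales By Country')`. An empty list would mean that neither problem was detected.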

In [81]:
df.to_csv('top_500_albums_2010s_first_10_pages.csv', index=False)