# Lab 8.02

## Instructions
__Prioritize the MVP__
In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back there.

__Expand the project__
If you're done, you can try to expand the project on your own. Here are a few suggestions:

* Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
* Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
* Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

In [2]:
# get the HTML from Wikipedia (list of national anthems by country)
response = requests.get('https://en.wikipedia.org/wiki/List_of_national_anthems')
response.status_code

200

In [3]:
soup = BeautifulSoup(response.content, "html.parser")

#### First attempt - scraping every column

In [4]:
country = []
name = []

# run the selector once instead of on every loop iteration
country_links = soup.select('th > a')

for i in range(204):
    country.append(country_links[i].get_text())
    #name.append(soup.select('tbody > tr > td > small')[i].get_text())

print(len(country))
print(country[0:4])  # worked (got the country names)

#print(len(name))
#print(name[0:4])  # raised an error: some rows have no translated anthem name, so the lengths don't match

204
['Afghanistan', 'Albania', 'Algeria', 'Andorra']
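
One way around the length mismatch is to walk the table row by row and record `None` when an element is missing, so every column stays aligned. The snippet below is a minimal sketch on a tiny hand-written table: the markup is a simplified assumption (not the real Wikipedia HTML), and `scrape_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the anthem table (assumed markup, not the real page)
html = """
<table>
  <tr><th>Country</th><th>Anthem</th></tr>
  <tr><th><a href="/wiki/Albania">Albania</a></th>
      <td>"Betimi mbi Flamur" <small>("The Pledge on the Flag")</small></td></tr>
  <tr><th><a href="/wiki/Antigua_and_Barbuda">Antigua and Barbuda</a></th>
      <td>"Fair Antigua, We Salute Thee"</td></tr>
</table>
"""


def scrape_rows(soup):
    """Return one dict per data row, with None where an element is missing."""
    records = []
    for tr in soup.select("tr"):
        country_link = tr.select_one("th > a")
        if country_link is None:  # header row, or a row without a country link
            continue
        translated = tr.select_one("td > small")
        records.append({
            "country": country_link.get_text(strip=True),
            "translated_name": translated.get_text(strip=True) if translated else None,
        })
    return records


records = scrape_rows(BeautifulSoup(html, "html.parser"))
print(records)
```

Because each record is built from a single `<tr>`, a missing translation shows up as `None` in that row instead of shifting every later value up by one position.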


In [5]:
# For the original anthem names, this selector could not isolate just the name links:
# the class "mw-redirect" also appears on links in other columns.
soup.select('table:nth-child(12) > tbody > tr > td > a')

[<a class="mw-redirect" href="/wiki/This_is_the_Home_of_the_Brave" title="This is the Home of the Brave">Dā də bātorāno kor</a>,
 <a href="/wiki/Taliban" title="Taliban">Islamic Emirate of Afghanistan government</a>,
 <a class="mw-redirect" href="/wiki/Mullah_Baradar" title="Mullah Baradar">Mullah Baradar</a>,
 <a class="mw-redirect" href="/wiki/Hymn_to_the_Flag" title="Hymn to the Flag">Betimi mbi Flamur</a>,
 <a href="/wiki/Aleksand%C3%ABr_Stavre_Drenova" title="Aleksandër Stavre Drenova">Aleksandër Stavre Drenova</a>,
 <a href="/wiki/Ciprian_Porumbescu" title="Ciprian Porumbescu">Ciprian Porumbescu</a>,
 <a href="/wiki/File:Hymni_i_Flamurit_instrumental.ogg" title="File:Hymni i Flamurit instrumental.ogg">"Himni i Flamurit"</a>,
 <a href="/wiki/Kassaman" title="Kassaman">Kassaman</a>,
 <a href="/wiki/Moufdi_Zakaria" title="Moufdi Zakaria">Moufdi Zakaria</a>,
 <a class="mw-redirect" href="/wiki/Mohamed_Fawzi_(artist)" title="Mohamed Fawzi (artist)">Mohamed Fawzi</a>,
 <a href="/wiki/F
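
A possible way to isolate the anthem links is to rely on position rather than class: take only the first `<td>` of each row and read the first link inside it. This is a sketch on simplified markup (an assumption; the real page nests more elements), using names and URLs from the output above.

```python
from bs4 import BeautifulSoup

# Simplified rows modelled on the output above (assumed markup)
html = """
<table>
  <tr><th><a>Albania</a></th>
      <td><a class="mw-redirect" href="/wiki/Hymn_to_the_Flag">Betimi mbi Flamur</a></td>
      <td>1912</td>
      <td><a href="/wiki/Aleksand%C3%ABr_Stavre_Drenova">Aleksandër Stavre Drenova</a></td></tr>
  <tr><th><a>Algeria</a></th>
      <td><a href="/wiki/Kassaman">Kassaman</a></td>
      <td>1962</td>
      <td><a href="/wiki/Moufdi_Zakaria">Moufdi Zakaria</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

anthem_links = []
for tr in soup.find_all("tr"):
    first_cell = tr.find("td")  # first <td> of the row: the anthem column in this layout
    if first_cell is None:      # rows with only <th> cells (e.g. the header)
        continue
    link = first_cell.find("a")
    if link is not None:
        anthem_links.append((link.get_text(strip=True), link.get("href")))

print(anthem_links)
```

Links in the lyricist and composer columns are never visited, so the shared `mw-redirect` class no longer matters.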

#### Second attempt - scraping the table

In [6]:
table = soup.find('table', attrs={'class':'wikitable plainrowheaders sortable'})
table_rows = table.find_all('tr')

rows = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text for cell in td]  # text of each data cell; avoids shadowing the loop variable `tr`
    rows.append(row)
rows

[[],
 ['"Dā də bātorāno kor"("This is the Home of the Brave")\n',
  '2021\n',
  'Islamic Emirate of Afghanistan government (exposed)Mullah Baradar (alleged)\n',
  'Unknown\n',
  '\n',
  '[15][16]\n'],
 ['"Betimi mbi Flamur"("The Pledge on the Flag")\n',
  '1912\n',
  'Aleksandër Stavre Drenova\n',
  'Ciprian Porumbescu\n',
  ' "Himni i Flamurit"\n',
  '[17]\n'],
 ['"Kassaman" ("We Pledge")\n',
  '1962\n',
  'Moufdi Zakaria\n',
  'Mohamed Fawzi\n',
  '  "Kassaman"\n',
  '[18]\n'],
 ['"El gran Carlemany"("The Great Charlemagne")\n',
  '1914\n',
  'Enric Marfany Bons\n',
  'Juan Benlloch y Vivó\n',
  ' "El Gran Carlemany"\n',
  '[19]\n'],
 ['"Angola Avante"("Onward Angola")\n',
  '1975\n',
  'Manuel Rui Alves Monteiro\n',
  'Rui Mingas\xa0[pt]\n',
  ' "Angola Avante"\n',
  '[20]\n'],
 ['"Fair Antigua, We Salute Thee"[b]\n',
  '1981\n',
  'Novelle Hamilton Richards\n',
  'Walter Garnet Picart Chambers\n',
  ' "Fair Antigua, We Salute Thee"\n',
  '[21][22]\n'],
 ['"Himno Nacional Argentino"

In [7]:
anthem_data = pd.DataFrame(rows, columns=['name', 'date_adopted', 'lyricist', 'composer', 'audio', 'ref'])
#drop irrelevant columns
anthem_data = anthem_data[['name', 'date_adopted', 'lyricist', 'composer']].reset_index(drop=True) 
anthem_data.head()

| | name | date_adopted | lyricist | composer |
|---|---|---|---|---|
| 0 | | | | |
| 1 | "Dā də bātorāno kor"("This is the Home of the ... | 2021\n | Islamic Emirate of Afghanistan government (exp... | Unknown\n |
| 2 | "Betimi mbi Flamur"("The Pledge on the Flag")\n | 1912\n | Aleksandër Stavre Drenova\n | Ciprian Porumbescu\n |
| 3 | "Kassaman" ("We Pledge")\n | 1962\n | Moufdi Zakaria\n | Mohamed Fawzi\n |
| 4 | "El gran Carlemany"("The Great Charlemagne")\n | 1914\n | Enric Marfany Bons\n | Juan Benlloch y Vivó\n |


In [8]:
# drop the first row: the header <tr> has no <td> cells, so it produced an empty row
anthem_data = anthem_data.drop(index=anthem_data.index[0])

In [9]:
# remove newline characters
anthem_data = anthem_data.replace('\n','',regex=True)

In [11]:
# add the country name
# trim the last 9 entries of `country` so its length matches the number of table rows
del country[-9:]
anthem_data.insert(loc=0, column='country', value=country)

In [12]:
anthem_data.head(20)

| | country | name | date_adopted | lyricist | composer |
|---|---|---|---|---|---|
| 1 | Afghanistan | "Dā də bātorāno kor"("This is the Home of the ... | 2021 | Islamic Emirate of Afghanistan government (exp... | Unknown |
| 2 | Albania | "Betimi mbi Flamur"("The Pledge on the Flag") | 1912 | Aleksandër Stavre Drenova | Ciprian Porumbescu |
| 3 | Algeria | "Kassaman" ("We Pledge") | 1962 | Moufdi Zakaria | Mohamed Fawzi |
| 4 | Andorra | "El gran Carlemany"("The Great Charlemagne") | 1914 | Enric Marfany Bons | Juan Benlloch y Vivó |
| 5 | Angola | "Angola Avante"("Onward Angola") | 1975 | Manuel Rui Alves Monteiro | Rui Mingas [pt] |
| 6 | Antigua and Barbuda | "Fair Antigua, We Salute Thee"[b] | 1981 | Novelle Hamilton Richards | Walter Garnet Picart Chambers |
| 7 | Argentina | "Himno Nacional Argentino"("Argentine National... | 1813 | Vicente López y Planes | Blas Parera |
| 8 | Armenia | "Mer Hayrenik"("Our Fatherland") | 1918, 1991 | Mikael Nalbandian | Barsegh Kanachyan |
| 9 | Australia | "Advance Australia Fair"[b] | 1984 | Peter Dodds McCormick | Peter Dodds McCormick |
| 10 | Austria | "Bundeshymne der Republik Österreich"("Nationa... | 1947 | Paula von Preradović | Wolfgang Amadeus Mozart/Johann Holzer [de] |
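
The `name` column still mixes the original title, its translation, and footnote markers such as `[b]`. A possible next cleaning step, sketched below with the standard `re` module, splits each value into two parts. `split_anthem_name` is a hypothetical helper, and it assumes titles are wrapped in straight double quotes, as in the rows shown above.

```python
import re


def split_anthem_name(raw):
    """Split '"Original"("Translation")[b]' into (original, translation or None)."""
    text = re.sub(r"\[[^\]]*\]", "", raw).strip()  # drop footnote markers such as [b]
    quoted = re.findall(r'"([^"]*)"', text)        # every span between double quotes
    original = quoted[0] if quoted else text       # fall back to the raw text if unquoted
    translation = quoted[1] if len(quoted) > 1 else None
    return original, translation


print(split_anthem_name('"Kassaman" ("We Pledge")'))
print(split_anthem_name('"Fair Antigua, We Salute Thee"[b]'))
```

Applied to the DataFrame, something like `anthem_data['name'].apply(split_anthem_name)` would yield one tuple per row, which could then be unpacked into separate original-name and translated-name columns.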
