## API AND WEB SCRAPING PROJECT

![](spotify.png)

## Part One - Web API Access:

This part accesses data about songs hosted on Spotify through Spotify's own Web API. Accessing the API required sending an authorization request to https://accounts.spotify.com/authorize.

In [1]:
#importing the libraries
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import requests
import json
import pandas as pd

In [1]:
#Setting the authorizations
import os
#url_auth = "https://accounts.spotify.com/authorize?response_type=code&client_id="
#the credentials are read from system (environment) variables; the variable names are assumed
client_id = os.environ['CLIENT_ID']
client_secret = os.environ['CLIENT_SECRET']

In [3]:
#Making a call to the API
client_credentials_manager = SpotifyClientCredentials(client_id = client_id, client_secret = client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
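
For reference, `SpotifyClientCredentials` implements Spotify's Client Credentials flow. As a rough sketch of what that involves (not spotipy's actual code), the request can be assembled by hand: a POST to the token endpoint with the client ID and secret sent as HTTP Basic credentials. The helper name `build_token_request` and the placeholder credentials are illustrative assumptions; the function only builds the request and does not send it.

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Return (url, headers, data) for a Client Credentials token request."""
    # HTTP Basic credentials: base64("client_id:client_secret")
    raw = f"{client_id}:{client_secret}".encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    headers = {"Authorization": f"Basic {encoded}"}
    data = {"grant_type": "client_credentials"}
    return TOKEN_URL, headers, data


# Placeholder credentials, for illustration only
url, headers, data = build_token_request("my-id", "my-secret")
```

Passing these to `requests.post(url, headers=headers, data=data)` should return a JSON body whose `access_token` field is then sent as a Bearer token on API calls.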


## Searching for the artist's ID

To look up an artist's information, we need the artist's Spotify ID. In this case we want Kreator's ID, which was found by following the steps below:

![](kreat1.png)

![](kreat2.png)
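
The ID can also be looked up programmatically. A minimal sketch, assuming the `sp` client created above: spotipy's `search` method with `type='artist'` returns candidate artists under `artists` → `items`, and the hypothetical helper `first_artist_id` below picks the first exact name match. The sample response is hand-written and trimmed to the two fields used here, so the example runs without credentials.

```python
def first_artist_id(search_result, name):
    """Return the ID of the first artist whose name matches exactly, else None."""
    for artist in search_result["artists"]["items"]:
        if artist["name"].lower() == name.lower():
            return artist["id"]
    return None


# Trimmed, hand-written stand-in for the result of
# sp.search(q='artist:Kreator', type='artist')
sample_result = {
    "artists": {
        "items": [
            {"id": "3BM0EaYmkKWuPmmHFUTQHv", "name": "Kreator"},
        ]
    }
}

kreator_id = first_artist_id(sample_result, "Kreator")
```

With the real client, the call would be `first_artist_id(sp.search(q='artist:Kreator', type='artist'), 'Kreator')`.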

In [4]:
#Searching for Kreator top tracks
art = sp.artist_top_tracks('3BM0EaYmkKWuPmmHFUTQHv')

In [5]:
art

{'tracks': [{'album': {'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3BM0EaYmkKWuPmmHFUTQHv'},
      'href': 'https://api.spotify.com/v1/artists/3BM0EaYmkKWuPmmHFUTQHv',
      'id': '3BM0EaYmkKWuPmmHFUTQHv',
      'name': 'Kreator',
      'type': 'artist',
      'uri': 'spotify:artist:3BM0EaYmkKWuPmmHFUTQHv'}],
    'external_urls': {'spotify': 'https://open.spotify.com/album/4YySbln9km5fNhE4NwlC6Q'},
    'href': 'https://api.spotify.com/v1/albums/4YySbln9km5fNhE4NwlC6Q',
    'id': '4YySbln9km5fNhE4NwlC6Q',
    'images': [{'height': 640,
      'url': 'https://i.scdn.co/image/ab67616d0000b273d5c0fc40f7b9299fd277dd8a',
      'width': 640},
     {'height': 300,
      'url': 'https://i.scdn.co/image/ab67616d00001e02d5c0fc40f7b9299fd277dd8a',
      'width': 300},
     {'height': 64,
      'url': 'https://i.scdn.co/image/ab67616d00004851d5c0fc40f7b9299fd277dd8a',
      'width': 64}],
    'name': 'Gods of Violence',
    'release_date': '2

The method above returned a dictionary. Checking its keys shows that it has only one:

In [5]:
art.keys()

dict_keys(['tracks'])

The value under this key is a list whose items are themselves dictionaries, one per track. The first item was accessed to check its keys:

In [6]:
art['tracks'][0].keys()

dict_keys(['album', 'artists', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'is_playable', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])

Next, a loop iterates over the tracks, collects each song's name and audio features, and saves them in a pandas DataFrame.

In [7]:
songs_names = []
songs_infos = []

In [8]:
for track in art['tracks']:
    #getting songs' names
    songs_names.append(track['name'])

    #getting infos (audio_features returns a list with one dict per track ID)
    songs_infos.append(sp.audio_features(track['id'])[0])

#building the dataframe once, after the loop; the dict keys become the columns
top10_audio_features = pd.DataFrame(songs_infos)
top10_audio_features['song'] = songs_names

In [9]:
top10_audio_features.set_index('song')

song,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
Satan Is Real,0.439,0.968,7,-4.551,1,0.0775,4.1e-05,0.0409,0.12,0.255,101.59,audio_features,5Mp3MFneLLCVGW3O5kYMvm,spotify:track:5Mp3MFneLLCVGW3O5kYMvm,https://api.spotify.com/v1/tracks/5Mp3MFneLLCV...,https://api.spotify.com/v1/audio-analysis/5Mp3...,278200,4
Hail to the Hordes,0.41,0.955,0,-4.865,1,0.125,1e-05,0.442,0.241,0.292,152.074,audio_features,4VHZuLcWwZV0ZJqMbFNYCO,spotify:track:4VHZuLcWwZV0ZJqMbFNYCO,https://api.spotify.com/v1/tracks/4VHZuLcWwZV0...,https://api.spotify.com/v1/audio-analysis/4VHZ...,242320,3
Pleasure to Kill,0.258,0.964,7,-5.719,1,0.218,2.6e-05,0.00759,0.282,0.261,125.591,audio_features,5PFhkQbjJge1h8k7wE1K5U,spotify:track:5PFhkQbjJge1h8k7wE1K5U,https://api.spotify.com/v1/tracks/5PFhkQbjJge1...,https://api.spotify.com/v1/audio-analysis/5PFh...,250227,4
World War Now,0.25,0.988,1,-4.56,1,0.0955,1.8e-05,0.00309,0.121,0.122,97.657,audio_features,0xqxvwQCDB4JzC9E1pKViY,spotify:track:0xqxvwQCDB4JzC9E1pKViY,https://api.spotify.com/v1/tracks/0xqxvwQCDB4J...,https://api.spotify.com/v1/audio-analysis/0xqx...,268280,4
Totalitarian Terror,0.471,0.989,2,-4.58,1,0.135,3e-05,0.0022,0.16,0.224,102.691,audio_features,7ePpV892UY1KFiqzw659sJ,spotify:track:7ePpV892UY1KFiqzw659sJ,https://api.spotify.com/v1/tracks/7ePpV892UY1K...,https://api.spotify.com/v1/audio-analysis/7ePp...,285280,4
Gods Of Violence,0.436,0.98,7,-5.235,1,0.0827,5.9e-05,0.0661,0.17,0.127,104.695,audio_features,024L4YpuCporBW4V2WaZuv,spotify:track:024L4YpuCporBW4V2WaZuv,https://api.spotify.com/v1/tracks/024L4YpuCpor...,https://api.spotify.com/v1/audio-analysis/024L...,351373,4
People of the Lie,0.515,0.942,9,-8.789,1,0.0385,0.000342,0.227,0.0945,0.481,92.52,audio_features,6WDMOh0LJY9EHO4aIE7dPh,spotify:track:6WDMOh0LJY9EHO4aIE7dPh,https://api.spotify.com/v1/tracks/6WDMOh0LJY9E...,https://api.spotify.com/v1/audio-analysis/6WDM...,195880,4
Extreme Aggression,0.287,0.988,0,-5.89,1,0.103,0.00778,0.0556,0.0998,0.183,109.278,audio_features,4iP6rp4XyLdh4zxORd0Tnq,spotify:track:4iP6rp4XyLdh4zxORd0Tnq,https://api.spotify.com/v1/tracks/4iP6rp4XyLdh...,https://api.spotify.com/v1/audio-analysis/4iP6...,284973,4
Phobia,0.436,0.94,9,-6.319,1,0.0631,0.000267,4e-06,0.161,0.434,192.273,audio_features,0hyFPXxmWqc4xTuHA1hEd9,spotify:track:0hyFPXxmWqc4xTuHA1hEd9,https://api.spotify.com/v1/tracks/0hyFPXxmWqc4...,https://api.spotify.com/v1/audio-analysis/0hyF...,202533,4
Fallen Brother,0.464,0.976,7,-4.47,1,0.12,1.7e-05,0.159,0.0838,0.35,144.113,audio_features,5xWcWtGd7hOH5y29Oyog4S,spotify:track:5xWcWtGd7hOH5y29Oyog4S,https://api.spotify.com/v1/tracks/5xWcWtGd7hOH...,https://api.spotify.com/v1/audio-analysis/5xWc...,277000,4
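
The loop above issues one `audio_features` request per track. Since spotipy's `audio_features` also accepts a list of track IDs, the same rows can be gathered in a single call. A sketch under that assumption; `feature_rows` and `StubClient` are hypothetical names, and the stub stands in for `sp` so the example runs without credentials, returning two energy values copied from the table above.

```python
def feature_rows(client, tracks):
    """Return one dict per track: its name plus its audio features."""
    ids = [track["id"] for track in tracks]
    # One request for all IDs; the result has one dict per ID, in order
    features = client.audio_features(ids)
    return [
        {"song": track["name"], **feats}
        for track, feats in zip(tracks, features)
    ]


class StubClient:
    """Stand-in for `sp`, returning two values copied from the table above."""

    ENERGY = {"5Mp3MFneLLCVGW3O5kYMvm": 0.968, "0hyFPXxmWqc4xTuHA1hEd9": 0.94}

    def audio_features(self, tracks):
        return [{"id": t, "energy": self.ENERGY[t]} for t in tracks]


tracks = [
    {"id": "5Mp3MFneLLCVGW3O5kYMvm", "name": "Satan Is Real"},
    {"id": "0hyFPXxmWqc4xTuHA1hEd9", "name": "Phobia"},
]
rows = feature_rows(StubClient(), tracks)
```

With the real client, `pd.DataFrame(feature_rows(sp, art['tracks'])).set_index('song')` should give the same table as above.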


In [10]:
!pwd

/home/czrpxr/Projetos/Ironhack/IRONHACK-SUBMISSIONS/PROJECTS/web-project


In [11]:
top10_audio_features.to_csv('top10.csv')

## Part Two - Web Scraping

![](allblacks)

The objective of this step is to scrape data from the All Blacks stats website.

In [12]:
black_url = "http://stats.allblacks.com/"

In [14]:
stats = pd.read_html(black_url)[1]

In [15]:
stats = stats.dropna(axis=1, how='all')
stats = stats.dropna(axis=0, how='any')

In [16]:
stats.columns = ['Country', 'Matches', 'W', 'D', 'L', 'For', 'Against', '%(Wins)']

In [17]:
stats

,Country,Matches,W,D,L,For,Against,%(Wins)
1,Argentina,29,28,1,-,1150,422,96.55
2,Australia,166,115,7,44,3552,2365,69.28
3,British & Irish Lions,41,30,4,7,700,399,73.17
4,Canada,5,5,-,-,313,54,100.0
5,England,41,33,1,7,985,575,80.49
6,Fiji,5,5,-,-,364,50,100.0
7,France,61,48,1,12,1596,801,78.69
8,Georgia,1,1,-,-,43,10,100.0
9,Ireland,31,28,1,2,871,375,90.32
10,Italy,14,14,-,-,820,131,100.0
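
In the output above, `-` seems to stand for zero: Argentina's 28 wins and 1 draw already account for all 29 matches. If the columns came through as text, they can be converted to numbers before any arithmetic. A small sketch on two rows copied from the table, assuming string-typed columns (the real dtypes were not shown):

```python
import pandas as pd

# Two rows copied from the output above, held as strings (an assumption)
sample = pd.DataFrame(
    {
        "Country": ["Argentina", "Australia"],
        "Matches": ["29", "166"],
        "W": ["28", "115"],
        "D": ["1", "7"],
        "L": ["-", "44"],
    }
)

count_cols = ["Matches", "W", "D", "L"]
cleaned = sample.copy()
# Treat '-' as zero, then convert the text to integers
cleaned[count_cols] = cleaned[count_cols].replace("-", "0").apply(pd.to_numeric)

# Sanity check: wins + draws + losses should equal matches played
balanced = (cleaned["W"] + cleaned["D"] + cleaned["L"] == cleaned["Matches"]).all()
```

If `balanced` comes out false on the full table, the `-` entries probably mean something other than zero and the conversion should be revisited.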


In [18]:
stats.to_csv('all_blacks_stats.csv')

## Other Web Scraping

![](doria.jpeg)

In [22]:
from bs4 import BeautifulSoup

In [23]:
url_gov = "http://www.saopaulo.sp.gov.br/sala-de-imprensa/agenda-do-governador/"

In [24]:
request_doriana = requests.get(url_gov)
request_doriana

<Response [200]>

In [25]:
doriana_soup = BeautifulSoup(request_doriana.text, 'html.parser')
doriana_soup

<!DOCTYPE html>
<!--[if IE 7]><html class="no-js ie7" lang="pt-BR"><![endif]--><!--[if IE 8]><html class="no-js ie8" lang="pt-BR"><![endif]--><!--[if IE 9]><html class="no-js ie9" lang="pt-BR"><![endif]--><html class="no-js" lang="pt-BR">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- Favicons -->
<link href="/wp-content/themes/saopaulo/apple-touch-icon.png" rel="apple-touch-icon" sizes="180x180"/>
<link href="/wp-content/themes/saopaulo/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="/wp-content/themes/saopaulo/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
<link href="/wp-content/themes/saopaulo/manifest.json" rel="manifest"/>
<link href="/wp-content/themes/saopaulo/safari-pinned-tab.svg" rel="mask-icon"/>
<link href="/wp-content/themes/saopaulo/favicon.ico" rel="shortcut icon"/>
<link href="https://cdn.boomcdn.com/libs/font-awes

In [26]:
first_page = doriana_soup.find_all("h3", class_='title')
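
To make the selection concrete, here is the same `find_all` call on a small inline fragment. The markup is a hypothetical simplification of the agenda page (an `<h3 class="title">` wrapping a link, with placeholder `example.org` URLs), not a copy of the real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is more complex
html = """
<h3 class="title"><a href="https://example.org/agenda/29-08">Quinta-feira, 29 de Agosto de 2019</a></h3>
<h3 class="other">Not an agenda entry</h3>
<h3 class="title"><a href="https://example.org/agenda/28-08">Quarta-feira, 28 de Agosto de 2019</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")
titles = soup.find_all("h3", class_="title")

# .text gives the visible title; the nested <a> carries the link
entries = [(h3.text, h3.find("a")["href"]) for h3 in titles]
```

Only the two headings with `class="title"` are returned, each paired with the link it wraps.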

In [27]:
first_page[0].text

'Quinta-feira, 29 de Agosto de 2019'
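
The titles are Portuguese date strings. If real dates are needed later (for sorting or filtering), they can be parsed with the standard library alone. A minimal sketch; `parse_pt_date` is a hypothetical helper, and it assumes every title contains a `DD de <month> de YYYY` fragment like the one above.

```python
import re
from datetime import date

MONTHS_PT = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8,
    "setembro": 9, "outubro": 10, "novembro": 11, "dezembro": 12,
}

DATE_RE = re.compile(r"(\d{1,2}) de (\w+) de (\d{4})")


def parse_pt_date(title):
    """Extract a date from a title such as 'Quinta-feira, 29 de Agosto de 2019'."""
    match = DATE_RE.search(title)
    if match is None:
        raise ValueError(f"no date found in {title!r}")
    day, month_name, year = match.groups()
    # Month names appear with mixed capitalisation, so normalise first
    return date(int(year), MONTHS_PT[month_name.lower()], int(day))


parsed = parse_pt_date("Quinta-feira, 29 de Agosto de 2019")
```

Because the pattern is searched rather than matched from the start, titles with a trailing note such as "Governador em exercício" are handled the same way.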

In [42]:
doriana_df = pd.DataFrame()
for page in first_page:
    temp_lst = []
    #the element right after the <h3> tag is expected to be the link to that day's agenda
    new_url = page.next_element['href']

    request_new = requests.get(new_url).text
    new_soup = BeautifulSoup(request_new, 'html.parser')
    content = new_soup.find('article')

    #collecting the text of every paragraph in the article
    for element in content.find_all('p'):
        temp_lst.append(element.text.replace('\n',' '))

    #one column per day, named after the page title
    temp_df = pd.DataFrame({page.text: temp_lst})
    doriana_df = pd.concat([doriana_df,temp_df], axis=1)

In [43]:
doriana_df

,"Quinta-feira, 29 de Agosto de 2019","Quinta-feira, 29 de agosto de 2019 – Governador em exercício","Quarta-feira, 28 de Agosto de 2019","Quarta-feira, 28 de agosto de 2019 – Governador em exercício","Terça-feira, 27 de agosto de 2019","Segunda-feira, 26 de agosto de 2019","Sábado, 24 de agosto de 2019","Sexta-feira, 23 de agosto de 2019","Quinta-feira, 22 de agosto de 2019","Quarta-feira, 21 de agosto de 2019"
0,Tour por fábricas automotivas Horário: 9h às 1...,Participação na Agenda Araraquara 2019 – “Comp...,Chegada a Frankfurt Horário: 10h45,"Reunião sobre orçamento do DER, com a Secretar...",Solenidade de abertura do Congresso do Latam R...,Inauguração da Estação Jardim Planalto da Linh...,Abertura da plenária da 4ª Reunião de Governad...,Reunião de secretariado Horário: 8h às 12h Loc...,"Despacho com o Chefe de Gabinete, Wilson Pedro...",Café da manhã com o Presidente Global da Nestl...
1,Apresentação do sistema dual de formação profi...,Convênio para obras de infraestrutura – rotató...,Partida para Hannover Horário: 12h40,"Reunião sobre Rodoanel e Tamoios, com a Secret...",Solenidade de abertura do Salão Internacional ...,"Despacho com o Chefe de Gabinete, Wilson Pedro...",Participação dos painéis Horário: 9h05 às 10h2...,Coletiva de imprensa sobre o Programa SP Gastr...,"Despacho com o Secretário de Comunicação, Cleb...",Coletiva de Imprensa com o Presidente da Gol L...
2,Reunião com executivos e dirigentes empresaria...,Assinatura de liberação de recursos do Fundo d...,Chegada a Hannover Horário: 13h30,"Despacho com o Diretor-Geral da Artesp, Giovan...","Reunião com o Presidente da Hapag-Lloyd, Oscar...",Almoço com o Presidente do Tribunal de Contas ...,Reunião dos Governadores para elaboração da Ca...,Reunião do Projeto “Novo Ceagesp e Projeto Cit...,Lançamento do Programa de Parcerias da Secreta...,Abertura do Encontro com Secretários Estaduais...
3,Coletiva de imprensa Horário: 14h45 às 15h45,"Reunião com a Diretoria da Rumo Logística, o D...",Jantar no Restaurante Terra com empresários al...,"Reunião com o Prefeito de Botucatu, Mário Edua...","Reunião com o Governador do Pará, Helder Barba...","Reunião com o Governador do Pará, Helder Barba...",Coletiva de Imprensa Horário: 12h às 12h30 Loc...,Reunião com Presidências e Diretorias da Aneel...,Almoço com o Secretário de Desenvolvimento Reg...,"Despacho com o Secretário de Comunicação, Cleb..."
4,"Jantar com empresários, jornalistas e a delega...",Despacho com o Secretário de Transportes Metro...,,Despacho com o Subsecretário de Ações Estratég...,Decolagem para Frankfurt / Alemanha Horário: 1...,Entrevista para a Agência Chinesa de Notícias ...,Almoço dos Governadores do Sul e Sudeste do Co...,Assinatura de autorizo do futuro Hospital Regi...,Reunião com o Deputado Federal Roberto de Luce...,Almoço com o Vice-Governador e Secretário de G...
5,,"Reunião sobre o Vale do Ribeira, com o Secretá...",,Despacho com a Assessoria Parlamentar Horário:...,,Despacho com o Vice-Governador e Secretário de...,Apresentações da 4ª Reunião de Governadores do...,Decolagem para Vitória/ES Horário: 19h,Reunião com o Deputado Federal Nilson Leitão e...,Reunião com o novo Embaixador do Brasil na Itá...
6,,"Despacho com a Subsecretária de PPP, Tarcila R...",,Assinatura de contrato de financiamento da Des...,,Cerimônia de Premiação “Melhores e Maiores 201...,Plenária final e encerramento da 4ª Reunião de...,Chegada em Vitória/ES Horário: 20h15,"Reunião do Projeto “Novo Museu do Ipiranga”, c...",Reunião com a Vice-Presidente do Fórum Econômi...
7,,,,Despacho com o Secretário Executivo da Fazenda...,,,Decolagem para São Paulo/SP Horário: 18h15,"Jantar dos Governadores do Sul e Sudeste, no C...",Reunião do Conselho de Segurança Pública do Es...,Coletiva de Imprensa sobre reunião do Fórum Ec...
8,,,,Reunião com Conselheiro do TCE/SP Horário: 18h...,,,Pouso em São Paulo/SP Horário: 19h30,,Despacho com o Vice-Governador e Secretário de...,"Reunião com o Presidente da Nestlé Brasil, Mar..."
9,,,,Participação na cerimônia de entrega da 25ª Ed...,,,,,,Reunião com o Secretário de Relações Internaci...
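
The wide layout above has one column per day, so shorter days are padded with missing values. For analysis, a long layout with one row per event is often easier to work with. A sketch using `melt` on a two-column sample; the column names `when` and `event` are illustrative, and the sample holds a few cells taken from the output above rather than the full table.

```python
import pandas as pd

# Small stand-in for doriana_df: one column per day, padded with None
wide = pd.DataFrame(
    {
        "Quinta-feira, 29 de Agosto de 2019": [
            "Coletiva de imprensa Horário: 14h45 às 15h45",
            None,
        ],
        "Quarta-feira, 28 de Agosto de 2019": [
            "Chegada a Frankfurt Horário: 10h45",
            "Partida para Hannover Horário: 12h40",
        ],
    }
)

# One row per (day, event); drop the padding rows
long_df = (
    wide.melt(var_name="when", value_name="event")
    .dropna(subset=["event"])
    .reset_index(drop=True)
)
```

The same chain applied to `doriana_df` should yield one row per scraped paragraph, with the day's title repeated in the `when` column.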


In [44]:
doriana_df.to_csv('eventos_doriana.csv')