<a href="https://colab.research.google.com/github/DamodaraBarbosa/estudos_web_scraping/blob/main/Scraping_Xbox_Metacritic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importando as bibliotecas que serão utilizadas

Pandas, urllib.request e bs4

In [1]:
import pandas as pd
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

## Início do scraping

URL

In [2]:
url = 'https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=0'

In [3]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'}


In [4]:
# tratamento de erro para HTTPError e URLError

try:
  req = Request(url, headers = headers)
  response = urlopen(req)
  html = response.read().decode('utf-8')
except HTTPError as e:
  print(e.status, e.reason)
except URLError as e:
  print(e.reason)

In [5]:
soup = BeautifulSoup(html, 'html.parser')

In [6]:
soup

<!DOCTYPE html>

<html xmlns:fb="http://ogp.me/ns/fb#" xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>All Xbox One Video Game Releases - Metacritic</title>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="See how well critics are rating all Xbox One video game releases at metacritic.com" name="description"/>
<meta content="Metacritic" name="application-name"/>
<meta content="#000000" name="msapplication-TileColor"/>
<meta content="/images/win8tile/76bf1426-2886-4b87-ae1c-06424b6bb8a2.png" name="msapplication-TileImage"/>
<meta content="618k3mbeki8tar7u6wvrum5lxs5cka" name="facebook-domain-verification">
<meta content="All Xbox One Video Game Releases" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=0" property="og:url"/>
<meta content="https://static.metacritic.com/images/icons/mc_fb_og.png" property="og:image"/>
<met

## Obtendo os dados desejáveis


Nome do jogo

In [7]:
game_name = soup.find('a', {'class': 'title'}).find('h3').get_text()

In [8]:
game_name

'Red Dead Redemption 2'

Data de lançamento

In [9]:
# spans = soup.find('div', {'class': 'clamp-details'}).findAll('span').split()

In [10]:
# spans

In [11]:
# para obter a data de lançamento seleciona-se o último elemento da lista e utiliza-se o método get_text()

# release_date = spans[-1].get_text()

In [12]:
# release_date

Obtenção do metascore e do userscore

In [13]:
# metascore = soup.find('div', {'class': 'clamp-metascore'}).find('div', {'class': 'metascore_w large game positive'}).get_text()

In [14]:
metascore = soup.find('div', {'class': 'clamp-metascore'}).find('div').get_text()

In [15]:
metascore

'97'

In [16]:
# analisando o html do site é possível verificarque a classe varia de cor da nota, a saber verde, amarela e vermelha
# podendo apresentar os valores: 'metascore_w large game positive' quando verde
# 'metascore_w large game mixed' quando amarelo
# e 'metascore_w large game negative' quando vermelho

In [17]:
userscore = soup.find('div', {'class': 'clamp-userscore'}).find('div').get_text()

In [18]:
type(userscore)

str

In [19]:
userscore

'8.2'

## Iterando as páginas do site

In [20]:
# são 21 páginas que na url 'https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=PÁGINA' vai de 0 a 20

In [21]:
for i in range(0, 21):
  print(i, end= ' ')

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 

In [22]:
# logo:
url_teste = 'https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page='

for i in range(0, 21):
  print(url_teste + str(i))

https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=0
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=1
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=2
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=3
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=4
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=5
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=6
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=7
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=8
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=9
https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=10
https://www.metacritic.com/brow

Iterando o html de cada jogo

In [23]:
for soup_game in soup.findAll('td', {'class': 'clamp-summary-wrap'}):
  print(str(soup_game) + '\n\n')

<td class="clamp-summary-wrap">
<input class="clamp-summary-expand" id="503269" type="checkbox">
<div class="clamp-score-wrap">
<a class="metascore_anchor" href="/game/xbox-one/red-dead-redemption-2/critic-reviews">
<div class="metascore_w large game positive">97</div>
</a>
</div>
<span class="title numbered">
                                                                    1.
                                                            </span>
<a class="title" href="/game/xbox-one/red-dead-redemption-2"><h3>Red Dead Redemption 2</h3></a>
<div class="clamp-details">
<div class="platform">
<span class="label">Platform:</span>
<span class="data">
                                        Xbox One
                                                                            </span>
</div>
<span>October 26, 2018</span>
</div>
<div class="summary">
                        Developed by the creators of Grand Theft Auto V and Red Dead Redemption, Red Dead Redemption 2 is an epic tale of life in 

Criando rotina de scraping

In [24]:
# o dado de cada jogo será armazenado no dicionário 'game' e cada 'game' na lista 'games'

game = dict()
games = list()

In [38]:
for i in range(0, 21):
  url_ = 'https://www.metacritic.com/browse/games/release-date/available/xboxone/metascore?page=' + str(i)

  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'}

  try:
    req = Request(url, headers = headers)
    response = urlopen(req)
    html = response.read().decode('utf-8')
  except HTTPError as e:
    print(e.status, e.reason)
  except URLError as e:
    print(e.reason)

  soup_jogos = soup.findAll('td', {'class': 'clamp-summary-wrap'})

  for index, soup_jogo in enumerate(soup_jogos):
    game['Name'] = soup_jogo.find('a', {'class': 'title'}).find('h3').get_text()
    spans = soup_jogo.find('div', {'class': 'clamp-details'}).findAll('span')
    game['Release date'] = spans[-1].get_text()

    metascore = soup_jogo.find('div', {'class': 'clamp-metascore'})
    # .find('div')

    if 'positive' in str(metascore):
      game['Metascore'] = metascore.find('div', {'class': 'metascore_w large game positive'}).get_text()
    elif 'mixed' in str(metascore):
      game['Metascore'] = metascore.find('div', {'class': 'metascore_w large game mixed'}).get_text()
    else:
      game['Metascore'] = metascore.find('div', {'class': 'metascore_w large game negative'}).get_text()

    userscore = soup_jogo.find('div', {'class': 'clamp-userscore'})

    if 'positive' in str(userscore):
      game['Userscore'] = userscore.find('div', {'class': 'metascore_w user large game positive'}).get_text()
    elif 'mixed' in str(userscore):
      game['Userscore'] = userscore.find('div', {'class': 'metascore_w user large game mixed'}).get_text()
    elif 'negative' in str(userscore):
      game['Userscore'] = userscore.find('div', {'class': 'metascore_w user large game negative'}).get_text()
    else:
      game['Userscore'] = userscore.find('div', {'class': 'metascore_w user large game tbd'}).get_text()

  games.append(game.copy())





In [39]:
game

{'Name': 'Sky Force Reloaded',
 'Release date': 'December 1, 2017',
 'Metascore': '86',
 'Userscore': '7.0'}

In [41]:
len(games)

2156

In [42]:
games

[{'Name': 'Red Dead Redemption 2',
  'Release date': 'October 26, 2018',
  'Metascore': '97',
  'Userscore': '8.2'},
 {'Name': 'Grand Theft Auto V',
  'Release date': 'November 18, 2014',
  'Metascore': '97',
  'Userscore': '7.9'},
 {'Name': 'Metal Gear Solid V: The Phantom Pain',
  'Release date': 'September 1, 2015',
  'Metascore': '95',
  'Userscore': '7.6'},
 {'Name': 'Celeste',
  'Release date': 'January 26, 2018',
  'Metascore': '94',
  'Userscore': '7.6'},
 {'Name': 'The Witcher 3: Wild Hunt - Blood and Wine',
  'Release date': 'May 31, 2016',
  'Metascore': '94',
  'Userscore': '8.6'},
 {'Name': 'Resident Evil 2',
  'Release date': 'January 25, 2019',
  'Metascore': '93',
  'Userscore': '8.8'},
 {'Name': 'INSIDE',
  'Release date': 'June 29, 2016',
  'Metascore': '93',
  'Userscore': '8.4'},
 {'Name': 'Forza Horizon 4',
  'Release date': 'September 28, 2018',
  'Metascore': '92',
  'Userscore': '8.3'},
 {'Name': 'Divinity: Original Sin II - Definitive Edition',
  'Release date'

## Exportando os dados como Dataframe

Gerando o Dataframe

In [43]:
dt_metacritic = pd.DataFrame(games)

In [49]:
dt_metacritic.head(110)

Unnamed: 0,Name,Release date,Metascore,Userscore
0,Red Dead Redemption 2,"October 26, 2018",97,8.2
1,Grand Theft Auto V,"November 18, 2014",97,7.9
2,Metal Gear Solid V: The Phantom Pain,"September 1, 2015",95,7.6
3,Celeste,"January 26, 2018",94,7.6
4,The Witcher 3: Wild Hunt - Blood and Wine,"May 31, 2016",94,8.6
...,...,...,...,...
105,Yakuza 6: The Song of Life,"March 25, 2021",87,7.3
106,Kingdom: New Lands,"August 9, 2016",87,6.5
107,Assassin's Creed Odyssey,"October 2, 2018",87,6.4
108,Donut County,"December 18, 2018",87,6.9


In [46]:
dt_metacritic['Name'].unique()

array(['Red Dead Redemption 2', 'Grand Theft Auto V',
       'Metal Gear Solid V: The Phantom Pain', 'Celeste',
       'The Witcher 3: Wild Hunt - Blood and Wine', 'Resident Evil 2',
       'INSIDE', 'Forza Horizon 4',
       'Divinity: Original Sin II - Definitive Edition',
       'RimWorld Console Edition', 'What Remains of Edith Finch',
       'Cuphead in the Delicious Last Course', 'The Swapper',
       'Dragon Quest XI S: Echoes of an Elusive Age - Definitive Edition',
       'The Witcher 3: Wild Hunt', 'Rayman Legends', 'Overwatch',
       'Sekiro: Shadows Die Twice', 'Forza Horizon 3', 'Dead Cells',
       'Psychonauts 2', 'F1 2020', 'Monster Hunter: World - Iceborne',
       'NieR: Automata - Become as Gods Edition',
       'The Witcher 3: Wild Hunt - Hearts of Stone',
       'Monster Hunter: World', 'Pinball FX3: Universal Classics Pinball',
       'Ori and the Will of the Wisps', 'NBA 2K17',
       'Curse of the Dead Gods', 'Yakuza 0',
       'Mass Effect Legendary Edition', 

In [47]:
len(dt_metacritic['Name'].unique())

100