# Topic: Game reviews
- The Last of Us Remastered
- Duke Nukem Forever
- God of War: Ragnarok
- No Man's Sky
- Biomutant

### Objective
- Build a crawler that collects user reviews from Metacritic, a review site for films and games, and use the collected data to build a sentiment analysis model

## Building the crawler

In [1]:
import requests
import re
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
def get_name(vreview):
    # Reviewer name; None when the expected markup is missing
    try:
        vname = vreview.find('div', class_='name').find('a').text
    except AttributeError:
        vname = None
    return vname

def get_rating(vreview):
    # Score as text (e.g. '10'); None when the review has no grade block
    try:
        vrating = vreview.find('div', class_='review_grade').text.strip('\n')
    except AttributeError:
        vrating = None
    return vrating

def get_date(vreview):
    return vreview.find('div', class_='date').text

def get_coment(vreview):
    # Text of the expanded blurb; None when the review has none.
    # Rows with a None comment are dropped after the crawl.
    try:
        vcomment = vreview.find('span', class_='blurb blurb_expanded').text
    except AttributeError:
        vcomment = None
    return vcomment

def get_title(vurl):
    # NOTE: vurl is unused; the page is read from the module-level `response`,
    # which the caller must set before calling this function
    bs_title = BeautifulSoup(response.text, 'html.parser')
    vtitle = bs_title.find('div', class_='product_title').find('a').text.strip('\n')

    return vtitle

def get_last_pages(vurl, pages=-1):
    # NOTE: like get_title, this parses the module-level `response`, not vurl
    bs_last_page = BeautifulSoup(response.text, 'html.parser')
    v_last_page = int(bs_last_page.find('div', class_='pages').find('li', class_="page last_page").find('a').text)

    # Use the requested page count only when it is greater than -1 and
    # less than or equal to the total number of pages
    if -1 < pages <= v_last_page:
        v_last_page = pages

    return v_last_page
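
The page-limit rule in `get_last_pages` can be checked without touching the network. The sketch below isolates that rule in a hypothetical helper, `clamp_pages` (not part of the notebook), and assumes the total page count has already been parsed from the page:

```python
def clamp_pages(total_pages, pages=-1):
    """Return how many pages to crawl.

    Hypothetical helper mirroring the rule in get_last_pages: the requested
    value is used only when it is greater than -1 and does not exceed the total.
    """
    if -1 < pages <= total_pages:
        return pages
    return total_pages


print(clamp_pages(33))        # no limit requested: falls back to the total
print(clamp_pages(33, 5))     # valid limit: used as given
print(clamp_pages(33, 100))   # limit above the total: ignored
```

Keeping the rule in a pure function like this makes it easy to test edge cases such as `pages=0` separately from the HTML parsing.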

In [3]:

url_list = ["https://www.metacritic.com/game/playstation-4/the-last-of-us-remastered/user-reviews"
            , "https://www.metacritic.com/game/playstation-4/god-of-war-ragnarok/user-reviews"
            , "https://www.metacritic.com/game/pc/duke-nukem-forever/user-reviews"
            , "https://www.metacritic.com/game/playstation-4/biomutant/user-reviews"
            , "https://www.metacritic.com/game/pc/no-mans-sky/user-reviews"]
user_agent = {'User-agent': 'Mozilla/5.0'}
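
Every entry in `url_list` already ends in `/user-reviews`, so a page URL only needs a `page` query parameter appended. A minimal sketch using the standard library follows; `page_url` is a hypothetical helper, and the `page` parameter name is taken from the crawl loop in this notebook rather than from any Metacritic documentation:

```python
from urllib.parse import urlencode


def page_url(base_url, page_num):
    """Append the page number to a review URL as a query string (hypothetical helper)."""
    return base_url + "?" + urlencode({"page": page_num})


base = "https://www.metacritic.com/game/pc/no-mans-sky/user-reviews"
print(page_url(base, 0))
```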

In [4]:
review_lst = []

for url in url_list:
    # get_last_pages and get_title parse this module-level `response`
    response = requests.get(url, headers=user_agent)
    last_page = get_last_pages(url)
    title = get_title(url)
    page_num = 0

    print('-----Loading Title: ' + title)
    while page_num <= last_page:

        # Each URL already ends in /user-reviews, so only the page query is added
        response = requests.get(url, params={'page': page_num}, headers=user_agent)

        # Scrape every review block on the page
        bs = BeautifulSoup(response.text, 'html.parser')
        reviews = bs.find_all('div', class_='review_content')
        review_lst.append(pd.DataFrame([{'title': title
                                        ,'name': get_name(row)
                                        ,'date' : get_date(row)
                                        ,'rating': get_rating(row)
                                        ,'comment': get_coment(row)
                                        } for row in reviews]))

        print('-----Loaded Page: ' + str(page_num) + ' -------')
        page_num = page_num + 1

# Build the final frame once, after all titles have been crawled
df_review = pd.concat(review_lst)
# Drop reviews without a comment; copy so the assignment below is not made on a view
df_review = df_review[df_review['comment'].notna()].copy()
df_review['rating'] = df_review['rating'].astype("Int8")


-----Loading Title: The Last of Us Remastered
-----Loaded Page: 0 -------
-----Loaded Page: 1 -------
-----Loaded Page: 2 -------
-----Loaded Page: 3 -------
-----Loaded Page: 4 -------
-----Loaded Page: 5 -------
-----Loaded Page: 6 -------
-----Loaded Page: 7 -------
-----Loaded Page: 8 -------
-----Loaded Page: 9 -------
-----Loaded Page: 10 -------
-----Loaded Page: 11 -------
-----Loaded Page: 12 -------
-----Loaded Page: 13 -------
-----Loaded Page: 14 -------
-----Loaded Page: 15 -------
-----Loaded Page: 16 -------
-----Loaded Page: 17 -------
-----Loaded Page: 18 -------
-----Loaded Page: 19 -------
-----Loaded Page: 20 -------
-----Loaded Page: 21 -------
-----Loaded Page: 22 -------
-----Loaded Page: 23 -------
-----Loaded Page: 24 -------
-----Loaded Page: 25 -------
-----Loaded Page: 26 -------
-----Loaded Page: 27 -------
-----Loaded Page: 28 -------
-----Loaded Page: 29 -------
-----Loaded Page: 30 -------
-----Loaded Page: 31 -------
-----Loaded Page: 32 -------
-----Lo
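
The pagination loop can be exercised offline by swapping the HTTP call for a stub. The sketch below is an illustration under stated assumptions, not the notebook's code: `crawl_pages` and `fake_fetch` are hypothetical names, and the stub returns ready-made placeholder rows instead of parsed HTML.

```python
def crawl_pages(fetch_page, last_page):
    """Collect rows from pages 0..last_page inclusive, as the loop above does.

    fetch_page is any callable that takes a page number and returns a list of
    row dicts; in the notebook this role is played by requests + BeautifulSoup.
    """
    rows = []
    for page_num in range(last_page + 1):
        rows.extend(fetch_page(page_num))
    return rows


def fake_fetch(page_num):
    # Stub standing in for a real page: two placeholder rows per page, for testing only
    return [{"page": page_num, "rating": "10", "comment": "placeholder"},
            {"page": page_num, "rating": None, "comment": None}]


rows = crawl_pages(fake_fetch, last_page=2)
# Keep only rows that have a comment, like the notna() filter in the notebook
kept = [r for r in rows if r["comment"] is not None]
print(len(rows), len(kept))
```

Passing the fetcher in as an argument keeps the loop logic testable without sending requests to the site.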

In [5]:
df_review.head()

| | title | name | date | rating | comment |
|---|---|---|---|---|---|
| 1 | The Last of Us Remastered | Nataraja | Aug 1, 2014 | 10 | General Overview: Fantastic story, smooth grap... |
| 4 | The Last of Us Remastered | brad0103triplex | Jul 29, 2014 | 10 | This is pretty much the same game I loved on t... |
| 5 | The Last of Us Remastered | awohlleb | Jul 29, 2014 | 10 | When you stand at the top of the heap with thi... |
| 6 | The Last of Us Remastered | FinalFantasy467 | Jul 30, 2014 | 10 | The last of us is such an amazing experience a... |
| 9 | The Last of Us Remastered | MONG | Aug 1, 2014 | 10 | The Original Was Amazing – This One is Even Be... |


In [6]:
df_review['title'].value_counts().sort_index()

Biomutant                    116
Duke Nukem Forever           282
God of War: Ragnarok          60
No Man's Sky                 398
The Last of Us Remastered    951
Name: title, dtype: int64

In [7]:
df_review.describe()

| | rating |
|---|---|
| count | 1807.0 |
| mean | 7.469286 |
| std | 3.248651 |
| min | 0.0 |
| 25% | 6.0 |
| 50% | 9.0 |
| 75% | 10.0 |
| max | 10.0 |


In [8]:
df_review['rating'].value_counts().sort_index()

0     124
1      59
2      55
3      63
4      62
5      56
6      78
7     106
8     187
9     265
10    752
Name: rating, dtype: Int64

## Initial findings (10/04/2023):
- Reviews without a comment were removed from the dataset to avoid problems in the text analysis
- The sample contains 1,807 reviews
- Mean rating: 7.47
- Median rating: 9
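
These figures can be cross-checked from the rating distribution alone. The sketch below copies the counts from the `value_counts()` output above and recomputes the summary statistics with the standard library, assuming those counts are complete:

```python
import statistics

# Rating -> number of reviews, copied from the value_counts() output above
rating_counts = {0: 124, 1: 59, 2: 55, 3: 63, 4: 62, 5: 56,
                 6: 78, 7: 106, 8: 187, 9: 265, 10: 752}

# Expand the frequency table back into one rating per review
ratings = [r for r, n in rating_counts.items() for _ in range(n)]

print(len(ratings))                          # sample size
print(round(statistics.mean(ratings), 6))    # mean rating
print(statistics.median(ratings))            # median rating
print(round(statistics.stdev(ratings), 6))   # sample standard deviation
```

The gap between the mean and the median reflects how skewed the ratings are: 752 of the 1,807 reviews give a 10, while 124 give a 0.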