# Scraping CGV Movie Reviews with Selenium

## Installing Selenium and the Web Driver

In [1]:
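# Install the Selenium Python bindings.
!pip install Selenium
# No package is named on the next line, so it installs nothing; `apt-get update` may have been intended.
!apt-get install
# Install Chromium's driver and copy it onto the PATH so it can be found by name.
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

import sys
# Note: sys.path only affects Python module imports; the cp above is what makes the driver findable.
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')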



Collecting Selenium
  Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
     |████████████████████████████████| 911kB 2.8MB/s 
Installing collected packages: Selenium
Successfully installed Selenium-3.141.0
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  chromium-browser chromium-browser-l10n chromium-codecs-ffmpeg-extra
Suggested packages:
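
Before launching the browser, it can help to confirm that the driver binary is actually where the install cell put it. The following is a minimal sketch using only the standard library; `locate_driver` is a hypothetical helper (not part of Selenium), and the fallback path is simply the one used in the install cell above.

```python
import os
import shutil

def locate_driver(name='chromedriver', extra_paths=()):
  """Return the first usable path to the driver executable, or None."""
  # Ask the PATH first, since a bare 'chromedriver' name is resolved through it.
  found = shutil.which(name)
  if found:
    return found
  # Fall back to explicit locations, such as the apt install directory.
  for path in extra_paths:
    if os.path.isfile(path) and os.access(path, os.X_OK):
      return path
  return None

print(locate_driver(extra_paths=['/usr/lib/chromium-browser/chromedriver']))
```

If this prints `None`, the install step most likely did not complete and `webdriver.Chrome` would fail to start.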


In [41]:
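from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Run Chrome without a display, as required on a notebook server with no screen.
chrome_options.add_argument('--headless')
# Commonly needed when Chrome runs as root inside a container.
chrome_options.add_argument('--no-sandbox')
# Avoid relying on /dev/shm, which is often small in containers.
chrome_options.add_argument('--disable-dev-shm-usage')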

## Scraping CGV Movie Reviews

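* Iron Man: http://www.cgv.co.kr/movies/detail-view/?midx=38262#1
* The Dark Knight: http://www.cgv.co.kr/movies/detail-view/?midx=76417#1
* Individual review pages cannot be reached directly through the URL
* Instead, use Selenium to click the page numbers and step through the reviews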
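
The click-through approach can be planned out before any browser is involved. The sketch below lists the pager actions needed to visit the first `page_num` pages, assuming the pager shows 10 page links per block and a "next" button reveals the following block (an assumption inferred from the `page_no % 10` check in the scraping function, not confirmed against the site). `plan_clicks` is a hypothetical helper, and unlike the scraping function it skips the final "next" click when no further pages are wanted.

```python
def plan_clicks(page_num, block_size=10):
  """List the pager actions needed to visit pages 1..page_num in order."""
  actions = []
  for page_no in range(1, page_num + 1):
    actions.append(('page', page_no))
    # After the last link of a block, reveal the next block,
    # but only if more pages remain to be visited.
    if page_no % block_size == 0 and page_no < page_num:
      actions.append(('next', None))
  return actions

for action in plan_clicks(12):
  print(action)
```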

In [32]:
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException

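def get_movie_review(url, page_num):
  """Collect writer, review text, and date from the first `page_num` review pages."""

  wd = webdriver.Chrome('chromedriver', options=chrome_options)
  wd.get(url)

  writer_list = []
  review_list = []
  day_list = []

  try:
    for page_no in range(1, page_num+1):
      try:
        # Look the pager up again on every pass, in case the previous click re-rendered it.
        page_ul = wd.find_element_by_id('paging_point')
        page_a = page_ul.find_element_by_link_text(str(page_no))
        page_a.click()
        time.sleep(1)

        writers = wd.find_elements_by_class_name('writer-name')
        writer_list += [writer.text for writer in writers]
        reviews = wd.find_elements_by_class_name('box-comment')
        review_list += [review.text for review in reviews]
        dates = wd.find_elements_by_class_name('day')
        day_list += [day.text for day in dates]

        if page_no % 10 == 0:
          # A class-name lookup accepts a single class, so matching both
          # 'btn-paging' and 'next' requires a CSS selector.
          page_ul = wd.find_element_by_id('paging_point')
          next_button = page_ul.find_element_by_css_selector('.btn-paging.next')
          next_button.click()
          time.sleep(1)
      except NoSuchElementException:
        # Stop when the requested page link or the next button is missing.
        break
  finally:
    # Always close the browser so repeated calls do not leave Chrome processes behind.
    wd.quit()

  movie_review_df = pd.DataFrame({"writer": writer_list, "Review": review_list, "Date": day_list})

  return movie_review_df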
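
The function pauses for a fixed second after each click, which is either too long or too short depending on how fast the page responds. A polling wait returns as soon as a condition holds. The sketch below shows the idea in plain Python with a stand-in condition; `wait_until` is a hypothetical helper, and Selenium provides `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose against a real page.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
  """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
  deadline = time.monotonic() + timeout
  while True:
    result = condition()
    if result:
      return result
    if time.monotonic() >= deadline:
      raise TimeoutError('condition not met within %.1f s' % timeout)
    time.sleep(interval)

# Stand-in for a page whose reviews only appear on the third check.
calls = {'n': 0}
def reviews_loaded():
  calls['n'] += 1
  return calls['n'] >= 3

wait_until(reviews_loaded, timeout=2.0, interval=0.01)
print('checks needed:', calls['n'])
```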

In [33]:
url = 'http://www.cgv.co.kr/movies/detail-view/?midx=83327'

movie_review_df = get_movie_review(url,12)
movie_review_df

| | writer | Review | Date |
|---|---|---|---|
| 0 | 마루조하 | 그냥 19세로 나왔으면 더 볼만했는데 코로나때문에 손해 덜보려고 15세로 낮춰서 낸... | 2020.08.28 |
| 1 | 정원♥ | 강추입니다!재미있어요 | 2020.08.28 |
| 2 | eeeehgus | 이정재 황정민 연기 지림 | 2020.08.28 |
| 3 | 히드라 | 생각보다 재밌었습니다 | 2020.08.28 |
| 4 | ss**123 | 잼서요 굿굿굿굿 ^^ㅎㅎㅎ | 2020.08.28 |
| 5 | lo**1989 | 황정민과 이정재 배우의 연기는 믿고봅니다 | 2020.08.28 |
| 6 | mi**20903 | 강철비보단 노잼 볼만은핰 | 2020.08.28 |
| 7 | va**ocana79 | 다만 악에서 구하소서 누구를?? | 2020.08.28 |
| 8 | 겸둥이 | 황정민배우, 이정재배우는 갈수록 연기가....캬~! 젤 최고는 박정민 배우! | 2020.08.28 |
| 9 | 너랑나랑님 | 박정민이 정말 대tothe박!!! ㅎㅎ | 2020.08.28 |
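
Building the table from three separately collected lists assumes they always have the same length; if one selector matches an extra element on some page, `pd.DataFrame` raises a `ValueError` at the very end of the run. One way to catch this earlier is to combine the lists page by page. The sketch below uses plain Python, `page_rows` is a hypothetical helper, and the sample values are taken from the output above.

```python
def page_rows(writers, reviews, dates):
  """Turn one page's three lists into row dicts, refusing mismatched lengths."""
  if not (len(writers) == len(reviews) == len(dates)):
    raise ValueError('length mismatch: %d writers, %d reviews, %d dates'
                     % (len(writers), len(reviews), len(dates)))
  return [{'writer': w, 'Review': r, 'Date': d}
          for w, r, d in zip(writers, reviews, dates)]

rows = page_rows(['정원♥', '히드라'],
                 ['강추입니다!재미있어요', '생각보다 재밌었습니다'],
                 ['2020.08.28', '2020.08.28'])
print(len(rows), rows[0]['writer'])
```

A list of such dicts can be passed to `pd.DataFrame` directly, so the rows from every page can be accumulated in a single list.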


## Scraping CGV's Now-Showing Movies

* http://www.cgv.co.kr/movies/

In [46]:
url = 'http://www.cgv.co.kr/movies/'

wd = webdriver.Chrome('chromedriver', options=chrome_options)
wd.get(url)

# Each 'box-contents' element inside the chart holds one movie's details.
movie_chart = wd.find_element_by_class_name('sect-movie-chart')
contents = movie_chart.find_elements_by_class_name('box-contents')
for content in contents:
  link = content.find_element_by_tag_name('a').get_attribute('href')
  title = content.find_element_by_class_name('title').text
  percent = content.find_element_by_class_name('percent').text
  info = content.find_element_by_class_name('txt-info').text
  print(title, percent, info, link)
  # Fetch the first two review pages for this movie (opens a separate browser).
  print(get_movie_review(link, 2))

# Close the chart browser once every movie has been processed.
wd.quit()
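
The `percent` and `info` values come back as display text rather than numbers or dates. A small parsing step makes them usable for sorting or filtering. The sketch below assumes the text looks like the first line of output for this cell (`예매율87.7%` and `2020.08.26 개봉`); both function names are hypothetical, and other movies may use formats these patterns do not cover.

```python
import re
from datetime import date

def parse_percent(text):
  """Extract the numeric reservation rate from text such as '예매율87.7%'."""
  match = re.search(r'(\d+(?:\.\d+)?)\s*%', text)
  return float(match.group(1)) if match else None

def parse_release_date(text):
  """Extract a date from text such as '2020.08.26 개봉'."""
  match = re.search(r'(\d{4})\.(\d{2})\.(\d{2})', text)
  if not match:
    return None
  year, month, day = map(int, match.groups())
  return date(year, month, day)

print(parse_percent('예매율87.7%'), parse_release_date('2020.08.26 개봉'))
```

Both functions return `None` when the pattern is absent, so unexpected text is easy to detect instead of raising mid-loop.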

테넷 예매율87.7% 2020.08.26 개봉 http://www.cgv.co.kr/movies/detail-view/?midx=83381
          writer                                             Review        Date
0             찌쑤              이해가 되는데 안되는 영화. 전체적인 스토리 연출 등등등은 짱조아요  2020.08.28
1     sz**ng2878                         이해하려 하지마세요 그대의 머리로는 불가능합니다  2020.08.28
2        sa**103  구로cgv 목요일 22:30 영화 에어컨도 가동 안시켜놓고 뭐하나?일 제대로 안할거...  2020.08.28
3     gamjatigim                                    마 이게 할리우드다 ㅋㅋㅋㅋ  2020.08.28
4          윤아티스트                                  이해하려 하지말고 느끼면 된다.  2020.08.28
5           rema                                      n차 관람 가즈아!!!!  2020.08.28
6    si**ung0307                                        매우재밌었어용ㅇㅎㅎㅎ  2020.08.28
7        dn**r91                                       대박이다 명작을 봤어요  2020.08.28
8      ly**42857  인버전 씬이 예상 외로 전혀 어색하지 않게 표현되었다. 물리학은 엔드게임+인터스텔라...  2020.08.28
9      jh**gsun1                                     한번에 이해하기가 힘드네요  2020.08.28
10          cgv창                          