## G-Market Crawling

* Crawl G-Market's best item information and store it in MySQL.

1. `DB schema design`

In [3]:
# # table - ranking
# CREATE TABLE ranking (
#     num INT AUTO_INCREMENT NOT NULL PRIMARY KEY, # every category has its own ranks 1, 2, ..., so the rank cannot be used as the PK
#     main_category VARCHAR(50) NOT NULL, # main category name
#     sub_category VARCHAR(50) NOT NULL, # sub category name
#     item_ranking TINYINT UNSIGNED NOT NULL, # item rank
#     item_code VARCHAR(20) NOT NULL, # item code
#     FOREIGN KEY(item_code) REFERENCES items(item_code) # references the code of each item
# );

# # table - item
# CREATE TABLE items(
#     item_code VARCHAR(20) NOT NULL PRIMARY KEY,
#     title VARCHAR(200) NOT NULL,
#     ori_price INT NOT NULL,
#     dis_price INT NOT NULL,
#     discount_percent INT NOT NULL,
#     provider VARCHAR(100)
# );

In [1]:
import pymysql

db = pymysql.connect(host='localhost', port=3306, user='root', passwd='3927', db='bestproducts', charset='utf8')
cursor = db.cursor()

sql = '''
CREATE TABLE items(
    item_code VARCHAR(20) NOT NULL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    ori_price INT NOT NULL,
    dis_price INT NOT NULL,
    discount_percent INT NOT NULL,
    provider VARCHAR(100)
);    
'''

cursor.execute(sql)

sql = '''
CREATE TABLE ranking (
    num INT AUTO_INCREMENT NOT NULL PRIMARY KEY, # every category has its own ranks 1, 2, ..., so the rank cannot be used as the PK
    main_category VARCHAR(50) NOT NULL, # main category name
    sub_category VARCHAR(50) NOT NULL, # sub category name
    item_ranking TINYINT UNSIGNED NOT NULL, # item rank
    item_code VARCHAR(20) NOT NULL, # item code
    FOREIGN KEY(item_code) REFERENCES items(item_code) # references the code of each item
);
'''
cursor.execute(sql)

db.commit()
db.close()
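
Because `ranking.item_code` references `items.item_code`, the item row has to exist before its ranking row. The sketch below is one possible way to store a crawled item under that constraint. The function name `save_best_item` and the dictionary keys are hypothetical, and `INSERT IGNORE` is a MySQL-specific choice made here so that an item appearing in several categories is stored only once.

```python
def save_best_item(cursor, item, main_category, sub_category, ranking):
    """Insert one crawled item and its ranking row through a DB-API cursor.

    `item` is assumed to be a dict whose keys match the columns of `items`.
    """
    # Parent row first: ranking.item_code is a foreign key to items.item_code.
    # INSERT IGNORE skips the insert when the item_code is already stored.
    cursor.execute(
        "INSERT IGNORE INTO items "
        "(item_code, title, ori_price, dis_price, discount_percent, provider) "
        "VALUES (%s, %s, %s, %s, %s, %s)",
        (item["item_code"], item["title"], item["ori_price"],
         item["dis_price"], item["discount_percent"], item["provider"]),
    )
    # Child row second; `num` is AUTO_INCREMENT, so it is left out.
    cursor.execute(
        "INSERT INTO ranking "
        "(main_category, sub_category, item_ranking, item_code) "
        "VALUES (%s, %s, %s, %s)",
        (main_category, sub_category, ranking, item["item_code"]),
    )
```

With pymysql, the cursor would come from `db.cursor()`, and `db.commit()` would still be needed after the inserts. Passing the values as parameters instead of formatting them into the SQL string lets the driver handle quoting of titles that contain quotes or other special characters.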

2. `크롤링`

In [4]:
import requests
from bs4 import BeautifulSoup

In [19]:
## crawl the main categories
res = requests.get('http://corners.gmarket.co.kr/Bestsellers') # base url
soup = BeautifulSoup(res.content, 'html.parser')

categories = soup.select('div.gbest-cate ul.by-group li a')

for category in categories:
    print("http://corners.gmarket.co.kr/" + category['href'], category.get_text())

http://corners.gmarket.co.kr//Bestsellers ALL
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01 패션의류
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G02 신발/잡화
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G03 화장품/헤어
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G04 유아동/출산
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G07 식품
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08 생활/주방/건강
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G09 가구/침구
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G05 스포츠/자동차
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G06 컴퓨터/전자
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G10 도서/음반
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G11 여행
http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G12 e쿠폰/티켓
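
The printed links contain a double slash (`.kr//Bestsellers`) because each `href` already starts with `/` and is appended to a base that ends with `/`. A minimal sketch of one way to avoid this uses `urljoin` from the standard library. The names `BASE_URL` and `absolute_url` are hypothetical helpers, not part of the notebook.

```python
from urllib.parse import urljoin

BASE_URL = "http://corners.gmarket.co.kr/"

def absolute_url(href):
    """Resolve an href taken from the page against the base URL."""
    # urljoin replaces the path of the base when href starts with "/",
    # so no duplicate slash is produced.
    return urljoin(BASE_URL, href)

print(absolute_url("/Bestsellers?viewType=G&groupCode=G01"))
# → http://corners.gmarket.co.kr/Bestsellers?viewType=G&groupCode=G01
```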


In [31]:
## crawl the sub categories

def get_category(category_link, category_name): # input main category link, main category name
    print("***Main Category*** : ",category_name) # show which main category's sub categories are being parsed
    
    res = requests.get(category_link)
    soup = BeautifulSoup(res.content, 'html.parser') # parse the sub categories of this main category
    sub_categories = soup.select('div.navi.group ul li a')
    for sub_category in sub_categories:
        print("Sub Category :",sub_category.get_text(),"Sub Category Link :" ,"http://corners.gmarket.co.kr/" + sub_category['href'])
        print("###########################################################################")

for category in categories:
    get_category("http://corners.gmarket.co.kr/" + category['href'], category.get_text())

***Main Category*** :  ALL
***Main Category*** :  패션의류
Sub Category : 여성의류 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01&subGroupCode=S102
###########################################################################
Sub Category : 남성의류 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01&subGroupCode=S002
###########################################################################
Sub Category : 언더웨어 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01&subGroupCode=S088
###########################################################################
Sub Category : 브랜드 여성의류 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01&subGroupCode=S161
###########################################################################
Sub Category : 브랜드 남성의류 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G01&subGroupCode=S162
########

Sub Category : 생활용품 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08&subGroupCode=S035
###########################################################################
Sub Category : 주방용품 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08&subGroupCode=S034
###########################################################################
Sub Category : 세제/세면용품 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08&subGroupCode=S066
###########################################################################
Sub Category : 화장지/일용잡화 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08&subGroupCode=S067
###########################################################################
Sub Category : 위생용품 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G08&subGroupCode=S037
################################################################

Sub Category : 국내여행 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G11&subGroupCode=S047
###########################################################################
Sub Category : 해외여행 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G11&subGroupCode=S048
###########################################################################
Sub Category : 여행소품 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G11&subGroupCode=S049
###########################################################################
***Main Category*** :  e쿠폰/티켓
Sub Category : 외식 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G12&subGroupCode=S051
###########################################################################
Sub Category : 온라인컨텐츠 Sub Category Link : http://corners.gmarket.co.kr//Bestsellers?viewType=G&groupCode=G12&subGroupCode=S052
#########################################

In [None]:
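## crawl main/sub categories + item information

def get_items(html, category_name, sub_category_name):
    items_result_list = list()
    best_item = html.select('div.best-list')
    for index, item in enumerate(best_item[1].select('li')):
        title = item.select_one('a.itemname').get_text()
        ori_price = item.select_one('div.o-price').get_text()
        dis_price = item.select_one('div.s-price strong span').get_text()
        discount_percent = item.select_one('div.s-price em').get_text()
        # keep the parsed fields together; the rank is the 1-based position in the list
        items_result_list.append([category_name, sub_category_name, index + 1,
                                  title, ori_price, dis_price, discount_percent])
    return items_result_list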


## crawl the main categories
res = requests.get('http://corners.gmarket.co.kr/Bestsellers') # base url
soup = BeautifulSoup(res.content, 'html.parser')

categories = soup.select('div.gbest-cate ul.by-group li a')

for category in categories:
    print("http://corners.gmarket.co.kr/" + category['href'], category.get_text())
    
## crawl the sub categories

def get_category(category_link, category_name):
    print(category_link, category_name) 
    res = requests.get(category_link)
    soup = BeautifulSoup(res.content, 'html.parser')
    
    sub_categories = soup.select('div.navi.group ul li a')
    for sub_category in sub_categories:
        print(category_link, category_name, sub_category.get_text(),"http://corners.gmarket.co.kr/" + sub_category['href'])

for category in categories:
    get_category("http://corners.gmarket.co.kr/" + category['href'], category.get_text())
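
The values returned by `get_text()` are strings, while `ori_price`, `dis_price`, and `discount_percent` are `INT NOT NULL` columns in the schema. The sketch below shows one possible conversion step. It assumes the scraped text is a number with extra symbols such as thousands separators, a currency suffix, or `%`; the exact text format on the page is not confirmed here. The helper name `to_int` is hypothetical.

```python
import re

def to_int(text):
    """Return the digits found in `text` as an int, or 0 if there are none."""
    # Drop everything except ASCII digits (commas, currency suffix, '%', spaces).
    digits = re.sub(r"[^0-9]", "", text)
    # Fall back to 0 so the value still fits a NOT NULL integer column.
    return int(digits) if digits else 0

print(to_int("12,900원"))  # → 12900
print(to_int("35%"))       # → 35
print(to_int(""))          # → 0
```

Falling back to 0 is a design choice for items that show no original price or discount; storing `NULL` instead would require relaxing the `NOT NULL` constraints.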