# scrapy
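- An open-source framework for collecting data from websites
- scrapy structure
- scrapy workflow
    - Collecting the Gmarket Best 200 product data
        1. Create a scrapy project
        2. Check the XPath of the elements to crawl
        3. Write the items.py module
        4. Write the spider.py module
        5. Run the scrapy project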

In [1]:
import scrapy
import requests
from scrapy.http import TextResponse

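## Project structure
- items.py
    - Defines the data model, using a class
- middlewares.py
    - Defines settings such as the user agent
- pipelines.py
    - Runs additional code on the collected data before it is output
    - e.g. sending the data to a messenger, or saving the collected data to a database
- settings.py
    - The scrapy configuration file
    - e.g. whether to obey robots.txt, whether to use a pipeline
- spiders
    - The directory that holds the modules describing which web pages to collect and in what order
    - First, a request is sent and the data comes back as a response
    - The data is put into the class defined in items.py to build an object
    - The object is handed off with yield
    - If a pipeline is enabled in the settings, the object goes through it (for example, to be saved to a database) and is then returned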
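As a rough illustration of the pipelines.py step described above, the sketch below shows a pipeline that converts price strings to integers before the items are written out. The class name `PriceToIntPipeline`, the `price_fields` attribute, and the priority value `300` are assumptions made for illustration and are not part of the original project. Scrapy calls `process_item(item, spider)` on every enabled pipeline.

```python
# gmarket/gmarket/pipelines.py (hypothetical sketch, not the original project's code)


class PriceToIntPipeline:
    """Convert price strings such as '16,400' to integers such as 16400."""

    # Field names match the ones defined in items.py.
    price_fields = ("o_price", "s_price")

    def process_item(self, item, spider):
        # Scrapy calls this method once for every item a spider yields.
        for field in self.price_fields:
            value = item.get(field)
            if isinstance(value, str):
                item[field] = int(value.replace(",", "").strip())
        # Returning the item passes it on to the next pipeline or the exporter.
        return item


# To enable it, settings.py would need an entry like this (assumed module path):
# ITEM_PIPELINES = {"gmarket.pipelines.PriceToIntPipeline": 300}
```

The pipeline is a plain class, so it can be tried out with an ordinary dict before wiring it into the project.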

In [2]:
## sudo apt install tree
!tree gmarket/

gmarket/
├── gmarket
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

2 directories, 7 files


## Creating the scrapy project

In [1]:
!scrapy startproject gmarket

New Scrapy project 'gmarket', using template directory '/opt/anaconda3/lib/python3.8/site-packages/scrapy/templates/project', created in:
    /Users/dokyum/Workspace/DSS - /TIL-data-science/scrapy/gmarket

You can start your first spider with:
    cd gmarket
    scrapy genspider example example.com


## Checking the XPath of the elements to crawl
- Collect the product links first, then the detail data for each product

In [2]:
# Collect the product links
response = requests.get("http://corners.gmarket.co.kr/Bestsellers")
response

<Response [200]>

In [6]:
dom = TextResponse(response.url, body=response.text, encoding="utf-8")
links = dom.xpath('//*[@id="gBestWrap"]/div/div[3]/div[2]/ul/li/div[1]/a/@href').extract()
len(links), links[0]

(200,
 'http://item.gmarket.co.kr/Item?goodscode=1840147374&ver=637453816176941035')
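Each collected link carries the product ID in its `goodscode` query parameter. The following is a minimal sketch, using only the standard library, of pulling that ID out of a link. The helper name `goods_code` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def goods_code(link: str) -> str:
    """Return the value of the goodscode query parameter, or '' if it is absent."""
    query = parse_qs(urlparse(link).query)
    return query.get("goodscode", [""])[0]


link = "http://item.gmarket.co.kr/Item?goodscode=1840147374&ver=637453816176941035"
print(goods_code(link))  # 1840147374
```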

In [53]:
## Collect the product detail data: product name, original price, sale price
link = links[3]
print(link)

http://item.gmarket.co.kr/Item?goodscode=1780931173&ver=637453816176941035


In [54]:
response = requests.get(link)
dom = TextResponse(response.url, body=response.text, encoding='utf-8')

In [55]:
title = dom.xpath('//*[@id="itemcase_basic"]/h1/text()')[0].extract().strip()
s_price = dom.xpath('//*[@id="itemcase_basic"]/p/span/strong/text()')[0].extract()
try:
    o_price = dom.xpath('//*[@id="itemcase_basic"]/p/span/span/text()')[0].extract()
except IndexError:  # no original-price element on the page, so fall back to the sale price
    o_price = s_price
title, s_price, o_price

('태송 프리미엄 다섯가지나물밥 210g x10봉', '16,400', '32,800')
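The prices come back as strings with thousands separators, so they need converting before any arithmetic. The sketch below is one possible way to do that. `parse_price` and `discount_rate` are hypothetical helper names, and the sample values are the ones printed above.

```python
def parse_price(text: str) -> int:
    """Turn a price string such as '16,400' into the integer 16400."""
    return int(text.replace(",", "").strip())


def discount_rate(o_price: str, s_price: str) -> float:
    """Percentage discount of the sale price relative to the original price."""
    original = parse_price(o_price)
    sale = parse_price(s_price)
    if original == 0:
        return 0.0
    return round((original - sale) / original * 100, 1)


print(parse_price("16,400"))              # 16400
print(discount_rate("32,800", "16,400"))  # 50.0
```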

## Writing the items.py module

In [None]:
%load gmarket/gmarket/items.py

In [35]:
%%writefile gmarket/gmarket/items.py
import scrapy


class GmarketItem(scrapy.Item):
    title = scrapy.Field()
    o_price = scrapy.Field()
    s_price = scrapy.Field()    
    link = scrapy.Field()

Overwriting gmarket/gmarket/items.py


## Writing the spider.py module

In [56]:
%%writefile gmarket/gmarket/spiders/spiders.py
import scrapy
from gmarket.items import GmarketItem

class GmarketSpider(scrapy.Spider):
    name = "GmarketBest"
    allowed_domains = ['gmarket.co.kr']
    start_urls = ['http://corners.gmarket.co.kr/Bestsellers']
    
    def parse(self, response):
        links = response.xpath('//*[@id="gBestWrap"]/div/div[3]/div[2]/ul/li/div[1]/a/@href').extract()
        for link in links:
            yield scrapy.Request(link, callback=self.parse_content)
            
    def parse_content(self, response):
        item = GmarketItem()
        item['title'] = response.xpath('//*[@id="itemcase_basic"]/h1/text()')[0].extract().strip()
        item['s_price'] = response.xpath('//*[@id="itemcase_basic"]/p/span/strong/text()')[0].extract()
        try:
            item['o_price'] = response.xpath('//*[@id="itemcase_basic"]/p/span/span/text()')[0].extract()
        except IndexError:  # no original-price element on the page, so fall back to the sale price
            item['o_price'] = item['s_price']
        item['link'] = response.url
        yield item

Overwriting gmarket/gmarket/spiders/spiders.py
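Scrapy's `allowed_domains` spider attribute limits which hosts a spider will follow: a request is kept when its host equals a listed domain or is a subdomain of one. The function below is a simplified standard-library sketch of that check, not Scrapy's actual implementation, and `is_allowed` is a hypothetical name.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = ["gmarket.co.kr"]
print(is_allowed("http://item.gmarket.co.kr/Item?goodscode=1840147374", allowed))  # True
print(is_allowed("http://example.com/", allowed))                                  # False
```

Both the bestseller page (`corners.gmarket.co.kr`) and the product pages (`item.gmarket.co.kr`) are subdomains of `gmarket.co.kr`, so a single entry covers them.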


## Running the scrapy project
- The command must be run from the directory that contains scrapy.cfg
- `$ scrapy crawl GmarketBest -o items.csv`
- `-o` : output file (appends if the file already exists)
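Because the command has to run from the directory that contains scrapy.cfg, a launcher script can look for that file first. The sketch below walks up from a starting directory until it finds scrapy.cfg. `find_project_root` is a hypothetical helper, and the commented-out `subprocess.run` call only illustrates how the result might be used.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains scrapy.cfg."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


root = find_project_root(Path.cwd())
print(root)  # a Path if a project was found, otherwise None

# Possible usage (assumes scrapy is installed and the spider is named GmarketBest):
# import subprocess
# subprocess.run(["scrapy", "crawl", "GmarketBest", "-o", "items.csv"], cwd=root, check=True)
```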

In [37]:
!tree gmarket/

gmarket/
├── gmarket
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── spiders.py
└── scrapy.cfg

2 directories, 8 files


In [46]:
!ls

01_python_iterator_generator.ipynb 03_scrapy.ipynb
02_xpath.ipynb                     gmarket


In [47]:
!cd gmarket && scrapy crawl GmarketBest -o items.csv

2021-01-04 19:16:50 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: gmarket)
2021-01-04 19:16:50 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.3 (default, Jul  2 2020, 11:26:31) - [Clang 10.0.0 ], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform macOS-10.15.7-x86_64-i386-64bit
2021-01-04 19:16:50 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-01-04 19:16:50 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'gmarket',
 'NEWSPIDER_MODULE': 'gmarket.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['gmarket.spiders']}
2021-01-04 19:16:50 [scrapy.extensions.telnet] INFO: Telnet Password: e4c6102b69963db3
2021-01-04 19:16:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions

2021-01-04 19:16:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1969004445&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1975682518&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:52 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1784364135&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1784364135&ver=637453846105295034',
 'o_price': '24,000',
 's_price': '24,000',
 'title': '[안다르] 안다르 이태리 원사 기모 레깅스 특가'}
2021-01-04 19:16:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1515937090&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goo

2021-01-04 19:16:53 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1473201843&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1473201843&ver=637453846105295034',
 'o_price': '32,900',
 's_price': '32,900',
 'title': '[스팸] 스팸25% 라이트 340G 10개'}
2021-01-04 19:16:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1492652794&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1579165613&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1829792719&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:53 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodsc

2021-01-04 19:16:55 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1164836001&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1164836001&ver=637453846105295034',
 'o_price': '30,900',
 's_price': '30,900',
 'title': '[다우니] 다우니 대용량 섬유유연제 아로마플로럴 8.5L 2개'}
2021-01-04 19:16:55 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1914143805&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1914143805&ver=637453846105295034',
 'o_price': '43,800',
 's_price': '43,800',
 'title': '[피지오겔] 피지오겔 DMT 로션 200ml 2개 +증정'}
2021-01-04 19:16:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1101182286&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1503717275&ver=637453846105295034> (referer: http://corn

2021-01-04 19:16:57 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1101177736&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1101177736&ver=637453846105295034',
 'o_price': '40,900',
 's_price': '40,900',
 'title': '[맥심] 맥심 화이트골드 커피믹스 400T : 쿠폰가 36900원~'}
2021-01-04 19:16:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1997424211&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1528526231&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:57 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1997424211&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1997424211&ver=637453846105295034',
 'o_price': '118,200',
 's_price': '118,200',
 'title':

2021-01-04 19:16:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=373058782&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=373058782&ver=637453846105295034',
 'o_price': '9,900',
 's_price': '9,900',
 'title': '다온샵/연말특가10%/1+1/기모/청바지/슬랙스/빅사이즈'}
2021-01-04 19:16:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1920641521&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:16:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1775171803&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1775171803&ver=637453846105295034',
 'o_price': '18,900',
 's_price': '18,900',
 'title': '[모나리자] 모나리자 미용티슈 250매 3입 4개 총12개 각티슈 휴지'}
2021-01-04 19:16:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1920641521&ver=637453846105295034>
{'link': 'http://ite

2021-01-04 19:17:00 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1940828970&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1940828970&ver=637453846105295034',
 'o_price': '28,800',
 's_price': '28,800',
 'title': '[닥스] 갤러리아   닥스  남여 가죽장갑 4종 택 1'}
2021-01-04 19:17:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1946688328&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1747553680&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1822934262&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:00 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item

2021-01-04 19:17:02 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1702784089&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1702784089&ver=637453846105295034',
 'o_price': '39,000',
 's_price': '39,000',
 'title': '[길벗스쿨] 길벗스쿨 기적의 유아수학 A/B/C단계 외 선택구매'}
2021-01-04 19:17:02 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1107162855&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1107162855&ver=637453846105295034',
 'o_price': '6,900',
 's_price': '6,900',
 'title': '[프롬산타] 3+1 양말/롱삭스/수면 남자여자학생스니커즈덧신겨울'}
2021-01-04 19:17:02 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1162884079&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1162884079&ver=637453846105295034',
 'o_price': '6,900',
 's_price': '6,900',
 'title': '1+1 밍크퍼/기모 밴딩팬츠/융털/레깅스/빅사이즈~4XL'}
2021-01-04 19:17:02 [scrapy.core.scrape

2021-01-04 19:17:03 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=502114157&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=502114157&ver=637453846105295034',
 'o_price': '9,900',
 's_price': '9,900',
 'title': '코코리겨울신상박스롱후드맨투맨'}
2021-01-04 19:17:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=811829803&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1939404339&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:03 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=811829803&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=811829803&ver=637453846105295034',
 'o_price': '57,900',
 's_price': '57,900',
 'title': '(행사상품) 20년산 청원농협 왕의밥상 20KG 포'

2021-01-04 19:17:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1175590080&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1140168037&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1975603946&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1685903060&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:05 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1906285750&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1906285750&ver=63745384610

2021-01-04 19:17:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=994117207&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=994117207&ver=637453846105295034',
 'o_price': '29,900',
 's_price': '29,900',
 'title': '(국산) 마녀 포기김치10kg / 배추김치'}
2021-01-04 19:17:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1806518829&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1806518829&ver=637453846105295034',
 'o_price': '31,870',
 's_price': '31,870',
 'title': '[하누소] 하누소 왕갈비탕(650g3팩)+우거지갈비탕(750g3팩) 세트'}
2021-01-04 19:17:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1920642031&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1920642031&ver=637453846105295034',
 'o_price': '34,800',
 's_price': '34,800',
 'title': '[맥심] 모카골드 커피믹스 160T+160T : 쿠폰가 30890원 ~'}
2021-01-04 19:17:07 [scrapy.core.scr

2021-01-04 19:17:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1824745796&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:08 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1578966236&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1578966236&ver=637453846105295034',
 'o_price': '11,900',
 's_price': '11,900',
 'title': '[칠성사이다] 칠성 사이다 210ml x 30캔'}
2021-01-04 19:17:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1962949469&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:08 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1517719492&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1517719492&ver=637453846105295034',
 'o_price': '9,900',
 's_price': '9,900',
 'title': '메이킹유 6900원~ 뽀

2021-01-04 19:17:09 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1824749606&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1824749606&ver=637453846105295034',
 'o_price': '22,900',
 's_price': '22,900',
 'title': '[프로쉬] 독일 친환경 식기세척기세제 베이킹소다 2개SET  60EA'}
2021-01-04 19:17:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1494979529&ver=637453846105295034> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:17:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1494979529&ver=637453846105295034>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1494979529&ver=637453846105295034',
 'o_price': '7,900',
 's_price': '7,900',
 'title': '[이지바이] 균일가 7900원 32종/기모티/패딩/바지/조끼/후리스'}
2021-01-04 19:17:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1816843046&ver=637453846105295034> (referer: htt

## Running from a shell script

In [57]:
!pwd

/Users/dokyum/Workspace/DSS - /TIL-data-science/scrapy


In [67]:
%%writefile run.sh
cd gmarket
rm -f items.csv
scrapy crawl GmarketBest -o items.csv

Overwriting run.sh


In [68]:
!/bin/bash run.sh

2021-01-04 19:33:03 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: gmarket)
2021-01-04 19:33:03 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.3 (default, Jul  2 2020, 11:26:31) - [Clang 10.0.0 ], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform macOS-10.15.7-x86_64-i386-64bit
2021-01-04 19:33:03 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-01-04 19:33:03 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'gmarket',
 'NEWSPIDER_MODULE': 'gmarket.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['gmarket.spiders']}
2021-01-04 19:33:03 [scrapy.extensions.telnet] INFO: Telnet Password: f4d520578a78f0f2
2021-01-04 19:33:03 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions

2021-01-04 19:33:05 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1278574706&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1278574706&ver=637453855841286502',
 'o_price': '22,300',
 's_price': '14,900',
 'title': '국내산 고등어 10+10팩 무료배송 팩당 70-100g'}
2021-01-04 19:33:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1607479365&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1849832625&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1914140738&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:06 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item

2021-01-04 19:33:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1655242371&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1655242371&ver=637453855841286502',
 'o_price': '100,000',
 's_price': '97,000',
 'title': '[구글플레이] (카드가능) 기프트코드 10만원 / 구글 기프트카드'}
2021-01-04 19:33:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1473201843&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1473201843&ver=637453855841286502',
 'o_price': '32,900',
 's_price': '32,900',
 'title': '[스팸] 스팸25% 라이트 340G 10개'}
2021-01-04 19:33:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1679174280&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1679174280&ver=637453855841286502',
 'o_price': '26,900',
 's_price': '20,900',
 'title': '[Aura] 럭셔리향수향 아우라 섬유유연제 윌유 1Lx6 +200ml x2'}
2021-01-04 19:33:07 [scrapy.core.en

2021-01-04 19:33:09 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1795046791&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1795046791&ver=637453855841286502',
 'o_price': '29,900',
 's_price': '29,900',
 'title': '[천하일미] 홍석천 이원일 갈비탕 800g 4팩 갈비함량 240g 1팩2인'}
2021-01-04 19:33:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1164836001&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:09 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1932392702&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1932392702&ver=637453855841286502',
 'o_price': '27,900',
 's_price': '27,900',
 'title': 'MLC  베베쿡 신비한 배도라지 신비1박스+금비1박스/총2박스'}
2021-01-04 19:33:09 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1914143805&ver=637453855841286502>
{'link': 'h

2021-01-04 19:33:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1834224973&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1702708612&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1702708612&ver=637453855841286502',
 'o_price': '26,900',
 's_price': '20,900',
 'title': '[테크] 테크 베이킹구연산 액체세제 일반 리필 2L6개 +1L증정'}
2021-01-04 19:33:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1913683537&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1864626658&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.k

2021-01-04 19:33:12 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1683936294&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1683936294&ver=637453855841286502',
 'o_price': '21,800',
 's_price': '10,900',
 'title': '[페리오] 칫솔 페리오 오리지널 미세모 칫솔 10+10+10입'}
2021-01-04 19:33:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1920641521&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=373058782&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1775171803&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:12 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/I

2021-01-04 19:33:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1806467850&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1940828970&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1822934262&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:13 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1806467850&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1806467850&ver=637453855841286502',
 'o_price': '58,000',
 's_price': '52,200',
 'title': '[하누소] 하누소 왕갈비탕(650g10팩) 세트'}
2021-01-04 19:33:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goo

2021-01-04 19:33:15 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1321028651&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1321028651&ver=637453855841286502',
 'o_price': '30,000',
 's_price': '9,700',
 'title': 'NEW땡큐 화장지 (30롤 x 2팩) / 3겹 두루마리 휴지'}
2021-01-04 19:33:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1702784089&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1107162855&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:15 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1162884079&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1162884079&ver=637453855841286502',
 'o_price': '23,000',
 's_price': '6,900',
 'title': '1+1 밍크

2021-01-04 19:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1965130647&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1965130647&ver=637453855841286502',
 'o_price': '26,200',
 's_price': '25,900',
 'title': '국내산 KF94 마스크 100매 개별포장 대형 식약처 인증'}
2021-01-04 19:33:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=502114157&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:16 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=502114157&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=502114157&ver=637453855841286502',
 'o_price': '33,000',
 's_price': '9,900',
 'title': '코코리겨울신상박스롱후드맨투맨'}
2021-01-04 19:33:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=811829803&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bests

2021-01-04 19:33:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1899276489&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1975603946&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1175590080&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1873181453&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:18 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1899276489&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1899276489&ver=63745385584

2021-01-04 19:33:19 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1791739177&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1791739177&ver=637453855841286502',
 'o_price': '29,800',
 's_price': '14,900',
 'title': '[스마트에코] 아기물티슈 모데라토 캡형 80매X20팩 엠보물티슈'}
2021-01-04 19:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=484828506&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1510637585&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1957397595&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/

2021-01-04 19:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1824745796&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1824745796&ver=637453855841286502',
 'o_price': '70,000',
 's_price': '22,900',
 'title': '[프로쉬] 독일 친환경 식기세척기세제 그린레몬 2개SET  60EA'}
2021-01-04 19:33:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1517719492&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1468092005&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:21 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1517719492&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1517719492&ver=637453855841286502',
 'o_price': '33,000',
 's_price': '9,900',
 'title': '메

2021-01-04 19:33:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1975682518&ver=637453855841286502> (referer: http://corners.gmarket.co.kr/Bestsellers)
2021-01-04 19:33:22 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1784364135&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1784364135&ver=637453855841286502',
 'o_price': '39,000',
 's_price': '24,000',
 'title': '[안다르] 안다르 이태리 원사 기모 레깅스 특가'}
2021-01-04 19:33:22 [scrapy.core.scraper] DEBUG: Scraped from <200 http://item.gmarket.co.kr/Item?goodscode=1975682518&ver=637453855841286502>
{'link': 'http://item.gmarket.co.kr/Item?goodscode=1975682518&ver=637453855841286502',
 'o_price': '76,000',
 's_price': '26,500',
 'title': '[김정문알로에] 김정문알로에 스킨케어 기획세트 추가할인+증정'}
2021-01-04 19:33:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://item.gmarket.co.kr/Item?goodscode=1992658336&ver=637453855841286502> (referer: http://corners.gm

In [69]:
import pandas as pd

df = pd.read_csv('./gmarket/items.csv')
df.tail()

| | link | o_price | s_price | title |
|---|---|---|---|---|
| 195 | http://item.gmarket.co.kr/Item?goodscode=17809... | 32800 | 16400 | 태송 프리미엄 다섯가지나물밥 210g x10봉 |
| 196 | http://item.gmarket.co.kr/Item?goodscode=18781... | 26900 | 26900 | [피지오겔] 피지오겔 DMT 크림 150ml |
| 197 | http://item.gmarket.co.kr/Item?goodscode=17722... | 20900 | 18610 | 코크제로 355ml캔 24입/ 코카콜라 |
| 198 | http://item.gmarket.co.kr/Item?goodscode=19641... | 23800 | 11900 | 자연 모짜렐라 슈레드치즈 1kg /피자치즈/대용량 |
| 199 | http://item.gmarket.co.kr/Item?goodscode=14949... | 26300 | 7900 | [이지바이] 균일가 7900원 32종/기모티/패딩/바지/조끼/후리스 |
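Once the prices are numeric, the items can be ranked by discount. The following sketch does this with the standard-library `csv` module rather than pandas, on a small in-memory sample whose column names match items.csv. The sample reuses three rows from the output above, with the link column omitted and the prices written with thousands separators as the spider yields them. `to_int` and `top_discounts` are hypothetical helpers.

```python
import csv
import io


def to_int(text: str) -> int:
    """Parse a price that may contain thousands separators."""
    return int(text.replace(",", ""))


def top_discounts(csv_file, n=3):
    """Return (title, discount %) pairs for the n largest discounts."""
    rows = []
    for row in csv.DictReader(csv_file):
        original, sale = to_int(row["o_price"]), to_int(row["s_price"])
        rate = round((original - sale) / original * 100, 1) if original else 0.0
        rows.append((row["title"], rate))
    return sorted(rows, key=lambda pair: pair[1], reverse=True)[:n]


# Sample rows (link column omitted); a real run would open gmarket/items.csv instead.
sample = io.StringIO(
    "o_price,s_price,title\n"
    '"32,800","16,400",태송 프리미엄 다섯가지나물밥 210g x10봉\n'
    '"26,900","26,900",[피지오겔] 피지오겔 DMT 크림 150ml\n'
    '"20,900","18,610",코크제로 355ml캔 24입/ 코카콜라\n'
)
for title, rate in top_discounts(sample):
    print(title, rate)
```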


In [2]:
# !cd gmarket && scrapy crawl GmarketBest

gmarket    items.csv  scrapy.cfg
