# Modularizing Scrapy Projects by Keyword

### Creating the project for the keyword 'vaccine' (백신)

In [1]:
!rm -rf covid_vaccine

In [2]:
!scrapy startproject covid_vaccine

New Scrapy project 'covid_vaccine', using template directory '/home/ubuntu/.pyenv/versions/3.6.9/envs/python3/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /home/ubuntu/python3/project/covid_vaccine

You can start your first spider with:
    cd covid_vaccine
    scrapy genspider example example.com


In [3]:
!tree covid_vaccine

covid_vaccine
├── covid_vaccine
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

2 directories, 7 files


### Writing items.py

In [4]:
%%writefile covid_vaccine/covid_vaccine/items.py
import scrapy


class CovidVaccineItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    date = scrapy.Field()

Overwriting covid_vaccine/covid_vaccine/items.py


### Writing spider.py

In [5]:
%%writefile covid_vaccine/covid_vaccine/spiders/spider.py

import scrapy
from datetime import datetime
from covid_vaccine.items import CovidVaccineItem

class CovidSpider(scrapy.Spider):
    name = 'CovidVaccine'
    # Scrapy reads `allowed_domains`, which takes bare domain names rather than URLs
    allowed_domains = ['naver.com']

    def start_requests(self):
        yield scrapy.Request("https://search.naver.com/search.naver?where=news&query=코로나%20국내%20백신&pd=4", callback=self.parse_kword)

    def parse_kword(self, response):
        # Crawl timestamp shared by every item scraped from this page
        now = datetime.now()
        date = "%s년 %s월 %s일 %s시 %s분" %(now.year, now.month, now.day, now.hour, now.minute)
        vaccine_title = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@title').extract()
        vaccine_link = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@href').extract()
        # Build a fresh item per article so an already yielded item is never mutated
        for title, link in zip(vaccine_title, vaccine_link):
            item = CovidVaccineItem()
            item['date'] = date
            item['title'] = title
            item['link'] = link
            yield item

Writing covid_vaccine/covid_vaccine/spiders/spider.py
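
The search URL in `start_requests` mixes raw Korean text with a hand-written `%20`. As a minimal sketch, the same URL could be assembled with the standard library so that the whole query is percent-encoded consistently. The helper name `build_news_search_url` is hypothetical and not part of the original spider; the `where`, `query`, and `pd` parameters are taken from the URL above.

```python
from urllib.parse import urlencode, quote

NAVER_SEARCH = "https://search.naver.com/search.naver"

def build_news_search_url(query, pd=4):
    """Hypothetical helper: build the news search URL used by the spider."""
    params = {"where": "news", "query": query, "pd": pd}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+')
    return NAVER_SEARCH + "?" + urlencode(params, quote_via=quote)

url = build_news_search_url("코로나 국내 백신")
print(url)
```

Only the query string would change between the keyword projects, so a helper like this keeps the rest of the URL in one place.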


In [6]:
# Write the shell script
%%writefile vaccine.sh
cd covid_vaccine
scrapy crawl CovidVaccine -o covid_vaccine.csv

Overwriting vaccine.sh


In [7]:
# Modify settings.py
!sed -i 's/ROBOTSTXT_OBEY = True/ROBOTSTXT_OBEY = False/' covid_vaccine/covid_vaccine/settings.py

### Writing mongodb.py

In [8]:
%%writefile covid_vaccine/covid_vaccine/mongodb.py
import pymongo

client = pymongo.MongoClient("mongodb://public_ip:port")
db = client.naver_kword1
collection = db.kword_vaccine

Writing covid_vaccine/covid_vaccine/mongodb.py
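
The `public_ip:port` part of the connection string is a placeholder. One possible way to fill it in without hard-coding the address is to read it from environment variables. This is only a sketch: the helper `mongo_uri` and the variable names `MONGO_HOST` and `MONGO_PORT` are assumptions, not something the original notebook defines.

```python
import os

def mongo_uri(env=os.environ):
    """Hypothetical helper: assemble the MongoDB URI from environment variables."""
    host = env.get("MONGO_HOST", "localhost")   # assumed variable name
    port = int(env.get("MONGO_PORT", "27017"))  # 27017 is MongoDB's default port
    return "mongodb://%s:%d" % (host, port)

# Example with explicit values instead of the real environment
print(mongo_uri({"MONGO_HOST": "203.0.113.10", "MONGO_PORT": "27017"}))
```

In `mongodb.py` the result would be passed straight to the client, as in `pymongo.MongoClient(mongo_uri())`.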


### Writing pipelines.py

In [9]:
%%writefile covid_vaccine/covid_vaccine/pipelines.py
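from .mongodb import collection

class CovidVaccinePipeline:
    def process_item(self, item, spider):
        # Copy the fields into a plain dict so MongoDB receives a regular document
        data = {"title": item["title"], "link": item["link"], "date": item["date"]}
        # insert_one replaces Collection.insert, deprecated in PyMongo 3 and removed in PyMongo 4
        collection.insert_one(data)
        return item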

Overwriting covid_vaccine/covid_vaccine/pipelines.py
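
Scrapy only calls a pipeline that is listed in the `ITEM_PIPELINES` setting, and the `sed` command above changes `ROBOTSTXT_OBEY` only. Assuming the generated `settings.py` still has its `ITEM_PIPELINES` block commented out, as the project template ships it, a fragment like the following would need to be present before `CovidVaccinePipeline` receives any items. The priority value 300 is the template's conventional default, not something the original specifies.

```python
# covid_vaccine/covid_vaccine/settings.py (fragment)
ITEM_PIPELINES = {
    # Dotted path to the pipeline class; lower numbers run earlier (range 0-1000)
    "covid_vaccine.pipelines.CovidVaccinePipeline": 300,
}
```

The other keyword projects would need the same entry with their own package and class names.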


### Crawl test and DataFrame output

In [10]:
!/bin/bash vaccine.sh

2021-03-15 21:16:17 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: covid_vaccine)
2021-03-15 21:16:17 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.6.9 (default, Dec 28 2020, 03:27:25) - [GCC 7.5.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j  16 Feb 2021), cryptography 3.4.6, Platform Linux-5.4.0-1038-aws-x86_64-with-debian-buster-sid
2021-03-15 21:16:17 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-15 21:16:17 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'covid_vaccine',
 'NEWSPIDER_MODULE': 'covid_vaccine.spiders',
 'SPIDER_MODULES': ['covid_vaccine.spiders']}
2021-03-15 21:16:17 [scrapy.extensions.telnet] INFO: Telnet Password: fb531e2da891f917
2021-03-15 21:16:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrap

In [11]:
import pandas as pd

In [12]:
df_vaccine = pd.read_csv('./covid_vaccine/covid_vaccine.csv')
df_vaccine

Unnamed: 0,date,link,title
0,2021년 3월 15일 21시 16분,http://yna.kr/AKR20210315044651530?did=1195m,코로나19 백신 어제 1천74명 접종…17일간 총 58만8천958명(종합)
1,2021년 3월 15일 21시 16분,https://imnews.imbc.com/news/2021/society/arti...,백신 이상반응 신고 여성-젊은층에서 높아…여성 2.1%-20대 3.6%
2,2021년 3월 15일 21시 16분,https://www.chosun.com/national/welfare-medica...,"75세 이상은 화이자, 65~74세는 아스트라 백신 맞는다"
3,2021년 3월 15일 21시 16분,http://www.dt.co.kr/contents.html?article_no=2...,文대통령 23일 AZ백신 접종
4,2021년 3월 15일 21시 16분,https://www.nocutnews.co.kr/news/5516516,충북 2분기 코로나19 예방접종 확대…백신 관리 '비상'
5,2021년 3월 15일 21시 16분,https://biz.chosun.com/site/data/html_dir/2021...,4월부터 일반국민 백신접종 시작…75세 이상 고령층 접종도 개시
6,2021년 3월 15일 21시 16분,https://news.sbs.co.kr/news/endPage.do?news_id...,접종 후 사망 신고 16명 중 14명 '백신과 무관' 잠정 결론
7,2021년 3월 15일 21시 16분,http://www.segye.com/content/html/2021/03/15/2...,‘언제 기회 줍니까’ 물었던 문 대통령…23일 AZ 백신 공개 접종
8,2021년 3월 15일 21시 16분,http://news.kmib.co.kr/article/view.asp?arcid=...,접종 후 사망 16명 중 14명 ‘백신과 무관’ 잠정 결론
9,2021년 3월 15일 21시 16분,http://www.newsis.com/view/?id=NISX20210315_00...,백신 확보 '한시름'…AZ 455만·화이자 350만명분 6월까지 도입(종합)
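
The `date` column holds the crawl time as a Korean-formatted string, which sorts and filters poorly. As a small sketch, the string can be parsed back into a `datetime` with a format that mirrors the one used in the spider. The names `DATE_FORMAT` and `parse_crawl_date` are hypothetical.

```python
from datetime import datetime

# Mirrors the "%s년 %s월 %s일 %s시 %s분" string built in the spider
DATE_FORMAT = "%Y년 %m월 %d일 %H시 %M분"

def parse_crawl_date(text):
    """Hypothetical helper: turn the spider's date string into a datetime."""
    return datetime.strptime(text, DATE_FORMAT)

parsed = parse_crawl_date("2021년 3월 15일 21시 16분")
print(parsed.isoformat())
```

Storing an ISO timestamp in the item instead would avoid the round trip, at the cost of changing the output format shown above.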


### Terminal crontab editor
- Schedule 'cd python3/project/covid_vaccine && scrapy crawl CovidVaccine' in crontab to load the crawled data into MongoDB

### Creating the project for the keyword 'social distancing' (거리두기)

In [13]:
!rm -rf covid_soc_distance

In [14]:
!scrapy startproject covid_soc_distance

New Scrapy project 'covid_soc_distance', using template directory '/home/ubuntu/.pyenv/versions/3.6.9/envs/python3/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /home/ubuntu/python3/project/covid_soc_distance

You can start your first spider with:
    cd covid_soc_distance
    scrapy genspider example example.com


In [15]:
!tree covid_soc_distance/

covid_soc_distance/
├── covid_soc_distance
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

2 directories, 7 files


### Writing items.py

In [16]:
%%writefile covid_soc_distance/covid_soc_distance/items.py
import scrapy

class CovidSocDistanceItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    date = scrapy.Field()

Overwriting covid_soc_distance/covid_soc_distance/items.py


### Writing spider.py

In [17]:
%%writefile covid_soc_distance/covid_soc_distance/spiders/spider.py

import scrapy
from datetime import datetime
from covid_soc_distance.items import CovidSocDistanceItem

class CovidSpider(scrapy.Spider):
    name = 'CovidSocDistance'
    # Scrapy reads `allowed_domains`, which takes bare domain names rather than URLs
    allowed_domains = ['naver.com']

    def start_requests(self):
        yield scrapy.Request("https://search.naver.com/search.naver?where=news&query=코로나%20사회적거리두기&pd=4", callback=self.parse_kword)

    def parse_kword(self, response):
        # Crawl timestamp shared by every item scraped from this page
        now = datetime.now()
        date = "%s년 %s월 %s일 %s시 %s분" %(now.year, now.month, now.day, now.hour, now.minute)
        soc_distance_title = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@title').extract()
        soc_distance_link = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@href').extract()
        # Build a fresh item per article so an already yielded item is never mutated
        for title, link in zip(soc_distance_title, soc_distance_link):
            item = CovidSocDistanceItem()
            item['date'] = date
            item['title'] = title
            item['link'] = link
            yield item

Writing covid_soc_distance/covid_soc_distance/spiders/spider.py


In [18]:
# Write the shell script
%%writefile soc_distance.sh
cd covid_soc_distance
scrapy crawl CovidSocDistance -o covid_soc_distance.csv

Overwriting soc_distance.sh


In [19]:
# Modify settings.py
!sed -i 's/ROBOTSTXT_OBEY = True/ROBOTSTXT_OBEY = False/' covid_soc_distance/covid_soc_distance/settings.py

### Writing mongodb.py

In [20]:
%%writefile covid_soc_distance/covid_soc_distance/mongodb.py
import pymongo

client = pymongo.MongoClient("mongodb://public_ip:port")
db = client.naver_kword2
collection = db.kword_soc_distance

Writing covid_soc_distance/covid_soc_distance/mongodb.py


### Writing pipelines.py

In [21]:
%%writefile covid_soc_distance/covid_soc_distance/pipelines.py
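from .mongodb import collection

class CovidSocDistancePipeline:
    def process_item(self, item, spider):
        # Copy the fields into a plain dict so MongoDB receives a regular document
        data = {"title": item["title"], "link": item["link"], "date": item["date"]}
        # insert_one replaces Collection.insert, deprecated in PyMongo 3 and removed in PyMongo 4
        collection.insert_one(data)
        return item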

Overwriting covid_soc_distance/covid_soc_distance/pipelines.py


### Crawl test and DataFrame output

In [22]:
!/bin/bash soc_distance.sh

2021-03-15 21:16:38 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: covid_soc_distance)
2021-03-15 21:16:38 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.6.9 (default, Dec 28 2020, 03:27:25) - [GCC 7.5.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j  16 Feb 2021), cryptography 3.4.6, Platform Linux-5.4.0-1038-aws-x86_64-with-debian-buster-sid
2021-03-15 21:16:38 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-15 21:16:38 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'covid_soc_distance',
 'NEWSPIDER_MODULE': 'covid_soc_distance.spiders',
 'SPIDER_MODULES': ['covid_soc_distance.spiders']}
2021-03-15 21:16:38 [scrapy.extensions.telnet] INFO: Telnet Password: a9395d44e887fd85
2021-03-15 21:16:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.M

In [23]:
import pandas as pd

In [24]:
df_soc_distance = pd.read_csv('./covid_soc_distance/covid_soc_distance.csv')
df_soc_distance

Unnamed: 0,date,link,title
0,2021년 3월 15일 21시 16분,http://www.newsis.com/view/?id=NISX20210315_00...,"서울시, 거리두기 2단계 28일까지 연장…5인모임 금지도 유지"
1,2021년 3월 15일 21시 16분,http://tk.newdaily.co.kr/site/data/html/2021/0...,"울진군, 사회적 거리두기 1.5단계 2주 연장"
2,2021년 3월 15일 21시 16분,https://www.pressian.com/pages/articles/202103...,"전남도, 거리두기 1.5단계 2주 연장"
3,2021년 3월 15일 21시 16분,http://www.kukinews.com/newsView/kuk202103150270,"태백시, 사회적 거리두기 28일까지 1.5단계 재연장"
4,2021년 3월 15일 21시 16분,http://moneys.mt.co.kr/news/mwView.php?no=2021...,신규확진 382명… 아직 거리두기 2.5단계(종합)
5,2021년 3월 15일 21시 16분,https://ilyo.co.kr/?ac=article_view&entry_id=3...,[포항시정] 현 사회적 거리두기 단계 2주간 유지 外
6,2021년 3월 15일 21시 16분,http://www.fnnews.com/news/202103151005256447,"""코로나 팬데믹, 서비스상표 지형도 바꿨다"""
7,2021년 3월 15일 21시 16분,http://www.segye.com/content/html/2021/03/15/2...,미국은 코로나19 '1m 거리두기' 시험 중
8,2021년 3월 15일 21시 16분,http://news.kbs.co.kr/news/view.do?ncd=5138858...,신규 확진자 382명…“거리두기 2주 연장”
9,2021년 3월 15일 21시 16분,http://www.cts.tv/news/view?ncate=THMNWS01&dpi...,"사회적 거리두기 2단계 유지, 예배 적용은?"


### Terminal crontab editor
- Schedule 'cd python3/project/covid_soc_distance && scrapy crawl CovidSocDistance' in crontab to load the crawled data into MongoDB

### Creating the project for the keyword 'confirmed cases' (확진)

In [25]:
!rm -rf covid_infection

In [26]:
!scrapy startproject covid_infection

New Scrapy project 'covid_infection', using template directory '/home/ubuntu/.pyenv/versions/3.6.9/envs/python3/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /home/ubuntu/python3/project/covid_infection

You can start your first spider with:
    cd covid_infection
    scrapy genspider example example.com


In [27]:
!tree covid_infection

covid_infection
├── covid_infection
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

2 directories, 7 files


### Writing items.py

In [28]:
%%writefile covid_infection/covid_infection/items.py
import scrapy


class CovidInfectionItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    date = scrapy.Field()

Overwriting covid_infection/covid_infection/items.py


### Writing spider.py

In [29]:
%%writefile covid_infection/covid_infection/spiders/spider.py

import scrapy
from datetime import datetime
from covid_infection.items import CovidInfectionItem

class CovidSpider(scrapy.Spider):
    name = 'CovidInfection'
    # Scrapy reads `allowed_domains`, which takes bare domain names rather than URLs
    allowed_domains = ['naver.com']

    def start_requests(self):
        yield scrapy.Request("https://search.naver.com/search.naver?where=news&query=코로나%20수도권%20확진&pd=4", callback=self.parse_kword)

    def parse_kword(self, response):
        # Crawl timestamp shared by every item scraped from this page
        now = datetime.now()
        date = "%s년 %s월 %s일 %s시 %s분" %(now.year, now.month, now.day, now.hour, now.minute)
        infection_title = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@title').extract()
        infection_link = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@href').extract()
        # Build a fresh item per article so an already yielded item is never mutated
        for title, link in zip(infection_title, infection_link):
            item = CovidInfectionItem()
            item['date'] = date
            item['title'] = title
            item['link'] = link
            yield item

Writing covid_infection/covid_infection/spiders/spider.py


In [30]:
# Write the shell script
%%writefile infection.sh
cd covid_infection
scrapy crawl CovidInfection -o covid_infection.csv

Overwriting infection.sh


In [31]:
# Modify settings.py
!sed -i 's/ROBOTSTXT_OBEY = True/ROBOTSTXT_OBEY = False/' covid_infection/covid_infection/settings.py

### Writing mongodb.py

In [32]:
%%writefile covid_infection/covid_infection/mongodb.py
import pymongo

client = pymongo.MongoClient("mongodb://public_ip:port")
db = client.naver_kword3
collection = db.kword_infection

Writing covid_infection/covid_infection/mongodb.py


### Writing pipelines.py

In [33]:
%%writefile covid_infection/covid_infection/pipelines.py
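from .mongodb import collection

class CovidInfectionPipeline:
    def process_item(self, item, spider):
        # Copy the fields into a plain dict so MongoDB receives a regular document
        data = {"title": item["title"], "link": item["link"], "date": item["date"]}
        # insert_one replaces Collection.insert, deprecated in PyMongo 3 and removed in PyMongo 4
        collection.insert_one(data)
        return item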

Overwriting covid_infection/covid_infection/pipelines.py


### Crawl test and DataFrame output

In [50]:
!/bin/bash infection.sh

2021-03-15 21:38:13 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: covid_infection)
2021-03-15 21:38:13 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.6.9 (default, Dec 28 2020, 03:27:25) - [GCC 7.5.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j  16 Feb 2021), cryptography 3.4.6, Platform Linux-5.4.0-1038-aws-x86_64-with-debian-buster-sid
2021-03-15 21:38:13 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-15 21:38:13 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'covid_infection',
 'NEWSPIDER_MODULE': 'covid_infection.spiders',
 'SPIDER_MODULES': ['covid_infection.spiders']}
2021-03-15 21:38:13 [scrapy.extensions.telnet] INFO: Telnet Password: fb5de452dec74693
2021-03-15 21:38:13 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',

In [51]:
df_infection = pd.read_csv('./covid_infection/covid_infection.csv')
df_infection

Unnamed: 0,date,link,title
0,2021년 3월 15일 21시 38분,http://yna.kr/AKR20210315078600530?did=1195m,"내일 수도권 특별 방역대책 발표…""3차 유행, 안정세로 바꿔야"""
1,2021년 3월 15일 21시 38분,http://www.wowtv.co.kr/NewsCenter/News/Read?ar...,코로나19 신규확진 382명…휴일영향 1주 만에 300명대
2,2021년 3월 15일 21시 38분,http://www.newsis.com/view/?id=NISX20210315_00...,경남 코로나 53명 신규 확진…누적 2443명(종합)
3,2021년 3월 15일 21시 38분,https://biz.chosun.com/site/data/html_dir/2021...,확진자 몰린 수도권… “서울·경기 특별방역대책 내일 발표”
4,2021년 3월 15일 21시 38분,http://www.busan.com/view/busan/view.php?code=...,진주·거제 지역 목욕탕 관련 코로나19 영향 경남에서 하루 53명 확진(종합)
5,2021년 3월 15일 21시 38분,https://view.asiae.co.kr/article/2021031519050...,오후 6시까지 신규 확진 274명…내일 300명대 중반 예상
6,2021년 3월 15일 21시 38분,https://www.news1.kr/articles/?4241803,15일 오후 6시 276명 확진…진주 목욕탕 집단감염 확산(종합)
7,2021년 3월 15일 21시 38분,http://kormedi.com/?p=1335608,코로나19 모든 방역지표 나빠졌다.. 3차유행 확산
8,2021년 3월 15일 21시 38분,http://yna.kr/AKR20210315046851052?did=1195m,경남 53명 신규 확진…진주 사우나·거제 유흥시설 여파(종합)
9,2021년 3월 15일 21시 38분,http://news.khan.co.kr/kh_news/khan_art_view.h...,[속보] 코로나19 신규 확진자 382명…일주일 만에 300명대


### Terminal crontab editor
- Schedule 'cd python3/project/covid_infection && scrapy crawl CovidInfection' in crontab to load the crawled data into MongoDB

### Creating the project for the keyword 'relief funds' (지원금)

In [37]:
!rm -rf covid_support

In [38]:
!scrapy startproject covid_support

New Scrapy project 'covid_support', using template directory '/home/ubuntu/.pyenv/versions/3.6.9/envs/python3/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /home/ubuntu/python3/project/covid_support

You can start your first spider with:
    cd covid_support
    scrapy genspider example example.com


In [39]:
!tree covid_support

covid_support
├── covid_support
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg

2 directories, 7 files


### Writing items.py

In [40]:
%%writefile covid_support/covid_support/items.py
import scrapy


class CovidSupportItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    date = scrapy.Field()

Overwriting covid_support/covid_support/items.py


### Writing spider.py

In [52]:
%%writefile covid_support/covid_support/spiders/spider.py

import scrapy
from datetime import datetime
from covid_support.items import CovidSupportItem

class CovidSpider(scrapy.Spider):
    name = 'CovidSupport'
    # Scrapy reads `allowed_domains`, which takes bare domain names rather than URLs
    allowed_domains = ['naver.com']

    def start_requests(self):
        yield scrapy.Request("https://search.naver.com/search.naver?where=news&query=코로나%20수도권%20지원금&pd=4", callback=self.parse_kword)

    def parse_kword(self, response):
        # Crawl timestamp shared by every item scraped from this page
        now = datetime.now()
        date = "%s년 %s월 %s일 %s시 %s분" %(now.year, now.month, now.day, now.hour, now.minute)
        support_title = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@title').extract()
        support_link = response.xpath('//*[@id="main_pack"]/section/div/div[3]/ul/li/div[1]/div/a/@href').extract()
        # Build a fresh item per article so an already yielded item is never mutated
        for title, link in zip(support_title, support_link):
            item = CovidSupportItem()
            item['date'] = date
            item['title'] = title
            item['link'] = link
            yield item

Overwriting covid_support/covid_support/spiders/spider.py


In [42]:
# Write the shell script
%%writefile support.sh
cd covid_support
scrapy crawl CovidSupport -o covid_support.csv

Writing support.sh


In [43]:
# Modify settings.py
!sed -i 's/ROBOTSTXT_OBEY = True/ROBOTSTXT_OBEY = False/' covid_support/covid_support/settings.py

### Writing mongodb.py

In [44]:
%%writefile covid_support/covid_support/mongodb.py
import pymongo

client = pymongo.MongoClient("mongodb://public_ip:port")
db = client.naver_kword4
collection = db.kword_support

Writing covid_support/covid_support/mongodb.py


### Writing pipelines.py

In [45]:
%%writefile covid_support/covid_support/pipelines.py
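from .mongodb import collection

class CovidSupportPipeline:
    def process_item(self, item, spider):
        # Copy the fields into a plain dict so MongoDB receives a regular document
        data = {"title": item["title"], "link": item["link"], "date": item["date"]}
        # insert_one replaces Collection.insert, deprecated in PyMongo 3 and removed in PyMongo 4
        collection.insert_one(data)
        return item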

Overwriting covid_support/covid_support/pipelines.py


### Crawl test and DataFrame output

In [46]:
!/bin/bash support.sh

2021-03-15 21:30:27 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: covid_support)
2021-03-15 21:30:27 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.6.9 (default, Dec 28 2020, 03:27:25) - [GCC 7.5.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j  16 Feb 2021), cryptography 3.4.6, Platform Linux-5.4.0-1038-aws-x86_64-with-debian-buster-sid
2021-03-15 21:30:27 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-15 21:30:27 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'covid_support',
 'NEWSPIDER_MODULE': 'covid_support.spiders',
 'SPIDER_MODULES': ['covid_support.spiders']}
2021-03-15 21:30:27 [scrapy.extensions.telnet] INFO: Telnet Password: 86106226760d56aa
2021-03-15 21:30:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrap

In [47]:
import pandas as pd

In [48]:
df_support = pd.read_csv('./covid_support/covid_support.csv')
df_support

Unnamed: 0,date,link,title
0,2021년 3월 15일 21시 30분,https://www.chosun.com/politics/assembly/2021/...,"[단독] “4차지원금 추경안 허점많다” 국회 예결위, 정부 비판 보고서"
1,2021년 3월 15일 21시 30분,http://www.newsis.com/view/?id=NISX20210315_00...,"충북도의회, 농어업인 재난지원금 지급대상 포함 건의문 채택"
2,2021년 3월 15일 21시 30분,https://www.news1.kr/articles/?4241847,"여야, 여행업·웨딩업에 재난지원금 300만원 지원 추진(종합)"
3,2021년 3월 15일 21시 30분,http://www.fnnews.com/news/202103151304486036,"전남도, 대중교통 분야 재난지원금 확보 총력"
4,2021년 3월 15일 21시 30분,https://www.wikitree.co.kr/articles/628463,"영암군, 코로나19 소상공인 지원으로 지역경제 살리기 전력질주"
5,2021년 3월 15일 21시 30분,http://news.mk.co.kr/newsRead.php?no=246328&ye...,"[단독] ""나도 300만원 받을 수 있나"" 여행·웨딩업에 재난지원금 100만원 더"
6,2021년 3월 15일 21시 30분,http://www.wowtv.co.kr/NewsCenter/News/Read?ar...,매출 오른 영업제한 업종도 지원금 100만원 받는다
7,2021년 3월 15일 21시 30분,https://www.hankyung.com/politics/article/2021...,"여야, 코로나19 직격탄 '여행·웨딩업' 재난지원금 300만원 추진"
8,2021년 3월 15일 21시 30분,https://view.asiae.co.kr/article/2021031509051...,"은평구, 실직자 등 위한 코로나19 위기극복 긴급 재정(182억) 투입"
9,2021년 3월 15일 21시 30분,https://www.pressian.com/pages/articles/202103...,"4.7 서울·부산 후보들, 코로나19 대응 공약은?"


### Terminal crontab editor
- Schedule 'cd python3/project/covid_support && scrapy crawl CovidSupport' in crontab to load the crawled data into MongoDB
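
Each keyword project lives in its own directory and has its own spider name, so the scheduled commands differ only in those two values. As a rough sketch, a single script could derive all four commands from one table and run them in turn, which would leave one crontab entry instead of four. The names `PROJECTS`, `build_crawl_jobs`, and `run_all` are hypothetical; the directory and spider names come from the sections above, and the base path `python3/project` is assumed to be relative to the home directory as in the crontab commands.

```python
import subprocess
from pathlib import Path

# Project directory -> spider name, as defined in the four sections above
PROJECTS = {
    "covid_vaccine": "CovidVaccine",
    "covid_soc_distance": "CovidSocDistance",
    "covid_infection": "CovidInfection",
    "covid_support": "CovidSupport",
}

def build_crawl_jobs(base_dir, projects=PROJECTS):
    """Return (working_directory, argv) pairs, one per Scrapy project."""
    base = Path(base_dir)
    return [(base / project, ["scrapy", "crawl", spider])
            for project, spider in projects.items()]

def run_all(base_dir):
    """Run every crawl in turn; needs scrapy on PATH and the projects on disk."""
    for cwd, argv in build_crawl_jobs(base_dir):
        subprocess.run(argv, cwd=cwd, check=True)

# Listing the jobs has no side effects; run_all() is what would start the crawls
jobs = build_crawl_jobs("python3/project")
for cwd, argv in jobs:
    print(cwd, " ".join(argv))
```

Because `check=True` raises on a non-zero exit status, a failing crawl stops the run instead of passing silently.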