# GET and POST

### GET

GET 방식은 인터넷 주소를 기준으로 이에 해당하는 데이터나 파일을 요청하는 것이다. 주로 클라이언트가 요청하는 쿼리를 앰퍼샌드(&) 또는 물음표(?) 형식으로 결합해 서버에 전달한다.

### POST

POST 방식은 사용자가 필요한 값을 추가해서 요청하는 방법이다. GET 방식과 달리 클라이언트가 요청하는 쿼리를 body에 넣어서 전송하므로 요청 내용을 직접 볼 수 없다.

# Crawling Examples

## 명언 크롤링하기

[http://quotes.toscrape.com/](http://quotes.toscrape.com/)

In [2]:
import requests as rq 

url = 'http://quotes.toscrape.com/'
quote = rq.get(url)

# response 200 : 데이터 이상 없이 받음
print(quote)

<Response [200]>


In [3]:
quote.content[:1000]

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n\t<meta charset="UTF-8">\n\t<title>Quotes to Scrape</title>\n    <link rel="stylesheet" href="/static/bootstrap.min.css">\n    <link rel="stylesheet" href="/static/main.css">\n</head>\n<body>\n    <div class="container">\n        <div class="row header-box">\n            <div class="col-md-8">\n                <h1>\n                    <a href="/" style="text-decoration: none">Quotes to Scrape</a>\n                </h1>\n            </div>\n            <div class="col-md-4">\n                <p>\n                \n                    <a href="/login">Login</a>\n                \n                </p>\n            </div>\n        </div>\n    \n\n<div class="row">\n    <div class="col-md-8">\n\n    <div class="quote" itemscope itemtype="http://schema.org/CreativeWork">\n        <span class="text" itemprop="text">\xe2\x80\x9cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\xe2\x80\

In [5]:
# BeautifulSoup() 함수를 이용해 원하는 HTML 요소에 접근하기 쉬운 BeautifulSoup 객체로 변경할 수 있다.
from bs4 import BeautifulSoup

quote_html = BeautifulSoup(quote.content, 'html.parser')
quote_html.head()

[<meta charset="utf-8"/>,
 <title>Quotes to Scrape</title>,
 <link href="/static/bootstrap.min.css" rel="stylesheet"/>,
 <link href="/static/main.css" rel="stylesheet"/>]

### `find()` 함수를 이용한 크롤링

In [6]:
# 명언에 해당하는 부분이 [class가 quote인 div 태그 > class가 text인 span 태그]에 위치
quote_div = quote_html.find_all('div', class_='quote')
quote_div[0]

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
<a class="tag" href="/tag/world/page/1/">world</a>
</div>
</div>

In [8]:
# class가 text인 span 태그
quote_span = quote_div[0].find_all('span', class_='text')

quote_span

[<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>]

In [11]:
# .text를 입력하면 텍스트 데이터만을 출력할 수 있다.
quote_span[0].text

'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'

In [12]:
# for문 중에 리스트 내포 형태를 이용하면 명언에 해당하는 부분을 한번에 추출할 수 있음
quote_div = quote_html.find_all('div', class_='quote')

quote_div

[<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
 <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
 <span>by <small class="author" itemprop="author">Albert Einstein</small>
 <a href="/author/Albert-Einstein">(about)</a>
 </span>
 <div class="tags">
             Tags:
             <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
 <a class="tag" href="/tag/change/page/1/">change</a>
 <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
 <a class="tag" href="/tag/thinking/page/1/">thinking</a>
 <a class="tag" href="/tag/world/page/1/">world</a>
 </div>
 </div>,
 <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
 <span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>
 <span>by <small class="author" itempr

In [13]:
[i.find_all('span', class_='text')[0].text for i in quote_div]

['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
 '“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
 '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
 '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”',
 "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
 '“Try not to become a man of success. Rather become a man of value.”',
 '“It is better to be hated for what you are than to be loved for what you are not.”',
 "“I have not failed. I've just found 10,000 ways that won't work.”",
 "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”",
 '“A day without sunshine is like, you know, night.”']

### `select()` 함수를 이용한 크롤링

In [14]:
quote_text = quote_html.select('div.quote > span.text')
quote_text

[<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>,
 <span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>,
 <span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>,
 <span class="text" itemprop="text">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>,
 <span class="text" itemprop="text">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>,
 <span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>,
 <span class="text" itemprop="text">“It is better to be hated for what you are than to be loved for what you are not.

In [15]:
quote_text_list = [i.text for i in quote_text]

quote_text_list

['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
 '“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
 '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
 '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”',
 "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
 '“Try not to become a man of success. Rather become a man of value.”',
 '“It is better to be hated for what you are than to be loved for what you are not.”',
 "“I have not failed. I've just found 10,000 ways that won't work.”",
 "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”",
 '“A day without sunshine is like, you know, night.”']

In [16]:
# 명언을 말한 사람 크롤링
quote_author = quote_html.select('div.quote > span > small.author')
quote_author_list = [i.text for i in quote_author]

quote_author_list

['Albert Einstein',
 'J.K. Rowling',
 'Albert Einstein',
 'Jane Austen',
 'Marilyn Monroe',
 'Albert Einstein',
 'André Gide',
 'Thomas A. Edison',
 'Eleanor Roosevelt',
 'Steve Martin']

In [17]:
# about 정보 크롤링
quote_link = quote_html.select('div.quote > span > a')

quote_link

[<a href="/author/Albert-Einstein">(about)</a>,
 <a href="/author/J-K-Rowling">(about)</a>,
 <a href="/author/Albert-Einstein">(about)</a>,
 <a href="/author/Jane-Austen">(about)</a>,
 <a href="/author/Marilyn-Monroe">(about)</a>,
 <a href="/author/Albert-Einstein">(about)</a>,
 <a href="/author/Andre-Gide">(about)</a>,
 <a href="/author/Thomas-A-Edison">(about)</a>,
 <a href="/author/Eleanor-Roosevelt">(about)</a>,
 <a href="/author/Steve-Martin">(about)</a>]

In [18]:
# 이 중 우리는 속성값에 해당하는 정보만 필요
quote_link[0]['href']

'/author/Albert-Einstein'

In [19]:
# 모든 속성값을 한번에 추출한 후 URL을 만들기 위해 주소 부분도 결합
['http://quotes.toscrape.com' + i['href'] for i in quote_link]

['http://quotes.toscrape.com/author/Albert-Einstein',
 'http://quotes.toscrape.com/author/J-K-Rowling',
 'http://quotes.toscrape.com/author/Albert-Einstein',
 'http://quotes.toscrape.com/author/Jane-Austen',
 'http://quotes.toscrape.com/author/Marilyn-Monroe',
 'http://quotes.toscrape.com/author/Albert-Einstein',
 'http://quotes.toscrape.com/author/Andre-Gide',
 'http://quotes.toscrape.com/author/Thomas-A-Edison',
 'http://quotes.toscrape.com/author/Eleanor-Roosevelt',
 'http://quotes.toscrape.com/author/Steve-Martin']

### 모든 페이지 데이터 크롤링하기

In [30]:
import requests as rq 
from bs4 import BeautifulSoup
import time

# 명언과 말한 사람, 링크가 들어간 빈 리스트(text_list, author_list, infor_list)
text_list = []
author_list = []
infor_list = []

# 1부터 100까지 적용하여 URL을 생성
for i in range(1, 100):
    
    url = f'http://quotes.toscrape.com/page/{i}/'
    quote = rq.get(url)
    
    # HTML 정보를 받아 온 후 BeautifulSoup() 함수를 통해 파싱
    quote_html = BeautifulSoup(quote.content, 'html.parser')
    
    # 명언과 말한 사람, 링크에 해당하는 내용을 각각 추출
    quote_text = quote_html.select('div.quote > span.text')
    quote_author_list = [i.text for i in quote_text]
    
    quote_author = quote_html.select('div.quote > span > small.author')
    quote_author_list = [i.text for i in quote_author]
    
    quote_link = quote_html.select('div.quote > span > a')
    quote_link_list = ['http://quotes.toscrape.com' + i['href'] for i in quote_link]
    
    if len(quote_text_list) > 0:
        
        # print(quote_text_list)
        text_list.extend(quote_text_list)
        # print(quote_author_list)
        author_list.extend(quote_author_list)
        infor_list.extend(quote_link_list)
        # 한 번 루프가 돌때마다 1초간 정지
        time.sleep(1)
    
    else:
        break
    

In [31]:
len(text_list[:100]), len(author_list), len(infor_list)

(100, 100, 100)

In [33]:
import pandas as pd

df = pd.DataFrame({
    'text': text_list[:100],
    'author': author_list,
    'infor': infor_list
})
df

Unnamed: 0,text,author,infor
0,“The world as we have created it is a process ...,Albert Einstein,http://quotes.toscrape.com/author/Albert-Einstein
1,"“It is our choices, Harry, that show what we t...",J.K. Rowling,http://quotes.toscrape.com/author/J-K-Rowling
2,“There are only two ways to live your life. On...,Albert Einstein,http://quotes.toscrape.com/author/Albert-Einstein
3,"“The person, be it gentleman or lady, who has ...",Jane Austen,http://quotes.toscrape.com/author/Jane-Austen
4,"“Imperfection is beauty, madness is genius and...",Marilyn Monroe,http://quotes.toscrape.com/author/Marilyn-Monroe
...,...,...,...
95,“Try not to become a man of success. Rather be...,Harper Lee,http://quotes.toscrape.com/author/Harper-Lee
96,“It is better to be hated for what you are tha...,Madeleine L'Engle,http://quotes.toscrape.com/author/Madeleine-LE...
97,"“I have not failed. I've just found 10,000 way...",Mark Twain,http://quotes.toscrape.com/author/Mark-Twain
98,“A woman is like a tea bag; you never know how...,Dr. Seuss,http://quotes.toscrape.com/author/Dr-Seuss


## 금융 속보 크롤링

[https://finance.naver.com/news/news_list.naver?mode=LSS2D&section_id=101&section_id2=258](https://finance.naver.com/news/news_list.naver?mode=LSS2D&section_id=101&section_id2=258)

In [1]:
import requests as rq 
from bs4 import BeautifulSoup

In [2]:
url = 'https://finance.naver.com/news/news_list.naver?mode=LSS2D&section_id=101&section_id2=258'
data = rq.get(url)
html = BeautifulSoup(data.content, 'html.parser')
html_select = html.select('dl > dd.articleSubject > a')

html_select[0:3]

[<a href="/news/news_read.naver?article_id=0004263960&amp;office_id=011&amp;mode=LSS2D&amp;type=0§ion_id=101§ion_id2=258§ion_id3=&amp;date=20231120&amp;page=1" title="공매도 금지했는데…개미, 해외로 '머니 무브'">공매도 금지했는데…개미, 해외로 '머니 무브'</a>,
 <a href="/news/news_read.naver?article_id=0000771235&amp;office_id=469&amp;mode=LSS2D&amp;type=0§ion_id=101§ion_id2=258§ion_id3=&amp;date=20231120&amp;page=1" title="은행 연말까지 '상생 보따리' 풀어놓는다... 2조 수준일 듯">은행 연말까지 '상생 보따리' 풀어놓는다... 2조 수준일 듯</a>,
 <a href="/news/news_read.naver?article_id=0001653465&amp;office_id=005&amp;mode=LSS2D&amp;type=0§ion_id=101§ion_id2=258§ion_id3=&amp;date=20231120&amp;page=1" title="HJ중공업, 4년 치 일감 확보…수주 잔액만 7조4000억원">HJ중공업, 4년 치 일감 확보…수주 잔액만 7조4000억원</a>]

In [4]:
html_select[0]['title']

"공매도 금지했는데…개미, 해외로 '머니 무브'"

In [5]:
# for 문으로 한번에 제목들 추출
[i['title'] for i in html_select]

["공매도 금지했는데…개미, 해외로 '머니 무브'",
 "은행 연말까지 '상생 보따리' 풀어놓는다... 2조 수준일 듯",
 'HJ중공업, 4년 치 일감 확보…수주 잔액만 7조4000억원',
 "샘 알트먼 소식에 20% 급등락…'유명무실' 월드코인 널뛰기",
 '제10회 금융투자협회장배 자선야구대회 폐막 [뉴스+현장]',
 '"봤노라, 이겼노라, 올랐노라" [마켓플러스]',
 "'파두 사태' 해결 의지 있나…시한폭탄 더 있다 [이슈플러스]",
 '\'원-메리츠 1주년\' 지주 중심 경영 발표..."재무 유연성 극대화"',
 "은행권 연내 상생금융 발표…'횡재세 대체' 2조 육박할 듯(종합)",
 '김장 물가 잡았다지만…식당들 기댈 곳은 여전히 ‘수입산 김치’',
 '[속보]메리츠증권·화재 신임 대표이사에 장원재·김중현',
 '[속보]메리츠금융지주, 부채·운용 부문 신설…김용범·최희문 부문장',
 '엔비디아 실적, 배드뉴스 변곡점 될까 [주간이슈캘린더]',
 '"오픈AI 떠나는 올트먼, MS로 자리 옮긴다"',
 "IPO 첫 집단소송 '일파만파' …해외 사례 살펴보니",
 '메리츠證, 14년만에 새 CEO…장원재 사장 선임',
 '메리츠證. \'13년 장수 CEO\' 최희문 교체..."내부통제 강화"']

## 표 크롤링하기

[https://en.wikipedia.org/wiki/List_of_countries_by_stock_market_capitalization](https://en.wikipedia.org/wiki/List_of_countries_by_stock_market_capitalization)

In [6]:
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_stock_market_capitalization'
tbl = pd.read_html(url)

tbl

[          Country Total market cap (in mil. US$)[2]  \
 0   United States                          44719661   
 1           China                          13214311   
 2           Japan                           6718220   
 3       Hong Kong                           6130420   
 4           India                      3,612,985[5]   
 ..            ...                               ...   
 95        Algeria                               371   
 96       Paraguay                               313   
 97        Uruguay                               284   
 98       Eswatini                               234   
 99        Bermuda                               220   
 
     Total market cap (% of GDP)[3] Number of domestic companies listed[4]  \
 0                            194.5                                   4266   
 1                             83.0                                   4154   
 2                            122.2                                   3754   
 3            

In [7]:
tbl[0].head()

Unnamed: 0,Country,Total market cap (in mil. US$)[2],Total market cap (% of GDP)[3],Number of domestic companies listed[4],Year
0,United States,44719661,194.5,4266,2020
1,China,13214311,83.0,4154,2020
2,Japan,6718220,122.2,3754,2020
3,Hong Kong,6130420,1768.8,2353,2020
4,India,"3,612,985[5]",103.0,5270,2023


## 기업 공시 채널에서 오늘의 공시 불러오기



In [10]:
# POST 방식으로 쿼리를 요청하는 방법

import requests as rq
from bs4 import BeautifulSoup
import pandas as pd  

url = 'https://kind.krx.co.kr/disclosure/todaydisclosure.do'
# 쿼리는 딕셔너리 형태로 입력하며, Form Data와 동일하게 입력해 준다.
payload = {
    'method' : 'searchTodayDisclosureSub',
    'currentPageSize' : '15',
    'pageIndex' : '1',
    'orderMode' : '0',
    'orderStat' : 'D',
    'forward' : 'todaydisclosure_sub',
    'close' : 'S',
    'todayFlag' : 'N',
    'setDate' : '2022-07-27'
}

# post() 함수를 통해 해당 URL에 원하는 쿼리를 요청
data = rq.post(url, data=payload)
html = BeautifulSoup(data.content, 'html.parser')

print(html)


<section class="scrarea type-00">
<table class="list type-00 mt10" summary="시간, 회사명, 공시제목, 제출인, 차트/주가">
<caption>목록</caption>
<colgroup>
<col width="9%"/>
<col width="22%"/>
<col width="*"/>
<col width="16%"/>
<col width="9%"/>
</colgroup>
<thead>
<tr class="first active" id="title-contents">
</tr>
</thead>
<tbody>
<tr class="first" id="parkman">
<td class="first txc">18:40</td>
<td><img alt="코스닥" class="vmiddle legend" src="/images/common/icn_t_ko.gif"/> <a href="#companysum" id="companysum" onclick="companysummary_open('07364'); return false;" title="테라사이언스"> 테라사이언스</a> <img alt="불성실공시" class="vmiddle legend" src="/images/common/icn_t_bul.gif"> <img alt="투자주의환기종목" class="vmiddle legend" src="/images/common/icn_t_hwan.gif"/> </img></td>
<td><a href="#viewer" onclick="openDisclsViewer('20231120000639','')" title="추가상장(유상증자(제3자배정))">추가상장(유상증자(제3자배정))</a></td>
<td>코스닥시장본부</td>
<td class="txc">
<a class="btn ico chart-00" href="#" onclick="openDisclsChart('07364');return false;" title="공

In [11]:
# prettify() 함수를 이용해 BeautifulSoup에서 파싱한 파서 트리를 유니코드 형태로 돌려준다.
html_unicode = html.prettify()

# read_html() 함수를 통해 표를 읽어 온다.
tbl = pd.read_html(html.prettify())
tbl[0].head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,18:40,테라사이언스,추가상장(유상증자(제3자배정)),코스닥시장본부,공시차트 주가차트
1,18:35,제일바이오,주권매매거래정지기간변경(개선기간 부여),코스닥시장본부,공시차트 주가차트
2,18:31,제일바이오,기타시장안내(코스닥시장위원회 심의·의결 결과 및 개선기간 부여 안내),코스닥시장본부,공시차트 주가차트
3,18:31,엔케이맥스,추가상장(국내사모 CB전환),코스닥시장본부,공시차트 주가차트
4,18:30,라이트론,추가상장(국내사모 CB전환),코스닥시장본부,공시차트 주가차트
