# The BeautifulSoup library
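- The `requests` module lets you send HTTP requests and receive responses.
- However, a response contains the whole document, including many elements you don't need, while you usually want only specific ones.
- That is why you need an HTML parser.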

In [1]:
# Install
%pip install beautifulsoup4

Collecting beautifulsoup4
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Downloading soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Downloading beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 147.9/147.9 kB 7.8 MB/s eta 0:00:00
Downloading soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.3 soupsieve-2.5
Note: you may need to restart the kernel to use updated packages.


# Creating a BeautifulSoup object

In [3]:
import requests

# Fetch the page; res.text holds the response body (the HTML) as a string
res = requests.get("https://www.example.com")
res

<Response [200]>

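### Constructor arguments
- Pass the document to analyze (a `str`) and the name of the parser that should read it (a `str`), which depends on the markup language the document is written in.
    - For HTML, use Python's built-in `html.parser`.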
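A minimal sketch of the two arguments. It assumes a short inline HTML string (the hypothetical `html_doc`) in place of `res.text`, so it runs without network access:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for res.text
html_doc = "<html><head><title>Example Domain</title></head><body><h1>Example Domain</h1></body></html>"

# First argument: the markup to analyze (str)
# features: the name of the parser to use (str)
soup = BeautifulSoup(html_doc, features="html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.h1)              # <h1>Example Domain</h1>
```

Passing the parser positionally, as in `BeautifulSoup(html_doc, "html.parser")`, is equivalent. Other parsers such as `lxml` are not built in and must be installed separately.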

In [8]:
# Import bs4

from bs4 import BeautifulSoup

bs = BeautifulSoup(res.text, features="html.parser")

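### Methods and attributes
- prettify()
    - Returns the document as a neatly indented string, one tag per line; for example, `print(bs.prettify())`.
- find("tag")
    - Finds a tag by name.
    - If several tags match, it returns only the first one; if none match, it returns `None`.
- find_all("tag")
    - Finds every matching tag and returns them as a list.
- title, head, body
    - Return the document's `title`, `head`, and `body` tags, respectively.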
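The methods and attributes listed above can be tried side by side on one small document. This is a sketch that assumes a hypothetical inline `html_doc` modeled on the example.com page instead of a live request:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the example.com page
html_doc = """
<html>
<head><title>Example Domain</title></head>
<body>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# prettify(): the parse tree rendered as an indented string
pretty = soup.prettify()

# find(): only the first matching tag
first_p = soup.find("p")

# find_all(): every matching tag, in document order
all_p = soup.find_all("p")

# title / head / body: shortcuts for the first tag with that name
print(soup.title)                      # <title>Example Domain</title>
print(len(all_p))                      # 2
print(soup.head.name, soup.body.name)  # head body
```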

In [14]:
## Get the title
bs.title

<title>Example Domain</title>
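`bs.title` returns the whole `<title>` tag. The sketch below shows that this attribute access is a shortcut for `find("title")`, and that `.string` extracts just the text. It uses a hypothetical one-line document instead of the fetched page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the example.com markup
soup = BeautifulSoup("<html><head><title>Example Domain</title></head></html>", "html.parser")

# Attribute access returns the same tag object as find() with that name
assert soup.title is soup.find("title")

# .string returns the tag's single text child
print(soup.title.string)  # Example Domain
```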

In [15]:
## Get the head
bs.head

<head>
<title>Example Domain</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>

In [16]:

## Get the body
bs.body

<body>
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
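`bs.body` is itself a tag object, so you can call `find` and `find_all` on it to search only inside the body. Here is a sketch, assuming a trimmed, hypothetical copy of the body shown above:

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the example.com body
html_doc = """
<body>
<div>
<h1>Example Domain</h1>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")

body = soup.body
link = body.find("a")    # searches only the descendants of <body>
print(link["href"])      # https://www.iana.org/domains/example
print(link.get_text())   # More information...
```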

In [25]:
h1 = bs.find("h1")
print(h1)
print(h1.name)

<h1>Example Domain</h1>
h1
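`find` returns the first matching tag, and `.name` holds that tag's name, as the output above shows. When nothing matches, `find` returns `None`, so check the result before using it. This sketch uses a hypothetical one-line document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body><h1>Example Domain</h1></body>", "html.parser")

h1 = soup.find("h1")
print(h1.name)        # h1
print(h1.get_text())  # Example Domain

# find() returns None when no tag matches, so guard before using the result
h2 = soup.find("h2")
if h2 is None:
    print("no <h2> found")
```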


In [27]:
p = bs.find_all("p")
print(p)

[<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>, <p><a href="https://www.iana.org/domains/example">More information...</a></p>]
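`find_all` returns a `ResultSet`, which is a subclass of `list`, so you can loop over or index the results. This sketch collects the text of each paragraph, assuming a shortened, hypothetical version of the two `<p>` tags above:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the two paragraphs
html_doc = """
<p>This domain is for use in illustrative examples in documents.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

paragraphs = soup.find_all("p")
print(len(paragraphs))  # 2

# Loop over the results like a list and collect each tag's text
texts = [p.get_text(strip=True) for p in paragraphs]
print(texts)
```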


In [29]:
pp = bs.find("p")
print(pp)

<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
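Comparing this output with the `find_all("p")` output above shows that `find` returns the first element of the `find_all` result. Here is a sketch of that relationship on a hypothetical two-paragraph document, together with the `limit` parameter of `find_all`:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>first</p><p>second</p>", "html.parser")

# find() returns the same tag as the first element of find_all()
print(soup.find("p") is soup.find_all("p")[0])  # True

# find_all() with limit=1 still returns a list, with at most one element
limited = soup.find_all("p", limit=1)
print(limited)  # [<p>first</p>]
```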
