## Static Crawling Exercise 1: Collecting Quotes

- Scrape the quotes tagged 'life' from a quotes website and save them to a file that Excel can open (the code below writes a CSV file)
- Target: http://quotes.toscrape.com/

#### Exploring the website to crawl

- On the site, click the life tag, then click the (about) link of a quote

- At the bottom of the quotes page: the [Next] and [Previous] buttons

- Characteristics of the quotes website
    - Quotes are grouped by tag
    - Each page shows only a fixed number of quotes; the rest are reached through the Next page
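
The Next button described above can also be followed in code. The following is a minimal sketch, assuming the pager is marked up as `<li class="next"><a href="...">`. The `SAMPLE_HTML` string and the helper `find_next_url` are illustrative and not part of the original notebook, so the real markup should be checked in the developer tools first.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Assumed pager markup; verify it against the live page with the developer tools.
SAMPLE_HTML = """
<ul class="pager">
  <li class="next"><a href="/tag/life/page/2/">Next</a></li>
</ul>
"""

def find_next_url(html, base_url):
    """Return the absolute URL of the next page, or None when there is no Next link."""
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.select_one('li.next > a')
    if link is None:
        return None
    # The href is site-relative, so resolve it against the current page URL.
    return urljoin(base_url, link['href'])

next_url = find_next_url(SAMPLE_HTML, 'https://quotes.toscrape.com/tag/life/')
print(next_url)
```

Returning `None` on the last page gives a crawl loop a simple stop condition.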

#### Checking the website structure: developer tools (F12)

#### Procedure for collecting the data

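- Step 1: Inspect the page structure and locate the quote and author elements
- Step 2: Load (read) the web page (URL request and response)
- Step 3: Extract the quote and author from the matching tags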
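
The three steps can be sketched as one small function. This is an illustration only: `SAMPLE_HTML` is a trimmed, made-up stand-in for the downloaded page, and `parse_quotes` is a hypothetical helper name that does not appear in the notebook.

```python
from bs4 import BeautifulSoup

# Step 1: the structure found by inspecting the page (trimmed, illustrative sample).
SAMPLE_HTML = """
<div class="quote">
  <span class="text">“Sample quote text.”</span>
  <span>by <small class="author">Sample Author</small></span>
</div>
"""

def parse_quotes(html):
    """Step 3: return [author, text] pairs, one per div.quote block."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for q in soup.find_all('div', {'class': 'quote'}):
        text = q.find('span', {'class': 'text'}).text.strip('“”')
        author = q.find('small', {'class': 'author'}).text
        rows.append([author, text])
    return rows

# Step 2 would normally download the HTML; here the sample string stands in for it.
rows = parse_quotes(SAMPLE_HTML)
print(rows)
```

Keeping the parsing step separate from the download makes it possible to test the extraction on a saved HTML string without touching the network.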

**Loading (reading) the web page**

In [2]:
# Import modules
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup

In [5]:
url = 'https://quotes.toscrape.com/tag/life/'
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
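
The cell above assumes the request succeeds. A slightly more defensive variant might look like the sketch below; the helper name `fetch_html`, the 10-second timeout, and the `User-Agent` value are assumptions made for illustration, not part of the notebook.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def fetch_html(url, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    # Some sites reject the default Python User-Agent; this header value is illustrative.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urlopen(req, timeout=timeout) as resp:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = resp.headers.get_content_charset() or 'utf-8'
            return resp.read().decode(charset)
    except (HTTPError, URLError) as err:
        print(f'Request failed: {err}')
        return None
```

The returned string can be passed to `BeautifulSoup(html, 'html.parser')` in the same way as the bytes returned by `urlopen(url).read()`.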

**Extracting (parsing) the quote and author from the matching tags**

In [8]:
# Tag that contains each quote and its author: div.quote
quotes = soup.find_all('div', {'class':'quote'})
print(f'Number of quotes: {len(quotes)}')

Number of quotes: 10
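
Beautiful Soup offers more than one way to write the same lookup. The sketch below compares three equivalent forms on a small, made-up HTML string (`SAMPLE_HTML` is illustrative, not the real page).

```python
from bs4 import BeautifulSoup

# Illustrative sample standing in for the downloaded page.
SAMPLE_HTML = """
<div class="quote"><span class="text">“First.”</span></div>
<div class="quote"><span class="text">“Second.”</span></div>
<div class="footer">not a quote</div>
"""
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

by_attrs = soup.find_all('div', {'class': 'quote'})   # form used in the notebook
by_keyword = soup.find_all('div', class_='quote')     # keyword shorthand
by_selector = soup.select('div.quote')                # CSS selector

print(len(by_attrs), len(by_keyword), len(by_selector))
```

All three select only the `div` elements whose class is `quote`, so the choice is mostly a matter of readability.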


In [9]:
quotes[0]

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="inspirational,life,live,miracle,miracles" itemprop="keywords"/>
<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
<a class="tag" href="/tag/life/page/1/">life</a>
<a class="tag" href="/tag/live/page/1/">live</a>
<a class="tag" href="/tag/miracle/page/1/">miracle</a>
<a class="tag" href="/tag/miracles/page/1/">miracles</a>
</div>
</div>
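
The output above also shows each quote's tags, which the notebook does not extract. If they were needed, they could be read in either of the two ways sketched below. `SAMPLE_HTML` is a trimmed copy of the markup shown above, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the tag markup shown in the output above.
SAMPLE_HTML = """
<div class="quote">
<div class="tags">
    Tags:
    <meta class="keywords" content="inspirational,life,live,miracle,miracles" itemprop="keywords"/>
<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
<a class="tag" href="/tag/life/page/1/">life</a>
</div>
</div>
"""
quote = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('div', {'class': 'quote'})

# Option 1: read the text of each <a class="tag"> link.
tags_from_links = [a.text for a in quote.find_all('a', {'class': 'tag'})]

# Option 2: split the comma-separated content attribute of the <meta> element.
tags_from_meta = quote.find('meta', {'class': 'keywords'})['content'].split(',')

print(tags_from_links)
print(tags_from_meta)
```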

In [46]:
text = quotes[0].find('span',{'class':'text'}).text.strip('“').strip('”')
author = quotes[0].find('small',{'class':'author'}).text
print(f'Author: {author}\nQuote: {text}')

Author: Albert Einstein
Quote: There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.
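
The two chained `strip` calls remove the curly quotation marks around the text. Because `str.strip` treats its argument as a set of characters, a single call should give the same result, as this small sketch on a made-up string shows.

```python
raw = '“A sample quote.”'   # illustrative string, not taken from the site

chained = raw.strip('“').strip('”')   # approach used in the notebook
single = raw.strip('“”')              # one call with both characters

print(single)
```

Only characters at the two ends are removed; curly quotes inside the text are left untouched.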


In [31]:
# Store the quotes and the authors in separate lists
text_list, author_list = [], []
for q in quotes:
    text_list.append(q.find('span',{'class':'text'}).text)
    author_list.append(q.find('small',{'class':'author'}).text)

In [50]:
# Store each quote and its author as a row of a two-dimensional list
quote_list = []
for q in quotes:
    txt = q.find('span',{'class':'text'}).text.strip('“').strip('”')
    author = q.find('small',{'class':'author'}).text
    quote_list.append([author, txt])
print(f'Quote list:\n{quote_list}')

Quote list:
[['Albert Einstein', 'There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.'], ['André Gide', 'It is better to be hated for what you are than to be loved for what you are not.'], ['Marilyn Monroe', "This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and

In [52]:
text_list = [q.find('span',{'class':'text'}).text.strip('“').strip('”') for q in quotes]
author_list = [q.find('small',{'class':'author'}).text for q in quotes]

print(f'Author list:\n{author_list} \n\nQuote list:\n{text_list}')

Author list:
['Albert Einstein', 'André Gide', 'Marilyn Monroe', 'Douglas Adams', 'Mark Twain', 'Allen Saunders', 'Dr. Seuss', 'Albert Einstein', 'George Bernard Shaw', 'Ralph Waldo Emerson'] 

Quote list:
['There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.', 'It is better to be hated for what you are than to be loved for what you are not.', "This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart

In [53]:
# Dictionary holding the author list and the quote list (keys: '저자' = author, '명언' = quote)
quot_dic = {'저자': author_list, '명언':text_list}
quot_dic

{'저자': ['Albert Einstein',
  'André Gide',
  'Marilyn Monroe',
  'Douglas Adams',
  'Mark Twain',
  'Allen Saunders',
  'Dr. Seuss',
  'Albert Einstein',
  'George Bernard Shaw',
  'Ralph Waldo Emerson'],
 '명언': ['There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.',
  'It is better to be hated for what you are than to be loved for what you are not.',
  "This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are goin
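
The notebook keeps the same data in two layouts: separate lists per column (`author_list`, `text_list`) and a list of rows (`quote_list`). A brief sketch of converting between them with `zip`, using made-up values; the `records` variable is an extra illustration and not part of the notebook.

```python
author_list = ['Author A', 'Author B']     # illustrative values
text_list = ['Quote one.', 'Quote two.']

# Columns -> rows: pair each author with the quote at the same position.
quote_list = [list(pair) for pair in zip(author_list, text_list)]

# Rows -> columns: unpack the rows and zip them back together.
authors_back, texts_back = (list(col) for col in zip(*quote_list))

# A third layout: one dictionary per quote.
records = [{'author': a, 'text': t} for a, t in zip(author_list, text_list)]

print(quote_list)
```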

#### Extracting the quotes and authors and storing them in a DataFrame

In [35]:
# Install pandas
!pip install pandas


In [37]:
import pandas as pd

In [54]:
df = pd.DataFrame(data=quot_dic)
df

| | 저자 | 명언 |
|---|---|---|
| 0 | Albert Einstein | There are only two ways to live your life. One... |
| 1 | André Gide | It is better to be hated for what you are than... |
| 2 | Marilyn Monroe | This life is what you make it. No matter what,... |
| 3 | Douglas Adams | I may not have gone where I intended to go, bu... |
| 4 | Mark Twain | Good friends, good books, and a sleepy conscie... |
| 5 | Allen Saunders | Life is what happens to us while we are making... |
| 6 | Dr. Seuss | Today you are You, that is truer than true. Th... |
| 7 | Albert Einstein | Life is like riding a bicycle. To keep your ba... |
| 8 | George Bernard Shaw | Life isn't about finding yourself. Life is abo... |
| 9 | Ralph Waldo Emerson | Finish each day and be done with it. You have ... |


In [55]:
df = pd.DataFrame(data=quote_list, columns=['author', 'text'])
df

| | author | text |
|---|---|---|
| 0 | Albert Einstein | There are only two ways to live your life. One... |
| 1 | André Gide | It is better to be hated for what you are than... |
| 2 | Marilyn Monroe | This life is what you make it. No matter what,... |
| 3 | Douglas Adams | I may not have gone where I intended to go, bu... |
| 4 | Mark Twain | Good friends, good books, and a sleepy conscie... |
| 5 | Allen Saunders | Life is what happens to us while we are making... |
| 6 | Dr. Seuss | Today you are You, that is truer than true. Th... |
| 7 | Albert Einstein | Life is like riding a bicycle. To keep your ba... |
| 8 | George Bernard Shaw | Life isn't about finding yourself. Life is abo... |
| 9 | Ralph Waldo Emerson | Finish each day and be done with it. You have ... |
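
The `...` at the end of each quote is only pandas shortening long strings for display; the stored values are complete. The sketch below, using a made-up row, shows one way to view the full text by temporarily lifting the `display.max_colwidth` option.

```python
import pandas as pd

# Illustrative row; the long string stands in for a scraped quote.
long_text = ' '.join(['word'] * 30)
df = pd.DataFrame([['Sample Author', long_text]], columns=['author', 'text'])

default_view = repr(df)   # long strings are shortened with "..." by default

# Temporarily lift the column-width limit to render the full text.
with pd.option_context('display.max_colwidth', None):
    full_view = repr(df)

# The data itself is never truncated; a single cell returns the whole string.
first_text = df.loc[0, 'text']
print(full_view)
```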


In [57]:
df.to_csv('crawl_data/명언 목록.csv')
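
`to_csv` does not create missing directories, and by default it also writes the row index as an extra first column. The following sketch shows one possible way to handle both. The temporary directory, the file name `quotes.csv`, and the sample row are assumptions for illustration. `encoding='utf-8-sig'` is a common choice when the file will be opened in Excel, since the byte order mark helps Excel detect UTF-8 for names such as 'André Gide'.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame([['Sample Author', 'Sample quote text.']],
                  columns=['author', 'text'])

# A temporary directory keeps this sketch free of side effects;
# the notebook writes to 'crawl_data' instead.
out_dir = os.path.join(tempfile.mkdtemp(), 'crawl_data')
os.makedirs(out_dir, exist_ok=True)              # create the folder if it is missing
out_path = os.path.join(out_dir, 'quotes.csv')   # hypothetical file name

# index=False leaves out the 0, 1, 2, ... index column.
df.to_csv(out_path, index=False, encoding='utf-8-sig')

reloaded = pd.read_csv(out_path, encoding='utf-8-sig')
print(reloaded)
```

Without `index=False`, reading the file back would show the saved index as a column named `Unnamed: 0`.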

----