### RDBMS (relational database)
- Uses the SQL language.
- The schema is strict; whenever the shape of the data changes, the schema has to be modified to match.
- MySQL, Oracle, PostgreSQL, SQLite ...

### NoSQL
- Typically does not use SQL as its query language.
- There is no fixed schema.
- The format of the stored data is not strictly enforced.
- MongoDB, Redis, HBase, Cassandra ...
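
The difference between a strict schema and a schemaless store can be sketched with the standard library alone. This is only an illustration: `sqlite3` stands in for an RDBMS, and a plain list of dicts stands in for a document store (the names `sql_conn`, `schema_error` and `documents` are made up for this example).

```python
import sqlite3

# Relational side: the table schema is declared up front.
sql_conn = sqlite3.connect(":memory:")
sql_conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
sql_conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("aaron", 20))

# A row that uses a column the schema does not know about is rejected.
try:
    sql_conn.execute(
        "INSERT INTO people (name, hobby) VALUES (?, ?)", ("bob", "chess")
    )
    schema_error = None
except sqlite3.OperationalError as exc:
    schema_error = str(exc)

# Document side (emulated with plain dicts): each record may carry its own fields.
documents = [
    {"name": "aaron", "age": 20},
    {"name": "bob", "hobby": "chess"},
]

print(schema_error)
print(documents)
sql_conn.close()
```

With the relational table, adding a `hobby` field would require altering the schema first; the list of dicts accepts the differently shaped record as it is.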

### MongoDB
- Manages data (documents) in a JSON-like structure.
- SQL: database > table > data (row, column)
- MongoDB: database > collection > document
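
The hierarchy above can be pictured with nested Python containers. The sketch below is a hypothetical in-memory model, not how MongoDB stores data (the server uses BSON, a binary JSON-like format); the variable names are assumptions made for illustration.

```python
import json

# Hypothetical in-memory model: database -> collection -> list of documents.
server = {
    "testdb": {          # database
        "it": [          # collection
            {"author": "Mike", "tags": ["mongodb", "python", "pymongo"]},  # document
            {"author": "Dave Lee", "age": 45},                             # document
        ]
    }
}

# A document is a set of key/value pairs, so it serialises naturally to JSON.
first_doc = server["testdb"]["it"][0]
as_json = json.dumps(first_doc)
print(as_json)
```

Note that the two documents in the same collection do not share the same set of keys, which a relational table would not allow without schema changes.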

In [3]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.0.1-cp39-cp39-win_amd64.whl (354 kB)
Installing collected packages: pymongo
Successfully installed pymongo-4.0.1


In [4]:
import pymongo

In [5]:
conn = pymongo.MongoClient()  # server connection

In [6]:
tdb = conn['testdb']  # get a handle to the database; it is actually created on the first write

In [7]:
col_it = tdb['it']  # get a handle to the collection; it is actually created on the first insert

In [9]:
post = {'author':'Mike', 'text':' my first blog post', 'tags':['mongodb','python','pymongo']}
col_it.insert_one(post)

<pymongo.results.InsertOneResult at 0x1dce65cb780>

In [11]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ea5f6f29a47cb029df9'), 'author': 'Mike', 'text': ' my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}


In [12]:
col_it.insert_one({'author':'Dave Lee', 'age':45})

<pymongo.results.InsertOneResult at 0x1dce60a0ac0>

In [13]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ea5f6f29a47cb029df9'), 'author': 'Mike', 'text': ' my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62034f27f6f29a47cb029dfa'), 'author': 'Dave Lee', 'age': 45}


* insert_many()

In [14]:
col_it.insert_many(
    [
        {'author':'Dave Ahn', 'age':25},
        {'author':'Dave', 'age':35}
    ]
)

<pymongo.results.InsertManyResult at 0x1dce6688e80>

In [15]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ea5f6f29a47cb029df9'), 'author': 'Mike', 'text': ' my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62034f27f6f29a47cb029dfa'), 'author': 'Dave Lee', 'age': 45}
{'_id': ObjectId('62034f94f6f29a47cb029dfb'), 'author': 'Dave Ahn', 'age': 25}
{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35}


* How to check the _id (primary key) while inserting a document

In [16]:
post = {'author':'Dave', 'text':'my first blog post'}

post_id = col_it.insert_one(post)
post_id

<pymongo.results.InsertOneResult at 0x1dce6688b00>

In [17]:
post_id.inserted_id

ObjectId('62035018f6f29a47cb029dfd')
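
An `ObjectId` is not random noise: its first 4 bytes hold the creation time as a Unix timestamp. The sketch below decodes that part from the hex string with the standard library only; the helper name `objectid_created_at` is an assumption for this example. With the `bson` package that ships with PyMongo, the same information should be available as the `generation_time` attribute of an `ObjectId`.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    seconds = int(oid_hex[:8], 16)  # first 8 hex characters = 4-byte Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("62035018f6f29a47cb029dfd")
print(created.isoformat())
```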

* document count

In [18]:
col_it.count_documents({})

5

In [21]:
# col_it.count()  # Collection.count() was removed in PyMongo 4.0; use count_documents() instead

* Values that can be inserted: dicts ({}), lists, nested dictionaries

In [22]:
col_it.insert_one({'title':'암살', 'castings':['이정재','전지현','하정우']})

<pymongo.results.InsertOneResult at 0x1dce60f5900>

In [23]:
col_it.insert_one(
    {
        'title':'실미도',
        'castings':['설경구','안성기'],
        'datetime':
        {
            'year':'2003',
            'month': 3,
            'val':
            {
                'a':
                {
                    'b':1
                }
            }
        }
    }
)

<pymongo.results.InsertOneResult at 0x1dce674bfc0>

In [24]:
data = list()
data.append({'name':'aaron', 'age':20})
data.append({'name':'bob', 'age':30})
data.append({'name':'cathy', 'age':25})
data.append({'name':'david', 'age':27})
data.append({'name':'erick', 'age':28})
data.append({'name':'fox', 'age':32})
data.append({'name':'hmm'})

col_it.insert_many(data)

<pymongo.results.InsertManyResult at 0x1dce60ca3c0>

In [25]:
col_it.count_documents({})

14

### Searching documents

* find_one( {key:value} )

In [26]:
col_it.find_one()

{'_id': ObjectId('62034ea5f6f29a47cb029df9'),
 'author': 'Mike',
 'text': ' my first blog post',
 'tags': ['mongodb', 'python', 'pymongo']}

In [28]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ea5f6f29a47cb029df9'), 'author': 'Mike', 'text': ' my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62034f27f6f29a47cb029dfa'), 'author': 'Dave Lee', 'age': 45}
{'_id': ObjectId('62034f94f6f29a47cb029dfb'), 'author': 'Dave Ahn', 'age': 25}
{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'my first blog post'}
{'_id': ObjectId('620350fdf6f29a47cb029dfe'), 'title': '암살', 'castings': ['이정재', '전지현', '하정우']}
{'_id': ObjectId('6203518ef6f29a47cb029dff'), 'title': '실미도', 'castings': ['설경구', '안성기'], 'datetime': {'year': '2003', 'month': 3, 'val': {'a': {'b': 1}}}}
{'_id': ObjectId('62035217f6f29a47cb029e00'), 'name': 'aaron', 'age': 20}
{'_id': ObjectId('62035217f6f29a47cb029e01'), 'name': 'bob', 'age': 30}
{'_id': ObjectId('62035217f6f29a47cb029e02'), 'name': 'cathy', 'age': 25}
{'_id': ObjectId('62035217f6f29a47cb029e03'), 'name': 'david', '

In [31]:
col_it.find_one({'author':'Dave'})

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35}

In [30]:
results = col_it.find({'author':'Dave'})

for r in results:
    print(r)

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'my first blog post'}


In [32]:
col_it.count_documents({'author':'Dave'})

2

In [33]:
for r in col_it.find().sort('age'):
    print(r)

{'_id': ObjectId('62034ea5f6f29a47cb029df9'), 'author': 'Mike', 'text': ' my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'my first blog post'}
{'_id': ObjectId('620350fdf6f29a47cb029dfe'), 'title': '암살', 'castings': ['이정재', '전지현', '하정우']}
{'_id': ObjectId('6203518ef6f29a47cb029dff'), 'title': '실미도', 'castings': ['설경구', '안성기'], 'datetime': {'year': '2003', 'month': 3, 'val': {'a': {'b': 1}}}}
{'_id': ObjectId('62035217f6f29a47cb029e06'), 'name': 'hmm'}
{'_id': ObjectId('62035217f6f29a47cb029e00'), 'name': 'aaron', 'age': 20}
{'_id': ObjectId('62034f94f6f29a47cb029dfb'), 'author': 'Dave Ahn', 'age': 25}
{'_id': ObjectId('62035217f6f29a47cb029e02'), 'name': 'cathy', 'age': 25}
{'_id': ObjectId('62035217f6f29a47cb029e03'), 'name': 'david', 'age': 27}
{'_id': ObjectId('62035217f6f29a47cb029e04'), 'name': 'erick', 'age': 28}
{'_id': ObjectId('62035217f6f29a47cb029e01'), 'name': 'bob', 'age': 30}
{'_id': O
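
In the output above, documents without an `age` field come first in the ascending sort. A pure-Python emulation of that ordering, assuming a small hand-made list of documents (this mimics the observed result and is not how the server implements sorting):

```python
docs = [
    {"name": "bob", "age": 30},
    {"name": "hmm"},              # no 'age' field
    {"name": "aaron", "age": 20},
    {"author": "Mike"},           # no 'age' field
    {"name": "cathy", "age": 25},
]


def age_sort_key(doc):
    # (False, 0) for documents without 'age' sorts before (True, age).
    age = doc.get("age")
    return (age is not None, age if age is not None else 0)


ordered = sorted(docs, key=age_sort_key)
for doc in ordered:
    print(doc)
```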

##### document update : update_one(), update_many()

In [34]:
col_it.find_one({'author':'Dave'})

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35}

In [35]:
col_it.update_one({"author":'Dave'}, 
                 {"$set": {'text':'Hi Dave'}})

<pymongo.results.UpdateResult at 0x1dce7722a00>

In [37]:
for d in col_it.find({'author':'Dave'}):
    print(d)

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 35, 'text': 'Hi Dave'}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'my first blog post'}


In [38]:
col_it.update_one({"author":'Dave'}, 
                 {"$set": {'age':40}})

<pymongo.results.UpdateResult at 0x1dce6683840>

In [39]:
for d in col_it.find({'author':'Dave'}):
    print(d)

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 40, 'text': 'Hi Dave'}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'my first blog post'}


In [40]:
col_it.update_many({'author':'Dave'},
                  {"$set": {'text':'hi dave'}})

<pymongo.results.UpdateResult at 0x1dce66837c0>

In [41]:
for d in col_it.find({'author':'Dave'}):
    print(d)

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 40, 'text': 'hi dave'}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'hi dave'}
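
The outputs above show that `update_one()` changes only the first matching document while `update_many()` changes all of them. The sketch below emulates that difference on plain dicts. The helpers `matches`, `emulate_update_one` and `emulate_update_many` are hypothetical names for illustration, handle equality-only filters, and are not part of PyMongo.

```python
import copy


def matches(doc, query):
    """True when every key/value pair of an equality-only query is present in doc."""
    return all(key in doc and doc[key] == value for key, value in query.items())


def emulate_update_one(docs, query, set_fields):
    """Apply a $set-style change to the first matching document only."""
    for doc in docs:
        if matches(doc, query):
            doc.update(set_fields)
            return 1
    return 0


def emulate_update_many(docs, query, set_fields):
    """Apply a $set-style change to every matching document."""
    hits = [doc for doc in docs if matches(doc, query)]
    for doc in hits:
        doc.update(set_fields)
    return len(hits)


original = [
    {"author": "Dave", "age": 35},
    {"author": "Dave", "text": "my first blog post"},
    {"author": "Dave Lee", "age": 45},
]

one = copy.deepcopy(original)
n_one = emulate_update_one(one, {"author": "Dave"}, {"text": "Hi Dave"})

many = copy.deepcopy(original)
n_many = emulate_update_many(many, {"author": "Dave"}, {"text": "hi dave"})

print(n_one, one)
print(n_many, many)
```

As with `$set`, a field that does not exist yet is added and an existing field is overwritten; other fields are left alone.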


##### document delete : delete_one(), delete_many()

In [44]:
for d in col_it.find({'author':'Dave Lee'}):
    print(d)

{'_id': ObjectId('62034f27f6f29a47cb029dfa'), 'author': 'Dave Lee', 'age': 45}


In [45]:
col_it.delete_one({'author':'Dave Lee'})

<pymongo.results.DeleteResult at 0x1dce60bf480>

In [46]:
for d in col_it.find({'author':'Dave Lee'}):
    print(d)

In [47]:
for d in col_it.find({'author':'Dave'}):
    print(d)

{'_id': ObjectId('62034f94f6f29a47cb029dfc'), 'author': 'Dave', 'age': 40, 'text': 'hi dave'}
{'_id': ObjectId('62035018f6f29a47cb029dfd'), 'author': 'Dave', 'text': 'hi dave'}


In [48]:
col_it.delete_many({'author':'Dave'})

<pymongo.results.DeleteResult at 0x1dce66ab040>

In [49]:
for d in col_it.find({'author':'Dave'}):
    print(d)

In [51]:
books = conn.books  # new database

In [52]:
it_book = books.it_books  # new collection

In [50]:
data = list()
for index in range(100):
    data.append({'author':'Dave Lee', 'publisher':'bit_company', 'number': index})

In [53]:
data

[{'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 0},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 1},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 2},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 3},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 4},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 5},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 6},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 7},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 8},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 9},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 10},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 11},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 12},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 13},
 {'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 14},
 {'au

In [54]:
it_book.insert_many(data)

<pymongo.results.InsertManyResult at 0x1dce7729140>

In [55]:
docs = it_book.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('620356e8f6f29a47cb029e07'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 0}
{'_id': ObjectId('620356e8f6f29a47cb029e08'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 1}
{'_id': ObjectId('620356e8f6f29a47cb029e09'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 2}
{'_id': ObjectId('620356e8f6f29a47cb029e0a'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 3}
{'_id': ObjectId('620356e8f6f29a47cb029e0b'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 4}
{'_id': ObjectId('620356e8f6f29a47cb029e0c'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 5}
{'_id': ObjectId('620356e8f6f29a47cb029e0d'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 6}
{'_id': ObjectId('620356e8f6f29a47cb029e0e'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 7}
{'_id': ObjectId('620356e8f6f29a47cb029e0f'), 'author': 'Dave Lee', 'publisher': 'bit_company', 'number': 8}
{'_id': ObjectId('6

In [56]:
it_book.update_many({}, {'$set': {'publisher':'bit_camp_pub'}})

<pymongo.results.UpdateResult at 0x1dce614dc80>

In [57]:
docs = it_book.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('620356e8f6f29a47cb029e07'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 0}
{'_id': ObjectId('620356e8f6f29a47cb029e08'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 1}
{'_id': ObjectId('620356e8f6f29a47cb029e09'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 2}
{'_id': ObjectId('620356e8f6f29a47cb029e0a'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 3}
{'_id': ObjectId('620356e8f6f29a47cb029e0b'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 4}
{'_id': ObjectId('620356e8f6f29a47cb029e0c'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 5}
{'_id': ObjectId('620356e8f6f29a47cb029e0d'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 6}
{'_id': ObjectId('620356e8f6f29a47cb029e0e'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 7}
{'_id': ObjectId('620356e8f6f29a47cb029e0f'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 8}
{'_id': Ob

In [58]:
it_book.delete_many({'number': {'$gte': 6}})

<pymongo.results.DeleteResult at 0x1dce669f600>

In [59]:
docs = it_book.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('620356e8f6f29a47cb029e07'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 0}
{'_id': ObjectId('620356e8f6f29a47cb029e08'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 1}
{'_id': ObjectId('620356e8f6f29a47cb029e09'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 2}
{'_id': ObjectId('620356e8f6f29a47cb029e0a'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 3}
{'_id': ObjectId('620356e8f6f29a47cb029e0b'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 4}
{'_id': ObjectId('620356e8f6f29a47cb029e0c'), 'author': 'Dave Lee', 'publisher': 'bit_camp_pub', 'number': 5}
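
The filter `{'number': {'$gte': 6}}` uses a comparison operator instead of a plain value. A rough pure-Python emulation of how such a filter selects documents, assuming every document has the queried field (the `COMPARATORS` table and `field_matches` helper are made up for this sketch and cover only a few operators):

```python
import operator

# A few comparison operators from MongoDB's query language mapped to Python functions.
COMPARATORS = {
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$eq": operator.eq, "$ne": operator.ne,
}


def field_matches(value, condition):
    """Check one field value against a literal or an {'$op': operand} dict."""
    if isinstance(condition, dict):
        return all(COMPARATORS[op](value, operand) for op, operand in condition.items())
    return value == condition


data = [{"author": "Dave Lee", "publisher": "bit_company", "number": i} for i in range(100)]
query = {"number": {"$gte": 6}}

# Emulate delete_many(query): keep only the documents that do NOT match.
remaining = [
    doc for doc in data
    if not all(field_matches(doc[field], cond) for field, cond in query.items())
]
print([doc["number"] for doc in remaining])
```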


In [None]:
# crawling - cine21

In [62]:
import requests
from bs4 import BeautifulSoup

In [63]:
url = 'http://www.cine21.com/rank/person'

In [64]:
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
soup

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="1641311652800771" property="fb:pages"/>
<meta content="vTM0gmeRzJwn1MIM1LMSp3cxP_SaBzch1ziRY255RHw" name="google-site-verification"/>
<meta content="5yOe6b_e_3rr7vNDwgXJw_8wLZQGx4lJ_V48KNPrqkA" name="google-site-verification"/>
<meta content="20defde86fc4464f2693891567a98905bd0a60d1" name="naver-site-verification"/>
<meta content="dmds9ks357rhqvdnk" name="dailymotion-domain-verification"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>씨네21</title>
<link href="/inc/www/css/default1.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/inc/www/css/content1.css" media="all" rel="stylesheet" type="text/css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
<meta conten

In [67]:
soup.select('div#rank_holder')

[<div id="rank_holder"></div>]

##### post crawling

In [90]:
import re

In [73]:
url ='http://www.cine21.com/rank/person/content'

In [74]:
month = '2022-01'

data = {'section': 'actor',
'period_start': month,
'gender': 'all',
'page': 1}

In [112]:
res = requests.post(url, data = data)
main_soup = BeautifulSoup(res.text, 'html.parser')
main_soup

 <ul class="people_list">
<li class="people_li">
<a href="/db/person/info/?person_id=78487"><img alt="" class="people_thumb" src="https://image.cine21.com/resize/cine21/still/2017/1207/15_06_46__5a28da76c2e01[X145,145].jpg" target="_blank"/></a>
<div class="name"><a href="/db/person/info/?person_id=78487">강하늘(2편)</a></div>
<ul class="num_info">
<li><span class="tit">흥행지수</span><strong>80,090</strong></li>
<!--
						<li><a href="#" class="btn_graph"><span class="ico"></span><span>흥행성적<br />그래프로 보기</span></a></li>
						-->
</ul>
<!-- 영화포스터는 최대 5개까지만 -->
<ul class="mov_list">
<li>
<a href="/movie/info/?movie_id=56540">
<img alt="" class="thumb" src="https://image.cine21.com/resize/cine21/poster/2022/0127/56540_61f1fcfdd84ce[X85,120].jpg" target="_blank"/>
<span>해적: 도깨비 깃발</span>
</a>
</li>
<li>
<a href="/movie/info/?movie_id=57948">
<img alt="" class="thumb" src="https://image.cine21.com/resize/cine21/poster/2021/1213/11_07_08__61b6aacc130e8[X85,120].jpg" target="_blank"/>
<span>해피 뉴 이어

In [80]:
tags = main_soup.select('li.people_li > div.name')  # use the POST response; the GET page has an empty rank_holder
len(tags)

7

In [93]:
main_url = 'http://www.cine21.com'

for t in tags:
    print(main_url + t.select('a')[0]['href'])
    print(re.sub(r"\(\w+\)", "", t.text))  # raw string avoids invalid-escape warnings

http://www.cine21.com/db/person/info/?person_id=78487
강하늘
http://www.cine21.com/db/person/info/?person_id=56311
한효주
http://www.cine21.com/db/person/info/?person_id=71308
이광수
http://www.cine21.com/db/person/info/?person_id=15225
권상우
http://www.cine21.com/db/person/info/?person_id=60358
조진웅
http://www.cine21.com/db/person/info/?person_id=20772
박희순
http://www.cine21.com/db/person/info/?person_id=95811
채수빈
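
The two steps in the loop above (building the absolute link and stripping the "(2편)" suffix) can be tried offline on literal strings. This is a small sketch using only the standard library; `clean_name` is a hypothetical helper, and `urljoin` is used here as an alternative to plain string concatenation.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.cine21.com"


def clean_name(raw: str) -> str:
    """Remove a parenthesised note such as '(2편)' from a name."""
    return re.sub(r"\(\w+\)", "", raw).strip()


name = clean_name("강하늘(2편)")
link = urljoin(BASE_URL, "/db/person/info/?person_id=78487")
print(name, link)
```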


In [94]:
actor_url = 'http://www.cine21.com/db/person/info/?person_id=78487'

In [96]:
res = requests.get(actor_url)
soup = BeautifulSoup(res.text, 'html.parser')
actor_datas = soup.select('ul.default_info')
actor_datas

[<ul class="default_info">
 <li><span class="tit">다른 이름</span>김하늘</li>
 <li><span class="tit">직업</span>배우</li>
 <li><span class="tit">생년월일</span>1990-02-21</li>
 <li><span class="tit">성별</span>남</li>
 <li><span class="tit">홈페이지</span>
 <a href="http://weibo.com/galpos3?is_hot=1" target="_blank">http://weibo.com/galpos3?is_hot=1</a><br/>
 </li>
 <li><span class="tit">신장/체중</span>181cm, 70kg</li>
 <li><span class="tit">학교</span>중앙대학교 연극학과</li>
 </ul>]

In [98]:
actor_datas[0].select('li')
    

[<li><span class="tit">다른 이름</span>김하늘</li>,
 <li><span class="tit">직업</span>배우</li>,
 <li><span class="tit">생년월일</span>1990-02-21</li>,
 <li><span class="tit">성별</span>남</li>,
 <li><span class="tit">홈페이지</span>
 <a href="http://weibo.com/galpos3?is_hot=1" target="_blank">http://weibo.com/galpos3?is_hot=1</a><br/>
 </li>,
 <li><span class="tit">신장/체중</span>181cm, 70kg</li>,
 <li><span class="tit">학교</span>중앙대학교 연극학과</li>]

In [108]:
actor_info_dict = dict()

for li in actor_datas[0].select('li'):
    
    key = li.select_one('span.tit').text
    
    li = re.sub("<span.*?>.*?</span>", '', str(li))
    value = re.sub("<.+?>", "", li)   
    
    actor_info_dict[key] = value.strip()
    
actor_info_dict

{'다른 이름': '김하늘',
 '직업': '배우',
 '생년월일': '1990-02-21',
 '성별': '남',
 '홈페이지': 'http://weibo.com/galpos3?is_hot=1',
 '신장/체중': '181cm, 70kg',
 '학교': '중앙대학교 연극학과'}
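
To see what the two substitutions do without sending a request, the same idea can be run on HTML strings copied from the output above. This variant is a sketch that uses regular expressions only (no BeautifulSoup); `parse_info_item` is a hypothetical helper name.

```python
import re


def parse_info_item(li_html: str):
    """Split one <li> of the profile list into (label, value) using regexes only."""
    label = re.search(r'<span class="tit">(.*?)</span>', li_html).group(1)
    without_label = re.sub(r"<span.*?>.*?</span>", "", li_html)  # drop the label span
    value = re.sub(r"<.+?>", "", without_label)                  # drop remaining tags
    return label, value.strip()


items = [
    '<li><span class="tit">직업</span>배우</li>',
    '<li><span class="tit">홈페이지</span>\n'
    '<a href="http://weibo.com/galpos3?is_hot=1" target="_blank">'
    'http://weibo.com/galpos3?is_hot=1</a><br/>\n</li>',
]
info = dict(parse_info_item(item) for item in items)
print(info)
```

The final `strip()` matters for multi-line items such as the homepage entry, which otherwise keep a leading and trailing newline.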

In [110]:
##### Extracting the box-office index

In [116]:
for s in main_soup.select('li.people_li ul.num_info strong'):
    print(int(s.text.replace(',','')))

80090
68142
60206
48673
47173
41931
38939


In [None]:
##### Extracting the movie list

In [123]:
movie_list = []
for s in main_soup.select('li.people_li ul.mov_list'):
    actor_movie = []
    for l in s.select('span'):
        actor_movie.append(l.text.strip())
    movie_list.append(actor_movie)

In [124]:
movie_list

[['해적: 도깨비 깃발', '해피 뉴 이어'],
 ['해적: 도깨비 깃발'],
 ['해적: 도깨비 깃발', '해피 뉴 이어'],
 ['해적: 도깨비 깃발'],
 ['경관의 피', '1984 최동원'],
 ['경관의 피'],
 ['해적: 도깨비 깃발']]

In [125]:
from bs4 import BeautifulSoup
import requests
import pymongo
import re

conn = pymongo.MongoClient()
actor_db = conn.cine21
actor_collection = actor_db.actor_collection

actors_info_list = list()

cine21_url = 'http://www.cine21.com/rank/person/content'

post_data = dict()

post_data['section'] = 'actor'
post_data['period_start'] = '2022-01'
post_data['gender'] = 'all'

for page in range(1, 21):
    post_data['page'] = page

    res = requests.post(cine21_url, data=post_data)
    soup = BeautifulSoup(res.content, 'html.parser')

    # Parallel lists: the i-th entry of each belongs to the i-th actor on the page.
    actors = soup.select('li.people_li div.name')
    hits = soup.select('ul.num_info > li > strong')
    movies = soup.select('ul.mov_list')
    rankings = soup.select('li.people_li > span.grade')

    # Use a separate name (i) so the page counter is not shadowed.
    for i, actor in enumerate(actors):

        # Strip the "(N편)" suffix from the name and the commas from the index.
        actor_name = re.sub(r'\(\w*\)', '', actor.text)
        actor_hits = int(hits[i].text.replace(',', ''))
        movie_titles = movies[i].select('li a span')
        movie_title_list = [movie_title.text for movie_title in movie_titles]

        actor_info_dict = dict()
        actor_info_dict['배우이름'] = actor_name
        actor_info_dict['흥행지수'] = actor_hits
        actor_info_dict['출연영화'] = movie_title_list
        actor_info_dict['랭킹'] = rankings[i].text

        # Follow the link to the actor's detail page.
        actor_link = 'http://www.cine21.com' + actor.select_one('a').attrs['href']
        response_actor = requests.get(actor_link)
        soup_actor = BeautifulSoup(response_actor.content, 'html.parser')

        default_info = soup_actor.select_one('ul.default_info')
        actor_details = default_info.select('li')

        # Each <li> holds a label in span.tit followed by the value.
        for actor_item in actor_details:
            actor_item_field = actor_item.select_one('span.tit').text
            actor_item_value = re.sub('<span.*?>.*?</span>', '', str(actor_item))
            actor_item_value = re.sub('<.*?>', '', actor_item_value)
            actor_info_dict[actor_item_field] = actor_item_value

        actors_info_list.append(actor_info_dict)

In [126]:
actors_info_list

[{'배우이름': '강하늘',
  '흥행지수': 80090,
  '출연영화': ['해적: 도깨비 깃발', '해피 뉴 이어'],
  '랭킹': '1',
  '다른 이름': '김하늘',
  '직업': '배우',
  '생년월일': '1990-02-21',
  '성별': '남',
  '홈페이지': '\nhttp://weibo.com/galpos3?is_hot=1\n',
  '신장/체중': '181cm, 70kg',
  '학교': '중앙대학교 연극학과'},
 {'배우이름': '한효주',
  '흥행지수': 68142,
  '출연영화': ['해적: 도깨비 깃발'],
  '랭킹': '2',
  '직업': '배우',
  '생년월일': '1987-02-22',
  '성별': '여',
  '홈페이지': '\nhttps://www.facebook.com/hhj.official\n',
  '신장/체중': '170cm',
  '학교': '동국대학교 연극영화',
  '취미': '영화감상'},
 {'배우이름': '이광수',
  '흥행지수': 60206,
  '출연영화': ['해적: 도깨비 깃발', '해피 뉴 이어'],
  '랭킹': '3',
  '직업': '배우',
  '생년월일': '1985-07-14',
  '성별': '남',
  '홈페이지': '\nhttps://twitter.com/masijacoke85\nhttps://www.instagram.com/masijacoke850714/\n',
  '신장/체중': '190cm',
  '소속사': '킹콩엔터테인먼트'},
 {'배우이름': '권상우',
  '흥행지수': 48673,
  '출연영화': ['해적: 도깨비 깃발'],
  '랭킹': '4',
  '다른 이름': 'Kwon Sang Woo',
  '직업': '배우',
  '생년월일': '1976-08-05',
  '성별': '남',
  '신장/체중': '183cm, 72kg',
  '학교': '한남대학교 미술교육학 학사',
  '취미': '수영, 헬스, 복싱',
  '특기': '농구
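
Some values in the output above still carry surrounding newlines, and a few contain several URLs in one string. One possible cleanup before storing the records is sketched below; `clean_value` is a hypothetical helper and the choice to turn multi-line strings into lists is an assumption, not something the crawler above does.

```python
def clean_value(value):
    """Trim whitespace; turn multi-line strings into a list of non-empty lines."""
    if not isinstance(value, str):
        return value  # leave numbers and lists untouched
    lines = [line.strip() for line in value.splitlines() if line.strip()]
    if len(lines) > 1:
        return lines
    return lines[0] if lines else ""


raw = {
    "배우이름": "이광수",
    "흥행지수": 60206,
    "홈페이지": "\nhttps://twitter.com/masijacoke85\nhttps://www.instagram.com/masijacoke850714/\n",
    "신장/체중": "190cm",
}
cleaned = {key: clean_value(value) for key, value in raw.items()}
print(cleaned)
```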

In [127]:
actor_collection.insert_many(actors_info_list)

<pymongo.results.InsertManyResult at 0x1dce81717c0>

In [129]:
results = actor_collection.find()
for r in results:
    print(r)

{'_id': ObjectId('62036a2af6f29a47cb029e6c'), '배우이름': '강하늘', '흥행지수': 80090, '출연영화': ['해적: 도깨비 깃발', '해피 뉴 이어'], '랭킹': '1', '다른 이름': '김하늘', '직업': '배우', '생년월일': '1990-02-21', '성별': '남', '홈페이지': '\nhttp://weibo.com/galpos3?is_hot=1\n', '신장/체중': '181cm, 70kg', '학교': '중앙대학교 연극학과'}
{'_id': ObjectId('62036a2af6f29a47cb029e6d'), '배우이름': '한효주', '흥행지수': 68142, '출연영화': ['해적: 도깨비 깃발'], '랭킹': '2', '직업': '배우', '생년월일': '1987-02-22', '성별': '여', '홈페이지': '\nhttps://www.facebook.com/hhj.official\n', '신장/체중': '170cm', '학교': '동국대학교 연극영화', '취미': '영화감상'}
{'_id': ObjectId('62036a2af6f29a47cb029e6e'), '배우이름': '이광수', '흥행지수': 60206, '출연영화': ['해적: 도깨비 깃발', '해피 뉴 이어'], '랭킹': '3', '직업': '배우', '생년월일': '1985-07-14', '성별': '남', '홈페이지': '\nhttps://twitter.com/masijacoke85\nhttps://www.instagram.com/masijacoke850714/\n', '신장/체중': '190cm', '소속사': '킹콩엔터테인먼트'}
{'_id': ObjectId('62036a2af6f29a47cb029e6f'), '배우이름': '권상우', '흥행지수': 48673, '출연영화': ['해적: 도깨비 깃발'], '랭킹': '4', '다른 이름': 'Kwon Sang Woo', '직업': '배우', '생년월일': '1976-08-

In [131]:
import pandas as pd
pd.DataFrame(actors_info_list)

Unnamed: 0,배우이름,흥행지수,출연영화,랭킹,다른 이름,직업,생년월일,성별,홈페이지,신장/체중,학교,_id,취미,소속사,특기,원어명,사망
0,강하늘,80090,"[해적: 도깨비 깃발, 해피 뉴 이어]",1,김하늘,배우,1990-02-21,남,\nhttp://weibo.com/galpos3?is_hot=1\n,"181cm, 70kg",중앙대학교 연극학과,62036a2af6f29a47cb029e6c,,,,,
1,한효주,68142,[해적: 도깨비 깃발],2,,배우,1987-02-22,여,\nhttps://www.facebook.com/hhj.official\n,170cm,동국대학교 연극영화,62036a2af6f29a47cb029e6d,영화감상,,,,
2,이광수,60206,"[해적: 도깨비 깃발, 해피 뉴 이어]",3,,배우,1985-07-14,남,\nhttps://twitter.com/masijacoke85\nhttps://ww...,190cm,,62036a2af6f29a47cb029e6e,,킹콩엔터테인먼트,,,
3,권상우,48673,[해적: 도깨비 깃발],4,Kwon Sang Woo,배우,1976-08-05,남,,"183cm, 72kg",한남대학교 미술교육학 학사,62036a2af6f29a47cb029e6f,"수영, 헬스, 복싱",벨액터스 엔터테인먼트,농구,,
4,조진웅,47173,"[경관의 피, 1984 최동원]",5,조원준,배우,1976-03-03,남,\nhttp://www.facebook.com/saram.chojinwoong\n,"185cm, 98kg",경성대학교 연극영화,62036a2af6f29a47cb029e70,,㈜사람엔터테인먼트,진도 북춤,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,이민혁,35,[MONSTA X : THE DREAMING(몬스타엑스 : 더 드리밍)],136,몬스타엑스; MONSTA X,가수,1993-11-03,남,\nhttp://www.vlive.tv/channels/FE123\n,,,62036a2af6f29a47cb029ef3,,,,,
136,서준영,35,[동백],137,김상구,배우,1987-04-24,남,\nhttps://twitter.com/iamjjun0\nhttps://instag...,,,62036a2af6f29a47cb029ef4,,,,,
137,최정기,34,[존경하고 사랑하는 국민 여러분],138,,,,남,,,,62036a2af6f29a47cb029ef5,,,,,
138,고수경,33,[사막을 건너 호수를 지나],139,레츠피스,,,여,,,,62036a2af6f29a47cb029ef6,,,,,
