### RDBMS (relational database)
- Uses the SQL language
- The schema is strict; whenever the data takes a new shape, the schema has to be modified
- MySQL, Oracle, PostgreSQL, SQLite, ...
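
To make the "strict schema" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `posts` and the helper `try_insert_with_age` are made up for illustration; the point is only that a row with an undeclared column is rejected until the table is altered.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, text TEXT)")
conn.execute("INSERT INTO posts (author, text) VALUES (?, ?)",
             ("Mike", "my first blog post"))


def try_insert_with_age(connection):
    """Try to insert a row that uses an 'age' column; report success."""
    try:
        connection.execute("INSERT INTO posts (author, age) VALUES (?, ?)",
                           ("dave lee", 45))
        return True
    except sqlite3.OperationalError:
        # Raised while the table has no column named 'age'
        return False


before = try_insert_with_age(conn)                       # schema lacks 'age'
conn.execute("ALTER TABLE posts ADD COLUMN age INTEGER")  # modify the schema
after = try_insert_with_age(conn)                        # now accepted

rows = conn.execute(
    "SELECT author, text, age FROM posts ORDER BY rowid").fetchall()
print(before, after, rows)
```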

### NoSQL
- Generally does not use SQL
- No fixed schema
- The shape of the data is not strictly enforced

### MongoDB
- Manages data (documents) in a JSON-like structure
- SQL: database > table > data (row, column)
- MongoDB: database > collection > document
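
A small sketch of what "JSON-like document" means, using only the standard `json` module. The variable names (`columns`, `row`, `document`) are illustrative. MongoDB itself stores documents as BSON, a binary format with a few more types than JSON, so this only approximates the shape of the data.

```python
import json

# A relational row is a fixed tuple of values, one per declared column
columns = ("author", "text")
row = ("Mike", "my first blog post")

# The same data as a document: field names travel with the values
document = dict(zip(columns, row))

# A document can also hold lists and nested structures directly
document["tags"] = ["mongodb", "python", "pymongo"]

as_json = json.dumps(document)
restored = json.loads(as_json)
print(as_json)
```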

In [2]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.0.1-cp39-cp39-win_amd64.whl (354 kB)
Installing collected packages: pymongo
Successfully installed pymongo-4.0.1


In [3]:
import pymongo

In [5]:
conn = pymongo.MongoClient()

In [6]:
tdb = conn['testdb']

In [7]:
col_it = tdb['it']

In [10]:
post = {'author':'Mike', 'text': 'my first blog post', 'tags':['mongodb','python','pymongo']}
col_it.insert_one(post)

<pymongo.results.InsertOneResult at 0x24269929100>

In [11]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ee29554eee55a529394'), 'author': 'Mike', 'text': 'my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}


In [12]:
col_it.insert_one({'author':'dave lee','age':45})

<pymongo.results.InsertOneResult at 0x24269decc40>

In [13]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ee29554eee55a529394'), 'author': 'Mike', 'text': 'my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62034f1f9554eee55a529395'), 'author': 'dave lee', 'age': 45}


* insert_many()

In [14]:
col_it.insert_many(
    [
        {'author':'dave ahn','age':25},
        {'author':'dave', 'age':35}
    ]
)

<pymongo.results.InsertManyResult at 0x24269e9fa80>

In [15]:
results = col_it.find()
for r in results:
    print(r)

{'_id': ObjectId('62034ee29554eee55a529394'), 'author': 'Mike', 'text': 'my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('62034f1f9554eee55a529395'), 'author': 'dave lee', 'age': 45}
{'_id': ObjectId('62034f909554eee55a529396'), 'author': 'dave ahn', 'age': 25}
{'_id': ObjectId('62034f909554eee55a529397'), 'author': 'dave', 'age': 35}


* How to check the id (primary key) when inserting a document

post = {'author': 'dave', 'text':'my first blog post'}

post_id = col_it.insert_one(post)
post_id

In [18]:
post_id.inserted_id

ObjectId('620350339554eee55a529399')
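
The `_id` values above are not random: the first 4 bytes of an ObjectId hold its creation time in seconds since the Unix epoch. The sketch below decodes that by hand with the standard library only; the helper name `objectid_created_at` is made up here. With pymongo installed, the `generation_time` attribute of an `ObjectId` gives the same information.

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is written as 24 hex characters")
    # First 8 hex characters = 4 bytes = big-endian seconds since the epoch
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at('620350339554eee55a529399')
print(created.isoformat())
```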

* document count

In [19]:
col_it.count_documents({})

6

* Input: {}, lists, nested dictionaries

In [20]:
col_it.insert_one({'title':'암살','castings':['이정재','전지현','하정우']})

<pymongo.results.InsertOneResult at 0x24269774b40>

In [21]:
col_it.insert_one(
    {
        'title':'실미도',
        'castings':['설경구','안성기'],
        'datetime':
        {
            'year':'2003',
            'month':3,
            'val':
            {
                'a':
                {
                    "b":1
                }
            }
        }
    }
)

<pymongo.results.InsertOneResult at 0x24269ebac80>
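
Nested fields like the ones in this document are addressed in MongoDB filters with dot notation, for example `{'datetime.year': '2003'}`. The helper `get_path` below is a hypothetical local stand-in (not part of pymongo) that follows such a dotted path through a plain dict, just to show which value a given path refers to.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as 'datetime.val.a.b' through nested dicts."""
    current = document
    for key in dotted_path.split('.'):
        current = current[key]   # raises KeyError if the path does not exist
    return current


movie = {
    'title': '실미도',
    'castings': ['설경구', '안성기'],
    'datetime': {'year': '2003', 'month': 3, 'val': {'a': {'b': 1}}},
}

deep_value = get_path(movie, 'datetime.val.a.b')
year = get_path(movie, 'datetime.year')

# The same path syntax is what a filter for find() would use
nested_filter = {'datetime.year': '2003'}
print(deep_value, year)
```

With a running server, `col_it.find_one(nested_filter)` would be the corresponding query.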

In [24]:
data = list()
data.append({'name':'aaron','age':20})
data.append({'name':'bob','age':30})
data.append({'name':'cathy','age':25})
data.append({'name':'david','age':27})
data.append({'name':'erick','age':28})
data.append({'name':'fox','age':32})
data.append({'name':'hmm'})

col_it.insert_many(data)

<pymongo.results.InsertManyResult at 0x24269e84240>

In [25]:
col_it.count_documents({})

15

### Searching documents

* find_one()

In [28]:
col_it.find_one()   # returns the first document in the collection

{'_id': ObjectId('62034ee29554eee55a529394'),
 'author': 'Mike',
 'text': 'my first blog post',
 'tags': ['mongodb', 'python', 'pymongo']}

In [29]:
col_it.find()    # all documents; returns a Cursor to iterate over

<pymongo.cursor.Cursor at 0x24269bd00d0>

In [30]:
# find_one() returns a single document (a dict), so print it directly
result = col_it.find_one({'author':'dave'})
print(result)

{'_id': ObjectId('62034f909554eee55a529397'), 'author': 'dave', 'age': 35}

In [31]:
col_it.count_documents({'author':'dave'})

3

In [32]:
for r in col_it.find().sort('age'):
    print(r)

{'_id': ObjectId('62034ee29554eee55a529394'), 'author': 'Mike', 'text': 'my first blog post', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('620350119554eee55a529398'), 'author': 'dave', 'text': 'my first blog post'}
{'_id': ObjectId('620350339554eee55a529399'), 'author': 'dave', 'text': 'my first blog post'}
{'_id': ObjectId('6203510a9554eee55a52939a'), 'title': '암살', 'castings': ['이정재', '전지현', '하정우']}
{'_id': ObjectId('6203517d9554eee55a52939b'), 'title': '실미도', 'castings': ['설경구', '안성기'], 'datetime': {'year': '2003', 'month': 3, 'val': {'a': {'b': 1}}}}
{'_id': ObjectId('620352109554eee55a5293a2'), 'name': 'hmm'}
{'_id': ObjectId('620352109554eee55a52939c'), 'name': 'aaron', 'age': 20}
{'_id': ObjectId('62034f909554eee55a529396'), 'author': 'dave ahn', 'age': 25}
{'_id': ObjectId('620352109554eee55a52939e'), 'name': 'cathy', 'age': 25}
{'_id': ObjectId('620352109554eee55a52939f'), 'name': 'david', 'age': 27}
{'_id': ObjectId('620352109554eee55a5293a0'), 'name': 'erick',
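
Filters can also use comparison operators such as `$gte` and `$lte`, and a cursor can be sorted in descending order by passing `-1` as the direction. A minimal sketch, assuming a pymongo collection like `col_it`; the function names `age_range_filter` and `find_by_age` are made up for illustration.

```python
def age_range_filter(min_age, max_age):
    """Build a filter matching documents with min_age <= age <= max_age."""
    return {'age': {'$gte': min_age, '$lte': max_age}}


def find_by_age(collection, min_age, max_age):
    """Return matching documents, oldest first, without the _id field.

    `collection` is expected to be a pymongo Collection such as col_it.
    """
    cursor = collection.find(age_range_filter(min_age, max_age), {'_id': 0})
    return list(cursor.sort('age', -1))   # -1 means descending


print(age_range_filter(25, 30))
```

Usage against the collection from this notebook would be `find_by_age(col_it, 25, 30)`. Documents without an `age` field do not match the range filter.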

### document update : update_one(), update_many()

In [33]:
col_it.find_one({'author':'dave'})

{'_id': ObjectId('62034f909554eee55a529397'), 'author': 'dave', 'age': 35}

In [34]:
col_it.update_one({'author':'dave'},
                 {"$set":{"text":'hi dave'}})

<pymongo.results.UpdateResult at 0x24269ec0480>

In [36]:
# find_one() returns one dict, so this loop prints the document's keys, not its values
for d in col_it.find_one({'author':'dave'}):
    print(d)

_id
author
age
text


In [37]:
col_it.update_one({'author':'dave'},
                 {'$set':{'age':40}})

<pymongo.results.UpdateResult at 0x24269f15800>

In [38]:
# Prints the keys of the matched document
for d in col_it.find_one({'author':'dave'}):
    print(d)

_id
author
age
text


In [39]:
col_it.update_many({'author':'dave'},
                  {'$set': {'text':'hi dave'}})

<pymongo.results.UpdateResult at 0x24269e8d4c0>

In [40]:
# Prints the keys of the first matching document
for d in col_it.find_one({'author':'dave'}):
    print(d)

_id
author
age
text


In [41]:
# 'dave lee' is a different author value, so it was not updated and has no 'text' key
for d in col_it.find_one({'author':'dave lee'}):
    print(d)

_id
author
age


### document delete : delete_one(), delete_many()

In [43]:
col_it.delete_many({'author':'dave'})

<pymongo.results.DeleteResult at 0x24267775f00>
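
`DeleteResult` exposes `deleted_count`, and `delete_many({})` with an empty filter removes every document in the collection. A minimal sketch of a wrapper that reports the count and refuses an empty filter; the name `delete_and_count` and the guard are choices made here, not part of pymongo.

```python
def delete_and_count(collection, query):
    """Delete every document matching `query`; return how many were removed.

    `collection` is expected to be a pymongo Collection such as col_it.
    """
    if not query:
        # An empty filter would match, and delete, every document
        raise ValueError("refusing to delete with an empty filter")
    result = collection.delete_many(query)
    return result.deleted_count
```

For example, `delete_and_count(col_it, {'author': 'dave'})` would return the number of 'dave' documents removed.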

In [46]:
data = list()
for index in range(100):
    data.append({'author':'dave lee','publisher':'bit_company','number':index})

In [47]:
books_db = conn.books

In [48]:
it_book = books_db.it_books

In [49]:
data

[{'author': 'dave lee', 'publisher': 'bit_company', 'number': 0},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 1},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 2},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 3},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 4},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 5},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 6},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 7},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 8},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 9},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 10},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 11},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 12},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 13},
 {'author': 'dave lee', 'publisher': 'bit_company', 'number': 14},
 {'au

In [51]:
it_book.insert_many(data)

<pymongo.results.InsertManyResult at 0x2426a067980>

In [52]:
docs = it_book.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('620356df9554eee55a5293a3'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 0}
{'_id': ObjectId('620356df9554eee55a5293a4'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 1}
{'_id': ObjectId('620356df9554eee55a5293a5'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 2}
{'_id': ObjectId('620356df9554eee55a5293a6'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 3}
{'_id': ObjectId('620356df9554eee55a5293a7'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 4}
{'_id': ObjectId('620356df9554eee55a5293a8'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 5}
{'_id': ObjectId('620356df9554eee55a5293a9'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 6}
{'_id': ObjectId('620356df9554eee55a5293aa'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 7}
{'_id': ObjectId('620356df9554eee55a5293ab'), 'author': 'dave lee', 'publisher': 'bit_company', 'number': 8}
{'_id': ObjectId('6

In [53]:
it_book.update_many({},{'$set': {"publisher":'bit_camp_pub'}})

<pymongo.results.UpdateResult at 0x2426a070800>

In [55]:
it_book.delete_many({'number':{'$gte':6}})

<pymongo.results.DeleteResult at 0x2426a06c880>

In [56]:
docs = it_book.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('620356df9554eee55a5293a3'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 0}
{'_id': ObjectId('620356df9554eee55a5293a4'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 1}
{'_id': ObjectId('620356df9554eee55a5293a5'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 2}
{'_id': ObjectId('620356df9554eee55a5293a6'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 3}
{'_id': ObjectId('620356df9554eee55a5293a7'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 4}
{'_id': ObjectId('620356df9554eee55a5293a8'), 'author': 'dave lee', 'publisher': 'bit_camp_pub', 'number': 5}


In [None]:
# Crawling Cine21 (www.cine21.com)

In [62]:
import requests
from bs4 import BeautifulSoup

In [57]:
url = 'http://www.cine21.com/rank/person'

In [63]:
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
soup

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="1641311652800771" property="fb:pages"/>
<meta content="vTM0gmeRzJwn1MIM1LMSp3cxP_SaBzch1ziRY255RHw" name="google-site-verification"/>
<meta content="5yOe6b_e_3rr7vNDwgXJw_8wLZQGx4lJ_V48KNPrqkA" name="google-site-verification"/>
<meta content="20defde86fc4464f2693891567a98905bd0a60d1" name="naver-site-verification"/>
<meta content="dmds9ks357rhqvdnk" name="dailymotion-domain-verification"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>씨네21</title>
<link href="/inc/www/css/default1.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/inc/www/css/content1.css" media="all" rel="stylesheet" type="text/css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
<meta conten

In [64]:
soup.select('div#rank_holder')

[<div id="rank_holder"></div>]

In [74]:
url = 'http://www.cine21.com/rank/person'

In [75]:
month = '2022-01'

data = {'section':'actor',
    'period_start':month,
       'gender':'all',
       'page':1}
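
When a dict is passed as `data=` to `requests.post`, it is sent as a URL-encoded form body. The sketch below shows roughly what that body looks like for the fields above, using only `urllib.parse` from the standard library; the variable name `form_fields` is illustrative.

```python
from urllib.parse import urlencode, parse_qs

form_fields = {
    'section': 'actor',
    'period_start': '2022-01',
    'gender': 'all',
    'page': 1,
}

# Encode the fields the way an HTML form submission would
body = urlencode(form_fields)
print(body)

# Decoding the body gives the fields back (all values come back as strings)
decoded = parse_qs(body)
```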

In [76]:
res = requests.post(url, data=data)
soup = BeautifulSoup(res.text, 'html.parser')
soup

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="1641311652800771" property="fb:pages"/>
<meta content="vTM0gmeRzJwn1MIM1LMSp3cxP_SaBzch1ziRY255RHw" name="google-site-verification"/>
<meta content="5yOe6b_e_3rr7vNDwgXJw_8wLZQGx4lJ_V48KNPrqkA" name="google-site-verification"/>
<meta content="20defde86fc4464f2693891567a98905bd0a60d1" name="naver-site-verification"/>
<meta content="dmds9ks357rhqvdnk" name="dailymotion-domain-verification"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>씨네21</title>
<link href="/inc/www/css/default1.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/inc/www/css/content1.css" media="all" rel="stylesheet" type="text/css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
<meta conten

In [78]:
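# The class name is people_li; the count is likely 0 here because the list is not part of this response
li_tags = soup.select('li.people_li')
len(li_tags)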

0

In [83]:
tags = soup.select('li.people_li > div.name')
len(tags)

0

In [84]:
main_url = 'http://www.cine21.com'

for t in tags:
    # Each t is a single Tag: take its first <a> and build the absolute link
    print(main_url + t.select_one('a')['href'])
    print(t.text)

In [86]:

from bs4 import BeautifulSoup
import requests
import pymongo
import re

conn = pymongo.MongoClient()
actor_db = conn.cine21
actor_collection = actor_db.actor_collection

actors_info_list = list()

# The ranking list is requested from /rank/person/content with a form POST
cine21_url = 'http://www.cine21.com/rank/person/content'
post_data = dict()
post_data['section'] = 'actor'
post_data['period_start'] = '2018-08'
post_data['gender'] = 'all'

# Request ranking pages 1 through 20
for page in range(1, 21):
    post_data['page'] = page

    res = requests.post(cine21_url, data=post_data)
    soup = BeautifulSoup(res.content, 'html.parser')

    actors = soup.select('li.people_li div.name')
    hits = soup.select('ul.num_info > li > strong')
    movies = soup.select('ul.mov_list')
    rankings = soup.select('li.people_li > span.grade')

    for index, actor in enumerate(actors):
        # Drop the parenthesized part that follows the actor's name
        actor_name = re.sub(r'\(\w*\)', '', actor.text)
        actor_hits = int(hits[index].text.replace(',', ''))
        movie_titles = movies[index].select('li a span')
        movie_title_list = [movie_title.text for movie_title in movie_titles]

        actor_info_dict = dict()
        actor_info_dict['배우이름'] = actor_name            # actor name
        actor_info_dict['흥행지수'] = actor_hits            # box-office index
        actor_info_dict['출연영화'] = movie_title_list      # films appeared in
        actor_info_dict['랭킹'] = rankings[index].text      # ranking

        # Follow the link to the actor's detail page
        actor_link = 'http://www.cine21.com' + actor.select_one('a').attrs['href']
        response_actor = requests.get(actor_link)
        soup_actor = BeautifulSoup(response_actor.content, 'html.parser')
        default_info = soup_actor.select_one('ul.default_info')
        actor_details = default_info.select('li')

        for actor_item in actor_details:
            # The field label is in span.tit; the value is the rest of the <li>
            actor_item_field = actor_item.select_one('span.tit').text
            actor_item_value = re.sub('<span.*?>.*?</span>', '', str(actor_item))
            actor_item_value = re.sub('<.*?>', '', actor_item_value)
            actor_info_dict[actor_item_field] = actor_item_value
        actors_info_list.append(actor_info_dict)

# insert_many adds an '_id' key to each dict in the list
actor_collection.insert_many(actors_info_list)

<pymongo.results.InsertManyResult at 0x2426addea40>
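
The three `re.sub` calls in the cell above do all of the text cleanup. The sketch below runs the same patterns on made-up sample strings (the sample name and HTML snippet are assumptions, not taken from the site) so the effect of each step is visible without any network access.

```python
import re

# Hypothetical sample inputs shaped like the scraped text
raw_name = 'Sample Actor(3films)'
raw_item = '<li><span class="tit">Occupation</span>Actor</li>'

# Step 1: drop a parenthesized suffix after the name
clean_name = re.sub(r'\(\w*\)', '', raw_name)

# Step 2: drop the whole <span>...</span> label, including its text
without_label = re.sub('<span.*?>.*?</span>', '', raw_item)

# Step 3: drop any remaining tags, leaving only the value
clean_value = re.sub('<.*?>', '', without_label)

print(clean_name, '|', without_label, '|', clean_value)
```

The non-greedy `.*?` matters in steps 2 and 3: a greedy `.*` would swallow everything between the first `<` and the last `>` on the line.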

In [89]:
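# Typo: the method is insert_many, so this call raises the TypeError shown below.
# Re-running the correct insert_many on the same list would also fail with a
# duplicate key error, because the first insert added an '_id' to every dict.
actor_collection.insesrt_many(actors_info_list)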

TypeError: 'Collection' object is not callable. If you meant to call the 'insesrt_many' method on a 'Collection' object it is failing because no such method exists.

In [88]:
import pandas as pd
pd.DataFrame(actors_info_list)

Unnamed: 0,배우이름,흥행지수,출연영화,랭킹,다른 이름,직업,생년월일,성별,홈페이지,신장/체중,학교,취미,_id,특기,소속사,원어명
0,하정우,21830,"[신과 함께-인과 연, 백두산, PMC: 더 벙커, 클로젯, 신과 함께-죄와 벌, ...",1,김성훈; 河正宇,배우,1978-03-11,남,\nhttps://www.facebook.com/ft.hajungwoo\n,"184cm, 75kg",중앙대학교 연극학 학사,"피아노, 검도, 수영",620366249554eee55a529409,,,
1,마동석,19552,"[나쁜 녀석들: 더 무비, 신과 함께-인과 연, 성난황소, 동네사람들, 원더풀 고스...",2,Ma Dongseok,배우,1971-03-01,남,\nhttps://www.instagram.com/madongseok_/\nhttp...,"178cm, 100kg",,,620366249554eee55a52940a,,,
2,이병헌,16450,"[백두산, 남산의 부장들, 내부자들, 그것만이 내 세상, 광해, 왕이 된 남자, 번...",3,Byung-hun Lee;BH Lee,배우,1970-07-12,남,\nhttp://www.leebyunghun.kr/\n,"177cm, 72kg",한양대학교 불어문학과,"모자수집, 여행",620366249554eee55a52940b,"태권도, 스노우보드, 수영, 팔씨름",,
3,황정민,15902,"[공작, 인질, 다만 악에서 구하소서, 신세계, 와이키키 브라더스, 부당거래]",4,,배우,1970-09-01,남,,"180cm, 75kg",서울예술대학 연극과 졸업,,620366249554eee55a52940c,"농구, 악기연주",예당엔터테인먼트,
4,이성민,15648,"[남산의 부장들, 목격자, 공작, 기적, 비스트, 마약왕]",5,,배우,1968-10-15,남,,178cm,,,620366249554eee55a52940d,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,김혜수,1953,"[내가 죽던 날, 국가부도의 날, 타짜, 도둑들, 닥터봉]",136,,배우,1970-09-05,여,,"171cm, 49kg",동국대 연극영화 - 성균관대언론정보대학원 석사,"영화 감상, 사진집 모으기, 태권도, 수영, 테니스, 볼링",620366249554eee55a529490,태권도,,
136,남문철,1950,"[공작, 나랏말싸미, 유열의 음악앨범, 독전, 애비규환, 4등]",137,,배우,1971-03-20,남,,,,,620366249554eee55a529491,,,
137,손종학,1946,"[돈, 강철비2: 정상회담, 정직한 후보, 도희야, 검은 사제들, 비밀은 없다]",138,,배우,1967-06-20,남,,,,,620366249554eee55a529492,,,
138,최덕문,1911,"[나랏말싸미, 마약왕, 블랙머니, 소공녀, 암살, 애비규환]",139,,배우,1970-00-00,남,,,,,620366249554eee55a529493,,,
