In [1]:
import pandas as pd
import json
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

# Crawling


## Data Collection with API

### YouTube

site: https://console.developers.google.com/ <br>
reference: https://developers.google.com/youtube/v3/docs?hl=ko <br>

```python
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.tools import argparser

import pandas as pd


DEVELOPER_KEY = ''  # API key issued in the Google developer console
YOUTUBE_API_SERVICE_NAME = 'youtube'
YOUTUBE_API_VERSION = 'v3'

youtube = build(
  YOUTUBE_API_SERVICE_NAME,
  YOUTUBE_API_VERSION,
  developerKey=DEVELOPER_KEY
  )
```

In [None]:
!pip install google-api-python-client
!pip install oauth2client

#### Videos

Basic code for listing the videos in a channel
```python
response = (
  youtube
  .playlistItems()
  .list(
    # playlistId expects a playlist ID (e.g. the channel's uploads playlist), not a raw channel ID
    playlistId=channel_id,
    part='snippet',
    maxResults=50
    )
  .execute()
  )
```
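
`playlistItems().list` takes a playlist ID rather than a channel ID. A minimal sketch of looking up a channel's uploads playlist through `channels().list` with `part='contentDetails'` follows. It assumes a client built with `googleapiclient`'s `build()`, and the helper name `get_uploads_playlist_id` is hypothetical, not part of the API.

```python
def get_uploads_playlist_id(youtube, channel_id):
    """Return the ID of the playlist that holds a channel's uploads.

    `youtube` is assumed to be a client created with googleapiclient's build().
    The function name is a hypothetical helper, not part of the API.
    """
    response = (
        youtube
        .channels()
        .list(part='contentDetails', id=channel_id)
        .execute()
    )
    items = response.get('items', [])
    if not items:
        # An unknown channel ID yields a response without any items
        raise ValueError(f'No channel found for id {channel_id!r}')
    return items[0]['contentDetails']['relatedPlaylists']['uploads']
```

The returned value can then be passed as `playlistId` when listing the channel's videos.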


A single query cannot return all of the data. <br>
Use pageToken to request the next page of results. <br>
```python
if 'nextPageToken' in response:
  response = (
    youtube
    .playlistItems()
    .list(
      playlistId=channel_id,
      pageToken=response.get('nextPageToken'),
      part='snippet',
      maxResults=50
      )
    .execute()
    )
```
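
The single `if` above fetches only one extra page. A minimal sketch of following `nextPageToken` until it disappears could look like this. The helper name `fetch_all_playlist_items` and the `max_pages` safety cap are assumptions added for illustration.

```python
def fetch_all_playlist_items(youtube, playlist_id, max_pages=None):
    """Collect playlist items page by page until no nextPageToken is returned.

    `max_pages` is a hypothetical safety cap to limit API quota usage.
    """
    items = []
    page_token = None
    pages = 0
    while True:
        params = {'playlistId': playlist_id, 'part': 'snippet', 'maxResults': 50}
        if page_token:
            # Only send pageToken after the first page
            params['pageToken'] = page_token
        response = youtube.playlistItems().list(**params).execute()
        items.extend(response.get('items', []))
        pages += 1
        page_token = response.get('nextPageToken')
        if not page_token:
            break  # last page reached
        if max_pages is not None and pages >= max_pages:
            break  # stop early to protect quota
    return items
```

Collecting everything into one list keeps the later conversion to a `pandas` DataFrame simple; for very large playlists a generator that yields each page may be a better fit.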

#### Comments
Basic code for listing the comments on a video
```python
response = (
    youtube
    .commentThreads()
    .list(
      part='snippet,replies',
      videoId=video_id,
      maxResults=100
      )
    .execute()
    )
```

A single query cannot return all of the data. <br>
Use pageToken to request the next page of results. <br>
```python
if 'nextPageToken' in response:
  response = (
    youtube
    .commentThreads()
    .list(
      part='snippet,replies',
      videoId=video_id,
      pageToken=response.get('nextPageToken'),
      maxResults=100
      )
    .execute()
    )
```
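
Each `commentThreads` item nests the top-level comment under `snippet.topLevelComment` and any returned replies under `replies.comments`. A minimal sketch of flattening those items into one row per comment is shown below. The helper name `flatten_comment_threads` and the output column names are assumptions chosen for illustration.

```python
def flatten_comment_threads(items):
    """Turn commentThreads items into one flat row per comment or reply."""
    rows = []
    for thread in items:
        top = thread['snippet']['topLevelComment']
        # 'replies' is only present when the thread has replies
        replies = thread.get('replies', {}).get('comments', [])
        for comment, is_reply in [(top, False)] + [(r, True) for r in replies]:
            snippet = comment['snippet']
            rows.append({
                'comment_id': comment.get('id'),
                'author': snippet.get('authorDisplayName'),
                'text': snippet.get('textDisplay'),
                'like_count': snippet.get('likeCount'),
                'published_at': snippet.get('publishedAt'),
                'is_reply': is_reply,
            })
    return rows
```

The rows can be loaded with `pd.DataFrame(flatten_comment_threads(response['items']))`.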

### Dart

Call the API by appending key=value pairs to the URL as query parameters.

site: https://opendart.fss.or.kr/ <br>
reference: https://opendart.fss.or.kr/guide/main.do?apiGrpCd=DS001 <br>
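
The key/value call pattern can be sketched in Python as follows. This is a minimal example using only the standard library. The helper names `build_dart_url` and `fetch_dart_json` are hypothetical, and `YOUR_API_KEY` stands in for a real `crtfc_key`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DART_BASE_URL = 'https://opendart.fss.or.kr/api/'


def build_dart_url(endpoint, **params):
    """Append the given key/value pairs to the endpoint as a query string."""
    return f'{DART_BASE_URL}{endpoint}?{urlencode(params)}'


def fetch_dart_json(endpoint, **params):
    """Call a .json endpoint and decode the response body (hypothetical helper)."""
    with urlopen(build_dart_url(endpoint, **params)) as response:
        return json.loads(response.read().decode('utf-8'))


# Example (requires a valid key, so it is left commented out):
# company = fetch_dart_json('company.json',
#                           crtfc_key='YOUR_API_KEY', corp_code='00126380')
```

Building the query string with `urlencode` avoids hand-concatenating `&` and `=` and escapes any special characters in the values.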

#### Disclosure Information

```
https://opendart.fss.or.kr/api/document.xml
```

#### Company Overview

```
https://opendart.fss.or.kr/api/company.json?crtfc_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&corp_code=00126380
```

#### Disclosure Search

```
https://opendart.fss.or.kr/api/list.json?crtfc_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&bgn_de=20200117&end_de=20200117&corp_cls=Y&page_no=1&page_count=10
```

#### Financial Statements

```
https://opendart.fss.or.kr/api/fnlttSinglAcntAll.json?crtfc_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&corp_code=00126380&bsns_year=2018&reprt_code=11011&fs_div=OFS
```

### Naver

#### Naver Trend

site: https://developers.naver.com/apps/#/list <br>
reference: [Naver Trend](https://developers.naver.com/docs/serviceapi/datalab/search/search.md#%ED%86%B5%ED%95%A9-%EA%B2%80%EC%83%89%EC%96%B4-%ED%8A%B8%EB%A0%8C%EB%93%9C)

<br>

<span style="font-size: 18px;">Naver Official Guideline</span>

```python
import os
import sys
import urllib.request
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
url = "https://openapi.naver.com/v1/datalab/search"
body = "{\"startDate\":\"2017-01-01\",\"endDate\":\"2017-04-30\",\"timeUnit\":\"month\",\"keywordGroups\":[{\"groupName\":\"한글\",\"keywords\":[\"한글\",\"korean\"]},{\"groupName\":\"영어\",\"keywords\":[\"영어\",\"english\"]}],\"device\":\"pc\",\"ages\":[\"1\",\"2\"],\"gender\":\"f\"}"

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id",client_id)
request.add_header("X-Naver-Client-Secret",client_secret)
request.add_header("Content-Type","application/json")
response = urllib.request.urlopen(request, data=body.encode("utf-8"))
rescode = response.getcode()
if(rescode==200):
    response_body = response.read()
    print(response_body.decode('utf-8'))
else:
    print("Error Code:" + str(rescode))
```

<br>

<span style="font-size: 18px;">requests.post</span>

```python
import requests
import json

url = "https://openapi.naver.com/v1/datalab/search"

headers = {
    'X-Naver-Client-Id': client_id,
    'X-Naver-Client-Secret': client_secret,
    'Content-Type': 'application/json',
}

body = {
    "startDate": "2017-01-01",
    "endDate": "2017-04-30",
    "timeUnit": "month",
    "keywordGroups":[
        {"groupName": "한글", "keywords": ["한글","korean"]},
        {"groupName": "영어", "keywords": ["영어","english"]}],
    "device": "pc",
    "ages": ["1", "2"],
    "gender": "f",
}

response = requests.post(url, data=json.dumps(body), headers=headers)
```
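
The JSON returned by the search trend endpoint groups its data points under `results`, one entry per keyword group. A minimal sketch of flattening that structure into rows is shown below. The helper name `datalab_to_rows` is hypothetical, and the `sample` payload uses made-up values in the documented response shape.

```python
def datalab_to_rows(payload):
    """Flatten a DataLab search trend response into one row per group and period."""
    rows = []
    for group in payload.get('results', []):
        for point in group.get('data', []):
            rows.append({
                'group': group.get('title'),
                'period': point.get('period'),
                'ratio': point.get('ratio'),
            })
    return rows


# Made-up sample in the response shape; the ratios are illustrative only
sample = {
    'startDate': '2017-01-01',
    'endDate': '2017-04-30',
    'timeUnit': 'month',
    'results': [
        {'title': '한글', 'keywords': ['한글', 'korean'],
         'data': [{'period': '2017-01-01', 'ratio': 47.0},
                  {'period': '2017-02-01', 'ratio': 53.2}]},
        {'title': '영어', 'keywords': ['영어', 'english'],
         'data': [{'period': '2017-01-01', 'ratio': 100.0}]},
    ],
}

rows = datalab_to_rows(sample)
```

With a real response, `datalab_to_rows(response.json())` gives rows that can be passed straight to `pd.DataFrame`.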

## Data Sources

Public Data Portal: https://www.data.go.kr/ <br>
Seoul Open Data Plaza: http://data.seoul.go.kr/ <br>
Economic statistics (Bank of Korea ECOS): https://ecos.bok.or.kr/ <br>
Healthcare Big Data Open System (HIRA): https://opendata.hira.or.kr/home.do <br>
Culture Public Data Plaza: https://www.culture.go.kr/data/main/main.do <br>
VWORLD: https://www.vworld.kr/dev/v4api.do <br>
Naver API: https://developers.naver.com/products/service-api/datalab/datalab.md <br>
Kakao API: https://developers.kakao.com/ <br>
Daishin Securities API: https://money2.daishin.com/E5/WTS/Customer/GuideTrading/DW_CybosPlus_Page.aspx?m=9508&p=8812&v=8632 <br>
U.S. public data: https://www.data.gov/ <br>
