### Web Crawling
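- Types of web pages
    - Static page: when the data on the page changes, the URL changes too. The server returns **HTML**, and the browser renders a different file (page).
    - Dynamic page: when the data on the page changes, the URL stays the same. The server returns **JSON**, and its contents are added to the page already on screen (in Python, the JSON maps to a dictionary).
- requests package
    - Browser: enter a URL, and the browser downloads the data from the server and renders it on screen. URL -> DATA
    - requests package: does the same from Python. URL -> DATA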
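
A minimal offline sketch of the distinction above, using only the standard library. The two payload strings are made-up stand-ins for what a server might return (assumptions, not captured responses), and `CellCollector` is a hypothetical helper. JSON maps directly onto Python lists and dicts, while HTML has to be parsed to pull the values out.

```python
import json
from html.parser import HTMLParser

# Hypothetical payloads: what a dynamic (JSON) and a static (HTML) page might return.
json_payload = '[{"localTradedAt": "2022-06-22", "closePrice": "2,342.81"}]'
html_payload = "<table><tr><td>2022-06-22</td><td>2,342.81</td></tr></table>"

# JSON: a single call turns the string into list/dict objects.
records = json.loads(json_payload)


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data)


parser = CellCollector()
parser.feed(html_payload)

print(records[0]["closePrice"])  # value looked up by key
print(parser.cells)              # values recovered by walking the markup
```

This is why the rest of these notes target the JSON endpoint: the data arrives already structured.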

### Naver Stock Data
- KOSPI index
- KOSDAQ index
- USD: KRW/USD exchange rate

In [3]:
import requests
import pandas as pd

1. Analyze the web service: use Chrome DevTools to find the URL.
2. Send a request for the data > receive the response: JSON (str)
3. JSON (str) > Python list, dict > output as a DataFrame.

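- Tip 1: if the desktop version of a site has too much going on, check the mobile version.
    - 1. Press F12 to open Chrome DevTools.
    - 2. Open the Elements tab and click the icon just to its left (Toggle device toolbar).
    - 3. Set Dimensions, then reload the page (F5), or click the browser address bar and press Enter.
    - If this does not work, open the mobile version of the site directly.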

- Tip 2: the URL that the data is fetched from can be found in the Network tab.

In [19]:
# 1. Analyze the web service: Chrome DevTools: URL
# https://m.stock.naver.com/
url = 'https://m.stock.naver.com/api/index/KOSPI/price?pageSize=10&page=4'
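
The URL found in the Network tab carries its options in the query string. As a rough sketch (standard library only), it can be split into its path and parameters to see what is adjustable. `with_params` is a hypothetical helper introduced for illustration, not part of requests.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://m.stock.naver.com/api/index/KOSPI/price?pageSize=10&page=4"

# Split the URL and read the query string as a plain dict.
parts = urlsplit(url)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}
print(parts.path)  # the endpoint, which contains the index code
print(params)      # the adjustable parameters


def with_params(url, **overrides):
    """Return url with the given query parameters replaced (hypothetical helper)."""
    parts = urlsplit(url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    query.update({key: str(value) for key, value in overrides.items()})
    return parts._replace(query=urlencode(query)).geturl()


next_page = with_params(url, page=5)
print(next_page)
```

Here `pageSize` and `page` are the two values that later become function arguments.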

In [7]:
# 2. request(url) > response(json) : JSON(str)
response = requests.get(url)  # In Jupyter, press Shift + Tab to view the function's signature and docs.
response  # <Response [200]> means the request succeeded (HTTP 200 OK).

<Response [200]>
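
The number in `<Response [200]>` is the HTTP status code. A small sketch of how a code can be interpreted, using the standard library's `http.HTTPStatus`. `describe_status` is a hypothetical helper for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return (reason phrase, is_success) for an HTTP status code."""
    status = HTTPStatus(code)
    # 2xx codes indicate success; 4xx and 5xx indicate client and server errors.
    return status.phrase, 200 <= code < 300


print(describe_status(200))
print(describe_status(404))
```

With requests, the same number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx responses, which is a convenient guard before calling `response.json()`.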

In [9]:
response.text  # The JSON is a list of dictionaries.

'[{"localTradedAt":"2022-06-22","closePrice":"2,342.81","compareToPreviousClosePrice":"-66.12","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-2.74","openPrice":"2,417.11","highPrice":"2,418.05","lowPrice":"2,342.81"},{"localTradedAt":"2022-06-21","closePrice":"2,408.93","compareToPreviousClosePrice":"17.90","compareToPreviousPrice":{"code":"2","text":"상승","name":"RISING"},"fluctuationsRatio":"0.75","openPrice":"2,402.99","highPrice":"2,423.48","lowPrice":"2,385.60"},{"localTradedAt":"2022-06-20","closePrice":"2,391.03","compareToPreviousClosePrice":"-49.90","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-2.04","openPrice":"2,449.89","highPrice":"2,449.89","lowPrice":"2,372.35"},{"localTradedAt":"2022-06-17","closePrice":"2,440.93","compareToPreviousClosePrice":"-10.48","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-0.43","openPrice":"2,409.72","highPrice":"2,441.

In [10]:
response.text[:200]

'[{"localTradedAt":"2022-06-22","closePrice":"2,342.81","compareToPreviousClosePrice":"-66.12","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-2.74","openPrice"'

In [16]:
# 3. JSON(str) > list, dict > DataFrame
data = response.json()
type(data), data[:2]

(list,
 [{'localTradedAt': '2022-06-22',
   'closePrice': '2,342.81',
   'compareToPreviousClosePrice': '-66.12',
   'compareToPreviousPrice': {'code': '5', 'text': '하락', 'name': 'FALLING'},
   'fluctuationsRatio': '-2.74',
   'openPrice': '2,417.11',
   'highPrice': '2,418.05',
   'lowPrice': '2,342.81'},
  {'localTradedAt': '2022-06-21',
   'closePrice': '2,408.93',
   'compareToPreviousClosePrice': '17.90',
   'compareToPreviousPrice': {'code': '2', 'text': '상승', 'name': 'RISING'},
   'fluctuationsRatio': '0.75',
   'openPrice': '2,402.99',
   'highPrice': '2,423.48',
   'lowPrice': '2,385.60'}])

In [20]:
df = pd.DataFrame(data)[['localTradedAt', 'closePrice']]
df.tail(4)

| | localTradedAt | closePrice |
|---|---|---|
| 6 | 2022-06-14 | 2492.97 |
| 7 | 2022-06-13 | 2504.51 |
| 8 | 2022-06-10 | 2595.87 |
| 9 | 2022-06-09 | 2625.44 |
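
In the `response.json()` output, the prices are strings with thousands separators (for example `'2,342.81'`), so they cannot be used in arithmetic as-is. A minimal sketch of one way to convert them, assuming pandas is installed. `clean_prices` is a hypothetical helper and the sample records are copied from the output above.

```python
import pandas as pd

# Two records in the same shape as the API response shown earlier.
sample = [
    {"localTradedAt": "2022-06-22", "closePrice": "2,342.81"},
    {"localTradedAt": "2022-06-21", "closePrice": "2,408.93"},
]


def clean_prices(records):
    """Build a DataFrame with a datetime date column and a float price column."""
    df = pd.DataFrame(records, columns=["localTradedAt", "closePrice"])
    df["localTradedAt"] = pd.to_datetime(df["localTradedAt"])
    # Remove the thousands separator before converting to float.
    df["closePrice"] = df["closePrice"].str.replace(",", "", regex=False).astype(float)
    return df


cleaned = clean_prices(sample)
print(cleaned.dtypes)
```

After this step the column supports numeric operations such as `cleaned["closePrice"].mean()`.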


In [21]:
# 4. Wrap it in a function
# params : pagesize, page
# https://m.stock.naver.com/api/index/KOSPI/price?pageSize=10&page=4
# crontab can be used to schedule specific code to run at specific times.

In [41]:
def stock_price(pagesize, page):
    url = f'https://m.stock.naver.com/api/index/KOSPI/price?pageSize={pagesize}&page={page}'
    response = requests.get(url)
    data = response.json()
    return pd.DataFrame(data)[['localTradedAt', 'closePrice']]

In [23]:
stock_price(30, 2)

| | localTradedAt | closePrice |
|---|---|---|
| 0 | 2022-06-22 | 2342.81 |
| 1 | 2022-06-21 | 2408.93 |
| 2 | 2022-06-20 | 2391.03 |
| 3 | 2022-06-17 | 2440.93 |
| 4 | 2022-06-16 | 2451.41 |
| 5 | 2022-06-15 | 2447.38 |
| 6 | 2022-06-14 | 2492.97 |
| 7 | 2022-06-13 | 2504.51 |
| 8 | 2022-06-10 | 2595.87 |
| 9 | 2022-06-09 | 2625.44 |
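
Since the function takes a page number, several pages can be gathered in a loop. The sketch below shows the idea without touching the network: `collect_pages` and `fake_fetch` are hypothetical names, and the stop-on-empty-page rule is an assumption about how the endpoint behaves past the last page.

```python
def collect_pages(fetch_page, pagesize, pages):
    """Call fetch_page(pagesize, page) for each page and merge the records.

    fetch_page is any callable returning a list of dicts (a signature
    chosen for this sketch).
    """
    records = []
    for page in pages:
        batch = fetch_page(pagesize, page)
        if not batch:  # assumption: an empty page means there is no more data
            break
        records.extend(batch)
    return records


# Offline stand-in for the real request so the sketch runs anywhere.
def fake_fetch(pagesize, page):
    if page > 2:
        return []
    start = (page - 1) * pagesize
    return [{"row": start + i} for i in range(pagesize)]


rows = collect_pages(fake_fetch, pagesize=3, pages=range(1, 6))
print(len(rows))
```

With the DataFrame-returning `stock_price` from these notes, a comparable result could be built with `pd.concat([stock_price(10, p) for p in range(1, 4)], ignore_index=True)`.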


### Practice
- Write code that collects KOSDAQ data

0. Import requests and pandas.

In [25]:
import requests
import pandas as pd

1. Analyze the web service: URL

In [26]:
url = 'https://m.stock.naver.com/api/index/KOSDAQ/price?pageSize=10&page=2'

2. request(url) > response(json) : JSON(str)

In [28]:
response = requests.get(url)
response.text

'[{"localTradedAt":"2022-07-06","closePrice":"2,292.01","compareToPreviousClosePrice":"-49.77","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-2.13","openPrice":"2,330.11","highPrice":"2,332.14","lowPrice":"2,290.33"},{"localTradedAt":"2022-07-05","closePrice":"2,341.78","compareToPreviousClosePrice":"41.44","compareToPreviousPrice":{"code":"2","text":"상승","name":"RISING"},"fluctuationsRatio":"1.80","openPrice":"2,322.11","highPrice":"2,344.08","lowPrice":"2,309.62"},{"localTradedAt":"2022-07-04","closePrice":"2,300.34","compareToPreviousClosePrice":"-5.08","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-0.22","openPrice":"2,310.73","highPrice":"2,318.31","lowPrice":"2,276.63"},{"localTradedAt":"2022-07-01","closePrice":"2,305.42","compareToPreviousClosePrice":"-27.22","compareToPreviousPrice":{"code":"5","text":"하락","name":"FALLING"},"fluctuationsRatio":"-1.17","openPrice":"2,342.92","highPrice":"2,354.9

3. JSON(str) > list, dict > DataFrame

In [31]:
data = response.json()
data

[{'localTradedAt': '2022-07-06',
  'closePrice': '2,292.01',
  'compareToPreviousClosePrice': '-49.77',
  'compareToPreviousPrice': {'code': '5', 'text': '하락', 'name': 'FALLING'},
  'fluctuationsRatio': '-2.13',
  'openPrice': '2,330.11',
  'highPrice': '2,332.14',
  'lowPrice': '2,290.33'},
 {'localTradedAt': '2022-07-05',
  'closePrice': '2,341.78',
  'compareToPreviousClosePrice': '41.44',
  'compareToPreviousPrice': {'code': '2', 'text': '상승', 'name': 'RISING'},
  'fluctuationsRatio': '1.80',
  'openPrice': '2,322.11',
  'highPrice': '2,344.08',
  'lowPrice': '2,309.62'},
 {'localTradedAt': '2022-07-04',
  'closePrice': '2,300.34',
  'compareToPreviousClosePrice': '-5.08',
  'compareToPreviousPrice': {'code': '5', 'text': '하락', 'name': 'FALLING'},
  'fluctuationsRatio': '-0.22',
  'openPrice': '2,310.73',
  'highPrice': '2,318.31',
  'lowPrice': '2,276.63'},
 {'localTradedAt': '2022-07-01',
  'closePrice': '2,305.42',
  'compareToPreviousClosePrice': '-27.22',
  'compareToPreviousP

In [33]:
kosdaq = pd.DataFrame(data)
kosdaq.head(5)

| | localTradedAt | closePrice | compareToPreviousClosePrice | compareToPreviousPrice | fluctuationsRatio | openPrice | highPrice | lowPrice |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022-07-06 | 2292.01 | -49.77 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -2.13 | 2330.11 | 2332.14 | 2290.33 |
| 1 | 2022-07-05 | 2341.78 | 41.44 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 1.8 | 2322.11 | 2344.08 | 2309.62 |
| 2 | 2022-07-04 | 2300.34 | -5.08 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -0.22 | 2310.73 | 2318.31 | 2276.63 |
| 3 | 2022-07-01 | 2305.42 | -27.22 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -1.17 | 2342.92 | 2354.97 | 2291.49 |
| 4 | 2022-06-30 | 2332.64 | -45.35 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -1.91 | 2368.57 | 2368.57 | 2332.59 |
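
The `compareToPreviousPrice` column holds a whole dict per row, which is awkward to filter on. One possible way to spread it into separate columns is to flatten each record before building the DataFrame. `flatten_record` is a hypothetical helper and the sample record is abbreviated from the output above.

```python
def flatten_record(record, sep="."):
    """Flatten one level of nested dicts: {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# One record, abbreviated from the response shown earlier.
sample = {
    "localTradedAt": "2022-07-06",
    "closePrice": "2,292.01",
    "compareToPreviousPrice": {"code": "5", "text": "하락", "name": "FALLING"},
}

flat = flatten_record(sample)
print(sorted(flat))
```

pandas offers `pd.json_normalize(data)`, which produces similar dotted column names directly from the list of records.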


In [38]:
def KOSDAQ(pagesize, page):
    url = f'https://m.stock.naver.com/api/index/KOSDAQ/price?pageSize={pagesize}&page={page}'
    response = requests.get(url)
    data = response.json()
    return pd.DataFrame(data)

In [39]:
KOSDAQ(20,3)

| | localTradedAt | closePrice | compareToPreviousClosePrice | compareToPreviousPrice | fluctuationsRatio | openPrice | highPrice | lowPrice |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022-06-08 | 874.95 | 1.17 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 0.13 | 877.77 | 880.73 | 873.61 |
| 1 | 2022-06-07 | 873.78 | -17.73 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -1.99 | 889.07 | 889.09 | 873.09 |
| 2 | 2022-06-03 | 891.51 | 0.37 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 0.04 | 897.08 | 899.01 | 890.2 |
| 3 | 2022-06-02 | 891.14 | -2.22 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -0.25 | 890.47 | 892.98 | 888.45 |
| 4 | 2022-05-31 | 893.36 | 6.92 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 0.78 | 888.44 | 893.42 | 884.11 |
| 5 | 2022-05-30 | 886.44 | 12.47 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 1.43 | 882.17 | 886.44 | 881.46 |
| 6 | 2022-05-27 | 873.97 | 2.54 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 0.29 | 881.41 | 883.36 | 873.21 |
| 7 | 2022-05-26 | 871.43 | -1.26 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -0.14 | 876.19 | 885.12 | 868.81 |
| 8 | 2022-05-25 | 872.69 | 7.62 | {'code': '2', 'text': '상승', 'name': 'RISING'} | 0.88 | 869.19 | 877.71 | 861.92 |
| 9 | 2022-05-24 | 865.07 | -18.52 | {'code': '5', 'text': '하락', 'name': 'FALLING'} | -2.1 | 881.55 | 883.11 | 865.05 |


In [45]:
def stock_price(pagesize, page, code='KOSPI'):
    """Fetch daily closing prices for a stock index from Naver's mobile stock API.
    Params
    ------
    pagesize : int : number of rows per page
    page : int : page number
    code : str : KOSPI or KOSDAQ
    Return
    ------
    type : DataFrame : date (localTradedAt) and closing price (closePrice) columns
    """
    url = f'https://m.stock.naver.com/api/index/{code}/price?pageSize={pagesize}&page={page}'
    response = requests.get(url)
    data = response.json()
    return pd.DataFrame(data)[['localTradedAt', 'closePrice']]
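
Because `code` is interpolated straight into the URL, a typo only surfaces as a failed request. A sketch of validating the arguments before any request is sent: `build_index_url` and `KNOWN_CODES` are hypothetical names, and the set lists only the two codes used in these notes.

```python
from urllib.parse import urlencode

BASE_URL = "https://m.stock.naver.com/api/index/{code}/price"
KNOWN_CODES = {"KOSPI", "KOSDAQ"}  # assumption: only the codes covered in these notes


def build_index_url(code, pagesize, page):
    """Validate the arguments and return the request URL."""
    code = code.upper()
    if code not in KNOWN_CODES:
        raise ValueError(f"unknown index code: {code!r}")
    if pagesize < 1 or page < 1:
        raise ValueError("pagesize and page must be positive integers")
    query = urlencode({"pageSize": pagesize, "page": page})
    return BASE_URL.format(code=code) + "?" + query


print(build_index_url("kosdaq", 20, 3))
```

The returned string can be passed to `requests.get` exactly as in the function above.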

In [47]:
# docstring: a string that describes how to use the function
# View it with help() or Shift + Tab.
stock_price(20,3,'KOSDAQ')

| | localTradedAt | closePrice |
|---|---|---|
| 0 | 2022-06-08 | 874.95 |
| 1 | 2022-06-07 | 873.78 |
| 2 | 2022-06-03 | 891.51 |
| 3 | 2022-06-02 | 891.14 |
| 4 | 2022-05-31 | 893.36 |
| 5 | 2022-05-30 | 886.44 |
| 6 | 2022-05-27 | 873.97 |
| 7 | 2022-05-26 | 871.43 |
| 8 | 2022-05-25 | 872.69 |
| 9 | 2022-05-24 | 865.07 |
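
The docstring that `help()` and Shift + Tab display can also be read programmatically. A short sketch with the standard `inspect` module: `stock_price_stub` is a stand-in with the same signature, defined here only so the example runs without a network call.

```python
import inspect


def stock_price_stub(pagesize, page, code="KOSPI"):
    """Return date and closing price for an index (stub for illustration).

    Params
    ------
    pagesize : int : number of rows per page
    page : int : page number
    code : str : KOSPI or KOSDAQ
    """
    raise NotImplementedError("stub: only the docstring matters here")


# inspect.getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(stock_price_stub)
first_line = doc.splitlines()[0]

# inspect.signature shows the parameters and their defaults.
signature = str(inspect.signature(stock_price_stub))

print(first_line)
print(signature)
```

The raw text is also stored on the function itself as `stock_price_stub.__doc__`.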
