<h1 align="center">KRX Big-Data Contest</h1>

# [ 1 ] Overview

## 1. Sources

### - Basic

 - `[유가증권]일별 시세정보(주문번호-1300-27)` : 2020년 1분기 주식 정보 -> _CSV 형식_
 - `[유가증권]일별 시세정보(주문번호-1300-30)` : 2021년 1분기 주식 정보 -> _CSV 형식_
 - `[유가증권]일별 시세정보(주문번호-1300-33)` : 2022년 1분기 주식 정보 -> _CSV 형식_

### - Extension ( https://kr.investing.com )

 - `환율 추이` : 2020년, 2021년, 2022년 1분기 -> _CSV 형식_
 - `미국 3년 채권수익률` : 2020년, 2021년, 2022년 1분기 -> _CSV 형식_
 - 각종 주식 지표를 계산하기 위한 추가 KRX 추가 데이터
   - 2019년 12월
   - 2020년 12월
   - 2021년 12월

<br><br><br>

## 2. Targets from `.CSV` files (Input)

| 항목 명 | 항목 영어명 | 모델 학습값 여부 |
|:---:|:---:|:---:|
|`거래일자`|TRD_DD|Y|
|`종목코드`|ISU_CD|N|
|`종목명`|ISU_NM|N|
|`시가`|OPNPRC|Y|
|`고가`|HGPRC|Y|
|`저가`|LWPRC|Y|
|`종가`|CLSPRC|Y|
|(누적)`거래량`|ACC_TRDVOL|Y|
|`업종구분`(지수업종코드)|IDX_IND_CD|N|
|`PER`(주가수익률)|PER|Y|
|`상장일`|LIST_DD|N|
|`시가총액`|MKTCAP|Y|

<!-- <br><br><br>

## 3. Results (Output)

| Property | Description |
|:---:|:---:|
|TRD_DD|`거래일자`|
|ISU_CD|`종목코드`|
|ISU_NM|`종목명`|
|OPNPRC|`시가`|
|HGPRC|`고가`|
|LWPRC|`저가`|
|CLSPRC|`종가`|
|ACC_TRDVOL|(누적)`거래량`|
|IDX_IND_CD|`업종구분`(지수업종코드)|
|PER|`PER`(주가수익률)|
|LIST_DD|`상장일`|
|MKTCAP|`시가총액`|
 -->
<br><br><br><hr>

# [ 2 ] Importing Modules

In [1]:
# Data Handlers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

# Code Libraries
import os
import copy
import abc
import datetime

<br><br><br>

# [ 3 ] Declarations

In [216]:
############################################################################################################

import copy

class Utils:
    """
    데이터 처리를 위해 유용한 기능들을 정의한 클래스입니다.
    """
    @staticmethod
    def generate_int_range(start:int, end:int)->iter:
        """
        start와 end 사이의 정수들을 반환하는 Generator입니다.
        """
        if(start >= end):
            raise SyntaxError(f"{start} must be larger than {end}")
        while start <= end:
            yield start
            start += 1
            
    @staticmethod
    def clone(target:object)->object:
        """
        깊은 복사를 한 인스턴스를 반환합니다.
        """
        return copy.deepcopy(target)
    
############################################################################################################

import pandas

class PandasBasedCSVHandler:
    """
    Pandas 모듈을 기반으로 CSV 파일 데이터를 다루는 클래스입니다.
    """
    def __init__(self, handler:pandas):
        self.__handler = handler
        self.__data = dict()
        
    @property
    def handler(self)->pandas:
        """
        주입받은 Pandas 객체를 반환합니다.
        이미 생성된 handler 인스턴스는 대체될 수 없고 반환만 가능합니다.
        """
        return self.__handler
    
    def take_data_from_CSV_file(self, *, data_id:object, filepath:str, encoding:str="utf-8")->object:
        """
        불러올 CSV 파일의 경로를 받아 데이터를 가져오고
        데이터를 식별할 data_id를 받습니다.
        Argument를 반드시 키워드로 명시하여 Parameter에 전달해야 합니다.
        """
        self.__data[data_id] = self.__handler.read_csv(filepath, encoding=encoding)
            
        return self
    
    def get_CSV_data(self)->dict:
        """
        다음 Dictionary 자료구조를 반환합니다.
        key의 타입(자료형)은 정수형으로 의도되었지만 어떤 타입이 들어올지는 자유입니다.
        value는 Pandas 타입의 객체입니다.
        """
        return self.__data
    
    def validate(self, target_properties:list)->object:
        """
        모든 Pandas 데이터가 target_properties에 명시된 속성을 가지고 있는지 확인합니다.
        만일, 속성이 매칭되지 않으면 예외가 발생할 것입니다.
        모든 과정이 성공하면 True를 반환합니다.
        """
        for data_key, _ in self.__data.items():
            self.__data[data_key][target_properties]
        return True

############################################################################################################
import requests
import json
from pandas import json_normalize

class KRXStockCrawler:
    """
    KRX 정보데이터시스템 데이터 크롤링하는 클래스입니다.
    주식 관련 지표들을 계산하기 위한 추가 데이터들을 얻는 것이 목적입니다.
    """
    def __init__(self):
        self.__url = "http://data.krx.co.kr/comm/bldAttendant/getJsonData.cmd"
        self.__headers = "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
        self.__base_data = {
            "bld": "dbms/MDC/STAT/standard/MDCSTAT01701",
            "locale": "ko_KR",
            "param1isuCd_finder_stkisu0_1": "ALL",
            "share": 1,
            "money": 1,
            "csvxls_isNo": "false"
        }
        
    def execute(self, code:str, start_date:int, end_date:int)->dict:
        """
        다음 변수들을 기반으로 크롤링하고 그 결과를 딕셔너리 자료구조 형태로 반환합니다.
           code        : 종목코드
           start_date  : 시작날짜(ex. 20201201)
           end_date    : 종료날짜(ex. 20201231)
        """
        query_data = {
            "isuCd": code,
            "strtDd": start_date,
            "endDd": end_date,
        }
        res = requests.post('http://data.krx.co.kr/comm/bldAttendant/getJsonData.cmd', data=dict(**self.__base_data,**query_data))
        dict_json = json.loads(res.text)
        return dict_json['output']
    
############################################################################################################
class KRXStockData:
    """
    
    """
    pass
    

############################################################################################################

class DataVisualization:
    pass

############################################################################################################

<br><br><br>

# [ 4 ] Data Preprocessing

## 01. 준비 단계

In [217]:
# "종목코드"를 기준으로 각 CSV 데이터들을 식별하는 용도의 자료구조
index_properties:list = dict()

In [218]:
# 데이터 전처리에 필요한 속성들
selected_properties = {
    'krx':["거래일자","종목코드","종목명", "시가", "고가", "저가", "종가", "거래량", "업종구분", "PER", "상장일", "시가총액"],
    'investing.com': ["날짜", "종가", "오픈", "고가", "저가", "변동 %"]
}

## 01-A. 데이터 수집 및 전처리 단계: `KRX에서 기본적으로 제공받은 CSV`

In [219]:
# class<PandasBasedCSVHandler> 인스턴스 생성
csv_handler_krx:PandasBasedCSVHandler = PandasBasedCSVHandler(pd)

### (1) 데이터 경로 및 이름 설정

In [220]:
# CSV 파일 루트 경로
root_dir:str = os.path.join("..", "data")

# CSV 파일 전체 경로 및 이름 형식
filepath_form:str = os.path.join(root_dir, "{0}","{0}_{1}.csv")

# 각 CSV 파일들을 가져오기 위한 프로파일 list<dict[]>
csv_file_profiles : list = [
    {
        "name" : "[유가증권]일별 시세정보(주문번호-1300-27)",
        "date_range" : [202001, 202003]
    },
    {
        "name" : "[유가증권]일별 시세정보(주문번호-1300-30)",
        "date_range" : [202101, 202103]
    },
    {
        "name" : "[유가증권]일별 시세정보(주문번호-1300-33)",
        "date_range" : [202201, 202203]
    }
]

### (2) CSV 파일 데이터 불러오기

In [221]:
# CSV 프로파일 기반으로 class<PandasBasedCSVHandler> 인스턴스에 데이터 병합
for csv_file_info in csv_file_profiles: # 프로파일 요소 기반 iteration 작업
    for date_num in Utils.generate_int_range(csv_file_info["date_range"][0], csv_file_info["date_range"][1]): # Iterator 생성
        csv_handler_krx.take_data_from_CSV_file(
            data_id = date_num, # Year + Month 형식의 정수
            filepath = filepath_form.format(csv_file_info["name"], date_num), # 폴더를 포함한 전체 경로의 파일명
            encoding="cp949" # 파일 인코딩 명시
        )

### (3) 데이터 속성 검증하기

In [222]:
# 검증할 속성 내용들은 [1. 데이터 경로 및 이름 설정]의 selected_properties 변수 참고
csv_handler_krx.validate(selected_properties['krx'])

True

### (4) 데이터 정형화

In [223]:
dataframe_list = csv_handler_krx.get_CSV_data()

 - 날짜 형식 일치시키기

In [224]:
for df_id in dataframe_list:
    dataframe_list[df_id]["거래일자"] = pd.to_datetime(dataframe_list[df_id]['거래일자'].astype('str'))  

#### - **데이터 형식**

```js
{
    202001 : Pandas,
    202002 : Pandas,
    202003 : Pandas,
    202101 : Pandas,
    202102 : Pandas,
    202103 : Pandas,
    202201 : Pandas,
    202202 : Pandas,
    202203 : Pandas
}
```

<br>
<br>
<br>
<br>
<br>

## 01-B. 데이터 수집 및 전처리 단계: `미국 3년 채권수익률`

In [225]:
# class<PandasBasedCSVHandler> 인스턴스 생성
csv_handler_bond:PandasBasedCSVHandler = PandasBasedCSVHandler(pd)

### (1) 데이터 경로 및 이름 설정

In [226]:
# CSV 파일 루트 경로
root_dir:str = os.path.join("..", "data")

# CSV 파일 전체 경로 및 이름 형식
filepath_form:str = os.path.join(root_dir, "{0}","{1}.csv")

# 각 CSV 파일들을 가져오기 위한 프로파일 list<dict[]>
csv_file_profiles : list = [
    {
        "name" : "미국 3년 채권수익률",
        "date_range" : [202001, 202003]
    },
    {
        "name" : "미국 3년 채권수익률",
        "date_range" : [202101, 202103]
    },
    {
        "name" : "미국 3년 채권수익률",
        "date_range" : [202201, 202203]
    }
]

### (2) CSV 파일 데이터 불러오기

In [227]:
# CSV 프로파일 기반으로 class<PandasBasedCSVHandler> 인스턴스에 데이터 병합
for csv_file_info in csv_file_profiles: # 프로파일 요소 기반 iteration 작업
    for date_num in Utils.generate_int_range(csv_file_info["date_range"][0], csv_file_info["date_range"][1]): # Iterator 생성
        csv_handler_bond.take_data_from_CSV_file(
            data_id = date_num, # Year + Month 형식의 정수
            filepath = filepath_form.format(csv_file_info["name"], date_num), # 폴더를 포함한 전체 경로의 파일명
            encoding="cp949" # 파일 인코딩 명시
        )

### (3) 데이터 속성 검증하기

In [228]:
# 검증할 속성 내용들은 [1. 데이터 경로 및 이름 설정]의 selected_properties 변수 참고
csv_handler_bond.validate(selected_properties['investing.com'])

True

### (4) 데이터 정형화

In [229]:
dataframe_list = csv_handler_bond.get_CSV_data()

 - 날짜 형식 일치시키기

In [230]:
for df_id in dataframe_list:
    dataframe_list[df_id]['날짜'] = pd.to_datetime(dataframe_list[df_id]['날짜'], format='%Y년 %m월 %d일')  

 - DataFrame 병합과 데이터 식별을 위해 각 컬럼 이름 바꿔주기

In [231]:
for df_id in dataframe_list:
    column_replacer = ["BOND " + column_name for column_name in selected_properties['investing.com'] if column_name != "날짜"]
    column_replacer.insert(0,"거래일자")
    dataframe_list[df_id].columns = column_replacer

 - 날짜를 기준으로 역순으로 배열하기

In [232]:
for df_id in dataframe_list:
    dataframe_list[df_id] = pd.concat([dataframe_list[df_id].iloc[::-1]], ignore_index=True) 

#### - **데이터 형식**

```js
{
    202001 : Pandas,
    202002 : Pandas,
    202003 : Pandas,
    202101 : Pandas,
    202102 : Pandas,
    202103 : Pandas,
    202201 : Pandas,
    202202 : Pandas,
    202203 : Pandas
}
```

<br>
<br>
<br>
<br>
<br>

## 01-C. 데이터 수집 및 전처리 단계: `환율 추이`

In [233]:
# class<PandasBasedCSVHandler> 인스턴스 생성
csv_handler_exchange_rate:PandasBasedCSVHandler = PandasBasedCSVHandler(pd)

### (1) 데이터 경로 및 이름 설정

In [234]:
# CSV 파일 루트 경로
root_dir:str = os.path.join("..", "data")

# CSV 파일 전체 경로 및 이름 형식
filepath_form:str = os.path.join(root_dir, "{0}","{1}.csv")

# 각 CSV 파일들을 가져오기 위한 프로파일 list<dict[]>
csv_file_profiles : list = [
    {
        "name" : "환율 추이",
        "date_range" : [202001, 202003]
    },
    {
        "name" : "환율 추이",
        "date_range" : [202101, 202103]
    },
    {
        "name" : "환율 추이",
        "date_range" : [202201, 202203]
    }
]

### (2) CSV 파일 데이터 불러오기

In [235]:
# CSV 프로파일 기반으로 class<PandasBasedCSVHandler> 인스턴스에 데이터 병합
for csv_file_info in csv_file_profiles: # 프로파일 요소 기반 iteration 작업
    for date_num in Utils.generate_int_range(csv_file_info["date_range"][0], csv_file_info["date_range"][1]): # Iterator 생성
        csv_handler_exchange_rate.take_data_from_CSV_file(
            data_id = date_num, # Year + Month 형식의 정수
            filepath = filepath_form.format(csv_file_info["name"], date_num), # 폴더를 포함한 전체 경로의 파일명
            encoding="cp949" # 파일 인코딩 명시
        )

### (3) 데이터 속성 검증하기

In [236]:
# 검증할 속성 내용들은 [1. 데이터 경로 및 이름 설정]의 selected_properties 변수 참고
csv_handler_exchange_rate.validate(selected_properties['investing.com'])

True

### (4) 데이터 정형화

In [237]:
dataframe_list = csv_handler_exchange_rate.get_CSV_data()

 - 날짜 형식 일치시키기

In [238]:
for df_id in dataframe_list:
    dataframe_list[df_id]['날짜'] = pd.to_datetime(dataframe_list[df_id]['날짜'], format='%Y년 %m월 %d일')  

 - DataFrame 병합과 데이터 식별을 위해 각 컬럼 이름 바꿔주기

In [239]:
for df_id in dataframe_list:
    column_replacer = ["EX_RATE " + column_name for column_name in selected_properties['investing.com'] if column_name != "날짜"]
    column_replacer.insert(0,"거래일자")
    dataframe_list[df_id].columns = column_replacer

 - 날짜를 기준으로 역순으로 배열하기

In [240]:
for df_id in dataframe_list:
    dataframe_list[df_id] = pd.concat([dataframe_list[df_id].iloc[::-1]], ignore_index=True) 

#### - **데이터 형식**

```js
{
    202001 : Pandas,
    202002 : Pandas,
    202003 : Pandas,
    202101 : Pandas,
    202102 : Pandas,
    202103 : Pandas,
    202201 : Pandas,
    202202 : Pandas,
    202203 : Pandas
}
```

<hr>

## 02. 수집된 데이터 정리 및 주식 관련 지표 추가

### (1) KRX

In [241]:
krx_data = Utils.clone(csv_handler_krx.get_CSV_data())
krx_data[202001][selected_properties['krx']].head()

Unnamed: 0,거래일자,종목코드,종목명,시가,고가,저가,종가,거래량,업종구분,PER,상장일,시가총액
0,2020-01-02,KR7000020008,동화약품보통주,8340,8400,8290,8400,111305,의약품 제조업,23.01,19760324,234624348000
1,2020-01-03,KR7000020008,동화약품보통주,8400,8450,8290,8360,96437,의약품 제조업,22.9,19760324,233507089200
2,2020-01-06,KR7000020008,동화약품보통주,8290,8330,8120,8180,73230,의약품 제조업,22.41,19760324,228479424600
3,2020-01-07,KR7000020008,동화약품보통주,8200,8280,8090,8160,117904,의약품 제조업,22.36,19760324,227920795200
4,2020-01-08,KR7000020008,동화약품보통주,8170,8170,7830,7930,263246,의약품 제조업,21.73,19760324,221496557100


### (2) Investing (미국 3년 채권수익률)

In [242]:
bond_data = Utils.clone(csv_handler_bond.get_CSV_data())
bond_data[202001].head()

Unnamed: 0,거래일자,BOND 종가,BOND 오픈,BOND 고가,BOND 저가,BOND 변동 %
0,2020-01-01,1.6086,1.6086,1.6086,1.6086,0.00%
1,2020-01-02,1.595,1.614,1.625,1.565,-0.85%
2,2020-01-03,1.5457,1.562,1.573,1.524,-3.09%
3,2020-01-05,1.524,1.5267,1.5267,1.524,-1.40%
4,2020-01-06,1.5593,1.527,1.576,1.516,2.32%


### (3) Investing (환율)

In [243]:
exchange_rate_data = Utils.clone(csv_handler_exchange_rate.get_CSV_data())
exchange_rate_data[202001].head()

Unnamed: 0,거래일자,EX_RATE 종가,EX_RATE 오픈,EX_RATE 고가,EX_RATE 저가,EX_RATE 변동 %
0,2020-01-01,1154.02,1155.07,1155.32,1154.08,0.00%
1,2020-01-02,1157.35,1155.02,1161.15,1153.48,0.29%
2,2020-01-03,1164.95,1157.94,1168.83,1155.7,0.66%
3,2020-01-06,1166.94,1165.89,1172.99,1165.78,0.17%
4,2020-01-07,1167.3,1167.54,1168.82,1163.11,0.03%


#### - 전처리 결과 변수

| Variable Name | Description |
|:---:|:---:|
|`krx_data`| KRX 유가증권 CSV 데이터 모음|
|`bond_data`| Investing.com 미국 3년 채권수익률|
|`exchange_rate_data`| Investing.com 환율 추이|

## 03. 데이터 병합

In [252]:
exchange_rate_and_bond = exchange_rate_data[202001].merge(bond_data[202001])
krx_and_exchange_rate_and_bond = krx_data[202001][selected_properties['krx']].merge(exchange_rate_and_bond, on='거래일자')

 - 특정 종목을 통한 병합 확인

In [253]:
krx_and_exchange_rate_and_bond.loc[krx_and_exchange_rate_and_bond["종목코드"] == "KR7000020008"]

Unnamed: 0,거래일자,종목코드,종목명,시가,고가,저가,종가,거래량,업종구분,PER,...,EX_RATE 종가,EX_RATE 오픈,EX_RATE 고가,EX_RATE 저가,EX_RATE 변동 %,BOND 종가,BOND 오픈,BOND 고가,BOND 저가,BOND 변동 %
0,2020-01-02,KR7000020008,동화약품보통주,8340,8400,8290,8400,111305,의약품 제조업,23.01,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%
916,2020-01-03,KR7000020008,동화약품보통주,8400,8450,8290,8360,96437,의약품 제조업,22.9,...,1164.95,1157.94,1168.83,1155.7,0.66%,1.5457,1.562,1.573,1.524,-3.09%
1832,2020-01-06,KR7000020008,동화약품보통주,8290,8330,8120,8180,73230,의약품 제조업,22.41,...,1166.94,1165.89,1172.99,1165.78,0.17%,1.5593,1.527,1.576,1.516,2.32%
2748,2020-01-07,KR7000020008,동화약품보통주,8200,8280,8090,8160,117904,의약품 제조업,22.36,...,1167.3,1167.54,1168.82,1163.11,0.03%,1.5511,1.562,1.57,1.546,-0.53%
3664,2020-01-08,KR7000020008,동화약품보통주,8170,8170,7830,7930,263246,의약품 제조업,21.73,...,1162.25,1168.3,1179.67,1160.5,-0.43%,1.6071,1.508,1.637,1.468,3.61%
4580,2020-01-09,KR7000020008,동화약품보통주,8020,8060,7900,7900,50346,의약품 제조업,21.64,...,1158.72,1162.49,1163.55,1157.41,-0.30%,1.5964,1.604,1.631,1.591,-0.67%
5496,2020-01-10,KR7000020008,동화약품보통주,7970,8140,7880,8100,77059,의약품 제조업,22.19,...,1157.97,1159.73,1164.16,1157.55,-0.06%,1.5857,1.599,1.612,1.575,-0.67%
6412,2020-01-13,KR7000020008,동화약품보통주,8140,8250,8070,8220,91646,의약품 제조업,22.52,...,1153.95,1158.92,1159.57,1153.13,-0.35%,1.6018,1.626,1.626,1.588,1.02%
7328,2020-01-14,KR7000020008,동화약품보통주,8240,8240,8070,8140,100901,의약품 제조업,22.3,...,1157.2,1154.95,1159.66,1150.3,0.28%,1.5803,1.602,1.612,1.575,-1.34%
8244,2020-01-15,KR7000020008,동화약품보통주,8160,8170,8000,8090,72255,의약품 제조업,22.16,...,1157.91,1158.2,1162.8,1155.85,0.06%,1.5643,1.578,1.586,1.559,-1.01%


 - 변수 이름 변경

In [254]:
total_data = krx_and_exchange_rate_and_bond

#### - 병합된 전체 데이터 변수

| Variable Name | Description |
|:---:|:---:|
| **total_data** | `krx_data` + `bond_data` + `exchange_rate_data`|

# [ 5 ] Stock Predictions

In [255]:
# total_data 이 변수를 활용하시면 됩니다.

In [256]:
total_data.head(5)

Unnamed: 0,거래일자,종목코드,종목명,시가,고가,저가,종가,거래량,업종구분,PER,...,EX_RATE 종가,EX_RATE 오픈,EX_RATE 고가,EX_RATE 저가,EX_RATE 변동 %,BOND 종가,BOND 오픈,BOND 고가,BOND 저가,BOND 변동 %
0,2020-01-02,KR7000020008,동화약품보통주,8340,8400,8290,8400,111305,의약품 제조업,23.01,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%
1,2020-01-02,KR7000040006,KR모터스보통주,288,288,278,282,511750,그외 기타 운송장비 제조업,-,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%
2,2020-01-02,KR7000050005,경방보통주,9280,9600,9270,9580,51436,종합 소매업,12.95,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%
3,2020-01-02,KR7000060004,메리츠화재해상보험보통주,17950,17950,17050,17150,301623,보험업,8.08,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%
4,2020-01-02,KR7000070003,삼양홀딩스보통주,67000,68000,65700,67300,9070,기타 금융업,7.44,...,1157.35,1155.02,1161.15,1153.48,0.29%,1.595,1.614,1.625,1.565,-0.85%


In [257]:
total_data.columns

Index(['거래일자', '종목코드', '종목명', '시가', '고가', '저가', '종가', '거래량', '업종구분', 'PER',
       '상장일', '시가총액', 'EX_RATE 종가', 'EX_RATE 오픈', 'EX_RATE 고가', 'EX_RATE 저가',
       'EX_RATE 변동 %', 'BOND 종가', 'BOND 오픈', 'BOND 고가', 'BOND 저가',
       'BOND 변동 %'],
      dtype='object')

In [43]:
import datetime
start_date = datetime.datetime.strptime('2020-02-28', '%Y-%m-%d')
start_date.date()

data = { 'test' : [start_date.date()] }

pd.DataFrame(data)

Unnamed: 0,test
0,2020-02-28


In [158]:
start_date = datetime.datetime.strptime('2020년 01월 07일', '%Y년 %m월 %d일')
data = { 'test' : [start_date] }

pd.DataFrame(data)

Unnamed: 0,test
0,2020-01-07


In [160]:
jo = KRXStockCrawler().target("KR7000020008", 20191201, 20191231)
jo

[{'TRD_DD': '2019/12/30',
  'TDD_CLSPRC': '8,310',
  'FLUC_TP_CD': '1',
  'CMPPREVDD_PRC': '170',
  'FLUC_RT': '2.09',
  'TDD_OPNPRC': '8,110',
  'TDD_HGPRC': '8,420',
  'TDD_LWPRC': '8,110',
  'ACC_TRDVOL': '198,584',
  'ACC_TRDVAL': '1,656,632,670',
  'MKTCAP': '232,110,515,700',
  'LIST_SHRS': '27,931,470'},
 {'TRD_DD': '2019/12/27',
  'TDD_CLSPRC': '8,140',
  'FLUC_TP_CD': '1',
  'CMPPREVDD_PRC': '140',
  'FLUC_RT': '1.75',
  'TDD_OPNPRC': '8,010',
  'TDD_HGPRC': '8,150',
  'TDD_LWPRC': '7,980',
  'ACC_TRDVOL': '57,641',
  'ACC_TRDVAL': '465,725,790',
  'MKTCAP': '227,362,165,800',
  'LIST_SHRS': '27,931,470'},
 {'TRD_DD': '2019/12/26',
  'TDD_CLSPRC': '8,000',
  'FLUC_TP_CD': '2',
  'CMPPREVDD_PRC': '-150',
  'FLUC_RT': '-1.84',
  'TDD_OPNPRC': '8,160',
  'TDD_HGPRC': '8,160',
  'TDD_LWPRC': '8,000',
  'ACC_TRDVOL': '318,154',
  'ACC_TRDVAL': '2,563,066,660',
  'MKTCAP': '223,451,760,000',
  'LIST_SHRS': '27,931,470'},
 {'TRD_DD': '2019/12/24',
  'TDD_CLSPRC': '8,150',
  'FLUC_TP_

### (4) 

#### KRX 종목을 기준으로 데이터 분리

```js
```

In [16]:
csv_data = csv_handler.get_CSV_data()
csv_data[202002]

Unnamed: 0,거래일자,시장ID,종목코드,종목명,시가,고가,저가,종가,거래량,거래대금,...,주문량기준취소율,종목별거래횟수기준매수매도불균형,종목별거래량기준매수매도불균형,주문횟수기준주문불균형,주문량기준주문불균형,평균호가스프레드,평균비율스프레드,평균유효스프레드,평균실현스프레드,HS 역선택비용
0,20200203,STK,KR7000020008,동화약품보통주,7790,7840,7370,7530,394550,2967584650,...,0.352018,0.380111,0.436301,1.402451,0.859184,14.635436,0.192036,17.111547,0.303962,16.807585
1,20200204,STK,KR7000020008,동화약품보통주,7550,7640,7530,7600,87815,666582090,...,0.367710,0.660454,0.487321,0.889005,0.950656,12.786766,0.170271,13.298217,6.397621,6.900596
2,20200205,STK,KR7000020008,동화약품보통주,7640,7770,7630,7680,120557,925304910,...,0.383564,0.508726,0.434874,0.945687,0.911906,12.402631,0.168878,15.636998,-2.842870,18.479868
3,20200206,STK,KR7000020008,동화약품보통주,7750,7890,7690,7750,116790,909542280,...,0.353751,0.433333,0.462323,1.338227,1.023081,16.000567,0.181895,18.468254,7.411765,11.056489
4,20200207,STK,KR7000020008,동화약품보통주,7750,7790,7580,7670,182278,1399518850,...,0.319877,0.323601,0.317865,1.582620,0.779785,18.445407,0.243364,29.355231,16.950221,12.405010
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18324,20200224,STK,KYG5307W1015,엘브이엠씨홀딩스보통주,4835,5070,4770,5070,435177,2138539185,...,0.175747,0.523097,0.572139,1.992132,1.014681,13.430841,0.229276,12.694611,-4.667135,17.361746
18325,20200225,STK,KYG5307W1015,엘브이엠씨홀딩스보통주,5080,5150,4970,5070,234995,1189625830,...,0.207188,0.530172,0.462496,0.954404,0.894499,17.561368,0.247898,13.300493,-2.688278,15.988771
18326,20200226,STK,KYG5307W1015,엘브이엠씨홀딩스보통주,5050,5130,4980,5040,217641,1099346705,...,0.165647,0.375212,0.297810,1.407659,0.767952,13.510749,0.270839,16.419205,7.811951,8.607254
18327,20200227,STK,KYG5307W1015,엘브이엠씨홀딩스보통주,5000,5110,4850,4940,279812,1396515140,...,0.216348,0.354447,0.210917,1.103750,0.712317,21.297175,0.262642,13.877159,-1.620339,15.497498


In [339]:
for _, data_value in csv_data.items():
    selected_df = data_value[selected_properties]
    corps_id = selected_df[["종목코드","종목명"]].drop_duplicates()
    print(id(selected_df), id(corps_id))
    break
    
indicator = Utils.clone(selected_df)

1805686697600 1805684077520


In [194]:
indicator.tail()

Unnamed: 0,거래일자,종목코드,종목명,시가,고가,저가,종가,거래량,업종구분,PER,상장일,시가총액
18322,20200123,KYG5307W1015,엘브이엠씨홀딩스보통주,4220,4325,4200,4290,178258,자동차 판매업,-,20101130,219306683310
18323,20200128,KYG5307W1015,엘브이엠씨홀딩스보통주,4050,4200,3740,4095,348080,자동차 판매업,-,20101130,209338197705
18324,20200129,KYG5307W1015,엘브이엠씨홀딩스보통주,4095,4180,4095,4105,104726,자동차 판매업,-,20101130,209849402095
18325,20200130,KYG5307W1015,엘브이엠씨홀딩스보통주,4100,4145,3985,4050,145074,자동차 판매업,-,20101130,207037777950
18326,20200131,KYG5307W1015,엘브이엠씨홀딩스보통주,4000,4250,4000,4085,193650,자동차 판매업,-,20101130,208826993315


In [219]:
import datetime
pd.to_datetime(indicator["거래일자"].astype('str'))

0       2020-01-02
1       2020-01-03
2       2020-01-06
3       2020-01-07
4       2020-01-08
           ...    
18322   2020-01-23
18323   2020-01-28
18324   2020-01-29
18325   2020-01-30
18326   2020-01-31
Name: 거래일자, Length: 18327, dtype: datetime64[ns]

In [100]:
for a in raw_df:
    print(a)

Open
High
Low
Close
Volume


In [341]:
indicator.loc[indicator["종목코드"] == "KR7000020008"]

Unnamed: 0,거래일자,종목코드,종목명,시가,고가,저가,종가,거래량,업종구분,PER,상장일,시가총액
0,20200102,KR7000020008,동화약품보통주,8340,8400,8290,8400,111305,의약품 제조업,23.01,19760324,234624348000
1,20200103,KR7000020008,동화약품보통주,8400,8450,8290,8360,96437,의약품 제조업,22.9,19760324,233507089200
2,20200106,KR7000020008,동화약품보통주,8290,8330,8120,8180,73230,의약품 제조업,22.41,19760324,228479424600
3,20200107,KR7000020008,동화약품보통주,8200,8280,8090,8160,117904,의약품 제조업,22.36,19760324,227920795200
4,20200108,KR7000020008,동화약품보통주,8170,8170,7830,7930,263246,의약품 제조업,21.73,19760324,221496557100
5,20200109,KR7000020008,동화약품보통주,8020,8060,7900,7900,50346,의약품 제조업,21.64,19760324,220658613000
6,20200110,KR7000020008,동화약품보통주,7970,8140,7880,8100,77059,의약품 제조업,22.19,19760324,226244907000
7,20200113,KR7000020008,동화약품보통주,8140,8250,8070,8220,91646,의약품 제조업,22.52,19760324,229596683400
8,20200114,KR7000020008,동화약품보통주,8240,8240,8070,8140,100901,의약품 제조업,22.3,19760324,227362165800
9,20200115,KR7000020008,동화약품보통주,8160,8170,8000,8090,72255,의약품 제조업,22.16,19760324,225965592300


<hr><hr><hr><hr><hr><hr><hr><hr><hr><hr><hr>

In [3]:
import random
import datetime

start_date = datetime.datetime.strptime(str(20210406), '%Y%m%d')

def randomly(start=0, end=100):
    for _ in range(30):
        yield random.randrange(start, end)

raw_data = {
            'Date': [start_date.date() + datetime.timedelta(i) for i in range(30)],
            'Open': list(randomly()),
            'High': list(randomly()),
            'Low': list(randomly()),
            'Close': list(randomly()),
            'Volume': list(randomly(start=1000, end=10000))
           }

raw_df = pd.DataFrame(raw_data)

In [4]:
raw_df

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2021-04-06,15,3,79,82,3290
1,2021-04-07,50,81,0,7,6502
2,2021-04-08,4,87,86,71,7801
3,2021-04-09,49,35,79,85,3330
4,2021-04-10,19,80,33,75,3579
5,2021-04-11,86,63,79,57,6192
6,2021-04-12,67,82,28,14,5756
7,2021-04-13,25,35,66,55,9838
8,2021-04-14,9,78,39,88,5035
9,2021-04-15,45,32,22,61,1327


In [324]:
start_date = datetime.datetime.strptime(str(20210506), '%Y%m%d') 

raw_data2 = {
            'Date': [start_date.date() + datetime.timedelta(i) for i in range(30)],
            'Open': list(randomly()),
            'High': list(randomly()),
            'Low': list(randomly()),
            'Close': list(randomly()),
            'Volume': list(randomly(start=1000, end=10000))
           }

raw_df2 = pd.DataFrame(raw_data2)

In [325]:
raw_df2.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2021-05-06,44,38,16,77,2809
1,2021-05-07,49,5,70,1,2959
2,2021-05-08,92,58,35,5,1918
3,2021-05-09,72,63,49,95,7482
4,2021-05-10,77,96,43,79,5282


In [326]:
concated = pd.concat([raw_df, raw_df2], ignore_index=True)

In [327]:
a = concated[["Date","Close"]]

In [328]:
print(id(concated), id(a))

1805686696976 1805686696832


In [None]:
import pandas
import numpy

class StockAnalyzer:
    """
    주식 분석에 필요한 계산 및 분석을 위한 클래스입니다.
    모든 함수들은 모듈 pandas에 대한 의존성을 가집니다.
    """
    @staticmethod
    def RSI_calculation(values)->float:
        """
        Calculation of Relative Strength Index (RSI)     
        Avg(PriceUp)/(Avg(PriceUP)+Avg(PriceDown)*100
        Where: PriceUp(t)=1*(Price(t)-Price(t-1)){Price(t)- Price(t-1)>0};
               PriceDown(t)=-1*(Price(t)-Price(t-1)){Price(t)- Price(t-1)<0};
        """
        up = values[values>0].mean()
        down = -1*values[values<0].mean()
        return 100 * up / (up + down)
    
    # as_of_property 수정 필요
    @staticmethod
    def add_Momentum_1D(ins_ref:pandas, as_of_property:str)->pandas:
        """
        Add Momentum_1D column for all 15 stocks.
        Momentum_1D = P(t) - P(t-1)
        """
        ins_ref['Momentum_1D'] = (ins_ref[as_of_property]-ins_ref[as_of_property].shift(1)).fillna(0)
        return ins_ref
    
    @staticmethod
    def add_RSI_14D(ins_ref:pandas)->pandas:
        """
        Calculation of Relative Strength Index (RSI)
        """
        ins_ref['RSI_14D'] = ins_ref['Momentum_1D'].rolling(center=False, window=14).apply(StockAnalyzer.RSI_calculation).fillna(0)
        return ins_ref
        
    @staticmethod
    def bollinger_bands_calculation(price:int, length:int=30, numsd:int=2)->float:
        """ 
        Calculation of Bollinger Bands
        returns average, upper band, and lower band
        """
        ave = price.rolling(window = length, center = False).mean()
        sd = price.rolling(window = length, center = False).std()
        upband = ave + (sd*numsd)
        dnband = ave - (sd*numsd)
        print(type(numpy.round(ave,3)), type(numpy.round(upband,3)), type(numpy.round(dnband,3)))
        return numpy.round(ave,3), numpy.round(upband,3), numpy.round(dnband,3)
        
    # as_of_property 수정 필요
    @staticmethod
    def add_BB_Band(ins_ref:pandas, as_of_property:str)->pandas:
        ins_ref['BB_Middle_Band'], ins_ref['BB_Upper_Band'], ins_ref['BB_Lower_Band'] = StockAnalyzer.bollinger_bands_calculation(ins_ref[as_of_property], length=20, numsd=1)
        ins_ref['BB_Middle_Band'] = ins_ref['BB_Middle_Band'].fillna(0)
        ins_ref['BB_Upper_Band'] = ins_ref['BB_Upper_Band'].fillna(0)
        ins_ref['BB_Lower_Band'] = ins_ref['BB_Lower_Band'].fillna(0)
        return ins_ref

    # Date, High, Low 수정 필요
    @staticmethod
    def aroon_oscillator_calculation(df, tf=25):  
        """
        Calculation of Aroon Oscillator
        return type of tuple<list<float>>
        """
        aroonup = []
        aroondown = []
        x = tf
        while x < len(df['Date']):
            aroon_up = ((df['High'][x-tf:x].tolist().index(max(df['High'][x-tf:x])))/float(tf))*100
            aroon_down = ((df['Low'][x-tf:x].tolist().index(min(df['Low'][x-tf:x])))/float(tf))*100
            aroonup.append(aroon_up)
            aroondown.append(aroon_down)
            x+=1
        return aroonup, aroondown
    
    @staticmethod
    def add_aroon_oscillator(ins_ref:pandas)->pandas:
        """
        """
        list_of_zeros = [0] * 25
        up, down = StockAnalyzer.aroon_oscillator_calculation(ins_ref)
        aroon_list = [x - y for x, y in zip(up,down)]
        if len(aroon_list) == 0:
            aroon_list = [0] * ins_ref.shape[0]
            ins_ref['Aroon_Oscillator'] = aroon_list
        else:
            ins_ref['Aroon_Oscillator'] = list_of_zeros + aroon_list
        return ins_ref

    # Close, Volume 수정 필요
    @staticmethod
    def add_PVT(ins_ref:pandas)->pandas:
        """
        Calculation of Price Volume Trend
        PVT = [((CurrentClose - PreviousClose) / PreviousClose) x Volume] + PreviousPVT
        """
        ins_ref["PVT"] = (ins_ref['Momentum_1D']/ ins_ref['Close'].shift(1)) * ins_ref['Volume']
        ins_ref["PVT"] = ins_ref["PVT"] - ins_ref["PVT"].shift(1)
        ins_ref["PVT"] = ins_ref["PVT"].fillna(0)
        return ins_ref
    
    # Close, High, Low 수정필요
    @staticmethod
    def add_AB_Band(ins_ref:pandas)->pandas:
        """
        Calculation of Acceleration Bands
        """
        #ins_ref['AB_Middle_Band'] = pd.rolling_mean(df['Close'], 20)
        ins_ref['AB_Middle_Band'] = ins_ref['Close'].rolling(window = 20, center=False).mean()
        # High * ( 1 + 4 * (High - Low) / (High + Low))
        ins_ref['aupband'] = ins_ref['High'] * (1 + 4 * (ins_ref['High']-ins_ref['Low'])/(ins_ref['High']+ins_ref['Low']))
        ins_ref['AB_Upper_Band'] = ins_ref['aupband'].rolling(window=20, center=False).mean()
        # Low *(1 - 4 * (High - Low)/ (High + Low))
        ins_ref['adownband'] = ins_ref['Low'] * (1 - 4 * (ins_ref['High']-ins_ref['Low'])/(ins_ref['High']+ins_ref['Low']))
        ins_ref['AB_Lower_Band'] = ins_ref['adownband'].rolling(window=20, center=False).mean()
        ins_ref = ins_ref.fillna(0)
        return ins_ref

In [6]:
StockAnalyzer.add_Momentum_1D(raw_df, "Close")
StockAnalyzer.add_RSI_14D(raw_df).tail(5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Momentum_1D,RSI_14D
25,2021-05-01,5,46,59,32,6816,-31.0,66.364812
26,2021-05-02,14,57,23,17,7302,-15.0,56.818182
27,2021-05-03,46,78,16,69,7719,52.0,66.631908
28,2021-05-04,60,33,87,31,9251,-38.0,63.645418
29,2021-05-05,81,57,78,24,3167,-7.0,67.823344


In [331]:
StockAnalyzer.add_BB_Band(raw_df, "Close").tail(5)

<class 'pandas.core.series.Series'> <class 'pandas.core.series.Series'> <class 'pandas.core.series.Series'>


Unnamed: 0,Date,Open,High,Low,Close,Volume,Momentum_1D,RSI_14D,BB_Middle_Band,BB_Upper_Band,BB_Lower_Band
25,2021-05-01,93,11,4,99,2744,23.0,59.318182,51.1,82.768,19.432
26,2021-05-02,53,90,22,44,6176,-55.0,54.149378,51.2,82.84,19.56
27,2021-05-03,14,71,53,35,8377,-9.0,57.459926,51.25,82.863,19.637
28,2021-05-04,57,73,4,65,6065,30.0,56.437768,49.9,80.231,19.569
29,2021-05-05,69,93,85,75,2054,10.0,52.235294,50.2,80.759,19.641


In [332]:
StockAnalyzer.add_aroon_oscillator(raw_df).tail()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Momentum_1D,RSI_14D,BB_Middle_Band,BB_Upper_Band,BB_Lower_Band,Aroon_Oscillator
25,2021-05-01,93,11,4,99,2744,23.0,59.318182,51.1,82.768,19.432,88.0
26,2021-05-02,53,90,22,44,6176,-55.0,54.149378,51.2,82.84,19.56,88.0
27,2021-05-03,14,71,53,35,8377,-9.0,57.459926,51.25,82.863,19.637,88.0
28,2021-05-04,57,73,4,65,6065,30.0,56.437768,49.9,80.231,19.569,4.0
29,2021-05-05,69,93,85,75,2054,10.0,52.235294,50.2,80.759,19.641,4.0


In [333]:
StockAnalyzer.add_PVT(raw_df).tail()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Momentum_1D,RSI_14D,BB_Middle_Band,BB_Upper_Band,BB_Lower_Band,Aroon_Oscillator,PVT
25,2021-05-01,93,11,4,99,2744,23.0,59.318182,51.1,82.768,19.432,88.0,2479.238257
26,2021-05-02,53,90,22,44,6176,-55.0,54.149378,51.2,82.84,19.56,88.0,-4261.532164
27,2021-05-03,14,71,53,35,8377,-9.0,57.459926,51.25,82.863,19.637,88.0,1717.633838
28,2021-05-04,57,73,4,65,6065,30.0,56.437768,49.9,80.231,19.569,4.0,6912.048701
29,2021-05-05,69,93,85,75,2054,10.0,52.235294,50.2,80.759,19.641,4.0,-4882.571429


In [336]:
StockAnalyzer.add_AB_Band(raw_df).tail()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Momentum_1D,RSI_14D,BB_Middle_Band,BB_Upper_Band,BB_Lower_Band,Aroon_Oscillator,PVT,AB_Middle_Band,aupband,AB_Upper_Band,adownband,AB_Lower_Band
25,2021-05-01,93,11,4,99,2744,23.0,59.318182,51.1,82.768,19.432,88.0,2479.238257,51.1,31.533333,84.23295,-3.466667,48.73295
26,2021-05-02,53,90,22,44,6176,-55.0,54.149378,51.2,82.84,19.56,88.0,-4261.532164,51.2,308.571429,95.500562,-31.428571,44.000562
27,2021-05-03,14,71,53,35,8377,-9.0,57.459926,51.25,82.863,19.637,88.0,1717.633838,51.25,112.225806,95.9061,22.225806,44.1561
28,2021-05-04,57,73,4,65,6065,30.0,56.437768,49.9,80.231,19.569,4.0,6912.048701,49.9,334.662338,110.877347,-10.337662,38.127347
29,2021-05-05,69,93,85,75,2054,10.0,52.235294,50.2,80.759,19.641,4.0,-4882.571429,50.2,109.719101,116.756159,69.719101,37.506159


In [338]:
columns_to_drop = ['Momentum_1D', 'aupband', 'adownband']
raw_df = raw_df.drop(labels = columns_to_drop, axis=1)
raw_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,RSI_14D,BB_Middle_Band,BB_Upper_Band,BB_Lower_Band,Aroon_Oscillator,PVT,AB_Middle_Band,AB_Upper_Band,AB_Lower_Band
0,2021-04-06,41,62,31,10,7856,0.0,0.0,0.0,0.0,0.0,0.0,,,
1,2021-04-07,2,8,54,26,9452,0.0,0.0,0.0,0.0,0.0,0.0,,,
2,2021-04-08,85,48,0,33,5168,0.0,0.0,0.0,0.0,0.0,-13731.815385,,,
3,2021-04-09,9,7,70,64,1740,0.0,0.0,0.0,0.0,0.0,243.160839,,,
4,2021-04-10,95,15,58,2,2257,0.0,0.0,0.0,0.0,0.0,-3821.014205,,,


visualization

# 정리

## 파라미터
 - (Required)시가, 종가, 고가, 저가, Volume
 - (Optional)RSI, AROON, ...
 - (Optional)금리, 환율
 - Visualization
 
시가, 종가, 고가, 저가, Volume

## Process

 1. 데이터 정규화
 2. 데이터 전처리
 3. 데이터 시각화
 
 4. 인공지능 학습
 5. 결과 분석 및 최적화
 6. (4-5)은 몇 번 반복
 
 7. 비즈니스 모델 제안
