# `finaStat.ipynb`

- **코스피 개별 종목의 종목 코드로 재무정보 가져오기**
- 종목코드 -> 고유번호 -> 법인등록번호 -> 재무정보

1. '금융감독원\_고유번호'를 통해 '종목 코드'로 '고유번호' 가져오기
   - [**금융감독원\_고유번호**](https://opendart.fss.or.kr/guide/detail.do?apiGrpCd=DS001&apiId=2019018)
2. '금융감독원*공시정보*기업개황'에서 '고유번호'로 '법인등록번호' 가져오기
   - [**금융감독원*공시정보*기업개황**](https://opendart.fss.or.kr/guide/detail.do?apiGrpCd=DS001&apiId=2019002)
   - KOSPI200 기업 목록 활용
3. 가져온 '법인등록번호'으로 재무정보 가져오기
   - [**금융위원회\_기업 재무정보**](https://www.data.go.kr/tcs/dss/selectApiDataDetailView.do?publicDataPk=15043459)

---

- [DART](https://dart.fss.or.kr/main.do) : 금융감독원에서 운영하는 기업정보전자공시시스템
- FSC : 금융위원회
- 연결재무제표(ConsolidatedMember)와 별도재무제표(SeparateMember)
  - 연결재무제표는 종속기업의 실적이 포함
  - 별도재무제표는 종속기업의 실적이 포함되지 않습니다.


# import


In [1]:
import sys
import time
import pickle
import json
from glob import glob
from io import BytesIO
from zipfile import ZipFile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import koreanize_matplotlib
import requests
from bs4 import BeautifulSoup as bs

import FinanceDataReader as fdr
from tqdm import tqdm
import xmltodict

pd.options.display.max_columns = None


sys.path.append("../import")
import module as m
from gitig_auth import authKey

data_path = m.data_path
fp_fs = f"""{m.fp["finaStat"]}"""

fp_fs_cm = f"{data_path}finaStat_cm.parquet"
fp_fs_sm = f"{data_path}finaStat_sm.parquet"

authKey_dart = authKey["dart"]
authKey_fscfs = authKey["fsc_finaStatInfo"]

data_path : ../data/
fp
{'esgRating': '../data/esgRating.parquet',
 'finaStat': '../data/finaStat.parquet',
 'stockPrice': '../data/stockPrice.parquet',
 'stockPrice_year': '../data/stockPrice_year.parquet'}


# `DART_corpCode`

1. '금융감독원\_고유번호'를 통해 '종목 코드'로 '고유번호' 가져오기
   - [**금융감독원\_고유번호**](https://opendart.fss.or.kr/guide/detail.do?apiGrpCd=DS001&apiId=2019018)
2. '금융감독원*공시정보*기업개황'에서 '고유번호'로 '법인등록번호' 가져오기
   - [**금융감독원*공시정보*기업개황**](https://opendart.fss.or.kr/guide/detail.do?apiGrpCd=DS001&apiId=2019002)
   - KOSPI200 기업 목록 활용


## 금융감독원\_고유번호


In [2]:
url = f"https://opendart.fss.or.kr/api/corpCode.xml?crtfc_key={authKey_dart}"
response = requests.get(url)
if response.status_code == 200:
    with ZipFile(BytesIO(response.content)) as f:
        df_cc_raw = f.read("CORPCODE.xml")
        df_cc_raw = pd.read_xml(df_cc_raw)

    display(df_cc_raw)
    # (선택) 실행 시간이 오래걸려서 백업
    df_cc = df_cc_raw.copy()
    
else:
    print(response.status_code)

Unnamed: 0,corp_code,corp_name,stock_code,modify_date
0,434003,다코,,20170630
1,434456,일산약품,,20170630
2,430964,굿앤엘에스,,20170630
3,432403,한라판지,,20170630
4,388953,크레디피아제이십오차유동화전문회사,,20170630
...,...,...,...,...
97184,151571,청림실업,,20221114
97185,1143889,에이치엠지하우징,,20221114
97186,1359578,성남대장피에프브이,,20221114
97187,1002944,스마트에프앤디,,20221114


### 전처리


In [3]:
# stock_code가 없는 행 제거
df_cc = df_cc.dropna(subset=["stock_code"])
# code 글자수
df_cc["corp_code"] = df_cc["corp_code"].astype(int).astype(str).apply(lambda x: x.zfill(8))
df_cc["stock_code"] = df_cc["stock_code"].astype(int).astype(str).apply(lambda x: x.zfill(6))
# 컬럼 순서 설정
df_cc = df_cc[["stock_code", "corp_code", "corp_name"]]
df_cc.head()

Unnamed: 0,stock_code,corp_code,corp_name
2009,36720,260985,한빛네트
2021,40130,264529,엔플렉스
2022,55000,358545,동서정보기술
2784,32600,231567,애드모바일
3889,37600,247939,씨모스


### 전처리 : 분석 종목만 남김


In [4]:
df_components = pd.read_csv(f"{data_path}components_list.csv")
df_components["종목코드"] = df_components["종목코드"].astype(str).apply(lambda x: x.zfill(6))
df_components

Unnamed: 0,종목코드,종목명
0,000020,동화약품
1,000030,우리은행
2,000050,경방
3,000070,삼양홀딩스
4,000080,하이트진로
...,...,...
294,271560,오리온
295,282330,BGF리테일
296,285130,SK케미칼
297,294870,HDC현대산업개발


In [5]:
list_components = set(df_components["종목코드"].to_list())  # KOSPI200 종목코드 리스트
df_cc.drop(df_cc[~df_cc["stock_code"].isin(list_components)].index, inplace=True)
df_cc

Unnamed: 0,stock_code,corp_code,corp_name
4915,103150,00684547,하이트맥주
10911,003640,00140380,유니온스틸
13182,064420,00399773,케이피케미칼
14064,053000,00375302,우리금융지주
25839,068870,00423609,LG생명과학
...,...,...,...
97058,079980,00362238,휴비스
97062,010120,00105855,엘에스일렉트릭
97068,005930,00126380,삼성전자
97131,096760,00632304,JW홀딩스


## 금융감독원*공시정보*기업개황


### 함수 : 종목코드 -> 법인등록번호


In [6]:
# 종목코드 -> 고유번호
def stockCode_to_corpCode(stock_code, df=df_cc):
    cropCode = df[df["stock_code"] == stock_code]["corp_code"].values[0]
    return cropCode


# 고유번호 -> 법인등록번호
def corpCode_to_jurirNo(corp_code, authKey=authKey_dart):

    url = "https://opendart.fss.or.kr/api/company.json"
    params = {"crtfc_key": authKey, "corp_code": corp_code}

    response = requests.get(url, params=params)
    if response.status_code == 200:
        return response.json()["jurir_no"]
    else:
        print(response.status_code)

    time.sleep(0.01)


# 종목코드 -> 고유번호 -> 법인등록번호
def stockCode_to_jurirNo(stock_code, df=df_cc, authKey=authKey_dart):

    cropCode = stockCode_to_corpCode(stock_code, df)
    jurirNo = corpCode_to_jurirNo(cropCode, authKey)
    return jurirNo


# 함수 테스트 : 삼성전자, 00126380, 1301110006246
stock_code = "005930"
cropCode = stockCode_to_corpCode(stock_code)
jurirNo = stockCode_to_jurirNo(stock_code)
print(cropCode == "00126380")
print(jurirNo == "1301110006246")

True
True


In [7]:
def temp(stock_code):
    jurirNo = stockCode_to_jurirNo(stock_code)
    return jurirNo


df_cc["jurir_no"] = df_cc["stock_code"].map(temp)
df_cc

Unnamed: 0,stock_code,corp_code,corp_name,jurir_no
4915,103150,00684547,하이트맥주,1101113927427
10911,003640,00140380,유니온스틸,1101110041501
13182,064420,00399773,케이피케미칼,2301110082112
14064,053000,00375302,우리금융지주,1101112202797
25839,068870,00423609,LG생명과학,1101112581183
...,...,...,...,...
97058,079980,00362238,휴비스,1101112102070
97062,010120,00105855,엘에스일렉트릭,1101110520076
97068,005930,00126380,삼성전자,1301110006246
97131,096760,00632304,JW홀딩스,1101113710468


## (선택) 영속화


In [8]:
m.DfPrst(df_cc, "./DART_corpCode.pickle")

['./DART_corpCode.pickle']


In [9]:
if glob("DART_corpCode.pickle"):
    df_cc = m.DataLoad("./DART_corpCode.pickle")

Mem. usage decreased to  0.01 Mb (0.0% reduction)


[1m┌▣ [4mdf.shape[0m ---- ---- ---- ----
(299, 4)


[1m┌▣ [4mdf.info()[0m ---- ---- ---- ----
<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 4915 to 97138
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   stock_code  299 non-null    object
 1   corp_code   299 non-null    object
 2   corp_name   299 non-null    object
 3   jurir_no    299 non-null    object
dtypes: object(4)
memory usage: 11.7+ KB
None


[1m┌▣ [4mdf.head()[0m ---- ---- ---- ----


Unnamed: 0,stock_code,corp_code,corp_name,jurir_no
4915,103150,684547,하이트맥주,1101113927427
10911,3640,140380,유니온스틸,1101110041501
13182,64420,399773,케이피케미칼,2301110082112
14064,53000,375302,우리금융지주,1101112202797
25839,68870,423609,LG생명과학,1101112581183




[1m┌▣ [4mdf.columns.to_list()[0m ---- ---- ---- ----
['stock_code', 'corp_code', 'corp_name', 'jurir_no']


# `finaStat`

금융위원회\_기업 재무정보

- 요청 URL : http://apis.data.go.kr/1160100/service/GetFinaStatInfoService/getSummFinaStat


## 금융위원회\_기업 재무정보


In [10]:
# 함수
def Get_FinaStatInfo(
    crno,
    authKey,
    bizYear="",
    numOfRows="",
    pageNo="",
    url="http://apis.data.go.kr/1160100/service/GetFinaStatInfoService/getSummFinaStat",
):

    params = {
        "serviceKey": authKey,
        "numOfRows": numOfRows,
        "pageNo": numOfRows,
        "resultType": "json",
        "crno": crno,
        "bizYear": bizYear,
    }

    c = 0

    def func(c):
        try:
            response = requests.get(url, params=params)
            time.sleep(0.01)

            rsc = response.status_code
            if rsc == 200:
                rj = response.json()
                # totalCount(회계 정보 데이터 행의 수)가 0이면 pass
                totalCount = rj["response"]["body"]["totalCount"]
                if totalCount != 0:
                    data_json = rj["response"]["body"]["items"]["item"]
                    return pd.json_normalize(data_json)

            else:
                print(rsc)

        except:
            c += 1
            print(f"errCount : {c}, crno : {crno}")
            time.sleep(2)
            func(c)

    return func(c)


# test, 삼성전자
jurirNo = "1301110006246"
t = Get_FinaStatInfo(jurirNo, authKey=authKey_fscfs)
t

Unnamed: 0,basDt,crno,bizYear,fnclDcd,fnclDcdNm,enpSaleAmt,enpBzopPft,iclsPalClcAmt,enpCrtmNpf,enpTastAmt,enpTdbtAmt,enpTcptAmt,enpCptlAmt,fnclDebtRto
0,20151231,1301110006246,2015,ifrs_ConsolidatedMember,연결요약재무제표,200653482000000,26413442000000,25960995000000,19060144000000,242179521000000,63119716000000,179059805000000,0,35.2506337198
1,20151231,1301110006246,2015,ifrs_SeparateMember,별도요약재무제표,135205045000000,13398215000000,14352617000000,12238469000000,168969630000000,32541375000000,136428255000000,0,23.8523720764
2,20161231,1301110006246,2016,ifrs_ConsolidatedMember,연결요약재무제표,201866745000000,29240672000000,30713652000000,22726092000000,262174324000000,69211291000000,192963033000000,0,35.8676425862
3,20161231,1301110006246,2016,ifrs_SeparateMember,별도요약재무제표,133947204000000,13647436000000,14725074000000,11579749000000,174802959000000,37256197000000,137546762000000,0,27.0862043266
4,20171231,1301110006246,2017,ifrs_ConsolidatedMember,연결요약재무제표,239575376000000,53645038000000,56195967000000,42186747000000,301752090000000,87260662000000,214491428000000,897514000000,40.6825870915
5,20171231,1301110006246,2017,ifrs_SeparateMember,별도요약재무제표,161915007000000,34857091000000,36533552000000,28800837000000,198241360000000,46671585000000,151569775000000,897514000000,30.7921450698
6,20181231,1301110006246,2018,ifrs_ConsolidatedMember,연결요약재무제표,243771415000000,58886669000000,61159958000000,44344857000000,339357244000000,91604067000000,247753177000000,897514000000,36.9739222355
7,20181231,1301110006246,2018,ifrs_SeparateMember,별도요약재무제표,170381870000000,43699451000000,44398855000000,32815127000000,219021357000000,46033232000000,172988125000000,897514000000,26.6106312211
8,20191231,1301110006246,2019,ifrs_ConsolidatedMember,연결요약재무제표,230400881000000,27768509000000,30432189000000,21738865000000,352564497000000,89684076000000,262880421000000,897514000000,34.1159207136
9,20191231,1301110006246,2019,ifrs_SeparateMember,별도요약재무제표,154772859000000,14115067000000,19032469000000,15353323000000,216180920000000,38310673000000,177870247000000,897514000000,21.5385505143


In [11]:
df_fs = pd.DataFrame()

for jurirNo in tqdm(df_cc["jurir_no"].values[:]):
    tmp = Get_FinaStatInfo(jurirNo, authKey=authKey_fscfs)
    df_fs = pd.concat([df_fs, tmp], axis=0, sort=False)

# (선택) 실행 시간이 오래걸려서 백업
m.DfPrst(df_fs, "./df_fs_raw.pickle")

 35%|███▌      | 106/299 [01:02<00:24,  7.96it/s]

errCount : 1, crno : 1101110005078


 60%|█████▉    | 179/299 [01:41<00:13,  9.14it/s]

errCount : 1, crno : 1101110095285


 81%|████████▏ | 243/299 [03:18<00:39,  1.41it/s]

errCount : 1, crno : 1845110000642


 83%|████████▎ | 247/299 [03:57<03:58,  4.58s/it]

errCount : 1, crno : 1801110003268


100%|██████████| 299/299 [04:44<00:00,  1.05it/s]

['../df_fs_raw.pickle']





In [14]:
# (선택) 백업한 피클 불러오기
if glob("./df_fs_raw.pickle"):
    df_fs = m.DataLoad("./df_fs_raw.pickle")

Mem. usage decreased to  0.29 Mb (0.0% reduction)


[1m┌▣ [4mdf.shape[0m ---- ---- ---- ----
(2551, 14)


[1m┌▣ [4mdf.info()[0m ---- ---- ---- ----
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2551 entries, 0 to 9
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   basDt          2551 non-null   object
 1   crno           2551 non-null   object
 2   bizYear        2551 non-null   object
 3   fnclDcd        2551 non-null   object
 4   fnclDcdNm      2551 non-null   object
 5   enpSaleAmt     2551 non-null   object
 6   enpBzopPft     2551 non-null   object
 7   iclsPalClcAmt  2551 non-null   object
 8   enpCrtmNpf     2551 non-null   object
 9   enpTastAmt     2551 non-null   object
 10  enpTdbtAmt     2551 non-null   object
 11  enpTcptAmt     2551 non-null   object
 12  enpCptlAmt     2551 non-null   object
 13  fnclDebtRto    2551 non-null   object
dtypes: object(14)
memory usage: 298.9+ KB
None


[1m

Unnamed: 0,basDt,crno,bizYear,fnclDcd,fnclDcdNm,enpSaleAmt,enpBzopPft,iclsPalClcAmt,enpCrtmNpf,enpTastAmt,enpTdbtAmt,enpTcptAmt,enpCptlAmt,fnclDebtRto
0,20151231,1101112581183,2015,ifrs_ConsolidatedMember,연결요약재무제표,450526355034,25201939449,13894239541,11371548367,706980326807,449358007227,257622319580,84066030000,174.4251072499
1,20151231,1101112581183,2015,ifrs_SeparateMember,별도요약재무제표,435446816306,26136971686,15021017434,12443782196,703705604242,445384074854,258321529388,84066030000,172.4146167411
0,20150331,1748110000151,2015,ifrs_ConsolidatedMember,연결요약재무제표,207890217005,-871462412,-5660470477,-4315628598,727951594676,430331568399,297620026277,0,144.590931525
1,20150331,1748110000151,2015,ifrs_SeparateMember,별도요약재무제표,175932552323,-2615227674,-6690401088,-5677659364,669744438135,397542364257,272202073878,0,146.0467800973
2,20160331,1748110000151,2016,ifrs_ConsolidatedMember,연결요약재무제표,856457055850,48660898849,33307151671,21854084841,713772955053,397204546704,316568408349,0,125.4719473669




[1m┌▣ [4mdf.columns.to_list()[0m ---- ---- ---- ----
['basDt', 'crno', 'bizYear', 'fnclDcd', 'fnclDcdNm', 'enpSaleAmt', 'enpBzopPft', 'iclsPalClcAmt', 'enpCrtmNpf', 'enpTastAmt', 'enpTdbtAmt', 'enpTcptAmt', 'enpCptlAmt', 'fnclDebtRto']


## 전처리 : 기간 설정

In [15]:
s = df_fs["bizYear"].astype(int)
df_fs = df_fs[(s >= 2010) & (s <= 2018)]
df_fs["bizYear"].unique()

array(['2015', '2016', '2018', '2017', '2011', '2012', '2013', '2014',
       '2010'], dtype=object)

## 병합


In [16]:
df_fs = pd.merge(df_fs, df_cc, how="left", left_on="crno", right_on="jurir_no")
df_fs

Unnamed: 0,basDt,crno,bizYear,fnclDcd,fnclDcdNm,enpSaleAmt,enpBzopPft,iclsPalClcAmt,enpCrtmNpf,enpTastAmt,enpTdbtAmt,enpTcptAmt,enpCptlAmt,fnclDebtRto,stock_code,corp_code,corp_name,jurir_no
0,20151231,1101112581183,2015,ifrs_ConsolidatedMember,연결요약재무제표,450526355034,25201939449,13894239541,11371548367,706980326807,449358007227,257622319580,84066030000,174.4251072499,068870,00423609,LG생명과학,1101112581183
1,20151231,1101112581183,2015,ifrs_SeparateMember,별도요약재무제표,435446816306,26136971686,15021017434,12443782196,703705604242,445384074854,258321529388,84066030000,172.4146167411,068870,00423609,LG생명과학,1101112581183
2,20150331,1748110000151,2015,ifrs_ConsolidatedMember,연결요약재무제표,207890217005,-871462412,-5660470477,-4315628598,727951594676,430331568399,297620026277,0,144.590931525,008000,00148717,도레이케미칼,1748110000151
3,20150331,1748110000151,2015,ifrs_SeparateMember,별도요약재무제표,175932552323,-2615227674,-6690401088,-5677659364,669744438135,397542364257,272202073878,0,146.0467800973,008000,00148717,도레이케미칼,1748110000151
4,20160331,1748110000151,2016,ifrs_ConsolidatedMember,연결요약재무제표,856457055850,48660898849,33307151671,21854084841,713772955053,397204546704,316568408349,0,125.4719473669,008000,00148717,도레이케미칼,1748110000151
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1850,20161231,1101110003733,2016,ifrs_SeparateMember,별도요약재무제표,302367877201,28499972317,27147592557,20463037279,687450034887,74990936935,612459097952,0,12.2442359311,001130,00113243,대한제분,1101110003733
1851,20171231,1101110003733,2017,ifrs_ConsolidatedMember,연결요약재무제표,810844358507,36178257282,53127342018,51048348041,873461794681,163053585174,710408209507,8450000000,22.9520975394,001130,00113243,대한제분,1101110003733
1852,20171231,1101110003733,2017,ifrs_SeparateMember,별도요약재무제표,286386535237,19514857445,19981045677,13814407827,689594455044,66157192173,623437262871,8450000000,10.6116839838,001130,00113243,대한제분,1101110003733
1853,20181231,1101110003733,2018,ifrs_ConsolidatedMember,연결요약재무제표,864585835647,32798360071,74914386097,51472655804,919690198948,170684939463,749005259485,8450000000,22.7882164112,001130,00113243,대한제분,1101110003733


## 전처리


In [17]:
# 컬럼명 변경
dict_colReName = {
    "stock_code": "종목코드",
    "corp_name": "종목명",
    "basDt": "연_월_일",
    "crno": "법인등록번호",
    "bizYear": "사업연도",
    "fnclDcd": "재무제표구분코드",
    "fnclDcdNm": "재무제표구분코드명",
    "enpSaleAmt": "기업매출금액",
    "enpBzopPft": "기업영업이익",
    "iclsPalClcAmt": "포괄손익계산금액",
    "enpCrtmNpf": "기업당기순이익",
    "enpTastAmt": "기업총자산금액",
    "enpTdbtAmt": "기업총부채금액",
    "enpTcptAmt": "기업총자본금액",
    "enpCptlAmt": "기업자본금액",
    "fnclDebtRto": "재무제표부채비율",
}
df_fs = df_fs.rename(columns=dict_colReName)

# 컬럼 순서 변경
list_colOrder = [
    "종목코드",
    "종목명",
    "연_월_일",
    "재무제표구분코드명",
    "기업매출금액",
    "기업영업이익",
    "포괄손익계산금액",
    "기업당기순이익",
    "기업총자산금액",
    "기업총부채금액",
    "기업총자본금액",
    "기업자본금액",
    "재무제표부채비율",
]
df_fs = df_fs[list_colOrder]

#
list_roof = [
    "기업매출금액",
    "기업영업이익",
    "포괄손익계산금액",
    "기업당기순이익",
    "기업총자산금액",
    "기업총부채금액",
    "기업총자본금액",
    "기업자본금액",
    "재무제표부채비율",
]
for i in list_roof:
    df_fs[f"{i}"] = pd.to_numeric(df_fs[f"{i}"])

# 파생변수 추가
col = pd.to_datetime(df_fs["연_월_일"], format="%Y-%m-%d")
df_fs["연"] = col.dt.year
df_fs["분기"] = col.dt.quarter
df_fs["월"] = col.dt.month
df_fs["연_분기"] = df_fs["연"].astype("str") + "-" + df_fs["분기"].astype("str")
df_fs["연_월"] = df_fs["연"].astype("str") + "-" + df_fs["월"].astype("str")
df_fs["분기_월"] = df_fs["분기"].astype("str") + "-" + df_fs["월"].astype("str")
df_fs["연_분기_월"] = (
    df_fs["연"].astype("str") + "-" + df_fs["분기"].astype("str") + "-" + df_fs["월"].astype("str")
)


# 정렬
df_fs = df_fs.sort_values(by=["종목코드", "재무제표구분코드명", "연_월_일"], ascending=[True, True, True])

# 아래에 활용
list_col = df_fs.columns.to_list()

# 확인
df_fs.head(2)

Unnamed: 0,종목코드,종목명,연_월_일,재무제표구분코드명,기업매출금액,기업영업이익,포괄손익계산금액,기업당기순이익,기업총자산금액,기업총부채금액,기업총자본금액,기업자본금액,재무제표부채비율,연,분기,월,연_분기,연_월,분기_월,연_분기_월
336,20,동화약품,20151231,별도요약재무제표,223201285434,4812973681,6000622879,5608652157,317187030052,87069287627,230117742425,27931470000,37.836842,2015,4,12,2015-4,2015-12,4-12,2015-4-12
337,20,동화약품,20161231,별도요약재무제표,237470834801,11259333902,35655076190,26254318411,324604536650,71679236748,252925299902,27931470000,28.340082,2016,4,12,2016-4,2016-12,4-12,2016-4-12


### MinMaxScaling
- 일반적인 MinMaxScaling은 컬럼의 Min과 Max를 기준으로 스케일링되지만
- 이 분석의 경우에는 적절하지 못하므로 개별 종목의 Min과 Max를 기준으로 스케일링을 진행함.


In [18]:
l = [
    "기업매출금액",
    "기업영업이익",
    "포괄손익계산금액",
    "기업당기순이익",
    "기업총자산금액",
    "기업총부채금액",
    "기업총자본금액",
    "기업자본금액",
    "재무제표부채비율",
]
df_fs = m.DerivedCol_Groupby_MinMaxScaler(df_fs, ["종목코드", "종목명"],l)
df_fs

Unnamed: 0,종목코드,종목명,연_월_일,재무제표구분코드명,기업매출금액,기업영업이익,포괄손익계산금액,기업당기순이익,기업총자산금액,기업총부채금액,기업총자본금액,기업자본금액,재무제표부채비율,연,분기,월,연_분기,연_월,분기_월,연_분기_월,기업매출금액_mmscl,기업영업이익_mmscl,포괄손익계산금액_mmscl,기업당기순이익_mmscl,기업총자산금액_mmscl,기업총부채금액_mmscl,기업총자본금액_mmscl,기업자본금액_mmscl,재무제표부채비율_mmscl
336,000020,동화약품,20151231,별도요약재무제표,223201285434,4812973681,6000622879,5608652157,317187030052,87069287627,230117742425,27931470000,37.836842,2015,4,12,2015-4,2015-12,4-12,2015-4-12,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,0.000000,0.0,1.000000
337,000020,동화약품,20161231,별도요약재무제표,237470834801,11259333902,35655076190,26254318411,324604536650,71679236748,252925299902,27931470000,28.340082,2016,4,12,2016-4,2016-12,4-12,2016-4-12,0.171095,1.000000,0.500767,0.498683,0.138873,0.083319,0.338980,0.0,0.329751
338,000020,동화약품,20171231,별도요약재무제표,258881616575,10987308187,65218742497,47009013175,367225133428,70280404999,296944728429,27931470000,23.667841,2017,4,12,2017-4,2017-12,4-12,2017-4-12,0.427815,0.957802,1.000000,1.000000,0.936829,0.000000,0.993224,0.0,0.000000
340,000020,동화약품,20181231,별도요약재무제표,306602589029,11232142004,14545424343,10074474538,370294498762,73197485231,297097013531,27931470000,24.637570,2018,4,12,2018-4,2018-12,4-12,2018-4-12,1.000000,0.995782,0.144294,0.107869,0.994294,0.173751,0.995487,0.0,0.068440
339,000020,동화약품,20181231,연결요약재무제표,306602589029,11225780035,14539062374,10068112569,370599242793,73198591231,297400651562,27931470000,24.612788,2018,4,12,2018-4,2018-12,4-12,2018-4-12,1.000000,0.994795,0.144186,0.107715,1.000000,0.173817,1.000000,0.0,0.066691
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
411,285130,SK케미칼,20181231,연결요약재무제표,1367719213257,45733649359,5161291228,-16419747980,1950805010838,1225244545252,725560465586,65192610000,168.868703,2018,4,12,2018-4,2018-12,4-12,2018-4-12,1.000000,0.846582,0.520921,0.000000,1.000000,1.000000,0.120693,0.0,1.000000
705,294870,HDC현대산업개발,20181231,별도요약재무제표,2793597716191,315189287868,321909452540,227725549819,4852055260385,3013408604106,1838646656279,219691100000,163.892752,2018,4,12,2018-4,2018-12,4-12,2018-4-12,1.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,1.000000
704,294870,HDC현대산업개발,20181231,연결요약재무제표,2792736856185,317930502477,324798010807,229852940958,4863355063497,3018991446361,1844363617136,219691100000,163.687432,2018,4,12,2018-4,2018-12,4-12,2018-4-12,0.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,0.0,0.000000
1236,298040,효성중공업,20181231,별도요약재무제표,2135067783832,60389967118,25196944107,20629190435,3238850841791,2298566217473,940284624318,46622740000,244.454302,2018,4,12,2018-4,2018-12,4-12,2018-4-12,0.000000,1.000000,1.000000,1.000000,0.000000,0.000000,1.000000,0.0,0.000000


## (선택) '요약ㅇㅇ'와 'ㅇㅇ요약'을 'ㅇㅇ요약'으로 통일

- '연결요약재무제표'와 '요약연결재무제표'을 '연결요약재무제표'로 통일
- '별도요약재무제표'와 '요약별도재무정보'을 '별도요약재무제표'로 통일


In [19]:
df_fs.loc[
    (df_fs["재무제표구분코드명"] == "요약연결재무제표") | (df_fs["재무제표구분코드명"] == "연결요약재무제표"), "재무제표구분코드명"
] = "연결요약재무제표"

df_fs.loc[
    (df_fs["재무제표구분코드명"] == "요약별도재무정보") | (df_fs["재무제표구분코드명"] == "별도요약재무제표"), "재무제표구분코드명"
] = "별도요약재무제표"

df_fs["재무제표구분코드명"].unique()

array(['별도요약재무제표', '연결요약재무제표'], dtype=object)

## (선택) '연결재무제표'와 '별도재무제표' 분리

- '연결재무제표'만 있는 데이터프레임과
- '별도재무제표'만 있는 데이터프레임으로 분리


In [20]:
df_fs_cm = df_fs[(df_fs["재무제표구분코드명"] == "요약연결재무제표") | (df_fs["재무제표구분코드명"] == "연결요약재무제표")]
df_fs_cm.head(2)

Unnamed: 0,종목코드,종목명,연_월_일,재무제표구분코드명,기업매출금액,기업영업이익,포괄손익계산금액,기업당기순이익,기업총자산금액,기업총부채금액,기업총자본금액,기업자본금액,재무제표부채비율,연,분기,월,연_분기,연_월,분기_월,연_분기_월,기업매출금액_mmscl,기업영업이익_mmscl,포괄손익계산금액_mmscl,기업당기순이익_mmscl,기업총자산금액_mmscl,기업총부채금액_mmscl,기업총자본금액_mmscl,기업자본금액_mmscl,재무제표부채비율_mmscl
339,20,동화약품,20181231,연결요약재무제표,306602589029,11225780035,14539062374,10068112569,370599242793,73198591231,297400651562,27931470000,24.612788,2018,4,12,2018-4,2018-12,4-12,2018-4-12,1.0,0.994795,0.144186,0.107715,1.0,0.173817,1.0,0.0,0.066691
448,50,경방,20151231,연결요약재무제표,357628146446,38961511221,23310479464,16692163219,1318643312104,655690394504,662952917600,0,98.904519,2015,4,12,2015-4,2015-12,4-12,2015-4-12,0.798527,0.068985,0.0,0.0,1.0,1.0,0.0,0.0,1.0


In [21]:
df_fs_sm = df_fs[(df_fs["재무제표구분코드명"] == "요약별도재무정보") | (df_fs["재무제표구분코드명"] == "별도요약재무제표")]
df_fs_sm.head(2)

Unnamed: 0,종목코드,종목명,연_월_일,재무제표구분코드명,기업매출금액,기업영업이익,포괄손익계산금액,기업당기순이익,기업총자산금액,기업총부채금액,기업총자본금액,기업자본금액,재무제표부채비율,연,분기,월,연_분기,연_월,분기_월,연_분기_월,기업매출금액_mmscl,기업영업이익_mmscl,포괄손익계산금액_mmscl,기업당기순이익_mmscl,기업총자산금액_mmscl,기업총부채금액_mmscl,기업총자본금액_mmscl,기업자본금액_mmscl,재무제표부채비율_mmscl
336,20,동화약품,20151231,별도요약재무제표,223201285434,4812973681,6000622879,5608652157,317187030052,87069287627,230117742425,27931470000,37.836842,2015,4,12,2015-4,2015-12,4-12,2015-4-12,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
337,20,동화약품,20161231,별도요약재무제표,237470834801,11259333902,35655076190,26254318411,324604536650,71679236748,252925299902,27931470000,28.340082,2016,4,12,2016-4,2016-12,4-12,2016-4-12,0.171095,1.0,0.500767,0.498683,0.138873,0.083319,0.33898,0.0,0.329751


## 영속화


In [22]:
m.DfPrst(df_fs, fp_fs)

['../data/finaStat.parquet']


In [23]:
m.DfPrst(df_fs_cm, fp_fs_cm)
m.DfPrst(df_fs_sm, fp_fs_sm)

['../data/finaStat_cm.parquet']
['../data/finaStat_sm.parquet']
