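## Crawling fund standard code lookup data from the KOFIA (Korea Financial Investment Association) electronic disclosure service

1. Open the fund standard code lookup URL and capture the underlying request as a cURL command
    > See https://curlconverter.com/
2. Extract the basic header and cookie data from the cURL command
3. Identify the rules for the fields to collect, then apply them in code
4. Collect each fund's name, code, type, and related fields, then store them in a DataFrame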
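Steps 1 and 2 can be sketched as follows. This is a minimal illustration that builds the POST request without sending it, using only the standard library. The header values in `HEADERS` are placeholders (assumptions, not values taken from the site); in practice they would be replaced with whatever the converted cURL command contains. `make_request` is a hypothetical helper name.

```python
import urllib.request

URL = "https://dis.kofia.or.kr/proframeWeb/XMLSERVICES/"

# Placeholder headers (assumed): replace with the headers and cookies
# produced by converting the browser's cURL command.
HEADERS = {
    "Content-Type": "text/xml; charset=utf-8",
    "User-Agent": "Mozilla/5.0",
}


def make_request(xml_body: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the XML body."""
    return urllib.request.Request(
        URL,
        data=xml_body.encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


req = make_request("<message></message>")
print(req.get_method(), req.full_url)
```

With `requests`, the same dictionary can be passed as `requests.post(URL, data=..., headers=HEADERS)`.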

In [1]:
import pandas as pd
import numpy as np

import requests
from bs4 import BeautifulSoup, NavigableString, Tag
from tqdm import tqdm
from datetime import datetime

In [2]:
data = '<?xml version="1.0" encoding="utf-8"?>\n<message>\n  <proframeHeader>\n    <pfmAppName>FS-DIS2</pfmAppName>\n    <pfmSvcName>DISMngCompInqSO</pfmSvcName>\n    <pfmFnName>select</pfmFnName>\n  </proframeHeader>\n  <systemHeader></systemHeader>\n    <DISMngCompInqListDTO>\n    <option>M2</option>\n    <standardDt></standardDt>\n</DISMngCompInqListDTO>\n</message>\n'

# Fetch the basic information for each entry from the site: managecompcd, salecompcd, ciorgtypcdlist, koreannm
response = requests.post('https://dis.kofia.or.kr/proframeWeb/XMLSERVICES/', data=data)
soups = BeautifulSoup(response.text, 'html.parser')
fund_lst = soups.select('list')

# Keep only the managecompcd value (the management company code, sent below as companyCd) from each entry
fund_code_lst = [code.select_one('managecompcd').text for code in fund_lst]
print(len(fund_code_lst))

345
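The request body above is a hand-written XML string. As an alternative sketch, the same message shape can be assembled with `xml.etree.ElementTree`, which escapes special characters in field values. `build_message` is a hypothetical helper, not part of the original notebook, and it assumes the server accepts this body even though it lacks the original's whitespace layout.

```python
import xml.etree.ElementTree as ET


def build_message(service: str, function: str, dto_name: str, fields: dict) -> str:
    """Assemble a request body in the same shape as the hand-written strings."""
    message = ET.Element("message")
    header = ET.SubElement(message, "proframeHeader")
    ET.SubElement(header, "pfmAppName").text = "FS-DIS2"
    ET.SubElement(header, "pfmSvcName").text = service
    ET.SubElement(header, "pfmFnName").text = function
    ET.SubElement(message, "systemHeader")
    dto = ET.SubElement(message, dto_name)
    for name, value in fields.items():
        ET.SubElement(dto, name).text = value
    # short_empty_elements=False keeps empty fields as <tag></tag>,
    # matching the original request strings.
    body = ET.tostring(message, encoding="unicode", short_empty_elements=False)
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


company_list_body = build_message(
    "DISMngCompInqSO", "select", "DISMngCompInqListDTO",
    {"option": "M2", "standardDt": ""},
)
print(company_list_body)
```

The per-company request in the next cell would use the same helper with `DISFundStandardCdSO`, `selectExcel`, `DISStdCdPageDTO`, and a `companyCd` field.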


In [3]:
colname = ['BASE_DATE', 'SRTN_CD', 'FND_NAME', 'CTG', 'SETP_DT', 'FND_TP', 'PRD_CLSF_CD', 'ASO_STD_CD']
# Output column -> response tag (html.parser lowercases tag names)
tag_by_col = {
    'SRTN_CD': 'shortcd', 'FND_NAME': 'koreancdtnm', 'CTG': 'ufundtypnm', 'SETP_DT': 'startdt',
    'FND_TP': 'fundnm', 'PRD_CLSF_CD': 'classcd', 'ASO_STD_CD': 'standardcd',
}
base_date = datetime.today().strftime("%Y%m%d")

# Collect rows in a list and build the DataFrame once; calling pd.concat
# inside the loop copies the whole frame on every iteration.
rows = []
for code in tqdm(fund_code_lst):
    data = f'<?xml version="1.0" encoding="utf-8"?>\n<message>\n  <proframeHeader>\n    <pfmAppName>FS-DIS2</pfmAppName>\n    <pfmSvcName>DISFundStandardCdSO</pfmSvcName>\n    <pfmFnName>selectExcel</pfmFnName>\n  </proframeHeader>\n  <systemHeader></systemHeader>\n    <DISStdCdPageDTO>\n    <companyCd>{code}</companyCd>\n    <fundNm></fundNm>\n    <shortCd></shortCd>\n    <businessGb></businessGb>\n</DISStdCdPageDTO>\n</message>\n'

    response = requests.post('https://dis.kofia.or.kr/proframeWeb/XMLSERVICES/', data=data)
    soups = BeautifulSoup(response.text, 'html.parser')

    # Each <stdcdpage> element is one fund; companies with no funds yield no rows
    for fund in soups.select('stdcdpage'):
        row = {'BASE_DATE': base_date}
        for col, tag in tag_by_col.items():
            row[col] = fund.select_one(tag).text
        rows.append(row)

fund_df = pd.DataFrame(rows, columns = colname)

fund_df = fund_df.reset_index(drop = True).reset_index().rename(columns = {'index':'ID'})
fund_df['CRT_ID'] = None
fund_df['CRT_DT'] = None
fund_df['MDF_ID'] = None
fund_df['MDF_DT'] = None
fund_df

100%|█████████████████████████████████████████| 345/345 [32:19<00:00,  5.62s/it]


| | ID | BASE_DATE | SRTN_CD | FND_NAME | CTG | SETP_DT | FND_TP | PRD_CLSF_CD | ASO_STD_CD | CRT_ID | CRT_DT | MDF_ID | MDF_DT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 20220328 | 33020 | PEI Korea사모M&A 1호 | 투자회사 | 20011020 | 혼합주식형 | 23141Z32A38ZZ21ZZZZZ | KRM320330207 | | | | |
| 1 | 1 | 20220328 | 34785 | PEI Korea사모M&A 2호 | 투자회사 | 20020715 | 혼합주식형 | 23141Z32A38ZZ21ZZZZZ | KRM320347854 | | | | |
| 2 | 2 | 20220328 | 37211 | PEI Korea사모M&A 3호 | 투자회사 | 20030312 | 혼합주식형 | 23141Z32A38ZZ21ZZZZZ | KRM320372118 | | | | |
| 3 | 3 | 20220328 | 67470 | PEI-RICH사모기업인수증권투자회사3호 | 투자회사 | 20070410 | 혼합주식형 | 23141Z42A38011111ZZ2 | KRM320674703 | | | | |
| 4 | 4 | 20220328 | 67523 | PEI-RICH사모기업인수증권투자회사5호 | 투자회사 | 20070411 | 혼합주식형 | 22141Z42A38011111ZZ2 | KRM320675239 | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 155334 | 155334 | 20220328 | AA053 | 흥국하이클래스사모특별자산투자신탁34호[사업수익권] | 자산운용 | 20120329 | 특별자산 | 18141Z42001016922ZZ1 | KR5224AA0539 | | | | |
| 155335 | 155335 | 20220328 | AB085 | 흥국하이클래스사모특별자산투자신탁33호[대출채권] | 자산운용 | 20120521 | 특별자산 | 18141Z42001016923ZZ1 | KR5224AB0850 | | | | |
| 155336 | 155336 | 20220328 | AJ931 | 흥국하이클래스사모특별자산투자신탁36호[신탁수익권] | 자산운용 | 20130628 | 특별자산 | 18141Z42001016923ZZ1 | KR5224AJ9317 | | | | |
| 155337 | 155337 | 20220328 | AL167 | 흥국하이클래스사모특별자산투자신탁38호[신탁수익권] | 자산운용 | 20130827 | 특별자산 | 18141Z42001016923ZZ1 | KR5224AL1674 | | | | |
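The run above issues 345 sequential requests over roughly 32 minutes, so a single transient network failure would abort the whole crawl. One possible safeguard is a small retry wrapper with exponential backoff. The sketch below is an assumption about how that might look, not part of the original notebook. `call_with_retries` is a hypothetical helper, and the injectable `sleep` parameter exists only so the behavior can be checked without waiting.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception, wait and retry with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        # Broad catch for illustration; with requests one would likely
        # narrow this to requests.RequestException.
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, attempts=3, base_delay=1.0, sleep=delays.append)
print(result, delays)
```

In the crawl loop, the wrapped call would be the `requests.post(...)` line, for example `call_with_retries(lambda: requests.post(url, data=data))`.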
