[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dakimura/jquants-sample/blob/main/predictor.ipynb)

# J-Quants APIを用いた価格予測モデル生成

本ノートブックでは、data_retrieve.ipynbで保存したデータを用いて、
モデルの生成とそれによる価格の予測までを行います。

モデルには、J-Quantsファンダメンタルズ分析チャレンジで第2位を受賞された[UKIさんのモデル](https://github.com/UKI000/JQuants-Forum/blob/452a4f4bc086ef0a8b087efc707c51abad5ed50e/jquants01_fund_uki_predictor.py)を
ほぼそのまま使用させていただいています。

In [2]:
# 必要なライブラリのインストールとインポート
!pip install scikit-learn
!pip install xgboost pandas numpy

from datetime import datetime
import pandas as pd
import numpy as np
from dateutil import tz
import pickle
import os
import io
from typing import List

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [23]:
# Googleドライブをマウント
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [26]:
# pandas の表示制限を調整します
pd.set_option("display.max_rows", 1000)
pd.set_option("display.max_columns", 1000)
pd.set_option("display.width", 2000)

In [24]:
# 必要なコンフィグの定義
# データを保存しているGoogleドライブ上のディレクトリ
STORAGE_DIR_PATH = "/content/drive/MyDrive/drive_ws/marketdata"
# デバッグ中
# STORAGE_DIR_PATH = "/tmp/marketdata"

# CSVデータが保存されているファイルパス
stock_fin_csvfile_path = STORAGE_DIR_PATH + "/stock_fin.csv.gz"
stock_price_csvfile_path = STORAGE_DIR_PATH + "/stock_price.csv.gz"
# stock_labelsは今このノートブック内で作っているので特に保存する意味はないけれど
stock_labels_csvfile_path = STORAGE_DIR_PATH + "/stock_labels.csv.gz"
# 生成したモデルを保存するパス
model_dir = STORAGE_DIR_PATH + "/model/"

# デバッグ用のコード
code = 13010

TRAIN_START = "2017-01-01"
TRAIN_END = "2019-12-31"
TEST_START = "2020-01-01"
TEST_END = "2020-11-15"

# 訓練結果を保存するファイルパス
model_path = STORAGE_DIR_PATH + "/model"

# データを使用する期間
start_dt: datetime = datetime(2017, 1, 1, tzinfo=tz.gettz("Asia/Tokyo"))
end_dt: datetime = datetime(2022, 7, 31, tzinfo=tz.gettz("Asia/Tokyo"))

## データセットの読み込み

事前に生成しておいたcsvデータを読み込み、データ型を調整します。

In [25]:
# ノートブックの冪等性を高めるためこれらの変数はこのセル以外でいじらない
stock_price_load:pd.DataFrame = pd.read_csv(stock_price_csvfile_path)
stock_fin_load:pd.DataFrame = pd.read_csv(stock_fin_csvfile_path)

# 財務情報のいくつかがobject型になっているので数値型に変換
numeric_cols_fin = ['AverageNumberOfShares', 'BookValuePerShare', 'EarningsPerShare','Equity', 'EquityToAssetRatio',
                'ForecastDividendPerShare1stQuarter', 'ForecastDividendPerShare2ndQuarter', 'ForecastDividendPerShare3rdQuarter',
                'ForecastDividendPerShareAnnual', 'ForecastDividendPerShareFiscalYearEnd', 'ForecastEarningsPerShare', 'ForecastNetSales', 'ForecastOperatingProfit',
                'ForecastOrdinaryProfit', 'ForecastProfit', 'NetSales', 'NumberOfIssuedAndOutstandingSharesAtTheEndOfFiscalYearIncludingTreasuryStock',
                'OperatingProfit', 'OrdinaryProfit', 'Profit', 'ResultDividendPerShare1stQuarter','ResultDividendPerShare2ndQuarter','ResultDividendPerShare3rdQuarter',
                'ResultDividendPerShareAnnual','ResultDividendPerShareFiscalYearEnd','TotalAssets']
stock_fin_load[numeric_cols_fin] = stock_fin_load[numeric_cols_fin].apply(pd.to_numeric, errors='coerce', axis=1)

# object型をdatetime64[ns]型に変換
stock_price_load["Date"] = pd.to_datetime(stock_price_load["Date"])
stock_fin_load["DisclosedDate"] = pd.to_datetime(stock_fin_load["DisclosedDate"]) #開示時刻
stock_fin_load["CurrentFiscalYearEndDate"] = pd.to_datetime(stock_fin_load["CurrentFiscalYearEndDate"])  # 当事業年度終了日
stock_fin_load["CurrentFiscalYearStartDate"] = pd.to_datetime(stock_fin_load["CurrentFiscalYearStartDate"])
stock_fin_load["CurrentPeriodEndDate"] = pd.to_datetime(stock_fin_load["CurrentPeriodEndDate"]) # 当会計期間終了日

stock_fin_load

  exec(code_obj, self.user_global_ns, self.user_ns)


Unnamed: 0.1,Unnamed: 0,DisclosureNumber,DisclosedDate,ApplyingOfSpecificAccountingOfTheQuarterlyFinancialStatements,AverageNumberOfShares,BookValuePerShare,ChangesBasedOnRevisionsOfAccountingStandard,ChangesInAccountingEstimates,ChangesOtherThanOnesBasedOnRevisionsOfAccountingStandard,CurrentFiscalYearEndDate,...,Profit,ResultDividendPerShare1stQuarter,ResultDividendPerShare2ndQuarter,ResultDividendPerShare3rdQuarter,ResultDividendPerShareAnnual,ResultDividendPerShareFiscalYearEnd,RetrospectiveRestatement,TotalAssets,TypeOfCurrentPeriod,TypeOfDocument
2877,2,20161130449148,2017-02-10,,10503022.0,,True,False,False,2017-03-31,...,2449000000.0,,,,,,False,117168000000.0,3Q,3QFinancialStatements_Consolidated_JP
4069,17,20170217401799,2017-02-17,,,,,,,2017-03-31,...,,,,,,,,,FY,ForecastRevision
7115,15,20170329429729,2017-05-11,,10502960.0,2378.09,True,False,False,2017-03-31,...,2422000000.0,,,,60.0,60.0,False,97391000000.0,FY,FYFinancialStatements_Consolidated_JP
11639,7,20170619410034,2017-08-04,,10502773.0,,False,False,False,2018-03-31,...,754000000.0,,,,,,False,107422000000.0,1Q,1QFinancialStatements_Consolidated_JP
16380,1,20170905468428,2017-11-06,,10503343.0,,False,False,False,2018-03-31,...,1633000000.0,,,,,,False,119806000000.0,2Q,2QFinancialStatements_Consolidated_JP
21816,249,20180206464852,2018-02-09,,,,,,,2018-03-31,...,,,,,,,,,FY,ForecastRevision
21819,6,20171215436591,2018-02-09,,10505430.0,,False,False,False,2018-03-31,...,2784000000.0,,,,,,False,124543000000.0,3Q,3QFinancialStatements_Consolidated_JP
26010,12,20180402403415,2018-05-10,,10552710.0,2679.0,False,False,False,2018-03-31,...,3211000000.0,,,,60.0,60.0,False,106305000000.0,FY,FYFinancialStatements_Consolidated_JP
30483,3,20180627471686,2018-08-03,,10783962.0,,False,False,False,2019-03-31,...,555000000.0,,,,,,False,112367000000.0,1Q,1QFinancialStatements_Consolidated_JP
35079,2,20180921409376,2018-11-05,,10801591.0,,False,False,False,2019-03-31,...,824000000.0,,,,,,False,121834000000.0,2Q,2QFinancialStatements_Consolidated_JP


[株式分析チュートリアル](https://japanexchangegroup.github.io/J-Quants-Tutorial/#introduction)で用いられたデータの形式を[J-Quants API](https://jpx.gitbook.io/j-quants-api/api-reference)を用いて再現するため、データを加工します。

In [28]:
# stock_price: データの互換性のための各種列名変換など
stock_price: pd.DataFrame = pd.DataFrame()
stock_price["Local Code"] = stock_price_load["Code"]
#stock_price["Date"] = stock_price_load["Date"]
stock_price["base_date"] = stock_price_load["Date"]
stock_price['EndOfDayQuote Date'] = stock_price_load["Date"]
stock_price["EndOfDayQuote Open"] = stock_price_load["AdjustmentOpen"].replace({0.0: np.nan})
stock_price["EndOfDayQuote High"] = stock_price_load["AdjustmentHigh"].replace({0.0: np.nan})
stock_price["EndOfDayQuote Low"] = stock_price_load["AdjustmentLow"].replace({0.0: np.nan})
stock_price["EndOfDayQuote Close"] = stock_price_load["AdjustmentClose"].replace({0.0: np.nan})
stock_price["EndOfDayQuote ExchangeOfficialClose"] = stock_price_load["AdjustmentClose"].replace({0.0: np.nan})
stock_price["EndOfDayQuote Volume"] = stock_price_load["AdjustmentVolume"]
#stock_price = stock_price.set_index("base_date")
#stock_price = stock_price.sort_index()

# stock_price["EndOfDayQuote Open"][stock_price["EndOfDayQuote Close"] == 0] = \
#     stock_price["EndOfDayQuote ExchangeOfficialClose"]
# stock_price["EndOfDayQuote High"][stock_price["EndOfDayQuote Close"] == 0] = \
#     stock_price["EndOfDayQuote ExchangeOfficialClose"]
# stock_price["EndOfDayQuote Low"][stock_price["EndOfDayQuote Close"] == 0] = \
#     stock_price["EndOfDayQuote ExchangeOfficialClose"]
# stock_price["EndOfDayQuote Close"][stock_price["EndOfDayQuote Close"] == 0] = \
#     stock_price["EndOfDayQuote ExchangeOfficialClose"]

# 前日終値の列を終値列から作成
stock_price["EndOfDayQuote PreviousClose"] = stock_price.groupby(["Local Code"])["EndOfDayQuote Close"].shift(1)

stock_price

Unnamed: 0,Local Code,base_date,EndOfDayQuote Date,EndOfDayQuote Open,EndOfDayQuote High,EndOfDayQuote Low,EndOfDayQuote Close,EndOfDayQuote ExchangeOfficialClose,EndOfDayQuote Volume,EndOfDayQuote PreviousClose
0,13010,2017-01-04,2017-01-04,2734.0,2755.0,2730.0,2742.0,2742.0,31400.0,
1,13010,2017-01-05,2017-01-05,2743.0,2747.0,2735.0,2738.0,2738.0,17900.0,2742.0
2,13010,2017-01-06,2017-01-06,2734.0,2744.0,2720.0,2740.0,2740.0,19900.0,2738.0
3,13010,2017-01-10,2017-01-10,2745.0,2754.0,2735.0,2748.0,2748.0,24200.0,2740.0
4,13010,2017-01-11,2017-01-11,2748.0,2752.0,2737.0,2745.0,2745.0,9300.0,2748.0
...,...,...,...,...,...,...,...,...,...,...
5455316,99970,2022-07-25,2022-07-25,829.0,831.0,816.0,826.0,826.0,151200.0,829.0
5455317,99970,2022-07-26,2022-07-26,826.0,827.0,816.0,825.0,825.0,133600.0,826.0
5455318,99970,2022-07-27,2022-07-27,819.0,822.0,811.0,811.0,811.0,136500.0,825.0
5455319,99970,2022-07-28,2022-07-28,813.0,816.0,801.0,816.0,816.0,187300.0,811.0


In [31]:
# stock_financial: データの互換性のための各種列名変換など
stock_fin: pd.DataFrame = pd.DataFrame()
stock_fin["Local Code"] = stock_fin_load["LocalCode"]
stock_fin["Result_FinancialStatement FiscalPeriodEnd"] = stock_fin_load["CurrentFiscalYearEndDate"]
stock_fin["Result_FinancialStatement TotalAssets"] = stock_fin_load["TotalAssets"] # 総資産
stock_fin["Result_FinancialStatement NetAssets"] = stock_fin_load["Equity"] # 純資産
stock_fin["Result_FinancialStatement NetSales"] = stock_fin_load["NetSales"] # 純売上高
stock_fin["Result_FinancialStatement OperatingIncome"] = stock_fin_load["OperatingProfit"]
stock_fin["Result_FinancialStatement OrdinaryIncome"] = stock_fin_load["OrdinaryProfit"]  # 経常利益
stock_fin["Result_FinancialStatement NetIncome"] = stock_fin_load["Profit"]  # 当期純利益
stock_fin["Result_FinancialStatement ReportType"] = stock_fin_load["TypeOfCurrentPeriod"]
stock_fin["base_date"] = stock_fin_load["DisclosedDate"]

stock_fin["TypeOfDocument"] = stock_fin_load["TypeOfDocument"] # 書類種別
stock_fin["RetrospectiveRestatement"] = stock_fin_load["RetrospectiveRestatement"] #修正再表示フラグ
stock_fin["Forecast_FinancialStatement FiscalPeriodEnd"] = stock_fin_load["CurrentFiscalYearEndDate"]
stock_fin["Forecast_FinancialStatement ReportType"] = stock_fin_load["TypeOfCurrentPeriod"]
stock_fin["Forecast_FinancialStatement NetSales"] = stock_fin_load["ForecastNetSales"]
stock_fin["Forecast_FinancialStatement OperatingIncome"] = stock_fin_load["ForecastOperatingProfit"]
stock_fin["Forecast_FinancialStatement NetIncome"] = stock_fin_load["ForecastProfit"]
stock_fin["Forecast_FinancialStatement OrdinaryIncome"] = stock_fin_load["ForecastOrdinaryProfit"]
#stock_fin = stock_fin.set_index("base_date")
#stock_fin = stock_fin.sort_index()

stock_fin

Unnamed: 0,Local Code,Result_FinancialStatement FiscalPeriodEnd,Result_FinancialStatement TotalAssets,Result_FinancialStatement NetAssets,Result_FinancialStatement NetSales,Result_FinancialStatement OperatingIncome,Result_FinancialStatement OrdinaryIncome,Result_FinancialStatement NetIncome,Result_FinancialStatement ReportType,base_date,TypeOfDocument,RetrospectiveRestatement,Forecast_FinancialStatement FiscalPeriodEnd,Forecast_FinancialStatement ReportType,Forecast_FinancialStatement NetSales,Forecast_FinancialStatement OperatingIncome,Forecast_FinancialStatement NetIncome,Forecast_FinancialStatement OrdinaryIncome
926,80700,2017-03-31,43613000000.0,21295000000.0,67048000000.0,794000000.0,1061000000.0,733000000.0,3Q,2017-01-30,3QFinancialStatements_NonConsolidated_JP,False,2017-03-31,3Q,95000000000.0,1700000000.0,1300000000.0,2000000000.0
6278,80700,2017-03-31,,,,,,,FY,2017-05-02,ForecastRevision,,2017-03-31,FY,84900000000.0,1350000000.0,1140000000.0,1590000000.0
8530,80700,2017-03-31,41966000000.0,21582000000.0,84972000000.0,1354000000.0,1591000000.0,1142000000.0,FY,2017-05-12,FYFinancialStatements_NonConsolidated_JP,False,2017-03-31,FY,100000000000.0,1900000000.0,1400000000.0,2100000000.0
11190,80700,2018-03-31,44741000000.0,21686000000.0,21938000000.0,201000000.0,351000000.0,262000000.0,1Q,2017-07-31,1QFinancialStatements_NonConsolidated_JP,False,2018-03-31,1Q,100000000000.0,1900000000.0,1400000000.0,2100000000.0
14695,80700,2018-03-31,,,,,,,FY,2017-10-19,ForecastRevision,,2018-03-31,FY,,,,
14700,80700,2018-03-31,,,,,,,FY,2017-10-19,ForecastRevision,,2018-03-31,FY,,,,
15530,80700,2018-03-31,40596000000.0,22275000000.0,42925000000.0,494000000.0,666000000.0,533000000.0,2Q,2017-10-30,2QFinancialStatements_NonConsolidated_JP,False,2018-03-31,2Q,100000000000.0,1900000000.0,1400000000.0,2100000000.0
19857,80700,2018-03-31,50885000000.0,22686000000.0,65956000000.0,1166000000.0,1387000000.0,976000000.0,3Q,2018-01-29,3QFinancialStatements_NonConsolidated_JP,False,2018-03-31,3Q,100000000000.0,1900000000.0,1400000000.0,2100000000.0
27244,80700,2018-03-31,59907000000.0,22962000000.0,104586000000.0,2197000000.0,2335000000.0,1627000000.0,FY,2018-05-11,FYFinancialStatements_NonConsolidated_JP,False,2018-03-31,FY,130000000000.0,2400000000.0,1700000000.0,2500000000.0
29755,80700,2019-03-31,59763000000.0,23517000000.0,29053000000.0,872000000.0,1081000000.0,740000000.0,1Q,2018-07-30,1QFinancialStatements_NonConsolidated_JP,False,2019-03-31,1Q,130000000000.0,2400000000.0,1700000000.0,2500000000.0


## 訓練とモデルの保存

特徴量を生成していきます。

In [39]:
# stock_priceを使ったテクニカル指標
def get_technical(stock_price:pd.DataFrame, code:int)->pd.DataFrame:
    technical_df = stock_price[stock_price["Local Code"] == code].copy()
    # 終値
    technical_df["close"] = technical_df["EndOfDayQuote Close"]
    # 騰落率
    technical_df["ror_1"] = technical_df["EndOfDayQuote Close"].pct_change(1)
    technical_df["ror_5"] = technical_df["EndOfDayQuote Close"].pct_change(5)
    technical_df["ror_10"] = technical_df["EndOfDayQuote Close"].pct_change(10)
    technical_df["ror_20"] = technical_df["EndOfDayQuote Close"].pct_change(20)
    technical_df["ror_40"] = technical_df["EndOfDayQuote Close"].pct_change(40)
    technical_df["ror_60"] = technical_df["EndOfDayQuote Close"].pct_change(60)
    technical_df["ror_100"] = technical_df["EndOfDayQuote Close"].pct_change(100)

    # 売買代金
    technical_df["volume"] = technical_df["EndOfDayQuote Close"] * technical_df["EndOfDayQuote Volume"]
    technical_df = technical_df.replace([np.inf, -np.inf], np.nan)

    technical_df["vol_1"] = technical_df["volume"]
    technical_df["vol_5"] = technical_df["volume"].rolling(5).mean() # 5日移動平均
    technical_df["vol_10"] = technical_df["volume"].rolling(10).mean()
    technical_df["vol_20"] = technical_df["volume"].rolling(20).mean()
    technical_df["vol_40"] = technical_df["volume"].rolling(40).mean()
    technical_df["vol_60"] = technical_df["volume"].rolling(60).mean()
    technical_df["vol_100"] = technical_df["volume"].rolling(100).mean()
    technical_df["d_vol"] = technical_df["volume"] / technical_df["vol_20"]

    # レンジ (前日の終値に対して何%値動きしたか)
    technical_df["range"] = (
        technical_df[["EndOfDayQuote PreviousClose", "EndOfDayQuote High"]].max(axis=1) 
        - technical_df[["EndOfDayQuote PreviousClose", "EndOfDayQuote Low"]].min(axis=1)
        ) / technical_df["EndOfDayQuote PreviousClose"]
    technical_df = technical_df.replace([np.inf, -np.inf], np.nan)

    # レンジの移動平均
    technical_df["atr_1"] = technical_df["range"]
    technical_df["atr_5"] = technical_df["range"].rolling(5).mean()
    technical_df["atr_10"] = technical_df["range"].rolling(10).mean()
    technical_df["atr_20"] = technical_df["range"].rolling(20).mean()
    technical_df["atr_40"] = technical_df["range"].rolling(40).mean()
    technical_df["atr_60"] = technical_df["range"].rolling(60).mean()
    technical_df["atr_100"] = technical_df["range"].rolling(100).mean()
    technical_df["d_atr"] = technical_df["range"] / technical_df["atr_20"]

    # ギャップレンジ
    technical_df["gap_range"] = (np.abs(technical_df["EndOfDayQuote Open"] - technical_df["EndOfDayQuote PreviousClose"])) / technical_df[
        "EndOfDayQuote PreviousClose"]
    technical_df["g_atr_1"] = technical_df["gap_range"]
    technical_df["g_atr_5"] = technical_df["gap_range"].rolling(5).mean()
    technical_df["g_atr_10"] = technical_df["gap_range"].rolling(10).mean()
    technical_df["g_atr_20"] = technical_df["gap_range"].rolling(20).mean()
    technical_df["g_atr_40"] = technical_df["gap_range"].rolling(40).mean()
    technical_df["g_atr_60"] = technical_df["gap_range"].rolling(60).mean()
    technical_df["g_atr_100"] = technical_df["gap_range"].rolling(100).mean()

    # デイレンジ
    technical_df["day_range"] = (technical_df["EndOfDayQuote High"] - technical_df["EndOfDayQuote Low"]) / technical_df[
        "EndOfDayQuote PreviousClose"]
    technical_df["d_atr_1"] = technical_df["day_range"]
    technical_df["d_atr_5"] = technical_df["day_range"].rolling(5).mean()
    technical_df["d_atr_10"] = technical_df["day_range"].rolling(10).mean()
    technical_df["d_atr_20"] = technical_df["day_range"].rolling(20).mean()
    technical_df["d_atr_40"] = technical_df["day_range"].rolling(40).mean()
    technical_df["d_atr_60"] = technical_df["day_range"].rolling(60).mean()
    technical_df["d_atr_100"] = technical_df["day_range"].rolling(100).mean()

    # ヒゲレンジ
    technical_df["hig_range"] = ((technical_df["EndOfDayQuote High"] - technical_df["EndOfDayQuote Low"]) - np.abs(
        technical_df["EndOfDayQuote Open"] - technical_df["EndOfDayQuote Close"])) / technical_df["EndOfDayQuote PreviousClose"]
    technical_df["h_atr_1"] = technical_df["hig_range"]
    technical_df["h_atr_5"] = technical_df["hig_range"].rolling(5).mean()
    technical_df["h_atr_10"] = technical_df["hig_range"].rolling(10).mean()
    technical_df["h_atr_20"] = technical_df["hig_range"].rolling(20).mean()
    technical_df["h_atr_40"] = technical_df["hig_range"].rolling(40).mean()
    technical_df["h_atr_60"] = technical_df["hig_range"].rolling(60).mean()
    technical_df["h_atr_100"] = technical_df["hig_range"].rolling(100).mean()

    # ボラティリティ
    technical_df["vola_5"] = technical_df["ror_1"].rolling(5).std()
    technical_df["vola_10"] = technical_df["ror_1"].rolling(10).std()
    technical_df["vola_20"] = technical_df["ror_1"].rolling(20).std()
    technical_df["vola_40"] = technical_df["ror_1"].rolling(40).std()
    technical_df["vola_60"] = technical_df["ror_1"].rolling(60).std()
    technical_df["vola_100"] = technical_df["ror_1"].rolling(100).std()

    # HLバンド
    technical_df["hl_5"] = technical_df["EndOfDayQuote High"].rolling(5).max() - technical_df["EndOfDayQuote Low"].rolling(5).min()
    technical_df["hl_10"] = technical_df["EndOfDayQuote High"].rolling(10).max() - technical_df["EndOfDayQuote Low"].rolling(10).min()
    technical_df["hl_20"] = technical_df["EndOfDayQuote High"].rolling(20).max() - technical_df["EndOfDayQuote Low"].rolling(20).min()
    technical_df["hl_40"] = technical_df["EndOfDayQuote High"].rolling(40).max() - technical_df["EndOfDayQuote Low"].rolling(40).min()
    technical_df["hl_60"] = technical_df["EndOfDayQuote High"].rolling(60).max() - technical_df["EndOfDayQuote Low"].rolling(60).min()
    technical_df["hl_100"] = technical_df["EndOfDayQuote High"].rolling(100).max() - technical_df["EndOfDayQuote Low"].rolling(100).min()

    # マーケットインパクト
    technical_df["mi"] = technical_df["range"] / (technical_df["EndOfDayQuote Volume"] * technical_df["EndOfDayQuote Close"])
    technical_df = technical_df.replace([np.inf, -np.inf], np.nan)

    technical_df["mi_5"] = technical_df["mi"].rolling(5).mean()
    technical_df["mi_10"] = technical_df["mi"].rolling(10).mean()
    technical_df["mi_20"] = technical_df["mi"].rolling(20).mean()
    technical_df["mi_40"] = technical_df["mi"].rolling(40).mean()
    technical_df["mi_60"] = technical_df["mi"].rolling(60).mean()
    technical_df["mi_100"] = technical_df["mi"].rolling(100).mean()

    feat = ["EndOfDayQuote Date", "Local Code", "close",
            "ror_1", "ror_5", "ror_10", "ror_20", "ror_40", "ror_60", "ror_100",
            "vol_1", "vol_5", "vol_10", "vol_20", "vol_40", "vol_60", "vol_100", "d_vol",
            "atr_1", "atr_5", "atr_10", "atr_20", "atr_40", "atr_60", "atr_100", "d_atr",
            "g_atr_1", "g_atr_5", "g_atr_10", "g_atr_20", "g_atr_40", "g_atr_60", "g_atr_100",
            "d_atr_1", "d_atr_5", "d_atr_10", "d_atr_20", "d_atr_40", "d_atr_60", "d_atr_100",
            "h_atr_1", "h_atr_5", "h_atr_10", "h_atr_20", "h_atr_40", "h_atr_60", "h_atr_100",
            "vola_5", "vola_10", "vola_20", "vola_40", "vola_60", "vola_100",
            "hl_5", "hl_10", "hl_20", "hl_40", "hl_60", "hl_100",
            "mi_5", "mi_10", "mi_20", "mi_40", "mi_60", "mi_100"]
    technical_df = technical_df[feat]
    technical_df.columns = ["datetime", "code", "close",
                      "ror_1", "ror_5", "ror_10", "ror_20", "ror_40", "ror_60", "ror_100",
                      "vol_1", "vol_5", "vol_10", "vol_20", "vol_40", "vol_60", "vol_100", "d_vol",
                      "atr_1", "atr_5", "atr_10", "atr_20", "atr_40", "atr_60", "atr_100", "d_atr",
                      "g_atr_1", "g_atr_5", "g_atr_10", "g_atr_20", "g_atr_40", "g_atr_60", "g_atr_100",
                      "d_atr_1", "d_atr_5", "d_atr_10", "d_atr_20", "d_atr_40", "d_atr_60", "d_atr_100",
                      "h_atr_1", "h_atr_5", "h_atr_10", "h_atr_20", "h_atr_40", "h_atr_60", "h_atr_100",
                      "vola_5", "vola_10", "vola_20", "vola_40", "vola_60", "vola_100",
                      "hl_5", "hl_10", "hl_20", "hl_40", "hl_60", "hl_100",
                      "mi_5", "mi_10", "mi_20", "mi_40", "mi_60", "mi_100"]
    technical_df["datetime"] = pd.to_datetime(technical_df["datetime"])
    technical_df = technical_df.set_index(["datetime", "code"])
    return technical_df

get_technical(stock_price, 80700)

Unnamed: 0_level_0,Unnamed: 1_level_0,close,ror_1,ror_5,ror_10,ror_20,ror_40,ror_60,ror_100,vol_1,vol_5,vol_10,vol_20,vol_40,vol_60,vol_100,d_vol,atr_1,atr_5,atr_10,atr_20,atr_40,atr_60,atr_100,d_atr,g_atr_1,g_atr_5,g_atr_10,g_atr_20,g_atr_40,g_atr_60,g_atr_100,d_atr_1,d_atr_5,d_atr_10,d_atr_20,d_atr_40,d_atr_60,d_atr_100,h_atr_1,h_atr_5,h_atr_10,h_atr_20,h_atr_40,h_atr_60,h_atr_100,vola_5,vola_10,vola_20,vola_40,vola_60,vola_100,hl_5,hl_10,hl_20,hl_40,hl_60,hl_100,mi_5,mi_10,mi_20,mi_40,mi_60,mi_100
datetime,code,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1
2017-01-04,80700,495.0,,,,,,,,32719500.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2017-01-05,80700,493.0,-0.004040,,,,,,,13458900.0,,,,,,,,0.008081,,,,,,,,0.002020,,,,,,,0.008081,,,,,,,0.006061,,,,,,,,,,,,,,,,,,,,,,,,
2017-01-06,80700,493.0,0.000000,,,,,,,26819200.0,,,,,,,,0.030426,,,,,,,,0.002028,,,,,,,0.030426,,,,,,,0.028398,,,,,,,,,,,,,,,,,,,,,,,,
2017-01-10,80700,490.0,-0.006085,,,,,,,12740000.0,,,,,,,,0.014199,,,,,,,,0.002028,,,,,,,0.014199,,,,,,,0.010142,,,,,,,,,,,,,,,,,,,,,,,,
2017-01-11,80700,493.0,0.006122,,,,,,,14839300.0,20115380.0,,,,,,,0.008163,,,,,,,,0.000000,,,,,,,0.008163,,,,,,,0.002041,,,,,,,,,,,,,15.0,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-07-25,80700,727.0,0.001377,0.019635,0.023944,0.032670,0.025388,0.049062,0.047550,20356000.0,17639880.0,17863060.0,31536895.0,26654892.5,2.405023e+07,24690438.0,0.645466,0.011019,0.009426,0.012131,0.018422,0.016994,0.017415,0.019013,0.598151,0.001377,0.002778,0.002926,0.004116,0.004751,0.005339,0.006252,0.011019,0.009147,0.010595,0.017252,0.015666,0.015914,0.016840,0.008264,0.005817,0.005571,0.006932,0.006936,0.006909,0.008065,0.003037,0.008954,0.018315,0.014593,0.014428,0.015239,15.0,24.0,55.0,70.0,72.0,93.0,5.517691e-10,7.013448e-10,6.612621e-10,7.366161e-10,8.412422e-10,8.874062e-10
2022-07-26,80700,721.0,-0.008253,0.005579,-0.001385,0.024148,0.024148,0.032951,0.019802,13410600.0,18543840.0,16243920.0,30577665.0,26025677.5,2.408644e+07,24514171.0,0.438575,0.009629,0.010230,0.011122,0.018193,0.016917,0.017407,0.018850,0.529235,0.001376,0.003053,0.002359,0.003901,0.004644,0.005338,0.006137,0.009629,0.009951,0.010290,0.017023,0.015590,0.015906,0.016677,0.002751,0.006367,0.005564,0.006643,0.007005,0.006883,0.007891,0.005984,0.007901,0.018449,0.014610,0.014452,0.015153,10.0,24.0,55.0,70.0,72.0,93.0,5.691661e-10,7.065319e-10,6.753720e-10,7.463399e-10,8.382279e-10,8.862294e-10
2022-07-27,80700,722.0,0.001387,-0.001383,0.016901,0.014045,0.024113,0.055556,0.024113,18122200.0,16673480.0,15833840.0,30451375.0,25738482.5,2.374210e+07,24321743.0,0.595119,0.011096,0.010218,0.010293,0.018038,0.016768,0.017258,0.018735,0.615129,0.001387,0.001657,0.001806,0.003970,0.004679,0.005361,0.006108,0.011096,0.010218,0.010153,0.016868,0.015441,0.015757,0.016562,0.011096,0.006633,0.006397,0.007056,0.006892,0.007068,0.007846,0.004503,0.005365,0.018299,0.014610,0.014194,0.015150,12.0,24.0,55.0,70.0,72.0,93.0,6.103973e-10,6.805045e-10,6.715887e-10,7.472550e-10,8.398129e-10,8.862955e-10
2022-07-28,80700,721.0,-0.001385,-0.004144,0.006983,0.030000,0.015493,0.015493,0.041908,23576700.0,19231300.0,16959990.0,29855710.0,25972900.0,2.374099e+07,24383126.0,0.789688,0.013850,0.011881,0.010833,0.017607,0.016902,0.016855,0.018689,0.786646,0.004155,0.002488,0.002221,0.003827,0.004783,0.005308,0.006079,0.013850,0.011881,0.010693,0.016437,0.015575,0.015403,0.016559,0.008310,0.007466,0.007228,0.006839,0.007064,0.007133,0.007901,0.004419,0.004866,0.017832,0.014577,0.013338,0.015032,12.0,24.0,55.0,70.0,72.0,93.0,6.253182e-10,6.706306e-10,6.693023e-10,7.469580e-10,8.228083e-10,8.815959e-10


In [None]:
# stock_finを使った指標
def get_financial(stock_fin:pd.DataFrame, code:int)->pd.DataFrame:
    fin_df = stock_fin[stock_fin["Local Code"] == code].copy()

    # TypeOfDocumentの値によってはTotalAssetsなどの値がNaNになっているのでffill
    fin_df = fin_df.ffill()

    # --- 本決算／中間決算フラグ、修正開示フラグ、事後修正有無フラグ ---
    fin_df["annual"] = 0 # 0: 中間決算, 1:本決算
    fin_df["revision"] = 0 # 1: 修正再表示
    # FYFinancialStatements*** は本決算
    fin_df.loc[fin_df["TypeOfDocument"].isin(["FYFinancialStatements_Consolidated_JP", "FYFinancialStatements_Consolidated_US", "FYFinancialStatements_Consolidated_IFRS"]), "annual"] = 1
    #fin_df.loc[fin_df["RetrospectiveRestatement"]]
    fin_df.loc[fin_df["RetrospectiveRestatement"]==True, "revision"] = 1
    feat1 = ["annual", "revision"]

    # --- 原系列 ---

    # --- r_sales ---
    fin_df["pre_result_period_end"] = fin_df["Result_FinancialStatement FiscalPeriodEnd"].shift(1)
    fin_df["r_sales"] = np.nan

    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] != "1Q")), "r_sales"] = fin_df[
        "Result_FinancialStatement NetSales"].diff(1)
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] == "1Q")), "r_sales"] = fin_df[
        "Result_FinancialStatement NetSales"]
    fin_df["r_sales"] = fin_df["r_sales"].ffill()

    # --- r_ope_income ---
    fin_df["r_ope_income"] = np.nan
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] != "1Q")), "r_ope_income"] = fin_df[
        "Result_FinancialStatement OperatingIncome"].diff(1)
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] == "1Q")), "r_ope_income"] = fin_df[
        "Result_FinancialStatement OperatingIncome"]
    fin_df["r_ope_income"] = fin_df["r_ope_income"].ffill()

    # --- r_ord_income ---
    fin_df["r_ord_income"] = np.nan
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] != "1Q")), "r_ord_income"] = fin_df[
        "Result_FinancialStatement OrdinaryIncome"].diff(1)
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] == "1Q")), "r_ord_income"] = fin_df[
        "Result_FinancialStatement OrdinaryIncome"]
    fin_df["r_ord_income"] = fin_df["r_ord_income"].ffill()

    # --- r_net_income ---
    fin_df["r_net_income"] = np.nan
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] != "1Q")), "r_net_income"] = fin_df[
        "Result_FinancialStatement NetIncome"].diff(1)
    fin_df.loc[((fin_df["Result_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_result_period_end"]) & (
            fin_df["Result_FinancialStatement ReportType"] == "1Q")), "r_net_income"] = fin_df[
        "Result_FinancialStatement NetIncome"]
    fin_df["r_net_income"] = fin_df["r_net_income"].ffill()

    # --- pre_forcast_period_end ---
    fin_df["pre_forecast_period_end"] = fin_df["Forecast_FinancialStatement FiscalPeriodEnd"].shift(1)

    # --- f_sales ---
    fin_df["f_sales"] = np.nan
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] != "1Q")), "f_sales"] = fin_df[
        "Forecast_FinancialStatement NetSales"].diff(1)
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] == "1Q")), "f_sales"] = fin_df[
        "Forecast_FinancialStatement NetSales"]
    fin_df["f_sales"] = fin_df["f_sales"].ffill()

    # --- f_ope_income ---
    fin_df["f_ope_income"] = np.nan
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] != "1Q")), "f_ope_income"] = fin_df[
        "Forecast_FinancialStatement OperatingIncome"].diff(1)
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] == "1Q")), "f_ope_income"] = fin_df[
        "Forecast_FinancialStatement OperatingIncome"]
    fin_df["f_ope_income"] = fin_df["f_ope_income"].ffill()

    # --- f_ord_income ---
    fin_df["f_ord_income"] = np.nan
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] != "1Q")), "f_ord_income"] = fin_df[
        "Forecast_FinancialStatement OrdinaryIncome"].diff(1)
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] == "1Q")), "f_ord_income"] = fin_df[
        "Forecast_FinancialStatement OrdinaryIncome"]
    fin_df["f_ord_income"] = fin_df["f_ord_income"].ffill()

    # --- f_net_income ---
    fin_df["f_net_income"] = np.nan
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] != "1Q")), "f_net_income"] = fin_df[
        "Forecast_FinancialStatement NetIncome"].diff(1)
    fin_df.loc[((fin_df["Forecast_FinancialStatement FiscalPeriodEnd"] != fin_df["pre_forecast_period_end"]) & (
            fin_df["Forecast_FinancialStatement ReportType"] == "1Q")), "f_net_income"] = fin_df[
        "Forecast_FinancialStatement NetIncome"]
    fin_df["f_net_income"] = fin_df["f_net_income"].ffill()

    # --------------------
    fin_df["r_expense1"] = fin_df["r_sales"] - fin_df["r_ope_income"]
    fin_df["r_expense2"] = fin_df["r_ope_income"] - fin_df["r_ord_income"]
    fin_df["r_expense3"] = fin_df["r_ord_income"] - fin_df["r_net_income"]

    fin_df["f_expense1"] = fin_df["f_sales"] - fin_df["f_ope_income"]
    fin_df["f_expense2"] = fin_df["f_ope_income"] - fin_df["f_ord_income"]
    fin_df["f_expense3"] = fin_df["f_ord_income"] - fin_df["f_net_income"]

    fin_df["r_assets"] = fin_df["Result_FinancialStatement TotalAssets"]
    fin_df["r_equity"] = fin_df["Result_FinancialStatement NetAssets"]

    # 現在 J-Quants APIからは取れなさそう
    # fin_df["operating_cf"] = fin_df["Result_FinancialStatement CashFlowsFromOperatingActivities"]
    # fin_df["financial_cf"] = fin_df["Result_FinancialStatement CashFlowsFromFinancingActivities"]
    # fin_df["investing_cf"] = fin_df["Result_FinancialStatement CashFlowsFromInvestingActivities"]

    feat2 = ["r_sales", "r_ope_income", "r_ord_income", "r_net_income", "f_sales", "f_ope_income", "f_ord_income",
            "f_net_income",
            "r_expense1", "r_expense2", "r_expense3", "f_expense1", "f_expense2", "f_expense3",
            "r_assets", "r_equity",] #"operating_cf", "financial_cf", "investing_cf"]


    # --- 複合指標　原系列 ---
    # ------ 純利益系 ------
    fin_df["r_pm1"] = fin_df["Result_FinancialStatement NetIncome"] / fin_df["Result_FinancialStatement NetSales"]
    fin_df["r_roe1"] = fin_df["Result_FinancialStatement NetIncome"] / fin_df["Result_FinancialStatement NetAssets"]
    fin_df["r_roa1"] = fin_df["Result_FinancialStatement NetIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    fin_df["f_pm1"] = fin_df["Forecast_FinancialStatement NetIncome"] / fin_df[
        "Forecast_FinancialStatement NetSales"]
    fin_df["f_roe1"] = fin_df["Forecast_FinancialStatement NetIncome"] / fin_df[
        "Result_FinancialStatement NetAssets"]
    fin_df["f_roa1"] = fin_df["Forecast_FinancialStatement NetIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    # 経常利益系
    fin_df["r_pm2"] = fin_df["Result_FinancialStatement OrdinaryIncome"] / fin_df[
        "Result_FinancialStatement NetSales"]
    fin_df["r_roe2"] = fin_df["Result_FinancialStatement OrdinaryIncome"] / fin_df[
        "Result_FinancialStatement NetAssets"]
    fin_df["r_roa2"] = fin_df["Result_FinancialStatement OrdinaryIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    fin_df["f_pm2"] = fin_df["Forecast_FinancialStatement OrdinaryIncome"] / fin_df[
        "Forecast_FinancialStatement NetSales"]
    fin_df["f_roe2"] = fin_df["Forecast_FinancialStatement OrdinaryIncome"] / fin_df[
        "Result_FinancialStatement NetAssets"]
    fin_df["f_roa2"] = fin_df["Forecast_FinancialStatement OrdinaryIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    # 営業利益系
    fin_df["r_pm3"] = fin_df["Result_FinancialStatement OperatingIncome"] / fin_df[
        "Result_FinancialStatement NetSales"]
    fin_df["r_roe3"] = fin_df["Result_FinancialStatement OperatingIncome"] / fin_df[
        "Result_FinancialStatement NetAssets"]
    fin_df["r_roa3"] = fin_df["Result_FinancialStatement OperatingIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    fin_df["f_pm3"] = fin_df["Forecast_FinancialStatement OperatingIncome"] / fin_df[
        "Forecast_FinancialStatement NetSales"]
    fin_df["f_roe3"] = fin_df["Forecast_FinancialStatement OperatingIncome"] / fin_df[
        "Result_FinancialStatement NetAssets"]
    fin_df["f_roa3"] = fin_df["Forecast_FinancialStatement OperatingIncome"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    # コスト
    fin_df["r_cost1"] = ((fin_df["Result_FinancialStatement NetSales"] - fin_df[
        "Result_FinancialStatement OperatingIncome"]) / fin_df["Result_FinancialStatement NetSales"])
    fin_df["r_cost2"] = ((fin_df["Result_FinancialStatement OperatingIncome"] - fin_df[
        "Result_FinancialStatement OrdinaryIncome"]) / fin_df["Result_FinancialStatement NetSales"])
    fin_df["r_cost3"] = ((fin_df["Result_FinancialStatement OrdinaryIncome"] - fin_df[
        "Result_FinancialStatement NetIncome"]) / fin_df["Result_FinancialStatement NetSales"])

    fin_df["f_cost1"] = ((fin_df["Forecast_FinancialStatement NetSales"] - fin_df[
        "Forecast_FinancialStatement OperatingIncome"]) / fin_df["Forecast_FinancialStatement NetSales"])
    fin_df["f_cost2"] = ((fin_df["Forecast_FinancialStatement OperatingIncome"] - fin_df[
        "Forecast_FinancialStatement OrdinaryIncome"]) / fin_df["Forecast_FinancialStatement NetSales"])
    fin_df["f_cost3"] = ((fin_df["Forecast_FinancialStatement OrdinaryIncome"] - fin_df[
        "Forecast_FinancialStatement NetIncome"]) / fin_df["Forecast_FinancialStatement NetSales"])

    # 売上高回転率
    fin_df["r_turn"] = fin_df["Result_FinancialStatement NetSales"] / fin_df[
        "Result_FinancialStatement TotalAssets"]
    fin_df["f_turn"] = fin_df["Forecast_FinancialStatement NetSales"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    # 財務健全性
    fin_df["equity_ratio"] = fin_df["Result_FinancialStatement NetAssets"] / fin_df[
        "Result_FinancialStatement TotalAssets"]

    # 総資本キャッシュフロー比率 --- 現在J-Quants APIからは取得できなさそう
    # fin_df["o_cf_ratio"] = (fin_df["Result_FinancialStatement CashFlowsFromOperatingActivities"] / fin_df[
    #     "Result_FinancialStatement TotalAssets"])
    # fin_df["f_cf_ratio"] = (fin_df["Result_FinancialStatement CashFlowsFromFinancingActivities"] / fin_df[
    #     "Result_FinancialStatement TotalAssets"])
    # fin_df["i_cf_ratio"] = (fin_df["Result_FinancialStatement CashFlowsFromInvestingActivities"] / fin_df[
    #     "Result_FinancialStatement TotalAssets"])

    feat3 = ["r_pm1", "r_roe1", "r_roa1", "f_pm1", "f_roe1", "f_roa1",
             "r_pm2", "r_roe2", "r_roa2", "f_pm2", "f_roe2", "f_roa2",
             "r_pm3", "r_roe3", "r_roa3", "f_pm3", "f_roe3", "f_roa3",
             "r_cost1", "r_cost2", "r_cost3", "f_cost1", "f_cost2", "f_cost3",
             "r_turn", "f_turn", "equity_ratio", ] # "o_cf_ratio", "f_cf_ratio", "i_cf_ratio"]

    # Inf値をNan値化
    fin_df = fin_df.replace([np.inf, -np.inf], np.nan)

    # 差分系列
    d_feat2 = []

    for f in feat2:
        fin_df["d_" + f] = fin_df[f].diff(1)
        d_feat2.append("d_" + f)

    d_feat3 = []
    for f in feat3:
        fin_df["d_" + f] = fin_df[f].diff(1)
        d_feat3.append("d_" + f)

    d_feat4 = ["m_sales", "m_ope_income", "m_ord_income", "m_net_income", "m_expense1", "m_expense2", "m_expense3",
               "m_pm1", "m_pm2", "m_pm3", "m_roe1", "m_roe2", "m_roe3", "m_roa1", "m_roa2", "m_roa3",
               "m_cost1", "m_cost2", "m_cost3"]

    fin_df["m_sales"] = fin_df["r_sales"] - fin_df["f_sales"].shift(1)
    fin_df["m_ope_income"] = fin_df["r_ope_income"] - fin_df["f_ope_income"].shift(1)
    fin_df["m_ord_income"] = fin_df["r_ord_income"] - fin_df["f_ord_income"].shift(1)
    fin_df["m_net_income"] = fin_df["r_net_income"] - fin_df["f_net_income"].shift(1)
    fin_df["m_expense1"] = fin_df["r_expense1"] - fin_df["f_expense1"].shift(1)
    fin_df["m_expense2"] = fin_df["r_expense2"] - fin_df["f_expense2"].shift(1)
    fin_df["m_expense3"] = fin_df["r_expense3"] - fin_df["f_expense3"].shift(1)

    fin_df["m_pm1"] = fin_df["r_pm1"] - fin_df["f_pm1"].shift(1)
    fin_df["m_pm2"] = fin_df["r_pm2"] - fin_df["f_pm2"].shift(1)
    fin_df["m_pm3"] = fin_df["r_pm3"] - fin_df["f_pm3"].shift(1)
    fin_df["m_roe1"] = fin_df["r_roe1"] - fin_df["f_roe1"].shift(1)
    fin_df["m_roe2"] = fin_df["r_roe2"] - fin_df["f_roe2"].shift(1)
    fin_df["m_roe3"] = fin_df["r_roe3"] - fin_df["f_roe3"].shift(1)
    fin_df["m_roa1"] = fin_df["r_roa1"] - fin_df["f_roa1"].shift(1)
    fin_df["m_roa2"] = fin_df["r_roa2"] - fin_df["f_roa2"].shift(1)
    fin_df["m_roa3"] = fin_df["r_roa3"] - fin_df["f_roa3"].shift(1)
    fin_df["m_cost1"] = fin_df["r_cost1"] - fin_df["f_cost1"].shift(1)
    fin_df["m_cost2"] = fin_df["r_cost2"] - fin_df["f_cost2"].shift(1)
    fin_df["m_cost3"] = fin_df["r_cost3"] - fin_df["f_cost3"].shift(1)

    feat = ["base_date", "Local Code"]
    feat.extend(feat1)
    feat.extend(feat2)
    feat.extend(feat3)
    feat.extend(d_feat2)
    feat.extend(d_feat3)
    feat.extend(d_feat4)

    col_names = ["datetime", "code"]
    col_names.extend(feat1)
    col_names.extend(feat2)
    col_names.extend(feat3)
    col_names.extend(d_feat2)
    col_names.extend(d_feat3)
    col_names.extend(d_feat4)

    fin_df = fin_df[feat]
    fin_df.columns = col_names
    fin_df["datetime"] = pd.to_datetime(fin_df["datetime"])
    fin_df = fin_df.set_index(["datetime", "code"])
    return fin_df

get_financial()

目的変数（ラベル）を生成しておきます。

In [None]:
def create_label_high_low(stock_code:int, target_date, lookaheads:List[int], df_price:pd.DataFrame):
   df_price = df_price.loc[(df_price["Local Code"] == stock_code) & (df_price["base_date"] <= target_date)]
   df_price.loc[:, "t_date"] = pd.to_datetime(df_price["base_date"], format="%Y/%m/%d")

   output_columns = ["Local Code"]
   for lookahead in lookaheads:
       output_columns.append("label_date_{}".format(lookahead))
       output_columns.append("label_high_{}".format(lookahead))
       output_columns.append("label_low_{}".format(lookahead))
       t_col = "label_date_{}".format(lookahead)
       df_price.loc[:, t_col] = df_price.loc[:, "t_date"].shift(-lookahead)

   if len(df_price) == 0:
       return pd.DataFrame(None, columns=output_columns)

   df_a_stock = df_price.loc[:, ["EndOfDayQuote ExchangeOfficialClose", "EndOfDayQuote High", "EndOfDayQuote Low"]].copy()

   df_a_stock.loc[df_a_stock.loc[:, "EndOfDayQuote High"] == 0.0] = np.nan
   df_a_stock.loc[df_a_stock.loc[:, "EndOfDayQuote Low"] == 0.0] = np.nan

   for lookahead in lookaheads:
       df_high_high = df_a_stock.loc[:, "EndOfDayQuote High"].rolling(lookahead).max()
       df_high_high = df_high_high.shift(-lookahead)
       df_high_high_diff = df_high_high - df_price.loc[:, "EndOfDayQuote ExchangeOfficialClose"]
       df_price.loc[:, "label_high_{}".format(lookahead)] = df_high_high_diff / df_price.loc[:, "EndOfDayQuote ExchangeOfficialClose"]

       df_low_low = df_a_stock.loc[:, "EndOfDayQuote Low"].rolling(lookahead).min()
       df_low_low = df_low_low.shift(-lookahead)
       df_low_low_diff = df_low_low - df_price.loc[:, "EndOfDayQuote ExchangeOfficialClose"]
       df_price.loc[:, "label_low_{}".format(lookahead)] = df_low_low_diff / df_price.loc[:, "EndOfDayQuote ExchangeOfficialClose"]

   df_price.replace(np.inf, np.nan, inplace=True)
   return df_price.loc[:, output_columns]

def create_delivery_label_high_low(stock_codes:List[int], target_date, lookaheads:List[int], df_price:pd.DataFrame):
   buff = []
   for stock_code in stock_codes:
       df = create_label_high_low(stock_code, target_date, lookaheads, df_price)
       buff.append(df)
   df_labels = pd.concat(buff)
   return df_labels

def output_stock_labels(stock_labels_csvfile_path:str, df_labels:pd.DataFrame, output_start_dt, end_dt):
   df_labels = df_labels.loc[df_labels.index <= end_dt].copy()
   df_labels.index.name = "base_date"
   df_labels_output = df_labels.loc[(df_labels.index >= output_start_dt) & (df_labels.index <= end_dt)]
   label_output_columns = [
       "Local Code",
       "label_date_5",
       "label_high_5",
       "label_low_5",
       "label_date_10",
       "label_high_10",
       "label_low_10",
       "label_date_20",
       "label_high_20",
       "label_low_20",
   ]
   df_labels_output.to_csv(stock_labels_csvfile_path, compression="gzip", float_format="%.5f", columns=label_output_columns)


df_price = stock_price.copy()
stock_codes = sorted(df_price["Local Code"].unique())

target_date = pd.Timestamp("2022-07-24", tz="Asia/Tokyo").to_datetime64() #pd.Timestamp.now(tz="Asia/Tokyo")
lookaheads = [5, 10, 20]
stock_labels = create_delivery_label_high_low(stock_codes, target_date, lookaheads, df_price)
stock_labels["base_date"] = df_price["base_date"]

output_start_dt = pd.Timestamp("2017-07-24", tz="Asia/Tokyo").to_datetime64()
output_stock_labels(stock_labels_csvfile_path, stock_labels, output_start_dt, target_date)

stock_labels

[1;30;43mストリーミング出力は最後の 5000 行に切り捨てられました。[0m
  self.obj[key] = value
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  method=method,
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  method=method,
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-do

生成した特徴量を用いて訓練を行います。

In [None]:
def get_df_merge(stock_price:pd.DataFrame, stock_fin:pd.DataFrame, stock_labels:pd.DataFrame, train:bool=True):
    df_technical = []
    for code in stock_codes:
        df_technical.append(get_technical(stock_price, code))
    df_technical = pd.concat(df_technical)

    df_financial = []
    for code in stock_codes:
        df_financial.append(get_financial(stock_fin, code))
    df_financial = pd.concat(df_financial)

    if train:
        df_label = stock_labels.copy()
        feat = ["base_date", "Local Code", "label_high_20", "label_low_20"]
        df_label = df_label[feat]
        df_label.columns = ["datetime", "code", "label_high_20", "label_low_20"]
        df_label["datetime"] = pd.to_datetime(df_label["datetime"])
        df_label = df_label.set_index(["datetime", "code"])

        df_merge = pd.concat([df_financial,
                              df_technical[df_technical.index.isin(df_financial.index)],
                              df_label[df_label.index.isin(df_financial.index)]
                              ], axis=1)
    else:
        df_merge = pd.concat([df_financial,
                              df_technical[df_technical.index.isin(df_financial.index)],
                              ], axis=1)

    df_merge = df_merge.reset_index()
    return df_merge

def get_df_for_ml(stock_price:pd.DataFrame, stock_fin:pd.DataFrame, stock_labels:pd.DataFrame, train=True):
    df_merge = get_df_merge(stock_price, stock_fin, stock_labels, train=train)
    df_merge = df_merge.replace([np.inf, -np.inf], np.nan)
    df_merge = df_merge.fillna(0)
    return df_merge

def get_model(model_path="/tmp/marketdata/model"):
    models = {}
    labels = ["model_h_final", "model_l_final"]
    for label in labels:
        m = os.path.join(model_path, f"my_model_{label}.pkl")
        with open(m, "rb") as f:
            models[label] = pickle.load(f)
    return models["model_h_final"], models["model_l_final"]


def save_model(model, label, model_path):
    os.makedirs(model_path, exist_ok=True)
    with open(os.path.join(model_path, f"my_model_{label}.pkl"), "wb") as f:
        pickle.dump(model, f)


def get_predict(df_for_ml, models_h, models_l):
    tmp_df = df_for_ml.copy()

    x_feats = [f for f in tmp_df.columns if f not in ["datetime", "code", "label_high_20", "label_low_20"]]

    tmp_df["pred_high"] = models_h.predict(tmp_df[x_feats])
    tmp_df["pred_low"] = models_l.predict(tmp_df[x_feats])

    tmp_df = tmp_df.set_index("datetime")
    cols = ["code", "pred_high", "pred_low"]
    tmp_df = tmp_df[cols]
    tmp_df.columns = ["code", "label_high_20", "label_low_20"]

    return tmp_df

def train_and_save_model(stock_price:pd.DataFrame, stock_fin:pd.DataFrame, stock_labels:pd.DataFrame, model_path):
    from xgboost.sklearn import XGBRegressor
    # 特徴量を作成
    df_for_ml = get_df_for_ml(stock_price, stock_fin, stock_labels, train=True)


    train_df = df_for_ml[df_for_ml["datetime"] <= TRAIN_END].copy()

    model_h_final = XGBRegressor(max_depth=6, learning_rate=0.01, n_estimators=3000, n_jobs=-1,
                                 colsample_bytree=0.1, random_state=0)
    model_l_final = XGBRegressor(max_depth=6, learning_rate=0.01, n_estimators=3000, n_jobs=-1,
                                 colsample_bytree=0.1, random_state=0)

    x_feats = [f for f in df_for_ml.columns if f not in ["datetime", "code", "label_high_20", "label_low_20"]]
    y_labels = ["label_high_20", "label_low_20"]

    model_h_final.fit(train_df[x_feats], train_df["label_high_20"])
    model_l_final.fit(train_df[x_feats], train_df["label_low_20"])

    save_model(model_h_final, "model_h_final", model_path=model_path)
    save_model(model_l_final, "model_l_final", model_path=model_path)

train_and_save_model(stock_price, stock_fin, df_labels, model_path=model_path)

predictを行います。

In [None]:
def predict(stock_price:pd.DataFrame, stock_fin:pd.DataFrame, stock_labels:pd.DataFrame):
    # 特徴量を作成
    df_for_ml = get_df_for_ml(stock_price, stock_fin, stock_labels, train=False)

    # 訓練および予測
    models_h, models_l = get_model(model_path)
    df = get_predict(df_for_ml, models_h, models_l)
    df.loc[:, "code"] = df.index.strftime("%Y-%m-%d-") + df.loc[:, "code"].astype(str)

    # 出力対象列を定義
    output_columns = ["code", "label_high_20", "label_low_20"]
    out = io.StringIO()
    df.to_csv(out, header=False, index=False, columns=output_columns)
    # df.to_csv("test_submit.csv", index=False)

    return out.getvalue()

predict(stock_price, stock_fin, df_labels)

'2017-02-10-13010,0.1560088,0.0032594586\n2017-02-17-13010,0.13611506,0.0016471758\n2017-05-11-13010,0.044564642,-0.026186232\n2017-08-04-13010,0.055419262,-0.015894793\n2017-11-06-13010,0.022355927,-0.0863256\n2018-02-09-13010,0.03755246,-0.0435152\n2018-02-09-13010,0.03818148,-0.043264728\n2018-05-10-13010,0.011141784,-0.03375414\n2018-08-03-13010,0.045905795,-0.03687029\n2018-11-05-13010,0.10179619,-0.014346857\n2019-02-08-13010,0.078215554,0.005704828\n2019-05-13-13010,0.043551955,-0.06281043\n2019-08-02-13010,0.006264968,-0.086346745\n2019-11-05-13010,0.008645722,-0.03810172\n2019-11-05-13010,0.008402585,-0.038013235\n2020-02-07-13010,0.069841325,-0.0047561857\n2020-05-12-13010,0.056083035,-0.03464483\n2020-08-07-13010,0.07780329,-0.023511099\n2020-11-06-13010,0.07897522,-0.027603304\n2021-02-05-13010,0.07649437,-0.026230797\n2021-05-14-13010,0.06413278,-0.04517928\n2021-08-06-13010,0.07205637,-0.015331292\n2021-11-05-13010,0.06738146,-0.023249581\n2022-02-04-13010,0.06097549,-0.0

これでチュートリアルは終了です。お疲れ様でした！