# Table of Contents

1. Introduction
2. Setup and Import
3. Getting to know the Data
4. Visualisation


# Introduction

Success in any financial market requires identifying solid investments. When a stock or derivative is undervalued, it makes sense to buy. If it's overvalued, perhaps it's time to sell. While these financial decisions were historically made manually by professionals, technology has ushered in new opportunities for retail investors. Data scientists, specifically, may be interested in exploring quantitative trading, where decisions are executed programmatically based on predictions from trained models.

There are plenty of existing quantitative trading efforts used to analyze financial markets and formulate investment strategies. Creating and executing such a strategy requires both historical and real-time data, which are difficult to obtain, especially for retail investors. This competition provides financial data for the Japanese market, allowing retail investors to analyze the market to the fullest extent.

Japan Exchange Group, Inc. (JPX) is a holding company operating one of the largest stock exchanges in the world, the Tokyo Stock Exchange (TSE), and the derivatives exchanges Osaka Exchange (OSE) and Tokyo Commodity Exchange (TOCOM). JPX is hosting this competition and is supported by AI technology company AlpacaJapan Co.,Ltd.

This competition on Kaggle will compare our final model against real future returns after the training phase is complete.



## Setup and Import

As always, the first step is to import the required libraries and data. Since we do not want to run the SQL query every time, we simply load the CSV files we created in the first notebook.

In [1]:
# Import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import altair as alt
import numpy as np

from ipywidgets import HTML
from io import BytesIO
import base64

import warnings
warnings.simplefilter("ignore")

# Turn off the max column width so the images won't be truncated
pd.set_option('display.max_colwidth', None)
# Show all Columns
pd.set_option('display.max_columns', None)
 
# With all columns shown, every value is displayed in full;
# when cells hold sets or arrays, restrict the display to a few items
pd.set_option('display.max_seq_items', 3)


In [2]:
# Import dataframes for Stock Prices
stock_price_df = pd.read_csv('../data/train_files/stock_prices.csv', parse_dates=['Date'])
sec_df = pd.read_csv('../data/train_files/secondary_stock_prices.csv', parse_dates=['Date'])
tra_df = pd.read_csv('../data/train_files/trades.csv', parse_dates=['Date'])

stock_desc_df = pd.read_csv('../data/stock_price_spec.csv')
stock_list_desc_df = pd.read_csv('../data/stock_list_spec.csv')
stock_list = pd.read_csv('../data/stock_list.csv')

In [3]:
stock_price_df.head(5)

Unnamed: 0,RowId,Date,SecuritiesCode,Open,High,Low,Close,Volume,AdjustmentFactor,ExpectedDividend,SupervisionFlag,Target
0,20170104_1301,2017-01-04,1301,2734.0,2755.0,2730.0,2742.0,31400,1.0,,False,0.00073
1,20170104_1332,2017-01-04,1332,568.0,576.0,563.0,571.0,2798500,1.0,,False,0.012324
2,20170104_1333,2017-01-04,1333,3150.0,3210.0,3140.0,3210.0,270800,1.0,,False,0.006154
3,20170104_1376,2017-01-04,1376,1510.0,1550.0,1510.0,1550.0,11300,1.0,,False,0.011053
4,20170104_1377,2017-01-04,1377,3270.0,3350.0,3270.0,3330.0,150800,1.0,,False,0.003026


In [4]:
stock_list.head()

Unnamed: 0,SecuritiesCode,EffectiveDate,Name,Section/Products,NewMarketSegment,33SectorCode,33SectorName,17SectorCode,17SectorName,NewIndexSeriesSizeCode,NewIndexSeriesSize,TradeDate,Close,IssuedShares,MarketCapitalization,Universe0
0,1301,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928280.0,33659110000.0,True
1,1305,20211230,Daiwa ETF-TOPIX,ETFs/ ETNs,,-,-,-,-,-,-,20211230.0,2097.0,3634636000.0,7621831000000.0,False
2,1306,20211230,NEXT FUNDS TOPIX Exchange Traded Fund,ETFs/ ETNs,,-,-,-,-,-,-,20211230.0,2073.5,7917718000.0,16417390000000.0,False
3,1308,20211230,Nikko Exchange Traded Index Fund TOPIX,ETFs/ ETNs,,-,-,-,-,-,-,20211230.0,2053.0,3736943000.0,7671945000000.0,False
4,1309,20211230,NEXT FUNDS ChinaAMC SSE50 Index Exchange Traded Fund,ETFs/ ETNs,,-,-,-,-,-,-,20211230.0,44280.0,72632.0,3216145000.0,False


In [5]:
stock_desc_df 


Unnamed: 0,Column,Sample value,Type,Addendum,Remarks
0,RowId,20170104_1301,string,,Unique ID of price records
1,Date,2017-01-04 0:00:00,date,,Trade date
2,SecuritiesCode,1301,Int64,,Local securities code
3,Open,2734,float,,first traded price on a day
4,High,2755,float,,highest traded price on a day
5,Low,2730,float,,lowest traded price on a day
6,Close,2742,float,,last traded price on a day
7,Volume,31400,Int64,,number of traded stocks on a day
8,AdjustmentFactor,1,float,,to calculate theoretical price/volume when split/reverse-split happens (NOT including dividend/allotment of shares/)
9,SupervisionFlag,FALSE,boolean,,Flag of Securities Under Supervision & Securities to Be Delisted\nhttps://www.jpx.co.jp/english/listing/market-alerts/supervision/00-archives/index.html )
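
The `AdjustmentFactor` remark above mentions theoretical prices after splits. The following is a minimal sketch of one common way to back-adjust closing prices, assuming (this is not stated in the spec table) that a factor recorded on a given date applies to the prices of that date and all earlier dates. The helper name `add_adjusted_close`, the added column names, and the toy data are hypothetical and should be checked against the competition's data description before use.

```python
import pandas as pd

def add_adjusted_close(prices: pd.DataFrame) -> pd.DataFrame:
    """Back-adjust Close per security with the cumulative AdjustmentFactor.

    Assumption: a factor on date d applies to prices on d and earlier.
    """
    # Walk each security from the most recent date backwards
    out = prices.sort_values(
        ["SecuritiesCode", "Date"], ascending=[True, False]
    ).copy()
    out["CumulativeAdjustmentFactor"] = (
        out.groupby("SecuritiesCode")["AdjustmentFactor"].cumprod()
    )
    out["AdjustedClose"] = out["Close"] * out["CumulativeAdjustmentFactor"]
    # Restore chronological order
    return out.sort_values(["SecuritiesCode", "Date"]).reset_index(drop=True)

# Toy data (made up): a 2-for-1 split recorded on 2021-01-05
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
    "SecuritiesCode": [1, 1, 1],
    "Close": [1000.0, 1000.0, 500.0],
    "AdjustmentFactor": [1.0, 0.5, 1.0],
})
adjusted = add_adjusted_close(toy)
print(adjusted[["Date", "Close", "AdjustedClose"]])
```

Under the stated assumption, the pre-split closes are scaled to the post-split level, so the adjusted series has no artificial jump at the split date.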


In [6]:
stock_list_desc_df

Unnamed: 0,Column,Sample value,Type,Addendum,Remarks
0,SecuritiesCode,1301,Int64,,Local Securities Code
1,EffectiveDate,20211230,date,,the effective date
2,Name,"KYOKUYO CO.,LTD.",string,,Name of security
3,Section/Products,First Section (Domestic),string,,Section/Product
4,NewMarketSegment,Prime Market,string,,New market segment effective from 2022-04-04 (as of 15:30 JST on Mar 11 2022)\nref. https://www.jpx.co.jp/english/equities/market-restructure/market-segments/index.html
5,33SectorCode,50,Int64,,33 Sector Name\n\nref. https://www.jpx.co.jp/english/markets/indices/line-up/files/e_fac_13_sector.pdf
6,33SectorName,"Fishery, Agriculture and Forestry",string,,33 Sector Name\n\nref. https://www.jpx.co.jp/english/markets/indices/line-up/files/e_fac_13_sector.pdf
7,17SectorCode,1,Int64,,17 Sector Code\nref. https://www.jpx.co.jp/english/markets/indices/line-up/files/e_fac_13_sector.pdf
8,17SectorName,FOODS,string,,17 Sector Name\nref. https://www.jpx.co.jp/english/markets/indices/line-up/files/e_fac_13_sector.pdf
9,NewIndexSeriesSizeCode,7,Int64,,TOPIX New Index Series code\n\nref. https://www.jpx.co.jp/english/markets/indices/line-up/files/e_fac_12_size.pdf


## Additional Features

### Metrics
BOP: Balance of Power = (Open price - Close price) / (High price - Low price). The conventional definition uses Close - Open in the numerator; the sign here follows the code in this notebook. <br>
WP: Weighted Price = (Open price + Close price + High price + Low price) / 4, i.e. the unweighted mean of the four prices <br>
HLr: High-Low Range = High price - Low price <br>
OCr: Open-Close Range = Open price - Close price <br>
OC: Open-Close product = Open price * Close price <br>
HL: High-Low product = High price * Low price <br>
logC: log(Close price + 1) <br>
logR: log return = log(Close price) - log(Open price) <br>
OHLCstd: standard deviation of Open, High, Low and Close within a day <br>
OHLCskew: skewness of Open, High, Low and Close within a day <br>
OHLCkur: kurtosis of Open, High, Low and Close within a day <br>
Cpos: position of the Close within the day's range = (Close price - Low price) / (High price - Low price) - 0.5 <br>
Opos: position of the Open within the day's range = (Open price - Low price) / (High price - Low price) - 0.5 <br>
bsforce: = Cpos * Volume <br>
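
As a quick sanity check of these formulas, the sketch below evaluates a few of them by hand for the first row of `stock_price_df` shown earlier (2017-01-04, code 1301). The variable names are illustrative only, and the sign of BOP follows the notebook's code (Open minus Close).

```python
import numpy as np

# First row of stock_prices.csv as displayed by stock_price_df.head(5)
open_, high, low, close, volume = 2734.0, 2755.0, 2730.0, 2742.0, 31400

bop = (open_ - close) / (high - low)      # Balance of Power (notebook sign)
wp = (open_ + close + high + low) / 4     # mean of the four prices
cpos = (close - low) / (high - low) - 0.5 # Close position within the range
opos = (open_ - low) / (high - low) - 0.5 # Open position within the range
bsforce = cpos * volume
log_c = np.log(close + 1)
log_r = np.log(close) - np.log(open_)

print(round(bop, 6), wp, round(cpos, 6), round(opos, 6))
print(round(bsforce, 6), round(log_c, 6), round(log_r, 6))
```

The rounded values agree with the BOP, WP, Cpos, Opos, bsforce, logC and logR columns of the first row in the feature table printed later in the notebook.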
    
    
### Weekdays
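Weekday = day of the week (1 = Monday, ..., 5 = Friday) <br>
Monday = 1 if the date is a Monday, else 0 <br>
Tuesday = 1 if the date is a Tuesday, else 0 <br>
Wednesday = 1 if the date is a Wednesday, else 0 <br>
Thursday = 1 if the date is a Thursday, else 0 <br>
Friday = 1 if the date is a Friday, else 0 <br>
Date = Date parsed as a datetime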

In [7]:
def FE(stock_price_df):
    """Add price-based and calendar features to stock_price_df (modified in place) and return it."""
    # Make sure Date is a datetime before the .dt accessor is used below
    stock_price_df['Date'] = pd.to_datetime(stock_price_df['Date'])

    ohlc = stock_price_df[['Open','High','Low','Close']]
    # Daily range; it is zero on flat days (High == Low), which makes the ratio features NaN (0/0)
    hl_range = stock_price_df['High'] - stock_price_df['Low']

    # Intraday move and range features (note the Open - Close sign convention)
    stock_price_df['BOP'] = (stock_price_df['Open']-stock_price_df['Close'])/hl_range
    stock_price_df['WP'] = (stock_price_df['Open']+stock_price_df['Close']+stock_price_df['High']+stock_price_df['Low'])/4
    stock_price_df['HLr'] = hl_range
    stock_price_df['OCr'] = stock_price_df['Open'] - stock_price_df['Close']
    stock_price_df['OC'] = stock_price_df['Open'] * stock_price_df['Close']
    stock_price_df['HL'] = stock_price_df['High'] * stock_price_df['Low']

    # Log features: logC uses log(Close + 1), logR is the open-to-close log return
    stock_price_df['logC'] = np.log(stock_price_df['Close']+1)
    stock_price_df['logR'] = np.log(stock_price_df['Close'])-np.log(stock_price_df['Open'])

    # Row-wise statistics over the four daily prices
    stock_price_df['OHLCstd'] = ohlc.std(axis=1)
    stock_price_df['OHLCskew'] = ohlc.skew(axis=1)
    stock_price_df['OHLCkur'] = ohlc.kurtosis(axis=1)

    # Position of Close and Open within the daily range, centered at 0
    stock_price_df['Cpos'] = (stock_price_df['Close']-stock_price_df['Low'])/hl_range -0.5
    stock_price_df['bsforce'] = stock_price_df['Cpos'] * stock_price_df['Volume']
    stock_price_df['Opos'] = (stock_price_df['Open']-stock_price_df['Low'])/hl_range -0.5

    # Calendar features: weekday is 1 (Monday) to 5 (Friday), plus one flag per weekday
    stock_price_df['weekday'] = stock_price_df['Date'].dt.weekday+1
    stock_price_df['Monday'] = np.where(stock_price_df['weekday']==1,1,0)
    stock_price_df['Tuesday'] = np.where(stock_price_df['weekday']==2,1,0)
    stock_price_df['Wednesday'] = np.where(stock_price_df['weekday']==3,1,0)
    stock_price_df['Thursday'] = np.where(stock_price_df['weekday']==4,1,0)
    stock_price_df['Friday'] = np.where(stock_price_df['weekday']==5,1,0)

    return stock_price_df

stock_price_df = FE(stock_price_df)
# Both frames contain a Close column, so the merge yields Close_x (daily close) and Close_y (close from stock_list)
stock_price_df = pd.merge(stock_price_df,stock_list, on='SecuritiesCode')
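
On days where High equals Low, the range in the denominator is zero and BOP, Cpos and Opos become NaN (0/0). One possible way to handle this, sketched below under the assumption that a flat day should count as a neutral value of 0, is to fill only those rows and leave rows with missing prices untouched. The helper name `fill_flat_days` and the toy frame are hypothetical; whether 0 is the right fill value is a modelling choice.

```python
import numpy as np
import pandas as pd

def fill_flat_days(df, ratio_cols=("BOP", "Cpos", "Opos"), fill_value=0.0):
    """Set the ratio features to fill_value on rows where High == Low."""
    out = df.copy()
    # NaN == NaN is False, so rows with missing prices are not treated as flat
    flat = out["High"].eq(out["Low"])
    out.loc[flat, list(ratio_cols)] = fill_value
    return out

# Toy frame (made up): a normal day, a flat day, and a day without prices
toy = pd.DataFrame({
    "High": [110.0, 50.0, np.nan],
    "Low":  [90.0, 50.0, np.nan],
    "BOP":  [-0.25, np.nan, np.nan],
    "Cpos": [0.25, np.nan, np.nan],
    "Opos": [0.0, np.nan, np.nan],
})
filled = fill_flat_days(toy)
print(filled)
```

If this is applied after `FE`, `bsforce` would need to be recomputed from the filled `Cpos`, since it was derived from the NaN values.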

In [8]:
stock_price_df.head(10)

Unnamed: 0,RowId,Date,SecuritiesCode,Open,High,Low,Close_x,Volume,AdjustmentFactor,ExpectedDividend,SupervisionFlag,Target,BOP,WP,HLr,OCr,OC,HL,logC,logR,weekday,Monday,Tuesday,Wednesday,Thursday,Friday,OHLCstd,OHLCskew,OHLCkur,Cpos,bsforce,Opos,EffectiveDate,Name,Section/Products,NewMarketSegment,33SectorCode,33SectorName,17SectorCode,17SectorName,NewIndexSeriesSizeCode,NewIndexSeriesSize,TradeDate,Close_y,IssuedShares,MarketCapitalization,Universe0
0,20170104_1301,2017-01-04,1301,2734.0,2755.0,2730.0,2742.0,31400,1.0,,False,0.00073,-0.32,2740.25,25.0,-8.0,7496628.0,7521150.0,7.916807,0.002922,3,0,0,1,0,0,11.026483,0.94153,0.008495,-0.02,-628.0,-0.34,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
1,20170105_1301,2017-01-05,1301,2743.0,2747.0,2735.0,2738.0,17900,1.0,,False,0.00292,0.416667,2740.75,12.0,5.0,7510334.0,7513045.0,7.915348,-0.001824,4,0,0,0,1,0,5.315073,0.198134,-2.215052,-0.25,-4475.0,0.166667,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
2,20170106_1301,2017-01-06,1301,2734.0,2744.0,2720.0,2740.0,19900,1.0,,False,-0.001092,-0.25,2734.5,24.0,-6.0,7491160.0,7463680.0,7.916078,0.002192,5,0,0,0,0,1,10.503968,-1.16486,1.085094,0.333333,6633.333333,0.083333,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
3,20170110_1301,2017-01-10,1301,2745.0,2754.0,2735.0,2748.0,24200,1.0,,False,-0.0051,-0.157895,2745.5,19.0,-3.0,7543260.0,7532190.0,7.918992,0.001092,2,0,1,0,0,0,7.937254,-0.703934,1.12522,0.184211,4457.894737,0.026316,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
4,20170111_1301,2017-01-11,1301,2748.0,2752.0,2737.0,2745.0,9300,1.0,,False,-0.003295,0.2,2745.5,15.0,3.0,7543260.0,7532224.0,7.917901,-0.001092,3,0,0,1,0,0,6.350853,-0.843252,0.933953,0.033333,310.0,0.233333,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
5,20170112_1301,2017-01-12,1301,2745.0,2747.0,2703.0,2731.0,28700,1.0,,False,-0.006613,0.318182,2731.5,44.0,14.0,7496595.0,7425141.0,7.912789,-0.005113,4,0,0,0,1,0,20.28957,-1.354079,1.2654,0.136364,3913.636364,0.454545,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
6,20170113_1301,2017-01-13,1301,2707.0,2730.0,2707.0,2722.0,19400,1.0,,False,-0.006657,-0.652174,2716.5,23.0,-15.0,7368454.0,7390110.0,7.909489,0.005526,5,0,0,0,0,1,11.445523,0.405505,-3.706427,0.152174,2952.173913,-0.5,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
7,20170116_1301,2017-01-16,1301,2725.0,2725.0,2696.0,2704.0,20100,1.0,,False,0.002978,0.724138,2712.5,29.0,21.0,7368400.0,7346600.0,7.902857,-0.007736,1,1,0,0,0,0,14.798649,-0.246845,-4.592189,-0.224138,-4505.172414,0.5,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
8,20170117_1301,2017-01-17,1301,2702.0,2704.0,2682.0,2686.0,18400,1.0,,False,0.001856,0.727273,2693.5,22.0,16.0,7257572.0,7252128.0,7.896181,-0.005939,2,0,1,0,0,0,11.120552,-0.082895,-5.211209,-0.318182,-5854.545455,0.409091,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
9,20170118_1301,2017-01-18,1301,2689.0,2695.0,2681.0,2694.0,12100,1.0,,False,0.014079,-0.357143,2689.75,14.0,-5.0,7244166.0,7225295.0,7.899153,0.001858,3,0,0,1,0,0,6.396614,-1.143362,0.333846,0.428571,5185.714286,0.071429,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,50,"Fishery, Agriculture and Forestry",1,FOODS,7,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
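
The `Close_x` and `Close_y` columns in the output above arise because both merged frames contain a `Close` column. A possible alternative, sketched here on made-up toy frames, is to pass explicit `suffixes` so the daily close keeps its name, and `validate="many_to_one"` so pandas raises an error if a securities code appears more than once in the list frame. The suffix `_list` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for stock_price_df and stock_list (values are made up)
prices = pd.DataFrame({
    "SecuritiesCode": [1, 1, 2],
    "Close": [100.0, 101.0, 50.0],
})
listing = pd.DataFrame({
    "SecuritiesCode": [1, 2],
    "Name": ["Company A", "Company B"],
    "Close": [120.0, 55.0],
})

merged = pd.merge(
    prices,
    listing,
    on="SecuritiesCode",
    suffixes=("", "_list"),   # keep the daily Close, rename the list Close
    validate="many_to_one",   # each code must be unique in `listing`
)
print(merged.columns.tolist())
```

Applying this to the real frames would change the column names used in the later cells, so downstream references to `Close_x` and `Close_y` would have to be updated accordingly.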


In [9]:
df = stock_price_df.drop(columns=['17SectorCode','NewIndexSeriesSizeCode', '33SectorCode', 'ExpectedDividend', 'RowId'])

In [10]:
df.shape

(2332531, 42)

In [12]:
df.head()

Unnamed: 0,Date,SecuritiesCode,Open,High,Low,Close_x,Volume,AdjustmentFactor,SupervisionFlag,Target,BOP,WP,HLr,OCr,OC,HL,logC,logR,weekday,Monday,Tuesday,Wednesday,Thursday,Friday,OHLCstd,OHLCskew,OHLCkur,Cpos,bsforce,Opos,EffectiveDate,Name,Section/Products,NewMarketSegment,33SectorName,17SectorName,NewIndexSeriesSize,TradeDate,Close_y,IssuedShares,MarketCapitalization,Universe0
0,2017-01-04,1301,2734.0,2755.0,2730.0,2742.0,31400,1.0,False,0.00073,-0.32,2740.25,25.0,-8.0,7496628.0,7521150.0,7.916807,0.002922,3,0,0,1,0,0,11.026483,0.94153,0.008495,-0.02,-628.0,-0.34,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,"Fishery, Agriculture and Forestry",FOODS,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
1,2017-01-05,1301,2743.0,2747.0,2735.0,2738.0,17900,1.0,False,0.00292,0.416667,2740.75,12.0,5.0,7510334.0,7513045.0,7.915348,-0.001824,4,0,0,0,1,0,5.315073,0.198134,-2.215052,-0.25,-4475.0,0.166667,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,"Fishery, Agriculture and Forestry",FOODS,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
2,2017-01-06,1301,2734.0,2744.0,2720.0,2740.0,19900,1.0,False,-0.001092,-0.25,2734.5,24.0,-6.0,7491160.0,7463680.0,7.916078,0.002192,5,0,0,0,0,1,10.503968,-1.16486,1.085094,0.333333,6633.333333,0.083333,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,"Fishery, Agriculture and Forestry",FOODS,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
3,2017-01-10,1301,2745.0,2754.0,2735.0,2748.0,24200,1.0,False,-0.0051,-0.157895,2745.5,19.0,-3.0,7543260.0,7532190.0,7.918992,0.001092,2,0,1,0,0,0,7.937254,-0.703934,1.12522,0.184211,4457.894737,0.026316,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,"Fishery, Agriculture and Forestry",FOODS,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True
4,2017-01-11,1301,2748.0,2752.0,2737.0,2745.0,9300,1.0,False,-0.003295,0.2,2745.5,15.0,3.0,7543260.0,7532224.0,7.917901,-0.001092,3,0,0,1,0,0,6.350853,-0.843252,0.933953,0.033333,310.0,0.233333,20211230,"KYOKUYO CO.,LTD.",First Section (Domestic),Prime Market,"Fishery, Agriculture and Forestry",FOODS,TOPIX Small 2,20211230.0,3080.0,10928283.0,33659110000.0,True



## Notes
### Potential Features

- Earthquakes?