<h1 id='dataset-merger' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:center; font-size:240%;padding:0'>📝 Dataset Merger</h1>

<h1 id='0-settings' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>0 | Settings</h1>

In [18]:
# ---- Settings ----
import numpy as np   # pip install numpy
import pandas as pd  # pip install pandas
DATASETS_PATH = './datasets'  # folder holding the per-ticker CSV files

<h1 id='1-preparing-datasets' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>1 | Preparing Datasets</h1>

In [2]:
# ---- KNRI11 Dataset ----
knri_df = pd.read_csv(f'{DATASETS_PATH}/Dataset - KNRI11.csv')
knri_df.rename(columns={"Date": "date", "Close": "knri"}, inplace=True)
knri_df['date'] = pd.to_datetime(knri_df['date'], format='%d/%m/%y %H:%M')
knri_df.set_index(keys=['date'], inplace=True)

# Prices use a decimal comma ("106,00"): swap it for a dot, then convert to float.
# Assigning back avoids chained inplace=True, which may silently do nothing under pandas Copy-on-Write.
knri_df['knri'] = pd.to_numeric(knri_df['knri'].astype(str).str.replace(',', '.', regex=False))
knri_df.head()

| date | knri |
|---|---|
| 2016-01-04 16:56:00 | 106.0 |
| 2016-01-05 16:56:00 | 105.01 |
| 2016-01-06 16:56:00 | 108.35 |
| 2016-01-07 16:56:00 | 107.5 |
| 2016-01-08 16:56:00 | 106.5 |


In [3]:
# ---- HGLG11 Dataset ----
hglg_df = pd.read_csv(f'{DATASETS_PATH}/Dataset - HGLG11.csv')
hglg_df.rename(columns={"Date": "date", "Close": "hglg"}, inplace=True)
hglg_df['date'] = pd.to_datetime(hglg_df['date'], format='%d/%m/%y %H:%M')
hglg_df.set_index(keys=['date'], inplace=True)

# Decimal comma -> dot, then convert to float (assigned back instead of chained inplace=True)
hglg_df['hglg'] = pd.to_numeric(hglg_df['hglg'].astype(str).str.replace(',', '.', regex=False))
hglg_df.head()

| date | hglg |
|---|---|
| 2016-01-04 16:56:00 | 980.92 |
| 2016-01-05 16:56:00 | 957.67 |
| 2016-01-06 16:56:00 | 981.99 |
| 2016-01-07 16:56:00 | 982.02 |
| 2016-01-08 16:56:00 | 972.2 |


In [4]:
# ---- HGCR11 Dataset ----
hgcr_df = pd.read_csv(f'{DATASETS_PATH}/Dataset - HGCR11.csv')
hgcr_df.rename(columns={"Date": "date", "Close": "hgcr"}, inplace=True)
hgcr_df['date'] = pd.to_datetime(hgcr_df['date'], format='%d/%m/%y %H:%M')
hgcr_df.set_index(keys=['date'], inplace=True)

# Decimal comma -> dot, then convert to float (assigned back instead of chained inplace=True)
hgcr_df['hgcr'] = pd.to_numeric(hgcr_df['hgcr'].astype(str).str.replace(',', '.', regex=False))
hgcr_df.head()

| date | hgcr |
|---|---|
| 2016-01-04 16:56:00 | 923.51 |
| 2016-01-05 16:56:00 | 923.51 |
| 2016-01-06 16:56:00 | 923.51 |
| 2016-01-07 16:56:00 | 923.51 |
| 2016-01-11 16:56:00 | 908.54 |
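The three loading cells above repeat the same steps, changing only the file name and the column name. A minimal sketch of folding them into one function follows. `load_ticker` is a hypothetical name, and the in-memory sample assumes each CSV holds only `Date` and `Close` columns with quoted decimal-comma prices; the outputs above suggest this layout, but the source does not show the raw files.

```python
import io

import pandas as pd


def load_ticker(source, ticker):
    """Hypothetical helper: read a Date/Close CSV and return a float price
    column named `ticker`, indexed by the parsed 'date' column."""
    df = pd.read_csv(source)
    df = df.rename(columns={"Date": "date", "Close": ticker})
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y %H:%M")
    df = df.set_index("date")
    # Prices use a decimal comma ("106,00"); swap it for a dot before converting
    df[ticker] = pd.to_numeric(df[ticker].astype(str).str.replace(",", ".", regex=False))
    return df


# Assumed file layout, fed from memory instead of ./datasets for illustration
sample_csv = io.StringIO('Date,Close\n04/01/16 16:56,"106,00"\n05/01/16 16:56,"105,01"\n')
knri_df = load_ticker(sample_csv, "knri")
print(knri_df)
```

Against the real files the calls would look like `load_ticker(f'{DATASETS_PATH}/Dataset - KNRI11.csv', 'knri')`, and likewise for HGLG11 and HGCR11.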


In [5]:
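# ---- Full Dataset ----
# Left-merge on the 'date' index so every KNRI11 trading day is kept,
# then forward-fill the days on which HGLG11 or HGCR11 has no quote.
full_df = knri_df.merge(hglg_df, on='date', how='left').merge(hgcr_df, on='date', how='left')
full_df = full_df.ffill()  # fillna(method='ffill') is deprecated since pandas 2.1
full_df.head()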

| date | knri | hglg | hgcr |
|---|---|---|---|
| 2016-01-04 16:56:00 | 106.0 | 980.92 | 923.51 |
| 2016-01-05 16:56:00 | 105.01 | 957.67 | 923.51 |
| 2016-01-06 16:56:00 | 108.35 | 981.99 | 923.51 |
| 2016-01-07 16:56:00 | 107.5 | 982.02 | 923.51 |
| 2016-01-08 16:56:00 | 106.5 | 972.2 | 923.51 |
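The merge-then-fill step can be seen on a toy example mirroring the outputs above: KNRI11 has a quote on 2016-01-08 while HGCR11 has no row for that date. The two frames below are hand-made for illustration, not read from the datasets.

```python
import pandas as pd

# KNRI11 quotes on both days; HGCR11 (as in its output above) has no 2016-01-08 row
knri = pd.DataFrame(
    {"knri": [107.5, 106.5]},
    index=pd.DatetimeIndex(["2016-01-07 16:56", "2016-01-08 16:56"], name="date"),
)
hgcr = pd.DataFrame(
    {"hgcr": [923.51]},
    index=pd.DatetimeIndex(["2016-01-07 16:56"], name="date"),
)

# how='left' keeps every KNRI11 date, leaving NaN where HGCR11 has no quote
merged = knri.merge(hgcr, on="date", how="left")
# Forward fill carries the last known HGCR11 price into the gap
filled = merged.ffill()
print(filled)
```

Forward filling assumes the last traded price still holds on days without a quote, so a filled day always yields a return of exactly zero for that column.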


<h1 id='2-calculating-mean-returns-risks-volatilities' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>2 | Calculating Mean Returns, Risks and Volatilities</h1>

<h3 id='2.1-discrete-datas' style='color:#7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>2.1 | Discrete Data</h3>

> **Discrete Return**

Measures the return over a discrete block of time, such as a day, week, month or year, as the relative change between the price at the start and the price at the end of that period.

$$
R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}
$$

where:

- $R_t$: discrete return
- $P_t$: current price
- $P_{t-1}$: previous price
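A quick check of the formula against pandas, using the first three KNRI11 closing prices from section 1: dividing by the previous price (obtained with `shift(1)`) and subtracting 1 gives the same numbers as `pct_change()`.

```python
import pandas as pd

# First three KNRI11 closes from the loading step
prices = pd.Series([106.0, 105.01, 108.35], name="knri")

# R_t = P_t / P_(t-1) - 1; the first entry is NaN because P_(t-1) does not exist
manual = prices / prices.shift(1) - 1

# pandas' built-in version of the same formula
builtin = prices.pct_change()

print(pd.DataFrame({"manual": manual, "pct_change": builtin}).round(6))
```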

---

> **Discrete Mean Return**

The arithmetic mean of the discrete returns over the sample.

$$
\text{MR} = \frac{1}{n} \sum_{i=1}^{n} R_i
$$

where:

- $\text{MR}$: mean discrete return
- $R_i$: the $i$-th discrete return
- $n$: number of discrete returns
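Applied column by column, the formula is a sum divided by the count. The sketch below uses the five discrete returns printed later in this notebook (as rounded in that output) and checks the hand-written formula against `DataFrame.mean()`.

```python
import pandas as pd

# Discrete returns as displayed in the discrete returns output (rounded)
discrete = pd.DataFrame({
    "knri": [-0.00934, 0.031806, -0.007845, -0.009302, -9.4e-05],
    "hglg": [-0.023702, 0.025395, 3.1e-05, -0.01, 1e-05],
    "hgcr": [0.0, 0.0, 0.0, 0.0, -0.01621],
})

n = len(discrete)
# MR = (R_1 + ... + R_n) / n, one value per column
mean_manual = discrete.sum() / n
mean_builtin = discrete.mean()
print(mean_builtin)
```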

---

> **Risk**

The `Standard Deviation` of the discrete returns, measuring how far they typically spread around the mean discrete return.

When working with discrete returns, this standard deviation is called `Risk`.
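The notebook does not state which standard deviation it uses. The sketch below assumes the sample standard deviation (dividing by n - 1), which is the default of pandas' `DataFrame.std()`. NumPy's `np.std` defaults to the population version (dividing by n), so the two disagree unless `ddof=1` is passed.

```python
import numpy as np
import pandas as pd

# Discrete returns as displayed later in this notebook (rounded)
discrete = pd.DataFrame({
    "knri": [-0.00934, 0.031806, -0.007845, -0.009302, -9.4e-05],
    "hglg": [-0.023702, 0.025395, 3.1e-05, -0.01, 1e-05],
})

n = len(discrete)
mean = discrete.mean()
# Risk = sqrt( sum (R_i - MR)^2 / (n - 1) ), one value per column
risk_manual = np.sqrt(((discrete - mean) ** 2).sum() / (n - 1))
risk_pandas = discrete.std()                           # ddof=1 by default
risk_numpy_pop = np.std(discrete.to_numpy(), axis=0)   # ddof=0 by default
print(risk_pandas)
```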

<h3 id='2.2-continuous-datas' style='color:#7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>2.2 | Continuous Data</h3>

> **Continuous Returns**

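Also called log returns or continuously compounded returns, they treat the price change as if it compounded continuously over the period instead of being earned in one discrete step. A useful consequence is that log returns add up over time: the log return over several consecutive periods is the sum of the single-period log returns.

$$
R_t = \ln\left(\frac{P_t}{P_{t-1}}\right)
$$

where:

- $R_t$: continuous return
- $P_t$: current price
- $P_{t-1}$: previous price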
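Using the first five KNRI11 closes, the sketch below computes log returns directly, confirms they equal ln(1 + discrete return) via `np.log1p`, and checks that log returns are additive over time: their sum equals ln(P_last / P_first).

```python
import numpy as np
import pandas as pd

prices = pd.Series([106.0, 105.01, 108.35, 107.5, 106.5], name="knri")

# R_t = ln(P_t / P_(t-1)); drop the first NaN
log_returns = np.log(prices / prices.shift(1)).dropna()

# Same values from the discrete returns: ln(1 + R_t)
from_discrete = np.log1p(prices.pct_change()).dropna()

# Additivity: the single-period log returns sum to the whole-period log return
whole_period = np.log(prices.iloc[-1] / prices.iloc[0])
print(log_returns.sum(), whole_period)
```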

---

> **Continuous Mean Return**

The arithmetic mean of the continuous returns over the sample.

$$
\text{MR} = \frac{1}{n} \sum_{i=1}^{n} R_i
$$

where:

- $\text{MR}$: mean continuous return
- $R_i$: the $i$-th continuous return
- $n$: number of continuous returns
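Because log returns add up, their mean has a closed form, MR = ln(P_n / P_0) / n, where P_0 is the first price, P_n the last and n the number of returns. Exponentiating it gives the constant per-period growth factor that takes P_0 to P_n. The sketch checks both on the same five KNRI11 closes.

```python
import numpy as np
import pandas as pd

prices = pd.Series([106.0, 105.01, 108.35, 107.5, 106.5], name="knri")
log_returns = np.log(prices / prices.shift(1)).dropna()
n = len(log_returns)  # 4 returns from 5 prices

mean_log = log_returns.mean()
# Closed form: only the first and last prices matter
closed_form = np.log(prices.iloc[-1] / prices.iloc[0]) / n

# exp(MR) is the per-period growth factor g with P_0 * g**n == P_n
growth = np.exp(mean_log)
print(mean_log, closed_form, growth)
```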

---

> **Volatility**

The `Standard Deviation` of the continuous returns, measuring how far they typically spread around the mean continuous return.

When working with continuous returns, this standard deviation is called `Volatility`.
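Putting the two measures side by side on the first five rows of the merged price table: risk comes from the discrete returns and volatility from the continuous ones, both as sample standard deviations (pandas' default; an assumption, since the notebook does not fix `ddof`). For small daily moves the two are close, because ln(1 + r) is approximately r when r is small.

```python
import numpy as np
import pandas as pd

# First five rows of the merged price table (full_df.head())
prices = pd.DataFrame({
    "knri": [106.0, 105.01, 108.35, 107.5, 106.5],
    "hglg": [980.92, 957.67, 981.99, 982.02, 972.2],
    "hgcr": [923.51, 923.51, 923.51, 923.51, 923.51],
})

discrete = prices.pct_change().dropna()
continuous = np.log(prices / prices.shift(1)).dropna()

summary = pd.DataFrame({
    "risk": discrete.std(),          # std of discrete returns
    "volatility": continuous.std(),  # std of continuous returns
})
print(summary)
```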

In [15]:
# ---- Calculating Discrete Returns ----
#
# - pct_change(): native Pandas function that calculates the Discrete Return
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html
#
# Applied to the whole frame, it computes P(t) / P(t-1) - 1 for every column at once;
# the first row has no previous price, so it is NaN and gets dropped.
discrete_returns_df = full_df.pct_change(periods=1).dropna()
discrete_returns_df.head()

| date | knri | hglg | hgcr |
|---|---|---|---|
| 2016-01-05 16:56:00 | -0.00934 | -0.023702 | 0.0 |
| 2016-01-06 16:56:00 | 0.031806 | 0.025395 | 0.0 |
| 2016-01-07 16:56:00 | -0.007845 | 3.1e-05 | 0.0 |
| 2016-01-08 16:56:00 | -0.009302 | -0.01 | 0.0 |
| 2016-01-11 16:56:00 | -9.4e-05 | 1e-05 | -0.01621 |


In [20]:
# ---- Calculating Continuous Returns ----
# ln(P(t) / P(t-1)) for every column at once; the first row is NaN and gets dropped.
continuous_returns_df = np.log(full_df / full_df.shift(1)).dropna()
continuous_returns_df.head()

| date | knri | hglg | hgcr |
|---|---|---|---|
| 2016-01-05 16:56:00 | -0.009384 | -0.023988 | 0.0 |
| 2016-01-06 16:56:00 | 0.031311 | 0.025078 | 0.0 |
| 2016-01-07 16:56:00 | -0.007876 | 3.1e-05 | 0.0 |
| 2016-01-08 16:56:00 | -0.009346 | -0.01005 | 0.0 |
| 2016-01-11 16:56:00 | -9.4e-05 | 1e-05 | -0.016343 |


<h1 id='3-exporting-datasets' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>3 | Exporting Datasets</h1>

In [23]:
# Writing .xlsx files needs an Excel engine such as openpyxl (pip install openpyxl)
with pd.ExcelWriter(f'{DATASETS_PATH}/Dataset - Full.xlsx') as writer:
    full_df.to_excel(writer, sheet_name='Dataset - Prices')
    discrete_returns_df.to_excel(writer, sheet_name='Dataset - Discrete Returns')
    continuous_returns_df.to_excel(writer, sheet_name='Dataset - Continuous Returns')

---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com)

> **LinkedIn** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/)