# MarketData

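* **Main purpose**: stores market data for the assets used on the hedging side.

* **Attributes**:

  1. asset_category: list of the asset types held by MarketData (e.g. stock, future, index, bond, interest).
  2. lot_size: list of the minimum tradable unit of each asset; for stocks, for example, 1 lot is 100 shares.
  3. data: dict keyed by asset type and code; each value is the market-data DataFrame for that asset.

* **Methods**:

  1. get_data(asset_type, code_list, start_date, end_date): takes an asset type, codes, and a start and end date, and returns the matching market data.

* **Stock data example**

  1. Open price, close price, volume, stock dividend events, and the adjustment factor.
  2. Because dividends distort the later return analysis, returns are adjusted with the adjustment factor adj_factor:
     1. ISP_with_factor = ISP multiplied cumulatively by adj_factor
     2. Adjusted option return = notional * (Final Spot Price - ISP_with_factor * Strike) / ISP_with_factor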
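
The two adjustment formulas can be sketched as below. This is a minimal illustration rather than the project's implementation: the function names are hypothetical, `Strike` is assumed to be a ratio to the ISP (like KSratio), and the numbers are made up.

```python
from math import prod

def isp_with_factor(isp, adj_factors):
    """Scale the initial spot price by the cumulative product of the factors."""
    return isp * prod(adj_factors)

def adjusted_option_return(notional, final_spot, isp, strike_ratio, adj_factors):
    """notional * (final_spot - ISP_with_factor * strike) / ISP_with_factor."""
    isp_adj = isp_with_factor(isp, adj_factors)
    return notional * (final_spot - isp_adj * strike_ratio) / isp_adj

# Hypothetical trade: one corporate action with factor 0.75 during the period.
example_return = adjusted_option_return(
    notional=1_000_000, final_spot=33.0, isp=40.0,
    strike_ratio=1.0, adj_factors=[0.75],
)
print(example_return)
```

The formula is applied exactly as written above, so no payoff floor at zero is added here.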



In [1]:
import pandas as pd
import numpy as np
import pickle
from scipy import stats as st

In [3]:
# market_data.py

class MarketData:

    def __init__(self, asset_category: list):
        assert len(asset_category) > 0, 'empty asset_category'
        self.asset_category = asset_category
        self.data = {}
        for _asset in self.asset_category:
            self.data[_asset] = {}
        
    def get_data(self, asset, code, start_date:int =0, end_date:int=22222222):
        return self.data[asset][code].loc[start_date:end_date]

    def add_data(self, asset, code, data):
        self.data[asset][code] = data
# test_data.py

single_stock_data = MarketData(['stock'])
data = pd.read_csv('./02_data/MarketData/single_stock_data.csv', index_col=0)

single_stock_data.add_data('stock', '300015.SZ', data)

print(single_stock_data)
print(single_stock_data.get_data('stock', '300015.SZ'))

<__main__.MarketData object at 0x000001CB5036C7C0>
           OPEN  CLOSE      VOLUME  ADJFACTOR  DIV_CAPITALIZATION  DIV_STOCK
20170103  30.00  29.98   2361048.0   7.446103                 NaN        NaN
20170104  30.26  31.09   4623138.0   7.446103                 NaN        NaN
20170105  30.80  30.87   1583885.0   7.446103                 NaN        NaN
20170106  30.87  31.31   3176593.0   7.446103                 NaN        NaN
20170109  31.12  31.13   1942843.0   7.446103                 NaN        NaN
...         ...    ...         ...        ...                 ...        ...
20221121  27.71  26.88  55464596.0  48.871243                 NaN        NaN
20221122  26.86  26.64  27301615.0  48.871243                 NaN        NaN
20221123  26.78  26.27  26369594.0  48.871243                 NaN        NaN
20221124  26.45  26.35  17185086.0  48.871243                 NaN        NaN
20221125  26.27  26.42  15370955.0  48.871243                 NaN        NaN

[1434 rows x 6 columns]
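
get_data slices with `.loc`, so on the integer yyyymmdd index both bounds are inclusive, and on a sorted index they do not need to be actual trading days. A small sketch on a toy frame (the values are copied from the first rows above; the variable names are illustrative only):

```python
import pandas as pd

# Toy frame with an integer yyyymmdd index, mirroring the CSV layout.
toy = pd.DataFrame(
    {"CLOSE": [29.98, 31.09, 30.87, 31.31]},
    index=[20170103, 20170104, 20170105, 20170106],
)

# Label slicing with .loc keeps both endpoints.
window = toy.loc[20170104:20170105]

# The default bounds of get_data (0 and 22222222) are not in the index,
# but on a sorted index they simply select every row.
everything = toy.loc[0:22222222]

print(window)
print(len(everything))
```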


# BaseOption

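* **Main purpose**
  1. Receives the option parameters that the user passes in through the BackTest class, prices the option with the Black-Scholes (BS) formula, and computes the option's Greeks on that basis.
  2. Calls and puts have different price and Greek formulas, so the subclasses VanillaCall and VanillaPut inherit from BaseOption.

* **Attributes**
  1. underlying_asset: underlying asset type, matching asset_category in the MarketData class
  2. underlying_code: code of the underlying asset
  3. strike_date: start (value) date of the contract
  4. maturity: maturity date
  5. ISP: Initial Spot Price $S_0$
  6. KSratio: K/ISP
  7. strike_level: OTM, ATM, or ITM, determined from KSratio
  8. greeks: Greeks over the trading period: delta, gamma, theta, vega
  9. BS_params: BS model parameters: spot_price, d1, d2, Nd1, Nd2, sigma, remain_period
  10. sigma_period: number of days to roll back when computing the historical volatility that stands in for implied volatility

* **Methods**
  1. cal_BS_params(): computes the BS formula parameters and stores them in BS_params
  2. cal_greeks(): computes the Greeks from BS_params and stores them in greeks
* **Subclasses: VanillaCall & VanillaPut**
  1. Extend cal_BS_params(): only the price formula differs between Call and Put; the remaining parameters are the same
  2. Override cal_greeks(): some Greeks are computed differently for Call and Put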




In [12]:
class BaseOption:

    greek_columns = ['delta', 'gamma', 'vega', 'theta', 'option_price']
    underlying_asset_base_type = ['stock','index_futures']
    basic_paras_columns = ['sigma', 'left_days', 'left_times', 'sigma_T', 'stock_price']

    def __init__(self): # contract-level parameters are set at initialization; time-varying parameters are computed later
        self.reset_paras()
        self.all_trade_dates = single_stock_data.get_data('stock', '300015.SZ').index.tolist()
        #self.greek_df = pd.DataFrame(data=None, columns= self.greek_columns)
    
    def reset_paras(self):
        self.underlying_asset = None
        self.underlying_code = None
        self.strike_date = None
        self.maturity_date = None
        self.ISP = None
        self.KS_ratio = 1
        self.strike_level = 0
        self.look_back_num = 60
        self.r = 0.04

    def set_paras(self,underlying_asset=None,underlying_code=None, strike_date=None,
                    maturity_date=None,ISP=None,KS_ratio=1,strike_level=0):
        self.underlying_asset = underlying_asset
        self.underlying_code = underlying_code
        self.strike_date = strike_date
        self.maturity_date = maturity_date
        self.ISP = ISP
        self.KS_ratio = KS_ratio
        self.strike_level = strike_level
        self.look_back_num = 60
    
    def set_paras_by_dict(self, para_dict):
        self.parameters = para_dict
        self.set_basic_paras(para_dict)
    
    def set_basic_paras(self, para_dict):
        """Initialize parameters from a dict.

        Sets the contract-level parameters that do not change over time.
        """
        self.set_underlying_asset(para_dict.get('underlying_asset'))
        self.set_underlying_code(para_dict.get('underlying_code'))
        self.set_strike_date(para_dict.get('strike_date'))
        self.set_maturity_date(para_dict.get('maturity_date'))
        self.set_ISP(para_dict.get('ISP'))
        self.set_KS_ratio(para_dict.get('KS_ratio'))
        self.set_strike_level(para_dict.get('strike_level'))
        self.set_look_back_num(para_dict.get('look_back_num'))
        self.set_K()

    def set_underlying_asset(self,underlying_asset_input=None):
        if underlying_asset_input in self.underlying_asset_base_type:
            self.underlying_asset = underlying_asset_input
        else:
            raise ValueError('Invalid underlying_asset!')
    
    def set_underlying_code(self,underlying_code=None):
        if underlying_code is not None:
            self.underlying_code = underlying_code
    
    def set_strike_date(self,strike_date=None):
        if strike_date is not None:
            self.strike_date = strike_date
    
    def set_maturity_date(self,maturity_date=None):
        if maturity_date is not None:
            self.maturity_date = maturity_date
    
    def set_ISP(self,ISP=None):
        if ISP is not None:
            self.ISP = ISP
    
    def set_KS_ratio(self,KS_ratio=1):
        if KS_ratio is not None:
            self.KS_ratio = KS_ratio
    
    def set_K(self):
        self.K = self.KS_ratio*self.ISP
    
    def set_strike_level(self,strike_level=0):
        if strike_level is not None:
            self.strike_level = strike_level
    
    def set_look_back_num(self,look_back_num=10):
        if look_back_num is not None:
            self.look_back_num = look_back_num

    def calculate_base_paras(self):
        self.calculate_trade_dates()
        self.get_stock_prices()
        self.calculate_vols()
        self.calculate_basic_paras()

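    def calculate_trade_dates(self):
        """Compute the trading window and the window used for volatility.

        trade_dates runs from the strike date to the maturity date; look_back_dates is trade_dates extended backwards by look_back_num trading days.
        """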
        self.start_idx = self.all_trade_dates.index(self.strike_date)
        self.end_idx = self.all_trade_dates.index(self.maturity_date) + 1
        self.trade_dates = self.all_trade_dates[self.start_idx:self.end_idx]
        self.look_back_date = self.all_trade_dates[self.start_idx - self.look_back_num]
        self.look_back_dates = self.all_trade_dates[self.start_idx - self.look_back_num:self.end_idx]
        self.trade_dates_length = len(self.trade_dates)

    def get_stock_prices(self):
        """Extract the stock prices within look_back_dates.

        Reads the CLOSE column for the underlying code from single_stock_data.
        """
        if self.underlying_code is None:
            print('underlying_code is not set')
            return -1
        self.stock_prices = single_stock_data.get_data('stock', self.underlying_code).loc[self.look_back_dates, 'CLOSE']
        #BasicData.basicData['close'].loc[self.look_back_dates, self.underlying_code]
    
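    def calculate_vols(self):
        """Compute the volatility over trade_dates from the prices in look_back_dates.

        Historical volatility is used as a stand-in for implied volatility.
        """
        self.percent_change = self.stock_prices.pct_change()
        self.volatility = self.percent_change.rolling(self.look_back_num).std()[self.look_back_num:] # rolling window of look_back_num days; std of daily returns (not annualized)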

    def calculate_basic_paras(self):
    #     self.get_stock_prices()
    #     self.calculate_vols()
        self.basic_paras_df = pd.DataFrame(data=None, columns=self.basic_paras_columns)
        self.basic_paras_df.loc[:, 'sigma'] = self.volatility.dropna()
        self.basic_paras_df.loc[:, 'sigma_2'] = self.basic_paras_df.loc[:, 'sigma']*self.basic_paras_df.loc[:, 'sigma']
        self.basic_paras_df.loc[:, 'left_days'] = np.linspace(self.trade_dates_length - 1, 0, self.trade_dates_length)
        self.basic_paras_df.loc[:, 'left_times'] = self.basic_paras_df.loc[:, 'left_days'] / 252
        self.basic_paras_df.loc[:, 'sigma_T'] = self.basic_paras_df.loc[:, 'sigma'] * np.sqrt(self.basic_paras_df.loc[:, 'left_times'])
        self.basic_paras_df.loc[:, 'stock_price'] = self.stock_prices.loc[self.trade_dates]


In [16]:
paras = {
    'underlying_asset': 'stock',
    'underlying_code': '300015.SZ',
    'strike_date': 20190603,
    'maturity_date': 20191231,
    'K': 37.46,
    'ISP': 36.46,
    'KS_ratio':1,
    'strike_level':1
}

option1 = BaseOption()
option1.set_paras_by_dict(paras)
option1.calculate_base_paras()
option1.basic_paras_df

| | sigma | left_days | left_times | sigma_T | stock_price | sigma_2 |
|---|---|---|---|---|---|---|
| 20190603 | 0.026539 | 144.0 | 0.571429 | 0.020062 | 37.46 | 0.000704 |
| 20190604 | 0.026553 | 143.0 | 0.567460 | 0.020002 | 37.30 | 0.000705 |
| 20190605 | 0.042198 | 142.0 | 0.563492 | 0.031676 | 27.84 | 0.001781 |
| 20190606 | 0.041959 | 141.0 | 0.559524 | 0.031386 | 28.09 | 0.001761 |
| 20190610 | 0.041683 | 140.0 | 0.555556 | 0.031069 | 28.60 | 0.001737 |
| ... | ... | ... | ... | ... | ... | ... |
| 20191225 | 0.022871 | 4.0 | 0.015873 | 0.002881 | 38.97 | 0.000523 |
| 20191226 | 0.022871 | 3.0 | 0.011905 | 0.002495 | 39.00 | 0.000523 |
| 20191227 | 0.022916 | 2.0 | 0.007937 | 0.002042 | 38.56 | 0.000525 |
| 20191230 | 0.022898 | 1.0 | 0.003968 | 0.001442 | 38.68 | 0.000524 |
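
The sigma column is the rolling standard deviation of daily returns, while left_times is measured in years. The BS formula conventionally pairs a time in years with an annualized volatility, so a conversion step may be needed. A minimal sketch, assuming 252 trading days per year (the constant already used for left_times); the function name and the prices are made up:

```python
import numpy as np
import pandas as pd

def rolling_hist_vol(close, window, periods_per_year=252):
    """Rolling std of simple daily returns, returned as (daily, annualized)."""
    daily = close.pct_change().rolling(window).std()
    annualized = daily * np.sqrt(periods_per_year)
    return daily, annualized

# Made-up closes, only to show the shapes involved.
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.5, 104.0])
daily, annual = rolling_hist_vol(close, window=3)

# The first `window` entries are NaN: one is lost to pct_change and
# window - 1 more are needed before the rolling window is full.
print(pd.DataFrame({"daily": daily, "annualized": annual}))
```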


In [17]:
class VanillaCall(BaseOption):
    """VanillaCall inherits from BaseOption.

    Attributes
    ----------
        basic_paras_df: the base-class parameters plus the extra ones needed here
            - type: pandas.DataFrame
            - index: trade_dates
            - columns: the base-class columns plus 'd1', 'd2', 'nd1', 'Nd1', 'Nd2'
        greek_df: stores the computed option price and Greeks
            - type: pandas.DataFrame
            - index: trade_dates
            - columns: 'option_price', 'delta', 'gamma', 'vega', 'theta'
            
    Methods
    ----------
        calculate_vanilla_call_paras:
            computes 'd1', 'd2', 'nd1', 'Nd1', 'Nd2'
        calculate_vanilla_call_price:
            computes the call price from the closed-form BS solution
        calculate_vanilla_call_greeks:
            computes each Greek from its closed-form formula
    """
    def __init__(self):
        super().__init__()

    def calculate_vanilla_call_paras(self):
        self.calculate_base_paras()
        self.basic_paras_df.loc[:, 'd1'] = (np.log(self.basic_paras_df.loc[:, 'stock_price']/self.K) + (self.r+(0.5*self.basic_paras_df.loc[:, 'sigma_2']))*self.basic_paras_df.loc[:, 'left_times'])/self.basic_paras_df.loc[:, 'sigma_T']
        self.basic_paras_df.loc[:, 'd2'] = self.basic_paras_df.loc[:, 'd1'] - self.basic_paras_df.loc[:, 'sigma_T']
        self.basic_paras_df.loc[:, 'nd1'] = st.norm.pdf(self.basic_paras_df.loc[:, 'd1'])
        self.basic_paras_df.loc[:, 'Nd1'] = st.norm.cdf(self.basic_paras_df.loc[:, 'd1'])
        self.basic_paras_df.loc[:, 'Nd2'] = st.norm.cdf(self.basic_paras_df.loc[:, 'd2'])
    
    def calculate_vanilla_call_price(self):
        self.calculate_vanilla_call_paras()
        self.greek_df = pd.DataFrame(index = self.trade_dates, columns = self.greek_columns)
        self.greek_df.loc[:, 'option_price']  = self.basic_paras_df.loc[:, 'stock_price']*self.basic_paras_df.loc[:, 'Nd1']-self.K*np.exp(-self.r*self.basic_paras_df.loc[:,'left_times'])*self.basic_paras_df.loc[:, 'Nd2']

    def calculate_vanilla_call_greeks(self):
        self.calculate_vanilla_call_price()
        self.greek_df.loc[:,'delta'] = self.basic_paras_df.loc[:,'Nd1'] # the delta of a call is N(d1)
        self.greek_df.loc[:,'gamma'] = self.basic_paras_df.loc[:,'nd1'] / (self.basic_paras_df.loc[:,'stock_price'] * self.basic_paras_df.loc[:,'sigma_T'])
        # NOTE: this expression reduces to S * n(d1); the textbook BS vega is S * n(d1) * sqrt(T)
        self.greek_df.loc[:,'vega'] = self.greek_df.loc[:,'gamma']*self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'sigma_T']
        self.greek_df.loc[:,'theta'] = -(self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'nd1']*self.basic_paras_df.loc[:,'sigma']/(2*np.sqrt(self.basic_paras_df.loc[:,'left_times']))) \
            - self.r*self.K*np.exp(-self.r*self.basic_paras_df.loc[:,'left_times'])*self.basic_paras_df.loc[:,'Nd2']

class VanillaPut(BaseOption):
    """Analogous to the call option; only the formulas differ.
    gamma and vega are the same.
    """
    def __init__(self):
        super().__init__()

    def calculate_vanilla_put_paras(self):
        self.calculate_base_paras()
        self.basic_paras_df.loc[:, 'd1'] = (np.log(self.basic_paras_df.loc[:, 'stock_price']/self.K) + (self.r+(0.5*self.basic_paras_df.loc[:, 'sigma_2']))*self.basic_paras_df.loc[:, 'left_times'])/self.basic_paras_df.loc[:, 'sigma_T']
        self.basic_paras_df.loc[:, 'd2'] = self.basic_paras_df.loc[:, 'd1'] - self.basic_paras_df.loc[:, 'sigma_T']
        self.basic_paras_df.loc[:, 'nd1'] = st.norm.pdf(self.basic_paras_df.loc[:, 'd1'])
        self.basic_paras_df.loc[:, 'Nd1'] = st.norm.cdf(self.basic_paras_df.loc[:, 'd1'])
        self.basic_paras_df.loc[:, 'Nd2'] = st.norm.cdf(self.basic_paras_df.loc[:, 'd2'])
    
    def calculate_vanilla_put_price(self):
        self.calculate_vanilla_put_paras()
        self.greek_df = pd.DataFrame(index = self.trade_dates, columns = self.greek_columns)
        self.greek_df.loc[:, 'option_price']  = self.basic_paras_df.loc[:, 'stock_price']*(self.basic_paras_df.loc[:, 'Nd1']-1)-self.K*np.exp(-self.r*self.basic_paras_df.loc[:, 'left_times'])*(self.basic_paras_df.loc[:, 'Nd2']-1)

    def calculate_vanilla_put_greeks(self):
        self.calculate_vanilla_put_price()
        self.greek_df.loc[:,'delta'] = self.basic_paras_df.loc[:,'Nd1']-1
        self.greek_df.loc[:,'gamma'] = self.basic_paras_df.loc[:,'nd1'] / (self.basic_paras_df.loc[:,'stock_price'] * self.basic_paras_df.loc[:,'sigma_T'])
        # NOTE: this expression reduces to S * n(d1); the textbook BS vega is S * n(d1) * sqrt(T)
        self.greek_df.loc[:,'vega'] = self.greek_df.loc[:,'gamma']*self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'sigma_T']
        # NOTE: the textbook BS put theta uses N(-d2) = 1 - Nd2 in the second term, not Nd2
        self.greek_df.loc[:,'theta'] = -(self.basic_paras_df.loc[:,'stock_price']*self.basic_paras_df.loc[:,'nd1']*self.basic_paras_df.loc[:,'sigma']/(2*np.sqrt(self.basic_paras_df.loc[:,'left_times']))) \
            + self.r*self.K*np.exp(-self.r*self.basic_paras_df.loc[:,'left_times'])*self.basic_paras_df.loc[:,'Nd2']


In [20]:
option3 = VanillaPut()
option3.set_paras_by_dict(paras)
option3.calculate_vanilla_put_greeks()
option3.greek_df

| | delta | gamma | vega | theta | option_price |
|---|---|---|---|---|---|
| 20190603 | -0.006243 | 2.343393e-02 | 6.596995e-01 | 1.404448e+00 | 0.001524 |
| 20190604 | -0.011198 | 3.942494e-02 | 1.097141e+00 | 1.389510e+00 | 0.002884 |
| 20190605 | -1.000000 | 3.050384e-14 | 7.489080e-13 | -1.727869e-14 | 7.807396 |
| 20190606 | -1.000000 | 1.501503e-13 | 3.718450e-12 | -8.507233e-14 | 7.563054 |
| 20190610 | -1.000000 | 5.673158e-12 | 1.441721e-10 | -3.235998e-12 | 7.058714 |
| ... | ... | ... | ... | ... | ... |
| 20191225 | 0.000000 | 2.476698e-118 | 1.083804e-117 | 1.457474e+00 | 0.000000 |
| 20191226 | 0.000000 | 1.598595e-160 | 6.067663e-160 | 1.457706e+00 | 0.000000 |
| 20191227 | 0.000000 | 2.795806e-165 | 8.486594e-165 | 1.457937e+00 | 0.000000 |
| 20191230 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 1.458169e+00 | 0.000000 |
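
One way to cross-check put Greeks is to compare the textbook closed-form expressions against finite differences of the put price. The sketch below is standalone and does not use the classes above; the function names are hypothetical, sigma is assumed annualized, t is in years, and the inputs are made up.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def put_price(spot, strike, r, sigma, t):
    d1 = (log(spot / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return strike * exp(-r * t) * norm_cdf(-d2) - spot * norm_cdf(-d1)

def put_greeks(spot, strike, r, sigma, t):
    """Textbook Black-Scholes Greeks of a European put (t > 0)."""
    sqrt_t = sqrt(t)
    d1 = (log(spot / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return {
        "delta": norm_cdf(d1) - 1.0,
        "gamma": norm_pdf(d1) / (spot * sigma * sqrt_t),
        "vega": spot * norm_pdf(d1) * sqrt_t,
        "theta": -spot * norm_pdf(d1) * sigma / (2.0 * sqrt_t)
                 + r * strike * exp(-r * t) * norm_cdf(-d2),
    }

args = dict(spot=37.46, strike=36.46, r=0.04)
greeks = put_greeks(sigma=0.4, t=0.5, **args)

# Central finite differences; theta is minus the derivative in time to maturity.
h = 1e-5
fd_vega = (put_price(sigma=0.4 + h, t=0.5, **args)
           - put_price(sigma=0.4 - h, t=0.5, **args)) / (2 * h)
fd_theta = -(put_price(sigma=0.4, t=0.5 + h, **args)
             - put_price(sigma=0.4, t=0.5 - h, **args)) / (2 * h)
print(greeks["vega"], fd_vega)
print(greeks["theta"], fd_theta)
```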


# Portfolio

In [22]:
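class Portfolio(BaseOption):
    # option_type: type of option combination (vertical spread, horizontal spread, diagonal spread, call + put combination)
    # the option-side pnl is computed directly in the portfolio class
    def get_option_list(self, para_dict):
        pass

    def cal_BS_params(self):
        pass

    def calculate_greeks(self):
        pass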

Attributes:

option_type: type of option combination (vertical spread, horizontal spread, diagonal spread, call + put combination)

- type: str

option_list: the options that make up the portfolio

 - type: list, recording the position in each Vanilla

 - keys: option_object, option_position
 - values: VanillaCall/VanillaPut, int

greek_df: values of the Greeks on each trading day

- type: pandas.DataFrame
- index: trade_dates
- columns: 'option_price', 'delta', 'gamma', 'vega', 'theta'

BS_paras_df: time-varying parameters computed for each trading day

- type: pandas.DataFrame
- index: trade_dates
- columns: 'sigma', 'left_days', 'left_times', 'sigma_T', 'stock_price'

decompose_df: pnl on each trading day

- type: pandas.DataFrame

- index: trade_dates

- columns: 'option_pnl','delta_pnl', 'theta_pnl', 'gamma_pnl', 'vega_pnl', 'higher_order_pnl'

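K/S and the adjustment factor: prices are forward-adjusted before use, where $cashD$ is the cash dividend and $stockD$ is the stock dividend.
$$
\begin{equation*}
\begin{split}
& price\_ex = \frac{close_{t-1}-cashD}{1+stockD}\\
& adj\_fac = \frac{price\_ex}{close_{t-1}}
\end{split}
\end{equation*}
$$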
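
The two formulas above translate directly into code. In this sketch the function names are hypothetical, and both dividends are assumed to be quoted per share held (for instance, 3 bonus shares per 10 held gives a stock dividend of 0.3); the event itself is made up.

```python
def ex_rights_price(prev_close, cash_div, stock_div):
    """price_ex = (close_{t-1} - cash dividend) / (1 + stock dividend)."""
    return (prev_close - cash_div) / (1.0 + stock_div)

def adj_factor(prev_close, cash_div, stock_div):
    """adj_fac = price_ex / close_{t-1}."""
    return ex_rights_price(prev_close, cash_div, stock_div) / prev_close

# Hypothetical event: 0.5 cash per share plus 3 bonus shares per 10 held.
price_ex = ex_rights_price(37.30, 0.5, 0.3)
factor = adj_factor(37.30, 0.5, 0.3)
print(price_ex, factor)
```

With no dividend of either kind the factor is exactly 1, so ordinary trading days leave prices unchanged.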




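Methods:

get_option_list:

Builds the Vanilla objects with the matching positions from the given combination type and parameters, and adds them to option_list.

It also calls set_paras_by_dict(parameter) on each Vanilla.

- Parameters: self, para_dict

cal_BS_params:

Calls cal_BS_params() on every Vanilla in option_list to compute the parameters.

calculate_greeks:

Computes the Greeks of the combination by calling cal_greeks() on each Vanilla; the result is the position-weighted sum of the Vanillas' greek_df:
$$
portfolio\_greeks = \sum_{i=1}^n option_i\_greeks\times option_i\_position
$$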
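
A minimal sketch of this aggregation, assuming every leg's greek_df shares the same index and columns. The `SimpleNamespace` objects stand in for VanillaCall/VanillaPut instances, `portfolio_greeks` is a hypothetical helper name, and the Greek values are made up:

```python
from types import SimpleNamespace

import pandas as pd

def portfolio_greeks(option_list):
    """Sum each leg's greek_df scaled by its position."""
    total = None
    for leg in option_list:
        weighted = leg["option_object"].greek_df * leg["option_position"]
        total = weighted if total is None else total + weighted
    return total

dates = [20190603, 20190604]
call = SimpleNamespace(greek_df=pd.DataFrame(
    {"delta": [0.60, 0.55], "gamma": [0.02, 0.03]}, index=dates))
put = SimpleNamespace(greek_df=pd.DataFrame(
    {"delta": [-0.40, -0.45], "gamma": [0.02, 0.03]}, index=dates))

# A long call plus a long put, using the keys listed for option_list.
option_list = [
    {"option_object": call, "option_position": 1},
    {"option_object": put, "option_position": 1},
]
total = portfolio_greeks(option_list)
print(total)
```

A short leg is expressed with a negative option_position, so spreads need no special handling.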
calculate_return_decomposition:

Computes option_pnl and the pnl attributed to each Greek, and stores them in self.decompose_df.
$$
\begin{equation*}
    \begin{split}
       df & = \frac{\partial f}{\partial S} dS + \frac{1}{2}\frac{\partial^2 f}{\partial S^2}{dS}^2 + \frac{\partial f}{\partial t} dt 
       + \frac{\partial f}{\partial \sigma} d\sigma + \epsilon \\
       & = \Delta dS + \frac{1}{2}\Gamma({dS}^2) + \theta dt + vd\sigma + \epsilon \\
       & = \Delta dS + \frac{1}{2}\Gamma({dS}^2) + v d\sigma + (rf-\Delta Sr-\frac{1}{2}\Gamma S^2\sigma^2) dt
    \end{split}
\end{equation*}
$$

$$
\begin{equation*}
\begin{split}
& option\_pnl = option\_value.diff() \\
& delta\_pnl = delta\times \Delta S = delta\times\Delta r\times S = cash\_delta\times \Delta r \\
& gamma\_pnl = \frac{1}{2}\times gamma\times(\Delta S)^2 = \frac{1}{2}\times cash\_gamma/S^2\times100\times(\Delta S)^2 \\
& theta\_pnl = -\frac{1}{2}\times gamma\times S^2\times \sigma^2\times \Delta t = -\frac{1}{2}\times cash\_gamma/S^2\times100\times S^2\times \sigma^2\times \Delta t \\
& disc\_pnl = rf\times \Delta t \\
& carry\_pnl = -delta\times S\times r\times \Delta t = -cash\_delta\times r\times \Delta t
\end{split}
\end{equation*}
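$$

The calculation uses cash_delta and cash_gamma, which measure the cash exposure to a percentage move in the underlying.

Traders care about how much money the portfolio value changes for a 1% price move.

For index futures the pnl must also include basis_pnl: part of the hedge-side pnl comes from the index and part from the basis.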

```python
def calculate_return_decomposition(self):
    self.decompose_df.loc[:, 'option_pnl'] = ...
    self.decompose_df.loc[:, 'delta_pnl'] = ...
    self.decompose_df.loc[:, 'gamma_pnl'] = ...
    self.decompose_df.loc[:, 'theta_pnl'] = ...
    self.decompose_df.loc[:, 'higher_order_pnl'] = ...
```
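
For a single day, the pnl formulas above can be sketched with plain scalars. This is an illustration and not the method body: `decompose_pnl` is a hypothetical name, sigma is assumed annualized with dt in years, the vega term follows the $v\,d\sigma$ term of the expansion, and treating higher_order_pnl as the unexplained residual is an assumption.

```python
def decompose_pnl(f0, f1, s0, s1, delta, gamma, vega, sigma0, sigma1, r, dt):
    """Attribute one day's option pnl to the Greeks; Greeks are start-of-day."""
    ds = s1 - s0
    parts = {
        "delta_pnl": delta * ds,
        "gamma_pnl": 0.5 * gamma * ds ** 2,
        "vega_pnl": vega * (sigma1 - sigma0),
        "theta_pnl": -0.5 * gamma * s0 ** 2 * sigma0 ** 2 * dt,
        "disc_pnl": r * f0 * dt,
        "carry_pnl": -delta * s0 * r * dt,
    }
    option_pnl = f1 - f0
    # Assumption: whatever the listed terms do not explain is the residual.
    parts["higher_order_pnl"] = option_pnl - sum(parts.values())
    parts["option_pnl"] = option_pnl
    return parts

# Made-up inputs for one trading day.
result = decompose_pnl(f0=2.50, f1=2.80, s0=37.0, s1=37.5, delta=0.55,
                       gamma=0.05, vega=8.0, sigma0=0.40, sigma1=0.40,
                       r=0.04, dt=1 / 252)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```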

# Strategy

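Takes greek_df and stock_price as input and returns the position in the underlying for each trading day.

Attributes:

multiplier: minimum tradable unit of the underlying asset, e.g. 100 shares for stocks

- type: int

position: daily position computed from the notional, the daily Greeks, and the daily spot_price

 - type: pandas.Series

Methods:

cal_position:

Computes the daily position from the strategy type and each trading day's stock price, and returns spot_position.

For example, when hedging with several futures of different maturities, the weight allocation also belongs in this method.

Each strategy overrides this method.

- Parameters: self, greek_df
- Returns: position

HedgeAllStrategy: makes the portfolio delta-neutral on every trading day
$$
position = -cash\_delta/stock\_price/multiplier
$$
HedgeHalfStrategy: hedges half of the delta each day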
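
Both strategies can be expressed with one helper that scales the position formula by a hedge ratio. The `hedge_ratio` parameter and the function name are hypothetical, the result is left unrounded (rounding to whole lots would be a separate design choice), and the numbers are made up.

```python
def hedge_position(cash_delta, stock_price, multiplier, hedge_ratio=1.0):
    """Lots of the underlying that offset hedge_ratio of the cash delta."""
    return -hedge_ratio * cash_delta / stock_price / multiplier

# Hypothetical exposure: cash delta of 374,600 at a spot of 37.46, 100 shares per lot.
full = hedge_position(374_600.0, 37.46, 100)                   # HedgeAllStrategy
half = hedge_position(374_600.0, 37.46, 100, hedge_ratio=0.5)  # HedgeHalfStrategy
print(full, half)
```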


# Results
