# Investments Project (Spring 2024)

**Authors:**
- Marc-Antoine Allard
- Adam Zinebi
- Paul Teiletche
- ...

**DUE Date: June 21 at 23:59**

---
# Imports

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from scipy import stats

from utils import plot_metrics

%load_ext autoreload 
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
DATA_PATH = "../data"
import warnings

---
# 8 - Industry neutral strategy

a) We now consider a different approach to building a portfolio that is not exposed to
industry risk. Repeat the construction of your fund strategy, but perform the strategy construction separately for each industry to build an industry-neutral portfolio.
Specifically, for all the stocks in industry $i$ (where $i \in \{1, \cdots , 12\}$) separately compute
a BaB$_i$ strategy, a IV$_i$, and a MoM$_i$ strategy as proposed above. Then, repeat the fund
strategy of Section 6 for each of the 12 industries, considering the risk-parity approach
to combine the three strategies (BAB$_i$, IV$_i$, and MOM$_i$) in each industry i targeting
a volatility of 10%. You obtain 12 separate strategy returns. Create a table with 12
rows where you report in the columns the mean, standard deviation, Sharpe ratio, and
t-statistic associated with the mean strategy return. Which strategy delivers the most
significant returns within the industry?

In [7]:
import re
data = pd.read_parquet(f'{DATA_PATH}/stock_data.parquet')
with open(f'{DATA_PATH}/Siccodes12.txt', 'r') as file:
    lines = file.readlines()


ff12_mapping = []
current_ff12 = None

for line in lines:
    
    category_match = re.match(r'^\s*(\d+)\s+\w+', line)
    if category_match:
        current_ff12 = int(category_match.group(1))
    else:
        
        interval_match = re.match(r'^\s*(\d+)-(\d+)', line)
        if interval_match:
            start = int(interval_match.group(1))
            end = int(interval_match.group(2))
            ff12_mapping.append((start, end, current_ff12))

def map_siccd_to_ff12(siccd):
    for start, end, ff12 in ff12_mapping:
        if start <= siccd <= end:
            return ff12
    return None


data['FF12'] = data['siccd'].apply(map_siccd_to_ff12)
data['FF12'].fillna(12, inplace=True)

data = data[data.permno.isin(data.permno.unique()[:100])]

display(data.head())
data.shape

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  data['FF12'].fillna(12, inplace=True)


Unnamed: 0,permno,date,Rn,shrout,prc,siccd,Rm,rf,mcap,mcap_l,Rn_f,const,Rn_e,Rm_e,w_m,FF12
5,10001,2010-01-29,-0.018932,4361.0,10.06,4925,-0.037172,1.3e-05,43871.66,44918.3,-0.000656,1,-0.018945,-0.037185,5e-06,8.0
4,10001,2010-02-26,-0.000656,4361.0,10.0084,4925,0.034744,6.1e-05,43646.6324,43871.66,0.020643,1,-0.000717,0.034683,5e-06,8.0
3,10001,2010-03-31,0.020643,4361.0,10.17,4925,0.063668,0.000112,44351.37,43646.6324,0.124385,1,0.020531,0.063556,5e-06,8.0
2,10001,2010-04-30,0.124385,6070.0,11.39,4925,0.020036,0.000118,69137.3,44351.37,0.004829,1,0.124267,0.019918,4e-06,8.0
1,10001,2010-05-28,0.004829,6071.0,11.4,4925,-0.07924,0.000114,69209.4,69137.3,-0.043421,1,0.004715,-0.079354,7e-06,8.0


(17377, 16)

In [8]:
from stratsgit import iv

industry_iv_strategies=list()

for i in range(1,13):
    industry_iv_strategies.append(iv(i))
    


In [9]:
from stratsgit import mom

industry_mom_strategies=list()

for i in range(1,13):
    industry_mom_strategies.append(mom(i))
    
print(industry_mom_strategies)


[        date       MoM
0    1964-12  0.041725
1    1965-01  0.123752
2    1965-02 -0.003598
3    1965-03  0.020872
4    1965-04  0.087199
..       ...       ...
703  2023-08 -0.122669
704  2023-09 -0.141136
705  2023-10 -0.078988
706  2023-11  0.267287
707  2023-12  0.199999

[708 rows x 2 columns],         date       MoM
0    1964-12 -0.010097
1    1965-01  0.056649
2    1965-02 -0.011770
3    1965-03  0.060749
4    1965-04  0.062705
..       ...       ...
703  2023-08 -0.333441
704  2023-09 -0.104409
705  2023-10 -0.248949
706  2023-11  0.153547
707  2023-12  0.245128

[708 rows x 2 columns],         date       MoM
0    1964-12 -0.082107
1    1965-01  0.130744
2    1965-02  0.044663
3    1965-03  0.004288
4    1965-04  0.078589
..       ...       ...
703  2023-08 -0.074757
704  2023-09 -0.108283
705  2023-10  0.014173
706  2023-11  0.071976
707  2023-12  0.168938

[708 rows x 2 columns],         date       MoM
0    1964-12  0.034059
1    1965-01  0.043437
2    1965-02 -0.040703
3   

In [10]:
from stratsgit import bab

industry_bab_strategies=list()

for i in range(1,13):
    industry_bab_strategies.append(bab(i))
    
print(industry_bab_strategies)


[          date       BAB
0      1969-02 -0.020069
172    1969-03 -0.010185
346    1969-04  0.006340
529    1969-05  0.015053
715    1969-06 -0.021468
...        ...       ...
95197  2023-08 -0.004722
95260  2023-09 -0.090583
95323  2023-10 -0.121905
95388  2023-11 -0.040277
95453  2023-12  0.003554

[659 rows x 2 columns],           date       BAB
0      1969-02  0.006338
91     1969-03 -0.010479
182    1969-04  0.010994
276    1969-05 -0.031587
372    1969-06 -0.000229
...        ...       ...
41933  2023-08  0.018139
41966  2023-09  0.008848
41999  2023-10  0.033546
42032  2023-11  0.005385
42065  2023-12  0.035546

[659 rows x 2 columns],            date       BAB
0       1969-02 -0.012502
374     1969-03  0.012957
752     1969-04 -0.000322
1146    1969-05  0.018904
1538    1969-06 -0.002090
...         ...       ...
189580  2023-08  0.015351
189730  2023-09 -0.022038
189880  2023-10  0.000482
190029  2023-11  0.000754
190178  2023-12  0.021871

[659 rows x 2 columns],           da

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from functools import reduce
from utils import annualized_metrics, plot_metrics, ew_strategy, rp_strategy, mv_strategy, scale_to_target_volatility

riskmerge = list()

# Function to convert 'date' columns to datetime
def convert_date_column(df, column_name='date'):
    if isinstance(df[column_name].dtype, pd.PeriodDtype):
        df[column_name] = df[column_name].dt.to_timestamp()
    df[column_name] = pd.to_datetime(df[column_name]).dt.to_period('M')
    return df

# Function to calculate t-statistic
def calculate_t_stat(mean, std, n):
    return mean / (std / np.sqrt(n))

# Placeholder for storing results
results = []

for i in range(0, 12):
    # Load the T_bill_returns DataFrame
    T_bill_returns = pd.read_parquet(f'{DATA_PATH}/tbills.parquet')
    
    
    # Convert 'date' column in strategies and T_bill_returns to datetime
    bab_df = convert_date_column(industry_bab_strategies[i])
    iv_df = convert_date_column(industry_iv_strategies[i])
    mom_df = convert_date_column(industry_mom_strategies[i])
    T_bill_returns = convert_date_column(T_bill_returns)

    # Filter the datasets to start from 1969-02
    start_date = '1969-02'
    bab_df = bab_df[bab_df['date'] >= start_date]
    iv_df = iv_df[iv_df['date'] >= start_date]
    mom_df = mom_df[mom_df['date'] >= start_date]

    # Prepare the list of DataFrames to merge
    returns_dfs = [bab_df, iv_df, mom_df, T_bill_returns]

    # Merge the DataFrames on 'date' column
    data = reduce(lambda left, right: pd.merge(left, right, on='date', sort=True), returns_dfs).groupby('date').mean().reset_index()

    # Convert 'date' column to string format 'YYYY-MM'
    data['date'] = data['date'].dt.strftime('%Y-%m')

    # Define assets and factors columns
    assets = ['rf']
    factors_cols = ['BAB', 'IV', 'MoM']
    # Calculate risk-parity returns
    rp_returns = rp_strategy(data, factors_cols)[1]
    df_rp_ret = pd.DataFrame({'date': data['date'], 'STRAT': rp_returns})

    # Create DataFrame for risk-parity returns
    df = pd.DataFrame({'Risk-Parity': rp_returns})

    # Function to calculate fund return
    def fund_return(strategy_returns, T_bill):
        return T_bill + scale_to_target_volatility(strategy_returns)

    # Calculate fund returns for risk-parity strategy

    df_rp_ret['fund_rp'] = fund_return(rp_returns.values, data['rf'])
    df_rp_ret.drop(columns='STRAT',inplace=True)
    riskmerge.append(df_rp_ret)

    # Calculate annualized metrics
    mean_return, std_dev, sharpe_ratio = annualized_metrics(df_rp_ret['fund_rp'])

    # Calculate t-statistic
    
    t_stat = calculate_t_stat(mean_return, std_dev, len(df_rp_ret['fund_rp']))

    # Append results for the current industry
    results.append({
        'Industry': i + 1,
        'Mean': mean_return,
        'Standard Deviation': std_dev,
        'Sharpe Ratio': sharpe_ratio,
        't-Statistic': t_stat
    })

# Create a DataFrame for the results
results_df = pd.DataFrame(results, columns=['Industry', 'Mean', 'Standard Deviation', 'Sharpe Ratio', 't-Statistic'])

# Display the results
print(results_df)


    Industry      Mean  Standard Deviation  Sharpe Ratio  t-Statistic
0          1  0.095701            0.100633      0.950987    24.412776
1          2  0.088384            0.100573      0.878805    22.559804
2          3  0.114106            0.100175      1.139073    29.241139
3          4  0.087960            0.100079      0.878907    22.562411
4          5  0.099613            0.100282      0.993336    25.499930
5          6  0.093732            0.099393      0.943038    24.208719
6          7  0.095488            0.100211      0.952872    24.461180
7          8  0.107521            0.100449      1.070403    27.478303
8          9  0.117009            0.099964      1.170516    30.048318
9         10  0.081932            0.099649      0.822206    20.522230
10        11  0.088327            0.099854      0.884557    21.721245
11        12  0.111800            0.099868      1.119476    28.738066


*b) Now combine these 12 returns using equal weights to generate a new industry-neutral STRAT. Compute the mean, standard deviation, and Sharpe ratio associated with this
new strategy. How does its performance compare to that of the previous STRAT and
the previously hedged STRAT?*

In [13]:
merged = riskmerge[0]
for idx,l in enumerate(riskmerge[1:]):   
    merged = merged.merge(l,on='date',suffixes=(str(idx),str(idx+1)))
           

In [14]:
merged['mean'] = (merged['fund_rp0'] + merged['fund_rp1'] + merged['fund_rp2'] +merged['fund_rp3'] +merged['fund_rp4'] +merged['fund_rp5'] +merged['fund_rp6'] +merged['fund_rp7'] +merged['fund_rp8'] +merged['fund_rp9'] +merged['fund_rp10'] +merged['fund_rp11'])/12 


In [15]:
mean_return, std_dev, sharpe_ratio = annualized_metrics(merged['mean'])
print('Mean', mean_return,
        'Standard Deviation', std_dev,
        'Sharpe Ratio', sharpe_ratio)


Mean 0.10015477168879464 Standard Deviation 0.05520989823169347 Sharpe Ratio 1.8140727459501162


*c) Regress the industry-neutral STRAT onto the 17 risk factors (12 industry portfolio
returns and 5 Fama-French Research Factors from Ken French’s website). Discuss the
alpha, betas, and R-square of that regression. How does it compare to the results you
get in Section 7a?*

In [16]:
Industry_Returns = pd.read_csv(f'{DATA_PATH}/12_Industry_Portfolios.txt', delimiter= '\s+')
Industries = Industry_Returns.columns.tolist()
Industry_Names = pd.DataFrame(Industries,index=np.arange(1,13)).reset_index()
Industry_Names.columns=['Industry','Name']
Industry_Returns = Industry_Returns.reset_index().rename(columns = {'index':'date'})
Industry_Returns['date'] = pd.to_datetime(Industry_Returns['date'], format='%Y%m', errors='coerce')+pd.offsets.MonthEnd(0)
Industry_Returns['date'] = Industry_Returns['date'].dt.to_period('M')

FamaF = pd.read_csv(f'{DATA_PATH}/FamaFrench5.txt', delimiter= '\s+')
Factors = FamaF.columns.tolist()
Factors_Names = pd.DataFrame(Factors,index=np.arange(1,7)).reset_index()
Factors_Names.columns=['Factor','Name']
FamaF = FamaF.reset_index().rename(columns = {'index':'date'})
FamaF['date'] = pd.to_datetime(FamaF['date'], format='%Y%m', errors='coerce')+pd.offsets.MonthEnd(0)

FamaF.drop('RF', axis=1, inplace = True)
FamaF['date'] = FamaF['date'].dt.to_period('M')



In [17]:
merged = merged[['date','mean']]

In [18]:
FamaF.date = FamaF.date.dt.strftime('%Y-%m')

In [19]:
Industry_Returns.date = Industry_Returns.date.dt.strftime('%Y-%m')


In [20]:
final_reg = merged.merge(Industry_Returns,on='date',how='left').merge(FamaF,on='date',how='left')

In [21]:
final_reg

Unnamed: 0,date,mean,NoDur,Durbl,Manuf,Enrgy,Chems,BusEq,Telcm,Utils,Shops,Hlth,Money,Other,Mkt-RF,SMB,HML,RMW,CMA
0,1969-02,0.004700,-6.59,-3.47,-6.04,-4.60,-4.63,-3.57,-3.33,-5.21,-4.13,-5.16,-8.76,-8.17,-5.84,-4.16,0.91,2.07,0.83
1,1969-03,0.004468,0.71,5.11,3.07,7.12,1.06,5.47,-0.42,-1.29,5.06,3.34,3.33,-0.43,2.64,-0.45,-0.51,-1.43,-0.35
2,1969-04,0.005104,1.06,1.22,1.51,-0.36,0.62,4.05,7.73,1.33,3.77,4.98,2.89,1.24,1.46,-0.80,-0.03,0.41,0.06
3,1969-05,0.004871,1.21,0.14,-1.36,3.76,0.97,-1.79,1.08,0.03,0.55,1.68,-1.36,-1.17,-0.10,-0.11,0.70,-0.95,1.39
4,1969-06,0.005401,-7.31,-5.99,-7.14,-9.89,-6.30,-0.02,-4.00,-5.73,-5.49,-5.04,-10.05,-10.99,-7.18,-5.45,-1.08,4.32,-1.60
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
596,2023-08,0.016305,-3.77,-4.31,-2.21,1.95,-2.75,-1.67,0.14,-5.29,-0.40,-0.22,-3.61,-2.98,-2.39,-3.65,-1.06,3.43,-2.37
597,2023-09,-0.017437,-4.57,-2.58,-7.30,3.17,-6.57,-5.95,-3.22,-5.04,-5.68,-4.71,-2.04,-5.57,-5.24,-1.80,1.52,1.86,-0.83
598,2023-10,-0.002689,-3.53,-17.88,-3.01,-6.24,-2.21,-1.73,-0.18,1.12,0.47,-4.58,-1.78,-3.78,-3.19,-4.05,0.18,2.47,-0.65
599,2023-11,0.021168,5.02,15.76,9.73,-1.29,6.28,11.94,6.97,5.08,7.18,5.87,10.25,10.73,8.84,-0.12,1.64,-3.91,-1.00


In [22]:
final_reg.drop('date', axis=1, inplace = True)

Y = final_reg.iloc[:, 0]
X = final_reg.drop(final_reg.columns[0], axis=1)


import statsmodels.api as sm

X = sm.add_constant(X)

model = sm.OLS(Y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                   mean   R-squared:                       0.525
Model:                            OLS   Adj. R-squared:                  0.511
Method:                 Least Squares   F-statistic:                     37.90
Date:                Fri, 21 Jun 2024   Prob (F-statistic):           1.13e-82
Time:                        18:34:50   Log-Likelihood:                 1859.0
No. Observations:                 601   AIC:                            -3682.
Df Residuals:                     583   BIC:                            -3603.
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0050      0.001      9.078      0.0