# Guided Studies into Financial Management
## Index Revisions and Stock Returns

### Collaborators
Dennis Blaufuss,
Lars Wrede,
Nicolas Kepper,
Sophie Merl,
Philipp Voit

### Instructor
Dr. Stefan Scharnoski

### Summary 
TODO: Write a summary of the results here, similar to an abstract.

---

## Table of Contents:
1. [Data Preprocessing](#0-bullet)
* [Import Data](#first-bullet)
* [Calculate Daily Returns](#2-bullet)
* [Data Quality Checks](#3-bullet)
* [Descriptive Statistics](#4-bullet)
2. [Price Pressure](#5-bullet)
3. [Investor Attention](#6-bullet)
4. [Systematic Risk](#7-bullet)
___

In [2]:
import pandas as pd
import numpy as np 

import yfinance as yf

import performanceanalytics.table.table as pat
from performanceanalytics.charts import performance_summary
import statistics
import statsmodels.formula.api as smf

# 1. Data Preprocessing <a class="anchor" id="0-bullet"></a>

## Import Data <a class="anchor" id="first-bullet"></a>

In [3]:
''' Import all relevant data.
Parameters
----------
:stock_data: df
    Contains the ticker symbols and names of the stocks under consideration.
:benchmark:  df
    The MSCI Germany Index is used as a proxy for the market portfolio.
    This index contains a large number of MDAX and DAX stocks and is therefore
    broader than the DAX. However, DAX shares carry a high weight in the index,
    so the actual influence is expected to be underestimated.
:index_compositions: df
    Contains the deletions/additions, the dates of change/announcement, and merger/spin-off information.
-------
'''
stock_data = pd.read_csv('Companies_Ticker.csv', sep = ';')
benchmark = pd.read_csv('Price History_20220305_0615.csv', sep = ';')
index_compositions = pd.read_csv('Historical_Index_Compositions.csv', sep = ';')

In [4]:
''' Pulls time series data for stocks on a daily basis from XXX until 2022-03-01.

Parameters
----------
:stock_dict:  dict
    Contains the stock symbols as key and the time series as values.
:stocks_as_df:  df
    Contains the time series data as one df.
-------
'''

stock_dict = {}
for s in stock_data['Symbol']:  # iterate over every ticker symbol
    # Download the historical daily data from Yahoo Finance
    stock_dict[s] = yf.download(s, start='2020-1-1', end='2022-03-01', progress=False)
# Concatenate all data into one DataFrame indexed by (symbol, date)
stocks_as_df = pd.concat(stock_dict, axis = 0)

## Calculate Daily Returns <a class="anchor" id="2-bullet"></a>

In [5]:
''' Transform daily price data to daily returns
Parameters
----------
:returns_daily:  dict
    Contains the stock symbols as key and the daily returns as values.
:benchmark: df
    Contains daily returns from the benchmark.
-------
'''
returns_daily = {}
for s in stock_data['Symbol']:
    returns_daily[s] = stock_dict[s]['Adj Close'].pct_change()
benchmark['Umtauschdatum'] = pd.to_datetime(benchmark['Umtauschdatum'], format='%d.%m.%y')
benchmark = pd.DataFrame(benchmark['Schlusskurs'].astype(float).pct_change()).set_index(benchmark['Umtauschdatum'])
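
The return calculation above relies on `pct_change()`, which computes simple returns. A minimal sketch on hypothetical prices (the values and dates are made up for illustration) shows what it produces, including the leading missing value per series:

```python
import pandas as pd

# Hypothetical adjusted closing prices for three consecutive trading days.
prices = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
    name="Adj Close",
)

# pct_change() computes (P_t - P_{t-1}) / P_{t-1}; the first entry has no
# predecessor and is therefore NaN.
returns = prices.pct_change()
print(returns)
```

Each return series therefore starts with one missing value, which should be kept in mind when counting observations.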

## Data Quality Checks <a class="anchor" id="3-bullet"></a>

In [6]:
'''Check if stocks_as_df contains NA or zeros in Volume & Adjusted Close

Parameters
----------
:stocks_as_df:  df
    Contains the time series data as one df.
:stocks_as_df_Volume_is_0:  df
    Contains the rows where Volume == 0.
-------
'''

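# Flag every column that contains at least one missing value.
# (pandas sums skip NaN by default, so np.isnan(np.sum(...)) would not detect them.)
stocks_as_df_has_nan = stocks_as_df.isna().any()

# (stocks_as_df < 0).any()
# (stocks_as_df == 0).any()

stocks_as_df_Volume_is_0 = stocks_as_df.loc[stocks_as_df["Volume"] == 0]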
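
As a possible refinement of these checks, the counts can be broken down per ticker. The sketch below is an assumption about how that could look; the tickers `AAA` and `BBB` and all values are hypothetical, and the toy frame only mimics the (ticker, date) layout that `pd.concat` produces from a dictionary:

```python
import numpy as np
import pandas as pd

# Hypothetical two-ticker frame with a (ticker, date) MultiIndex.
dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"])
frames = {
    "AAA": pd.DataFrame({"Adj Close": [10.0, np.nan, 10.5], "Volume": [100, 0, 120]}, index=dates),
    "BBB": pd.DataFrame({"Adj Close": [20.0, 20.2, 20.1], "Volume": [300, 310, 0]}, index=dates),
}
toy = pd.concat(frames, axis=0)

# Count missing prices and zero-volume days per ticker (index level 0).
quality = pd.DataFrame({
    "missing_adj_close": toy["Adj Close"].isna().groupby(level=0).sum(),
    "zero_volume_days": (toy["Volume"] == 0).groupby(level=0).sum(),
})
print(quality)
```

A per-ticker view makes it easier to see whether problems are concentrated in a few stocks or spread across the sample.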

In [11]:
'''Check if Adj Close in stocks_as_df deviates by more than 50 % from the mean of the previous and following day.

Parameters
----------
:stocks_as_df:  df
    Contains the time series data as one df.
:stocks_as_df_adjclose_peak_bottom:  dataframe
    Contains the rows where Adj Close deviates by more than this threshold.
-------
'''
stocks_as_df_adjclose_peak_bottom_list = []
adj_close = stocks_as_df["Adj Close"]

# Compare each row with the mean of its direct neighbours (positional access via .iloc)
for n in range(1, len(stocks_as_df) - 1):
    neighbour_mean = statistics.mean([adj_close.iloc[n-1], adj_close.iloc[n+1]])
    if abs(adj_close.iloc[n] - neighbour_mean) > .5 * adj_close.iloc[n]:
        stocks_as_df_adjclose_peak_bottom_list.append(stocks_as_df.iloc[n])

stocks_as_df_adjclose_peak_bottom = pd.DataFrame(stocks_as_df_adjclose_peak_bottom_list)
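
Because the loop above walks over the concatenated frame row by row, the last row of one ticker is compared with the first row of the next. A vectorised variant that stays within each ticker could look like the following sketch; the tickers and prices are hypothetical and the 50 % threshold is taken from the check above:

```python
import pandas as pd

# Hypothetical frame with a (ticker, date) MultiIndex; AAA has an obvious spike.
dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"])
frames = {
    "AAA": pd.DataFrame({"Adj Close": [10.0, 30.0, 10.0, 10.2]}, index=dates),
    "BBB": pd.DataFrame({"Adj Close": [50.0, 51.0, 50.5, 50.8]}, index=dates),
}
toy = pd.concat(frames, axis=0)

price = toy["Adj Close"]
by_ticker = price.groupby(level=0)

# Mean of the previous and following day, computed within each ticker only.
neighbour_mean = (by_ticker.shift(1) + by_ticker.shift(-1)) / 2

# Same 50 % rule as above; first and last day per ticker have no full
# neighbourhood (NaN) and are never flagged.
is_spike = (price - neighbour_mean).abs() > 0.5 * price
spikes = toy[is_spike]
print(spikes)
```

In this toy example the rule flags both the spike itself and the day after it, because the neighbour mean of the following day is inflated by the spike. Flagged rows are therefore best inspected in context.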

## Descriptive Statistics of the whole Dataset <a class="anchor" id="4-bullet"></a>

In [13]:
''' Calculating measures of location, statistical dispersion and shape
Parameters
----------
:des_stat:  dataframe
    Contains the descriptive statistics.
-------
'''

des_stat = pd.DataFrame(columns=stock_data['Symbol'], 
                        index=['Observations', 'NAs', 'Minimum', 'Quartile 1', 'Median', 
                               'Artithmetic Mean', 'Geometric Mean', 'Quartile 3', 'Maximum', 'SE Mean',
                               'LCL Mean (.95)', 'UCL Mean (.95)', 'Variance', 'Stdev', 'Skewness','Kurtosis'])

for s in stock_data['Symbol']:
    df = pd.DataFrame(returns_daily[s])
    des_stat[s] = pat.stats_table(df, manager_col=0)
des_stat

Symbol,^GDAXI,ADS.DE,AIR.DE,ALV.DE,BAS.DE,BAYN.DE,BMW.DE,BNR.DE,BEI.DE,CON.DE,...,RWE.DE,SAP.DE,SRT3.DE,SIE.DE,ENR.DE,SHL.DE,SY1.DE,VOW3.DE,VNA.DE,ZAL.DE
Observations,549.0,549.0,549.0,549.0,549.0,549.0,549.0,549.0,549.0,549.0,...,549.0,549.0,549.0,549.0,360.0,549.0,549.0,549.0,549.0,549.0
NAs,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Minimum,-0.122386,-0.134064,-0.215624,-0.153277,-0.117939,-0.140271,-0.129714,-0.091662,-0.069778,-0.174502,...,-0.172973,-0.219375,-0.10603,-0.12696,-0.166303,-0.089523,-0.091072,-0.152115,-0.084064,-0.095694
Quartile 1,-0.005498,-0.012406,-0.013967,-0.008313,-0.008421,-0.009547,-0.010536,-0.007825,-0.006485,-0.013568,...,-0.010025,-0.008259,-0.011668,-0.009329,-0.013067,-0.007822,-0.006931,-0.012166,-0.007529,-0.012705
Median,0.000681,-0.001325,-0.0003,0.00022,0.000304,-0.000322,0.000528,0.001181,0.0,0.000405,...,0.000661,0.000169,0.0041,0.000177,-0.000632,0.0,0.00103,-0.000116,0.00036,0.000253
Artithmetic Mean,0.000268,-0.000277,0.000402,0.000252,0.000293,-0.000264,0.000737,0.001036,-0.000152,-0.000203,...,0.001198,-4.8e-05,0.001632,0.000509,0.000254,0.000801,0.000402,0.000477,0.000269,0.000801
Geometric Mean,0.000141,-0.000524,-0.000211,3.6e-05,7.7e-05,-0.000497,0.000464,0.000861,-0.000256,-0.000596,...,0.000984,-0.000255,0.001307,0.000281,-5.9e-05,0.000642,0.000289,8.4e-05,0.000125,0.00048
Quartile 3,0.007491,0.010576,0.013937,0.008318,0.009372,0.009268,0.011442,0.00986,0.007357,0.01397,...,0.012408,0.010804,0.016022,0.011515,0.01443,0.010277,0.008722,0.01165,0.008394,0.015385
Maximum,0.109759,0.084235,0.206724,0.158039,0.107292,0.087368,0.144724,0.086927,0.075579,0.135634,...,0.075238,0.077925,0.088746,0.115594,0.10265,0.096943,0.103354,0.190461,0.088012,0.124407
SE Mean,0.000679,0.000948,0.001494,0.000887,0.000886,0.000917,0.000996,0.000798,0.000614,0.00119,...,0.000878,0.000851,0.001085,0.000909,0.00131,0.000759,0.00064,0.001201,0.000725,0.001084


In [14]:
''' Calculating the downside statistics
Parameters
----------
:down_stat:  dataframe
    Contains the downside statistics.
-------
'''
down_stat = pd.DataFrame(columns=stock_data['Symbol'], 
                        index=['Semi Deviation', 'Gain Deviation', 'Loss Deviation', 'Downside Deviation (MAR=2.0%)',
                               'Downside Deviation (rf=0.5%)', 'Downside Deviation (0%)', 'Maximum Drawdown', 
                               'Historical VaR (95%)', 'Historical ES (95%)', 'Modified VaR (95%)', 'Modified ES (95%)'])

for s in stock_data['Symbol']:
    df = pd.DataFrame(returns_daily[s])
    down_stat[s] = pat.create_downside_table(df,0)
down_stat

Symbol,^GDAXI,ADS.DE,AIR.DE,ALV.DE,BAS.DE,BAYN.DE,BMW.DE,BNR.DE,BEI.DE,CON.DE,...,RWE.DE,SAP.DE,SRT3.DE,SIE.DE,ENR.DE,SHL.DE,SY1.DE,VOW3.DE,VNA.DE,ZAL.DE
Semi Deviation,0.013502,0.015321,0.025647,0.016302,0.015852,0.016632,0.017205,0.014461,0.01075,0.021651,...,0.015479,0.018162,0.018933,0.015608,0.018457,0.012646,0.011526,0.019712,0.012701,0.016942
Gain Deviation,0.011047,0.015725,0.028172,0.016548,0.015165,0.015334,0.017061,0.013109,0.009658,0.018751,...,0.013145,0.011139,0.015153,0.014899,0.015913,0.012403,0.009738,0.022327,0.011976,0.017638
Loss Deviation,0.013544,0.015322,0.025744,0.016337,0.015859,0.016601,0.017258,0.014592,0.010733,0.02163,...,0.015536,0.018085,0.018899,0.015641,0.018457,0.012669,0.011531,0.019748,0.012701,0.016912
Downside Deviation (MAR=2.0%),0.013398,0.017179,0.025177,0.015972,0.016339,0.016995,0.017787,0.015103,0.012258,0.02213,...,0.016732,0.01772,0.020585,0.016884,0.019489,0.014328,0.012936,0.020116,0.013933,0.019239
Downside Deviation (rf=0.5%),0.01273,0.015452,0.024979,0.015503,0.015556,0.016139,0.017008,0.014221,0.010855,0.021339,...,0.015409,0.017268,0.019169,0.015559,0.01839,0.012734,0.011605,0.019424,0.012646,0.017285
Downside Deviation (0%),0.013544,0.015322,0.025744,0.016337,0.015859,0.016601,0.017258,0.014592,0.010733,0.02163,...,0.015536,0.018085,0.018899,0.015641,0.018457,0.012669,0.011531,0.019748,0.012701,0.016912
Maximum Drawdown,0.139166,0.157224,0.2722,0.203497,0.154311,0.200588,0.189642,0.147711,0.13064,0.261082,...,0.222719,0.275808,0.169593,0.157642,0.215163,0.169987,0.136105,0.223116,0.139495,0.195748
Historical VaR (95%),0.024893,0.030471,0.044595,0.028619,0.03261,0.031383,0.034572,0.027771,0.02332,0.040328,...,0.028512,0.030993,0.041158,0.032416,0.031988,0.025381,0.025388,0.038826,0.025064,0.039078
Historical ES (95%),0.041987,0.051485,0.083153,0.048952,0.052402,0.052445,0.054955,0.045995,0.035137,0.068297,...,0.04568,0.049547,0.060182,0.050676,0.055078,0.041067,0.037048,0.064548,0.041886,0.055398
Modified VaR (95%),-0.001149,0.006122,-0.000605,-0.007958,0.003756,0.003806,0.003775,0.005738,0.003889,0.005643,...,0.003732,-0.006241,0.013839,0.004779,0.006623,0.005703,0.00208,0.00107,0.004398,0.009207


___
## 2. Price Pressure <a class="anchor" id="5-bullet"></a>

__To-Dos__
* Which index inclusion effects exist?
* Is there a pre-announcement drift?
* Are there other announcement effects?

__Replicate Paper 1 (EMH vs. PPH)__
* Collect all stocks for the analysis; so far only the latest 40 are included (Sophie has already prepared a CSV for this).
* Analyse excess returns and trading volume on the first 5 days using the cross-sectional means.
* Analyse the periods before and after the announcement as well as after the inclusion day.

* Suggestions from Stefan:
    * Pool all deletions from and additions to the DAX, since statistical tests are only possible this way
    * E.g. look at changes before 2010 and after 2010





__Suggested changes from Stefan__
* Pool all changes (30 to 40), since inference and statistical tests are only possible this way
* Before 2010, after 2010
* Think about how to do this empirically --> counterfactual
* Abnormal returns: what are they based on; what is the expected return; more complex model
* Index inclusion effects
* Freedom in the choice of empirical approach, as long as we answer the question of how inclusion affects expected returns and other metrics
* Abnormal returns; volatility; trading volume (ETFs now have to trade the stocks as well); investor attention (we should coordinate on this; possibly index inclusion as a measure of attention; to do: search online)
* Correlations: how do correlations change (this could be attention) --> economically meaningful because of systematic risk
* Possibly filter the data; data errors; works well for prices; trading volume is not as reliable for earlier periods
* It is important that the dates are correct, otherwise the analysis becomes problematic
* Announcement day & inclusion day
* Pre-announcement drift
* Announcement effects
* Dividend announcements --> literature
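
The notes above raise the question of what abnormal returns are based on and what the expected return is. One common choice is a market model fitted on a window before the event. The sketch below is only an illustration under that assumption: the function name `abnormal_returns`, the window lengths, and the synthetic data are all hypothetical, not the approach chosen for this project.

```python
import numpy as np
import pandas as pd


def abnormal_returns(stock, market, event_date, est_days=120, window=5):
    """Market-model abnormal returns (AR) and cumulative AR around event_date."""
    df = pd.concat({"stock": stock, "market": market}, axis=1).dropna()
    pos = df.index.get_loc(pd.Timestamp(event_date))
    start = pos - window - est_days
    if start < 0:
        raise ValueError("not enough history before the event window")
    est = df.iloc[start : pos - window]             # estimation window
    evt = df.iloc[pos - window : pos + window + 1]  # event window
    # Expected return from a market model fitted on the estimation window only.
    beta, alpha = np.polyfit(est["market"], est["stock"], 1)
    ar = evt["stock"] - (alpha + beta * evt["market"])
    return ar, ar.cumsum()


# Synthetic data: the stock follows the market exactly, plus a 5 % jump on the event day.
dates = pd.bdate_range("2021-01-04", periods=200)
market = pd.Series(np.random.default_rng(0).normal(0, 0.01, 200), index=dates)
stock = 0.001 + 1.2 * market
event_date = dates[150]
stock.loc[event_date] += 0.05

ar, car = abnormal_returns(stock, market, event_date)
print(ar.round(4))
```

Fitting the model only on data before the event window keeps the expected return free of the event itself, which is one way to approximate the counterfactual mentioned above.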

___
## 3. Investor Attention or Correlation Analysis <a class="anchor" id="6-bullet"></a>

__To-Dos__
* Which index inclusion effects exist?
* Is there a pre-announcement drift?
* Are there other announcement effects?

__Replicate the final part of Paper 2 (Investor Attention)__
* Correlations: how do correlations change (this could be attention) --> economically meaningful because of systematic risk --> leads into part 4
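
To make the correlation idea concrete, the following sketch compares the correlation between a stock and the market before and after an inclusion date. The function name `correlation_shift` and the synthetic series are hypothetical; the date 2021-09-20 is the inclusion day used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd


def correlation_shift(stock, market, inclusion_date):
    """Return the stock-market return correlation before and from the inclusion date."""
    df = pd.concat({"stock": stock, "market": market}, axis=1).dropna()
    cutoff = pd.Timestamp(inclusion_date)
    before = df[df.index < cutoff]
    after = df[df.index >= cutoff]
    return before["stock"].corr(before["market"]), after["stock"].corr(after["market"])


# Synthetic returns: unrelated to the market before the cutoff, closely tied to it afterwards.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-05-03", periods=200)
market = pd.Series(rng.normal(0, 0.01, 200), index=dates)
noise = pd.Series(rng.normal(0, 0.01, 200), index=dates)
is_after = dates >= pd.Timestamp("2021-09-20")
stock = pd.Series(np.where(is_after, market + 0.1 * noise, noise), index=dates)

corr_before, corr_after = correlation_shift(stock, market, "2021-09-20")
print(corr_before, corr_after)
```

With real data, the difference between the two correlations would still need a significance test before it can be interpreted as an attention effect.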

___
## 4. Systematic Risk <a class="anchor" id="7-bullet"></a>

This section examines whether the inclusion of a share in the DAX affects its systematic risk and liquidity.

In [21]:
''' Creating a list and dictionary with all 10 newly added DAX stocks.
Parameters
----------
:newcomers:  list
    Contains the names of the stocks.
:dax_new: dict
    Contains the daily returns of the 10 new stocks.
-------
'''
newcomers = ['AIR.DE', 'SHL.DE', 'ZAL.DE', 'SY1.DE', 'SRT3.DE',  'POAHY', 'HFG.DE', 'BNR.DE', 'QIA.DE', 'PUM.DE']
dax_new = {new: returns_daily[new] for new in newcomers}

In [22]:
''' Creating the dummy variable - 0 before the inclusion day (2021-09-20) and 1 thereafter.
Parameters
----------
:benchmark:  df
    Contains daily returns as well as the dummy variable.
-------
'''
# 1 on and after the inclusion day, 0 before
benchmark['Dummy'] = (benchmark.index >= pd.Timestamp('2021-09-20')).astype(int)

In [23]:
''' Calculating the systematic risk.
To estimate the regression equations, OLS was used in conjunction with a correction procedure (Newey/West) 
for serially correlated error terms. 
This approach leads to test statistics that are robust against autocorrelated and 
heteroskedastic disturbance terms.

Time Horizon
----------
Start: 1 Year before the inclusion day (2020-09-20)
End: 2022-03-01
----------

Parameters
----------
:sys_risk:  df
    Stock: Name of the specific stock.
    Rank: Sorted by index weight (ascending).
    Delta: Measures the change in the systematic risk of the share triggered by the inclusion.
    p-Value: The two-tailed p-values for the t-stats of the params.
    R^2: R-squared of the model.
-------
'''
sys_risk = []
for rank, key in enumerate(dax_new, start=1):
    data = pd.DataFrame(dax_new[key]['2020-09-20':'2022-03-01'])
    data['Benchmark'] = benchmark['Schlusskurs']['2020-09-20':'2022-03-01']
    data['Dummy'] = benchmark['Dummy']['2020-09-20':'2022-03-01']
    data = data.rename(columns = {'Adj Close': 'y', 'Dummy': 'D', 'Benchmark': 'x'})
    # 'D*x' expands to D + x + D:x; the interaction term measures the change in beta
    reg = smf.ols('y ~ x + D*x', data).fit(cov_type='HAC',cov_kwds={'maxlags':1})
    sys_risk.append(
        {
            'Stock': key,
            'Rank': rank,
            # Position 3 is the interaction coefficient (Intercept, x, D, interaction)
            r"$\Delta$": reg.params.iloc[3],
            'p_Value': reg.pvalues.iloc[3],
            r"$R^{2}$": reg.rsquared
        }
    )
sys_risk = pd.DataFrame(sys_risk)
# DataFrame.append was removed in pandas 2.0; pd.concat adds the row of averages for display
pd.concat(
    [
        sys_risk,
        pd.DataFrame([{
            'Stock': r"$\varnothing$",
            r"$\Delta$": sys_risk[r"$\Delta$"].mean(),
            r"$R^{2}$": sys_risk[r"$R^{2}$"].mean()
        }])
    ], ignore_index=True
)

,Stock,Rank,$\Delta$,p_Value,$R^{2}$
0,AIR.DE,1.0,-0.120415,0.672617,0.371271
1,SHL.DE,2.0,0.055156,0.678464,0.141512
2,ZAL.DE,3.0,0.042864,0.889129,0.082471
3,SY1.DE,4.0,0.206899,0.170773,0.07169
4,SRT3.DE,5.0,0.251416,0.393712,0.035776
5,POAHY,6.0,-0.001889,0.993333,0.242049
6,HFG.DE,7.0,0.660832,0.12812,0.033345
7,BNR.DE,8.0,0.118582,0.419508,0.375674
8,QIA.DE,9.0,0.117342,0.595405,0.004337
9,PUM.DE,10.0,-0.099872,0.455485,0.290521
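
For reference, the formula `y ~ x + D*x` used in the regression above corresponds to a market model with a dummy interaction. Written out, with $\alpha$, $\beta$, $\gamma$ and $\varepsilon_t$ as notation assumed here for the intercept, the pre-inclusion beta, the dummy coefficient and the error term:

$$
y_t = \alpha + \beta\, x_t + \gamma\, D_t + \Delta\,(D_t \cdot x_t) + \varepsilon_t
$$

Here $y_t$ is the daily stock return, $x_t$ the daily benchmark return, and $D_t$ the inclusion dummy. Before the inclusion day ($D_t = 0$) the systematic risk is $\beta$; from the inclusion day onwards ($D_t = 1$) it is $\beta + \Delta$. The reported $\Delta$ is therefore the estimated change in beta, and its p-value tests the null hypothesis $\Delta = 0$.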


In [24]:
'''Comparison of the shares with a higher weight in the DAX and those with a weighting of < 1 %.
Parameters
----------
:des_stat:  df
    N: Rank range of the stocks, sorted by index weight.
    Mean: Mean change in systematic risk (Delta).
    R^2: Mean R-squared of the model.
-------
'''
des_stat = []
des_stat.append(
        {
            'N': '1-5',
            r"$\varnothing$": sys_risk[:5][r"$\Delta$"].mean(),
            r"$R^{2}$": sys_risk[:5][r"$R^{2}$"].mean()
        }
    )
des_stat.append(
        {
            'N': '6-10',
            r"$\varnothing$": sys_risk[5:][r"$\Delta$"].mean(),
            r"$R^{2}$": sys_risk[5:][r"$R^{2}$"].mean()
        }
    )
pd.DataFrame(des_stat)

,N,$\varnothing$,$R^{2}$
0,1-5,0.087184,0.140544
1,6-10,0.158999,0.189185
