# Stock Correlation

We analyse a frequently heard hypothesis about stock correlations: some stocks follow each other's trends, one after another, within a short time window. If such correlations are predictable, they can be exploited for buy/sell signals.

Our analysis looks for such correlations and tests whether they follow a predictable pattern.

## Correlating Stock Compilation

This notebook reads `.parquet` files for two stocks and concatenates the KPIs of consecutive windows. Each resulting row has the following layout:

```
stock1_date stock1_win_length stock1_kpis stock2_date stock2_win_length stock2_kpis 
```

where $\textrm{stock2\_date} = \textrm{stock1\_date} + \textrm{stock1\_win\_length}$. The date is obtained by index manipulation over the available trading dates, not by calendar arithmetic. Say `stock1_date` is a Wednesday and one adds 5: the result is _not_ the following Monday, but the following Wednesday, because days without stock data are not counted.
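
The difference between positional and calendar arithmetic can be illustrated with a minimal sketch. It assumes a hypothetical trading calendar made of plain business days (`pd.bdate_range`); a real calendar would also skip exchange holidays, and the names `trading_days`, `shifted` and `calendar_shift` are illustrative only.

```python
import pandas as pd

# Hypothetical trading calendar: business days only (assumption; real
# calendars also skip exchange holidays).
trading_days = pd.bdate_range("2024-01-01", "2024-01-31")

start = pd.Timestamp("2024-01-03")      # a Wednesday
pos = trading_days.get_loc(start)       # position of the date in the calendar
shifted = trading_days[pos + 5]         # five trading days later

# Calendar arithmetic, for comparison: five calendar days later
calendar_shift = start + pd.Timedelta(days=5)

print(start.day_name(), "->", shifted.day_name())
print(start.day_name(), "->", calendar_shift.day_name())
```

Shifting by position lands on a Wednesday again, while adding five calendar days lands on a Monday.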


### Parameters

In [31]:
input_file_1 = './SMI.CH.parquet'
input_file_2 = './IXX.DE.parquet'

In [30]:
import os
output_file = os.path.splitext(input_file_1)[0] + '_' + os.path.splitext(os.path.basename(input_file_2))[0]  + '.parquet'
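
As a rough illustration of how the output name is derived, the same expression can be wrapped in a small helper. The function name `build_output_name` is hypothetical and not part of the notebook.

```python
import os


def build_output_name(file_1, file_2):
    """Combine two input paths into one output path (hypothetical helper)."""
    # Keep the directory and stem of the first file, e.g. './SMI.CH'
    stem_1 = os.path.splitext(file_1)[0]
    # Keep only the stem of the second file, e.g. 'IXX.DE'
    stem_2 = os.path.splitext(os.path.basename(file_2))[0]
    return stem_1 + '_' + stem_2 + '.parquet'


example_name = build_output_name('./SMI.CH.parquet', './IXX.DE.parquet')
print(example_name)
```

Only the last extension is stripped, so the dots inside `SMI.CH` and `IXX.DE` are preserved.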

### Imports

In [2]:
# libs we need
import pandas as pd
import statsmodels.api as sm
import numpy as np

### Load Data

Load data from `.parquet` files

In [20]:
stock1 = pd.read_parquet(input_file_1)
stock2 = pd.read_parquet(input_file_2)

# Summary stats
print('Data rows in {}: {}'.format(input_file_1, len(stock1)))
print('Data rows in {}: {}'.format(input_file_2, len(stock2)))

Data rows in ./SMI.CH.parquet: 8000
Data rows in ./IXX.DE.parquet: 8075


### Compile Dataframe

In [4]:
# sorted list of unique stock1 dates, used to look up the date that follows each window
stock1_dates = sorted({i[0] for i in stock1.index})

In [5]:
# dataframe of corresponding dates from stock2 _after_ each stock1 window
stock2_corresponding = pd.DataFrame(data=None, index=stock1.index, columns=['date_stock2'])

# find corresponding dates
for stock1_idx in stock1.index:
    stock1_date, stock1_win_length = stock1_idx
    stock2_date_idx = stock1_dates.index(stock1_date) + stock1_win_length
    try:
        date_stock2 = stock1_dates[stock2_date_idx + 1]  # next date _after_ the window
    except IndexError:
        continue  # the window runs past the last available date
    # a single .loc call avoids chained assignment, which may write to a copy
    stock2_corresponding.loc[stock1_idx, 'date_stock2'] = date_stock2

# remove invalid combinations
stock2_corresponding.dropna(inplace=True)
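
The row-by-row lookup above calls `list.index` for every row, which can become slow on large frames. A vectorized alternative might look like the following sketch. It runs on synthetic data and assumes the same `(date, win_length)` MultiIndex layout; `stock1_demo`, `unique_dates` and `corresponding` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for stock1 (assumption): MultiIndex of (date, win_length)
dates = pd.bdate_range("2024-01-01", periods=6)
index = pd.MultiIndex.from_product([dates, [1, 2]], names=["date", "win_length"])
stock1_demo = pd.DataFrame({"kpi": np.arange(len(index), dtype=float)}, index=index)

row_dates = stock1_demo.index.get_level_values("date")
win_lengths = stock1_demo.index.get_level_values("win_length").to_numpy()

# Sorted unique dates and the position of each row's date among them
unique_dates = row_dates.unique().sort_values()
pos = unique_dates.get_indexer(row_dates)

# Same offset as the loop: position + window length + 1
target = pos + win_lengths + 1
valid = target < len(unique_dates)  # drop windows that run past the data

corresponding = pd.DataFrame(
    {"date_stock2": unique_dates[target[valid]].to_numpy()},
    index=stock1_demo.index[valid],
)
print(corresponding)
```

Rows whose target position falls outside the available dates are filtered out up front, which plays the role of the `try`/`except` and `dropna` steps.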


In [6]:
# join with stock1 dataframe
stock1 = stock1.join(stock2_corresponding, how='inner')
assert len(stock2_corresponding) == len(stock1)

In [7]:
# prepare index in stock2 for joining with stock1
stock2.reset_index(inplace=True)
stock2.rename(columns={'date': 'date_stock2'}, errors="raise", inplace=True)
stock2.set_index('date_stock2',inplace=True)

In [8]:
# join stock1 and stock2 on date_stock2, the corresponding date
stock_correspond = stock1.reset_index().set_index('date_stock2').join(stock2, lsuffix='_stock1', rsuffix='_stock2', how='left')
stock_correspond.rename(columns={'date':'date_stock1'}, inplace=True)
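
To make the effect of `lsuffix`, `rsuffix` and `how='left'` concrete, here is a minimal sketch on two hypothetical miniature frames. The frames `left` and `right` and their values are invented for illustration and do not come from the notebook's data.

```python
import pandas as pd

# Hypothetical miniature version of stock1, already indexed by date_stock2
left = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "win_length": [1, 1],
        "kpi": [0.1, 0.2],
        "date_stock2": pd.to_datetime(["2024-01-03", "2024-01-04"]),
    }
).set_index("date_stock2")

# Hypothetical miniature version of stock2, with data for one date only
right = pd.DataFrame(
    {
        "date_stock2": pd.to_datetime(["2024-01-03"]),
        "win_length": [2],
        "kpi": [0.9],
    }
).set_index("date_stock2")

# Overlapping column names get the suffixes; 'date' is unique and stays as-is
joined = left.join(right, lsuffix="_stock1", rsuffix="_stock2", how="left")
print(joined)
```

Because the join is a left join, a stock1 row whose `date_stock2` has no match in stock2 is kept with `NaN` in the `_stock2` columns. If such rows are unwanted, they would have to be dropped explicitly.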

In [22]:
# build new multi index
stock_correspond = stock_correspond.reset_index().set_index(['date_stock1', 'date_stock2'])
# Summary stats
print('Corresponding stock rows: {}'.format(len(stock_correspond)))

Corresponding stock rows: 188750


### Store Features in `.parquet` File

The output file name is held in the variable `output_file`.

In [29]:
stock_correspond.to_parquet(path=output_file, index=True)