# Portfolio Direction and Its Transition Matrix

We import the necessary libraries first.

In [1]:
#pip install yfinance
import yfinance as yf
yf.pdr_override()
import pandas as pd
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.model_selection import train_test_split

I will be using Apple (AAPL) and Tesla (TSLA) stock.

In [2]:
Apple= yf.download("AAPL", start="2021-01-01", end="2022-05-22")
Tesla= yf.download("TSLA", start="2021-01-01", end="2022-05-22")

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Let's get their log returns.

In [3]:
Apple["Log_Return"]=np.log(Apple["Close"]/Apple["Close"].shift(1))
Apple = pd.DataFrame(Apple["Log_Return"])
Tesla["Log_Return"]=np.log(Tesla["Close"]/Tesla["Close"].shift(1))
Tesla = pd.DataFrame(Tesla["Log_Return"])
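As a small illustration of what the log-return formula does, here is a sketch on made-up prices (the values in `close` are assumptions for illustration, not market data). It also shows why log returns are convenient: they add up across days.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, chosen only for illustration.
close = pd.Series([100.0, 102.0, 101.0, 105.0])

# Log return: ln(P_t / P_{t-1}). The first entry is NaN because
# there is no previous price to compare against.
log_ret = np.log(close / close.shift(1))

# Log returns are additive: their sum (NaN skipped) equals
# ln(P_last / P_first) over the whole period.
total = log_ret.sum()
whole_period = np.log(close.iloc[-1] / close.iloc[0])
print(total, whole_period)
```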

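Let's compute the mean, standard deviation, skewness, and excess kurtosis of their log returns. For a normal distribution, skewness and excess kurtosis are both 0, which is the benchmark the output labels refer to.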

Apple

In [4]:
print( 'Mean of Log Return of Apple stocks: {}'.format(np.mean(Apple["Log_Return"].dropna())))
print( 'Standard Deviation of Log Return of Apple stocks: {}'.format(np.std(Apple["Log_Return"].dropna())))
print( 'Excess kurtosis of Log Return of Apple stocks(should be 0): {}'.format(kurtosis(Apple["Log_Return"].dropna())))
print( 'Skewness of Log Return of Apple stocks (should be 0): {}'.format( skew(Apple["Log_Return"].dropna())))

Mean of Log Return of Apple stocks: 0.00017612797325155293
Standard Deviation of Log Return of Apple stocks: 0.01773443243104994
Excess kurtosis of Log Return of Apple stocks(should be 0): 0.7503583441008375
Skewness of Log Return of Apple stocks (should be 0): -0.11193853435352391


Tesla

In [5]:
print( 'Mean of Log Return of Tesla stocks: {}'.format(np.mean(Tesla["Log_Return"].dropna())))
print( 'Standard Deviation of Log Return of Tesla stocks: {}'.format(np.std(Tesla["Log_Return"].dropna())))
print( 'excess kurtosis of Log Return of Tesla stocks(should be 0): {}'.format(kurtosis(Tesla["Log_Return"].dropna())))
print( 'skewness of Log Return of Tesla stocks (should be 0): {}'.format( skew(Tesla["Log_Return"].dropna()) ))

Mean of Log Return of Tesla stocks: -0.0002718329605446314
Standard Deviation of Log Return of Tesla stocks: 0.03765649395055383
excess kurtosis of Log Return of Tesla stocks(should be 0): 2.2011291480713284
skewness of Log Return of Tesla stocks (should be 0): 0.09333273123825693



# Let's compute the covariance and the correlation. 

In [6]:
print( 'Covariance of Log Return of Apple and Tesla Stocks: {}'.format( Apple["Log_Return"].cov(Tesla["Log_Return"])))

Covariance of Log Return of Apple and Tesla Stocks: 0.0003780996508438968


In [7]:
print( 'Correlation between Log Return of Apple and Tesla Stocks: {}'.format( Apple["Log_Return"].corr(Tesla["Log_Return"])))

Correlation between Log Return of Apple and Tesla Stocks: 0.5645459954143168


Both covariance and correlation measure the linear relationship between two variables. Here, our two variables are the log returns of Apple and Tesla. Covariance measures how much the two returns vary together, expressed in the units of the returns. Correlation is the covariance rescaled by the two standard deviations, so it is unitless and lies between -1 and 1. Neither one implies that a change in Apple causes a change in Tesla.

The formula for correlation is $$Corr\left ( Apple, Tesla \right )=\frac{Cov\left ( Apple, Tesla \right )}{\sigma _{Apple}\,\sigma _{Tesla}}$$

where $\sigma$ denotes a standard deviation (not a variance). We can convert between covariance and correlation using this formula.
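The conversion can be sketched as follows, using made-up return series (`a` and `b` are assumptions for illustration). One thing to watch: pandas `Series.cov` and `Series.std` both use the sample convention (`ddof=1`), while `np.std` defaults to `ddof=0`, so mixing the two gives a slightly different ratio.

```python
import numpy as np
import pandas as pd

# Hypothetical return series, for illustration only.
a = pd.Series([0.010, -0.020, 0.015, 0.003, -0.007])
b = pd.Series([0.020, -0.030, 0.010, -0.001, -0.012])

cov_ab = a.cov(b)                              # sample covariance (ddof=1)
corr_from_cov = cov_ab / (a.std() * b.std())   # Series.std also uses ddof=1

# Going the other way: covariance recovered from the correlation.
cov_from_corr = a.corr(b) * a.std() * b.std()

print(corr_from_cov, a.corr(b))
```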

# Let's build our own transition matrix

Let's categorize each day in the price history as belonging to one of four categories, where stock 1 is Apple and stock 2 is Tesla:
i. Both stocks up
ii. Stock 1 up, stock 2 down
iii. Stock 1 down, stock 2 up
iv. Both stocks down

Let's concatenate both into a single dataset.

In [9]:
Apple.rename(columns = {'Log_Return':'Apple_Log_Return'}, inplace =True)
Tesla.rename(columns = {'Log_Return':'Tesla_Log_Return'}, inplace = True)
frames=[Apple, Tesla]
Log_Returns_Cat=pd.concat(frames,  axis=1).dropna()

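To assign these categories, the code below takes the difference between consecutive rows of each log-return series and checks its sign: a negative difference is labeled down and a positive one up (an exact zero does not occur in this data). Note that this differences the log returns themselves, so "up" here means the log return rose relative to the previous day. The sign of the log return alone already indicates whether the price rose or fell.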

In [10]:
Diff_Return = Log_Returns_Cat.diff()
Diff_Return.dropna()

Date,Apple_Log_Return,Tesla_Log_Return
2021-01-06,-0.046529,0.020704
2021-01-07,0.067795,0.048454
2021-01-08,-0.024960,-0.000967
2021-01-11,-0.032118,-0.156924
2021-01-12,0.022127,0.127517
...,...,...
2022-05-16,-0.042147,-0.116113
2022-05-17,0.035835,0.110688
2022-05-18,-0.083178,-0.120570
2022-05-19,0.033123,0.069888


Now we can label each stock as up or down depending on the difference between consecutive days.

In [11]:
# create a list of our conditions
conditions = [
    (Diff_Return['Apple_Log_Return']>0) & (Diff_Return['Tesla_Log_Return']>0),
    (Diff_Return['Apple_Log_Return']>0) & (Diff_Return['Tesla_Log_Return']<0),
    (Diff_Return['Apple_Log_Return']<0) & (Diff_Return['Tesla_Log_Return']>0),
    (Diff_Return['Apple_Log_Return']<0) & (Diff_Return['Tesla_Log_Return']<0)
    ]

# create a list of the values we want to assign for each condition
values = ['Both stocks up', 'Stock #1 up, stock #2 down', 'Stock #1 down, stock #2 up', 'Both stocks down']
Short = ['uu','ud','du','dd']
# create a new column and use np.select to assign values to it using our lists as arguments
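# default='' keeps the result a string column; the integer default (0) cannot be
# combined with string choices on recent NumPy versions
Log_Returns_Cat['Categories'] = np.select(conditions, values, default='')
Log_Returns_Cat['Categories_Letter'] = np.select(conditions, Short, default='')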
# display updated DataFrame
Log_Category=Log_Returns_Cat.iloc[2:]
Log_Category

Date,Apple_Log_Return,Tesla_Log_Return,Categories,Categories_Letter
2021-01-07,0.033554,0.076448,Both stocks up,uu
2021-01-08,0.008594,0.075481,Both stocks down,dd
2021-01-11,-0.023523,-0.081442,Both stocks down,dd
2021-01-12,-0.001396,0.046075,Both stocks up,uu
2021-01-13,0.016096,0.005834,"Stock #1 up, stock #2 down",ud
...,...,...,...,...
2022-05-16,-0.010730,-0.060556,Both stocks down,dd
2022-05-17,0.025105,0.050132,Both stocks up,uu
2022-05-18,-0.058073,-0.070437,Both stocks down,dd
2022-05-19,-0.024950,-0.000550,Both stocks up,uu
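In the table above, 2021-01-08 is labeled "Both stocks down" although both log returns are positive, because the labels come from the day-over-day change in the log return. If the goal is the direction of the price itself, one possible alternative is to label by the sign of the log return. The sketch below uses made-up values, and the column name `Direction` is an assumption for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical log returns, for illustration only.
rets = pd.DataFrame({
    "Apple_Log_Return": [0.010, 0.004, -0.020, -0.003],
    "Tesla_Log_Return": [0.030, -0.010, 0.015, -0.040],
})

# A positive log return means the close rose. Anything else counts as
# down here (an assumption; exact zeros could be handled separately).
a_dir = np.where(rets["Apple_Log_Return"] > 0, "u", "d")
t_dir = np.where(rets["Tesla_Log_Return"] > 0, "u", "d")

# Combine the two one-letter directions into the same two-letter codes.
rets["Direction"] = [x + y for x, y in zip(a_dir, t_dir)]
print(rets)
```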


# Let's build a transition matrix of portfolio direction that shows your portfolio in four scenarios:


i. From moving together to moving together: starting from uu or dd and going to uu or dd
ii. From moving together to moving apart: starting from uu or dd and going to ud or du
iii. From moving apart to moving together: starting from ud or du and going to uu or dd
iv. From moving apart to moving apart: starting from ud or du and going to ud or du
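One way these four scenarios could be turned into a 2x2 matrix is to map each two-letter state to "together" or "apart" and then count day-to-day transitions. This is a minimal sketch on a made-up label sequence; the names `regime`, `counts`, and `trans_2x2` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sequence of daily state labels, for illustration only.
states = pd.Series(["uu", "dd", "ud", "uu", "du", "ud", "dd", "uu"])

# 'uu' and 'dd' move together; 'ud' and 'du' move apart.
regime = states.map({"uu": "together", "dd": "together",
                     "ud": "apart", "du": "apart"})

# Pair each day's regime with the next day's regime and count transitions.
current = pd.Series(regime.iloc[:-1].to_numpy(), name="from")
following = pd.Series(regime.iloc[1:].to_numpy(), name="to")
counts = pd.crosstab(current, following)

# Divide each row by its total so that every row sums to 1.
trans_2x2 = counts.div(counts.sum(axis=1), axis=0)
print(trans_2x2)
```

Each row of `trans_2x2` is then a conditional distribution over tomorrow's regime given today's.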

In [13]:
Categories=Log_Returns_Cat[['Apple_Log_Return','Tesla_Log_Return','Categories_Letter']].iloc[2:]
Categories.head()

Date,Apple_Log_Return,Tesla_Log_Return,Categories_Letter
2021-01-07,0.033554,0.076448,uu
2021-01-08,0.008594,0.075481,dd
2021-01-11,-0.023523,-0.081442,dd
2021-01-12,-0.001396,0.046075,uu
2021-01-13,0.016096,0.005834,ud


In [15]:
Categories.groupby(by=['Categories_Letter']).count()

Categories_Letter,Apple_Log_Return,Tesla_Log_Return
dd,120,120
du,53,53
ud,53,53
uu,120,120


In [16]:
from itertools import islice

def window(seq, n=2):
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
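To see what the generator produces, here is a short usage sketch on a made-up list of labels. The function is repeated so that the snippet runs on its own; `labels` is an assumption for illustration.

```python
from itertools import islice

def window(seq, n=2):
    # Sliding window of width n over seq, yielding tuples.
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Hypothetical state labels, for illustration only.
labels = ["uu", "dd", "dd", "ud"]
pairs = list(window(labels))
print(pairs)

# For n=2 this is the same as zipping the list with itself shifted by one.
same_as_zip = pairs == list(zip(labels, labels[1:]))
```

A sequence of length k yields k - 1 consecutive pairs, which are the (today, tomorrow) transitions counted later.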

Let's create a new DataFrame with columns state1 and state2, using the window() function defined above to pair each day's state with the next day's state.
From those pairs, we then estimate the probability of each state1 changing to state2.

In [17]:
pairs = pd.DataFrame(window(Categories['Categories_Letter']), columns=['state1', 'state2'])
counts = pairs.groupby('state1')['state2'].value_counts()
probs = (counts / counts.sum()).unstack()


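Let's collect the probabilities in a single DataFrame. Because the counts were divided by the total number of transitions, each entry is the joint frequency of a (state1, state2) pair rather than a conditional probability, so the rows do not sum to 1. The selection below also lists 'du' twice and omits 'uu'.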

In [18]:
df=pd.DataFrame(probs).loc[['dd','du','ud','du'], ['dd','du','ud','du'] ]
df

state1 \ state2,dd,du,ud,du
dd,0.055072,0.043478,0.052174,0.043478
du,0.055072,0.011594,0.04058,0.011594
ud,0.028986,0.063768,0.011594,0.063768
du,0.055072,0.011594,0.04058,0.011594
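A transition matrix in the usual sense has rows that sum to 1, which means dividing each count by its row total (the number of times state1 occurred) and including all four states. The following is a minimal sketch of that normalization on a made-up label sequence; `labels`, `order`, `joint`, and `transition` are assumptions for illustration, and the numbers are not the notebook's results.

```python
import pandas as pd

# Hypothetical label sequence, for illustration only.
labels = ["uu", "dd", "uu", "ud", "dd", "du", "uu", "dd", "dd", "uu"]
order = ["uu", "ud", "du", "dd"]   # each state listed exactly once

# Consecutive (state1, state2) pairs and their counts.
pairs = pd.DataFrame({"state1": labels[:-1], "state2": labels[1:]})
counts = (pd.crosstab(pairs["state1"], pairs["state2"])
            .reindex(index=order, columns=order, fill_value=0))

# Joint frequencies: every count divided by the grand total.
joint = counts / counts.to_numpy().sum()

# Transition probabilities: every count divided by its row total.
transition = counts.div(counts.sum(axis=1), axis=0)
print(transition)
```

If a state never appears as state1, its row total is zero and that row would need separate handling.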


# Is the process Markovian?

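Yes, we treat the process as Markovian: it is modeled as memoryless, so the next state is predicted from the present state alone, without needing the full history. The transition matrix above is built on exactly this assumption, since it conditions only on the current state.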
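One informal way to probe the memoryless assumption is to compare the next-state frequencies conditioned on the current state with those conditioned on the current and previous states. If the process is first-order Markov, the two should agree up to sampling noise. The sketch below runs on randomly generated labels (an assumption for illustration, not the notebook's data). It is a rough diagnostic only; a formal check would use a statistical test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical labels drawn independently at random, for illustration only.
labels = pd.Series(rng.choice(["uu", "ud", "du", "dd"], size=2000))

df = pd.DataFrame({"prev": labels.shift(2),
                   "cur": labels.shift(1),
                   "next": labels}).dropna()

# First-order estimate: P(next | cur).
p1 = pd.crosstab(df["cur"], df["next"], normalize="index")
# Second-order estimate: P(next | prev, cur).
p2 = pd.crosstab([df["prev"], df["cur"]], df["next"], normalize="index")

# Largest gap between the two estimates over all (prev, cur) histories.
gap = max((p2.loc[(prev, cur)] - p1.loc[cur]).abs().max()
          for prev, cur in p2.index)
print(gap)
```

A large gap relative to the sample size would suggest that the previous state carries extra information.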