# Standard deviation for trading with Python

## Definition of Standard deviation

In statistics,

    The standard deviation (σ) is a measure that is used to quantify the amount of variation or dispersion of data from its mean.

So, if the values in a dataset lie close together, the standard deviation would be small. On the contrary, if the values are spread out, the standard deviation would be larger.
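As a quick illustration of this idea, the sketch below compares two small datasets with the same mean, one tightly clustered and one spread out. The numbers are made up purely for illustration.

```python
import numpy as np

# Two hypothetical datasets with the same mean (50) but different spread
tight = np.array([48, 49, 50, 51, 52])
wide = np.array([10, 30, 50, 70, 90])

# Sample standard deviation (ddof=1 divides by n - 1)
sd_tight = tight.std(ddof=1)
sd_wide = wide.std(ddof=1)

print(f'Mean of tight data: {tight.mean():.1f}, standard deviation: {sd_tight:.2f}')
print(f'Mean of wide data: {wide.mean():.1f}, standard deviation: {sd_wide:.2f}')
```

Both datasets are centred on 50, yet the second one has a much larger standard deviation because its values lie further from the mean.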


## Standard deviation in Finance and Trading

### Standard deviation as a measure of volatility

In trading and finance, it's important to quantify the volatility of an asset. An asset’s volatility, unlike its return or price, is an unobserved variable.

Standard deviation has a special significance in risk management and performance analysis, as it is often used as a proxy for the volatility of a security. For example, the returns of well-established blue-chip securities typically have a lower standard deviation than those of small-cap stocks.

On the other hand, assets like cryptocurrencies tend to have a higher standard deviation, as their returns vary widely from their mean.

Let us now compute and compare the annualized volatility of two Indian stocks, ITC and Reliance. We begin by fetching the end-of-day closing price data for the last 5 years using the *yfinance* library:

## Importing libraries

In [None]:
!pip install yfinance
import yfinance as yf
import warnings
warnings.filterwarnings('ignore')


## Downloading the data

In [None]:
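# Downloading the data for ITC and RELIANCE stocks using the yfinance library
# auto_adjust=False keeps the 'Adj Close' column, which newer yfinance versions drop by default
itc_df = yf.download('ITC.NS', period='5y', auto_adjust=False)[['Adj Close']]
reliance_df = yf.download('RELIANCE.NS', period='5y', auto_adjust=False)[['Adj Close']]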


In [None]:
# Taking a look at the fetched data
itc_df.tail()

In [None]:
reliance_df.tail()

Below, we calculate the daily returns using the pct_change() method and the standard deviation of those returns using the std() method to get the daily volatilities of the two stocks:

In [None]:
# Compute the returns of the two stocks:
itc_df['Returns'] = itc_df['Adj Close'].pct_change()
reliance_df['Returns'] = reliance_df['Adj Close'].pct_change()
print(reliance_df[['Adj Close','Returns']])
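To make the return calculation concrete, here is a minimal sketch on three made-up prices. It shows that pct_change() gives the same result as dividing each price by the previous one and subtracting 1, and that the first row has no return because there is no previous price.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for three consecutive days
prices = pd.Series([100.0, 102.0, 99.96])

# pct_change() computes (P_t / P_{t-1}) - 1 for each row
returns = prices.pct_change()

# The same calculation done by hand with shift()
manual_returns = prices / prices.shift(1) - 1

print(returns)
print(manual_returns)
```

The missing first value is why a dropna() call is useful before passing returns to functions that do not skip NaN values.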


In [None]:
# Compute the standard deviation of the returns using the pandas std() method:
daily_sd_itc = itc_df['Returns'].std()
daily_sd_rel = reliance_df['Returns'].std()

In [None]:
reliance_df.dropna(inplace=True)
reliance_df.head()

In general, the volatility of assets is quoted in annual terms. So below, we convert the daily volatilities to annual volatilities by multiplying them by the square root of 252 (the approximate number of trading days in a year):

In [None]:
import numpy as np
# Annualized standard deviation
annualized_sd_itc = daily_sd_itc * np.sqrt(252)
annualized_sd_rel = daily_sd_rel * np.sqrt(252)
print(f'The annualized standard deviation of the ITC stock daily returns is: {annualized_sd_itc*100:.2f}%')
print(f'The annualized standard deviation of the Reliance stock daily returns is: {annualized_sd_rel*100:.2f}%')
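The square-root scaling rests on a simplifying assumption, namely that daily returns are independent and identically distributed, so that variances add up across days. Real returns satisfy this only approximately. Under that assumption:

```latex
\sigma_{\text{annual}}^{2} = 252 \, \sigma_{\text{daily}}^{2}
\quad \Longrightarrow \quad
\sigma_{\text{annual}} = \sigma_{\text{daily}} \sqrt{252}
```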

## Standard deviation with Bessel's correction

Now we will compute the standard deviation with Bessel's correction explicitly. To do this, we pass the ddof parameter to the std() method. Here, ddof means 'Delta Degrees of Freedom'.

The NumPy std function uses ddof=0 by default, which gives the standard deviation of the population. The pandas std() method, in contrast, uses ddof=1 by default, so the values computed earlier already use (n−1) as the divisor. For calculating the standard deviation of a sample, we give ddof=1, so that in the formula, (n−1) is used as the divisor. Below, we set it explicitly:

In [None]:
# Compute the standard deviation with Bessel's correction
daily_sd_itc_b = itc_df['Returns'].std(ddof=1)
daily_sd_rel_b = reliance_df['Returns'].std(ddof=1)

# Annualized standard deviation with Bessel's correction
annualized_sd_itc_b = daily_sd_itc_b* np.sqrt(252)
annualized_sd_rel_b = daily_sd_rel_b* np.sqrt(252)

print(f'The annualized standard deviation of the ITC stock daily returns with Bessel\'s correction is: {annualized_sd_itc_b*100:.2f}%')
print(f'The annualized standard deviation of the Reliance stock daily returns with Bessel\'s correction is: {annualized_sd_rel_b*100:.2f}%')


These values match the ones obtained earlier, because the pandas std() method applies Bessel's correction (ddof=1) by default. Even compared with the population formula (ddof=0), the difference would be negligible here, as the sample size is very large. In addition, based on the given data, we can say that the Reliance stock is more volatile than the ITC stock.
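The effect of the divisor, and the differing defaults of NumPy and pandas, can be checked on a small sample. The following is a minimal sketch using made-up returns; with only five observations the correction is clearly visible, while for a few years of daily data the factor sqrt(n / (n − 1)) is very close to 1.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample of daily returns
sample = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01])
n = len(sample)

# Population formula: divide the sum of squared deviations by n
sd_population = np.sqrt(((sample - sample.mean()) ** 2).sum() / n)
# Sample formula with Bessel's correction: divide by n - 1
sd_sample = np.sqrt(((sample - sample.mean()) ** 2).sum() / (n - 1))

# NumPy defaults to ddof=0, pandas defaults to ddof=1
numpy_default = np.std(sample.to_numpy())
pandas_default = sample.std()

print(f'Population SD: {sd_population:.6f}, NumPy default: {numpy_default:.6f}')
print(f'Sample SD:     {sd_sample:.6f}, pandas default: {pandas_default:.6f}')
# The two estimates differ by a factor of sqrt(n / (n - 1))
print(f'Ratio: {sd_sample / sd_population:.6f}')
```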

## The z-score in Standard Deviation

Z-score is a metric that tells us how many standard deviations away a particular data point is from the mean. It can be negative or positive. A positive z-score, like 1, indicates that the data point lies one standard deviation above the mean and a negative z-score, like -2, implies that the data point lies two standard deviations below the mean.

In financial terms, when calculating the z-score on the returns of an asset, a larger absolute value of the z-score (whether positive or negative) means that the return of the security differs more from its mean value. So, the z-score tells us how well the data point conforms to the norm.

Usually, if the absolute value of the z-score of a data point is very high (for example, more than 3), it indicates that the data point is quite different from the other data points.
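In symbols, for a data point x taken from data with mean μ and standard deviation σ, the z-score described above is:

```latex
z = \frac{x - \mu}{\sigma}
```

In practice, μ and σ are replaced by their sample estimates.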

Below, we calculate and plot the z-scores for the ITC stock returns in Python by subtracting the mean return from each return and dividing by the standard deviation:

In [None]:
itc_df['z-score'] = (itc_df['Returns'] - itc_df['Returns'].mean())/itc_df['Returns'].std(ddof=1)

In [None]:
import matplotlib.pyplot as plt

itc_df['z-score'].plot(figsize=(20,10));
plt.axhline(-3, color='cyan')
plt.title('Z-scores for ITC stock returns')
plt.show();

From the above figure, we observe that around March of 2020, the ITC stock returns had a z-score reaching below -3 several times, indicating that the returns were more than 3 standard deviations below the mean for the given data sample. This was during the sell-off triggered by the COVID-19 pandemic.

In addition, a standardized measure like the z-score is used widely to generate signals for mean-reverting trading strategies such as pair trading.

Also, one can use the zscore function from the scipy.stats module to calculate the z-scores as follows:

In [None]:
# Computing z-scores in Python using the scipy.stats module
# Note: stats.zscore uses ddof=0 by default; pass ddof=1 to use the sample standard deviation
import scipy.stats as stats
reliance_df['Returns_zscore'] = stats.zscore(reliance_df['Returns'])
reliance_df.tail()

## Value at Risk with standard deviation
Value at Risk (VaR) is an important financial risk management metric that quantifies the maximum loss that can be realized in a given time with a given level of confidence/probability for a given strategy, portfolio or trading desk.

In this method, we assume that the returns are normally distributed over the lookback period.

We find the z-score cut-off corresponding to the confidence level we want and multiply it by the standard deviation of the strategy's returns to get the VaR. To get the VaR in currency terms, we multiply it by the amount invested in the strategy.

For example, if we want the 95% confidence VaR, we are essentially finding the cut-off point for the worst 5% of the returns distribution. If we assume that the stock returns are normally distributed, then their z-scores follow a standard normal distribution, and the cut-off point for the worst 5% of returns is approximately -1.645.

Thus, taking the mean return to be zero, the 1-year 95% VaR of a simple strategy of investing in the ITC stock is given by:

VaR = (−1.645) ∗ (s) ∗ investment

where s is the annualized standard deviation of the ITC stock returns.

In [None]:
# 1-year 95% VaR calculation for the ITC stock:
from scipy.stats import norm

initial_investment = 100000
annual_standard_deviation = annualized_sd_itc
confidence_level = .95

# Using norm.ppf (percent point function), find the z-score below which the worst 5% of returns fall
z_score_cut_off = norm.ppf(1-confidence_level, 0, 1)

In [None]:
VaR = z_score_cut_off * annual_standard_deviation * initial_investment
VaR

Thus, we can say that the maximum loss that can be realized in 1 year with 95% confidence is INR 45045. Of course, this was calculated under the assumption that ITC stock returns follow a normal distribution.
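The same calculation can be wrapped in a small function to compare confidence levels. This is only a sketch: parametric_var is a hypothetical helper name, the 25% volatility is an illustrative input rather than a figure from the data, and the normal quantile comes from the standard library's statistics.NormalDist instead of scipy. It also assumes a zero mean return and reports the loss as a positive number.

```python
from statistics import NormalDist

def parametric_var(investment, annual_sd, confidence_level=0.95):
    """Hypothetical helper: parametric VaR assuming zero-mean, normally distributed returns."""
    # z-score below which the worst (1 - confidence_level) share of returns falls
    z_cut_off = NormalDist().inv_cdf(1 - confidence_level)
    # Report the loss as a positive amount
    return -z_cut_off * annual_sd * investment

# Illustrative inputs only: 25% annualized volatility, 100,000 invested
var_95 = parametric_var(100_000, 0.25, 0.95)
var_99 = parametric_var(100_000, 0.25, 0.99)
print(f'95% VaR: {var_95:.2f}')
print(f'99% VaR: {var_99:.2f}')
```

A higher confidence level pushes the cut-off further into the tail, so the 99% VaR is larger than the 95% VaR for the same volatility.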

## Confidence intervals with standard deviation

Another common use case for standard deviation is in computing the confidence intervals.

In general, when we work with data, we assume that the population from which the data has been generated follows a certain distribution and the population parameters for that distribution are not known. These population parameters have to be estimated using the sample.

For example, the mean daily return of the ITC stock is a population parameter, which we try to estimate using the sample mean. This gives us a point estimate. However, financial market forecasts are probabilistic, and hence, it would make more sense to work with an interval estimate rather than a point estimate.

A confidence interval gives a probable estimated range within which the value of the population parameter may lie. Assuming the data to be normally distributed, we can use the empirical rule to describe the percentage of data that falls within 1, 2, and 3 standard deviations from the mean.

   1. About 68% of the values lie within one standard deviation from the mean.
   2. About 95% of the values lie within two standard deviations from the mean.
   3. About 99.7% of the values lie within three standard deviations from the mean.

In [None]:
# Compute the sample mean of the ITC returns
daily_mean_itc = itc_df['Returns'].mean()

# Compute the approximate 95% interval for the ITC daily returns (mean ± 2 standard deviations)
ci_95_itc_upper = daily_mean_itc + 2* daily_sd_itc_b
ci_95_itc_lower = daily_mean_itc - 2* daily_sd_itc_b
print(f'The 95% confidence interval of the ITC stock daily returns is: [{ci_95_itc_lower:.2f},{ci_95_itc_upper:.2f}]')
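Note that the mean ± 2 standard deviations range describes where about 95% of individual daily returns are expected to fall under normality. An interval for the mean daily return itself is typically built from the standard error, that is, the standard deviation divided by the square root of the sample size. The sketch below contrasts the two on synthetic, normally distributed returns. The data, seed and parameters are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic daily returns (assumed normal, for illustration only)
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.015, size=1250)

n = len(returns)
mean = returns.mean()
sd = returns.std(ddof=1)
# Standard error of the sample mean
se = sd / np.sqrt(n)

# Range expected to contain about 95% of individual daily returns
range_lower, range_upper = mean - 2 * sd, mean + 2 * sd
# Approximate 95% confidence interval for the mean daily return
ci_lower, ci_upper = mean - 1.96 * se, mean + 1.96 * se

print(f'Approx. 95% range of daily returns: [{range_lower:.4f}, {range_upper:.4f}]')
print(f'Approx. 95% CI for the mean return: [{ci_lower:.4f}, {ci_upper:.4f}]')
```

The interval for the mean is much narrower than the range of individual returns, because the standard error shrinks as the sample size grows.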

## Conclusion

In this notebook, we have seen how standard deviation captures the dispersion in a given dataset. Because it is expressed in the same units as the data, it is more intuitive to interpret than variance, which helps explain why it remains the most popular measure of dispersion in the world of quantitative finance.

Moreover, we discussed some use cases of standard deviation in quant finance and trading such as estimating the unobservable volatility of asset returns, the z-score, computation of VaR for risk management and determining confidence intervals for unknown population parameters.