# Calculating Financial Statistics
***

## Introduction
The _rate of return_ is a measure of the amount of money gained or lost in an investment. A positive return signifies a profit and a negative return indicates a loss.

The _risk_ of an investment is defined as the likelihood of suffering a financial loss.

In [11]:
# The rate of return we will be using is 7.5%
rate_of_return = 0.075

# Function that displays a value as a percentage
def display_as_percentage(val):
    return "{:.1f}%".format(val * 100)

## Simple Rate of Return
The most basic rate of return is the _simple rate of return_. It is defined as the change in the price of an investment over a time period, plus any dividend paid during that period, divided by the starting price.
$$
R = \frac{E - S + D}{S}
$$

$R$ : Simple rate of return

$S$ : Starting price of an investment

$E$ : Ending price of an investment

$D$ : Dividend

In [12]:
# Defining simple rate of return
def calculate_simple_return(start_price, end_price, dividend=0):
    r = (end_price - start_price + dividend) / start_price
    return r

# Testing our function
simple_return = calculate_simple_return(200, 250, 20)
print('The simple rate of return is', display_as_percentage(simple_return))

The simple rate of return is 35.0%


## Logarithmic Rate of Return
Another type of return is the _logarithmic rate of return_, also known as the continuously compounded return. This is the expected return for an investment where the earnings are assumed to be continually reinvested over the time period. It is calculated by taking the difference between the natural log of the ending price and the natural log of the starting price.

$$
r = \ln(E) - \ln(S) = \ln\left(\frac{E}{S}\right)
$$

$r$ : logarithmic rate of return

$S$ : Starting price of an investment

$E$ : Ending price of an investment

In [13]:
# Defining our log return function
from math import log

def calculate_log_return(start_price, end_price):
    return log(end_price / start_price)

# Testing our function
log_return = calculate_log_return(200, 250)
print('The log rate of return is', display_as_percentage(log_return))

The log rate of return is 22.3%
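
The two kinds of return are directly related: when no dividend is paid, $r = \ln(1 + R)$ and $R = e^{r} - 1$. The sketch below checks this for the same prices used above. Note that the earlier simple return of 35.0% included a dividend of 20; without it, the simple return for these prices is 25.0%. The variable names here are illustrative only.

```python
from math import expm1, isclose, log, log1p

# Same prices as the examples above, with no dividend (assumption)
start_price, end_price = 200, 250

simple_return = (end_price - start_price) / start_price
log_return = log(end_price / start_price)

# Converting between the two: r = ln(1 + R) and R = e^r - 1
print('Log return from simple return: {:.1f}%'.format(log1p(simple_return) * 100))
print('Simple return from log return: {:.1f}%'.format(expm1(log_return) * 100))
print('Conversions agree:', isclose(log1p(simple_return), log_return))
```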


## Aggregate Across Time I
When describing the rate of return of an investment, it is important to keep the time frame of the investment in mind.
It is common to convert returns to a standard time period. Often, this means converting to the annual rate of return in a process called _annualizing_.

To convert a log rate of return from one time period to another, we multiply the rate of return by the number of original time periods in the new time period. For example, a daily log return is multiplied by 252, the number of trading days in a year, to annualize it.

$$
r = r_0 * t
$$

$r$ : converted log rate of return

$r_0$ : Original rate of return

$t$: The number of original time periods in the new time period

In [14]:
daily_return_a = 0.001
monthly_return_b = 0.022

print('The daily rate of return for Investment A is', display_as_percentage(daily_return_a))
print('The monthly rate of return for Investment B is', display_as_percentage(monthly_return_b))

# Defining our function
def annualize_return(log_return, t):
    return log_return * t

# Annualising the return for investment A
annual_return_a = annualize_return(daily_return_a, 252)
print('\n')
print('The annual rate of return for Investment A is', display_as_percentage(annual_return_a))

# Annualising the return for investment B
annual_return_b = annualize_return(monthly_return_b, 12)
print('The annual rate of return for Investment B is', display_as_percentage(annual_return_b))

The daily rate of return for Investment A is 0.1%
The monthly rate of return for Investment B is 2.2%


The annual rate of return for Investment A is 25.2%
The annual rate of return for Investment B is 26.4%
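
The conversion is a plain multiplication because log returns over consecutive periods add up to the log return over the whole span. If every period earned the same log return $r_0$, then $t$ periods would earn $r_0 * t$. Below is a minimal sketch of that additivity, using a made-up price series (the prices are hypothetical, not market data).

```python
from math import isclose, log

# Hypothetical closing prices on four consecutive days
prices = [100.0, 102.0, 99.0, 105.0]

# Log return for each day-to-day step
daily_log_returns = [log(later / earlier) for earlier, later in zip(prices, prices[1:])]

# Log return over the whole span, computed from the first and last price only
total_log_return = log(prices[-1] / prices[0])

print('Sum of daily log returns:', sum(daily_log_returns))
print('Log return over the whole span:', total_log_return)
print('They agree:', isclose(sum(daily_log_returns), total_log_return))
```

Simple returns do not add up this way, which is why the conversion formula is stated for log returns.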


## Aggregate Across Time II

Now, let’s look at an extension of the previous conversion formula. Suppose we know the log rate of return for 5 days of a given year. Which daily log return would we use to calculate the annual return?

In this case, we can first take the average of the 5 daily log returns, then multiply by 252, the number of trading days in a year. The general formula is:

$$
r = \frac{r_{01} + r_{02} + ... + r_{0n}}{n} * t
$$

$r$ : converted log rate of return

$r_{0n}$ : the nth log return from the original time period

$n$ : the number of returns from the original time period

$t$ : the number of original time periods in the new time period

In [15]:
# Daily Returns list
daily_returns = [0.002, -0.002, 0.003, 0.002, -0.001]

# Defining the convert returns function
def convert_returns(log_returns, t):
    return (sum(log_returns) / len(log_returns)) * t

# Calculating the annual returns
annual_return = convert_returns(daily_returns, 252)
print('The annual rate of return is', display_as_percentage(annual_return))

# Calculating the weekly returns 
weekly_return = convert_returns(daily_returns, 5)
print('The weekly rate of return is', display_as_percentage(weekly_return))

The annual rate of return is 20.2%
The weekly rate of return is 0.4%


## Aggregate Across Assets
Using the simple rate of return makes it easy to aggregate across multiple assets. The portfolio return would simply be the weighted average of each individual asset’s simple rate of return.

$$
R = (W_1 * R_1) + (W_2 * R_2) + ... + (W_n * R_n)
$$

$R$ : Portfolio simple rate of return

$W_i$ : weight of the ith investment in the portfolio

$R_i$ : simple rate of return of the ith investment in the portfolio

The weight of each asset is obtained by:

$$
W_i = \frac{S_i}{S_1 + S_2 + ... + S_n}
$$

$W_i$ : weight of the ith investment in the portfolio

$S_i$ : starting price of the ith investment in the portfolio
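
The two formulas above can be sketched in a few lines. The function name `calculate_portfolio_return` and the three-asset portfolio are assumptions made for illustration; the starting prices and returns are invented values.

```python
def calculate_portfolio_return(start_prices, simple_returns):
    # Weight of each asset: its starting price divided by the total starting price
    total = sum(start_prices)
    weights = [s / total for s in start_prices]
    # Portfolio return: weighted average of the individual simple returns
    return sum(w * r for w, r in zip(weights, simple_returns))

# Hypothetical portfolio of three assets
start_prices = [200, 300, 500]
simple_returns = [0.10, 0.05, -0.02]

portfolio_return = calculate_portfolio_return(start_prices, simple_returns)
print('The portfolio rate of return is {:.1f}%'.format(portfolio_return * 100))
```

Here the weights work out to 0.2, 0.3 and 0.5, so the portfolio return is 0.2 * 0.10 + 0.3 * 0.05 + 0.5 * (-0.02) = 0.025.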



## Variance
One of the key statistics for understanding risk is _variance_. _Variance_ is a measure of the spread of a dataset, or how far apart each value is from the mean. The greater the variance, the more spread out or variable the data is.

The formula for calculating variance is:
$$
\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}
$$

$\sigma^2$ : variance

$X_i$ : the ith value in the dataset

$\bar{X}$ : the mean of the dataset

$n$ : the number of values in the dataset


In [16]:
# Calculating Variance
import numpy as np

returns_disney = [0.22, 0.12, 0.01, 0.05, 0.04]
returns_cbs = [-0.13, -0.15, 0.31, -0.06, -0.29]

# Defining our function
def calculate_variance(dataset):
    mean = sum(dataset) / len(dataset)
    numerator = sum([(i - mean)**2 for i in dataset])
    variance = numerator / len(dataset)
    return variance

variance_disney = calculate_variance(returns_disney)
variance_cbs = calculate_variance(returns_cbs)

print('Our function variance for Disney:', variance_disney)
print('Numpy variance for Disney:', np.var(returns_disney))
print('\n')
print('Our function variance for CBS:', variance_cbs)
print('Numpy variance for CBS:', np.var(returns_cbs))


Our function variance for Disney: 0.0056560000000000004
Numpy variance for Disney: 0.0056560000000000004


Our function variance for CBS: 0.04054399999999999
Numpy variance for CBS: 0.04054399999999999


## Standard Deviation
Although the variance is useful in determining the relative risk of an investment, it is not the easiest statistic to interpret because it is expressed in squared units of the original data. For this reason, it is common to use the standard deviation to describe the spread of a dataset.

Standard deviation is simply the square root of the variance. It has the same unit as the original dataset.

$$
\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}
$$

$\sigma$ : standard deviation

$X_i$ : the ith value in the dataset

$\bar{X}$ : the mean of the dataset

$n$ : the number of values in the dataset


In [17]:
# Defining our standard deviation function
from math import sqrt

def calculate_stddev(dataset):
    variance = calculate_variance(dataset)
    stddev = sqrt(variance)
    return stddev

stddev_disney = calculate_stddev(returns_disney)
stddev_cbs = calculate_stddev(returns_cbs)

print('The standard deviation of Disney stock returns is', display_as_percentage(stddev_disney))
print('The Numpy standard deviation of Disney stock returns is', display_as_percentage(np.std(returns_disney)))
print('\n')
print('The standard deviation of CBS stock returns is', display_as_percentage(stddev_cbs))
print('The Numpy standard deviation of CBS stock returns is', display_as_percentage(np.std(returns_cbs)))

The standard deviation of Disney stock returns is 7.5%
The Numpy standard deviation of Disney stock returns is 7.5%


The standard deviation of CBS stock returns is 20.1%
The Numpy standard deviation of CBS stock returns is 20.1%


## Correlation I
Another important statistic for assessing risk is the correlation between the returns of two assets. Correlation is a measure of how closely two datasets are associated with each other. It is often represented by the correlation coefficient, which is a value that ranges between -1 and 1. This indicates whether there is a positive correlation, negative correlation, or no correlation:

* **Positive Correlation** - when the rate of return of one asset deviates upward from its mean, the other usually deviates upward as well.
* **Negative Correlation** - when the rate of return of one asset deviates upward from its mean, the other usually deviates downward.
* **No Correlation** - when a change in one asset’s rate of return does not dictate a change in another. The correlation coefficient will be close to 0.

When building a portfolio, it is generally a good idea to include assets that are not correlated with each other. If the assets move independently of one another, a loss in one is less likely to coincide with losses in the others, which lowers the risk of a large loss across the whole portfolio.

In [18]:
# Calculating the Correlation
returns_general_motors = [0.018, -0.005, -0.047, -0.009, -0.012, 0.003, -0.027, -0.014, 0.029, -0.062, 0.009]
returns_ford = [0.002, -0.004, -0.027, -0.022, -0.001, 0.002, -0.006, -0.017, 0.035, -0.029, 0.002]
returns_exxon_mobil = [0.008, 0.015, 0.009, 0.012, 0.003, -0.007, 0.006, 0.005, -0.048, 0.025, -0.012]
returns_apple = [-0.002, 0.007, -0.004, -0.004, 0.002, 0.013, -0.011, 0.017, -0.001, 0.012, 0.006]

corrcoef_matrix = np.corrcoef([returns_general_motors, returns_ford, returns_exxon_mobil, returns_apple])
print(corrcoef_matrix)
print('\n')
print('The correlation coefficient between General Motors and Ford is', corrcoef_matrix[0][1])
print('The correlation coefficient between General Motors and ExxonMobil is:', corrcoef_matrix[0][2])
print('The correlation coefficient between General Motors and Apple is', corrcoef_matrix[0][3])

[[ 1.          0.84145997 -0.70322462 -0.0518139 ]
 [ 0.84145997  1.         -0.87407739 -0.1286648 ]
 [-0.70322462 -0.87407739  1.          0.09955855]
 [-0.0518139  -0.1286648   0.09955855  1.        ]]


The correlation coefficient between General Motors and Ford is 0.841459974316774
The correlation coefficient between General Motors and ExxonMobil is: -0.7032246241393197
The correlation coefficient between General Motors and Apple is -0.05181389942186942


## Correlation II

Below is the formula for the Pearson correlation coefficient:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}
{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2 \sum_{i=1}^{n} (y_i - \overline{y})^2}}
$$

$r_{xy}$ : correlation coefficient

$x_i$ : the ith value in dataset X

$y_i$ : the ith value in dataset Y

$\overline{x}$, $\overline{y}$ : the means of datasets X and Y

$n$ : the number of values in each dataset

In [19]:
# Defining the correlation function
def calculate_correlation(set_x, set_y):
    # Summing all values in dataset
    sum_x = sum(set_x)
    sum_y = sum(set_y)

    # Summing squared values in each dataset
    sum_x2 = sum([x**2 for x in set_x])
    sum_y2 = sum([y**2 for y in set_y])

    # Sum of the product of each respective element in each dataset
    sum_xy = sum([i * j for i,j in zip(set_x, set_y)])

    # Length of dataset
    n = len(set_x)

    # Calculating correlation coefficient
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

    return numerator / denominator

# Function calls
print('The correlation coefficient between General Motors and Ford is', calculate_correlation(returns_general_motors, returns_ford))
print('The correlation coefficient between General Motors and ExxonMobil is', calculate_correlation(returns_general_motors, returns_exxon_mobil))
print('The correlation coefficient between General Motors and Apple is', calculate_correlation(returns_general_motors, returns_apple))

The correlation coefficient between General Motors and Ford is 0.8414599743167742
The correlation coefficient between General Motors and ExxonMobil is -0.7032246241393197
The correlation coefficient between General Motors and Apple is -0.05181389942186936
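
The function above uses the sum-based computational form of the coefficient. As a cross-check, the sketch below computes the same quantity from deviations about the mean: the sum of the products of the deviations, divided by the square root of the product of the two sums of squared deviations. The function name `calculate_correlation_from_deviations` is hypothetical.

```python
from math import sqrt

def calculate_correlation_from_deviations(set_x, set_y):
    n = len(set_x)
    mean_x = sum(set_x) / n
    mean_y = sum(set_y) / n

    # Deviation of each value from its dataset's mean
    dev_x = [x - mean_x for x in set_x]
    dev_y = [y - mean_y for y in set_y]

    # Sum of the products of paired deviations
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Square root of the product of the two sums of squared deviations
    denominator = sqrt(sum(dx ** 2 for dx in dev_x) * sum(dy ** 2 for dy in dev_y))

    return numerator / denominator

returns_general_motors = [0.018, -0.005, -0.047, -0.009, -0.012, 0.003, -0.027, -0.014, 0.029, -0.062, 0.009]
returns_exxon_mobil = [0.008, 0.015, 0.009, 0.012, 0.003, -0.007, 0.006, 0.005, -0.048, 0.025, -0.012]

corr_gm_xom = calculate_correlation_from_deviations(returns_general_motors, returns_exxon_mobil)
print('The correlation coefficient between General Motors and ExxonMobil is', corr_gm_xom)
```

Up to floating-point rounding, this should agree with the value printed above for the same pair of assets.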
