# Simple Returns and Log Returns

# 0. Libraries and Settings

In [1]:
import yfinance as yf
import seaborn as sns
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
sns.set(style="darkgrid")

# 1. Log returns

In [3]:
# build a small toy DataFrame of yearly prices

df = pd.DataFrame(index = [2016, 2017, 2018], data = [100, 50, 95], columns = ['Price'])
df

Unnamed: 0,Price
2016,100
2017,50
2018,95


## 1.1 Mean returns

In [4]:
simple_returns = df.pct_change().dropna()
simple_returns

Unnamed: 0,Price
2017,-0.5
2018,0.9


In [5]:
simple_returns.mean()
# The arithmetic mean of the simple returns is 0.2, but it misstates how the stock actually performed.
# From 2016 to 2018 the price fell from 100 to 95, a loss of 5%.
# Compounding the mean return for two years would give 100 * 1.2 * 1.2 = 144 in 2018, not the actual 95.
# The arithmetic mean of simple returns is therefore misleading as a measure of multi-period growth.

# Log returns avoid this problem.


Price    0.2
dtype: float64
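The gap between the arithmetic mean and the realized growth can be made explicit by comparing it with the geometric mean, the constant per-period return that reproduces the final price. The following is a minimal sketch on the same toy prices; the variable names (`prices`, `geometric_mean`, `implied_by_arithmetic`, `implied_by_geometric`) are introduced here for illustration and do not appear elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Same toy prices as above, held in a Series for convenience.
prices = pd.Series([100, 50, 95], index=[2016, 2017, 2018], name="Price")
simple = prices.pct_change().dropna()
n_periods = len(simple)

# Arithmetic mean of the simple returns.
arithmetic_mean = simple.mean()

# Geometric mean: the constant per-period return that reproduces the final price.
geometric_mean = (prices.iloc[-1] / prices.iloc[0]) ** (1 / n_periods) - 1

# Final price implied by compounding each mean over all periods.
implied_by_arithmetic = prices.iloc[0] * (1 + arithmetic_mean) ** n_periods
implied_by_geometric = prices.iloc[0] * (1 + geometric_mean) ** n_periods

print(f"arithmetic mean: {arithmetic_mean:.4f} -> implied final price {implied_by_arithmetic:.2f}")
print(f"geometric mean:  {geometric_mean:.4f} -> implied final price {implied_by_geometric:.2f}")
```

Only the geometric mean compounds back to the observed final price of 95, which is the property the mean log return shares.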

## 1.2 Log returns

The expression `np.log(df / df.shift(1)).dropna()` calculates the log returns of stock prices using pandas DataFrame operations and NumPy. Breaking it into its components shows the math behind it:

1. **`df.shift(1)`**: Shifts the DataFrame `df` down by one row, so each row holds the price from the previous time period (the previous day, year, and so on, depending on the data's frequency).

2. **`df / df.shift(1)`**: Calculates the ratio of the current period's price to the previous period's price for each stock. Denoting the current price by $P_t$ and the previous price by $P_{t-1}$, the ratio is $\frac{P_t}{P_{t-1}}$.

3. **`np.log(...)`**: Applies the natural logarithm (log base $e$) to the ratio from step 2. The log return for a single period is $\ln\left(\frac{P_t}{P_{t-1}}\right)$, where $\ln$ denotes the natural logarithm. Log returns are additive over time and more symmetric than simple returns, which makes them suitable for analyzing returns over multiple periods and for statistical modeling.

4. **`.dropna()`**: Removes the rows with NaN values produced by the shift. The first row of `df / df.shift(1)` is always NaN because the first price has no preceding price to divide by.
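The steps above can be cross-checked against two equivalent formulations: differencing the log prices, and applying `np.log1p` to the simple returns from `pct_change()`. This is a small sketch on the same toy data; the names `via_ratio`, `via_diff`, and `via_pct` are hypothetical and used only for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[2016, 2017, 2018], data=[100, 50, 95], columns=["Price"])

# Formulation used in this notebook: log of the price ratio.
via_ratio = np.log(df / df.shift(1)).dropna()

# Equivalent: difference of log prices, since ln(a / b) = ln(a) - ln(b).
via_diff = np.log(df).diff().dropna()

# Equivalent: log(1 + simple return), since P_t / P_{t-1} = 1 + R_t.
via_pct = np.log1p(df.pct_change()).dropna()

print(np.allclose(via_ratio, via_diff))
print(np.allclose(via_ratio, via_pct))
```

All three give the same numbers up to floating-point error, so the choice between them is a matter of readability.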


### Mathematical Explanation

The expression computes the log return of a stock, also called the continuously compounded return. For small price changes it is close to the simple percentage change, and it compounds across periods by addition rather than multiplication. The log return for a single period from $t-1$ to $t$ is:

$$ r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) $$

Advantages of using log returns include:

- **Time Additivity**: Log returns over non-overlapping intervals can be added to find the total log return over the period.
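- **Approximate normality**: Log returns are often modeled as approximately normally distributed, which is convenient for statistical modeling and hypothesis testing in finance, although real return data typically shows fatter tails than a normal distribution.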
- **Symmetry**: Log returns treat proportional gains and losses more symmetrically than simple returns.

This transformation is a standard way to pre-process financial time series data for further analysis, such as calculating volatility, modeling price movements, or evaluating investment strategies.
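Time additivity and symmetry can both be checked numerically. The sketch below uses the notebook's toy prices for additivity and a made-up round trip (100 to 50 and back to 100) for symmetry; `round_trip` and the other variable names are assumptions introduced for this illustration.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, 50, 95], index=[2016, 2017, 2018], name="Price")
log_returns = np.log(prices / prices.shift(1)).dropna()

# Time additivity: per-period log returns sum to the log return of the whole span.
total_log_return = np.log(prices.iloc[-1] / prices.iloc[0])
print(np.isclose(log_returns.sum(), total_log_return))

# Symmetry: a price that halves and then doubles ends where it started.
round_trip = pd.Series([100.0, 50.0, 100.0])
round_trip_simple = round_trip.pct_change().dropna()
round_trip_log = np.log(round_trip / round_trip.shift(1)).dropna()

# Simple returns are -50% and +100%, which do not cancel when added.
print(round_trip_simple.tolist())
# Log returns have equal size and opposite sign, so they sum to zero.
print(round_trip_log.tolist())
```

The simple returns of the round trip average to +25% even though the price is unchanged, while the log returns average to zero.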


In [9]:
df.shift(1) # remember, this shifts the data down by 1 step

Unnamed: 0,Price
2016,
2017,100.0
2018,50.0


In [10]:
df / df.shift(1)

Unnamed: 0,Price
2016,
2017,0.5
2018,1.9


In [11]:
logreturns = np.log(df / df.shift(1)).dropna()
logreturns

Unnamed: 0,Price
2017,-0.693147
2018,0.641854


In [12]:
# now: mean of the log return

logreturns.mean()

Price   -0.025647
dtype: float64

The expression `100 * np.exp(2 * logreturns.mean())` compounds the mean log return over two periods (each period is one year) from a starting price of 100. The result is a price, not a percentage return.

So we start with 100 in 2016 and compute the implied value two years later, in 2018:

In [15]:
# now let's compute it numerically:

100 * np.exp(2 * logreturns.mean())

Price    95.0
dtype: float64
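Why this recovers the 2018 price exactly can be seen from a short derivation, sketched here with $P_0$ as the starting price and $n$ as the number of periods (both symbols are introduced for illustration). The log returns telescope when summed, so the mean log return is

$$ \bar{r} = \frac{1}{n}\sum_{t=1}^{n} r_t = \frac{1}{n}\ln\left(\frac{P_n}{P_0}\right) $$

and therefore

$$ P_0 \, e^{n\bar{r}} = P_n $$

With $P_0 = 100$, $n = 2$ and $\bar{r} \approx -0.025647$, this gives $100 \, e^{-0.051293} \approx 95$.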

In [31]:
# another example, this time with four yearly prices:

df2 = pd.DataFrame(index = [2016, 2017, 2018, 2019], data = [200, 50, 95, 40], columns = ['Price'])
print(df2)
logreturns = np.log(df2 / df2.shift(1)).dropna()

# compounding the mean log return over 3 periods recovers the observed 2019 price
print('\n')
print(200 * np.exp(3 * logreturns.mean()))

# The value for 2020 (the 4th period) is unknown. If we assume the price keeps changing
# at the same average log return, we can extrapolate it as:
# starting_value * exp(n_periods * logreturns.mean())
print('\n')
print(200 * np.exp(4 * logreturns.mean()))

      Price
2016    200
2017     50
2018     95
2019     40


Price    40.0
dtype: float64


Price    23.392142
dtype: float64
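The same calculation can be wrapped in a small function so the number of periods becomes a parameter. This is a sketch only: `project_price` is a hypothetical helper, not a pandas or NumPy function, and the extrapolation rests on the assumption that the historical mean log return continues to hold, which three observations cannot establish.

```python
import numpy as np
import pandas as pd

def project_price(prices: pd.Series, periods_ahead: int) -> float:
    """Compound the mean log return from the first price (hypothetical helper)."""
    log_returns = np.log(prices / prices.shift(1)).dropna()
    return float(prices.iloc[0] * np.exp(periods_ahead * log_returns.mean()))

prices = pd.Series([200, 50, 95, 40], index=[2016, 2017, 2018, 2019], name="Price")

# Three periods from 2016 lands on the observed 2019 price.
print(project_price(prices, 3))

# Four periods extrapolates to 2020 under the constant-mean assumption.
print(project_price(prices, 4))
```

For any horizon up to the last observation the function only restates known data; beyond it, the output is a naive projection rather than a forecast.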
