# NumPy Operations
NumPy is a Python package that provides much of the underlying functionality of pandas. In fact, we encounter NumPy every time we see a `NaN` value, since pandas represents missing floating-point data with NumPy's `np.nan`. Pandas also uses NumPy under the hood to optimise several of its internal computations.

Before we start, let's load `pandas`, `numpy` and our dataset. Notice that NumPy, like pandas, has a conventional short alias (`np`).

In [2]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv("TSLA_clean.csv")
df.Date = pd.to_datetime(df.Date)
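# Note: set_index returns a new DataFrame. The result is not assigned here,
# so df keeps its default integer index and Date stays a regular column.
df.set_index("Date")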
df

Unnamed: 0,Date,Close,High,Low,Open,Volume
0,2015-01-02,14.620667,14.883333,14.217333,14.858000,71466000
1,2015-01-05,14.006000,14.433333,13.810667,14.303333,80527500
2,2015-01-06,14.085333,14.280000,13.614000,14.004000,93928500
3,2015-01-07,14.063333,14.318667,13.985333,14.223333,44526000
4,2015-01-08,14.041333,14.253333,14.000667,14.187333,51637500
...,...,...,...,...,...,...
2511,2024-12-23,430.600006,434.510010,415.410004,28.586000,72698100
2512,2024-12-24,462.279999,462.779999,435.140015,435.899994,59551800
2513,2024-12-26,454.130005,465.329987,451.019989,465.160004,76366400
2514,2024-12-27,431.660004,450.000000,426.500000,449.519989,82666800


## Log Returns

Logarithmic returns are often used in finance due to their useful statistical properties. They are **additive over time**, making them ideal for analysing historical returns across multiple periods.

Consider the following example:

* You invest £100
* On the first day, the simple return is +10% → your investment grows to £110
* On the second day, the simple return is -10% → your investment drops to £99

Using simple returns, you might assume the net change is 0%, since +10% and -10% appear to cancel out. But in reality, you've lost £1 — a **-1%** return overall.

Logarithmic returns correctly account for compounding. Here's how:

* Day 1: $\ln(110 / 100) ≈ 0.09531$
* Day 2: $\ln(99 / 110) ≈ -0.10536$
* Total log return: $0.09531 + (-0.10536) = -0.01005$

To convert back to a simple return:
$e^{-0.01005} - 1 ≈ -0.01$ → **-1%**, matching the actual loss.
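
The arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch using only the three prices from the example (£100, £110, £99); the variable names are illustrative.

```python
import numpy as np

# Prices from the worked example: start, end of day 1, end of day 2
prices = np.array([100.0, 110.0, 99.0])

# Daily log returns: ln(price_t / price_{t-1})
log_returns = np.log(prices[1:] / prices[:-1])

# Log returns add up across days
total_log_return = log_returns.sum()

# Convert the total log return back to a simple return
total_simple_return = np.exp(total_log_return) - 1

print(log_returns.round(5))
print(round(total_log_return, 5))
print(round(total_simple_return, 5))
```

The last value should match the 1% loss computed by hand, up to floating-point rounding.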

The formula for calculating log returns is:

$\ln\left(\frac{{\text{price}_{\text{current}}}}{{\text{price}_{\text{previous}}}}\right)$

To compute daily log returns in code, use `.shift()` to get the previous day's price, then apply NumPy’s `log` function.


In [4]:
df["PrevClose"] = df.Close.shift()
df["LogReturns"] = np.log(df.Close / df.PrevClose)  # np.log is NumPy's natural logarithm
# Equivalently: np.log(df.Close) - np.log(df.PrevClose)

df["SimpleReturns"] = df.Close.pct_change()
# Log returns can also be derived from simple returns
np.log(1 + df.SimpleReturns)
# Because log returns are additive, cumulative returns use a cumulative sum
# rather than a cumulative product

0            NaN
1      -0.042950
2       0.005648
3      -0.001563
4      -0.001566
          ...   
2511    0.022404
2512    0.070991
2513   -0.017787
2514   -0.050745
2515   -0.033569
Name: SimpleReturns, Length: 2516, dtype: float64
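
The two routes to log returns used in the cell above (from the price ratio, and from the simple returns) should agree. The sketch below checks this on a small, made-up price series rather than the TSLA data. It uses `np.log1p`, which computes `log(1 + x)` and is a common alternative to writing `np.log(1 + x)` by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, made up for illustration only
close = pd.Series([100.0, 102.0, 99.0, 105.0])

# Route 1: log of the ratio of today's price to yesterday's
log_from_prices = np.log(close / close.shift())

# Route 2: log of (1 + simple return)
simple = close.pct_change()
log_from_simple = np.log1p(simple)

# The first row is NaN in both (there is no previous price), so skip it
same = np.allclose(log_from_prices.iloc[1:], log_from_simple.iloc[1:])
print(same)
```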

### Exercise: Cumulatively Comparing

The sum of the log returns over a period is the natural logarithm of the cumulative growth factor (one plus the cumulative simple return). To recover the cumulative simple return from the log returns, sum them over the period, exponentiate the sum with NumPy's `exp` function, and subtract 1.

Calculate the cumulative return from the simple returns, then compare it with the cumulative simple return calculated from the log returns.

In [5]:
end_simple = (1 + df.SimpleReturns).prod() - 1
print(f"Simple returns cumulative end-of-period {end_simple}")

end_log = np.exp(df.LogReturns.sum()) - 1
print(f"Log returns cumulative end-of-period {end_log}")

Simple returns cumulative end-of-period 27.549312462864847
Log returns cumulative end-of-period 27.5493124628648
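
The same idea extends from a single end-of-period figure to a running cumulative return for every day. The sketch below uses a short, hypothetical price series (not the TSLA data) and compares a cumulative product of simple returns with an exponentiated cumulative sum of log returns.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, made up for illustration only
close = pd.Series([100.0, 110.0, 99.0, 120.0])

simple = close.pct_change()
log_ret = np.log(close / close.shift())

# Running cumulative return from simple returns: compound with a product
cum_from_simple = (1 + simple).cumprod() - 1

# Running cumulative return from log returns: add, then exponentiate
cum_from_log = np.exp(log_ret.cumsum()) - 1

print(pd.DataFrame({"from_simple": cum_from_simple, "from_log": cum_from_log}))
```

Both columns should agree up to floating-point rounding, with the first row left as `NaN` because there is no previous price.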


## Other useful functions

Another useful NumPy function is `np.where()`, often used to populate a column with a signal or indicator depending on whether a condition is met. Let's create a column to colour-code our trading days. Each day gets a colour depending on whether the market closes higher (green) or lower (red) than it opened.

In [6]:
df["Colour"] = np.where(df["Close"] > df["Open"], "green", "red")
df


Unnamed: 0,Date,Close,High,Low,Open,Volume,PrevClose,LogReturns,SimpleReturns,Colour
0,2015-01-02,14.620667,14.883333,14.217333,14.858000,71466000,,,,red
1,2015-01-05,14.006000,14.433333,13.810667,14.303333,80527500,14.620667,-0.042950,-0.042041,red
2,2015-01-06,14.085333,14.280000,13.614000,14.004000,93928500,14.006000,0.005648,0.005664,green
3,2015-01-07,14.063333,14.318667,13.985333,14.223333,44526000,14.085333,-0.001563,-0.001562,red
4,2015-01-08,14.041333,14.253333,14.000667,14.187333,51637500,14.063333,-0.001566,-0.001564,red
...,...,...,...,...,...,...,...,...,...,...
2511,2024-12-23,430.600006,434.510010,415.410004,28.586000,72698100,421.059998,0.022404,0.022657,green
2512,2024-12-24,462.279999,462.779999,435.140015,435.899994,59551800,430.600006,0.070991,0.073572,green
2513,2024-12-26,454.130005,465.329987,451.019989,465.160004,76366400,462.279999,-0.017787,-0.017630,red
2514,2024-12-27,431.660004,450.000000,426.500000,449.519989,82666800,454.130005,-0.050745,-0.049479,red
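
One detail worth noting: with a single condition, any day where the close equals the open falls into the second branch and is labelled "red". The sketch below shows this on a small, hypothetical DataFrame, and then nests a second `np.where` to give flat days their own label. The "grey" label and the `Colour3` column name are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical open/close prices, including one day that closes flat
demo = pd.DataFrame({
    "Open":  [10.0, 12.0, 11.0],
    "Close": [11.0, 11.5, 11.0],
})

# Two-way split, as in the cell above: a flat day lands in the "red" branch
demo["Colour"] = np.where(demo["Close"] > demo["Open"], "green", "red")

# Nested np.where adds a third label for flat days
demo["Colour3"] = np.where(
    demo["Close"] > demo["Open"], "green",
    np.where(demo["Close"] < demo["Open"], "red", "grey"),
)

print(demo)
```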


## The VWAP

VWAP (Volume-Weighted Average Price) over a period can be an important metric for evaluating trading activity. It is often thought of as the true average price for a stock, and is calculated using the following formula:

$$ \text{VWAP} = \frac{\sum_{i=1}^{n} \text{typical price}_i \cdot \text{volume}_i}{\sum_{i=1}^{n} \text{volume}_i} $$

The *typical price* for each period is calculated as:

$$ \text{typical price}_i = \frac{\text{high}_i + \text{low}_i + \text{close}_i}{3} $$

We can calculate VWAP with the help of the `np.dot()` function. This performs a dot product, which will multiply corresponding elements in the price and volume columns and then find the sum. Let's calculate the VWAP for our stock over this period, which will give us an indication of the price at which the bulk of trading took place.


In [8]:
df["TP"] = (df.High + df.Low + df.Close) / 3

vwap = np.dot(df.TP, df.Volume) / np.sum(df.Volume)
vwap

np.float64(109.57359248499407)
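
To see what `np.dot` is doing here, the sketch below computes VWAP three ways on a tiny, made-up set of bars: with `np.dot`, with an explicit multiply-then-sum, and with `np.average` using volume as the weights. The numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical bars, made up for illustration only
bars = pd.DataFrame({
    "High":   [10.5, 11.0, 12.0],
    "Low":    [9.5, 10.0, 11.0],
    "Close":  [10.0, 10.8, 11.5],
    "Volume": [1000, 3000, 2000],
})

# Typical price per bar
tp = (bars.High + bars.Low + bars.Close) / 3

# 1) Dot product of typical price and volume, divided by total volume
vwap_dot = np.dot(tp, bars.Volume) / np.sum(bars.Volume)

# 2) The same calculation written out element by element
vwap_manual = (tp * bars.Volume).sum() / bars.Volume.sum()

# 3) A weighted average with volume as the weights
vwap_avg = np.average(tp, weights=bars.Volume)

print(vwap_dot, vwap_manual, vwap_avg)
print(tp.mean())  # unweighted mean of typical price, for comparison
```

The three VWAP values should agree, and they differ from the unweighted mean because the high-volume bar pulls the average towards its price.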

There is more to NumPy that we'll explore over the next few days, and even more that we won't get a chance to use.