# NumPy Operations
NumPy is a Python package that provides much of the functionality underlying pandas. In fact, we encounter NumPy every time we see a `NaN` value, and pandas uses NumPy under the hood to speed up many of its internal computations.

Before we start, let's load `pandas`, `numpy` and our dataset. Notice that NumPy also has a preferred short form, `np`.

In [3]:
import numpy as np
import pandas as pd

In [4]:
df = pd.read_csv("TSLA_10_clean.csv")
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df.set_index("Date", inplace=True)
df

Date,Close,High,Low,Open,Volume
2024-07-11,241.029999,271.000000,239.649994,263.299988,221707300
2020-08-13,108.066666,110.078667,104.484001,107.400002,306379500
2020-08-13,108.066666,110.078667,104.484001,107.400002,306379500
2019-10-30,21.000668,21.252666,20.664667,20.866667,144627000
2015-08-27,16.199333,16.199333,15.387333,15.400000,114840000
...,...,...,...,...,...
2018-08-10,23.699333,24.000000,23.066668,23.600000,173280000
2021-06-21,206.943329,210.463333,202.960007,208.160004,74438100
2016-06-20,14.646667,14.916667,14.548667,14.633333,53332500
2019-02-21,19.415333,20.216000,19.366667,20.120667,133638000


## Log Returns

Logarithmic returns are often used in finance due to their useful statistical properties. They are **additive over time**, making them ideal for analysing historical returns across multiple periods.

Consider the following example:

* You invest £100
* On the first day, the simple return is +10% → your investment grows to £110
* On the second day, the simple return is -10% → your investment drops to £99

Using simple returns, you might assume the net change is 0%, since +10% and -10% appear to cancel out. But in reality, you've lost £1 — a **-1%** return overall.

Logarithmic returns correctly account for compounding. Here's how:

* Day 1: $\ln(110 / 100) \approx 0.09531$
* Day 2: $\ln(99 / 110) \approx -0.10536$
* Total log return: $0.09531 + (-0.10536) = -0.01005$

To convert back to a simple return:
$e^{-0.01005} - 1 \approx -0.01$ → **-1%**, matching the actual loss.
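
The arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch; the variable names (`values`, `log_returns` and so on) are illustrative and not part of the dataset.

```python
import numpy as np

# Value of the investment at the start, after day 1, and after day 2
values = np.array([100.0, 110.0, 99.0])

# Simple returns: +10% then -10%
simple_returns = values[1:] / values[:-1] - 1

# Log returns: ln(110/100) and ln(99/110)
log_returns = np.log(values[1:] / values[:-1])

# Log returns add up; exponentiating the sum recovers the overall simple return
total_log_return = log_returns.sum()
overall_simple_return = np.exp(total_log_return) - 1

print(simple_returns.sum())     # naive sum of simple returns, essentially 0
print(total_log_return)         # about -0.01005
print(overall_simple_return)    # about -0.01, i.e. a 1% loss
```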

The formula for calculating log returns is:

$$ \ln\left(\frac{\text{price}_{\text{current}}}{\text{price}_{\text{previous}}}\right) $$

To compute daily log returns in code, use `.shift()` to get the previous day's price, then apply NumPy’s `log` function.


In [5]:
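# Previous row's close, then the log of the price ratio
df["PrevClose"] = df.Close.shift()
df["LogReturns"] = np.log(df.Close / df.PrevClose)

# Equivalent route: log(1 + simple return) gives the same values
df["SimpleReturns"] = df.Close.pct_change()
np.log(1 + df.SimpleReturns)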

Date
2024-07-11         NaN
2020-08-13   -0.802173
2020-08-13    0.000000
2019-10-30   -1.638194
2015-08-27   -0.259584
                ...   
2018-08-10   -2.026565
2021-06-21    2.166998
2016-06-20   -2.648232
2019-02-21    0.281850
2019-05-14   -0.226041
Name: SimpleReturns, Length: 2566, dtype: float64
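
To see both routes agree on something small enough to check by hand, here is a sketch using a made-up price series (the dates and prices are hypothetical, not taken from the TSLA data). Note that `.shift()` moves values down by one row, so it only returns the previous day's price when the rows are in chronological order; the sketch sorts the index first for that reason.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, deliberately out of date order
toy = pd.DataFrame(
    {"Close": [99.0, 100.0, 110.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
)

# shift() works on row position, so put the rows in chronological order first
toy = toy.sort_index()

# Route 1: log of the ratio of today's close to the previous close
toy["LogReturns"] = np.log(toy.Close / toy.Close.shift())

# Route 2: log of (1 + simple return)
toy["SimpleReturns"] = toy.Close.pct_change()
via_simple = np.log(1 + toy.SimpleReturns)

# The first row has no previous price, so both routes give NaN there
print(np.allclose(toy.LogReturns, via_simple, equal_nan=True))
```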

### Exercise: Cumulatively Comparing

The sum of the log returns is the natural logarithm of one plus the cumulative simple return (the cumulative growth factor). To calculate the cumulative simple return from the log returns, sum the log returns over the period, exponentiate the sum (NumPy has an `exp` function for this), and subtract 1.
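
The reason this works is that the sum of log returns telescopes. Writing $P_t$ for the closing price on day $t$ (notation introduced here for illustration), with $P_0$ the first price and $P_n$ the last:

$$ \sum_{t=1}^{n} \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln\left(\frac{P_1}{P_0} \cdot \frac{P_2}{P_1} \cdots \frac{P_n}{P_{n-1}}\right) = \ln\left(\frac{P_n}{P_0}\right) $$

Exponentiating the sum and subtracting 1 therefore gives $P_n / P_0 - 1$, which is the cumulative simple return over the period.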

Calculate the cumulative return based on the simple returns, and then compare this to the cumulative simple return calculated from the log return.

In [6]:
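## YOUR CODE GOES HERE
end_simple = (1 + df.SimpleReturns).prod() - 1
end_log = np.exp(df.LogReturns.sum()) - 1
print(f"From simple returns: {end_simple:.6f} | from log returns: {end_log:.6f}")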




## Other useful functions

Another useful NumPy function is `np.where()`, often used to populate a column with a signal or indicator depending on whether a condition is met. Let's create a column to colour-code our trading days: a day is green if the market closes higher than it opened, and red otherwise.

In [7]:
df["Colour"] = np.where(df.Close > df.Open, "green", "red")
df


Date,Close,High,Low,Open,Volume,PrevClose,LogReturns,SimpleReturns,Colour
2024-07-11,241.029999,271.000000,239.649994,263.299988,221707300,,,,red
2020-08-13,108.066666,110.078667,104.484001,107.400002,306379500,241.029999,-0.802173,-0.551646,green
2020-08-13,108.066666,110.078667,104.484001,107.400002,306379500,108.066666,0.000000,0.000000,green
2019-10-30,21.000668,21.252666,20.664667,20.866667,144627000,108.066666,-1.638194,-0.805669,green
2015-08-27,16.199333,16.199333,15.387333,15.400000,114840000,21.000668,-0.259584,-0.228628,green
...,...,...,...,...,...,...,...,...,...
2018-08-10,23.699333,24.000000,23.066668,23.600000,173280000,179.830002,-2.026565,-0.868213,green
2021-06-21,206.943329,210.463333,202.960007,208.160004,74438100,23.699333,2.166998,7.732032,red
2016-06-20,14.646667,14.916667,14.548667,14.633333,53332500,206.943329,-2.648232,-0.929224,green
2019-02-21,19.415333,20.216000,19.366667,20.120667,133638000,14.646667,0.281850,0.325580,red
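
As a small self-contained sketch of how `np.where()` picks between two values element by element, the example below uses made-up open and close prices (hypothetical values, not from the dataset). It also shows one possible way to handle more than two outcomes with `np.select()`; the "grey" label for unchanged days is an assumption added purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: one up day, one down day, one unchanged day
toy = pd.DataFrame({"Open": [10.0, 12.0, 11.0], "Close": [11.0, 11.5, 11.0]})

# np.where(condition, value_if_true, value_if_false), applied element by element
toy["Colour"] = np.where(toy.Close > toy.Open, "green", "red")

# With only two outcomes, an unchanged day falls into the "red" branch.
# np.select takes a list of conditions and a matching list of choices.
toy["Colour3"] = np.select(
    [toy.Close > toy.Open, toy.Close < toy.Open],
    ["green", "red"],
    default="grey",
)

print(toy)
```

Whether an unchanged day deserves its own label depends on how the indicator will be used; the two-way `np.where()` version is usually enough for a simple up/down signal.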


## The VWAP

VWAP (Volume-Weighted Average Price) over a period can be an important metric for evaluating trading activity. It is often thought of as the true average price for a stock, and is calculated using the following formula:

$$ \text{VWAP} = \frac{\sum_{i=1}^{n} \text{typical price}_i \cdot \text{volume}_i}{\sum_{i=1}^{n} \text{volume}_i} $$

The *typical price* for each period is calculated as:

$$ \text{typical price}_i = \frac{\text{high}_i + \text{low}_i + \text{close}_i}{3} $$

We can calculate VWAP with the help of the `np.dot()` function. This performs a dot product, which will multiply corresponding elements in the price and volume columns and then find the sum. Let's calculate the VWAP for our stock over this period, which will give us an indication of the price at which the bulk of trading took place.


In [14]:
df["TP"] = (df.High + df.Low + df.Close) / 3
vwap = np.dot(df.TP, df.Volume) / np.sum(df.Volume)
vwap

np.float64(109.4721116286912)
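
To check the dot-product approach against the formula on numbers small enough to verify by hand, here is a sketch with made-up prices and volumes (hypothetical values, not from the dataset). It also shows `np.average()` with its `weights` argument, which computes the same weighted mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data for three trading periods
toy = pd.DataFrame({
    "High":   [11.0, 21.0, 31.0],
    "Low":    [9.0, 19.0, 29.0],
    "Close":  [10.0, 20.0, 30.0],
    "Volume": [100, 200, 700],
})

# Typical price per period: (high + low + close) / 3 gives 10, 20, 30
toy["TP"] = (toy.High + toy.Low + toy.Close) / 3

# Dot product = sum of typical price * volume, divided by total volume
vwap_dot = np.dot(toy.TP, toy.Volume) / toy.Volume.sum()

# The same thing written out explicitly, and as a weighted average
vwap_manual = (toy.TP * toy.Volume).sum() / toy.Volume.sum()
vwap_avg = np.average(toy.TP, weights=toy.Volume)

# (10*100 + 20*200 + 30*700) / 1000 = 26 for all three
print(vwap_dot, vwap_manual, vwap_avg)
```

Because most of the volume sits in the third period, the result is pulled towards that period's price rather than the unweighted mean of 20.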

There is more to NumPy that we'll explore over the next few days, and even more that we won't get a chance to use.