Deadline: Dec 10 2023

# Introduction

Under mild theoretical assumptions (the absence of arbitrage opportunities), it can be shown that the time-$t$ price $P_t$ of any asset that pays $X_{t+1}$ at time $t+1$ has to satisfy the equation:
$$
    P_t = \mathbb{E}_t\left[M_{t + 1} X_{t + 1} \right]
$$
where $M_{t + 1}$ is a random variable called the Stochastic Discount Factor.

For the particular case of a stock, we can write its future payoff $X_{t + 1}$ as:

$$
    X_{t + 1} = D_{t + 1} + P_{t + 1}
$$

where $D_{t + 1}$ is the dividend paid next year and $P_{t + 1}$ is the stock price next year. That is, if you buy one share now and sell it exactly one year from now, your cash flows are $-P_t$ today and $D_{t + 1} + P_{t + 1}$ in one year. These cash flows have to satisfy:

$$
    P_t = \mathbb{E}_t\left[M_{t + 1} (D_{t + 1} + P_{t + 1}) \right]
$$

A common assumption to study asset prices is that the Stochastic Discount Factor can be decomposed as:

$$
    M_{t + 1} = \gamma \frac{\pi_{t+1}}{\pi_t}
$$

where $\pi$ is a random variable that captures the state of the economy and $\gamma \in [0, 1)$ is a constant. Under this assumption, stock prices have to satisfy:

$$
    \pi_t P_t = \mathbb{E}_t\left[\gamma   \pi_{t + 1} (D_{t + 1} + P_{t + 1}) \right]
$$

Defining

$$
V_t \equiv \pi_t P_t
$$

and

$$
R_{t+1} \equiv \gamma \pi_{t + 1} D_{t + 1}
$$

we finally get:

$$
V_t = \mathbb{E}_t\left[R_{t+1} + \gamma V_{t+1} \right]
$$

that is, stock prices satisfy a Bellman equation!
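
To spell out the steps behind this result, using only the definitions above: substitute the decomposition of $M_{t+1}$ into the pricing equation, then multiply both sides by $\pi_t$, which is known at time $t$ and can therefore be moved inside the conditional expectation:

$$
\begin{aligned}
P_t &= \mathbb{E}_t\left[\gamma \frac{\pi_{t+1}}{\pi_t} \left(D_{t+1} + P_{t+1}\right)\right] \\
\pi_t P_t &= \mathbb{E}_t\left[\gamma \pi_{t+1} D_{t+1} + \gamma \pi_{t+1} P_{t+1}\right] \\
V_t &= \mathbb{E}_t\left[\gamma \pi_{t+1} D_{t+1} + \gamma V_{t+1}\right]
\end{aligned}
$$

The first term inside the last expectation is the quantity defined as $R$ above. It plays the role of the reward, and $\gamma$ plays the role of the discount factor.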

Use a neural network to represent the value function $V$ and solve the corresponding Bellman equation using temporal-difference (TD) learning.
Once you have solved for $V$, you can back out the stock price as:

$$
    P_t = \frac{V_t}{\pi_t}
$$

Plot the stock price as a function of the dividends.
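
The procedure described above can be sketched roughly as follows. This is a minimal illustration of semi-gradient TD(0), not a reference solution. Several choices are assumptions that the text does not specify: the state is taken to be $(D_t, \pi_t)$, the network is a small two-layer MLP, the loss is the squared TD error, the optimizer is Adam, and $\gamma = 0.85$. The helper `td_train` and the synthetic `fake_buffer` (an arbitrary placeholder process, not the real data) are hypothetical names introduced only for this example.

```python
import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.85  # assumed discount factor; the text only requires gamma in [0, 1)

def td_train(buffer, epochs=5, batch_size=100, lr=1e-3, seed=0):
    """Semi-gradient TD(0) on rows of (D_t, D_{t+1}, pi_t, pi_{t+1})."""
    torch.manual_seed(seed)
    data = torch.as_tensor(buffer, dtype=torch.float32)
    state = data[:, [0, 2]]        # assumed state: (D_t, pi_t)
    next_state = data[:, [1, 3]]   # (D_{t+1}, pi_{t+1})
    reward = GAMMA * data[:, 3:4] * data[:, 1:2]  # R = gamma * pi_{t+1} * D_{t+1}

    # Small MLP mapping the state to a scalar value V (architecture is an assumption)
    net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        perm = torch.randperm(data.shape[0])
        total = 0.0
        for start in range(0, data.shape[0], batch_size):
            idx = perm[start:start + batch_size]
            # Bootstrapped target R + gamma * V(next state); no gradient flows through it
            with torch.no_grad():
                target = reward[idx] + GAMMA * net(next_state[idx])
            loss = torch.mean((net(state[idx]) - target) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item() * len(idx)
        losses.append(total / data.shape[0])
    return net, losses

# Synthetic stand-in for the replay buffer (placeholder dynamics, not the real data)
rng = np.random.default_rng(0)
d_now = rng.uniform(1.0, 2.5, size=1000)
d_next = d_now * np.exp(0.05 * rng.standard_normal(1000))
fake_buffer = np.column_stack([d_now, d_next, d_now ** -2.0, d_next ** -2.0])

net, losses = td_train(fake_buffer, epochs=3)

# Back out the price from the learned value function: P_t = V_t / pi_t
with torch.no_grad():
    states = torch.as_tensor(fake_buffer[:, [0, 2]], dtype=torch.float32)
    price = net(states).squeeze(1) / states[:, 1]
```

With the real data, `fake_buffer` would be replaced by the array loaded from `replay_buffer.npy`, and `price` could then be plotted against the first column ($D_t$). Computing the target under `torch.no_grad()` is what makes this a semi-gradient method: only the prediction $V_t$ is differentiated.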



# Data

You have a time series of dividends paid by a given *firm*. Suppose you also have a time series for $\pi$. The data are stored in the file `replay_buffer.npy`, which is a $10000 \times 4$ matrix where each row consists of an observation of:

$$
    D_t, D_{t + 1}, \pi_t, \pi_{t+1}
$$
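
As a small illustration of this layout, the four columns can be unpacked into named arrays and combined into the reward term $\gamma \pi_{t+1} D_{t+1}$. The helper `split_buffer` is a hypothetical name used only here, the value $\gamma = 0.85$ is an assumption (the text only requires $\gamma \in [0, 1)$), and `sample` holds a few example rows in the same column order.

```python
import numpy as np

GAMMA = 0.85  # assumed value of the discount constant

def split_buffer(buffer):
    """Split an (N, 4) array into D_t, D_{t+1}, pi_t, pi_{t+1} and the reward."""
    buffer = np.asarray(buffer)
    # Transposing gives one row per column of the buffer, which unpacks by name
    d_t, d_next, pi_t, pi_next = buffer.T
    reward = GAMMA * pi_next * d_next  # R = gamma * pi_{t+1} * D_{t+1}
    return d_t, d_next, pi_t, pi_next, reward

# A few example rows in the column order D_t, D_{t+1}, pi_t, pi_{t+1}
sample = np.array([
    [1.7544187, 1.6899673, 0.32488784, 0.35014135],
    [1.5905526, 1.617272, 0.39527917, 0.382326],
    [1.6724763, 1.6786338, 0.3575033, 0.35488534],
])
d_t, d_next, pi_t, pi_next, reward = split_buffer(sample)
```

Each returned array has one entry per observation, so `reward[i]` pairs with the transition from `d_t[i]` to `d_next[i]`.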

In [None]:
%%capture
# This snippet of code will download the data file
!wget https://www.dropbox.com/s/ms5p2hqqhlaago7/replay_buffer.npy?dl=1
!mv replay_buffer.npy?dl=1 replay_buffer.npy

In [7]:
# Load the replay buffer into a NumPy array
import numpy as np
data_matrix = np.load('replay_buffer.npy')
print(data_matrix)

[[1.7544187  1.6899673  0.32488784 0.35014135]
 [1.5905526  1.617272   0.39527917 0.382326  ]
 [1.6724763  1.6786338  0.3575033  0.35488534]
 ...
 [1.4697709  1.3415053  0.4629144  0.55566776]
 [2.2108552  2.2101867  0.20458762 0.20471142]
 [1.322186   1.2165726  0.5720248  0.67565334]]


In [16]:
import torch
import torch.nn as nn


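# Bellman loss: mean absolute TD error
class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input, target, R):
        # input:  network estimate of V_t
        # target: estimate of V_{t+1}
        # R:      pi_{t+1} * D_{t+1}, so gamma * R is the reward term of the Bellman equation
        gamma = 0.85
        loss = torch.mean(torch.abs(gamma * target + gamma * R - input))
        return loss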

# Network: a single linear layer mapping 5 inputs to 5 outputs
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(5, 5)

    def forward(self, x):
        return self.fc(x)




In [17]:
import torch.optim as optim
net = Net()

loss_func = CustomLoss()
# optimizer
optimizer = optim.SGD(net.parameters(), lr=0.1)

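# Placeholder inputs and targets drawn from a standard normal distribution;
# they are not derived from the replay buffer
inputs = torch.randn(2000, 5)
targets = torch.randn(2000, 5)
# pi_{t+1} * D_{t+1} for every observation, reshaped to match the (2000, 5) inputs
result_vector = data_matrix[:, 1] * data_matrix[:, 3]
R = torch.tensor(result_vector).view(2000, 5)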

# training
epochs = 20
batch_size = 100
num_batches = inputs.size(0) // batch_size

for epoch in range(epochs):
    running_loss = 0.0
    for batch in range(num_batches):
        batch_inputs = inputs[batch * batch_size: (batch + 1) * batch_size]
        batch_targets = targets[batch * batch_size: (batch + 1) * batch_size]
        batch_R = R[batch * batch_size: (batch + 1) * batch_size]

        optimizer.zero_grad()

        outputs = net(batch_inputs)
        loss = loss_func(outputs, batch_targets,batch_R)

        loss.backward()

        optimizer.step()

        running_loss += loss.item()

    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, epochs, running_loss/num_batches))

Epoch [1/20], Loss: 1.0293
Epoch [2/20], Loss: 0.9204
Epoch [3/20], Loss: 0.8493
Epoch [4/20], Loss: 0.8050
Epoch [5/20], Loss: 0.7795
Epoch [6/20], Loss: 0.7651
Epoch [7/20], Loss: 0.7574
Epoch [8/20], Loss: 0.7536
Epoch [9/20], Loss: 0.7517
Epoch [10/20], Loss: 0.7506
Epoch [11/20], Loss: 0.7501
Epoch [12/20], Loss: 0.7498
Epoch [13/20], Loss: 0.7496
Epoch [14/20], Loss: 0.7496
Epoch [15/20], Loss: 0.7495
Epoch [16/20], Loss: 0.7495
Epoch [17/20], Loss: 0.7494
Epoch [18/20], Loss: 0.7494
Epoch [19/20], Loss: 0.7494
Epoch [20/20], Loss: 0.7494


In [20]:
# Evaluate the trained network and back out the price as P = V / pi_t
V = net(inputs)
pi_t = torch.tensor(data_matrix[:, 2]).view(2000, 5)
P = torch.div(V, pi_t)
P = P.view(10000, 1)

In [21]:
print(P)

tensor([[2.4498],
        [1.7286],
        [2.0523],
        ...,
        [1.5176],
        [3.4367],
        [1.1196]], grad_fn=<ViewBackward0>)
