
Question about reward function and __pack_samples #22

Closed
ziofil opened this issue Dec 17, 2017 · 14 comments

Comments

@ziofil

ziofil commented Dec 17, 2017

I'm having trouble reconciling what I read in the paper and what I read in the code.

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

Because __pack_samples (in datamatrices.py) makes the price tensor X using M[..., :-1] and the relative price vector y using M[...,-1]/M[...,-2], so y is one period ahead of X.
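The slicing described above can be sketched as follows. This is a hypothetical simplification (the real `__pack_samples` in datamatrices.py differs in details such as array layout and feature indexing); it assumes each sample `M` has shape `(features, coins, window + 1)`, i.e. a history window of prices plus one extra "future" period at the end.

```python
import numpy as np

# Hypothetical sketch of the slicing in __pack_samples: X is the history
# window the agent sees, y is the relative price vector one period ahead.
def pack_sample(M):
    X = M[:, :, :-1]                  # price tensor: all periods except the last
    y = M[0, :, -1] / M[0, :, -2]     # last period's close / second-to-last close
    return X, y

M = np.ones((3, 4, 51))
M[0, :, -1] = 2.0                     # the "future" closing price doubles
X, y = pack_sample(M)
print(X.shape)                        # (3, 4, 50)
print(y)                              # [2. 2. 2. 2.]  -- y is one period ahead of X
```

This makes the original observation concrete: `y` is built from the one period that is deliberately excluded from `X`.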

@dexhunter
Collaborator

dexhunter commented Dec 17, 2017

y is the future price; including y in X would introduce look-ahead bias (edit: which is avoided in the framework). The price history of every mini-batch is normalized by the last period of that mini-batch.

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

y_t means the future price of this period, and y_{t+1} means the future price of the next period.

@ziofil
Author

ziofil commented Dec 17, 2017

Hang on a sec. In the paper, Eq. (1) says that y_t = v_t/v_{t-1} is the element-wise division of the closing and opening prices of period t: which means it describes the evolution of prices during the period t, not the "future price".

In the definition of X_t (below Eq. (18)), the paper says that the second-to-last element of V_t is v_{t-1}/v_t, which is the inverse of y_t, so y_t is "included" in X_t.

There seems to be a discrepancy between the code and the paper.

@dexhunter
Collaborator

dexhunter commented Dec 18, 2017

In the definition of X_t (below Eq. (18)), the paper says that the second-to-last element of V_t is v_{t-1}/v_t, which is the inverse of y_t, so y_t is "included" in X_t.

From my understanding, t is not the same in the two equations (1 & 18). t is a variable rather than a constant. Is that what you are confused about?

y is the price-relative vector for two adjacent price vectors (v_t and v_{t-1}), while Eq. (18) describes the normalization of the prices in the training set. You can see the normalization code for y and for X.
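The normalization described above can be sketched as follows. This is a minimal illustration, assuming the paper's convention: each price window is divided element-wise by the closing prices of its own last period, so the last column of the normalized window becomes all ones.

```python
import numpy as np

# Minimal sketch of normalizing a price window by its last period's close.
def normalize_window(window):
    # window: array of shape (coins, periods) of closing prices
    return window / window[:, -1:]   # broadcast division by the last column

w = np.array([[1.0, 2.0, 4.0],
              [3.0, 3.0, 3.0]])
v = normalize_window(w)
print(v[:, -1])   # [1. 1.]               -- last period is normalized to 1
print(v[0])       # [0.25 0.5  1.  ]      -- history expressed relative to "now"
```

Because each window is scaled by its own last price, the network only ever sees prices relative to the current period, which is what keeps the inputs stationary across assets and time.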

@dexhunter
Collaborator

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

Or are you confused about the commission fee calculation? The input_num in the code is not time but batch_size, so the t in \mu_t is the same as the t in y_t.

@ziofil
Author

ziofil commented Dec 18, 2017

From my understanding, t is not the same in the two equations (1 & 18). t is a variable rather than a constant. Is that what you are confused about?

Well, if t is not the same, then yes, it's definitely extremely confusing, because v_t in Eq. (1) and v_t below Eq. (18) are the same symbol and should mean the same thing! 😆

Or are you confused about the commission fee calculation? The input_num in the code is not time but batch_size, so the t in \mu_t is the same as the t in y_t.

No, the commission fee calculation is fine. I am trying to understand exactly which periods you are using for X, w and \mu in the loss function, and whether or not you are applying the definitions of the paper.

[Feedback for the paper]
If it is how you say, using the same symbol for two different things is a very confusing oversight.
As an author of scientific papers myself, I can assure you that this issue is nearly impossible for a referee to catch. If you still have time to fix it before publication (out of curiosity, are you an author?), please do.

@kumkee
Collaborator

kumkee commented Dec 18, 2017

Well, if t is not the same, then yes, it's definitely extremely confusing, because v_t in Eq. (1) and v_t below Eq. (18) are the same symbol and should mean the same thing!

Yes, both t's in (1) and (18) mean the same thing: time, and both v_t's mean the same thing: the prices of all assets at time t.

update: please refer to Figure 1 in the paper, that should help clarify these messes.

@ziofil
Author

ziofil commented Dec 18, 2017

Yes, both t's in (1) and (18) mean the same thing: time, and both v_t's mean the same thing: the prices of all assets at time t.

Good! Then is it correct to say that y_t is the inverse of the second-to-last (time-wise) element of X_t?

If it is, then in the code you are not applying the same loss function as in the paper.

@kumkee
Collaborator

kumkee commented Dec 18, 2017

is it correct to say that y_t is the inverse of the second-to-last (time-wise) element of X_t?

Only for the last column of V_t (one of the three matrices in X_t), yes.

Maybe the confusion here is that in (1) 'now' is t-1, but in X 'now' is t.

@ziofil
Author

ziofil commented Dec 18, 2017

Maybe the confusion here is that in (1) now is t-1, but in X now is t.

Oh, this is very relevant to the whole issue. So, to clarify: if the actual wall-clock time is t, we can have pricing information up to time t. We build the price tensor X_t and feed it to the agent, together with the previous portfolio vector w_{t-1}. Then, to evaluate the loss, which y do we use (in terms of v_{t±1})?

@kumkee
Collaborator

kumkee commented Dec 18, 2017

if the actual wall-clock time is t, we can have pricing information up to time t. We build the price tensor X_t and feed it to the agent, together with the previous portfolio vector w_{t-1}.

Yes.

Then, to evaluate the loss, which y do we use?

For the Reward/Loss function of the decision w_t at time t, we need y_{t+1}.

@ziofil
Author

ziofil commented Dec 18, 2017

For the Reward/Loss function of the decision w_t at time t, we need y_{t+1}.

Ok, so what was confusing me is that in the paper the loss contains y_t, w_{t-1} and \mu_t (so I thought that w_t enters this loss only because you need it to compute \mu_t).
In summary: the loss for the decision w_t taken at time t should be computed as the negative log rate of return, -\log(\mu_t (w_t \cdot y_{t+1})), where \mu_t is a function of w_{t-1}, y_t and w_t. Is this all correct?

@kumkee
Collaborator

kumkee commented Dec 18, 2017

In summary: the loss for the decision w_t taken at time t should be computed as the negative log rate of return, -\log(\mu_t (w_t \cdot y_{t+1})), where \mu_t is a function of w_{t-1}, y_t and w_t. Is this all correct?

All correct. Thank you for your careful reading of our poorly written paper.
We will improve it with this particular confusion in mind for the next version.
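The summary confirmed above can be sketched as a toy numerical example. This is an illustration only: here \mu_t is a fixed placeholder, whereas in the paper it is computed from w_{t-1}, y_t and w_t via the transaction-cost recursion.

```python
import numpy as np

# Toy sketch of the per-period loss confirmed above:
#   reward = log(mu_t * (w_t . y_{t+1})),  loss = -reward.
# mu_t is a placeholder constant here, not the paper's iterative formula.
def period_loss(w_t, y_next, mu_t):
    return -np.log(mu_t * np.dot(w_t, y_next))

w_t = np.array([0.5, 0.5])      # decision taken at time t
y_next = np.array([1.1, 0.9])   # relative prices over the NEXT period, y_{t+1}
mu_t = 1.0                      # no transaction cost in this sketch
loss = period_loss(w_t, y_next, mu_t)
print(loss)                     # 0.0 -- gains and losses cancel exactly
```

Note the indexing that the whole thread turned on: the decision w_t is scored against y_{t+1}, the relative price vector of the following period, never against information already contained in X_t.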

@kumkee
Collaborator

kumkee commented Dec 18, 2017

Thank you @ziofil for the questions you raised. They led us to the discovery of this bug: #25

@ziofil
Author

ziofil commented Dec 18, 2017

Thank you for explaining! 🙂
