
Question about reward function and __pack_samples #22

Closed
ziofil opened this issue Dec 17, 2017 · 14 comments

Comments

@ziofil

ziofil commented Dec 17, 2017

I'm having trouble reconciling what I read in the paper and what I read in the code.

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

Because __pack_samples (in datamatrices.py) makes the price tensor X using M[..., :-1] and the relative price vector y using M[...,-1]/M[...,-2], so y is one period ahead of X.
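The slicing described above can be sketched as follows. This is a hypothetical simplification (the real `__pack_samples` in datamatrices.py differs in details such as array layout and feature indexing); it assumes each sample `M` has shape `(features, coins, window + 1)`, i.e. a history window of prices plus one extra "future" period at the end.

```python
import numpy as np

# Hypothetical sketch of the slicing in __pack_samples: X is the history
# window the agent sees, y is the relative price vector one period ahead.
def pack_sample(M):
    X = M[:, :, :-1]                  # price tensor: all periods except the last
    y = M[0, :, -1] / M[0, :, -2]     # last period's close / second-to-last close
    return X, y

M = np.ones((3, 4, 51))
M[0, :, -1] = 2.0                     # the "future" closing price doubles
X, y = pack_sample(M)
print(X.shape)                        # (3, 4, 50)
print(y)                              # [2. 2. 2. 2.]  -- y is one period ahead of X
```

This makes the original observation concrete: `y` is built from the one period that is deliberately excluded from `X`.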

@dexhunter
Collaborator

dexhunter commented Dec 17, 2017

y is the future price; including y in X would introduce look-ahead bias (edit: which is avoided in the framework). The price history of every mini-batch is normalized by the last period of that mini-batch.

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

y_t means the future price of this period, and y_{t+1} means the future price of the next period.

@ziofil
Author

ziofil commented Dec 17, 2017

Hang on a sec. In the paper, Eq. (1) says that y_t = v_t/v_{t-1} is the element-wise division of the closing and opening prices of period t: which means it describes the evolution of prices during the period t, not the "future price".

In the definition of X_t (below Eq. (18)), the paper says that the second-to-last element of V_t is v_{t-1}/v_t, which is the inverse of y_t, so y_t is "included" in X_t.

There seems to be a discrepancy between the code and the paper.

@dexhunter
Collaborator

dexhunter commented Dec 18, 2017

In the definition of X_t (below Eq. (18)), the paper says that the second-to-last element of V_t is v_{t-1}/v_t, which is the inverse of y_t, so y_t is "included" in X_t.

From my understanding, t is not the same in the two equations (1 & 18). t is a variable rather than a constant. Is that what you are confused about?

y is the price-relative vector for two adjacent price vectors (v_t and v_{t-1}), while Eq. (18) describes the normalization of the prices in the training set. You can see the normalization code for y and for X.
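The normalization described above can be sketched as follows. This is a minimal illustration, assuming the paper's convention: each price window is divided element-wise by the closing prices of its own last period, so the last column of the normalized window becomes all ones.

```python
import numpy as np

# Minimal sketch of normalizing a price window by its last period's close.
def normalize_window(window):
    # window: array of shape (coins, periods) of closing prices
    return window / window[:, -1:]   # broadcast division by the last column

w = np.array([[1.0, 2.0, 4.0],
              [3.0, 3.0, 3.0]])
v = normalize_window(w)
print(v[:, -1])   # [1. 1.]               -- last period is normalized to 1
print(v[0])       # [0.25 0.5  1.  ]      -- history expressed relative to "now"
```

Because each window is scaled by its own last price, the network only ever sees prices relative to the current period, which is what keeps the inputs stationary across assets and time.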

@dexhunter
Collaborator

The reward function in a single period in the paper (Eq. (10)) is \log(\mu_t y_t \cdot w_{t-1}). But in the code, it seems that the reward is instead \log(\mu_t y_{t+1} \cdot w_t). Am I correct?

Or are you confused about the commission fee calculation? The input_num in the code is not time but batch_size, so the t in \mu_t is the same as the t in y_t.

@ziofil
Author

ziofil commented Dec 18, 2017

From my understanding, t is not the same in the two equations (1 & 18). t is a variable rather than a constant. Is that what you are confused about?

Well, if t is not the same, then yes, it's definitely extremely confusing, because v_t in Eq. (1) and v_t below Eq. (18) are the same symbol and should mean the same thing! 😆

Or are you confused about the commission fee calculation? The input_num in the code is not time but batch_size, so the t in \mu_t is the same as the t in y_t.

No, the commission fee calculation is fine. I am trying to understand exactly which periods you are using for X, w and \mu in the loss function, and whether or not you are applying the definitions of the paper.

[Feedback for the paper]
If it is how you say, using the same symbol for two different things is a very confusing oversight.
As an author of scientific papers myself, I can assure you that this issue is nearly impossible for a referee to catch. If you still have time to fix it before publication (out of curiosity, are you an author?), please do.

@kumkee
Collaborator

kumkee commented Dec 18, 2017

Well, if t is not the same, then yes, it's definitely extremely confusing, because v_t in Eq. (1) and v_t below Eq. (18) are the same symbol and should mean the same thing!

Yes, both t's in (1) and (18) mean the same thing: time, and both v_t's mean the same thing: the prices of all assets at time t.

update: please refer to Figure 1 in the paper, that should help clarify these messes.

@ziofil
Author

ziofil commented Dec 18, 2017

Yes, both t's in (1) and (18) mean the same thing: time, and both v_t's mean the same thing: the prices of all assets at time t.

Good! Then is it correct to say that y_t is the inverse of the second-to-last (time-wise) element of X_t?

If it is, then in the code you are not applying the same loss function as in the paper.

@kumkee
Collaborator

kumkee commented Dec 18, 2017

is it correct to say that y_t is the inverse of the second-to-last (time-wise) element of X_t?

Only for the last column of V_t (one of the three matrices in X_t), yes.

Maybe the confusion here is that in (1) 'now' is t-1, but in X 'now' is t.

@ziofil
Author

ziofil commented Dec 18, 2017

Maybe the confusion here is that in (1) now is t-1, but in X now is t.

Oh, this is very relevant to the whole issue. So, to clarify: if the actual wall-clock time is t, we can have pricing information up to time t. We build the price tensor X_t and feed it to the agent, together with the previous portfolio vector w_{t-1}. Then, to evaluate the loss, which y do we use (in terms of v_{t±1})?

@kumkee
Collaborator

kumkee commented Dec 18, 2017

if the actual wall-clock time is t, we can have pricing information up to time t. We build the price tensor X_t and feed it to the agent, together with the previous portfolio vector w_{t-1}.

Yes.

Then, to evaluate the loss, which y do we use?

For the Reward/Loss function of the decision w_t at time t, we need y_{t+1}.

@ziofil
Author

ziofil commented Dec 18, 2017

For the Reward/Loss function of the decision w_t at time t, we need y_{t+1}.

Ok, so what was confusing me is that in the paper the loss contains y_t, w_{t-1} and \mu_t (so I thought that w_t enters this loss only because you need it to compute \mu_t).
In summary: the loss for the decision w_t taken at time t should be computed as the negative log rate of return, -\log(\mu_t (w_t \cdot y_{t+1})), where \mu_t is a function of w_{t-1}, y_t and w_t. Is this all correct?

@kumkee
Collaborator

kumkee commented Dec 18, 2017

In summary: the loss for the decision w_t taken at time t should be computed as the negative log rate of return, -\log(\mu_t (w_t \cdot y_{t+1})), where \mu_t is a function of w_{t-1}, y_t and w_t. Is this all correct?

All correct. Thank you for your careful reading of our poorly written paper.
We will improve it with this particular confusion in mind for the next version.
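The summary confirmed above can be sketched as a toy numerical example. This is an illustration only: here \mu_t is a fixed placeholder, whereas in the paper it is computed from w_{t-1}, y_t and w_t via the transaction-cost recursion.

```python
import numpy as np

# Toy sketch of the per-period loss confirmed above:
#   reward = log(mu_t * (w_t . y_{t+1})),  loss = -reward.
# mu_t is a placeholder constant here, not the paper's iterative formula.
def period_loss(w_t, y_next, mu_t):
    return -np.log(mu_t * np.dot(w_t, y_next))

w_t = np.array([0.5, 0.5])      # decision taken at time t
y_next = np.array([1.1, 0.9])   # relative prices over the NEXT period, y_{t+1}
mu_t = 1.0                      # no transaction cost in this sketch
loss = period_loss(w_t, y_next, mu_t)
print(loss)                     # 0.0 -- gains and losses cancel exactly
```

Note the indexing that the whole thread turned on: the decision w_t is scored against y_{t+1}, the relative price vector of the following period, never against information already contained in X_t.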

@kumkee
Collaborator

kumkee commented Dec 18, 2017

Thank you @ziofil for the questions you raised. They led us to the discovery of this bug: #25

@ziofil
Author

ziofil commented Dec 18, 2017

Thank you for explaining! 🙂
