Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

untangle scaling of xx by weights from setting xx to zero #677

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

antnguyen13
Copy link
Collaborator

What changes does this PR introduce?

Separate scaling xx by weights from where xx is set to zero

What is the current behaviour?

Currently, the determination of where xx is being set to zero is tied to only when the logical doscaling is true. This flag should only be used to check if xx should be scaled by the weight or not, not whether xx should be non-zero or zero.

What is the new behaviour

This fix now untangle these two behaviors: whether doscaling is true or false does not determine whether xx is being set to zero or nonzero. xx is scaled by weights when doscaling is .true., as intended. Outside of this doscaling loop, xx will be determined nonzero if the weight is nonzero, else it is set to zero, as intended.

Does this PR introduce a breaking change?

This potentially can break any experiment where the a user has the following combination of setting: (1) control xx_genarr[2d,3d] that were non-zeros outside of landmask, (2) chose to have xx being zero by their weights=0. and (3) chose doscaling=.false. . The way it should have worked is that the user should get xx=zero where their weights are zero (satisfy (2) above). However, due to the bug, by setting (3) , they skipped the code determining where xx should be zero, and thus they'd get nonzero xx where weights=0. If they then read their xx adjustments to force their model, the trajectory of the model will differ now than before.

Other information:

This behavior was fixed by An Nguyen and Arash Bigdeli for ctrl_map_ini_gentim2d.F in 2017 for xx_gentim2d, and the same fix is now applied to xx_genarr[2d,3d].

Suggested addition to tag-index

Separate scaling of xx by weights from where xx is set to zero

(To avoid unnecessary merge conflicts, please don't update tag-index. One of the admins will do that when merging your pull request.)

@timothyas
Copy link
Member

timothyas commented Dec 9, 2022

Hi @antnguyen13, I'm not sure I agree that this is a bug. It was my understanding that the entire purpose of the doscaling option is to turn on the application of the weights or not. So if you are using weights to specify whether a field should be "on/off" at a given location using the weights, then you would want to just use weights of 1's or 0's to do that.

Basically, what is the use-case where we want to provide a weight file, but not use the values of those weights (other than >0)? Thanks!

@antnguyen13
Copy link
Collaborator Author

antnguyen13 commented Dec 9, 2022

@timothyas I think the main problem is perhaps that the ctrl pkg has been using "weights" as mask as well . The definition of doscaling is whether you will scale by the weights or not, not whether you will set xx to zero or not, would you agree? By setting the xx=0 only inside doscaling loop, you get a different mask than when doscaling=.false. I think a user can often have a weight file, and they should be able to choose at run-time to switch docaling on or off to simply have their xx being scaled by weight or not, independent of where suddenly their xx would be set to zero, don't you think?

@timothyas
Copy link
Member

I see what you're saying. My thinking is that by requiring the user to specify doscaling=.True. we are requiring them to come to terms with the fact that the weights are being used as masks. To me that seems more consistent, but I'm open to a counterargument. If we wanted to totally separate the weights from acting like masks, then I think this should be provided as a totally separate option/file. I know that's more complicated to code, though.

I'm also realizing that the code is somewhat inconsistent about whether a weight file is provided or not. I had an additional concern that there could be a case where a weight file is not provided, but now suddenly it is expected to be used as a mask in this way. But now that I'm looking through the code, CTRL_MAP_GENARR[2/3]D already expect the weights file to be there, so that's not an additional issue.

@antnguyen13
Copy link
Collaborator Author

I see what you're saying. My thinking is that by requiring the user to specify doscaling=.True. we are requiring them to come to terms with the fact that the weights are being used as masks.

I think that's a very hidden assumption, as the variable itself is simply implying whether you're scaling your controls or not. These kinds of assumptions force the user to know more than what is intended i think, and therefore can cause different unintended results. The way "weight" is defined, it should be applied in the cost function right? so if you have zero weights, it implies for that grid cell it is not contributing to the cost (e.g., xx^2 * (weight=0) = zero contrib), but it does not imply the adjustment there must be zero. I know this sounds strange perhaps, because why would someone want zero weights there but still want an adjustment?! But by assuming the user should know that setting weights == setting mask, this now causes a run with or without scaling to have different results due to a flag that does not reflect (tell the user of) the masks being applied are now different.

The updates of controls before the timestep (e.g., adding to init T, S, or atmospheric forcing) are done using the "effective" files, not the controls (weighted or unweighted). This means right now, the user does not have a mean to change the "mask" unless they change their weights AND turn "doscaling" on. If at some point during their optimization run they want to turn doscaling off (e.g., they've absorbed xx into their controls and now re-init xx to zeros), they'd need to know ahead of time they must edit their controls somehow (actually right now we don't have a mean for them to do so right?) so that the unscaled xx would have the same mask as set by the weights. In fact , i think the way it is now, the user would run 1 iter, get xx, must know to edit offline to set to zero where their original weight=0 (mask) was, then run next iter. That's a lot to ask for new users I think.

I do think the cleanest ways to do this would be to tell the users to also have a mask, independent of where weights are, such that whether you use doscaling or not you're using the same mask (set xx = zero or not) , independent of where weights are set.. The mask would be defaulted to the model's mask, then set using weights (a required field) if a mask is not provided. There can be an additional check and print warning to the users if their weight=0 is inconsistent with mask, likely just turn off (!!) that control until the user get the consistency. This way the user is actively knowing what they're setting, instead of relying on word of mouth about what "weight" means. Then the check for setting zero/non-zero xx is done via mask, while the scaling is done based strictly on doscaling flag...

@jm-c jm-c added the adjoint Affects the adjoint model; label triggers full OpenAD test label Dec 9, 2022
@timothyas
Copy link
Member

Hmm, sorry An I don't mean to give you a hard time for such a relatively small change (in terms of line numbers) but I see the scaling operation a bit differently. But it is definitely good to have this discussion because there is no documentation regarding what happens when a weight file has zero values in grid cells, and as you mentioned this is already implemented in ctrl_map_ini_gentim2d and so it's good to have them be consistent.

I have always viewed the "weights = 0" as part of the scaling operation, where the if/else branch is just a handy way to deal with the division operator. For instance, if we passed standard deviation or variance values instead of weights, then we wouldn't even need the if/else branch and I don't think this would be a question. From that perspective, it seems very clear that if you want a weight file to act like a mask, then just pass 1's and 0's where you want to apply the mask. Additionally, it would seem odd to only use the 0 valued grid cells from the weights file when the user has set noscaling=.TRUE. (i.e., don't do the doscaling). We are then only using part of the file, where I would view it as either apply it completely or don't at all.

Secondly, while the ctrl package is definitely tricky, I don't think it is too much to ask a user to change a weight file when switching from performing an optimization experiment to a sensitivity analysis experiment (for instance). After all they are conducting completely different experiments, and I don't think it's too much to ask for different inputs for different functionality.

But no matter what, I think that the key is to make sure that whatever the behavior is, it is documented. Right now, there is no documentation regarding what happens during the scaling operation for when a grid cell value is zero. The docs simply state that

If noscaling is false, the control adjustment is scaled by one on the square root of the weight before being added to the base control variable; if noscaling is true, the control is multiplied by the weight in the cost function itself.

which, the latter part could be confusing to anyone new to the code - even I had to look at ctrl_cost_driver and ctrl_cost_gen to refresh myself on what that truly implies!

@antnguyen13
Copy link
Collaborator Author

@timothyas I think the way the code has been (and is now with my mod), we set where xx = 0 differently depending on whether we use doscaling=.true. or .false. . So, one way i can at least make it doing the same thing that is more simple that dealing with the question of what "doscaling" means is to create two separate loops: the first one simply check where weights = 0, and set xx = 0. Then we enter the loop for doscaling , and if weights is nonzero, we scale xx. this way we do not need to re-set xx=0 again given it has gone through the first loop.

That's at least a clean way i can think of to tell the user they're using "weights" to set the mask, but outside of conditional doscaling.

@mjlosch
Copy link
Member

mjlosch commented Dec 13, 2022

I tried to understand, what the doscaling parameter does (by default doscaling=T, if you set xx_gen_preproc as 'no scaling', the default is changed to doscaling=F). Correct me if I am wrong:

if (doscaling): 
   xx_scaled = xx/sqrt(w)
   new_var = var + xx_scaled
else: 
   new_var = var + xx

if (dodimensionalcost=.not. doscaling): cf_xx = xx**2 / w
else: cf_xx = xx_scaled*2

so that in both cases cf_xx = xx**2/w. If the "weight" w contains zero, this still holds. I have these observations:

  1. the name "weight" is not intuitive to me, as that implies to me xx**2 * w and not divided by w, but OK
  2. scaling the control variable (and the gradient vector consistently) makes sense to me, if you want to improve the an ill-conditioned problem, and here this is done in a way that relies on TAF to sort it out properly. E.g. if w>1, xx will become smaller and in the AD code adxx is also scaled like this adxx/sqrt(w), which reduces the gradient. Is this what we want?
  3. if w contains zeros in "wet" cells, then doscaling=T (default!) implies that var + xx_scaled leads to a different solution, which is what this PR addresses.

Before, xx was not modified if doscaling=F. Now xx is masked by zero values of w, but not scaled if doscaling=F. The behaviour for doscaling=T does not change (scaling and masking).

Am I getting this right? If so, I think that this PR makes the situation less intuitive than before, because now w is always a "mask". For the default case, xx is scaled by w (and implicitly masked where w=0). When 'noscaling' is specified in the namelist file, w is still used to modify the control variable xx. This would be the same as specifying a mask in w with 0, 1 and doscaling=T, but now we can no longer apply the entire (unmasked) xx and at the same time only constrain part of it in the cost function. If I am getting this right, I am not in favour of this PR. Instead of making things clearer ("disentangle"), this PR deprives us of a (small and maybe insignificant) feature.

Needless to say, this "scaling/weighting" business should be documented.

@jrscott jrscott added the work in progress Should not be merged until this label is removed label Feb 9, 2023
@mjlosch mjlosch mentioned this pull request May 2, 2023
2 tasks
@jm-c jm-c added wontfix This will not be worked on and removed work in progress Should not be merged until this label is removed labels Dec 13, 2023
@jm-c
Copy link
Member

jm-c commented Dec 13, 2023

I changed the label to reflect more accurately the status of this PR: There are good points raised and changes we want to keep, at least for now, as reference but the overall idea of these changes is not the way we want to go (and if this change, a new PR would be a better place to start).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adjoint Affects the adjoint model; label triggers full OpenAD test wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants