Eqw 2 #94
lgtm, I'm not sure I fully understand the tests but the code change is sufficiently simple that I'll leave it up to you how thoroughly you want to test it.
Assuming notebooks etc. come next?
Database cruncher which uses the 'equal quantile walk' technique.

This cruncher assumes that the amount of effort going into reducing one emission set
is equal to that for another emission, therefore the lead and follow data should be
Suggested change:
- is equal to that for another emission, therefore the lead and follow data should be
+ is equal to that for another emission, therefore the lead and follow data should come from
This cruncher assumes that the amount of effort going into reducing one emission set
is equal to that for another emission, therefore the lead and follow data should be
the same quantile of all pathways in the infiller database.
It calculates what quantile the lead infillee data is in the lead infiller database,
Suggested change:
- It calculates what quantile the lead infillee data is in the lead infiller database,
+ It calculates the quantile of the lead infillee data in the lead infiller database,
lead_vals = lead_vals.sort_values()
quant_of_lead_vals = np.arange(len(lead_vals)) / (len(lead_vals) - 1)
if any(quant_of_lead_vals > 1) or any(quant_of_lead_vals < 0):
    raise NotImplementedError("Impossible quantiles!")
Suggested change:
- raise NotImplementedError("Impossible quantiles!")
+ raise ValueError("Impossible quantiles!")
Also very amusing +1
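For context on the suggested exception type: in Python, `NotImplementedError` conventionally signals a method that still needs an implementation, while `ValueError` signals a value that is of the right type but not acceptable. A minimal pure-Python sketch of the quantile assignment follows; `quantile_positions` is a hypothetical name, not part of the reviewed code, and the sketch stands in for the NumPy expression in the diff. By construction the positions always lie in [0, 1] when there are at least two values, so the check is purely defensive.

```python
def quantile_positions(n):
    """Evenly spaced quantiles for n sorted values: 0, 1/(n-1), ..., 1.

    Hypothetical helper mirroring np.arange(n) / (n - 1) from the diff.
    """
    if n < 2:
        # With a single value the denominator n - 1 is zero, so no
        # quantiles can be assigned; a bad value calls for ValueError.
        raise ValueError("need at least two values to assign quantiles")
    return [i / (n - 1) for i in range(n)]


print(quantile_positions(5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```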
input_quantiles = scipy.interpolate.interp1d(
    lead_vals, quant_of_lead_vals, bounds_error=False, fill_value=(0, 1)
)(lead_input)
return np.nanquantile(follow_vals, input_quantiles)
Suggested change:
- return np.nanquantile(follow_vals, input_quantiles)
+ return np.nanquantile(follow_vals, input_quantiles, interpolation="linear")
It's already what happens, but this just makes it clear?
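The two steps in this hunk (map each lead input to a quantile in the lead database, then read off the same quantile of the follower values) can be sketched without dependencies. This is a pure-Python restatement under stated assumptions, not the reviewed implementation: the diff uses `scipy.interpolate.interp1d` and `np.nanquantile`, whereas `interp_clamped`, `linear_quantile` and `find_same_quantile` below are hypothetical helpers. The sketch assumes linear interpolation between follower values (the behaviour the suggestion makes explicit) and requires at least two lead values.

```python
import math
from bisect import bisect_right


def interp_clamped(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, with xs sorted.

    Inputs outside [xs[0], xs[-1]] are clamped to ys[0] or ys[-1],
    playing the role of bounds_error=False, fill_value=(0, 1) in the diff.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, x)  # xs[hi - 1] <= x < xs[hi]
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


def linear_quantile(sorted_vals, q):
    """Quantile q in [0, 1] of sorted values, linearly interpolated."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def find_same_quantile(follow_vals, lead_vals, lead_input):
    """Follower values at the quantiles that lead_input occupies in lead_vals."""
    lead_sorted = sorted(lead_vals)
    n = len(lead_sorted)
    if n < 2:
        # Assumption: the single-value case is handled separately.
        raise ValueError("need at least two lead values to assign quantiles")
    quants = [i / (n - 1) for i in range(n)]
    # Drop NaNs, as a NaN-ignoring quantile would.
    follow_sorted = sorted(v for v in follow_vals if not math.isnan(v))
    return [
        linear_quantile(follow_sorted, interp_clamped(x, lead_sorted, quants))
        for x in lead_input
    ]


# Illustrative data only: inputs below, inside and above the lead range.
print(find_same_quantile([0.0, 50.0, 100.0], [1.0, 2.0, 3.0], [0.5, 2.0, 2.5, 10.0]))
# [0.0, 50.0, 75.0, 100.0]
```

Inputs outside the range of the lead database are clamped to quantile 0 or 1, so they return the minimum or maximum follower value rather than extrapolating.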
        return self._db.filter(variable=variable_follower)

    def _find_same_quantile(self, follow_vals, lead_vals, lead_input):
        if len(lead_vals) == 1:
Suggested change:
- if len(lead_vals) == 1:
+ if len(follow_vals) == 1:
Would this be clearer?
It's to avoid a singularity later. We could also short-circuit the calculation when there is only one follow value, but that would be for computational reasons, not essential ones. I guess it's better to take the mean afterwards in case the lengths of the two differ.
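The singularity comes from the denominator `len(lead_vals) - 1` in the quantile assignment, which is zero when the lead database holds a single entry; that is why the guard checks the lead values rather than the follow values. A minimal sketch of such a guard follows. Returning the mean of the follower values is an assumption drawn from the reply above, and `single_lead_fallback` is a hypothetical name.

```python
from statistics import mean


def single_lead_fallback(follow_vals, lead_vals):
    """Return a fallback follower value if quantiles cannot be assigned.

    Returns None when the normal quantile path applies.
    """
    if len(lead_vals) == 1:
        # len(lead_vals) - 1 == 0, so i / (len(lead_vals) - 1) is undefined.
        # Assumed fallback: the mean of the follower values, however many
        # there are.
        return mean(follow_vals)
    return None  # two or more lead values: no singularity


print(single_lead_fallback([10.0, 30.0], [5.0]))       # 20.0
print(single_lead_fallback([10.0, 30.0], [5.0, 6.0]))  # None
```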
infilled = res(simple_df)
# We compare the results with the expected results: for T1, we are below the
# lower limit on the first, in the middle on the second. At later times we are
Suggested change:
- # lower limit on the first, in the middle on the second. At later times we are
+ # lower limit on the first, in the middle on the second scenario. At later times we are
Is there a way to make it slightly clearer in the test rather than having the hard-coded 50 and 100?
It's derived better now
def test_with_one_value_in_infiller_db(self, test_db, caplog):
    # The calculation is different with only one entry in the infiller db. We
    # expect a warning and the only value to be returned in all cases.
Suggested change:
- # expect a warning and the only value to be returned in all cases.
+ #
No warning at the moment?
True. Probably not going to add one now, they are annoying
)

# Repeat with reducing the minimum value. This works differently because the
# minimum point is doubled. By default the cruncher selects the higher
Suggested change:
- # minimum point is doubled. By default the cruncher selects the higher
+ # minimum point is doubled. This modification causes the cruncher to pick the lower value.
Pull request
Please confirm that this pull request has done the following:
CHANGELOG.rst added

Adding to CHANGELOG.rst
Please add a single line in the changelog notes similar to one of the following: