Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default Hazard.event_name to string of event_id #795

Merged
merged 12 commits into from Nov 20, 2023

Conversation

sarah-hlsn
Copy link
Collaborator

@sarah-hlsn sarah-hlsn commented Oct 23, 2023

Changes proposed in this PR:

  • changed default Hazard.event_name to string representation of event_id
  • changed Hazard test_base_xarray.py accordingly

This PR fixes #789

PR Author Checklist

PR Reviewer Checklist

Change event_name default for Hazard.from_xarray_raster to string of event id.
Change variable description in class method.
adapted event_name in test_assert_default_values and deleted event_name tests from test_missing_dims since the time dimension does not affect event_name anymore
@@ -840,7 +840,7 @@ def maybe_repeat(values: np.ndarray, times: int) -> np.ndarray:
None,
np.ones(num_events),
np.array(range(num_events), dtype=int) + 1,
list(data[coords["event"]].values),
[str(x) for x in np.array(range(num_events), dtype=int) + 1],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does actually not behave as expected: If the dataset supplies custom event IDs (like 100...100+X), this will still set the event name to values 1...X. However, I don't see a way of inserting the event ID data as default value here. @chahank, what do you think?

Copy link
Member

@chahank chahank Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, but I think it does not matter that much. I was actually expecting that this PR would change the Hazard.init to make a good default instead of just the read xarray datasets method. But, this might be a larger endeavor.

Also, you do not need to cast the generator into an array (this is actually very sub-optimal, both memory-wise, speed-wise and style-wise). Also, range already provides int type. Keeping the original values, this would be it:

Suggested change
[str(x) for x in np.array(range(num_events), dtype=int) + 1],
[str(x) for x in range(1, num_events+ 1)],

Another side-note to @peanutfun : why do you use a list here instead of a dictionary? How do you keep track of the correct ordering? Is there a particular reason to use a list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chahank We could indeed adjust the Hazard.__init__ to handle that. I think this would be the cleanest solution. However, changing the default initialization of this class might have some repercussions elsewhere.

why do you use a list here instead of a dictionary?

Not sure I understand. The whole thing is a dictionary which is inserted into a dataframe. This transforms the dictionary keys into column names and dictionary values into the column series. Order is preserved. I agree that it might be a bit less confusing if we used DataFrame.from_records, where the dataframe can be initialized row-wise. But I recall you insisted that I create a separate variable DEF_DATA_VARS instead of hardcoding the data variables here, so I thought this was the cleanest approach.

Copy link
Member

@chahank chahank Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chahank We could indeed adjust the Hazard.init to handle that. I think this would be the cleanest solution. However, changing the default initialization of this class might have some repercussions elsewhere.

Agreed. This will come later when we change the hazard attributes to a better data format I suppose.

Not sure I understand.

Sorry that I was unclear! What I meant is why is the default_var variable a list (because then the ordering matters in that list, which is not directly obvious)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This will come later when we change the hazard attributes to a better data format I suppose.

Then we need an alternative. I would not like to have event_id set to 100..100+X and event_name to 1..1+X. My proposal is then to simply leave the event name empty, and let the Hazard initializer handle these cases in the future.

What I meant is why is the default_var variable a list (because then the ordering matters in that list, which is not directly obvious)?

Still unclear. Do you mean default_value? Yes, of course ordering matters, because the list relates to the DEF_DATA_VARS. Again, I can change initialization to row-wise, that would probably make it less confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need an alternative. I would not like to have event_id set to 100..100+X and event_name to 1..1+X. My proposal is then to simply leave the event name empty, and let the Hazard initializer handle these cases in the future.

I think having an empty hazard event is not an option, this would break some functionalities. At the moment, the initialization does NOT handle empty event_names properly, so we would have to wait until we fix that (or do it now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still unclear. Do you mean default_value? Yes, of course ordering matters, because the list relates to the DEF_DATA_VARS. Again, I can change initialization to row-wise, that would probably make it less confusing.

Exactly, the ordering of the values for default_value at that point in the code is not obvious. But it is also not a big deal. Maybe just a comment referring to DEF_DATA_VARS for the ordering/content?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an empty hazard event is not an option, this would break some functionalities. At the moment, the initialization does NOT handle empty event_names properly, so we would have to wait until we fix that (or do it now).

Okay, then I suggest the following: We create a follow-up issue to update the default initialization of Hazard. For now, we keep the functionality of from_xarray_raster as is, i.e. event_name stores the date. We just make sure that the event_name is indeed a string (because we should always be able to write those). We can do that by calling dt.strftime on the array and storing the result.

Maybe just a comment referring to DEF_DATA_VARS for the ordering/content?

Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then I suggest the following: We create a follow-up issue to update the default initialization of Hazard. For now, we keep the functionality of from_xarray_raster as is, i.e. event_name stores the date. We just make sure that the event_name is indeed a string (because we should always be able to write those). We can do that by calling dt.strftime on the array and storing the result.

Good idea! Let us do this then. @sarah-hlsn can you make the event_name to be the string version of the date as suggested by @peanutfun ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do!

climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member

@sarah-hlsn This is coming along very nicely! Could you add an explicit check for event_name in test_2D_time? I am not sure if calling list on the DataArray will always result in a "flat" list. Maybe we have to create the list from <arr>.values.flat.

@chahank
Copy link
Member

chahank commented Nov 17, 2023

@sarah-hlsn : could you then please update the changelog too? Then we can soon merge this bugfix.

@sarah-hlsn
Copy link
Collaborator Author

@chahank Yes will do, I am currently working on the additional test suggested by @peanutfun but I may need some more time because it is not working as I expected

@peanutfun
Copy link
Member

@sarah-hlsn Let me know if you need any help

CHANGELOG.md Outdated
@@ -14,6 +14,7 @@ Code freeze date: YYYY-MM-DD

### Changed

- Change default event name to string of event date when importing Hazard data from xarray [#795](https://github.com/CLIMADA-project/climada_python/pull/795)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add under fixed the issue that was fixed with this pull request please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just mentioning this under Fixed is enough

@chahank
Copy link
Member

chahank commented Nov 17, 2023

Looks good to me. @peanutfun merge?

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good @sarah-hlsn! 🙌 I only have a few stylistic suggestions.

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated
@@ -14,6 +14,7 @@ Code freeze date: YYYY-MM-DD

### Changed

- Change default event name to string of event date when importing Hazard data from xarray [#795](https://github.com/CLIMADA-project/climada_python/pull/795)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just mentioning this under Fixed is enough

CHANGELOG.md Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
climada/hazard/test/test_base_xarray.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member

@chahank, would you go trough my suggestions and merge afterwards?

chahank and others added 2 commits November 20, 2023 10:01
Co-authored-by: Lukas Riedel <34276446+peanutfun@users.noreply.github.com>
@chahank chahank merged commit 05ee576 into develop Nov 20, 2023
3 of 5 checks passed
@chahank
Copy link
Member

chahank commented Nov 20, 2023

Thanks @sarah-hlsn and @peanutfun !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants