HADs 360 day calendar conversion #32
Comments
If we change the align_on option to 'date' it should fix it for this dataset, but we might want to
|
Current workaround from @RuthBowyer is to remove 5 days (e.g. the 31st of the month); the duplicate days were not noticed, which would be good to document. @aranas discussed producing a script to reproduce the error as a double check. |
@RuthBowyer Could you please specify which file only runs up to February 27th? I am not sure I can reproduce your error.
So essentially the dates have been realigned in terms of absolute position, which means that a March file might only run until the 29th, and March 30th would appear in the April file. To me it seems this is not what we want because it leads to duplicate dates, e.g. 1980-03-30 appears both in tasmax_hadukgrid_uk_1km_day_19800301-19800330.nc and in tasmax_hadukgrid_uk_1km_day_19800401-19800430.nc. So my questions are:
|
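One way to confirm the overlap described above is to collect the date labels found in each monthly file and report any label that turns up in more than one file. This is only a minimal sketch: `dates_in_multiple_files` is a hypothetical helper, and the date lists are made-up stand-ins for the time coordinates that would be read from the two files named above.

```python
from collections import defaultdict


def dates_in_multiple_files(file_dates):
    """Map each date label that occurs in more than one file to those files."""
    owners = defaultdict(list)
    for name, labels in file_dates.items():
        for label in set(labels):  # ignore repeats within a single file
            owners[label].append(name)
    return {
        label: sorted(names)
        for label, names in owners.items()
        if len(names) > 1
    }


# Hypothetical contents; only the file names come from the thread.
march_file = "tasmax_hadukgrid_uk_1km_day_19800301-19800330.nc"
april_file = "tasmax_hadukgrid_uk_1km_day_19800401-19800430.nc"
file_dates = {
    march_file: ["1980-03-28", "1980-03-29", "1980-03-30"],
    april_file: ["1980-03-30", "1980-04-01", "1980-04-02"],
}
overlaps = dates_in_multiple_files(file_dates)
```

With these stand-in lists, `overlaps` contains a single entry for 1980-03-30 pointing at both files.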
Commit (sorry, I forgot to link the issue): 62b86f5 |
Thanks so much Sophie! Just for my own clarity, where you say:
Does this mean the 1km HadUK-Grid data is being resampled to 30 days before it's being resampled to the 2.2km grid? My understanding was the other way round, but not a problem either way! I was using a file a few steps down the pipeline but because I added a little bit of code to rename the layers (the input files I was using are in this series
In answer therefore to your Qs:
Yes this is confusing, and what I think confused me initially! But we could work with it if we just have 360 days in every year?
See answer above
Yep, in that I am using this data as the observation data for the bias correction, and have already done so. However, I don't think we need to worry very much about this: the dates will align well enough to reflect the seasonality of the observations accurately (i.e. the weather on 30th March is quite similar to 1st April), but I think aligning as well as possible is a good aim going forwards. I'll proceed with BC for the three cities and the UK-wide data for now, but for comparing the methods across the 3 cities, it would be fairly straightforward to just rerun the assessments etc. when we've decided on a final dataset for this issue :-)
It's a great Q - to me, whatever the default in the resampler would be fine, as whoever made it probably knows more about it than me (!). I guess because a year has >360 days my assumption would be days would just be shifted to accommodate this and then otherwise dropped (eg 1-2nd March becomes 29th-30th Feb) and then surplus '31st's just get dropped? We could ask the MO folks if they have a preference? |
Sorry, I just realised from talking to @gmingas where the 27 day thing I had mentioned was coming from @aranas
Has only 27 vals (i.e. only 27 days in Feb rather than 30). This doesn't affect my answers above, however, and I guess as long as we're aligning across 360 days it's OK? |
Also, relatedly, I believe the underlying raw HADs data contain NA cells, and therefore there are grid cells in the CPM crops which don't have a value in the corresponding HADs crops. This is also relevant for #34 and for applying the subsequent bias correction. @gmingas I've just been subsetting them to match, although thinking about it, this could actually be causing the issue observed in #33, which would be great!! |
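The "subsetting to match" step mentioned above could look roughly like the following. It is a sketch under stated assumptions, not the code actually used: the flat lists stand in for flattened grids, `subset_to_common_cells` is a hypothetical name, and missing cells are assumed to be encoded as NaN.

```python
import math


def subset_to_common_cells(obs, model):
    """Keep only the cells where both the observation and model have values."""
    if len(obs) != len(model):
        raise ValueError("obs and model must cover the same cells")
    keep = [
        i
        for i, (o, m) in enumerate(zip(obs, model))
        if not (math.isnan(o) or math.isnan(m))
    ]
    return [obs[i] for i in keep], [model[i] for i in keep]


# Made-up values: the second observation cell is missing.
hads_cells = [4.2, float("nan"), 5.1, 3.9]
cpm_cells = [4.0, 4.8, 5.3, 4.1]
hads_subset, cpm_subset = subset_to_common_cells(hads_cells, cpm_cells)
```

Both returned lists have the same length and drop the cell where the observation is missing, so the two datasets stay aligned cell for cell.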
Also, if we're re-running this anyway, maybe we could rename the HADs output from 'rainfall' to 'pr' to match the naming convention of the CPM (it would make processing easier, for me at least!) |
Hi @RuthBowyer ,
Temporal resampling does indeed happen after spatial resampling, you're right! I meant the resampled versions of those files. |
I think I finally understand why we get the duplicates, and I think this might be a bit of a bug in the xarray convert_calendar function when applying it to partial files. And this is coherent with the documentation because when converting
So what happens is that April 1st, instead of being "dropped" (as correctly happens for non-leap years), is simply renamed to the previous day (e.g. March 30th), and all other dates are shifted (e.g. April 2nd becomes April 1st). So if we simply delete the second instance of each duplicate date, we achieve the output specified by the documentation, and the calendars should align nicely. However, this can only be done once the files are recombined, which happens at two distinct points because the preprocessing happens in both R and Python, so maybe @gmingas we can find the right spot for correcting this together during coworking on Monday. In the meantime I have also filed a bug with a minimal complete verifiable example over on the xarray repo, to confirm that this is truly a bug: pydata/xarray#8086 |
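The "delete the second instance" idea can be sketched in plain Python as follows. This assumes the recombined data can be treated as an ordered list of (date label, value) pairs; `drop_duplicate_dates` and the values are hypothetical and only illustrate the order of operations.

```python
def drop_duplicate_dates(records):
    """Keep the first record for each date label, preserving time order."""
    seen = set()
    kept = []
    for label, value in records:
        if label in seen:
            continue  # a later instance of a date already kept
        seen.add(label)
        kept.append((label, value))
    return kept


# Made-up values. In the April file, the record labelled 1980-03-30 is the
# renamed April 1st and the record labelled 1980-04-01 is the shifted April 2nd.
march = [("1980-03-29", 10.1), ("1980-03-30", 10.4)]
april = [("1980-03-30", 11.0), ("1980-04-01", 11.3)]
combined = drop_duplicate_dates(march + april)
```

Because the March file comes first, its 1980-03-30 record survives and the renamed April 1st record is removed. On the xarray side, a boolean mask built from the time index's pandas `duplicated()` method might serve the same purpose, but that has not been checked against these files.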
and just to confirm @RuthBowyer
Yes, this is actually correct behavior of the convert_calendar call. Because of the realignment and the dropping of February 6th in non-leap years, the file ends up running from February 2nd to February 28th, and the next file (March) will contain the dates February 29th and February 30th, so after recombining all dates we do indeed end up with 360. The question is whether it is an issue that some dates are temporarily (before combining across months) associated with the "wrong" files (e.g. February dates occurring in the March file). Also, I understand now that we don't need to interpolate data, because in this case we only drop dates and then realign, as you suggested. The only problem left to solve is deleting the duplicate dates, which will be very similar to what you have done in your hack of removing random dates, except that we end up with a clean calendar rather than keeping the duplicates. |
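The relabelling described in this comment can be worked through with a small sketch. It is not xarray's implementation: `label_360` is a hypothetical function that assumes a non-leap year, assigns labels by position in the year, and models only the drop of February 6th, so it is valid only up to the next dropped date (which is not modelled here).

```python
from datetime import date, timedelta


def label_360(d):
    """Return the (month, day) label a standard-calendar date receives.

    Assumes a non-leap year and models only the drop of 6 February.
    Returns None for the dropped date itself.
    """
    dropped = date(d.year, 2, 6)
    if d == dropped:
        return None  # this date is removed during conversion
    position = d.timetuple().tm_yday  # 1-based position in the standard year
    if d > dropped:
        position -= 1  # dates after the drop move up by one slot
    month_index, day_index = divmod(position - 1, 30)
    return (month_index + 1, day_index + 1)


# 1981 is an arbitrary non-leap year chosen for illustration.
february = [date(1981, 2, 1) + timedelta(days=n) for n in range(28)]
february_labels = [label_360(d) for d in february if label_360(d) is not None]
march_start_labels = [label_360(date(1981, 3, day)) for day in (1, 2, 3)]
```

Under these assumptions the February file holds 27 labels, running from 2 to 28 February, and the first two days of the March file are labelled 29 and 30 February, which matches the description above.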
Thank you for this clarification @aranas ! This all sounds great. Just to clarify re interpolation, there is still a reason we might consider it:
(I.e. the issue I thought was due to the shapefile was actually due to the obs data, as documented here: #33.) So we might want to interpolate the NAs, but probably i) this should be done after bias correction and ii) I should start a new issue for this to avoid confusing everyone! |
ah yes. Indeed it seems like opening a new issue would be the way to go. Then we can plan it into a sprint. |
@gmingas to sense check the script in https://github.com/alan-turing-institute/clim-recal/tree/bug_calendar |
Hi @gmingas @RuthBowyer I want to close this issue but I cannot seem to run the script all the way to the end 😭. When I SSH into the VM there is no Anaconda distribution, and I cannot install one because there is no space left. I run it with the following specs: and then I use check_calendar.py to ensure all dates are there as intended |
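For readers without access to the branch, a check of this kind might look like the sketch below. It is not the contents of check_calendar.py: `check_360_day_year` is a hypothetical function, and it assumes the date labels are available as `YYYY-MM-DD` strings.

```python
from collections import Counter


def check_360_day_year(labels, year):
    """Report missing, duplicated and unexpected labels for one 360-day year."""
    expected = {
        f"{year:04d}-{month:02d}-{day:02d}"
        for month in range(1, 13)
        for day in range(1, 31)
    }
    counts = Counter(labels)
    present = set(counts)
    missing = sorted(expected - present)
    duplicated = sorted(label for label, n in counts.items() if n > 1)
    unexpected = sorted(present - expected)
    return missing, duplicated, unexpected


# Build a complete year, then damage it to show what the check reports.
complete = [f"1980-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 31)]
damaged = [label for label in complete if label != "1980-02-06"] + ["1980-03-30"]
report_complete = check_360_day_year(complete, 1980)
report_damaged = check_360_day_year(damaged, 1980)
```

A clean calendar yields three empty lists; the damaged example reports 1980-02-06 as missing and 1980-03-30 as duplicated.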
I can give running it a go via the dyme vm! Will update with how I get on :-) |
Hey @aranas and @gmingas - This seems to have worked and I have run it for tasmax, tasmin and rainfall. To note:
To get the script to work, I had to install If the data looks OK, I think we can close this issue :) |
Thank you Ruth, I should've handed this over earlier and am happy that we can close it now :) rioxarray is an extension for the xarray package that allows it to work with geospatial data. Looking at the rioxarray documentation, we could import it directly. If we did that, we would also run the initialization code for that module, which might set up some configuration necessary for rioxarray-specific functionality. So I think we would need to dive deeper into the package to decide whether we'd rather call rioxarray directly. Benefits might be:
I guess the only con is that the syntax might change slightly, creating more work for us at this point. For now, I have made a note to mention the package in the documentation, so that users are aware of it being used implicitly. |
End result:
|
Moving to new issue as may have noticed an additional problem - this is the original thread and relevant comment:
I need to look up how 360 day calendars are sampled, but it seems that in the 360 day HADs calendars, Feb is only sampled up to the 27th (i.e. why not use all 28 days in Feb?). Presumably there's a reason, but I wanted to note it. It could be something to do with how I've pulled through the date names to the format I'm working with the data in, but I can't work it out!