### Use pyhomogenize to check the time axis of netCDF file(s): `time_control`

Now we use pyhomogenize's `time_control` class. Calling the class with a file name opens the netCDF file automatically; here we use one of pyhomogenize's test netCDF files.

In [1]:
import pyhomogenize as pyh

In [2]:
time_control = pyh.time_control(pyh.test_netcdf[0])
time_control.ds

Output: the dataset's dask-backed arrays (all chunks are `numpy.ndarray`), summarized from the dataset representation.

| Shape | Chunk shape | Bytes | Chunk bytes | Count | Type |
|---|---|---|---|---|---|
| (412, 424) | (412, 424) | 1.33 MiB | 1.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (412, 424) | (412, 424) | 1.33 MiB | 1.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (7, 2) | (1, 2) | 112 B | 16 B | 15 Tasks, 7 Chunks | object |
| (412, 424, 4) | (412, 424, 4) | 5.33 MiB | 5.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (412, 424, 4) | (412, 424, 4) | 5.33 MiB | 5.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (7, 412, 424) | (1, 412, 424) | 4.66 MiB | 682.38 kiB | 15 Tasks, 7 Chunks | float32 |


Let's have a look at the dataset's time axis.

In [3]:
time_control.time

CFTimeIndex([2007-01-16 12:00:00, 2007-02-15 00:00:00, 2007-03-16 12:00:00,
             2007-04-16 00:00:00, 2007-05-16 12:00:00, 2007-06-16 00:00:00,
             2007-07-16 12:00:00],
            dtype='object', length=7, calendar='noleap', freq='None')

We can check whether the time axis contains duplicated, missing or redundant time steps. A redundant time step is a time step that does not match the dataset's calendar and/or frequency.

In [4]:
duplicates = time_control.get_duplicates()
redundants = time_control.get_redundants()
missings   = time_control.get_missings()

In [5]:
duplicates, redundants, missings

('', '', '')

We see that the time axis contains no incorrect time steps and that no time steps are missing. This is not a very instructive example. We can combine the three requests above by using the method `check_timestamps`.

In [6]:
timechecker1 = time_control.check_timestamps()
timechecker1

<pyhomogenize._time_control.time_control at 0x7f7dca066790>

As we can see, the method returns a `time_control` object again, but with three new attributes.

In [7]:
timechecker1.duplicated_timesteps, timechecker1.missing_timesteps, timechecker1.redundant_timesteps

({'tas': ''}, {'tas': ''}, {'tas': ''})
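Because the test file has a clean time axis, all three results are empty. To make the three categories more concrete, here is a minimal sketch that is independent of pyhomogenize and uses only the standard library. The function `classify_timesteps` and the daily time axis are hypothetical, and pyhomogenize's own checks (which also account for the calendar) may differ in detail.

```python
from collections import Counter
from datetime import datetime, timedelta


def classify_timesteps(times, start, end, step):
    """Return (duplicates, redundants, missings) relative to a regular grid.

    Hypothetical helper for illustration; not part of pyhomogenize.
    """
    # Build the expected regular time axis from start to end (inclusive).
    expected = []
    current = start
    while current <= end:
        expected.append(current)
        current += step
    expected_set = set(expected)

    counts = Counter(times)
    # Time steps that occur more than once.
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    # Time steps that do not lie on the expected grid.
    redundants = sorted(t for t in counts if t not in expected_set)
    # Expected time steps that never occur.
    missings = [t for t in expected if t not in counts]
    return duplicates, redundants, missings


# Hypothetical daily time axis with one problem of each kind.
times = [
    datetime(2007, 1, 1),
    datetime(2007, 1, 2),
    datetime(2007, 1, 2),     # duplicated
    datetime(2007, 1, 3, 6),  # redundant: off the daily grid
    datetime(2007, 1, 5),     # 2007-01-03 and 2007-01-04 are missing
]
duplicates, redundants, missings = classify_timesteps(
    times, datetime(2007, 1, 1), datetime(2007, 1, 5), timedelta(days=1)
)
print(duplicates, redundants, missings)
```

Here `duplicates` contains 2007-01-02, `redundants` contains 2007-01-03 06:00, and `missings` contains 2007-01-03 and 2007-01-04.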

To test the time axis only for duplicated time steps, we pass `selection='duplicates'`.

timechecker2 = time_control.check_timestamps(selection='duplicates')
timechecker2.duplicated_timesteps

By setting the parameter `correct` to `True`, we can delete duplicated and redundant time steps if any exist. Of course, in our example this is not the case.

In [8]:
timechecker3 = time_control.check_timestamps(correct=True)
timechecker3.time

CFTimeIndex([2007-01-16 12:00:00, 2007-02-15 00:00:00, 2007-03-16 12:00:00,
             2007-04-16 00:00:00, 2007-05-16 12:00:00, 2007-06-16 00:00:00,
             2007-07-16 12:00:00],
            dtype='object', length=7, calendar='noleap', freq='None')
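Since the test file needs no correction, the time axis is unchanged. Conceptually, such a correction keeps the first occurrence of every time step that fits the expected frequency. The following is a rough standard-library sketch of that idea, not pyhomogenize's implementation; `drop_duplicates_and_redundants` and the daily axis are hypothetical.

```python
from datetime import datetime, timedelta


def drop_duplicates_and_redundants(times, start, step):
    """Keep the first occurrence of each time step lying on the regular grid.

    Hypothetical helper for illustration; not part of pyhomogenize.
    """
    seen = set()
    cleaned = []
    for t in times:
        # A time step is on the grid if its offset from start is a multiple of step.
        on_grid = (t - start) % step == timedelta(0)
        if on_grid and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return cleaned


# Hypothetical daily time axis with a duplicated and an off-grid time step.
times = [
    datetime(2007, 1, 1),
    datetime(2007, 1, 2),
    datetime(2007, 1, 2),     # duplicated, dropped
    datetime(2007, 1, 3, 6),  # off the daily grid, dropped
    datetime(2007, 1, 5),
]
cleaned = drop_duplicates_and_redundants(
    times, datetime(2007, 1, 1), timedelta(days=1)
)
print(cleaned)
```

Note that missing time steps cannot be repaired this way; they can only be reported.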

We can set the parameter `output` to write the resulting dataset to a netCDF file on disk under the given file name. In that case, the parameter `correct` is automatically set to `True`.

In [9]:
timechecker4 = time_control.check_timestamps(output='output.nc')

Now we want to select a specific time range. We copy our `time_control` object to keep the original object unchanged.

In [10]:
from copy import copy
time_control1 = copy(time_control)
selected1 = time_control1.select_time_range(['2007-02-01','2007-03-30'])
selected1

<pyhomogenize._time_control.time_control at 0x7f7dc97fab80>

Here again, we get a `time_control` object, but now with a different time axis.

In [11]:
selected1.time

CFTimeIndex([2007-02-15 00:00:00, 2007-03-16 12:00:00],
            dtype='object', length=2, calendar='noleap', freq=None)
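The selection can be pictured as keeping every time step between the two bounds. The sketch below reproduces the result on plain `datetime` objects; it assumes inclusive bounds and a standard calendar, whereas the dataset itself uses `cftime` dates with a `noleap` calendar. The helper `select_between` is hypothetical.

```python
from datetime import datetime


def select_between(times, time_range, fmt="%Y-%m-%d"):
    """Return the time steps lying between the two bounds (assumed inclusive).

    Hypothetical helper for illustration; not part of pyhomogenize.
    """
    left, right = (datetime.strptime(s, fmt) for s in time_range)
    return [t for t in times if left <= t <= right]


# The seven monthly time steps of the test file, as standard datetimes.
times = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15, 0),
    datetime(2007, 3, 16, 12), datetime(2007, 4, 16, 0),
    datetime(2007, 5, 16, 12), datetime(2007, 6, 16, 0),
    datetime(2007, 7, 16, 12),
]
selected = select_between(times, ["2007-02-01", "2007-03-30"])
print(selected)
```

As in the notebook output, only the February and March time steps remain.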

Of course, we can also write the result to disk as a netCDF file.

In [12]:
time_control2 = copy(time_control)
selected2 = time_control2.select_time_range(['2007-02-01','2007-03-30'], 
                                            output='output.nc')
selected2.ds

Output: the selected dataset's dask-backed arrays (all chunks are `numpy.ndarray`), summarized from the dataset representation.

| Shape | Chunk shape | Bytes | Chunk bytes | Count | Type |
|---|---|---|---|---|---|
| (412, 424) | (412, 424) | 1.33 MiB | 1.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (412, 424) | (412, 424) | 1.33 MiB | 1.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (2, 2) | (1, 2) | 32 B | 16 B | 17 Tasks, 2 Chunks | object |
| (412, 424, 4) | (412, 424, 4) | 5.33 MiB | 5.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (412, 424, 4) | (412, 424, 4) | 5.33 MiB | 5.33 MiB | 2 Tasks, 1 Chunks | float64 |
| (2, 412, 424) | (1, 412, 424) | 1.33 MiB | 682.38 kiB | 17 Tasks, 2 Chunks | float32 |


If we want to crop the time axis to user-specified start and end months, as shown in the earlier `basics.date_range_to_frequency_limits` example, we can do this with netCDF files as well. In this example the cropped time axis should start at the beginning of any season (March, June, September or December) and end at the end of any season (February, May, August or November).

In [13]:
time_control3 = copy(time_control)
selected3 = time_control3.select_limited_time_range(smonth=[3,6,9,12], 
                                                    emonth=[2,5,8,11], 
                                                    output='output.nc')
selected3.time

CFTimeIndex([2007-03-16 12:00:00, 2007-04-16 00:00:00, 2007-05-16 12:00:00],
            dtype='object', length=3, calendar='noleap', freq='732H')
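For monthly data, the cropping can be thought of as follows: drop leading time steps until a month listed in `smonth` is reached, and drop trailing time steps until a month listed in `emonth` is reached. The sketch below applies this reading to plain `datetime` objects; the helper `limit_to_months` is hypothetical, and pyhomogenize's actual logic (for example for sub-monthly data) may differ.

```python
from datetime import datetime


def limit_to_months(times, smonth, emonth):
    """Crop a sorted time axis to start in a month of smonth and end in a month of emonth.

    Hypothetical helper for illustration; not part of pyhomogenize.
    """
    # Index of the first time step whose month is an allowed start month.
    start = next((i for i, t in enumerate(times) if t.month in smonth), None)
    # Index of the last time step whose month is an allowed end month.
    end = next(
        (i for i in range(len(times) - 1, -1, -1) if times[i].month in emonth),
        None,
    )
    if start is None or end is None or start > end:
        return []
    return times[start:end + 1]


# The seven monthly time steps of the test file, as standard datetimes.
times = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15, 0),
    datetime(2007, 3, 16, 12), datetime(2007, 4, 16, 0),
    datetime(2007, 5, 16, 12), datetime(2007, 6, 16, 0),
    datetime(2007, 7, 16, 12),
]
limited = limit_to_months(times, smonth=[3, 6, 9, 12], emonth=[2, 5, 8, 11])
print(limited)
```

The result runs from March to May, matching the notebook output. The remaining time steps are evenly spaced by 30.5 days, which is consistent with the reported frequency of `732H`.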

Now we want to check whether a time range, given by a left and a right bound, lies within the dataset's time axis. The parameter `fmt` specifies the date format of the bounds.

In [14]:
time_control.within_time_range(['2007-02-01','2007-03-30'])

True

In [15]:
time_control.within_time_range(['2007-02-01','2008-03-30'])

False

In [16]:
time_control.within_time_range(['20070201','20070330'], fmt='%Y%m%d')

True
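The three results are consistent with a test of whether both bounds fall inside the span covered by the time axis (2007-01-16 to 2007-07-16). Assuming that reading, the check can be sketched with the standard library as follows; the helper `range_within_axis` is hypothetical and not pyhomogenize's implementation.

```python
from datetime import datetime


def range_within_axis(times, time_range, fmt="%Y-%m-%d"):
    """Return True if both bounds lie between the first and last time step.

    Hypothetical helper for illustration; not part of pyhomogenize.
    """
    left, right = (datetime.strptime(s, fmt) for s in time_range)
    return min(times) <= left and right <= max(times)


# The seven monthly time steps of the test file, as standard datetimes.
times = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15, 0),
    datetime(2007, 3, 16, 12), datetime(2007, 4, 16, 0),
    datetime(2007, 5, 16, 12), datetime(2007, 6, 16, 0),
    datetime(2007, 7, 16, 12),
]
inside = range_within_axis(times, ["2007-02-01", "2007-03-30"])
outside = range_within_axis(times, ["2007-02-01", "2008-03-30"])
compact = range_within_axis(times, ["20070201", "20070330"], fmt="%Y%m%d")
print(inside, outside, compact)
```

Under this assumption the sketch gives `True`, `False` and `True`, the same values as the three notebook cells.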