Merge pull request #55 from rcjackson/xarray_base
FIX: Wrong id being placed in cell_mask.
fsenf committed Jul 26, 2021
2 parents 50a99a8 + 8d974c3 commit fd9ce2f
Showing 12 changed files with 66 additions and 12 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -56,8 +56,18 @@ pip install tobac/

Contributing
------------

Please take a look at our CONTRIBUTING file.

License
-------
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
=======
The current development branch is v2.0-dev.

For more details on contributing, please see https://github.com/climate-processes/tobac/blob/v2.0-dev/CONTRIBUTING.md

Roadmap
------------
A roadmap for the future development of tobac is available here: https://github.com/fsenf/tobac-roadmap/blob/master/tobac-roadmap-main.md

34 changes: 31 additions & 3 deletions doc/data_input.rst
@@ -1,7 +1,35 @@
Data input and output
*Data input and output
======================
The different analysis steps in tobac produce either xarray Datasets for one-dimensional data, such as lists of identified features or cloud trajectories, or xarray DataArrays for 2D/3D/4D fields such as cloud masks.

Input data for tobac should consist of one or more fields on a common, regular grid with a time dimension and two or more spatial dimensions. The input data should also include latitude and longitude coordinates, either as 1-d or 2-d variables depending on the grid used.

Interoperability with Iris and pandas is provided by convenience functions that convert between the data types.
xarray DataArrays can be converted into Iris cubes using xarray's `to_iris() <http://xarray.pydata.org/en/stable/generated/xarray.DataArray.to_iris.html>`_ method, while the Iris cubes produced as output of tobac can be turned into xarray DataArrays using the `from_iris() <http://xarray.pydata.org/en/stable/generated/xarray.DataArray.from_iris.html>`_ method.
xarray Datasets can be converted into pandas DataFrames using the Dataset's `to_dataframe() <http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_dataframe.html>`_ method, while pandas DataFrames can be turned into xarray Datasets using the `to_xarray() <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_xarray.html>`_ method.
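
The pandas/xarray conversion described above can be sketched as follows. This is a minimal illustration, not tobac code: the `features` table is hypothetical, and only the column names are borrowed from the feature detection output.

```python
import pandas as pd
import xarray as xr

# Hypothetical feature table (assumption: not produced by tobac here).
features = pd.DataFrame(
    {
        "frame": [0, 0, 1],
        "hdim_1": [10.5, 40.0, 11.0],
        "hdim_2": [20.0, 35.5, 21.5],
    },
    index=pd.Index([1, 2, 3], name="feature"),
)

# pandas DataFrame -> xarray Dataset; the index becomes the "feature" dimension.
ds = features.to_xarray()

# xarray Dataset -> pandas DataFrame; the dimension becomes the index again.
round_trip = ds.to_dataframe()
```

Each DataFrame column becomes a data variable in the Dataset, so per-feature properties stay addressable by name in both representations.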


For the future development of the next major version of tobac, we are envisaging moving the basic data structures from Iris cubes to xarray DataArrays for improved computing performance and interoperability with other open-source software packages.

The different analysis steps in tobac produce either pandas DataFrames for one-dimensional data, such as lists of identified features or cloud trajectories, or Iris cubes for 2D/3D/4D fields such as cloud masks. Note that the dataframe output from tracking is a superset of the features dataframe.

(A quick note on terms: a “feature” is a detected object at a single time step; a “cell” is a series of features linked together over multiple time steps.)

Overview of the output dataframe from feature_detection:
- Frame: the index along the time dimension in which the feature was detected
- hdim_1, hdim_2…: the central index location of the feature in the spatial dimensions of the input data
- num: the number of connected pixels that meet the threshold for detection for this feature
- threshold_value: the threshold value that was used to detect this feature. When using feature_detection_multithreshold, this is the maximum or minimum threshold value used, depending on whether the threshold values increase (e.g. precipitation) or decrease (e.g. temperature) with intensity.
- feature: a unique integer >0 value corresponding to each feature
- time: the date and time of the feature, in datetime format
- timestr: the date and time of the feature in string format
- latitude, longitude: the central lat/lon of the feature
- x,y, etc: these are the central location of the feature in the original dataset coordinates

Also in the tracked output:
- Cell: the cell to which each feature belongs; NaN if the feature could not be linked into a valid trajectory
- time_cell: The time of the feature along the tracked cell, in numpy.timedelta64[ns] format
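
To make the feature/cell distinction concrete, here is a small sketch with a hand-built table. The data are invented, and treating `time_cell` as the time elapsed since the first feature of the same cell is an assumption about the column's meaning, not something this document states.

```python
import pandas as pd

# Hypothetical tracked output: features 1-3 form cell 1, feature 4 is unlinked.
tracks = pd.DataFrame(
    {
        "feature": [1, 2, 3, 4],
        "frame": [0, 1, 2, 1],
        "time": pd.to_datetime(
            [
                "2021-07-26 00:00",
                "2021-07-26 00:05",
                "2021-07-26 00:10",
                "2021-07-26 00:05",
            ]
        ),
        "cell": [1.0, 1.0, 1.0, float("nan")],
    }
)

# Keep only features that were linked into a cell (cell is not NaN).
linked = tracks.dropna(subset=["cell"]).copy()

# Assumed definition: time since the first feature of the same cell.
first_time = linked.groupby("cell")["time"].transform("min")
linked["time_cell"] = linked["time"] - first_time
```

Because the tracked table is a superset of the features table, dropping the rows with a missing cell leaves exactly the features that belong to a trajectory.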

The output from segmentation is an n-dimensional array on the same coordinates as the input data. It has a single field, which provides a mask for the pixels in the data that are linked to each detected feature by the segmentation routine. Each non-zero value in the array is the integer ID of the feature to which that region is attributed.
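
As an illustration of how such a mask can be read, the sketch below uses a small hand-written NumPy array in place of real segmentation output. The array values and feature IDs are invented.

```python
import numpy as np

# Hypothetical 2D segmentation mask: 0 is background, other values are feature IDs.
mask = np.array(
    [
        [0, 0, 3, 3],
        [0, 7, 3, 0],
        [7, 7, 0, 0],
    ]
)

# Feature IDs present in the mask, excluding the background value.
feature_ids = np.unique(mask[mask != 0])

# Number of pixels attributed to each feature.
pixel_counts = {int(f): int((mask == f).sum()) for f in feature_ids}

# Row and column indices of the pixels that belong to feature 7.
rows, cols = np.nonzero(mask == 7)
```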

Note that in future versions of tobac, it is planned to combine both output data types into a single hierarchical data structure containing both spatial and object information. Additional information about the planned changes can be found in the v2.0-dev project, as well as in the tobac roadmap.

9 changes: 9 additions & 0 deletions tobac/examples/eric_test.py
@@ -0,0 +1,9 @@
from tobac.themes import tint
from copy import deepcopy
track_params = deepcopy(tint.objects.default_params)

nc_file_path = ('/home/rjackson/tracer-jcss/Tgrid_*.nc')
nc_grid = tint.io.load_cfradial_grids(nc_file_path)
# print(nc_grid)
tracks = tint.make_tracks(nc_grid, 'reflectivity', params=track_params)
print(tracks)
Binary file modified tobac/themes/tint/__pycache__/__init__.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/grid_utils.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/helpers.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/io.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/matching.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/objects.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/phase_correlation.cpython-38.pyc
Binary file not shown.
Binary file modified tobac/themes/tint/__pycache__/tracks.cpython-38.pyc
Binary file not shown.
25 changes: 16 additions & 9 deletions tobac/themes/tint/tracks.py
@@ -40,7 +40,7 @@ def make_tracks(grid_ds, field, params=None):
counter = Counter()
tracks = pd.DataFrame()
cell_mask = xr.DataArray(
np.zeros((len(times), frame2.shape[0], frame2.shape[1])),
np.ones((len(times), frame2.shape[0], frame2.shape[1])),
dims=('time', 'x', 'y'))
for i in range(1, len(times)):
raw1 = raw2
@@ -72,8 +72,12 @@ def make_tracks(grid_ds, field, params=None):
record.add_uids(current_objects)
tracks = write_tracks(
tracks, record, current_objects, obj_props)
cell_mask[i] = frame1

uids = np.array([int(x) for x in current_objects['uid']])
id2 = np.array([int(x) for x in current_objects['id2']])
for j in range(uids.max()):
ind = np.argwhere(uids == j)
cell_mask[i, :, :] = np.where(frame2 == id2[ind], j, cell_mask[i, :, :])
cell_mask[i, :, :] = np.ma.masked_where(frame2 == 0, cell_mask[i, :, :])
record.update_scan_and_time(grid_obj1)
tracks = tracks.set_index(['cell'])
tracks = tracks.to_xarray()
@@ -86,7 +90,7 @@ def make_tracks(grid_ds, field, params=None):
tracks["cell_id"].attrs["parent"] = "storm_id"
tracks["cell_id"].attrs["parent_id"] = "cell_parent_storm_id"
tracks["cell_parent_storm_id"] = grid_ds["storm_id"]
tracks["cell_mask"] = cell_mask.astype(int)
tracks["cell_mask"] = cell_mask
tracks["cell_mask"].attrs["cf_role"] = grid_ds.attrs["tree_id"]
tracks["cell_mask"].attrs["long_name"] = "cell ID for this grid cell"
tracks["cell_mask"].attrs['coordinates'] = 'cell_id time latitude longitude'
@@ -144,8 +148,6 @@ def make_tracks_2d_field(grid_ds, field, params=None):
current_objects = None
newRain = True
continue
print(frame2.shape)
print(frame1.shape)
global_shift = get_global_shift(raw1, raw2, params)
pairs = get_pairs(frame1, frame2, global_shift, current_objects, record, params)
if newRain:
@@ -162,8 +164,13 @@ def make_tracks_2d_field(grid_ds, field, params=None):
record.add_uids(current_objects)
tracks = write_tracks(
tracks, record, current_objects, obj_props)
cell_mask[i] = frame1

uids = np.array([int(x) for x in current_objects['uid']])
id2 = np.array([int(x) for x in current_objects['id2']])
for j in range(uids.max()):
ind = np.argwhere(uids == j)
cell_mask[i, :, :] = np.where(frame2 == id2[ind], j, cell_mask[i, :, :])
cell_mask[i, :, :] = np.ma.masked_where(frame2 == 0, cell_mask[i, :, :])
cell_mask = np.ma.masked_where(cell_mask == 0, cell_mask)
record.update_scan_and_time(grid_obj1)
tracks = tracks.set_index(['cell'])
tracks = tracks.to_xarray()
@@ -176,7 +183,7 @@ def make_tracks_2d_field(grid_ds, field, params=None):
tracks["cell_id"].attrs["parent"] = "storm_id"
tracks["cell_id"].attrs["parent_id"] = "cell_parent_storm_id"
tracks["cell_parent_storm_id"] = grid_ds["storm_id"]
tracks["cell_mask"] = cell_mask.astype(int)
tracks["cell_mask"] = cell_mask
tracks["cell_mask"].attrs["cf_role"] = grid_ds.attrs["tree_id"]
tracks["cell_mask"].attrs["long_name"] = "cell ID for this grid cell"
tracks["cell_mask"].attrs['coordinates'] = 'cell_id time latitude longitude'
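
The relabelling step in both hunks replaces each per-frame object label in `frame2` with the matching track identifier. Below is a minimal standalone sketch of that idea using a lookup table instead of a loop over uids. It is not the code from this commit: `relabel_frame` is a hypothetical helper, and reading `id2` as the label in the frame and `uid` as the persistent cell identifier is an assumption based on the diff. The background value defaults to -1 so that a uid of 0 stays distinguishable from background.

```python
import numpy as np


def relabel_frame(frame, frame_ids, uids, background=-1):
    """Map per-frame object labels to track uids (hypothetical helper).

    Labels not listed in frame_ids, including 0, become `background`.
    """
    frame = np.asarray(frame, dtype=int)
    frame_ids = np.asarray(frame_ids, dtype=int)
    uids = np.asarray(uids, dtype=int)

    # Lookup table indexed by frame label; large enough for every label seen.
    size = max(int(frame.max(initial=0)), int(frame_ids.max(initial=0))) + 1
    lookup = np.full(size, background, dtype=int)
    lookup[frame_ids] = uids

    # Integer-array indexing applies the mapping to every pixel at once.
    return lookup[frame]


# Frame label 1 is tracked as uid 5, frame label 2 as uid 0.
frame2 = np.array([[0, 1, 1], [2, 0, 1]])
relabelled = relabel_frame(frame2, frame_ids=[1, 2], uids=[5, 0])
```

A lookup table handles every label in one pass and does not depend on the uids being contiguous, which a loop over `range(uids.max())` would.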