# Track Mesoscale Convective Systems (MCS) at consecutive time steps to form tracks with a scheme based on brightness temperature

This notebook links the MCS detected at individual time steps to form tracks. In production, you can use `scripts/2_detect_and_track_MCS_scheme_Tb.py` for this step.

MCS are detected from their cold cloud tops: pixels within a limiting range of brightness temperature are grouped into regions, and the horizontal area of each region is approximated with its convex hull. The algorithm can operate with brightness temperature alone or combine it with precipitation features. The selection criteria and filters that decide whether a region counts as an MCS can be modified. The parameterization in this notebook uses brightness temperature only.

## The detection of the MCS (regions) is performed using these steps:

1. At each time step, find all pixels where brightness temperature `Tb` $\le 215\ K$ and build a binary mask in which the pixels that satisfy this condition equal $1$ and all others equal $0$. Trace an approximate region around each group of pixels equal to $1$ with its convex hull.
2. Transform the pixels from geographic to plane coordinates and compute the approximate area of each traced region.
3. Discard all regions whose area is $\le 1000\ km^2$.
4. Estimate the mean, minimum and maximum brightness temperature of the remaining regions.
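
Steps 1 to 3 can be sketched in plain Python on a synthetic grid. This is only an illustration and not the ATRACKCS implementation: the helper names (`label_regions`, `convex_hull`, `polygon_area`), the 4-connected grouping of pixels, and the use of pixel centers on a regular 4 km grid instead of a real map projection are all assumptions made for the example.

```python
from collections import deque

PIXEL_KM = 4.0    # assumed grid spacing in km
TB_MAX = 215.0    # K, cold cloud top threshold
AREA_MIN = 1000.0 # km^2, regions at or below this area are discarded

def label_regions(mask):
    """Group 4-connected pixels equal to 1 into lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            queue, region = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n, total = len(vertices), 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Synthetic Tb field: warm background, one large and one small cold patch
tb = [[260.0] * 20 for _ in range(20)]
for r in range(2, 14):
    for c in range(3, 15):
        tb[r][c] = 205.0
for r in range(17, 19):
    for c in range(17, 19):
        tb[r][c] = 214.0

# Step 1: binary mask and one convex hull per group of cold pixels
mask = [[1 if value <= TB_MAX else 0 for value in row] for row in tb]
# Step 2: plane coordinates (km) of pixel centers and hull area
areas = [
    polygon_area(convex_hull([(c * PIXEL_KM, r * PIXEL_KM) for r, c in region]))
    for region in label_regions(mask)
]
# Step 3: keep only regions larger than the area threshold
kept = [area for area in areas if area > AREA_MIN]
print(areas, kept)
```

Only the large patch survives the area filter. In the real workflow the hull area depends on the projection chosen with `UTM_LOCAL_ZONE`, so the numbers here are purely illustrative.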

## The tracks are performed using these steps:

Specifically, assume we have detected $n$ MCS at time $t$ and $m$ MCS at time $t+1$. In theory there are $n \times m$ possible associations linking these two groups of MCS. Of course, not all of them are meaningful. The following rules are applied in the association process:

1. **Overlapping priority** principle: for any MCS at time $t$, the MCS at time $t+1$ with the highest percentage of overlap "wins" and is associated with it.
2. MCS at time $t+1$ with a lower percentage of overlap can form tracks of their own, and wait to be associated in the next iteration between $t+1$ and $t+2$.
3. No merging or splitting is allowed: any MCS at time $t$ can be linked to only one MCS at time $t+1$, and any MCS at time $t+1$ can be linked to only one MCS at time $t$.
4. All tracks that are not updated during the $t$ to $t+1$ step terminate. This means that no gaps are allowed in a track.
5. In this first part no tracks are discarded based on their total duration. The algorithm can optionally filter the tracks by a specific minimum duration.
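
The rules above can be sketched with a greedy one-to-one matching. This is a simplified illustration, not the code of `track_mcs()`: regions are represented as sets of pixels instead of polygons, the overlap percentage is assumed to be the shared area relative to the MCS at time $t$, and the names `overlap_pct`, `associate`, `track` and `box` are hypothetical.

```python
import itertools

def overlap_pct(a, b):
    """Assumed definition: shared pixels as a percentage of region `a`."""
    return 100.0 * len(a & b) / len(a)

def associate(prev, curr, threshold=15.0):
    """Link regions one-to-one, highest overlap first (rules 1 and 3)."""
    candidates = []
    for (i, a), (j, b) in itertools.product(prev.items(), curr.items()):
        pct = overlap_pct(a, b)
        if pct >= threshold:
            candidates.append((pct, i, j))
    candidates.sort(reverse=True)
    links, used = {}, set()
    for pct, i, j in candidates:
        if i in links or j in used:
            continue  # no merging, no splitting
        links[i] = j
        used.add(j)
    return links

def track(frames, threshold=15.0):
    """frames: one dict per time step mapping an id to a set of pixels."""
    tracks = []   # every track is a list of (time index, id)
    active = {}   # id at the previous step -> the track it belongs to
    for t, curr in enumerate(frames):
        prev = frames[t - 1] if t > 0 else {}
        links = associate({i: prev[i] for i in active}, curr, threshold)
        new_active = {}
        for i, j in links.items():
            active[i].append((t, j))
            new_active[j] = active[i]
        for j in curr:
            if j not in new_active:      # rule 2: start a new track
                new_track = [(t, j)]
                tracks.append(new_track)
                new_active[j] = new_track
        active = new_active              # rule 4: unlinked tracks terminate
    return tracks

def box(r0, r1, c0, c1):
    """Rectangular block of pixels, rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

frames = [
    {"a": box(0, 10, 0, 10), "b": box(20, 25, 20, 25)},
    {"c": box(3, 13, 0, 8), "d": box(0, 3, 0, 14)},
    {"e": box(0, 3, 2, 16)},
]
tracks = track(frames)
print(tracks)
```

Here `a` overlaps both `c` (56 %) and `d` (30 %); `c` wins, `d` starts its own track and is later linked to `e`, while `b` and the `a`/`c` track end as soon as they find no successor.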

## Input data

**Brightness Temperature:**
NCEP/CPC L3 (Merge IR V1): spatial and temporal resolution of 4 km and 30 minutes, 
data available from February 7, 2000 to present. The variable of interest in this dataset is `Tb`, and the files must be in `netCDF4` format.
https://doi.org/10.5067/P4HZB9N27EKU

In this case the algorithm is run for 2 days (2001-12-30 00 UTC to 2001-12-31 22 UTC) over northern South America. The input data are in the notebooks folder of the repository. The raw data can be downloaded from the link above.

## Steps

1. Make sure you have successfully installed the ATRACKCS library.
2. Execute the following code blocks in sequence.

## Results

* `resume_Tb_2001_12_29_19_2001_12_31_13.csv`: a csv table listing various attributes of the tracks and their associated MCS.
* `map_Tb_2001_12_29_19_2001_12_31_13.html` (folium): map of the geographical locations of the MCS, with information that links each MCS to its track and features.


### Set paths

As before, first we give the location of the input (raw data) using `TBDIR` and of the output data using `OUTDIR`. `PDIR` is not set here because this notebook uses brightness temperature only.

In [1]:
%matplotlib inline
import os

TBDIR=os.path.join('2_input_data', 'tb/')

OUTDIR=os.path.join('2_output_data/')

### Parameters used in the MCS detection and tracking process

* `UTM_LOCAL_ZONE`: int, EPSG code of the UTM zone used to convert the WGS 84 geodetic coordinates to plane coordinates. This constant must match the region of interest.
* `UTC_LOCAL_HOUR`: int, offset in hours used to convert the raw data hour (UTC) to the local hour of the region of interest.
* `UTC_LOCAL_SIGN`: str (minus, plus, local), sign of the offset used to convert the raw data hour (UTC) to the local hour of the region of interest.
* `DETECT_SCHEME`: str (Tb, Both), association scheme: Tb or Both (Tb and P).
* `TB`: int, maximum brightness temperature threshold for the cold cloud top of an MCS [$K$].
* `AREA_TB`: int, minimum area required for an MCS [$km^2$].
* `THRESHOLD_OVERLAPPING_P`: int, percentage overlap limit between MCS [%].
* `LOCATION_FOLIUM`: list (lat, lon), location on which the folium map is centered.
* `MIN_DURATION`: int, minimum required duration of a track, in hours.
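
As a hedged aid for choosing `UTM_LOCAL_ZONE` for another region, the EPSG code of a WGS 84 / UTM zone can be derived from a longitude and a hemisphere: zones are 6° wide starting at 180°W, and the codes are 326xx for the northern hemisphere and 327xx for the southern one. The helper name `utm_epsg` is hypothetical and is not part of ATRACKCS.

```python
import math

def utm_epsg(lon, south):
    """EPSG code of the WGS 84 / UTM zone containing longitude `lon`."""
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    base = 32700 if south else 32600
    return base + zone

# Longitude used to center the folium map in this notebook
print(utm_epsg(-73.94, south=True))   # zone 18S
print(utm_epsg(-73.94, south=False))  # zone 18N
```

The notebook uses the southern zone for the whole domain. Keep in mind that a single UTM zone distorts distances and areas for regions far from its central meridian.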

In [2]:
#EPSG:32718 is the UTM zone 18S plane coordinate system.
#It was used for tracking MCS in South America (Colombia).
UTM_LOCAL_ZONE = 32718 

#UTC-5 is the local hour for Colombia
UTC_LOCAL_HOUR = 5
UTC_LOCAL_SIGN = "minus"

#Scheme of association
DETECT_SCHEME = "Tb"

TB = 215 #deep convection regions

AREA_TB = 1000

THRESHOLD_OVERLAPPING_P = 15

LOCATION_FOLIUM = [5, -73.94]

MIN_DURATION = 2

### Importing modules

In [3]:
#--------Import modules-------------------------
from atrackcs.utils import funcs
from atrackcs.detect import detect_mcs
from atrackcs.features import features_Tb, features_Tracks
from atrackcs.track import track_mcs

### Reading raw Tb data.

Reading the data is handled by the function `funcs.readNC()`.

Also make sure the output folder exists.

In [4]:
ds = funcs.readNC(pathTb = TBDIR, pathP =None, utc_local_hour = UTC_LOCAL_HOUR, 
                  utc_local_sign = UTC_LOCAL_SIGN)

# Create the output folder if it does not exist yet
os.makedirs(OUTDIR, exist_ok=True)

Complete Tb data reading 2001-12-29T19:00 - 2001-12-31T17:00
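
The time range in this log is in local time. A minimal sketch of the conversion controlled by `UTC_LOCAL_HOUR` and `UTC_LOCAL_SIGN` follows. The function `utc_to_local` is hypothetical (it is not the code inside `funcs.readNC()`), and treating `"local"` as "apply no shift" is an assumption.

```python
from datetime import datetime, timedelta

def utc_to_local(timestamp, hours, sign):
    """Shift a UTC timestamp by `hours` according to `sign`."""
    if sign == "minus":
        return timestamp - timedelta(hours=hours)
    if sign == "plus":
        return timestamp + timedelta(hours=hours)
    if sign == "local":  # assumption: the data are left untouched
        return timestamp
    raise ValueError(f"unknown sign: {sign!r}")

# Start and end of the input data, in UTC
start_utc = datetime(2001, 12, 30, 0)
end_utc = datetime(2001, 12, 31, 22)

start_local = utc_to_local(start_utc, 5, "minus")
end_local = utc_to_local(end_utc, 5, "minus")
print(start_local.isoformat(timespec="minutes"))
print(end_local.isoformat(timespec="minutes"))
```

With the settings for Colombia (UTC-5) the input period maps to the range reported in the log.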


The detection process is handled with the function `detect_mcs()`.

* `ds` is the `xarray.Dataset` object we just read in.

### Detecting MCS

In [5]:
mcs_l = detect_mcs(ds, detect_scheme = DETECT_SCHEME, Tb = TB, area_Tb = AREA_TB, 
                   utm_local_zone = UTM_LOCAL_ZONE, path_save = None)

Spots detection completed


The MCS Tb features are computed by the function `features_Tb()`.

* `mcs_l` is the `geopandas.GeoDataFrame` object we just created in the detection process.

In [6]:
mcs_l = features_Tb(mcs_l, ds)

Estimating Tb spots features: 


100%|████████████████████████████████████████████████████████████████████████████████| 978/978 [01:12<00:00, 13.42it/s]


* `mcs_l` is the `geopandas.GeoDataFrame` object we just updated with the Tb features.

Each row in `mcs_l` stores one MCS record. We can take a peek at the columns `mcs_l` contains so far.

* `time`: datetime64, local hour (UTC-5 in this case).
* `Tb`: float, index of the polygon after filtering with the established parameterization.
* `geometry`: geometry, MCS polygon (convex hull). To view the coordinate reference system of the geometry column, access the `crs` attribute. In this case it is `EPSG:32718`.
* `area_tb`: float, area of the polygon [$km^2$].
* `centroid_`: geometry, geometric centroid of the polygon (convex hull). The crs in this case is `EPSG:32718`.
* `mean_tb`: float, mean brightness temperature of the pixels composing the polygon [$K$].
* `min_tb`: float, minimum brightness temperature of the pixels composing the polygon [$K$].
* `max_tb`: float, maximum brightness temperature of the pixels composing the polygon [$K$].

In this case 978 MCS were detected with the selected parameterization.

In [7]:
mcs_l

| | time | Tb | geometry | area_tb | centroid_ | mean_tb | min_tb | max_tb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2001-12-29 19:00:00 | 4.0 | POLYGON ((588581.067 8980308.452, 560619.762 8... | 6881.7 | POINT (596434.594 9030696.135) | 206.3886 | 196.0 | 219.0 |
| 1 | 2001-12-29 19:00:00 | 5.0 | POLYGON ((756503.800 8983596.395, 752530.163 8... | 1788.7 | POINT (779127.063 9001996.176) | 208.6316 | 203.0 | 221.0 |
| 2 | 2001-12-29 19:00:00 | 6.0 | POLYGON ((456662.179 9024635.193, 436580.870 9... | 9958.2 | POINT (496475.685 9079680.424) | 208.5105 | 196.0 | 235.0 |
| 3 | 2001-12-29 19:00:00 | 14.0 | POLYGON ((1043090.386 9162378.939, 1031053.674... | 1423.4 | POINT (1042465.448 9183727.004) | 209.6966 | 204.0 | 216.0 |
| 4 | 2001-12-29 19:00:00 | 16.0 | POLYGON ((332056.056 9177224.939, 324010.729 9... | 22988.6 | POINT (383921.811 9259032.613) | 206.5761 | 195.0 | 231.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 974 | 2001-12-31 17:00:00 | 88.0 | POLYGON ((1324502.020 10148030.754, 1316327.87... | 2217.7 | POINT (1325217.005 10182458.642) | 209.2093 | 204.0 | 215.0 |
| 975 | 2001-12-31 17:00:00 | 94.0 | POLYGON ((-185800.524 10912282.073, -185482.70... | 2668.7 | POINT (-154911.838 10955537.688) | 212.8415 | 208.0 | 222.0 |
| 976 | 2001-12-31 17:00:00 | 95.0 | POLYGON ((-261845.639 10945946.695, -261772.99... | 6082.0 | POINT (-206888.235 10997871.976) | 213.8760 | 211.0 | 225.0 |
| 977 | 2001-12-31 17:00:00 | 101.0 | POLYGON ((66392.993 11159093.993, 58464.441 11... | 2668.3 | POINT (70470.248 11192769.061) | 215.7267 | 209.0 | 228.0 |
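
Several rows in the table above have a `max_tb` above the 215 K detection threshold (for example 235.0 in row 2). One reading that is consistent with this, although it is not confirmed here against the source of `features_Tb()`, is that the statistics are taken over every pixel inside the convex hull, which can enclose warmer pixels. The sketch below illustrates the effect on a tiny synthetic grid; the hull vertices are written by hand and all names are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_convex(point, hull):
    """True if `point` is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

# 4 x 4 synthetic Tb field indexed as tb[y][x]: an L-shaped cold core
tb = [[230] * 4 for _ in range(4)]
for x in range(4):
    tb[0][x] = 210
for y in range(4):
    tb[y][0] = 210
tb[0][0] = 200

# Convex hull of the cold pixels (Tb <= 215), written by hand for brevity
hull = [(0, 0), (3, 0), (0, 3)]

cold_values = [tb[y][x] for y in range(4) for x in range(4) if tb[y][x] <= 215]
hull_values = [tb[y][x] for y in range(4) for x in range(4)
               if inside_convex((x, y), hull)]

print(max(cold_values), max(hull_values))
print(sum(hull_values) / len(hull_values))
```

The cold core alone never exceeds 210 K, but the hull also covers three warm pixels, so the maximum over the hull reaches 230 K and the mean rises accordingly.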


In [8]:
mcs_l.crs

<Projected CRS: EPSG:32718>
Name: WGS 84 / UTM zone 18S
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 78°W and 72°W, southern hemisphere between 80°S and equator, onshore and offshore. Argentina. Brazil. Chile. Colombia. Ecuador. Peru.
- bounds: (-78.0, -80.0, -72.0, 0.0)
Coordinate Operation:
- name: UTM zone 18S
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

### Tracking MCS

The tracking process is handled by the function `track_mcs()`. At this stage the resulting tracks have not yet been characterized; their features are computed by the function `features_Tracks()`.

In [9]:
tracks_l = track_mcs(mcs_l, threshold_overlapping_percentage = THRESHOLD_OVERLAPPING_P, utm_local_zone = UTM_LOCAL_ZONE,
                path_save = None)

Estimating trajectories: 


100%|████████████████████████████████████████████████████████████████████████████████| 978/978 [01:36<00:00, 10.11it/s]


In [10]:
tracks_l = features_Tracks(tracks_l, initial_time_hour = MIN_DURATION,
                         path_save = OUTDIR)

Estimating distance and direction between geometrics centroids: 


100%|████████████████████████████████████████████████████████████████████████████████| 580/580 [00:20<00:00, 28.40it/s]


* `tracks_l` is the `geopandas.GeoDataFrame` object we just created in the tracking process; it stores the MCS polygons and the tracks.
* This `GeoDataFrame` contains the information described previously, except that some brightness temperature features are dropped and some track characteristics are added.
* The index of this object is a `pandas.MultiIndex`, which hierarchically associates an id for each track with an id for each MCS.
* By default the index values are "encrypted", meaning they are identifiers generated with the `uuid` library. To disable this, pass `encrypt_index = False` to the `features_Tracks()` function. The index is then an `int` counter produced by the tracking process.
* Encrypted indexes are useful when a long period must be processed in smaller chunks, for example every 6 months, because of limits of the hardware used to run the algorithm.
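
To see why identifiers from the `uuid` library help when results from several runs are combined, consider the sketch below. It assumes random identifiers from `uuid.uuid4()`; which `uuid` function ATRACKCS actually calls is not stated here, and `tag_tracks` is a hypothetical helper.

```python
import uuid

def tag_tracks(int_ids):
    """Map each integer track counter to a random UUID string."""
    return {i: str(uuid.uuid4()) for i in int_ids}

# Two independent runs, each numbering its tracks 0, 1, 2
run_1 = tag_tracks(range(3))
run_2 = tag_tracks(range(3))

# Integer counters collide when the runs are merged
merged_int = set(run_1) | set(run_2)
# UUID-based identifiers stay distinct
merged_uuid = set(run_1.values()) | set(run_2.values())

print(len(merged_int), len(merged_uuid))
```

With integer counters, track `0` of the first run would be confused with track `0` of the second run, while the UUID strings can be concatenated safely.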

We can take a peek at the new columns `tracks_l` contains.


* `belong`: str, encrypted index generated for each track.
* `id_gdf`: str, encrypted index generated for each MCS.
* `geometry`: geometry, polygon. The crs is `EPSG:4326` (WGS 84).
* `centroid_`: geometry, geometric centroid of the polygon. The crs is `EPSG:4326`.
* `intersection_percentage`: float, percentage of overlap between MCS [%].
* `distance_c`: float, distance between the geometric centroids of the overlapping MCS [km].
* `direction`: float, direction between the geometric centroids of the overlapping MCS [°].
* `total_duration`: float, total duration of the event or track. This value is assigned to every MCS of the corresponding track [h].
* `total_distance`: float, total distance of the event or track. This value is assigned to every MCS of the corresponding track [km].
* `mean_velocity`: float, mean velocity of the event or track. This value is assigned to every MCS of the corresponding track [$km\ h^{-1}$].

In [11]:
tracks_l

| belong | id_gdf | time | geometry | area_tb | centroid_ | mean_tb | intersection_percentage | distance_c | direction | total_duration | total_distance | mean_velocity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1504db32-f1c9 | 43a8-bed5-6ded81736368 | 2001-12-30 15:00:00 | POLYGON ((-72.73848 -9.73317, -73.24777 -9.478... | 12251.2 | POINT (-73.12759 -9.15059) | 218.2818 | 27.1 | 59.673891 | 127.3 | 4 | 184.561366 | 46.140342 |
| 1504db32-f1c9 | 481a-bc6b-75fc8626d151 | 2001-12-30 19:00:00 | POLYGON ((-72.55659 -9.47847, -73.21140 -8.787... | 10527.0 | POINT (-72.67245 -8.84978) | 213.4062 | NaN | 52.653137 | 336.8 | 4 | 184.561366 | 46.140342 |
| 1504db32-f1c9 | 4a7b-8b7f-797faeae7611 | 2001-12-30 14:00:00 | POLYGON ((-73.53880 -9.11462, -73.57518 -9.078... | 2850.0 | POINT (-73.55861 -8.82275) | 213.2022 | 21.6 | NaN | NaN | 4 | 184.561366 | 46.140342 |
| 1504db32-f1c9 | 4b09-91a3-39d8dafaad00 | 2001-12-30 17:00:00 | POLYGON ((-73.02950 -9.98787, -73.10226 -9.951... | 25972.0 | POINT (-72.48495 -9.28784) | 219.0446 | 36.7 | 72.234338 | 102.1 | 4 | 184.561366 | 46.140342 |
| 1d904d9e-5bf4 | 492e-a3a8-219dd3b43505 | 2001-12-30 06:00:00 | POLYGON ((-87.98100 4.63918, -88.01738 4.71195... | 5272.6 | POINT (-87.64802 4.98550) | 215.0362 | 58.6 | NaN | NaN | 4 | 243.509468 | 60.877367 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| f81b428b-cc1e | 4771-b908-7ed1066e49eb | 2001-12-30 02:00:00 | POLYGON ((-80.85085 6.42208, -81.57841 6.49485... | 25324.4 | POINT (-80.91121 7.06723) | 210.5013 | 21.3 | 8.411150 | 65.4 | 3 | 183.215099 | 61.0717 |
| f81b428b-cc1e | 4c10-b2b7-30781f5b3a62 | 2001-12-30 04:00:00 | POLYGON ((-80.30518 6.24015, -80.88723 6.34930... | 100973.5 | POINT (-79.46052 7.70055) | 212.0053 | NaN | 174.803949 | 66.4 | 3 | 183.215099 | 61.0717 |
| f86afe72-8ac5 | 407d-9937-7eeeea87d161 | 2001-12-29 20:00:00 | POLYGON ((-75.28497 -8.82353, -75.39410 -8.714... | 7402.4 | POINT (-75.12620 -8.40469) | 210.3094 | 21.8 | 13.553581 | 230.1 | 3 | 21.750122 | 7.250041 |
| f86afe72-8ac5 | 487a-839c-643a1738cc7f | 2001-12-29 19:00:00 | POLYGON ((-75.39410 -8.82353, -75.57599 -8.350... | 9958.2 | POINT (-75.03201 -8.32583) | 208.5105 | 68.5 | NaN | NaN | 3 | 21.750122 | 7.250041 |
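
The per-track columns in the table above appear to follow from the per-MCS ones: for track `1504db32-f1c9`, `total_distance` equals the sum of the available `distance_c` values, and `mean_velocity` equals `total_distance` divided by `total_duration`. The sketch below reproduces those two numbers from the displayed rows. The function `summarize_track` is hypothetical, and how `features_Tracks()` computes these values internally is an assumption.

```python
import math

def summarize_track(distances_km, total_duration_h):
    """Sum the centroid displacements and derive the mean velocity."""
    segments = [d for d in distances_km if not math.isnan(d)]
    total_distance = sum(segments)
    mean_velocity = total_distance / total_duration_h
    return total_distance, mean_velocity

# distance_c values of track 1504db32-f1c9 as displayed above;
# the first MCS of a track has no previous centroid, hence the NaN
distance_c = [59.673891, 52.653137, float("nan"), 72.234338]

total_distance, mean_velocity = summarize_track(distance_c, total_duration_h=4)
print(round(total_distance, 6), round(mean_velocity, 6))
```

The results agree with the `total_distance` and `mean_velocity` shown for that track up to rounding.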


In [12]:
tracks_l.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [13]:
#Total tracks and MCS detected 

print ("total tracks detected: " + str(len(tracks_l.index.levels[0])) + " and total MCS detected: " + str(len(tracks_l.index.levels[1])))

total tracks detected: 21 and total MCS detected: 68


### As the last step, let us reload the track results from local storage and plot the MCS with the help of the folium library.

Reading the tracks is handled by the function `funcs.readTRACKS()`.


In [14]:
#-------------------Load results-------------------
tracks_l = funcs.readTRACKS('2_output_data/resume_Tb_2001_12_29_19_2001_12_31_13.csv')

print ("total tracks detected: " + str(len(tracks_l.index.levels[0])) + " and total MCS detected: " + str(len(tracks_l.index.levels[1])))

total tracks detected: 20 and total MCS detected: 66


The function `funcs.plot_folium()` saves the `.html` result and returns the path where it was saved.


In [15]:
path_html_folium = funcs.plot_folium(tracks_l, location = LOCATION_FOLIUM, path_save = OUTDIR)

In [16]:
from IPython.display import HTML

#Quote the attribute values so that the path is parsed correctly
iframe = f'<iframe src="{path_html_folium}" width="1000" height="500"></iframe>'
HTML(iframe)