# Track Mesoscale Convective Systems (MCS) at consecutive time steps to form tracks with a scheme based on brightness temperature

This notebook links the MCS detected at individual time steps into tracks. In production, you can use the script `scripts/2_detect_and_track_MCS_scheme_Tb.py` for this step.

ATRACKCS is a Python package for the automated detection and tracking of MCS. It is a potential tool for characterizing their spatio-temporal distribution and evolution. ATRACKCS provides a set of Python functions designed for an analysis workflow that includes the detection and characterization of MCS, as well as their integration into tracks, allowing detailed monitoring of the MCS life cycle in both space and time.

ATRACKCS uses brightness temperature (Tb) and precipitation (P) from satellite data. It can operate with Tb as the only input variable or with associated precipitation features. Although the magnitude of Tb is not a direct measurement of P, it is an indirect representation of the height of the cloud cover associated with an MCS event. Some methodologies for MCS detection use other satellite spectral bands as a proxy for P. The algorithm parameterization can be adapted to the needs of the MCS detection, since the user defines the Tb and P thresholds. The parameterization in this notebook is based on Tb only.

## The detection of the MCS (regions) is performed using these steps:

1. At a given time step, the algorithm finds all pixels where `Tb` $\le 215 K$ and defines approximate regions with the convex hull, using a binary structure where the pixels that satisfy the described condition are equal to $1$ and the remaining pixels are equal to $0$.
2. Transform from geographic to plane coordinates and compute an approximate area of the defined regions.
3. Discard all regions whose area is $< 1000 km^2$.
4. Estimate the average, minimum and maximum brightness temperature of those regions.
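
The four steps above can be illustrated with a minimal, pure-Python sketch. It is not the ATRACKCS implementation: it groups cold pixels with a 4-connected flood fill instead of a convex hull, and it approximates the area by counting pixels with an assumed size of 4 km x 4 km instead of projecting to plane coordinates. The function name `detect_regions` and the toy grid are hypothetical.

```python
from collections import deque

TB_MAX_K = 215          # step 1 threshold
MIN_AREA_KM2 = 1000     # step 3 threshold
PIXEL_AREA_KM2 = 16.0   # assumption: 4 km x 4 km pixels, no map projection


def detect_regions(tb_grid, tb_max=TB_MAX_K, min_area=MIN_AREA_KM2,
                   pixel_area=PIXEL_AREA_KM2):
    """Return one dict per cold-cloud region found in a 2-D grid of Tb values."""
    n_rows, n_cols = len(tb_grid), len(tb_grid[0])
    # Step 1: binary structure, 1 where Tb <= threshold and 0 elsewhere.
    mask = [[1 if tb <= tb_max else 0 for tb in row] for row in tb_grid]
    seen = [[False] * n_cols for _ in range(n_rows)]
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill over 4-connected neighbours (stand-in for the convex hull).
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            # Step 2 (simplified): approximate area from the pixel count.
            area = len(pixels) * pixel_area
            # Step 3: discard regions below the minimum area.
            if area < min_area:
                continue
            # Step 4: Tb statistics of the pixels in the region.
            values = [tb_grid[i][j] for i, j in pixels]
            regions.append({
                "area_km2": area,
                "mean_tb": sum(values) / len(values),
                "min_tb": min(values),
                "max_tb": max(values),
            })
    return regions


# Toy grid: a warm background, one 8 x 8 cold block and one isolated cold pixel.
demo_grid = [[250.0] * 12 for _ in range(12)]
for i in range(1, 9):
    for j in range(1, 9):
        demo_grid[i][j] = 210.0
demo_grid[1][1] = 200.0
demo_grid[11][11] = 200.0
print(detect_regions(demo_grid))
```

In this sketch the isolated pixel is dropped by the area filter. Because a convex hull can enclose pixels warmer than the threshold, the real `max_tb` of a region can exceed 215 K, which this simplification does not reproduce.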

## The tracks are built using these steps:

Specifically, assume we have detected $n$ MCS at time $t$ and $m$ MCS at time $t+1$. There are theoretically $n \times m$ possible associations to link these two groups of MCS. Of course, not all of them are meaningful. The rules applied in the association process are:

1. **Overlapping priority** principle: for any MCS at time $t$, the MCS with the highest overlap percentage at time $t+1$ "wins" and is associated with it.
2. The MCS at time $t+1$ with lower or no overlap percentages could form a track on their own, and are left to be associated in the next iteration between $t+1$ and $t+2$.
3. No merging or splitting is allowed: any MCS at time $t$ can only be linked to one MCS at time $t+1$ and, similarly, any MCS at time $t+1$ can only be linked to one MCS at time $t$.
4. All tracks that do not get updated during the association between $t$ and $t+1$ terminate. This means that no gap is allowed in a track.
5. Discard tracks that last 2 hours or less.
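
The association rules can be sketched as a greedy one-to-one matching. This is a simplified illustration, not the code inside `track_mcs()`: MCS footprints are represented as sets of grid cells rather than polygons, and the overlap percentage is assumed to be the share of the MCS at time $t$ covered by the MCS at time $t+1$. The helper names `box`, `overlap_percentage` and `associate` are hypothetical.

```python
def box(row0, col0, height, width):
    """Hypothetical helper: rectangular footprint as a set of (row, col) cells."""
    return {(r, c)
            for r in range(row0, row0 + height)
            for c in range(col0, col0 + width)}


def overlap_percentage(prev_cells, curr_cells):
    """Percentage of the earlier footprint covered by the later one (assumed definition)."""
    return 100.0 * len(prev_cells & curr_cells) / len(prev_cells)


def associate(prev, curr, threshold=15.0):
    """Link MCS ids at time t to ids at time t+1, highest overlap first, one-to-one."""
    candidates = []
    for prev_id, prev_cells in prev.items():
        for curr_id, curr_cells in curr.items():
            pct = overlap_percentage(prev_cells, curr_cells)
            if pct >= threshold:
                candidates.append((pct, prev_id, curr_id))
    # Rule 1: the pair with the highest overlap is considered first.
    candidates.sort(reverse=True)
    links, used_prev, used_curr = {}, set(), set()
    for pct, prev_id, curr_id in candidates:
        # Rule 3: no merging or splitting, each MCS is linked at most once.
        if prev_id in used_prev or curr_id in used_curr:
            continue
        links[prev_id] = curr_id
        used_prev.add(prev_id)
        used_curr.add(curr_id)
    return links


prev = {"a": box(0, 0, 4, 4), "c": box(1, 2, 4, 4)}
curr = {"x": box(1, 1, 4, 4), "y": box(3, 3, 4, 4)}
links = associate(prev, curr)
print(links)
```

Here `c` and `a` both overlap `x`, but `c` has the larger overlap and wins. Under rule 4 the track ending in `a` terminates, and under rule 2 the unlinked `y` is free to start a new track in the next iteration.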

## Input data

**Brightness Temperature:**
NCEP/CPC L3 (Merge IR V1): spatial and temporal resolution are 4 km and 30 minutes,
with data available from February 7, 2000 to present. The variable of interest in this dataset is `Tb`, and the files must be in `netCDF4` format.
https://doi.org/10.5067/P4HZB9N27EKU

We suggest using the `subset/get data` option with the `OpenDAP` method to download the data and to restrict the date range and the region of interest.

In this case the algorithm is run for 2 days (2001/12/30 00 UTC to 2001/12/31 22 UTC) over northern South America. The input data are in the notebooks folder of the repository. The raw data can be downloaded from the link above.

## Steps

1. Make sure you have successfully installed the ATRACKCS library.
2. Execute the following code blocks in sequence.

## Results

* `resume_Tb_2001_12_29_19_2001_12_31_13.csv`: a CSV table listing various attributes of the tracks and their associated MCS.
* `map_Tb_2001_12_29_19_2001_12_31_13.html` (folium): map of the geographical locations of the MCS, with information that links each MCS to its associated track and features.

## Bibliography
* Feng, Z., Leung, L. R., Liu, N., Wang, J., Houze, R. A., Li, J., Hardin, J. C., Chen, D., & Guo, J. (2021). A Global High‐resolution Mesoscale Convective System Database using Satellite‐derived Cloud Tops, Surface Precipitation, and Tracking. Journal of Geophysical Research: Atmospheres. https://doi.org/10.1029/2020jd034202
* Li, J., Feng, Z., Qian, Y., & Leung, L. R. (2020). A high-resolution unified observational data product of mesoscale convective systems and isolated deep convection in the United States for 2004–2017. Earth System Science Data Discussions, October, 1–48. https://doi.org/10.5194/essd-2020-151
* Liu, W., Cook, K. H., & Vizy, E. K. (2019). The role of mesoscale convective systems in the diurnal cycle of rainfall and its seasonality over sub-Saharan Northern Africa. Climate Dynamics, 52(1–2), 729–745. https://doi.org/10.1007/s00382-018-4162-y
* Vizy, E. K., & Cook, K. H. (2018). Mesoscale convective systems and nocturnal rainfall over the West African Sahel: role of the Inter-tropical front. Climate Dynamics, 50(1–2), 587–614. https://doi.org/10.1007/s00382-017-3628-7

### Set paths

First, we set the location of the input (raw) data with `TBDIR` and the location of the output data with `OUTDIR`.

In [1]:
%matplotlib inline
import os

#Folder with the input brightness temperature files
TBDIR = os.path.join('2_input_data', 'tb/')

#Folder where the results are written
OUTDIR = os.path.join('2_output_data/')

### Parameters used in the MCS detection and tracking process

* `UTM_LOCAL_ZONE`: int, needed to convert the WGS geodetic coordinate system to a plane coordinate system. This is a constant that must be associated with the region of interest.
* `UTC_LOCAL_HOUR`: int, needed to convert the hour of the raw data (UTC) to the local hour of the region of interest.
* `UTC_LOCAL_SIGN`: str (minus, plus, local), needed to convert the hour of the raw data (UTC) to the local hour of the region of interest.
* `DETECT_SCHEME`: str (Tb, Both), association scheme: Tb only, or Both (Tb and P).
* `TB`: int, maximum brightness temperature threshold used to identify cold cloud tops [$K$].
* `AREA_TB`: int, minimum area of an MCS [$km^2$].
* `THRESHOLD_OVERLAPPING_P`: int, overlap percentage threshold between MCS to be considered part of the same track [%].
* `LOCATION_FOLIUM`: list (lat, lon), location on which the folium map is centered.
* `MIN_DURATION`: int, minimum duration used to filter tracks [h].

In [2]:
#32718 is the EPSG code of the UTM zone 18S plane coordinate system.
#It is used here for tracking MCS in South America - Colombia.
UTM_LOCAL_ZONE = 32718 

#UTC-5 is the local hour for Colombia
UTC_LOCAL_HOUR = 5
UTC_LOCAL_SIGN = "minus"

#Scheme of association
DETECT_SCHEME = "Tb"

TB = 215 #deep convection regions

AREA_TB = 1000

THRESHOLD_OVERLAPPING_P = 15

LOCATION_FOLIUM = [5, -73.94]

MIN_DURATION = 3
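
As an illustration of how `UTC_LOCAL_HOUR` and `UTC_LOCAL_SIGN` combine, the sketch below shifts a UTC timestamp to local time with the standard library. It is not the conversion code used by `funcs.readNC()`; the function name `utc_to_local` is hypothetical, and the value `"local"` is assumed to mean that no shift is applied.

```python
from datetime import datetime, timedelta


def utc_to_local(utc_time, utc_local_hour, utc_local_sign):
    """Shift a naive UTC datetime by a whole number of hours."""
    offset = timedelta(hours=utc_local_hour)
    if utc_local_sign == "minus":
        return utc_time - offset
    if utc_local_sign == "plus":
        return utc_time + offset
    if utc_local_sign == "local":
        # Assumption: the data are already in local time, so nothing changes.
        return utc_time
    raise ValueError(f"unknown sign: {utc_local_sign!r}")


# First and last time steps of this notebook's input period, in UTC.
start_local = utc_to_local(datetime(2001, 12, 30, 0), 5, "minus")
end_local = utc_to_local(datetime(2001, 12, 31, 22), 5, "minus")
print(start_local, end_local)
```

With UTC-5, the input period 2001/12/30 00 UTC to 2001/12/31 22 UTC corresponds to the local range 2001-12-29 19:00 to 2001-12-31 17:00.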

### Importing modules

In [3]:
#--------Import modules-------------------------
from atrackcs.utils import funcs
from atrackcs.detect import detect_mcs
from atrackcs.features import features_Tb, features_Tracks
from atrackcs.track import track_mcs

### Reading raw Tb data.

The reading process is handled with the function `funcs.readNC()`.

The same cell also creates the output folder if it does not exist yet.

In [4]:
ds = funcs.readNC(pathTb = TBDIR, pathP = None, utc_local_hour = UTC_LOCAL_HOUR,
                  utc_local_sign = UTC_LOCAL_SIGN)

#Create the output folder if it does not exist yet
os.makedirs(OUTDIR, exist_ok=True)

Complete Tb data reading 2001-12-29T19:00 - 2001-12-31T17:00


### Detecting MCS

The detection process is handled with the function `detect_mcs()`.

* `ds` is the `xarray.Dataset` object we just read in.

In [5]:
mcs_l = detect_mcs(ds, detect_scheme = DETECT_SCHEME, Tb = TB, area_Tb = AREA_TB, 
                   utm_local_zone = UTM_LOCAL_ZONE, path_save = None)

MCS detection completed


The Tb features of the MCS are computed with the function `features_Tb()`.

* `mcs_l` is the `geopandas.GeoDataFrame` object we just created in the detection process.

In [6]:
mcs_l = features_Tb(mcs_l, ds)

Estimating MCS's brightness temperature attributes: 


100%|████████████████████████████████████████████████████████████████████████████████| 978/978 [01:07<00:00, 14.52it/s]


* `mcs_l` is the `geopandas.GeoDataFrame` object created in the detection process, now extended with the brightness temperature attributes.

Each row in `mcs_l` stores one MCS record. We can take a peek at the columns `mcs_l` contains so far.

* `time`: datetime64, hour (UTC-5 in this case).
* `Tb`: float, polygon index after filtering with the established parameterization.
* `geometry`: geometry, MCS polygon (convex hull). To view the coordinate reference system of the geometry column, access the `crs` attribute. In this case it is `EPSG:32718`.
* `area_tb`: float, area of the polygon [$km^2$].
* `centroid_`: geometry, geometric centroid of the polygon (convex hull). The crs in this case is `EPSG:32718`.
* `mean_tb`: float, average brightness temperature of the pixels composing the polygon [$K$].
* `min_tb`: float, minimum brightness temperature of the pixels composing the polygon [$K$].
* `max_tb`: float, maximum brightness temperature of the pixels composing the polygon [$K$].

In this case 978 MCS were detected with the selected parameterization.

In [7]:
mcs_l

| | time | Tb | geometry | area_tb | centroid_ | mean_tb | min_tb | max_tb |
|---|---|---|---|---|---|---|---|---|
| 0 | 2001-12-29 19:00:00 | 4.0 | POLYGON ((588581.067 8980308.452, 560619.762 8... | 6881.7 | POINT (596434.594 9030696.135) | 206.3886 | 196.0 | 219.0 |
| 1 | 2001-12-29 19:00:00 | 5.0 | POLYGON ((756503.800 8983596.395, 752530.163 8... | 1788.7 | POINT (779127.063 9001996.176) | 208.6316 | 203.0 | 221.0 |
| 2 | 2001-12-29 19:00:00 | 6.0 | POLYGON ((456662.179 9024635.193, 436580.870 9... | 9958.2 | POINT (496475.685 9079680.424) | 208.5105 | 196.0 | 235.0 |
| 3 | 2001-12-29 19:00:00 | 14.0 | POLYGON ((1043090.386 9162378.939, 1031053.674... | 1423.4 | POINT (1042465.448 9183727.004) | 209.6966 | 204.0 | 216.0 |
| 4 | 2001-12-29 19:00:00 | 16.0 | POLYGON ((332056.056 9177224.939, 324010.729 9... | 22988.6 | POINT (383921.811 9259032.613) | 206.5761 | 195.0 | 231.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 974 | 2001-12-31 17:00:00 | 88.0 | POLYGON ((1324502.020 10148030.754, 1316327.87... | 2217.7 | POINT (1325217.005 10182458.642) | 209.2093 | 204.0 | 215.0 |
| 975 | 2001-12-31 17:00:00 | 94.0 | POLYGON ((-185800.524 10912282.073, -185482.70... | 2668.7 | POINT (-154911.838 10955537.688) | 212.8415 | 208.0 | 222.0 |
| 976 | 2001-12-31 17:00:00 | 95.0 | POLYGON ((-261845.639 10945946.695, -261772.99... | 6082.0 | POINT (-206888.235 10997871.976) | 213.8760 | 211.0 | 225.0 |
| 977 | 2001-12-31 17:00:00 | 101.0 | POLYGON ((66392.993 11159093.993, 58464.441 11... | 2668.3 | POINT (70470.248 11192769.061) | 215.7267 | 209.0 | 228.0 |


In [8]:
mcs_l.crs

<Projected CRS: EPSG:32718>
Name: WGS 84 / UTM zone 18S
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 78°W and 72°W, southern hemisphere between 80°S and equator, onshore and offshore. Argentina. Brazil. Chile. Colombia. Ecuador. Peru.
- bounds: (-78.0, -80.0, -72.0, 0.0)
Coordinate Operation:
- name: UTM zone 18S
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

### Tracking MCS

The tracking process is handled with the function `track_mcs()`. The resulting tracks are not yet characterized at this stage; their features are computed afterwards with the function `features_Tracks()`.

In [9]:
tracks_l = track_mcs(mcs_l, threshold_overlapping_percentage = THRESHOLD_OVERLAPPING_P, utm_local_zone = UTM_LOCAL_ZONE,
                path_save = None)

Estimating trajectories: 


100%|████████████████████████████████████████████████████████████████████████████████| 978/978 [01:32<00:00, 10.57it/s]


In [10]:
tracks_l = features_Tracks(tracks_l, initial_time_hour = MIN_DURATION,
                         path_save = OUTDIR)

Estimating distance and direction between geometrics centroids: 


100%|████████████████████████████████████████████████████████████████████████████████| 580/580 [00:15<00:00, 38.03it/s]


* `tracks_l` is the `geopandas.GeoDataFrame` object we just created in the tracking process. It stores the MCS polygons and the tracks.
* This `GeoDataFrame` contains the information previously described, except that some features associated with the brightness temperature are dropped and some characteristics generated for the tracks are added.
* The index of this object is a `pandas.MultiIndex`, since it hierarchically associates an id for each track with an id for each MCS.
* By default the index values are encrypted identifiers generated with the `uuid` library. To disable encryption, use the parameter `encrypt_index = False` in the `features_Tracks()` function. The index generated without encryption is an `int` resulting from the iterations of the tracking process.
* Encrypted indexes are useful when a long period of time has to be processed in smaller periods, for example every 6 months. This is a limitation imposed by the hardware used to run the algorithm.
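
The benefit of `uuid`-based identifiers over integer counters when processing in batches can be shown with a small sketch. It assumes version-4 (random) UUIDs; the text above only states that the `uuid` library is used, so the exact scheme inside `features_Tracks()` may differ. The function name `assign_track_ids` is hypothetical.

```python
import uuid


def assign_track_ids(int_ids):
    """Map each integer track counter of one batch to a random UUID string."""
    return {i: str(uuid.uuid4()) for i in sorted(set(int_ids))}


# Two batches processed separately both restart their integer counters at 0.
batch_a = assign_track_ids([0, 1, 1, 2])
batch_b = assign_track_ids([0, 1, 2])

# The integer ids collide across batches, the UUID ids do not.
print(set(batch_a) & set(batch_b))
print(set(batch_a.values()) & set(batch_b.values()))
```

With integer counters, results from different periods would need to be renumbered before being concatenated; random identifiers avoid that step.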

We can take a peek at the new columns `tracks_l` contains.


* `belong`: str, encrypted index generated for each track.
* `id_gdf`: str, encrypted index generated for each MCS.
* `geometry`: geometry, polygon. The crs is `EPSG:4326` (WGS 84).
* `centroid_`: geometry, geometric centroid of the polygon. The crs is `EPSG:4326`.
* `intersection_percentage`: float, percentage overlap between MCS [%].
* `distance_c`: float, distance between the geometric centroids of the overlapping MCS [km].
* `direction`: float, direction between the geometric centroids of the overlapping MCS [°].
* `total_duration`: float, total duration of the event or track. This value is repeated for each MCS of the corresponding track [h].
* `total_distance`: float, total distance of the event or track. This value is repeated for each MCS of the corresponding track [km].
* `mean_velocity`: float, average velocity of the event or track. This value is repeated for each MCS of the corresponding track [$km \, h^{-1}$].

In [11]:
tracks_l

| belong | id_gdf | time | geometry | area_tb | centroid_ | mean_tb | intersection_percentage | distance_c | direction | total_duration | total_distance | mean_velocity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1289561f-e0e3 | 48f4-80f4-de9a2382d590 | 2001-12-29 23:00:00 | POLYGON ((-79.14107 7.76835, -79.21382 7.80473... | 2414.0 | POINT (-79.01020 8.07086) | 210.7297 | 35.0 | NaN | NaN | 3 | 54.069248 | 18.023083 |
| 1289561f-e0e3 | 4a58-90d1-a42d1848e81a | 2001-12-30 02:00:00 | POLYGON ((-79.43209 6.89509, -79.54123 6.96786... | 38597.2 | POINT (-78.52969 8.15990) | 212.1504 | NaN | 45.295089 | 81.7 | 3 | 54.069248 | 18.023083 |
| 1289561f-e0e3 | 4b99-a026-9ed11a97e73e | 2001-12-30 00:00:00 | POLYGON ((-79.14107 7.62280, -79.21382 7.65919... | 6900.3 | POINT (-78.93632 8.10040) | 210.6698 | 17.9 | 8.774159 | 68.2 | 3 | 54.069248 | 18.023083 |
| 16553548-86ba | 40fd-ae58-fd5760d34917 | 2001-12-30 09:00:00 | POLYGON ((-87.90825 4.71195, -88.30840 4.74833... | 30899.2 | POINT (-87.45844 5.57253) | 215.4708 | 48.8 | 64.719231 | 16.2 | 4 | 243.509468 | 60.877367 |
| 16553548-86ba | 4526-b0d6-2ef11a5cb448 | 2001-12-30 07:00:00 | POLYGON ((-87.94463 4.60279, -88.05376 4.63918... | 9001.5 | POINT (-87.62163 5.01061) | 212.7443 | 25.5 | 4.034909 | 46.4 | 4 | 243.509468 | 60.877367 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| f864e2e8-7933 | 4904-b535-981b3cafcf9a | 2001-12-29 19:00:00 | POLYGON ((-74.19361 -9.22377, -74.44826 -9.151... | 6881.7 | POINT (-74.12322 -8.76789) | 206.3886 | 51.0 | NaN | NaN | 3 | 55.467188 | 18.489063 |
| f864e2e8-7933 | 49ac-85d0-30baa30624d0 | 2001-12-29 21:00:00 | POLYGON ((-74.22999 -9.47847, -74.30275 -9.369... | 1840.8 | POINT (-74.15063 -9.26216) | 213.5593 | NaN | 27.124123 | 173.7 | 3 | 55.467188 | 18.489063 |
| fb037325-65bb | 46fb-b101-801f7d6f873f | 2001-12-30 01:00:00 | POLYGON ((-81.46928 6.45846, -81.61479 6.53123... | 15340.5 | POINT (-80.98043 7.03558) | 209.6794 | 60.3 | NaN | NaN | 3 | 183.215099 | 61.0717 |
| fb037325-65bb | 4aa9-bafb-5c3b1284a198 | 2001-12-30 02:00:00 | POLYGON ((-80.85085 6.42208, -81.57841 6.49485... | 25324.4 | POINT (-80.91121 7.06723) | 210.5013 | 21.3 | 8.411150 | 65.4 | 3 | 183.215099 | 61.0717 |
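
The track-level columns in this output appear to be related in a simple way: for the first track shown, `total_distance` equals the sum of the `distance_c` values and `mean_velocity` equals `total_distance` divided by `total_duration`. The sketch below reproduces that arithmetic. It is an inference from the displayed values, not a description of the internals of `features_Tracks()`, and the function name `summarize_track` is hypothetical.

```python
import math


def summarize_track(distances_km, duration_h):
    """Return (total distance in km, mean velocity in km/h) for one track."""
    # The first MCS of a track has no previous centroid, so its distance is NaN.
    steps = [d for d in distances_km if not math.isnan(d)]
    total_distance = sum(steps)
    return total_distance, total_distance / duration_h


# distance_c values and total_duration of the first track listed in the output.
total_distance, mean_velocity = summarize_track(
    [float("nan"), 45.295089, 8.774159], duration_h=3
)
print(round(total_distance, 6), round(mean_velocity, 6))
```

The result agrees with the `total_distance` of 54.069248 km and the `mean_velocity` of 18.023083 km/h reported for that track.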


In [12]:
tracks_l.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [13]:
#Total tracks and MCS detected
n_tracks = len(tracks_l.index.levels[0])
n_mcs = len(tracks_l.index.levels[1])

print(f"total tracks detected: {n_tracks} and total MCS detected: {n_mcs}")

total tracks detected: 21 and total MCS detected: 68


### As the last step, let us reload the track results from local storage and plot the MCS with the help of the folium library.

Reading the tracks is handled with the function `funcs.readTRACKS()`.


In [14]:
#-------------------Load results-------------------
tracks_l = funcs.readTRACKS('2_output_data/resume_Tb_2001_12_29_19_2001_12_31_07.csv')

print ("total tracks detected: " + str(len(tracks_l.index.levels[0])) + " and total MCS detected: " + str(len(tracks_l.index.levels[1])))

total tracks detected: 21 and total MCS detected: 68


The function `funcs.plot_folium()` saves the `.html` result and returns the path where it was saved.


In [15]:
path_html_folium = funcs.plot_folium(tracks_l, location = LOCATION_FOLIUM, path_save = OUTDIR)

In [16]:
from IPython.display import IFrame

#Embed the saved folium map in the notebook
IFrame(src=path_html_folium, width=1000, height=500)