# Track Mesoscale Convective Systems (MCS) at consecutive time steps to form tracks with a scheme based on brightness temperature and precipitation

This notebook tracks detected MCS at individual time steps to form tracks.

MCS are detected from cold cloud tops. Pixels whose brightness temperature is below a threshold are grouped into regions, and the horizontal extent of each region is approximated with its convex hull. The algorithm can run with brightness temperature alone or combine it with precipitation features. The selection criteria and filters that decide whether a region counts as an MCS can be modified. The parameterization in this notebook was established from an extensive literature review, listed in the Bibliography below.

## The detection of the MCS (regions) is performed using these steps:

1. At each time step, find all pixels where the brightness temperature `Tb` $\le 225\ K$ and trace an approximate region around them with the convex hull, using a binary structure in which pixels that satisfy the condition are set to $1$ and all others to $0$.
2. Transform the pixels from geographic to plane (projected) coordinates and compute the approximate area of each traced region.
3. Discard all regions whose area is $\le 2000\ km^2$.
4. Estimate the average, minimum and maximum brightness temperature of the remaining regions.
5. At each time step, find all pixels where the precipitation `P` $\ge 2\ mm \times {h^{-1}}$ and discard those that do not fall inside the regions found at that time step.
6. Estimate the approximate area covered by the precipitation pixels that satisfy the previous condition and lie inside each region.
7. Estimate the average and maximum precipitation for each region whose precipitation area is $\ge 500\ km^2$. The algorithm can optionally discard regions whose precipitation area falls below this threshold; if they are kept, these regions can still become part of the tracks.
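Steps 1 to 4 above can be sketched in plain NumPy. This is a minimal sketch under simplifying assumptions, not the ATRACKCS implementation: regions are built from 4-connected cold pixels with a breadth-first flood fill, and the area is approximated as pixel count times a constant, hypothetical pixel area of 16 km² (a 4 km × 4 km cell) instead of the convex hull in projected coordinates. The name `detect_cold_regions` is illustrative only.

```python
from collections import deque

import numpy as np


def detect_cold_regions(tb, tb_max=225.0, pixel_area_km2=16.0, min_area_km2=2000.0):
    """Hypothetical helper: label cold regions and keep those larger than min_area_km2.

    Simplification: area = pixel count * pixel_area_km2 (assumed constant),
    not the convex-hull area in projected coordinates used by ATRACKCS.
    """
    cold = tb <= tb_max  # step 1: binary structure, True (1) for cold pixels
    labels = np.zeros(tb.shape, dtype=int)
    nrows, ncols = tb.shape
    regions, next_label = [], 0
    for r0, c0 in zip(*np.nonzero(cold)):
        if labels[r0, c0]:
            continue  # pixel already assigned to a region
        next_label += 1
        labels[r0, c0] = next_label
        queue, pixels = deque([(r0, c0)]), []
        while queue:  # flood fill over 4-connected neighbours
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols and cold[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = next_label
                    queue.append((rr, cc))
        area = len(pixels) * pixel_area_km2  # step 2 (simplified): approximate area
        if area <= min_area_km2:
            continue  # step 3: discard small regions
        values = tb[tuple(np.array(pixels).T)]
        regions.append({  # step 4: brightness-temperature statistics
            "label": next_label,
            "area_km2": area,
            "mean_tb": float(values.mean()),
            "min_tb": float(values.min()),
            "max_tb": float(values.max()),
        })
    return regions


# Usage: one large cold region (kept) and one small cold region (discarded)
tb = np.full((60, 60), 290.0)
tb[10:30, 10:30] = 210.0  # 400 pixels -> 6400 km2
tb[40:45, 40:45] = 215.0  # 25 pixels -> 400 km2
regions = detect_cold_regions(tb)
```

Because this sketch uses only the cold pixels themselves, `max_tb` can never exceed `tb_max`. A convex hull can also enclose warmer pixels between cold ones, which may explain why some `max_tb` values in the `mcs_l` table below are above 225 K.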

## The tracks are built using these rules:

Assume we have detected $n$ MCS at time $t$ and $m$ MCS at time $t+1$. There are theoretically $n \times m$ possible associations linking these two groups of MCS. Of course, not all of them are meaningful. The rules applied in the association process are:

1. **Overlapping priority** principle: for any MCS at time $t$, the MCS at time $t+1$ with the highest percentage of overlap "wins" and is associated with it.
2. MCS at time $t+1$ that are not associated (those with lower percentages of overlap) can start a track of their own and wait to be associated in the next iteration, between $t+1$ and $t+2$.
3. No merging or splitting is allowed: any MCS at time $t$ can only be linked to one MCS at time $t+1$, and any MCS at time $t+1$ can only be linked to one MCS at time $t$.
4. All tracks that are not updated during the $t$ to $t+1$ step terminate; that is, gaps within a track are not allowed.
5. In this first part, no tracks are discarded based on their total duration. The algorithm can optionally filter tracks by a minimum duration.
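The association rules above can be sketched as a greedy one-to-one matching. The assumptions here are not taken from ATRACKCS: each MCS is represented as a set of grid cells rather than a polygon, the overlap percentage is measured relative to the area of the MCS at $t+1$, and all candidate pairs are ranked globally by overlap so that the highest overlaps are linked first. `associate` and `build_tracks` are hypothetical names.

```python
def associate(prev, curr, threshold=0.0):
    """Link MCS at time t (prev) to MCS at time t+1 (curr), one-to-one.

    prev, curr: dict mapping an MCS id to the set of grid cells it covers.
    Returns a dict {prev_id: curr_id}.
    """
    candidates = []
    for p_id, p_cells in prev.items():
        for c_id, c_cells in curr.items():
            overlap = 100.0 * len(p_cells & c_cells) / len(c_cells)
            if overlap > threshold:
                candidates.append((overlap, p_id, c_id))
    # Overlapping priority: the highest overlaps are linked first
    candidates.sort(key=lambda cand: cand[0], reverse=True)
    links, used_prev, used_curr = {}, set(), set()
    for _, p_id, c_id in candidates:
        # No merging or splitting: each MCS takes part in at most one link
        if p_id not in used_prev and c_id not in used_curr:
            links[p_id] = c_id
            used_prev.add(p_id)
            used_curr.add(c_id)
    return links


def build_tracks(steps, threshold=0.0):
    """steps: list (one entry per time step) of dicts {mcs_id: set of cells}.

    Returns a list of tracks, each a list of (time_index, mcs_id).
    """
    tracks = [[(0, mcs_id)] for mcs_id in steps[0]]
    active = {mcs_id: k for k, mcs_id in enumerate(steps[0])}  # MCS id -> track index
    for t in range(1, len(steps)):
        links = associate(steps[t - 1], steps[t], threshold)
        new_active = {}
        for p_id, c_id in links.items():
            k = active[p_id]
            tracks[k].append((t, c_id))
            new_active[c_id] = k
        for c_id in steps[t]:
            if c_id not in new_active:  # unmatched MCS start a track of their own
                tracks.append([(t, c_id)])
                new_active[c_id] = len(tracks) - 1
        # Tracks not extended at this step are no longer active, i.e. they terminate
        active = new_active
    return tracks


steps = [
    {"a": {(0, 0), (0, 1), (1, 0), (1, 1)}},
    {"b": {(0, 1), (1, 1), (0, 2), (1, 2)}, "c": {(5, 5)}},
    {"d": {(0, 2), (1, 2), (0, 3), (1, 3)}},
]
tracks = build_tracks(steps)
# tracks == [[(0, "a"), (1, "b"), (2, "d")], [(1, "c")]]
```

Ranking all pairs globally is one simple way to satisfy rules 1 and 3 at the same time; other tie-breaking strategies are possible.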

## Input data

**Brightness Temperature:**
NCEP/CPC L3 (Merge IR V1): spatial and temporal resolution of 4 km and 30 minutes; 
data available from February 7, 2000 to present. The variable of interest in this dataset is `Tb`, and the files must be in `netCDF4` format.
https://doi.org/10.5067/P4HZB9N27EKU

**Precipitation:**
GPM (IMERG V06B): spatial and temporal resolution of 10 km and 30 minutes; 
data available from June 1, 2000 to present. The variable of interest in this dataset is `PrecipitationCal`, and the files must be in `netCDF4` format.
https://doi.org/10.5067/GPM/IMERG/3B-HH/06

In this case, the algorithm is run over northern South America for late December 2019, starting at 2019-12-25 00 UTC; the reading step below reports data from 2019-12-24T19:00 to 2019-12-31T18:00 local time (UTC-5). The input data are in the notebooks folder of the repository, and the raw data can be downloaded from the links above.

## Steps

1. Make sure you have successfully installed the ATRACKCS library.
2. Execute the following code blocks in sequence.

## Results

* `resume_Tb_P_2019_12_24_19_2019_12_31_17.csv`: a CSV table listing the attributes of the tracks and their associated MCS.
* `map_Tb_P_2019_12_24_19_2019_12_31_10.html` (folium): a map of the geographical locations of the MCS, with information linking each MCS to its track and to its features.

## Bibliography
* Feng, Z., Leung, L. R., Liu, N., Wang, J., Houze, R. A., Li, J., Hardin, J. C., Chen, D., & Guo, J. (2021). A Global High‐resolution Mesoscale Convective System Database using Satellite‐derived Cloud Tops, Surface Precipitation, and Tracking. Journal of Geophysical Research: Atmospheres. https://doi.org/10.1029/2020jd034202
* Li, J., Feng, Z., Qian, Y., & Leung, L. R. (2020). A high-resolution unified observational data product of mesoscale convective systems and isolated deep convection in the United States for 2004–2017. Earth System Science Data Discussions, October, 1–48. https://doi.org/10.5194/essd-2020-151
* Liu, W., Cook, K. H., & Vizy, E. K. (2019). The role of mesoscale convective systems in the diurnal cycle of rainfall and its seasonality over sub-Saharan Northern Africa. Climate Dynamics, 52(1–2), 729–745. https://doi.org/10.1007/s00382-018-4162-y
* Vizy, E. K., & Cook, K. H. (2018). Mesoscale convective systems and nocturnal rainfall over the West African Sahel: role of the Inter-tropical front. Climate Dynamics, 50(1–2), 587–614. https://doi.org/10.1007/s00382-017-3628-7

### Set paths

As before, we first set the locations of the input (raw) data with `TBDIR` and `PDIR`, and of the output data with `OUTDIR`.

```python
%matplotlib inline
import os

TBDIR = os.path.join('1_input_data', 'tb/')
PDIR = os.path.join('1_input_data', 'p/')

OUTDIR = os.path.join('1_output_data/')
```

### Parameters used in the MCS detection and tracking process

* `UTM_LOCAL_ZONE`: int, EPSG code of the plane (projected) coordinate system used to convert the WGS 84 geographic coordinates. It must correspond to the region of interest.
* `UTC_LOCAL_HOUR`: int, offset in hours used to convert the raw-data time (UTC) to the local time of the region of interest.
* `UTC_LOCAL_SIGN`: str (`"minus"`, `"plus"`, `"local"`), sign of that offset, used in the same conversion.
* `DETECT_SCHEME`: str (`"Tb"`, `"Both"`), association scheme: Tb only, or Both (Tb and P).
* `TB`: int, maximum brightness-temperature threshold for cold cloud tops [$K$].
* `AREA_TB`: int, minimum area of a cold-cloud region for it to be considered an MCS [$km^2$].
* `MIN_P`: int, minimum precipitation rate for a pixel to be counted as precipitating [$mm \times {h^{-1}}$].
* `AREA_P`: int, minimum precipitating area within an MCS [$km^2$].
* `DROP_EMPTY_PRECIPITATION`: bool, if `True`, eliminates MCS that do not contain precipitation satisfying the selected `MIN_P` and `AREA_P`.
* `THRESHOLD_OVERLAPPING_P`: int, overlap threshold (percentage) between MCS at consecutive time steps.
* `LOCATION_FOLIUM`: list (lat, lon), location on which to center the folium map.
* `MIN_DURATION`: int, minimum required duration of a track [h].
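`UTC_LOCAL_HOUR` and `UTC_LOCAL_SIGN` together describe a fixed offset from UTC. A minimal sketch of that conversion with the standard library, assuming naive datetimes in UTC; `utc_to_local` is a hypothetical helper, not part of ATRACKCS, and it only covers `"minus"` and `"plus"` because the behaviour of `"local"` is not described here.

```python
from datetime import datetime, timedelta


def utc_to_local(time_utc, utc_local_hour, utc_local_sign):
    """Hypothetical helper: shift a naive UTC datetime by a fixed number of hours."""
    offset = timedelta(hours=utc_local_hour)
    if utc_local_sign == "minus":
        return time_utc - offset  # e.g. UTC-5 for Colombia
    if utc_local_sign == "plus":
        return time_utc + offset
    raise ValueError(f"unsupported utc_local_sign: {utc_local_sign!r}")


# 2019-12-25 00:00 UTC is 2019-12-24 19:00 in UTC-5
local = utc_to_local(datetime(2019, 12, 25, 0), 5, "minus")
print(local.isoformat(timespec="minutes"))  # 2019-12-24T19:00
```

The result matches the first time step reported when the data are read with `funcs.readNC()` below.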


```python
# 32718 is the EPSG code of the WGS 84 / UTM zone 18S plane coordinate system.
# It was used for tracking MCS in South America (Colombia).
UTM_LOCAL_ZONE = 32718

# UTC-5 is the local time in Colombia
UTC_LOCAL_HOUR = 5
UTC_LOCAL_SIGN = "minus"

# Association scheme: "Tb" or "Both" (Tb and P)
DETECT_SCHEME = "Both"

TB = 225  # [K] [Feng et al., (2021); Li et al., (2020)]

AREA_TB = 2000  # [km^2] [Liu et al., (2019); Vizy & Cook, (2018)]

MIN_P = 2  # [mm h^-1] [Feng et al., (2021)]

AREA_P = 500  # [km^2] [Feng et al., (2021)]

DROP_EMPTY_PRECIPITATION = False

THRESHOLD_OVERLAPPING_P = 0  # [%]

LOCATION_FOLIUM = [5, -73.94]  # [lat, lon]

MIN_DURATION = 0  # [h]
```

### Importing modules

```python
#--------Import modules-------------------------
from atrackcs.utils import funcs
from atrackcs.detect import detect_mcs
from atrackcs.features import features_Tb, features_P, features_Tracks
from atrackcs.track import track_mcs
```

### Reading raw Tb and P data

Reading is handled by the function `funcs.readNC()`. The cell also makes sure that the output folder exists.

```python
ds = funcs.readNC(pathTb = TBDIR, pathP = PDIR, utc_local_hour = UTC_LOCAL_HOUR,
                  utc_local_sign = UTC_LOCAL_SIGN)

# Create the output folder if it does not exist yet
os.makedirs(OUTDIR, exist_ok=True)
```

```
Complete Tb and P data reading 2019-12-24T19:00 - 2019-12-31T18:00
```


### Detecting MCS

The detection process is handled by the function `detect_mcs()`.

* `ds` is the `xarray.Dataset` object we just read in.

```python
mcs_l = detect_mcs(ds, detect_scheme = DETECT_SCHEME, Tb = TB, area_Tb = AREA_TB,
                   utm_local_zone = UTM_LOCAL_ZONE, path_save = None)
```

```
Spots detection completed
```


The MCS Tb features are computed with the function `features_Tb()`.

* `mcs_l` is the `geopandas.GeoDataFrame` object we just created in the detection process.

```python
mcs_l = features_Tb(mcs_l, ds)
```

```
Estimating Tb spots features: 
100%|██████████████████████████████████████████████████████████████████████████████| 3200/3200 [03:34<00:00, 14.91it/s]
```


Each row of `mcs_l` stores one MCS record. Its columns so far are:

* `time`: datetime64, local time (UTC-5 in this case).
* `Tb`: float, index of the polygon (region) that passed the filters of the established parameterization.
* `geometry`: geometry, MCS polygon (convex hull). The coordinate reference system of the geometry column is available through the `crs` attribute; in this case it is `EPSG:32718`.
* `area_tb`: float, area of the polygon [$km^2$].
* `centroid_`: geometry, geometric centroid of the polygon (convex hull). The CRS in this case is `EPSG:32718`.
* `mean_tb`: float, average brightness temperature of the pixels composing the polygon [$K$].
* `min_tb`: float, minimum brightness temperature of the pixels composing the polygon [$K$].
* `max_tb`: float, maximum brightness temperature of the pixels composing the polygon [$K$].
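The `geometry` and `area_tb` columns come from the convex hull of each region in projected coordinates. A stdlib-only sketch of that computation, assuming the pixel centres are already projected to metres (for example UTM zone 18S, `EPSG:32718`; the projection step itself is not shown). It uses Andrew's monotone chain for the hull and the shoelace formula for the area. `mcs_l` is a GeoDataFrame, so ATRACKCS works with geopandas geometries instead; this only illustrates the idea.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain repeats the other's first


def polygon_area_km2(vertices_m):
    """Shoelace formula: area of a simple polygon with vertices in metres, in km2."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices_m):
        x2, y2 = vertices_m[(i + 1) % len(vertices_m)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / 1e6


# Pixel centres of an 11 x 11 block on a 4 km grid (projected metres)
centres = [(4000.0 * i, 4000.0 * j) for i in range(11) for j in range(11)]
hull = convex_hull(centres)
area = polygon_area_km2(hull)  # 40 km x 40 km square -> 1600.0 km2
```

Building the hull on pixel centres leaves out half a pixel on each side, so it slightly underestimates the footprint: 1600 km² here versus 1936 km² for the 121 full 4 km pixels.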

For this case 3201 MCS were detected with the selected parameterization.

```python
mcs_l
```

| | time | Tb | geometry | area_tb | centroid_ | mean_tb | min_tb | max_tb |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019-12-24 19:00:00 | 1.0 | POLYGON ((1385139.077 8889318.481, 1386208.445... | 14842.2 | POINT (1463228.275 8937057.716) | 222.4039 | 212.5513 | 234.2721 |
| 1 | 2019-12-24 19:00:00 | 2.0 | POLYGON ((1595866.092 8883549.520, 1585083.176... | 3630.7 | POINT (1633163.573 8902402.154) | 219.1902 | 212.0816 | 225.6774 |
| 2 | 2019-12-24 19:00:00 | 4.0 | POLYGON ((2088725.182 8865190.993, 2066639.621... | 6145.4 | POINT (2126543.231 8906654.165) | 222.2126 | 214.0594 | 234.6743 |
| 3 | 2019-12-24 19:00:00 | 7.0 | POLYGON ((285900.134 8954798.123, 274850.936 8... | 3710.1 | POINT (280833.135 9000177.046) | 222.6613 | 216.3039 | 230.5927 |
| 4 | 2019-12-24 19:00:00 | 8.0 | POLYGON ((681204.449 8966028.400, 637345.833 8... | 3526.0 | POINT (677052.995 8997699.311) | 224.8544 | 218.0482 | 261.8145 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3196 | 2019-12-31 17:00:00 | 180.0 | POLYGON ((97079.094 10836238.196, 75168.595 10... | 2630.5 | POINT (108278.034 10864738.333) | 218.2107 | 211.1152 | 223.0037 |
| 3197 | 2019-12-31 17:00:00 | 184.0 | POLYGON ((-691865.280 10883122.696, -758907.75... | 15303.5 | POINT (-748773.702 10970101.458) | 217.9394 | 203.1223 | 247.8695 |
| 3198 | 2019-12-31 17:00:00 | 185.0 | POLYGON ((-1029700.410 10904530.804, -1051987.... | 7530.1 | POINT (-1010852.082 10956828.001) | 211.5872 | 201.9746 | 225.0643 |
| 3199 | 2019-12-31 17:00:00 | 186.0 | POLYGON ((174734.077 10879929.160, 163862.710 ... | 4026.6 | POINT (179987.332 10927972.864) | 215.6076 | 205.0069 | 226.6848 |


```python
mcs_l.crs
```

```
<Projected CRS: EPSG:32718>
Name: WGS 84 / UTM zone 18S
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 78°W and 72°W, southern hemisphere between 80°S and equator, onshore and offshore. Argentina. Brazil. Chile. Colombia. Ecuador. Peru.
- bounds: (-78.0, -80.0, -72.0, 0.0)
Coordinate Operation:
- name: UTM zone 18S
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
```

The MCS P features are computed with the function `features_P()`.

```python
mcs_l = features_P(mcs_l, ds, min_precipitation = MIN_P, area_P = AREA_P,
                   drop_empty_precipitation = DROP_EMPTY_PRECIPITATION)
```

```
Estimating P spots features: 
100%|██████████████████████████████████████████████████████████████████████████████| 3200/3200 [03:37<00:00, 14.68it/s]
```
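Steps 5 to 7 of the detection scheme can be sketched with NumPy for a single region. Assumptions: the precipitation field and a boolean mask of the MCS polygon are on the same grid, each pixel has a constant, hypothetical area of 100 km² (roughly a 10 km IMERG cell), and a region whose precipitating area falls below `area_p` keeps missing (`NaN`) precipitation features instead of being dropped. That last behaviour would be consistent with the empty `mean_pp`/`max_pp` cells in `tracks_l` when `DROP_EMPTY_PRECIPITATION = False`, but it is an assumption. `precipitation_features` is a hypothetical name.

```python
import numpy as np


def precipitation_features(p, region_mask, min_p=2.0, area_p=500.0, pixel_area_km2=100.0):
    """Hypothetical helper: precipitation features of one MCS.

    p: precipitation rate [mm/h]; region_mask: boolean array, True inside the MCS.
    """
    rainy = region_mask & (p >= min_p)  # step 5: rainy pixels inside the region
    area = int(rainy.sum()) * pixel_area_km2  # step 6: approximate precipitating area
    if area < area_p:  # step 7: precipitating area too small -> no features
        return {"area_p_km2": area, "mean_pp": float("nan"), "max_pp": float("nan")}
    values = p[rainy]
    return {"area_p_km2": area, "mean_pp": float(values.mean()), "max_pp": float(values.max())}


mask = np.zeros((20, 20), dtype=bool)
mask[0:10, 0:10] = True  # the MCS polygon, rasterized
p = np.zeros((20, 20))
p[0:3, 0:3] = 5.0  # 9 rainy pixels -> 900 km2
p[0, 0] = 20.0
p[15, 15] = 30.0  # outside the MCS, ignored
features = precipitation_features(p, mask)
```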


### Tracking MCS

Tracking is handled by the function `track_mcs()`, which links the MCS into tracks but does not characterize the tracks at this stage. The track features are computed with the function `features_Tracks()`.

```python
tracks_l = track_mcs(mcs_l, threshold_overlapping_percentage = THRESHOLD_OVERLAPPING_P,
                     utm_local_zone = UTM_LOCAL_ZONE, path_save = None)
```

```
Estimating trajectories: 
100%|██████████████████████████████████████████████████████████████████████████████| 3200/3200 [05:34<00:00,  9.57it/s]
```

```python
tracks_l = features_Tracks(tracks_l, initial_time_hour = MIN_DURATION,
                           path_save = OUTDIR)
```

```
Estimating distance and direction between geometrics centroids: 
100%|██████████████████████████████████████████████████████████████████████████████| 1207/1207 [00:47<00:00, 25.55it/s]
```


* `tracks_l` is a `geopandas.GeoDataFrame` object created in the tracking process; it stores the MCS polygons and the tracks.
* It contains the same information as `mcs_l`, except that some brightness-temperature features are dropped and some precipitation features of the MCS and some track features are added.
* Its index is a `pandas.MultiIndex`, which hierarchically associates an id with each track and an id with each MCS.
* By default, these ids are random identifiers generated with the `uuid` library for each track and each MCS. To disable them for both MCS and tracks, use the parameter `encrypt_index = False` in the `features_Tracks()` function; the index is then the `int` sequence produced by the tracking process.
* uuid-based ids are useful when a long period cannot be processed in a single run because of hardware limitations and must be split into shorter periods (for example, every 6 months): ids generated in separate runs do not collide, whereas integer ids from separate runs could repeat.
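The benefit of random identifiers when a long record is processed in chunks can be shown with the standard `uuid` module. This sketch assumes that, with integer ids, each chunk numbers its tracks from 0; `new_ids` is a hypothetical helper, and the identifiers ATRACKCS actually stores may be formatted differently.

```python
import uuid


def new_ids(n):
    """Hypothetical helper: one random (version 4) uuid string per track."""
    return [str(uuid.uuid4()) for _ in range(n)]


# Two chunks processed separately, e.g. two 6-month periods, 3 tracks each
int_chunk_1, int_chunk_2 = list(range(3)), list(range(3))  # integer ids restart per run
uuid_chunk_1, uuid_chunk_2 = new_ids(3), new_ids(3)

print(set(int_chunk_1) & set(int_chunk_2))  # {0, 1, 2}: ids collide when chunks are merged
print(set(uuid_chunk_1) & set(uuid_chunk_2))  # set(): collisions are practically impossible
```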

The new or changed columns of `tracks_l` are:


* `belong`: str, uuid-based identifier of each track.
* `id_gdf`: str, uuid-based identifier of each MCS.
* `geometry`: geometry, polygon. The CRS is `EPSG:4326` (WGS 84).
* `centroid_`: geometry, geometric centroid of the polygon. The CRS is `EPSG:4326`.
* `mean_pp`: float, average precipitation of the pixels composing the polygon [$mm \times {h^{-1}}$].
* `max_pp`: float, maximum precipitation of the pixels composing the polygon [$mm \times {h^{-1}}$].
* `intersection_percentage`: float, percentage of overlap between associated MCS [%].
* `distance_c`: float, distance between the geometric centroids of two associated (overlapping) MCS [km].
* `direction`: float, direction between the geometric centroids of two associated (overlapping) MCS [°].
* `total_duration`: float, total duration of the track; the value is repeated for every MCS of the track [h].
* `total_distance`: float, total distance travelled by the track; the value is repeated for every MCS of the track [km].
* `mean_velocity`: float, average velocity of the track; the value is repeated for every MCS of the track [$km \times {h^{-1}}$].
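The `tracks_l` rows shown below are consistent with `total_distance` being the sum of a track's `distance_c` values (101.75 + 18.22 ≈ 119.97 km), `total_duration` being the number of hourly MCS in the track, `mean_velocity` = `total_distance` / `total_duration` (119.97 / 3 ≈ 39.99 km h⁻¹), and `direction` being measured clockwise from north (183.4° for a move almost due south). A sketch of these quantities, assuming a spherical Earth (haversine distance and initial bearing). The library's exact method is not shown in this notebook: the spherical distance for the first step (about 102.3 km) is roughly 0.5% larger than the tabulated 101.75 km, which suggests an ellipsoidal computation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def distance_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance between two centroids, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def direction_deg(lon1, lat1, lon2, lat2):
    """Initial bearing from centroid 1 to centroid 2, clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360.0


def track_summary(centroids, hours_per_step=1.0):
    """centroids: (lon, lat) of the MCS of one track, in time order."""
    steps = [distance_km(*a, *b) for a, b in zip(centroids, centroids[1:])]
    total_distance = sum(steps)
    total_duration = len(centroids) * hours_per_step  # one MCS per hourly time step
    return total_distance, total_duration, total_distance / total_duration


# Centroids of track 000336ce-fb40 (19:00, 20:00, 21:00)
track = [(-76.14447, 7.31591), (-76.19961, 6.39748), (-76.09978, 6.52853)]
print(round(direction_deg(*track[0], *track[1]), 1))  # 183.4
total_distance, total_duration, mean_velocity = track_summary(track)
```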

```python
tracks_l
```

| belong | id_gdf | time | geometry | area_tb | centroid_ | mean_tb | mean_pp | max_pp | intersection_percentage | distance_c | direction | total_duration | total_distance | mean_velocity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 000336ce-fb40 | 48c6-8bc8-86e332760db7 | 2019-12-29 20:00:00 | POLYGON ((-76.55000 4.75000, -76.75000 4.85000... | 49170.9 | POINT (-76.19961 6.39748) | 219.4333 | 5.5851 | 17.7096 | 85.1 | 101.751564 | 183.4 | 3 | 119.971763 | 39.990588 |
| 000336ce-fb40 | 4ef9-9045-41c92ceb3a7e | 2019-12-29 19:00:00 | POLYGON ((-76.65000 6.55000, -76.75000 6.65000... | 10805.4 | POINT (-76.14447 7.31591) | 215.3396 | 5.9205 | 16.9926 | 21.4 | | | 3 | 119.971763 | 39.990588 |
| 000336ce-fb40 | 4fcf-aa28-7102f64558d0 | 2019-12-29 21:00:00 | POLYGON ((-76.65000 4.75000, -76.85000 4.95000... | 56721.1 | POINT (-76.09978 6.52853) | 219.6955 | 6.5042 | 34.0868 | | 18.220200 | 37.3 | 3 | 119.971763 | 39.990588 |
| 00465ab5-2707 | 41c2-a05b-2a0a590cb55e | 2019-12-26 20:00:00 | POLYGON ((-72.45000 8.95000, -72.45000 9.35000... | 4260.3 | POINT (-72.11149 9.30481) | 220.5123 | | | | 41.986944 | 13.8 | 2 | 41.986944 | 20.993472 |
| 00465ab5-2707 | 4bdf-88dc-4a77a4be98a3 | 2019-12-26 19:00:00 | POLYGON ((-72.45000 8.55000, -72.55000 8.65000... | 5664.9 | POINT (-72.20196 8.93599) | 215.5026 | 4.9963 | 10.0471 | 35.5 | | | 2 | 41.986944 | 20.993472 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ffa93c63-ea01 | 47fc-8ea9-54a1a71618bd | 2019-12-28 17:00:00 | POLYGON ((-69.15000 -9.35000, -69.35000 -9.050... | 48095.1 | POINT (-68.67902 -7.99769) | 222.6052 | 3.3726 | 13.6501 | 84.8 | | | 2 | 46.003902 | 23.001951 |
| ffae2e57-e491 | 428f-a6e4-8a3b7161d41e | 2019-12-25 12:00:00 | POLYGON ((-70.15000 1.35000, -70.25000 1.65000... | 3097.6 | POINT (-69.94995 1.70272) | 213.1878 | | | 16.3 | | | 2 | 48.575591 | 24.287796 |
| ffae2e57-e491 | 4b46-b44a-8f9810029340 | 2019-12-25 13:00:00 | POLYGON ((-70.35000 1.45000, -70.65000 2.15000... | 9718.1 | POINT (-70.17830 2.07714) | 222.8209 | 4.8161 | 12.9035 | 3.9 | 48.575591 | 328.6 | 2 | 48.575591 | 24.287796 |
| ffe9af47-d982 | 437e-b23e-f23bf915157a | 2019-12-26 14:00:00 | POLYGON ((-63.35000 -9.15000, -63.45000 -9.050... | 5076.7 | POINT (-63.09473 -8.73476) | 215.0460 | 2.8806 | 4.5411 | 11.2 | | | 2 | 56.311011 | 28.155505 |


```python
tracks_l.crs
```

```
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
```

```python
# Total tracks and MCS detected
print("total tracks detected: " + str(len(tracks_l.index.levels[0])) + " and total MCS detected: " + str(len(tracks_l.index.levels[1])))
```

```
total tracks detected: 1208 and total MCS detected: 3201
```


### Reloading the tracks and plotting the MCS

As the last step, let's reload the track results from local storage and plot the MCS with the help of the folium library. Reading the tracks is handled by the function `funcs.readTRACKS()`.

```python
#-------------------Load results-------------------
tracks_l = funcs.readTRACKS(os.path.join(OUTDIR, 'resume_Tb_P_2019_12_24_19_2019_12_31_17.csv'))

#---------Discard tracks that last less than 10 hours--------
# Resetting and re-setting the index rebuilds the MultiIndex levels from the remaining rows only
tracks_l = tracks_l.reset_index()
tracks_l = tracks_l[tracks_l.total_duration >= 10]
tracks_l = tracks_l.set_index(["belong", "id_gdf"]).sort_index()

print("total tracks that last at least 10 hours: " + str(len(tracks_l.index.levels[0])) + " and total MCS detected: " + str(len(tracks_l.index.levels[1])))
```

```
total tracks that last at least 10 hours: 20 and total MCS detected: 227
```


The function `funcs.plot_folium()` saves the `.html` result and returns the path where it was saved.

```python
path_html_folium = funcs.plot_folium(tracks_l, location = LOCATION_FOLIUM, path_save = OUTDIR)
```

```python
from IPython.display import IFrame

# Embed the saved folium map in the notebook
IFrame(path_html_folium, width=1000, height=500)
```