# Track Mesoscale Convective Systems (MCS) at consecutive time steps to form tracks

This notebook links the MCS detected at individual time steps to form tracks.

MCS are detected from the cold tops of the clouds: pixels within a limiting range of brightness temperature are grouped into regions, and the horizontal area of each region is approximated from its convex hull. The algorithm can operate on brightness temperature alone or combine this scheme with precipitation features. The selection criteria and filters that decide whether a region counts as an MCS can be modified. The parameter values in this notebook were chosen from an extensive literature review, which can be consulted in the bibliography below.

## The detection of the MCS (regions) is performed using these steps:

1. At each time step, find all pixels where the brightness temperature `Tb` $\le 225\ K$ and trace an approximate region around them with the convex hull, using a binary mask in which the pixels that satisfy the condition are equal to $1$ and those that do not are equal to $0$.
2. Transform the pixels from geographic to plane coordinates and compute an approximate area for each traced region.
3. Discard all regions whose area is $\le 2000\ km^2$.
4. Estimate the average, minimum and maximum brightness temperature of the remaining regions.
5. At each time step, find all pixels where precipitation `P` $\ge 2\ mm\ h^{-1}$ and discard the pixels that do not fall within the regions estimated at that time step.
6. Estimate the approximate area of the precipitation pixels that satisfy the previous condition and are contained in the regions.
7. Estimate the average and maximum precipitation for each region whose precipitation area is $\ge 500\ km^2$. The algorithm can optionally discard the regions whose precipitation area is $\le 500\ km^2$; in this notebook that option is disabled, so those regions remain candidates for the tracks.
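
Steps 1 to 4 can be illustrated with a minimal, pure-Python sketch. It is not the implementation used by this notebook: the helper names (`label_regions`, `convex_hull`, `polygon_area`, `detect_regions`) are hypothetical, the grid is assumed to be already projected onto a plane with a 4 km pixel spacing, and regions are assumed to be groups of 4-connected cold pixels. The hull is drawn through pixel centres, so it slightly underestimates the true footprint.

```python
from collections import deque

TB_MAX = 225.0     # K, cold cloud-top threshold (step 1)
AREA_MIN = 2000.0  # km^2, minimum region area (step 3)
PIXEL_KM = 4.0     # assumed pixel spacing in plane coordinates


def label_regions(mask):
    """Group True pixels into 4-connected regions (lists of (row, col))."""
    nrows, ncols = len(mask), len(mask[0])
    seen = [[False] * ncols for _ in range(nrows)]
    regions = []
    for r in range(nrows):
        for c in range(ncols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < nrows and 0 <= nj < ncols
                    if inside and mask[ni][nj] and not seen[ni][nj]:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            regions.append(pixels)
    return regions


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for k in range(len(vertices)):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def detect_regions(tb):
    """Return area and Tb statistics of regions that pass the area filter."""
    mask = [[value <= TB_MAX for value in row] for row in tb]  # binary 1/0 mask
    kept = []
    for pixels in label_regions(mask):
        centres = [(col * PIXEL_KM, row * PIXEL_KM) for row, col in pixels]
        area = polygon_area(convex_hull(centres))
        if area <= AREA_MIN:  # step 3: discard small regions
            continue
        values = [tb[row][col] for row, col in pixels]
        kept.append({"area_km2": area, "tb_mean": sum(values) / len(values),
                     "tb_min": min(values), "tb_max": max(values)})
    return kept


# Synthetic scene: one large cold block and one small cold block
tb = [[280.0] * 20 for _ in range(20)]
for row in range(13):
    for col in range(13):
        tb[row][col] = 210.0
for row in range(16, 19):
    for col in range(16, 19):
        tb[row][col] = 220.0
regions = detect_regions(tb)
print(len(regions), regions[0]["area_km2"])
```

In this synthetic scene only the large block survives the area filter; the small block has a hull area far below $2000\ km^2$ and is dropped.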

## The tracks are built using these steps:

Specifically, assume we have detected $n$ MCS at time $t$ and $m$ MCS at time $t+1$. There are theoretically $n \times m$ possible associations linking these two groups of MCS. Of course, not all of them are meaningful. The rules applied in the association process are:

1. **Overlapping priority** principle: for any MCS at time $t$, the MCS at time $t+1$ with the highest percentage of overlap "wins" and is associated with it.
2. An MCS at time $t+1$ with a lower percentage of overlap can start a track of its own, and waits to be associated in the next iteration, between $t+1$ and $t+2$.
3. No merging or splitting is allowed: any MCS at time $t$ can be linked to only one MCS at time $t+1$ and, similarly, any MCS at time $t+1$ can be linked to only one MCS at time $t$.
4. All tracks that are not updated during the $t$ to $t+1$ step terminate. This means that no gap is allowed within a track.
5. Discard all tracks whose total duration is $\le 4\ h$. Filtering the tracks by a specific minimum duration is optional.
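
Rules 1 to 4 can be sketched as a greedy one-to-one matching. This is only an illustration under stated assumptions, not the notebook's own code: each MCS is represented as a set of `(row, col)` pixels, the overlap percentage is assumed to be measured relative to the smaller of the two regions, and `overlap_percent`, `associate` and `block` are hypothetical helper names.

```python
def overlap_percent(a, b):
    """Shared pixels as a percentage of the smaller region (assumed definition)."""
    return 100.0 * len(a & b) / min(len(a), len(b))


def associate(prev, curr, threshold=0.0):
    """Link MCS at time t (prev) to MCS at time t+1 (curr), one-to-one."""
    candidates = []
    for i, a in prev.items():
        for j, b in curr.items():
            pct = overlap_percent(a, b)
            if pct > threshold:
                candidates.append((pct, i, j))
    candidates.sort(reverse=True)  # highest overlap is considered first

    links, used_prev, used_curr = {}, set(), set()
    for pct, i, j in candidates:
        if i in used_prev or j in used_curr:
            continue  # rule 3: no merging or splitting
        links[i] = j
        used_prev.add(i)
        used_curr.add(j)

    new_tracks = [j for j in curr if j not in used_curr]  # rule 2
    ended = [i for i in prev if i not in used_prev]       # rule 4
    return links, new_tracks, ended


def block(r0, r1, c0, c1):
    """Rectangular set of pixels, used only to build the example."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


prev = {"a": block(0, 5, 0, 5), "b": block(10, 15, 10, 15)}
curr = {"x": block(1, 6, 1, 6), "y": block(4, 9, 4, 9), "z": block(20, 25, 20, 25)}
links, new_tracks, ended = associate(prev, curr)
print(links, new_tracks, ended)
```

Here `x` and `y` both overlap `a`, but `x` overlaps more and wins the link; `y` and `z` start new tracks, and `b` terminates because nothing at $t+1$ overlaps it.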

## Input data

**Brightness Temperature:**
NCEP/CPC L3 (Merge IR V1): spatial and temporal resolution of 4 km and 30 minutes,
data available from February 7, 2000 to present. The variable of interest in this dataset is `Tb`.
https://doi.org/10.5067/P4HZB9N27EKU

**Precipitation:**
GPM (IMERG V06B): spatial and temporal resolution of 10 km and 30 minutes,
data available from June 1, 2000 to present. The variable of interest in this dataset is `PrecipitationCal`.
https://doi.org/10.5067/GPM/IMERG/3B-HH/06

In this case the algorithm is executed for 5 days (2019/12/25 00 - 2019/12/31 00 UTC). The input data is in the notebook folder of the repository and can be downloaded for specific dates from the links above.

## Steps

1. Make sure you have successfully run the previous notebook.
2. Execute the following code blocks in sequence.

## Results

* `.csv`: a CSV table listing various attributes of the tracks and their associated MCS.
* `plots/ar_track_198405.html` (folium): map of the geographical locations of the MCS, with information that links each MCS to its associated track and features.

## Bibliography
* Feng, Z., Leung, L. R., Liu, N., Wang, J., Houze, R. A., Li, J., Hardin, J. C., Chen, D., & Guo, J. (2021). A Global High‐resolution Mesoscale Convective System Database using Satellite‐derived Cloud Tops, Surface Precipitation, and Tracking. Journal of Geophysical Research: Atmospheres. https://doi.org/10.1029/2020jd034202
* Li, J., Feng, Z., Qian, Y., & Leung, L. R. (2020). A high-resolution unified observational data product of mesoscale convective systems and isolated deep convection in the United States for 2004–2017. Earth System Science Data Discussions, October, 1–48. https://doi.org/10.5194/essd-2020-151
* Liu, W., Cook, K. H., & Vizy, E. K. (2019). The role of mesoscale convective systems in the diurnal cycle of rainfall and its seasonality over sub-Saharan Northern Africa. Climate Dynamics, 52(1–2), 729–745. https://doi.org/10.1007/s00382-018-4162-y
* Vizy, E. K., & Cook, K. H. (2018). Mesoscale convective systems and nocturnal rainfall over the West African Sahel: role of the Inter-tropical front. Climate Dynamics, 50(1–2), 587–614. https://doi.org/10.1007/s00382-017-3628-7


# Set paths

As before, we first give the locations of the input (raw) data using `TBDIR` and `PDIR`, and of the output data using `OUTDIR`.

In [3]:
%matplotlib inline
import os

TBDIR=os.path.join('1_input_data', 'tb')
PDIR=os.path.join('1_input_data', 'p')

OUTDIR=os.path.join('1_output_data')

Below are the important parameters used in the tracking process.

* `UTM_LOCAL_ZONE`: int, needed to convert the WGS geodetic coordinate system to a plane coordinate system. This constant must correspond to the region of interest.
* `UTC_LOCAL_HOUR`: int, needed to convert the raw data time (UTC) to the local time of the region of interest.
* `UTC_LOCAL_SIGN`: str (`minus`, `plus`, `local`), needed to convert the raw data time (UTC) to the local time of the region of interest.
* `VARIABLES`: str (`Tb`, `Both`), association scheme: `Tb` alone or `Both` (Tb and P).
* `TB`: int, maximum brightness temperature (K) allowed for the cold cloud top of an MCS.
* `AREA_TB`: int, minimum area (km²) of an MCS.
* `MIN_P`: int, minimum precipitation (mm/h) of a pixel for it to count toward an MCS.
* `AREA_P`: int, minimum precipitation area (km²) of an MCS.
* `DROP_EMPTY_PRECIPITATION`: boolean, if `True`, eliminates the MCS that do not contain precipitation meeting the selected `MIN_P` and `AREA_P`.
* `THRESHOLD_OVERLAPPING_P`: int, percentage overlap limit between MCS.
* `LOCATION_FOLIUM`: list (lat, lon), location on which to center the folium map.
* `MIN_DURATION`: int, minimum required duration of a track, in hours.
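
To illustrate how `UTC_LOCAL_HOUR` and `UTC_LOCAL_SIGN` could combine, here is a minimal sketch using only the standard library. The function `utc_to_local` is a hypothetical helper, and the reading of `"local"` as "apply no shift" is an assumption rather than documented behaviour.

```python
from datetime import datetime, timedelta


def utc_to_local(utc_time, hours, sign):
    """Shift a UTC timestamp by `hours` in the direction given by `sign`."""
    if sign == "minus":
        return utc_time - timedelta(hours=hours)
    if sign == "plus":
        return utc_time + timedelta(hours=hours)
    if sign == "local":
        # Assumption: "local" means the data is already in local time
        return utc_time
    raise ValueError("sign must be 'minus', 'plus' or 'local'")


# Colombia is UTC-5, so midnight UTC maps to 19:00 on the previous day
local_time = utc_to_local(datetime(2019, 12, 25, 0, 0), 5, "minus")
print(local_time)
```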


In [None]:
# 32718 is the EPSG code of the WGS 84 / UTM zone 18S plane coordinate system.
# It was used for tracking MCS in South America (Colombia).
UTM_LOCAL_ZONE = 32718

# UTC-5 is the local time for Colombia
UTC_LOCAL_HOUR = 5
UTC_LOCAL_SIGN = "minus"

#Scheme of association
VARIABLES = "Both"

TB = 225 #[Feng et al.,(2021); Li et al.,(2020)]

AREA_TB = 2000 # [Liu et al., (2019); Vizy & Cook,(2018)]

MIN_P = 2 #[Feng et al.,(2021)]  

AREA_P = 500 #[Feng et al.,(2021)]

DROP_EMPTY_PRECIPITATION = False

THRESHOLD_OVERLAPPING_P = 0

LOCATION_FOLIUM = [5, -73.94]

MIN_DURATION = 10
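
The way `MIN_P`, `AREA_P` and `DROP_EMPTY_PRECIPITATION` interact can be sketched as follows. This is a hedged illustration, not the notebook's implementation: `precipitation_features` and `keep_region` are hypothetical helpers, precipitation is passed as a dictionary keyed by pixel, and each precipitation pixel is assumed to cover 100 km² (a 10 km grid).

```python
def precipitation_features(region_pixels, precip, min_p=2.0, pixel_area_km2=100.0):
    """Area, mean and maximum of the rainy pixels that fall inside a region."""
    rainy = [precip[p] for p in region_pixels if precip.get(p, 0.0) >= min_p]
    if not rainy:
        return {"area_p_km2": 0.0, "p_mean": None, "p_max": None}
    return {"area_p_km2": len(rainy) * pixel_area_km2,
            "p_mean": sum(rainy) / len(rainy),
            "p_max": max(rainy)}


def keep_region(features, area_p=500.0, drop_empty=False):
    """Apply the optional precipitation-area filter."""
    if not drop_empty:
        return True  # option disabled: every region stays a track candidate
    return features["area_p_km2"] > area_p


# A 10-pixel region: 6 pixels rain at 3 mm/h, 4 pixels at 0.5 mm/h
region = [(0, c) for c in range(10)]
precip = {(0, c): (3.0 if c < 6 else 0.5) for c in range(10)}
features = precipitation_features(region, precip)
print(features, keep_region(features, drop_empty=True))
```

With the filter disabled, as in the parameter cell above, every region is kept regardless of its precipitation area.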



Import modules

In [None]:
#--------Import modules-------------------------
import os, sys
import numpy as np
import pandas as pd

from ipart.AR_tracer import readCSVRecord, trackARs, filterTracks

Then read in the data from the previous step -- the `csv` table containing AR records at individual time points.

Also make sure the output folder exists.

In [None]:
print('\n# Read in file:\n', 'ar_records.csv')
ardf=readCSVRecord(RECORD_FILE)

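# Create the output folder (OUTDIR, defined above) if it does not exist yet
if not os.path.exists(OUTDIR):
    os.makedirs(OUTDIR)

if SCHEMATIC:
    # Folder for the plots produced during tracking
    plot_dir=os.path.join(OUTDIR, 'plots')
    if not os.path.exists(plot_dir):
        os.makedirs(plot_dir)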

The tracking process is handled with this single function `trackARs()`.

* `ardf` is the `pandas.DataFrame` object we just read in.
* `track_list` is a list of `AR` objects; the class definition can be found in `ipart.AR_tracer.py`.

In [None]:
track_list=trackARs(ardf, TIME_GAP_ALLOW, MAX_DIST_ALLOW,
        track_scheme=TRACK_SCHEME, isplot=SCHEMATIC, plot_dir=plot_dir)

We can take a peek at what `track_list` contains:

In [None]:
print('Number of AR tracks = ', len(track_list))
print(track_list[0])

In [None]:
track_list[0].data

In [None]:
track_list[0].duration

In [None]:
track_list[7].data

In [None]:
track_list[6].duration

Each `AR` object in `track_list` stores a sequence of AR records that form a single track. The first one, `track_list[0]`, is a short track with only 1 record. This one will be filtered out given a minimum duration requirement of 24 hours.

The 7th one, `track_list[6]`, lasted for 36 hours.

To filter out short tracks, and those that consist only of *relaxed* AR records: 

In [None]:
#------------------Filter tracks------------------
track_list=filterTracks(track_list, MIN_DURATION, MIN_NONRELAX)
print(len(track_list))

Note that now the number of tracks has dropped to 9.
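
The duration criterion itself can be shown with a minimal sketch that does not rely on `ipart`. It is not how `filterTracks()` is implemented: tracks are assumed to be plain lists of timestamps, the duration is taken as the time between the first and last record, and `track_duration_hours` and `filter_by_duration` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def track_duration_hours(times):
    """Hours elapsed between the first and the last record of a track."""
    return (max(times) - min(times)).total_seconds() / 3600.0


def filter_by_duration(tracks, min_hours):
    """Keep tracks that last longer than `min_hours`; shorter or equal are dropped."""
    return [t for t in tracks if track_duration_hours(t) > min_hours]


start = datetime(2019, 12, 25, 0, 0)
step = timedelta(minutes=30)               # temporal resolution of the input data
short = [start + k * step for k in range(5)]    # 2 h
edge = [start + k * step for k in range(9)]     # exactly 4 h
long_ = [start + k * step for k in range(25)]   # 12 h

kept = filter_by_duration([short, edge, long_], min_hours=4)
print(len(kept), track_duration_hours(kept[0]))
```

Only the 12 hour track is kept here, because a track lasting exactly 4 hours still falls under the "duration $\le 4\ h$" discard rule.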

Let's plot the sequence of an AR. Only the AR axis is plotted, and a black-to-yellow color scheme is used to indicate the evolution of the AR.

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from ipart.utils import plot

latax=np.arange(LAT1, LAT2)
lonax=np.arange(LON1, LON2)

plot_ar=track_list[7]

figure=plt.figure(figsize=(12,6),dpi=100)
ax=figure.add_subplot(111, projection=ccrs.PlateCarree())
plot.plotARTrack(plot_ar,latax,lonax,ax,full=True)

## As the last step, save the results to disk.

In [None]:
#-------------------Save output-------------------
for ii, tii in enumerate(track_list):
    trackidii='%d%d' %(tii.data.loc[0,'time'].year, ii+1)
    tii.data.loc[:,'trackid']=trackidii
    tii.trackid=trackidii

    if ii==0:
        trackdf=tii.data
    else:
        trackdf=pd.concat([trackdf,tii.data],ignore_index=True)

    figure=plt.figure(figsize=(12,6),dpi=100)
    ax=figure.add_subplot(111, projection=ccrs.PlateCarree())
    plot.plotARTrack(tii,latax,lonax,ax,full=True)

    #----------------- Save plot------------
    plot_save_name='ar_track_%s' %trackidii
    plot_save_name=os.path.join(plot_dir,plot_save_name)
    print('\n# <river_tracker2>: Save figure to', plot_save_name)
    figure.savefig(plot_save_name+'.png',dpi=100,bbox_inches='tight')

    plt.close(figure)

#--------Save------------------------------------
abpath_out=os.path.join(OUTDIR,'ar_tracks_1984.csv')
print('\n# Saving output to:\n',abpath_out)
if sys.version_info.major==2:
    np.set_printoptions(threshold=np.inf)
elif sys.version_info.major==3:
    np.set_printoptions(threshold=sys.maxsize)
trackdf.to_csv(abpath_out,index=False)