# Dispersion Event Search

Following the research of Lockwood et al., we define the Eic parameter of a flux distribution as the energy, below the energy of peak flux, at which the flux equals 10% of the peak.

To search for dispersion events, we look for a gradual $\Delta$Eic in the correct direction while the physical parameters are in range. This is done by defining a function $D(t)$ that is integrated over a window of time during which $55^{\circ} < |\mathrm{MLAT}(t)| < 90^{\circ}$. If $\int_{t_i}^{t_f}D(t) dt > \mathrm{threshold}$ (representing the net change in Eic over the window) and $\int_{t_i}^{t_f}{D(t)dt} > 0.8 \int_{t_i}^{t_f}{|D(t)|dt}$ (meaning that most of the instantaneous change is in the correct direction for dispersion), the window is accepted as being part of a dispersion event. At the end of the algorithm, overlapping windows are merged together.
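
The final merge step can be sketched as a standard interval merge. The helper name `merge_windows` and the representation of windows as `(start, end)` tuples are assumptions for illustration; the actual merge in `search_dispersion_events` may differ, for example in how it treats windows that only touch.

```python
def merge_windows(windows):
    """Hypothetical helper: merge overlapping (start, end) windows.

    Works for any comparable endpoints (numbers, datetimes, Timestamps).
    """
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous window: extend that window.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, windows covering 0 to 60 s and 30 to 90 s collapse into a single window from 0 to 90 s, while a separate window from 200 to 260 s is left untouched.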

The integrand $D(t)$ is defined below. The sign of the change in magnetic latitude is included so that $D(t)$ stays positive whenever Eic changes in the correct direction, regardless of whether the satellite is traveling northward or southward, or approaches the event from above or below.

Search for:

$\Large{\int_{t_i}^{t_f}{D(t)dt}} > \mathrm{threshold}$

and:

$\Large{\int_{t_i}^{t_f}{D(t)dt}} > 0.8 \Large{\int_{t_i}^{t_f}{|D(t)|dt}}$


where

$\Large{D(t) = -\mathrm{sgn}\left(\frac{d|\mathrm{MLAT}|}{dt}\right)a(t)b(t)\frac{d\mathrm{Eic}}{dt}}$

and

$
a(t) =   \left\{
\begin{array}{ll}
      1 & \mathrm{Density} > \mathrm{threshold} \\
      0 & \mathrm{Otherwise} \\
\end{array} 
\right. 
$

$
b(t) =   \left\{
\begin{array}{ll}
      1 & \mathrm{Peak\ Flux\ at\ Eic} > \mathrm{threshold} \\
      0 & \mathrm{Otherwise} \\
\end{array} 
\right. 
$
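
To make the two criteria concrete, here is a minimal sketch that evaluates $D(t)$ on a sampled time grid and applies both tests to one window. The function names, argument names, and the use of finite differences with trapezoidal integration are assumptions for illustration; the notebook's `walk_and_integrate` works on a smoothed derivative and may be implemented differently.

```python
import numpy as np

def integrand(t, mlat, eic, density, flux_at_eic, density_min, flux_min):
    """Hypothetical helper: D(t) = -sgn(d|MLAT|/dt) * a(t) * b(t) * dEic/dt."""
    d_abs_mlat_dt = np.gradient(np.abs(mlat), t)   # finite-difference d|MLAT|/dt
    d_eic_dt = np.gradient(eic, t)                 # finite-difference dEic/dt
    a = (np.asarray(density) > density_min).astype(float)      # density gate
    b = (np.asarray(flux_at_eic) > flux_min).astype(float)     # flux gate
    return -np.sign(d_abs_mlat_dt) * a * b * d_eic_dt

def trapezoid(y, t):
    """Trapezoidal-rule integral of y over t."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def window_accepted(t, d, threshold, frac=0.8):
    """Apply both criteria: net change and fraction in the correct direction."""
    net = trapezoid(d, t)
    total = trapezoid(np.abs(d), t)
    return bool(net > threshold and net > frac * total)
```

Because $D(t)$ depends on $|\mathrm{MLAT}|$ rather than MLAT, a pass through the southern hemisphere with the same poleward motion and the same Eic trend yields the same integrand as its northern counterpart.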

In [None]:
import progressbar
import h5py
import pandas as pd
from datetime import datetime, timedelta
import numpy as np
import pytz
import warnings
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.colors import LogNorm

import search_dispersion_events 
import importlib
importlib.reload(search_dispersion_events)

In [None]:
df = pd.read_csv('data/train.csv', parse_dates=['start_time', 'end_time'])
df.head()

In [None]:

def search(row, plot=True):
    """Search the file named in `row` for dispersion events; optionally plot each match."""
    importlib.reload(search_dispersion_events)
    
    # Do computation --------------------------------------------------
    fh = search_dispersion_events.read_file(row.filename)

    dEicdt_smooth, Eic = search_dispersion_events.estimate_log_Eic_smooth_derivative(fh)

    df_match, integrand, integral, upper_area_frac = search_dispersion_events.walk_and_integrate(
        fh, dEicdt_smooth, Eic, search_dispersion_events.DEFAULT_INTERVAL_LENGTH,
        return_integrand=True
    )
    
    if not plot:
        return df_match

    # Do plotting --------------------------------------------------
    for _, row_match in df_match.iterrows():
        i = fh['t'].searchsorted(row_match.start_time)
        j = fh['t'].searchsorted(row_match.end_time)

        fig, axes = plt.subplots(2, 1, figsize=(18, 6), sharex=True)

        im = axes[0].pcolor(fh['t'][i:j], np.log10(fh['ch_energy']), fh['ion_d_ener'][:, i:j], 
                            norm=LogNorm(vmin=1e3, vmax=1e8), cmap='jet')
        plt.colorbar(im, ax=axes[0]).set_label('Log Energy Flux')
        # Unlabeled colorbar on the lower panel keeps both panels the same width.
        plt.colorbar(im, ax=axes[1]).set_label('')

        axes[0].plot(fh['t'][i:j], Eic[i:j], 'b*-')
        axes[0].invert_yaxis()
        axes[0].set_ylabel('Log Energy [eV] - Ions')

        time_length = row_match.end_time - row_match.start_time
        fig.suptitle(f'{time_length.total_seconds() / 60:.1f} minutes : '
                     f'{row_match.start_time.isoformat()} - {row_match.end_time.isoformat()}', fontweight='bold')
        
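        # Clamp the end index: searchsorted returns len(fh['t']) when a match ends after the last sample.
        j_end = min(j, len(fh['mlat']) - 1)
        title = 'MLAT = (%.1f deg to %.1f deg)' % (fh['mlat'][i], fh['mlat'][j_end])
        title += ' Northward' if fh['mlat'][j_end] > fh['mlat'][i] else ' Southward'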
        title += f' -- Integral {float(integral[i:j].max()):.2f} -- UpperFrac {float(upper_area_frac[i]):.2f}'
        axes[0].set_title(title)

        axes[1].fill_between(fh['t'][i:j], 0, integrand[i:j])
        axes[1].axhline(0, color='black', linestyle='dashed')
        axes[1].set_ylim([-.25, .25])
        axes[1].set_ylabel('D(t) [eV/s]')
        
        # Shade the match if the labeled event starts inside it.
        if row_match.start_time < row.start_time < row_match.end_time:
            axes[1].axvspan(row_match.start_time, row_match.end_time, color='gray', alpha=0.3)
            
    return df_match

In [None]:
search(df[df['class']==1].iloc[2])

# Check That the Provided Examples Appear Among the Matches for Their Respective Day

In [None]:
import joblib

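def test(row):
    """Return (row, found); found is True if any detected match overlaps the labeled interval."""
    df_match = search(row, plot=False)
    x1, x2 = row.start_time, row.end_time
    for _, match in df_match.iterrows():
        y1, y2 = match.start_time, match.end_time
        # Two closed intervals overlap iff each one starts before the other ends.
        if x1 <= y2 and y1 <= x2:
            return (row, True)
    return (row, False)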

In [None]:
from joblib import Parallel, delayed
results = Parallel(n_jobs=12)(delayed(test)(row) for _, row in df[df['class']==1].iterrows())

In [None]:
correct_fraction = sum(1 for (row, b) in results if b)/len(results)
print('Correctly classified the following fraction of provided examples:')
print(f'{correct_fraction:.2f}')

In [None]:
print('Provided examples not detected:')
print([row.name for row, b in results if not b])