# Data Processing

This notebook documents the data processing carried out for the study.

Let's start by importing the required modules.

In [2]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
# from matplotlib.patches import Circle
# from matplotlib.collections import PatchCollection
# from matplotlib.lines import Line2D
# import matplotlib.colors as colors
# import matplotlib

import metpy.calc as mpcalc
from metpy.units import units
 
# import glob
# from tqdm.notebook import tqdm
import seaborn as sb
import pandas as pd
# import datetime
# sb.set_palette('Set2')
# from math import ceil
# import scipy as sp
# from pylab import shape,size
# import string
# from matplotlib.patches import ConnectionPatch
# import warnings

# import joanne
# from joanne.Level_4 import rgr_fn as rf
# from joanne.Level_4 import ready_ds_for_regression as prep
# from joanne.Level_4 import dicts

from src import plotting_functions as pf

import eurec4a

## Reading & Subsetting Data
--------

To access [JOANNE](https://doi.org/10.5194/essd-13-5253-2021) data, we use the intake catalog set up for EUREC<sup>4</sup>A.

<div class="alert alert-block alert-info">
The <a href="https://docs.ipfs.tech/concepts/content-addressing/#identifier-formats">content identifier (CID)</a>, an <a href="https://docs.ipfs.tech/concepts/what-is-ipfs/#decentralization">IPFS</a> hash in this case, is fixed for the study to make it as reproducible as possible. This CID is provided to the <a href="https://github.com/eurec4a/eurec4a-intake">EUREC<sup>4</sup>A intake catalog</a> with the <code>use_ipfs</code> argument. For JOANNE data, the provided CID links to v2.0.0.
</div>

In [6]:
cat = eurec4a.get_intake_catalog(use_ipfs="QmahMN2wgPauHYkkiTGoG2TpPBmj3p5FoYJAq9uE9iXT9N")

This notebook needs only Levels 3 and 4 of JOANNE. We load them as [Dask](https://docs.dask.org/en/stable/)-backed datasets, as shown below.

In [7]:
jo_l3 = cat.dropsondes.JOANNE.level3.to_dask()
jo_l4 = cat.dropsondes.JOANNE.level4.to_dask()

We select the EUREC<sup>4</sup>A circles for further analysis. See the manuscript for details on EUREC<sup>4</sup>A circles.

<div class="alert alert-block alert-warning">
Note that one circle (circle ID <code>HALO-0215_c3</code>) during the flight on 15.02.2020 was flown over the NTAS buoy, farther east than the EUREC<sup>4</sup>A circle region. It does not qualify as a EUREC<sup>4</sup>A circle, so we exclude it from all analyses.
</div>

In [8]:
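# Keep only HALO's EUREC4A circles: exclude the NTAS circle (HALO-0215_c3)
# and, as in the original selection, segment HALO-0119_c1
is_eurec4a_circle = (
    (jo_l4.platform_id == 'HALO')
    & (jo_l4.segment_id != 'HALO-0215_c3')
    & (jo_l4.segment_id != 'HALO-0119_c1')
)
eurec4a_circles = jo_l4.where(is_eurec4a_circle, drop=True)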
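
To illustrate the selection logic without fetching JOANNE, here is a minimal sketch on a small synthetic dataset. It expresses the three conditions as a single boolean mask and applies it with `where(..., drop=True)`. The dataset `toy`, the dimension name `circle`, the variable `omega`, and all values except the two excluded segment IDs are assumptions made up for this example, not JOANNE data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for JOANNE Level 4 (hypothetical names and values)
toy = xr.Dataset(
    {
        "platform_id": ("circle", np.array(["HALO", "HALO", "P3", "HALO", "HALO"])),
        "segment_id": (
            "circle",
            np.array(
                [
                    "HALO-0119_c1",
                    "HALO-0122_c1",
                    "P3-0122_c1",
                    "HALO-0215_c3",
                    "HALO-0215_c1",
                ]
            ),
        ),
        "omega": ("circle", np.array([0.1, 0.2, 0.3, 0.4, 0.5])),
    }
)

# Segment IDs excluded in the notebook's selection
excluded = ["HALO-0215_c3", "HALO-0119_c1"]

# One mask: HALO platform and not in the excluded list
keep = (toy.platform_id == "HALO") & ~toy.segment_id.isin(excluded)

# drop=True removes the circles where the mask is False
toy_circles = toy.where(keep, drop=True)
kept_ids = [str(s) for s in toy_circles.segment_id.values]
print(kept_ids)
```

Only the HALO circles that are not in the exclusion list remain in `toy_circles`.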

We also use the radiative profiles dataset by [Albright et al. (2021)](https://doi.org/10.5194/essd-13-617-2021). These profiles are derived from EUREC<sup>4</sup>A's [radiosonde](https://doi.org/10.5194/essd-13-491-2021) and dropsonde (JOANNE) sounding data, and supplement our JOANNE analyses.

In [17]:
ds_rad = cat.radiative_profiles.clear_sky.to_dask()

# Convert sounding_id from object to str dtype so the dataset is easier to work with
ds_rad['sounding_id'] = ds_rad['sounding_id'].astype('str')
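
As a rough illustration of what the dtype conversion does, the sketch below applies the same `astype('str')` step to a synthetic dataset and then filters by an ID prefix with xarray's `.str` accessor. The dataset `toy_rad`, the dimension name `sounding`, the variable `q_rad`, and the ID strings are all hypothetical and are not taken from the radiative profiles dataset.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the radiative profiles dataset (hypothetical values)
toy_rad = xr.Dataset(
    {"q_rad": ("sounding", np.array([-1.2, -0.8, -1.5]))},
    coords={
        "sounding_id": (
            "sounding",
            np.array(
                ["HALO-0122_s01", "HALO-0122_s02", "BCO-0122_s01"], dtype=object
            ),
        )
    },
)

dtype_before = toy_rad["sounding_id"].dtype  # generic Python-object dtype

# Same conversion as in the notebook cell above
toy_rad["sounding_id"] = toy_rad["sounding_id"].astype("str")
dtype_after = toy_rad["sounding_id"].dtype  # fixed-width unicode dtype

# Example use: keep only soundings whose ID starts with "HALO"
is_halo = toy_rad["sounding_id"].str.startswith("HALO")
halo_only = toy_rad.isel(sounding=is_halo.values)
print(dtype_before, dtype_after, halo_only.sizes["sounding"])
```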