## General information about this notebook

This notebook series has been initiated by the Data Management Project (INF) within the TR-172 ["ArctiC Amplification: Climate Relevant Atmospheric and SurfaCe Processes, and Feedback Mechanisms" (AC)³](http://www.ac3-tr.de/), funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).

Author(s) of this notebook: 
 - *Nina Maherndl*, [*Leipzig University, Institute of Meteorology (LIM)*](https://www.physgeo.uni-leipzig.de/institutefuermeteorologie/), *Stephanstraße 3, 04103 Leipzig*, *nina.maherndl@uni-leipzig.de*

Github repository: https://github.com/ac3-tr/ac3-notebooks

This notebook is licensed under the [Creative Commons Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/ "CC-BY-4.0")

# Dataset description

**Title:** Cloudnet target classification during PS106

**Author** Griesche, Hannes; Seifert, Patric; Engelmann, Ronny; Radenz, Martin; Bühl, Johannes

**Year** 2020

**Institutes** Tropos, Leipzig

**Data hosted by** [PANGAEA](https://pangaea.de)

**DOI** [10.1594/PANGAEA.919463](https://doi.org/10.1594/PANGAEA.919463)

**License** [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/ "CC-BY-4.0")

## Abstract
The dataset contains daily nc-files of the Cloudnet target classification during Polarstern cruise PS106 based on:
- 35.5 GHz cloud radar MIRA,
- multiwavelength Raman polarization lidar PollyXT,
- Polarstern radiosonde data,
- OceanRAIN optical disdrometer ODM470,
- microwave radiometer.

The data is retrieved using the instrument synergistic approach Cloudnet (Illingworth, 2007, https://journals.ametsoc.org/doi/10.1175/BAMS-88-6-883).

The variable `target_classification` is a simplification of the bitfield "category_bits" in the target categorization and data quality dataset. It provides the 9 main atmospheric target classifications that can be distinguished by radar and lidar.

Definition of `target_classification`:

0: Clear sky,

1: Cloud droplets only,

2: Drizzle or rain,

3: Drizzle/rain & cloud droplets,

4: Ice,

5: Ice & supercooled droplets,

6: Melting ice,

7: Melting ice & cloud droplets,

8: Aerosol,

9: Insects.
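
The class codes listed above can be kept in a small lookup table, which makes it easy to translate the integer values stored in `target_classification` into readable labels. The following is only a minimal sketch: the names `TARGET_LABELS` and `count_targets` and the sample values are illustrative assumptions and are not part of the dataset or of Cloudnet.

```python
from collections import Counter

# Labels copied from the target classification definition above.
TARGET_LABELS = {
    0: 'Clear sky',
    1: 'Cloud droplets only',
    2: 'Drizzle or rain',
    3: 'Drizzle/rain & cloud droplets',
    4: 'Ice',
    5: 'Ice & supercooled droplets',
    6: 'Melting ice',
    7: 'Melting ice & cloud droplets',
    8: 'Aerosol',
    9: 'Insects',
}


def count_targets(codes):
    """Count how often each class label occurs in an iterable of integer codes."""
    return Counter(TARGET_LABELS[int(code)] for code in codes)


# Hypothetical sample of class codes, for illustration only.
sample = [0, 0, 1, 4, 4, 4, 5, 8]
print(count_targets(sample).most_common(2))
```

With real data, the flattened values of the classification variable could be passed to `count_targets` in the same way, provided they contain no missing values.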


## Contents of this notebook
The purpose of this notebook is to show how to download and use the (AC)³ data set "Cloudnet target classification during PS106". It contains the following plotting examples:
- location of the research vessel Polarstern (PS)
- Cloudnet target classification during a specified day
- cloud top height during a specified day
- cloud base height during a specified day
- target classification + cloud top and base heights during a specified day

## Import relevant modules

In [1]:
# basics, data handling
import numpy as np
import pandas as pd
import xarray as xr
import pangaeapy as pgp
import datetime as dt
import os

# plotting:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from matplotlib import cm
from matplotlib.colors import LinearSegmentedColormap

# plotting maps
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

%matplotlib inline

## Pre-processing of the imported data
The PANGAEA Python library pangaeapy 0.0.7 (https://pypi.org/project/pangaeapy/#description) is used to get the download links for the dataset.

In [2]:
ds = pgp.PanDataSet(919463)

`ds.data` contains a pandas DataFrame with file information and URLs:

In [3]:
dl = ds.data

In [None]:
# build a 'YYYY-MM-DD' string from the first 8 characters (YYYYMMDD) of each file name
dates = []
for name in dl['File name'].values:
    stamp = name.split('_')[0]
    dates.append(stamp[0:4]+'-'+stamp[4:6]+'-'+stamp[6:8])
dl['Date/Time'] = dates
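
The string slicing in the cell above can also be written with `datetime.strptime`, which has the side effect of rejecting file names that do not start with a valid date. This is a minimal sketch under the assumption that file names follow a `YYYYMMDD_...` pattern, as the slicing implies; the helper `date_from_filename` and the example file name are hypothetical.

```python
import datetime as dt


def date_from_filename(name):
    """Return 'YYYY-MM-DD' parsed from the first eight characters of a file name."""
    stamp = name.split('_')[0][:8]
    # strptime raises ValueError if the stamp is not a valid YYYYMMDD date
    return dt.datetime.strptime(stamp, '%Y%m%d').strftime('%Y-%m-%d')


# Hypothetical file name following the assumed 'YYYYMMDD_...' pattern.
print(date_from_filename('20170524_example_classification.nc'))
```

Applied to the file table, this could replace the loop with a list comprehension over `dl['File name']`.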

### Selecting the date
Pick the date from the list that you are interested in.

In [None]:
date = dt.datetime(2017, 5, 24)  # default setting

### Take a look at all data:
**Attention:** If you set `all_data = True`, all 55 data files (about 420 GB) will be downloaded.

In [None]:
all_data = False # default setting

### Downloading the datafile
The cells in this section create a new folder and download the data file(s) for the chosen day from PANGAEA into it. If that is fine with you, you can skip ahead to 'Reading in the datafile', but make sure to run these cells first.

In [None]:
# create a folder to save downloaded data in

# name and path of folder to download data to
datafolder = '../pangaea_download/'

if not os.path.exists(datafolder):
    os.mkdir(datafolder)
else:
    print('Download folder already exists!')

In [None]:
# read URLs from the PANGAEA dataset

fname_list = []

if all_data:
    url_list = [str(url) for url in dl['URL file']]
else:
    # only the file for the selected date
    url_list = [dl[dl['Date/Time'] == date.strftime('%Y-%m-%d')]['URL file'].values[0]]

for url in url_list:
    fname = url[url.rfind('/')+1:]
    fname_list.append(fname)
    print('Link: ', url)
    if os.path.exists(os.path.join(datafolder, fname)):
        print('File already there...')
    # os.system does not raise on failure; it returns the exit status of wget (0 = success)
    elif os.system('wget -O '+os.path.join(datafolder, fname)+' '+url) == 0:
        print('Download finished')
    else:
        print('Could not download automatically, please try manual download! (Link above)')
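
The download cell above relies on the external `wget` program, which is not available on every system. As a hedged alternative, the same step can be sketched with Python's standard library only. The helper name `download_file` is an assumption made for illustration; it is not part of pangaeapy or of the original notebook.

```python
import os
import urllib.request


def download_file(url, folder):
    """Download url into folder unless the file already exists.

    Returns the local path, or None if the download failed.
    """
    os.makedirs(folder, exist_ok=True)
    fname = url[url.rfind('/') + 1:]
    target = os.path.join(folder, fname)
    if os.path.exists(target):
        print('File already there...')
        return target
    try:
        urllib.request.urlretrieve(url, target)
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        print('Could not download automatically, please try manual download!', err)
        return None
    print('Download finished')
    return target
```

It could be called as `download_file(url, datafolder)` inside the loop over URLs. Unlike a shell call, a failed request raises an exception here, so the error message is only printed when something actually went wrong.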

## Reading in the datafile
The Cloudnet data is stored in NetCDF (nc) files with time and height as coordinates:

In [None]:
if all_data:
    print('Attention: all data option was selected, however only last date will be plotted!')

# fname holds the name of the last file handled in the download cell
data = xr.open_dataset(os.path.join(datafolder, fname))

# Location of the Polarstern
GPS coordinates of the Polarstern track are not included in the Cloudnet target classification dataset. If you want to see the location of Polarstern for your chosen date, you have to load the Mastertrack data (https://doi.pangaea.de/10.1594/PANGAEA.881579?format=html#download and https://doi.pangaea.de/10.1594/PANGAEA.881580). The data is also published on PANGAEA, so it can be read with pangaeapy.

In [None]:
track_data1 = pgp.PanDataSet(881579).data
track_data2 = pgp.PanDataSet(881580).data
track_data = pd.concat([track_data1, track_data2])

In [None]:
track_data = track_data[(track_data['Date/Time'] > date.strftime('%Y-%m-%d 00:00:00')) & 
                        (track_data['Date/Time'] < date.strftime('%Y-%m-%d 23:59:59'))] 
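
The filter above keeps rows strictly after 00:00:00 and strictly before 23:59:59, so a track point recorded exactly at midnight is left out. If that matters, a half-open interval from the start of the day up to (but excluding) the start of the next day is one common alternative. The sketch below shows the idea with plain `datetime` objects; the helper `within_day` is hypothetical and not part of the notebook.

```python
import datetime as dt


def within_day(timestamp, day):
    """True if timestamp lies in the half-open interval [day 00:00, next day 00:00)."""
    start = dt.datetime(day.year, day.month, day.day)
    return start <= timestamp < start + dt.timedelta(days=1)


day = dt.datetime(2017, 5, 24)
print(within_day(dt.datetime(2017, 5, 24, 0, 0, 0), day))   # midnight is included
print(within_day(dt.datetime(2017, 5, 25, 0, 0, 0), day))   # next midnight is excluded
```

The same bounds could be used in the pandas mask with `>=` for the lower limit and `<` for the upper limit.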

In [None]:
lat = track_data['Latitude'].values
lon = track_data['Longitude'].values

In [None]:
fig = plt.figure(figsize=(6,6))

ax = fig.add_subplot(1,1,1, projection=ccrs.Orthographic(lon.mean(), lat.mean()))
ax.plot(lon, lat, 'bx', transform=ccrs.PlateCarree())
ax.set_extent([lon.min()-15, lon.max()+15, lat.min()-5, lat.max()+5], ccrs.PlateCarree())
ax.coastlines()
ax.set_aspect(0.8)

gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--')
gl.top_labels = False
gl.left_labels = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER

* If this results in an error, make sure to update matplotlib and cartopy. 

## Plotting example
### Cloudnet classification

In [None]:
colors =  ['white', 'skyblue', 'red', 'blue', 'yellow', 'limegreen', 'orange', 'darkcyan', 'lightgrey', 'grey']
cmap = LinearSegmentedColormap.from_list('cloudnet_cmap', colors, N=len(colors))

fig, ax = plt.subplots(figsize=(8,4))
pc = ax.pcolormesh(data.time.values, data.height.values, data.target_classification.values.T, 
                   shading='auto', cmap=cmap, vmin=-0.5, vmax=9.5)
ax.set_xlabel('time UTC')
ax.set_ylabel('height (m)')
fig.autofmt_xdate()

cb = fig.colorbar(pc, ticks=[0,1,2,3,4,5,6,7,8,9])
cb.ax.set_yticklabels(['Clear sky', 'Cloud droplets only', 'Drizzle or rain', 'Drizzle/rain & cloud droplets', 
                       'Ice', 'Ice & supercooled droplets', 'Melting ice', 'Melting ice & cloud droplets', 
                       'Aerosol', 'Insects'])
plt.show()
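
The plot uses `vmin=-0.5` and `vmax=9.5` together with a colormap of ten colors. With these limits every integer class code falls in the middle of its own color bin, so the colorbar ticks at 0 to 9 sit at the center of each color patch. The sketch below reproduces that mapping in simplified form (linear normalization followed by equal-width binning); it is an illustration of the arithmetic, not matplotlib's actual implementation, and the helper `color_bin` is hypothetical.

```python
import math


def color_bin(code, vmin=-0.5, vmax=9.5, n_colors=10):
    """Map a value to a color index by linear normalization and equal-width binning."""
    fraction = (code - vmin) / (vmax - vmin)   # 0 at vmin, 1 at vmax
    index = int(math.floor(fraction * n_colors))
    return min(index, n_colors - 1)            # the upper edge belongs to the last bin


# Each class code 0..9 is assigned the color with the same index.
print([color_bin(code) for code in range(10)])
```

Because each bin is exactly one unit wide, class code `k` covers the interval from `k - 0.5` to `k + 0.5` on the colorbar.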

### Cloud top height

In [None]:
fig, ax = plt.subplots(figsize=(6,4))
ax.plot(data.time.values, data.cloud_top_height.values)
ax.set_xlabel('time UTC')
ax.set_ylabel('height (m)')
fig.autofmt_xdate()
plt.show()

### Cloud base height

In [None]:
fig, ax = plt.subplots(figsize=(6,4))
ax.plot(data.time.values, data.cloud_base_height.values)
ax.set_xlabel('time UTC')
ax.set_ylabel('height (m)')
fig.autofmt_xdate()
plt.show()

### Cloudnet classification + cloud base and top heights

In [None]:
fig, ax = plt.subplots(figsize=(10,6))
pc = ax.pcolormesh(data.time.values, data.height.values, data.target_classification.values.T, 
                   shading='auto', cmap=cmap, vmin=-0.5, vmax=9.5)

ax.plot(data.time.values, data.cloud_top_height.values, 'k', lw=1, label= 'cloud top height')
ax.plot(data.time.values, data.cloud_base_height.values, color='darkviolet', lw=1, label= 'cloud base height')

ax.set_xlabel('time UTC')
ax.set_ylabel('height (m)')

fig.autofmt_xdate()

cb = fig.colorbar(pc, ticks=[0,1,2,3,4,5,6,7,8,9])
cb.ax.set_yticklabels(['Clear sky', 'Cloud droplets only', 'Drizzle or rain', 'Drizzle/rain & cloud droplets', 
                       'Ice', 'Ice & supercooled droplets', 'Melting ice', 'Melting ice & cloud droplets', 
                       'Aerosol', 'Insects'])
plt.legend(loc='upper right')
plt.show()