<font size="6"><b>Scientific Visualization using Python 2020</b></font>

<font size="3"><b>October 20-23: 09:00 - 13:00</b></font>

<img src="../figures/summary/title.png"   width="600">

<font size="3"><b>Day 1</b></font>

- Introduction to some important data processing libraries
- Getting started with Matplotlib

<font size="5"><b>Python Visualization Landscape</b></font>

<img src="../figures/summary/viz_landscape_1.png"   width="500">

<font size="4">We will focus on:</font>

<img src="../figures/summary/viz_landscape_2.png"   width="500">

<br/><br/>
<font size="5"><b>A few Teasers</b></font>

<img src="../figures/summary/figures_teaser/ex2_2_helper_functions.png"   width="500">
<img src="../figures/summary/figures_teaser/ex1_6_annotations.png"   width="500">
<img src="../figures/summary/figures_teaser/ex3_5_stippling_RPSS.png"   width="500">
<img src="../figures/summary/figures_teaser/ex3_8_trajectories.png"   width="500">

<font size="4"><b>Checking if everything works!</b></font>

### Integrated development environments
 - Jupyter notebooks (used throughout this course; see Exercise 0.1)
 - PyCharm
 - Spyder
 - Vim
 - ...

<b>Short warmup: Exercise 0.1</b>
<br/><br/>
Aim: Revising a few Python concepts and using Jupyter notebooks

### NumPy
 - “NumPy is the fundamental package for scientific computing with Python” – www.numpy.org
 - Provides the N-dimensional array object (`ndarray`)

```python
import numpy as np
```

```python
a = np.arange(15).reshape(3, 5)
print(a)
```

```
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
```


```python
a.shape
```

```
(3, 5)
```
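Besides `shape`, an `ndarray` exposes a few other attributes that are handy when inspecting data, and most reductions take an `axis` argument that says which dimension to collapse. A minimal sketch on the same 3×5 array (the names `col_sums` and `row_means` are only illustrative):

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

print(a.ndim)    # number of dimensions: 2
print(a.size)    # total number of elements: 15
print(a.dtype)   # integer dtype (platform dependent, e.g. int64)

# axis=0 collapses the rows, leaving one value per column
col_sums = a.sum(axis=0)
# axis=1 collapses the columns, leaving one value per row
row_means = a.mean(axis=1)

print(col_sums)   # [15 18 21 24 27]
print(row_means)  # [ 2.  7. 12.]

# Basic slicing returns a view: second row, every other column
print(a[1, ::2])  # [5 7 9]
```

The same `axis` logic applies to the 3-D climate fields used later, e.g. averaging a `(time, lat, lon)` array over `axis=0` gives a time mean per grid cell.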

<b>Exercise 0.2: Getting familiar with NumPy</b>

### NetCDF

- `netCDF4`: Python interface to the netCDF C library

- Reads and writes netCDF files (both netCDF-3 and netCDF-4 formats)

```python
import netCDF4 as nc
```

```python
fN = '../data/HadEX2_GSL.nc'
ncf = nc.Dataset(fN)   # opens the file read-only; call ncf.close() when done
print(ncf)
```

```
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    data: Growing season length
    source: HadEX2 (http://www.climdex.org/)
    reference: Donat et al., 2013
    dimensions(sizes): lon(96), lat(73), time(50)
    variables(dimensions): float64 lon(lon), float64 lat(lat), int32 time(time), float32 GSL(time, lat, lon), float64 trend(lat, lon), float64 p_val(lat, lon)
    groups: 
```


```python
# Get the variable object; data are only read from disk when it is sliced
trend = ncf.variables['trend']
print(trend)

# Slice by position (row = lat index, column = lon index); missing values print as --
print(trend[30:40, 30:40])
```

```
<class 'netCDF4._netCDF4.Variable'>
float64 trend(lat, lon)
    _FillValue: nan
unlimited dimensions: 
current shape = (73, 96)
filling on
[[-- -- -- -- 0.0020566136610928876 0.0020273329058180245
  0.002149247882748284 -- 0.01941575946785279 --]
 [-- -- -- -- -- -- 0.00016648749343484148 -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]]
```
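The `--` entries are masked values: by default `netCDF4` returns a `numpy.ma.MaskedArray` in which points equal to the variable's `_FillValue` (here `nan`, i.e. grid cells without data) are masked. A small sketch with made-up numbers (not values from the HadEX2 file) shows how such an array behaves:

```python
import numpy as np

# Made-up trend values; nan marks missing grid cells (like _FillValue = nan)
raw = np.array([[np.nan, 0.002, 0.001],
                [np.nan, np.nan, 0.004]])

# Mask every nan/inf entry, mimicking the masked array netCDF4 returns
trend = np.ma.masked_invalid(raw)
print(trend)

# Reductions skip masked entries automatically
print(trend.count())   # number of valid (unmasked) values
print(trend.mean())    # mean over the valid values only

# Back to a plain ndarray with masked cells set to nan (e.g. for plotting)
filled = trend.filled(np.nan)
```

Keeping the mask means statistics such as `mean` ignore missing cells without any extra bookkeeping.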



<b>Exercise 0.3: Reading NetCDF data</b>

### Xarray

- Multidimensional labeled arrays

- Combines a netCDF-like data model with the labeled-indexing capabilities of pandas

```python
import xarray as xr
```

```python
fN = '../data/HadEX2_GSL.nc'
ds = xr.open_dataset(fN)   # opens lazily: variables are loaded only when needed
print(ds)
```

```
<xarray.Dataset>
Dimensions:  (lon: 96, lat: 73, time: 50)
Coordinates:
  * lon      (lon) float64 0.0 3.75 7.5 11.25 15.0 ... 345.0 348.8 352.5 356.2
  * lat      (lat) float64 -90.0 -87.5 -85.0 -82.5 -80.0 ... 82.5 85.0 87.5 90.0
  * time     (time) datetime64[ns] 1956-01-01 1957-01-01 ... 2005-01-01
Data variables:
    GSL      (time, lat, lon) float32 ...
    trend    (lat, lon) float64 ...
    p_val    (lat, lon) float64 ...
Attributes:
    data:       Growing season length
    source:     HadEX2 (http://www.climdex.org/)
    reference:  Donat et al., 2013
```
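The netCDF4 slice `trend[30:40,30:40]` earlier selected grid cells by position. Reading the grid off the printout above (73 latitudes from -90° in 2.5° steps, 96 longitudes from 0° in 3.75° steps, assuming the spacing is regular throughout), a short sketch shows which coordinates those indices cover; this is exactly the bookkeeping that label-based selection avoids:

```python
import numpy as np

# Regular grid as printed above: 73 latitudes, 96 longitudes
lat = np.linspace(-90.0, 90.0, 73)   # 2.5 degree spacing
lon = np.arange(96) * 3.75           # 0.0 ... 356.25 degrees east

# Coordinates covered by the positional slice trend[30:40, 30:40]
lat_sel = lat[30:40]
lon_sel = lon[30:40]

print(lat_sel[0], lat_sel[-1])   # -15.0 7.5
print(lon_sel[0], lon_sel[-1])   # 112.5 146.25
```

With xarray, the same cells could be requested by value, e.g. `ds.trend.sel(lat=slice(-15, 7.5), lon=slice(112.5, 146.25))`, since label slices include both end points.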


```python
ds.trend
```

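```python
# Region 30°N-50°N, 105°W-85°W. Longitude in this file runs 0-360°E,
# so degrees west are converted with 360 - x
lat = slice(30, 50)
lon = slice(360 - 105, 360 - 85)

ds.trend.sel(lat=lat, lon=lon)   # label-based selection, both bounds included
```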
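The `360-105` and `360-85` above convert degrees west into the 0-360° east convention used by this file. A small helper (the name `to_0_360` is hypothetical, not part of xarray) generalises that conversion with the modulo operator, so it also works on whole arrays of longitudes:

```python
import numpy as np

def to_0_360(lon):
    """Convert longitudes from the -180..180 convention to 0..360 degrees east."""
    # np.mod returns a result with the sign of the divisor (360), so
    # negative (western) longitudes wrap around into 180..360
    return np.mod(lon, 360)

west, east = -105, -85                 # 105°W and 85°W
print(to_0_360(west), to_0_360(east))  # 255 275
print(to_0_360(np.array([-180.0, -0.5, 0.0, 179.5])))
```

Note also that `slice(30, 50)` works because `lat` is stored in ascending order; for a descending coordinate the slice bounds would have to be reversed.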

### Pandas
- Data analysis package for “labeled” (tabular and time-series) data

- Provides a DataFrame similar to R's `data.frame`

```python
import pandas as pd
```

```python
dates = pd.date_range('20130101', periods=6)
dates
```

```
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
```
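`pd.date_range` takes a start plus either `periods` or an `end`, and `freq` sets the spacing (daily, `'D'`, by default). A short sketch with other spacings; the yearly range simply mirrors the 1956-2005 `time` axis printed earlier and is only an illustration:

```python
import pandas as pd

# Daily spacing is the default
days = pd.date_range('20130101', periods=6)

# Start and end instead of periods; 'MS' = month start
months = pd.date_range('2013-01-01', '2013-06-01', freq='MS')

# Annual steps on January 1st; 'YS' = year start
years = pd.date_range('1956-01-01', periods=50, freq='YS')

print(len(months))                    # 6
print(years[0].year, years[-1].year)  # 1956 2005
```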

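```python
# 6x4 standard-normal random numbers (values change on every run)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
```

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2013-01-01 | -1.672373 | 1.024128  | -0.005413 | 0.910795  |
| 2013-01-02 | 0.527432  | 0.047632  | 0.643902  | 1.815426  |
| 2013-01-03 | -1.521713 | -0.940745 | 0.484545  | 0.162157  |
| 2013-01-04 | 0.366109  | 0.529675  | 1.134515  | -0.940293 |
| 2013-01-05 | -0.182168 | 2.665547  | -1.350632 | 0.135855  |
| 2013-01-06 | -0.112823 | -0.667161 | 0.717541  | 0.824567  |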
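Because the index holds dates, rows can be selected by label with `.loc` (date slices include the end point), columns by name, and rows can be filtered with a boolean condition. A minimal sketch using a seeded random generator so the numbers are reproducible (they will not match the table above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # seeded generator for reproducible values
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

col_a = df['A']                              # one column as a Series
subset = df.loc['2013-01-02':'2013-01-04']   # label slice, end date included
positive_b = df[df['B'] > 0]                 # rows where column B is positive
stats = df.describe()                        # count, mean, std, min, quartiles, max

print(subset.shape)   # (3, 4)
```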
