<font size="6"><b>Scientific Visualization using Python 2020</b></font>

<font size="3"><b>October 20-23: 09:00 - 13:00</b></font>

<img src="../figures/summary/title.png"   width="600">

<font size="3"><b>Day 1</b></font>

- Introduction to some important data processing libraries
- Getting started with Matplotlib

<font size="5"><b>Python Visualization Landscape</b></font>

<img src="../figures/summary/viz_landscape_1.png"   width="500">

<font size="4">We will focus on:</font>

<img src="../figures/summary/viz_landscape_2.png"   width="500">

<br/><br/>
<font size="5"><b>A few Teasers</b></font>

<img src="../figures/summary/figures_teaser/ex2_2_helper_functions.png"   width="500">
<img src="../figures/summary/figures_teaser/ex1_6_annotations.png"   width="500">
<img src="../figures/summary/figures_teaser/ex3_5_stippling_RPSS.png"   width="500">
<img src="../figures/summary/figures_teaser/ex3_8_trajectories.png"   width="500">

<font size="4"><b>Checking if everything works!</b></font>

### Integrated development environments
 - Jupyter notebooks (this is what we will use; see Exercise 0.1)
 - PyCharm
 - Spyder
 - Vim
 - ...

<b>Short warmup: Exercise 0.1</b>
<br/><br/>
Aim: Revise a few Python concepts and practice working in Jupyter notebooks

### NumPy
 - “NumPy is the fundamental package for scientific computing with Python” – www.numpy.org
 - Core data structure: the N-dimensional array

In [3]:
import numpy as np

In [5]:
a = np.arange(15).reshape(3,5)
print(a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [7]:
a.shape

(3, 5)
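
A few more array attributes and indexing patterns, as a minimal sketch building on the array above (the variable names are illustrative and not part of the course material):

```python
import numpy as np

# Same array as above: 15 consecutive integers arranged as 3 rows x 5 columns
a = np.arange(15).reshape(3, 5)

n_dims = a.ndim                # number of axes
n_elements = a.size            # total number of elements
second_row = a[1, :]           # all columns of the row with index 1
last_column = a[:, -1]         # last entry of every row
column_means = a.mean(axis=0)  # average over the rows, one value per column

print(n_dims, n_elements)
print(second_row)
print(last_column)
print(column_means)
```

Slicing with `:` keeps an axis, while a single integer index drops it, which is why `second_row` and `last_column` are one-dimensional.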

<b>Exercise 0.2: Getting familiar with NumPy</b>

### NetCDF

- `netCDF4`: Python interface to the netCDF C library

- Reads and writes netCDF files (both version 3 and version 4)

In [30]:
import netCDF4 as nc

In [33]:
fN = '../data/HadEX2_GSL.nc'
ncf = nc.Dataset(fN)
print(ncf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    data: Growing season length
    source: HadEX2 (http://www.climdex.org/)
    reference: Donat et al., 2013
    dimensions(sizes): lon(96), lat(73), time(50)
    variables(dimensions): float64 lon(lon), float64 lat(lat), int32 time(time), float32 GSL(time, lat, lon), float64 trend(lat, lon), float64 p_val(lat, lon)
    groups: 


In [36]:
trend = ncf.variables['trend']

print(trend)

print(trend[30:40,30:40])

<class 'netCDF4._netCDF4.Variable'>
float64 trend(lat, lon)
    _FillValue: nan
unlimited dimensions: 
current shape = (73, 96)
filling on
[[-- -- -- -- 0.0020566136610928876 0.0020273329058180245
  0.002149247882748284 -- 0.01941575946785279 --]
 [-- -- -- -- -- -- 0.00016648749343484148 -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]]
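
The `--` entries in the printout are masked elements: by default, slicing a netCDF4 variable returns a NumPy masked array in which missing values are hidden. The sketch below imitates this with a small hand-made array (hypothetical values, not taken from the file) to show how masked cells behave in calculations:

```python
import numpy as np

# Hypothetical stand-in for a slice of `trend`: NaN marks cells without data
raw = np.array([[np.nan, 0.002, 0.019],
                [np.nan, np.nan, 0.0002]])

# Mask the invalid (NaN) entries so that they print as "--"
masked = np.ma.masked_invalid(raw)

n_valid = masked.count()        # number of unmasked elements
mean_valid = masked.mean()      # masked cells are ignored in the average
filled = masked.filled(-999.0)  # plain array with a sentinel in masked cells

print(masked)
print(n_valid, mean_valid)
print(filled)
```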


<b>Exercise 0.3: Reading NetCDF data</b>

### Xarray

- Multidimensional labeled arrays

- Combines a netCDF-like data model with the capabilities of pandas

In [37]:
import xarray as xr

In [38]:
fN = '../data/HadEX2_GSL.nc'

ds = xr.open_dataset(fN)

print(ds)

<xarray.Dataset>
Dimensions:  (lat: 73, lon: 96, time: 50)
Coordinates:
  * lon      (lon) float64 0.0 3.75 7.5 11.25 15.0 ... 345.0 348.8 352.5 356.2
  * lat      (lat) float64 -90.0 -87.5 -85.0 -82.5 -80.0 ... 82.5 85.0 87.5 90.0
  * time     (time) datetime64[ns] 1956-01-01 1957-01-01 ... 2005-01-01
Data variables:
    GSL      (time, lat, lon) float32 ...
    trend    (lat, lon) float64 ...
    p_val    (lat, lon) float64 ...
Attributes:
    data:       Growing season length
    source:     HadEX2 (http://www.climdex.org/)
    reference:  Donat et al., 2013


In [39]:
ds.trend

<xarray.DataArray 'trend' (lat: 73, lon: 96)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * lon      (lon) float64 0.0 3.75 7.5 11.25 15.0 ... 345.0 348.8 352.5 356.2
  * lat      (lat) float64 -90.0 -87.5 -85.0 -82.5 -80.0 ... 82.5 85.0 87.5 90.0

In [40]:
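lat = slice(30, 50)  # 30°N to 50°N

lon = slice(360-105, 360-85)  # 105°W to 85°W on the file's 0-360 longitude grid

ds.trend.sel(lat=lat, lon=lon)  # label-based selection; both slice end points are included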

<xarray.DataArray 'trend' (lat: 9, lon: 6)>
array([[-0.013822, -0.017993,  0.028642,  0.068076,  0.050725,  0.052645],
       [ 0.042555, -0.08705 ,  0.044582,  0.072828,  0.033329,  0.08891 ],
       [ 0.286903, -0.121509, -0.163817, -0.209337, -0.071513,  0.080229],
       [ 0.20771 , -0.051205, -0.129159, -0.217335, -0.049765,  0.085104],
       [ 0.126969,  0.000905,  0.014469,  0.052174,  0.166199,  0.24968 ],
       [ 0.080082,  0.036576,  0.072964,  0.082449,  0.183254,  0.271937],
       [ 0.209573,  0.201362,  0.147796,  0.125882,  0.153077,       nan],
       [ 0.191687,  0.171211,  0.084009,  0.078607, -0.008008,       nan],
       [ 0.13068 ,  0.106646,  0.065728,  0.041136, -0.049716, -0.080854]])
Coordinates:
  * lon      (lon) float64 255.0 258.8 262.5 266.2 270.0 273.8
  * lat      (lat) float64 30.0 32.5 35.0 37.5 40.0 42.5 45.0 47.5 50.0
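
Label-based selection can be tried without the data file. The sketch below assumes a synthetic grid with the same spacing as the dataset above (2.5° in latitude, 3.75° in longitude) and an array of zeros as placeholder data, so only the coordinates are meaningful:

```python
import numpy as np
import xarray as xr

# Synthetic coordinates with the same layout as the file (assumption)
lat = -90.0 + 2.5 * np.arange(73)   # -90 to 90
lon = 3.75 * np.arange(96)          # 0 to 356.25

# Placeholder data: zeros, only used to carry the coordinates
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="trend",
)

# Degrees west converted to the 0-360 convention
lon_west, lon_east = 360 - 105, 360 - 85   # 255 and 275

# Slices select by coordinate value and include both end points
subset = field.sel(lat=slice(30, 50), lon=slice(lon_west, lon_east))

# Single grid point closest to a requested location
nearest = field.sel(lat=41.0, lon=260.0, method="nearest")

print(subset.shape)
print(float(nearest.lat), float(nearest.lon))
```

Note that `sel` works on coordinate values, whereas `isel` would select by integer position, like plain NumPy indexing.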

### Pandas
- Data analysis and statistics package for “labeled” (tabular) data

- Provides an R-like DataFrame

In [41]:
import pandas as pd

In [42]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [44]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df

                   A         B         C         D
2013-01-01  0.101495  1.680337  0.493720  0.103922
2013-01-02  0.643077  0.045238 -1.861463  0.368117
2013-01-03 -1.093761  0.096083 -1.671314 -0.224084
2013-01-04 -0.921476  0.365818  0.616308 -0.442112
2013-01-05 -0.012086  0.815396  0.101439  1.656478
2013-01-06  0.389411  0.850273 -0.862893  0.685046
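
Once the data are in a DataFrame, rows can be selected by date label or by condition, and columns can be summarised directly. A minimal sketch, assuming a seeded random generator so that the numbers are reproducible (they will differ from the table above):

```python
import numpy as np
import pandas as pd

# Same layout as above: six daily rows, four columns of random numbers
dates = pd.date_range("20130101", periods=6)
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Label-based slicing on the date index includes both end dates
three_days = df.loc["2013-01-02":"2013-01-04"]

# Boolean indexing: keep only the rows where column A is positive
a_positive = df[df["A"] > 0]

# Column-wise statistics
col_means = df.mean()

print(three_days)
print(a_positive)
print(col_means)
print(df.describe())
```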
