<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">


# Introduction

Welcome to the Dask Tutorial.

Dask is a parallel computing library that scales the existing Python ecosystem. This tutorial will introduce Dask and parallel data analysis more generally.

Dask can scale down to your laptop and up to a cluster. Accordingly, the tutorial comes in two parts. In the first part, we'll use the environment you set up on your laptop to analyze medium-sized datasets in parallel.

In the second part, you'll log into a [Pangeo](https://pangeo-data.github.io/) [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) deployment that provides you with your own Dask cluster, so you can solve even larger problems on a cluster of machines.

## Tutorial Structure

Each section is a Jupyter notebook. There's a mixture of text, code, and exercises.

If you haven't used JupyterLab, it's similar to the Jupyter Notebook. If you haven't used the Notebook, the quick intro is:

1. There are two modes: command and edit.
2. From command mode, press `Enter` to edit a cell (like this markdown cell).
3. From edit mode, press `Esc` to change to command mode.
4. Press `Shift+Enter` to execute a cell and move to the next cell.

The toolbar has commands for executing, converting, and creating cells.

Each notebook will have exercises for you to solve. You'll be given a blank or partially completed cell, followed by a "magic" cell that will load the solution. For example:

## Exercise: Print `Hello, world!`

Print the text "Hello, world!".

In [11]:
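# Download the sample file first if it is not already present:
# !wget --no-check-certificate "http://cdat.llnl.gov/cdat/sample_data/clt.nc"
import cdms2
from cdms2 import MV

f1 = cdms2.open("clt.nc")
f1.listvariables()
u = f1('u')
v = f1('v')
# Magnitude computed from u and v at index 0 of their leading axis
vel = MV.sqrt(u[0]**2 + v[0]**2)
# print(vel)
print(u.getLatitude()[:], u.getLongitude()[:])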

In [2]:
# %load solutions/00-hello-world.py
print("Hello, world!")

Hello, world!


You'll need to run the solution cell twice: once to load the solution into the cell, and a second time to execute it.
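The two-step behaviour can be mimicked in plain Python. The sketch below is only an illustration, not how Jupyter implements `%load`: the `load_solution` helper and the temporary file standing in for `solutions/00-hello-world.py` are hypothetical. It shows that the first run only rewrites the cell's text (with the magic line commented out), and that the second run is what actually executes the solution.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def load_solution(path):
    """Return the cell text that a first run of `%load path` leaves behind.

    Hypothetical helper for illustration; it is not part of Jupyter or IPython.
    """
    source = Path(path).read_text()
    # The magic line is commented out and the file's contents are placed below it.
    return f"# %load {path}\n{source}"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for solutions/00-hello-world.py
    solution = Path(tmp) / "00-hello-world.py"
    solution.write_text('print("Hello, world!")\n')

    # First run: the cell's contents are replaced; nothing is executed yet.
    cell_text = load_solution(solution)

# Second run: the cell now holds ordinary Python, so running it executes the solution.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(cell_text)
output = captured.getvalue()
print(output, end="")
```

After the first step, `cell_text` looks like the `In [2]` cell above: a commented `# %load ...` line followed by the solution code.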

## Contents

Now, let's officially start.

- [Dask Delayed](01-dask.delayed.ipynb)
- [Dask Arrays](02-dask-arrays.ipynb)
- [Dask DataFrames](03-dask-dataframes.ipynb)
- [Schedulers](04-schedulers.ipynb)
- [Distributed DataFrames](05-distributed-dataframes-and-efficiency.ipynb)
- [Advanced Distributed Techniques](06-distributed-advanced.ipynb)
- [Scalable Machine Learning](07-machine-learning.ipynb)