# Dask

- Suitable for both CPU-bound and memory-bound problems
- Distributes storage and compute
- Efficiently uses multiple CPUs on a single node or across multiple nodes
- Handles data that is too large to fit in memory


#### Dask in the Python Ecosystem


![image.png](attachment:image.png)

#### Dask-API for Scikit-Learn to perform distributed task executions
![image.png](attachment:image-2.png)


## Dask Collections
- dask.bag: an unordered collection that allows repeats, effectively a distributed replacement for Python iterators; it can be read from text/binary files or built from arbitrary Delayed sequences
- dask.array: distributed arrays with a NumPy-like interface, well suited to scaling large matrix operations
- dask.dataframe: distributed pandas-like dataframes for efficient handling of tabular, organized data
- dask_ml: distributed wrappers around scikit-learn-like machine-learning tools

In [1]:
# Importing dask array and dataframe
import dask
import dask.array as da
import dask.dataframe as dd
dask.__version__

'2.30.0'
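As a minimal sketch of the collections listed above, the cell below builds a small chunked array and a small bag and computes a result from each. The array shape, chunk sizes, and bag contents are illustrative values, not taken from this notebook; `scheduler="threads"` is passed to the bag only to keep the example inside one process.

```python
import dask.array as da
import dask.bag as db

# A 1000x1000 array of ones, split into 250x250 chunks (16 chunks in total).
x = da.ones((1000, 1000), chunks=(250, 250))
total = x.sum().compute()  # nothing is evaluated until compute() is called

# A bag built from an in-memory sequence, split into 2 partitions.
b = db.from_sequence(range(10), npartitions=2)
even_squares = (
    b.filter(lambda n: n % 2 == 0)   # keep even numbers
     .map(lambda n: n * n)           # square each one
     .compute(scheduler="threads")   # run with the threaded scheduler
)

print(total)
print(even_squares)
```

Both collections are lazy: the `sum`, `filter`, and `map` calls only describe work, and the results appear when `compute()` runs.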

## Dask delayed and compute
- `delayed`: wraps a function so that calling it builds a task graph instead of running immediately
- `compute`: executes the tasks in the graph using the selected scheduler
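The two functions can be seen together in a short sketch. The helper functions `inc` and `add` are hypothetical names introduced here for illustration only.

```python
from dask import delayed

@delayed
def inc(x):
    # Hypothetical helper: add one to x.
    return x + 1

@delayed
def add(a, b):
    # Hypothetical helper: add two values.
    return a + b

# Calling delayed functions records tasks in a graph; no work is done yet.
a = inc(1)
b = inc(2)
c = add(a, b)

# compute() hands the graph to the scheduler and returns the final value.
result = c.compute()
print(result)
```

Because `a` and `b` do not depend on each other, the scheduler is free to run the two `inc` tasks in parallel before running `add`.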

## Dask Scheduler
- Threads: the default for dask.array, dask.dataframe, and dask.delayed; used by calling `compute()` or `compute(scheduler='threads')`. It runs tasks on multiple threads in the same process.
- Processes: uses a pool of child processes; selected with `compute(scheduler='processes')`. Each process has its own Python interpreter, so it takes longer to start up than threads.
- Single thread: no parallelism; selected with `compute(scheduler='single-threaded')`. Useful for debugging.
- Distributed: uses a pool of worker processes along with a scheduler process. It can be used on a single machine or scaled out to many machines.
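A minimal sketch of switching between the local schedulers described above, either per call or through `dask.config.set`. The array size and chunking are illustrative assumptions.

```python
import dask
import dask.array as da

# A small chunked array: 1000 integers in 10 chunks of 100.
x = da.arange(1000, chunks=100)
expr = (x ** 2).sum()  # lazy expression; nothing has run yet

# Choose the scheduler for a single call.
r_threads = expr.compute(scheduler="threads")
r_single = expr.compute(scheduler="single-threaded")

# Or set it for every compute() inside a block.
with dask.config.set(scheduler="single-threaded"):
    r_config = expr.compute()

print(r_threads, r_single, r_config)
```

All three calls return the same value; only the execution strategy differs. The `processes` scheduler is left out of the sketch because, on platforms that start child processes by spawning, it generally needs to be called from under an `if __name__ == "__main__":` guard.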

## Dask Distributed Cluster

In [2]:
from dask.distributed import Client, LocalCluster
client = Client(n_workers=6, threads_per_worker=4, memory_limit='4GB')
client 

Client: Scheduler: tcp://127.0.0.1:46628, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 6, Cores: 24, Memory: 24.00 GB


In [4]:
client.shutdown()
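As an alternative to calling `client.shutdown()` by hand, the cluster and client can be managed with context managers so that they are closed even if a task raises. This is a minimal sketch, not the configuration used above: the worker counts are illustrative, `processes=False` keeps the workers as threads inside the current process, and `dashboard_address=None` is assumed here to switch the dashboard off.

```python
from dask.distributed import Client, LocalCluster

# processes=False runs the workers as threads in this process, which keeps the
# sketch lightweight; drop it to get separate worker processes.
with LocalCluster(n_workers=2, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster:
    with Client(cluster) as client:
        future = client.submit(sum, [1, 2, 3])  # schedule one task on the cluster
        result = future.result()                # block until the task finishes

print(result)
```

On leaving the `with` blocks, the client disconnects and the local cluster is shut down.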