# Use Dask with Big Data
- https://towardsdatascience.com/why-and-how-to-use-dask-with-big-data-746e34dac7c3
- https://dask.org
- https://github.com/dask/dask-tutorial

Pandas is very efficient with small data (usually from 100MB up to 1GB), where performance is rarely a concern.
But when your data is far larger than your local RAM (say 100GB), you can either keep using Pandas with some workarounds, which only go so far, or choose a better-suited tool: in this case, Dask.
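
One common Pandas workaround is to read a file in chunks and aggregate as you go. The sketch below is only an illustration: the CSV text is a made-up in-memory stand-in for a large file, and the chunk size of 2 rows is chosen purely to keep the example small.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for a large CSV file (made-up values).
csv_text = "Quantity,UnitPrice\n6,2.55\n6,3.39\n8,2.75\n4,4.15\n"

total = 0.0
count = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["UnitPrice"].sum()
    count += len(chunk)

mean_price = total / count
print(mean_price)
```

This works for simple aggregations, but every new computation needs its own hand-written loop, which is the bookkeeping Dask takes over.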

How to install: https://docs.dask.org/en/latest/install.html

## 1. Read CSV files into a Dask DataFrame

In [0]:
import dask.dataframe as dd
# Split the file into partitions of roughly 32 MB each
df = dd.read_csv('https://e-commerce-data.s3.amazonaws.com/E-commerce+Data+(1).csv', encoding='ISO-8859-1', blocksize='32MB')

In [2]:
df

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| npartitions=2 | | | | | | | | |
| | object | object | object | int64 | object | float64 | float64 | object |
| | ... | ... | ... | ... | ... | ... | ... | ... |
| | ... | ... | ... | ... | ... | ... | ... | ... |


## 2. Use compute() to execute the operation

In [3]:
df.UnitPrice.mean()

dd.Scalar<series-..., dtype=float64>

Most Dask user interfaces are lazy, meaning that they don’t evaluate until you explicitly ask for a result using the compute method.

In [4]:
df.UnitPrice.mean().compute()

4.611113626088513

## 3. Check number of missing values for each column

In [5]:
df.isnull().sum().compute()

InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64

## 4. Filter rows based on conditions

In [6]:
df[df.Quantity < 10].compute()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 159374 | 581587 | 23256 | CHILDRENS CUTLERY SPACEBOY | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
| 159376 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 12/9/2011 12:50 | 2.10 | 12680.0 | France |
| 159377 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
| 159378 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
