## Setup

1. Build the Docker image using <br>`docker build . -t pyarrow`
2. Run the container with a memory limit using the `-m` option:<br>
   `docker run -d -p 127.0.0.1:5440:5440 -v ./nb:/nb -m 512m pyarrow`
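To confirm from inside the container that the `-m 512m` limit is actually in effect, the sketch below reads the cgroup memory limit. It assumes a Linux host and the standard cgroup v2 (`/sys/fs/cgroup/memory.max`) or cgroup v1 (`/sys/fs/cgroup/memory/memory.limit_in_bytes`) file locations. `container_memory_limit` is a hypothetical helper and is not part of the setup above.

```python
from pathlib import Path
from typing import Optional

# Assumed locations of the container memory limit on Linux:
# cgroup v2 exposes "memory.max", cgroup v1 exposes "memory.limit_in_bytes".
_CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit() -> Optional[int]:
    """Hypothetical helper: cgroup memory limit in bytes, or None if unlimited/unknown."""
    for path in _CGROUP_LIMIT_FILES:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file absent on this system or cgroup version
        if raw == "max":
            return None  # cgroup v2 reports "max" when no limit is set
        try:
            return int(raw)
        except ValueError:
            return None
    return None


limit = container_memory_limit()
print("memory limit:", "unlimited/unknown" if limit is None else f"{limit / 1024**2:.0f} MiB")
```

Inside a container started with `-m 512m` this should report 512 MiB; outside a container it will usually report that no limit was found.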

In [None]:
import pyarrow as pa

In [None]:
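# 999,999 rows (the range stop is exclusive) in three int64 columns
batch = pa.RecordBatch.from_arrays([range(1, 1_000_000),
                                    range(1, 1_000_000),
                                    range(1, 1_000_000)],
                                   names=["x", "y", "z"])
batch.nbytes / 1024**2  # in-memory size of one batch, in MiB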
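The size reported above can be checked by hand. This back-of-the-envelope sketch assumes each column is int64 (8 bytes per value) and ignores any validity bitmap, which would add at most one bit per value; the constant names are illustrative.

```python
# Rough size of the batch above and of the file written in the next cell.
ROWS = len(range(1, 1_000_000))   # 999_999 rows: the stop value is exclusive
COLUMNS = 3                       # x, y, z
BYTES_PER_VALUE = 8               # int64 (assumed inferred type)
BATCHES = 50                      # number of times the batch is written to test.arrow

batch_mib = ROWS * COLUMNS * BYTES_PER_VALUE / 1024**2
file_mib = batch_mib * BATCHES
print(f"one batch ~ {batch_mib:.1f} MiB, {BATCHES} batches ~ {file_mib:.0f} MiB")
```

The total comes out at a bit over 1.1 GiB, more than twice the 512 MiB given to the container.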

In [None]:
with pa.ipc.new_file("test.arrow", schema=batch.schema) as f:
    for i in range(50):
        f.write_batch(batch)

In [None]:
!ls -lah
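The same comparison can be made from Python instead of eyeballing `ls` output. This is a rough sketch: `fits_in_memory` and `MEMORY_LIMIT_BYTES` are hypothetical names, and the 512 MiB value mirrors the `-m 512m` flag (Docker's `m` suffix is 1024-based). In practice less than the full limit is usable, because the Python process itself also needs memory.

```python
import os

MEMORY_LIMIT_BYTES = 512 * 1024**2  # assumed to match `-m 512m` from the docker run command


def fits_in_memory(size_bytes: int, limit_bytes: int = MEMORY_LIMIT_BYTES) -> bool:
    """Hypothetical helper: True if `size_bytes` is within `limit_bytes`."""
    return size_bytes <= limit_bytes


if os.path.exists("test.arrow"):
    size = os.path.getsize("test.arrow")
    print(f"test.arrow is {size / 1024**2:.0f} MiB; fits in 512 MiB: {fits_in_memory(size)}")
```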

In [None]:
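# Not enough memory: read_all() copies the whole ~1.1 GiB file into RAM, which
# exceeds the 512 MB container limit (the kernel will typically be killed)
with pa.ipc.open_file("test.arrow") as f:
    tbl = f.read_all()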

In [None]:
# Re-import after the kernel restart caused by the previous cell
import pyarrow as pa
import mmap

### File-backed mmap

In [None]:
# Map the whole file (length 0 = entire file) as read-only
with open("test.arrow", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
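The `with` block closes `f` before `mm` is ever used. This stdlib sketch, using a throwaway temp file instead of `test.arrow`, shows why that is fine: in CPython the mapping keeps its own duplicate of the file descriptor, so it outlives the file object.

```python
import mmap
import os
import tempfile

# Throwaway file standing in for test.arrow (illustrative only).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"hello, mmap")

# Same pattern as the cell above: map read-only inside a `with` block.
with open(path, "rb") as f:
    demo = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# The file object is closed, but the mapping is still readable.
head = demo[:5]
print(f.closed, head)  # True b'hello'

demo.close()     # release the mapping before deleting the file
os.remove(path)
```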

In [None]:
mm[:10]  # an Arrow IPC file starts with the magic bytes b"ARROW1"
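Those leading bytes are the Arrow IPC file magic. According to the IPC file format section of the Arrow columnar format specification, a file begins with `ARROW1` padded to 8 bytes and ends with the footer, a little-endian int32 footer length, and `ARROW1` again. The helper below is a hypothetical sketch that checks both magics and reads the footer length; it works on `bytes` and, since slicing an `mmap` returns `bytes`, on the mapping above.

```python
import struct

ARROW_MAGIC = b"ARROW1"


def parse_arrow_file_trailer(buf) -> int:
    """Hypothetical helper: validate the Arrow file magics and return the footer length.

    Assumed layout: "ARROW1" + padding ... <footer> <int32 footer length> "ARROW1".
    """
    if len(buf) < 8 + 4 + 6:  # leading magic+padding, footer length, trailing magic
        raise ValueError("buffer too small to be an Arrow IPC file")
    if bytes(buf[:6]) != ARROW_MAGIC or bytes(buf[-6:]) != ARROW_MAGIC:
        raise ValueError("not an Arrow IPC file")
    (footer_len,) = struct.unpack("<i", buf[-10:-6])  # little-endian int32
    return footer_len
```

For example, `parse_arrow_file_trailer(mm)` should only touch the first and last pages of the mapping, not the record batch data in between.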

Watch memory usage with `htop` and `docker stats` while running the cells below.

In [None]:
with pa.ipc.open_file(mm) as f:
    tbl = f.read_all()
# The table's buffers point into the mapped file, so nothing is loaded
# into RAM until a column is actually accessed
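The comment above relies on zero-copy access: the data is referenced inside the mapping rather than copied out of it. This stdlib-only sketch illustrates that general mechanism and is not pyarrow's code. It contrasts slicing an `mmap` directly, which copies into a new `bytes` object, with slicing a `memoryview` over it, which creates a view onto the same memory.

```python
import mmap
import os
import tempfile

# Illustrative 16-byte file (a stand-in for test.arrow).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(bytes(range(16)))

with open(path, "rb") as f:
    mm2 = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

copied = mm2[4:8]                        # mmap slicing copies into a new bytes object
with memoryview(mm2) as view:
    window = view[4:8]                   # memoryview slicing creates a view, no copy
    shares_memory = window.obj is mm2    # the view still refers to the mapping
    same_bytes = window.tobytes() == copied
    window.release()                     # release all exports so the mapping can be closed

mm2.close()
os.remove(path)
print(type(copied).__name__, shares_memory, same_bytes)  # bytes True True
```

Releasing every `memoryview` before `close()` matters: `mmap.close()` raises `BufferError` while exported buffers still exist.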

Because the data is memory-mapped, the OS pages it in on demand and can evict pages that have already been read when memory runs short. This lets us compute on this dataset even though it is larger than the 512 MB allocated to the `docker` container.
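The same idea can be shown without pyarrow. The sketch below uses a hypothetical helper, `sum_int64_file`, and assumes a raw file of little-endian int64 values rather than the Arrow format. It memory-maps the file and sums it one window at a time, so only the pages under the current window need to be resident at any moment.

```python
import mmap
import os
import struct
import tempfile


def sum_int64_file(path: str, chunk_values: int = 1 << 16) -> int:
    """Hypothetical helper: sum little-endian int64 values stored back to back in `path`.

    The file is memory-mapped read-only and scanned one window at a time. Pages
    already scanned are clean file-backed pages, which the OS may evict under
    memory pressure.
    """
    step = 8 * chunk_values  # window size in bytes
    total = 0
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if len(mm) % 8:
                raise ValueError("file size is not a multiple of 8 bytes")
            for start in range(0, len(mm), step):
                chunk = mm[start:start + step]  # copies one small window; stop is clamped at EOF
                total += sum(v for (v,) in struct.iter_unpack("<q", chunk))
    return total


# Usage on a throwaway file holding the values 1..999.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(struct.pack("<999q", *range(1, 1000)))

result = sum_int64_file(path)
result_small = sum_int64_file(path, chunk_values=7)  # many small windows, same answer
print(result, result_small)  # 499500 499500
os.remove(path)
```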

In [None]:
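import pyarrow.compute as pc
# Accessing the column pages its data in from the mapped file.
# Expected: 50 batches x sum(range(1, 1_000_000)) = 24_999_975_000_000
pc.sum(tbl["x"]).as_py()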

In [None]:
pc.sum(tbl["y"])

In [None]:
pc.sum(tbl["z"])