In [17]:
%load_ext memory_profiler

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler


In [54]:
from datetime import (datetime, timedelta)
import pyarrow as pa

A `Buffer` represents a single contiguous memory segment, described by an *offset* and a *length*.

A `Field` represents a named column in a record batch, or a child of a nested type.

# `pyarrow.Buffer`

[Documentation](https://arrow.apache.org/docs/python/generated/pyarrow.Buffer.html)

## pyarrow.py_buffer()

`py_buffer` constructs an Arrow buffer from a Python bytes-like or buffer-like object

In [47]:
data0 = b'abcdefghijklmnopqrstuvwxyz'
buf0 = pa.py_buffer(data0)
buf0

<pyarrow.lib.Buffer at 0x7f1fd8dae458>

**Methods**

In [105]:
buf1 = pa.py_buffer(b'abcdefghijklmnopqrstuvwxy')
buf2 = pa.py_buffer(b'abcdefghijklmnopqrstuvwxyz')

In [111]:
buf0.equals(buf1), buf0.equals(buf2), buf0.slice(), buf0.to_pybytes()

(False,
 True,
 <pyarrow.lib.Buffer at 0x7f1f56d93ca8>,
 b'abcdefghijklmnopqrstuvwxyz')

New in pyarrow 0.14: `buf0.hex()`

**Attributes**

In [48]:
buf0.address, buf0.is_mutable, buf0.parent, buf0.size

(139774776982416, False, None, 26)

*bytes* objects implement the [buffer protocol](https://docs.python.org/3/c-api/buffer.html), specified in [PEP 3118](http://legacy.python.org/dev/peps/pep-3118/), so `py_buffer` wraps them without copying.

In [49]:
buf0[0] is data0[0]

True
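The identity check above compares two small Python integers, which CPython caches, so it would return `True` even if the bytes had been copied. As a hedged illustration of what "no copy" means, the sketch below uses only the standard library (`memoryview` over a `bytearray`, not pyarrow) and the variable names are made up for this example: a change made to the original storage is visible through the view, which can only happen if both share the same memory.

```python
# Standard-library sketch of zero-copy access through the buffer protocol.
# Names (source, view, window) are illustrative, not part of the notebook.
source = bytearray(b'abcdefghijklmnopqrstuvwxyz')  # mutable buffer-protocol object
view = memoryview(source)     # wraps the same memory; nothing is copied
window = view[2:5]            # slicing a memoryview is zero-copy as well

copied = bytes(source[2:5])   # slicing the bytearray itself makes a copy

source[2] = ord('C')          # mutate the original storage in place

shared = bytes(window)        # the window sees the change
print(shared, copied)
print(view.nbytes, view.readonly)
```

Here `shared` reflects the mutation while `copied` keeps the old bytes. A `bytes` object cannot be mutated this way, which is consistent with `buf0.is_mutable` being `False` above.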

## Get the underlying `Buffer` of an `Array`

In [117]:
%%memit 
ref1 = datetime(2019, 1, 1)
dates1 = [ref1 + timedelta(days=i) for i in range(0, 1_000_000, 3)]

peak memory: 224.48 MiB, increment: 8.35 MiB


In [118]:
arr1 = pa.array(dates1)
# The timestamp unit is inferred as microseconds by default; pass `type` to choose another unit
arr2 = pa.array(dates1, type=pa.timestamp('ms'))
arr3 = pa.array(dates1, size=128, type=pa.timestamp('ms'))
arr1.type, arr2.type, arr3.type

(TimestampType(timestamp[us]),
 TimestampType(timestamp[ms]),
 TimestampType(timestamp[ms]))

In [120]:
from sys import getsizeof

In [128]:
getsizeof(dates1) // getsizeof(ref1)

55793

In [100]:
arr2.type

TimestampType(timestamp[ms])

In [101]:
arr3.type

TimestampType(timestamp[ms])

In [64]:
buf1 = arr1.buffers()[-1]

In [81]:
slice1 = buf1.slice(offset=128, length=64)

In [86]:
bytes01 = slice1.to_pybytes()

In [53]:
arr1[0] is dates1[0]

False

Python lists do not implement the [buffer protocol](https://docs.python.org/3/c-api/buffer.html) ([PEP 3118](http://legacy.python.org/dev/peps/pep-3118/)), so building an array from a list has to copy the values.

See [An Introduction to the Python Buffer Protocol](https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/) by Jake VanderPlas.
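A minimal standard-library sketch of that difference, with illustrative names that do not appear in the notebook: asking a list for a `memoryview` fails because a list holds references to Python objects rather than raw values, whereas a typed `array.array` built from the same values exposes one contiguous block.

```python
from array import array

values = [1, 2, 3]

try:
    memoryview(values)            # a list stores object references, not raw values
    list_supported = True
except TypeError:
    list_supported = False

packed = array('q', values)       # copies the values into contiguous 8-byte integers
packed_view = memoryview(packed)  # the typed array does expose the buffer protocol
print(list_supported, packed_view.format, packed_view.nbytes)
```

This is presumably why `arr1[0] is dates1[0]` is `False`: the values in the Arrow array were converted and stored separately from the original `datetime` objects.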

%memit pa.array(dates1)

In [26]:
arr = pa.array(dates1)

In [27]:
arr.type

TimestampType(timestamp[us])

In [28]:
%memit arr.to_pandas()

peak memory: 119.68 MiB, increment: 1.35 MiB


In [29]:
s = arr.to_pandas()

In [30]:
arr1 = arr.cast('timestamp[ms]')

In [31]:
arr1.type

TimestampType(timestamp[ms])

In [129]:
arr

<pyarrow.lib.TimestampArray object at 0x7f1f566a4e58>
[
  1546300800000000,
  1546560000000000,
  1546819200000000,
  1547078400000000,
  1547337600000000,
  1547596800000000,
  1547856000000000,
  1548115200000000,
  1548374400000000,
  1548633600000000,
  ...
  87943881600000000,
  87944140800000000,
  87944400000000000,
  87944659200000000,
  87944918400000000,
  87945177600000000,
  87945436800000000,
  87945696000000000,
  87945955200000000,
  87946214400000000
]

In [32]:
buf1 = arr1.buffers()[-1]
buf1

<pyarrow.lib.Buffer at 0x7f1fd9eaa5e0>

In [33]:
arr2 = arr1[4:50]

In [34]:
buf2 = arr2.buffers()[-1]

In [35]:
buf2.size

2666672

In [36]:
buf1.size

2666672

In [37]:
buf2.equals(buf1)

True

In [38]:
buf1[1]

188

In [39]:
buf2[1]

188

In [40]:
buf1?

Type:        Buffer
String form: <pyarrow.lib.Buffer object at 0x7f1fd9eaa5e0>
Length:      2666672
File:        /opt/conda/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so
Docstring:
Buffer()

The base class for all Arrow buffers.

A buffer represents a contiguous memory area.  Many buffers will own
their memory, though not all of them do.


## Allocate buffer

Allocate 1 GiB of data

In [42]:
%%memit
buf = pa.allocate_buffer(1024 * 1024 * 1024, resizable=False)

peak memory: 119.55 MiB, increment: 0.01 MiB


In [43]:
pa.total_allocated_bytes() / 1024 / 1024 / 1024

1.005005955696106

Delete the buffer

In [56]:
buf = None

In [57]:
pa.total_allocated_bytes() / 1024 / 1024 / 1024

0.007528364658355713