---
title: Awkward Array Primer
date: 2023-10-16
authors:
  - name: Angus Hollands
    affiliations:
      - Princeton University
---

[Awkward Array](https://awkward-array.org/doc/main/) is 
> a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms.

Creating an Awkward Array is as simple as:

In [1]:
import awkward as ak

array = ak.Array([
    ["this", "is"],
    ["a", "ragged", "list", "of", "strings"]
])
array

Of course, Python data structures can be used to store ragged data, e.g.

In [2]:
[
    ["this", "is"],
    ["a", "ragged", "list", "of", "strings"]
]

[['this', 'is'], ['a', 'ragged', 'list', 'of', 'strings']]

But the real power of Awkward Array comes from the _performance_ and _expressiveness_ provided by our compiled kernels and high-level API:

In [3]:
ak.num(array, axis=1)

In [4]:
%psearch -e builtin ak.operations.*? function

ak.operations.all
ak.operations.almost_equal
ak.operations.any
ak.operations.argcartesian
ak.operations.argcombinations
ak.operations.argmax
ak.operations.argmin
ak.operations.argsort
ak.operations.backend
ak.operations.broadcast_arrays
ak.operations.broadcast_fields
ak.operations.cartesian
ak.operations.categories
ak.operations.combinations
ak.operations.concatenate
ak.operations.copy
ak.operations.corr
ak.operations.count
ak.operations.count_nonzero
ak.operations.covar
ak.operations.drop_none
ak.operations.enforce_type
ak.operations.fields
ak.operations.fill_none
ak.operations.firsts
ak.operations.flatten
ak.operations.from_arrow
ak.operations.from_arrow_schema
ak.operations.from_avro_file
ak.operations.from_buffers
ak.operations.from_categorical
ak.operations.from_cupy
ak.operations.from_dlpack
ak.operations.from_feather
ak.operations.from_iter
ak.operations.from_jax
ak.operations.from_json
ak.operations.from_numpy
ak.operations.from_parquet
ak.operations.from_rdataframe
...

Awkward supports many different data types:

__Numerics__

In [5]:
ak.Array([1, 2, 3])

__Optionals__

In [6]:
ak.Array([1, None, 2])

__Tagged Unions__

In [7]:
ak.Array([1, "hello world", 2])

__Records__ (composed structures)

In [8]:
ak.Array([
    {'x': 1, 'y': 2},
    {'x': 3, 'y': 4}
])

__(Byte) strings__

In [9]:
ak.Array(["hello", "world!"])

Arrays are represented in memory in a columnar format:

In [10]:
array = ak.Array([
    [1, 2, 3],
    [4, 5, 6, 7],
    [8, 9]
]) 
array.layout

<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [0 3 7 9]
    </Index></offsets>
    <content><NumpyArray dtype='int64' len='9'>[1 2 3 4 5 6 7 8 9]</NumpyArray></content>
</ListOffsetArray>
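To make the role of `offsets` concrete, here is a minimal pure-Python sketch that rebuilds the nested lists from the two buffers printed above. It is not how Awkward does it internally (that work happens in compiled kernels); the helper name `sublists` is hypothetical and the buffers are copied by hand from the output.

```python
# Buffers copied by hand from the layout printed above.
offsets = [0, 3, 7, 9]
content = [1, 2, 3, 4, 5, 6, 7, 8, 9]


def sublists(offsets, content):
    """Slice `content` between each pair of consecutive offsets."""
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


lists = sublists(offsets, content)
lengths = [stop - start for start, stop in zip(offsets[:-1], offsets[1:])]

print(lists)    # [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(lengths)  # [3, 4, 2]
```

In this sketch the list lengths come from the offsets alone, so a length query never has to read the content buffer.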

In [11]:
array = ak.Array([
    [
        [1, 2], [3]
    ],
    [
        [4, 5, 6], [7]
    ],
    [
        [8, 9]
    ]
])
array.layout

<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [0 2 4 5]
    </Index></offsets>
    <content><ListOffsetArray len='5'>
        <offsets><Index dtype='int64' len='6'>[0 2 3 6 7 9]</Index></offsets>
        <content><NumpyArray dtype='int64' len='9'>[1 2 3 4 5 6 7 8 9]</NumpyArray></content>
    </ListOffsetArray></content>
</ListOffsetArray>

Because the data live in flat buffers, existing arrays can easily be composed to form new arrays:

In [12]:
array = ak.Array([1, 2, 3, 4, 5, 6, 7, 8, 9])
array

In [13]:
ak.unflatten(
    array,
    counts=[2, 1, 3, 1, 2]
)
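Conceptually, a list of `counts` turns into an `offsets` buffer through a cumulative sum, and the flat data become the content. The following pure-Python sketch illustrates that idea under the assumption that the counts add up to the length of the flat data; `unflatten_sketch` is a hypothetical helper, not part of Awkward.

```python
from itertools import accumulate

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
counts = [2, 1, 3, 1, 2]


def unflatten_sketch(flat, counts):
    """Group a flat list into sublists whose lengths are given by `counts`."""
    offsets = [0, *accumulate(counts)]  # running total of the counts
    if offsets[-1] != len(flat):
        raise ValueError("counts must add up to the length of the flat data")
    nested = [flat[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]
    return offsets, nested


offsets, nested = unflatten_sketch(flat, counts)
print(offsets)  # [0, 2, 3, 6, 7, 9]
print(nested)   # [[1, 2], [3], [4, 5, 6], [7], [8, 9]]
```

These offsets are the same as the inner `offsets` of the doubly nested layout shown earlier, which is why no data need to be copied to add a level of nesting in this picture.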

Because of this columnar representation, it's easy to compose arrays into records:

In [58]:
ak.zip({
    'x': [1, 2, 3, 4],
    'y': [5, 6, 7, 8]
})
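One way to picture the result: each field stays its own column, and a record is the same position read from every column. The plain-Python sketch below works under that assumption; `record_at` is a hypothetical helper used only for illustration.

```python
columns = {
    "x": [1, 2, 3, 4],
    "y": [5, 6, 7, 8],
}


def record_at(columns, i):
    """Assemble record `i` by reading position `i` from every column."""
    return {name: values[i] for name, values in columns.items()}


length = len(columns["x"])
records = [record_at(columns, i) for i in range(length)]

print(records[0])    # {'x': 1, 'y': 5}
print(columns["y"])  # [5, 6, 7, 8]
```

Selecting a whole field is a single lookup here, while building one record touches every column; this is the usual trade-off of a columnar layout.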

Arrays have types

In [14]:
array.show()

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9]


In [15]:
array.type.show()

9 * int64


In [16]:
array.type

ArrayType(NumpyType('int64'), 9, None)

Arrays can be reduced

In [17]:
array = ak.Array([
    [1, 2, 4],
    [],
    [8],
    [16]
])

In [18]:
ak.sum(array, axis=1)

Awkward Array supports _ragged_ reduction:

:::{figure} img/example-reduction-sum-only.svg
:align: left

Ragged reduction at `axis=0`
:::


In [19]:
array = ak.Array([
    [1,    2,   4],
    [            ],
    [None, 8,    ],
    [16,         ]
])

In [59]:
ak.sum(array, axis=0)
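The reduction at `axis=0` adds up the elements that sit at the same position in each sublist; shorter sublists simply contribute nothing to the later positions. The pure-Python sketch below mimics this, assuming that missing values (`None`) are skipped, which is the documented default for `ak.sum`. The names `ragged_sum_axis0` and `ABSENT` are hypothetical.

```python
from itertools import zip_longest

rows = [
    [1, 2, 4],
    [],
    [None, 8],
    [16],
]

# Sentinel marking positions that a shorter row simply does not have.
ABSENT = object()


def ragged_sum_axis0(rows):
    """Sum position-by-position across rows, skipping absent and None entries."""
    totals = []
    for column in zip_longest(*rows, fillvalue=ABSENT):
        totals.append(sum(x for x in column if x is not ABSENT and x is not None))
    return totals


column_sums = ragged_sum_axis0(rows)
row_sums = [sum(x for x in row if x is not None) for row in rows]

print(column_sums)  # [17, 10, 4]
print(row_sums)     # [7, 0, 8, 16]
```

The second result is the corresponding reduction over each sublist (the `axis=1` direction), included for contrast.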

Arrays can be sliced

In [21]:
array

In [22]:
array[:1]

In [23]:
array[:, :1]

Awkward provides helpers for selecting a single item from each sublist, even when some sublists are empty:

In [24]:
array

In [25]:
ak.firsts(array, axis=1)
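The difference between slicing with `[:, :1]` and calling `ak.firsts` can be sketched in plain Python. This is only an analogy on ordinary lists, not the library's implementation.

```python
rows = [
    [1, 2, 4],
    [],
    [None, 8],
    [16],
]

# Like array[:, :1]: every row stays a list, possibly an empty one.
sliced = [row[:1] for row in rows]

# Like ak.firsts(array, axis=1): one value per row, None if the row is empty.
firsts = [row[0] if row else None for row in rows]

print(sliced)  # [[1], [], [None], [16]]
print(firsts)  # [1, None, None, 16]
```

Note that the two `None` values in `firsts` have different origins: one stands for an empty sublist, the other was already a missing value in the data.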

It's very easy to move between an Array and the flat in-memory buffers that back it:

In [26]:
array

We provide `ak.to_buffers`, a function that decomposes an Array into its structure (`form`), its `length`, and its data buffers (`container`):

In [27]:
form, length, container = ak.to_buffers(array)

In [28]:
form

ListOffsetForm('i64', IndexedOptionForm('i64', NumpyForm('int64', form_key='node2'), form_key='node1'), form_key='node0')

In [29]:
length

4

In [30]:
container

{'node0-offsets': array([0, 3, 3, 5, 6]),
 'node1-index': array([ 0,  1,  2, -1,  3,  4]),
 'node2-data': array([ 1,  2,  4,  8, 16])}

The matching function `ak.from_buffers` reassembles these components into an Array:

In [31]:
ak.from_buffers(form, length, container)
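To see what the three buffers encode, the reconstruction can also be done by hand. The sketch below walks the form from the inside out using plain Python lists; the buffer values are copied from the `container` output above, and the variable names are chosen for illustration only.

```python
# Buffers copied by hand from the container printed above.
offsets = [0, 3, 3, 5, 6]     # node0: where each sublist starts and stops
index = [0, 1, 2, -1, 3, 4]   # node1: position in data, or -1 for a missing value
data = [1, 2, 4, 8, 16]       # node2: the values that are actually present

# Inner node: resolve the option type.
optional = [None if i < 0 else data[i] for i in index]

# Outer node: cut the optional values into sublists.
rebuilt = [optional[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]

print(rebuilt)  # [[1, 2, 4], [], [None, 8], [16]]
```

The missing value costs one entry in the index but no entry in the data, and the empty sublist costs nothing beyond a repeated offset.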

In Awkward Array, strings are views over a single array of characters, delimited by offsets:

In [40]:
array = ak.Array([
    "I am", "a list", "of strings!"
])
array.layout

<ListOffsetArray len='3'>
    <parameter name='__array__'>'string'</parameter>
    <offsets><Index dtype='int64' len='4'>
        [ 0  4 10 21]
    </Index></offsets>
    <content><NumpyArray dtype='uint8' len='21'>
        <parameter name='__array__'>'char'</parameter>
        [ 73  32  97 109  97  32 108 105 115 116 111 102  32 115 116 114 105
         110 103 115  33]
    </NumpyArray></content>
</ListOffsetArray>
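The character buffer and offsets above are enough to recover the original strings. The sketch below does so in plain Python, assuming the `uint8` values are UTF-8 code units; the numbers are copied by hand from the printed layout.

```python
offsets = [0, 4, 10, 21]
chars = [
    73, 32, 97, 109, 97, 32, 108, 105, 115, 116, 111, 102, 32, 115, 116, 114, 105,
    110, 103, 115, 33,
]

# Each string is the slice of bytes between two consecutive offsets.
strings = [
    bytes(chars[start:stop]).decode("utf-8")
    for start, stop in zip(offsets[:-1], offsets[1:])
]

print(strings)  # ['I am', 'a list', 'of strings!']
```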

This differs from NumPy, which stores fixed-length strings padded to the longest entry:

In [56]:
import numpy as np
strings = np.array(["I am", "a list", "of strings!"])
strings

array(['I am', 'a list', 'of strings!'], dtype='<U11')

In [57]:
strings.view(np.uint32).reshape(-1, 11)

array([[ 73,  32,  97, 109,   0,   0,   0,   0,   0,   0,   0],
       [ 97,  32, 108, 105, 115, 116,   0,   0,   0,   0,   0],
       [111, 102,  32, 115, 116, 114, 105, 110, 103, 115,  33]],
      dtype=uint32)
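The padding has a memory cost that a back-of-the-envelope calculation makes visible. The sketch below compares the two layouts for the three example strings, counting only the raw buffers (4 bytes per character for `<U11`, one byte per UTF-8 code unit plus `int64` offsets for the ragged layout) and ignoring any per-object overhead.

```python
words = ["I am", "a list", "of strings!"]

# Fixed-width layout (as in the '<U11' array): every string is padded to the
# longest one, at 4 bytes per character.
width = max(len(w) for w in words)
fixed_bytes = len(words) * width * 4

# Variable-length layout: one byte per UTF-8 code unit, plus one int64 offset
# per string and one extra to close the last string.
char_bytes = sum(len(w.encode("utf-8")) for w in words)
offset_bytes = (len(words) + 1) * 8
ragged_bytes = char_bytes + offset_bytes

print(fixed_bytes)   # 132
print(ragged_bytes)  # 53
```

The gap depends on the data: it widens when string lengths vary a lot, while for many short strings of equal length the offsets can outweigh the savings.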

Through PyArrow, we provide a suite of string operations under `ak.str`:

In [36]:
array = ak.str.split_whitespace(
    ["A \"Hello, World!\" program is generally a computer program that ignores any input"]
)
array

In [38]:
ak.str.upper(
    array
)
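For comparison, here is the same computation on a single plain Python string. For this particular ASCII input it should produce the same tokens as the `ak.str` calls above, but that equivalence is an assumption and is not checked here; the `ak.str` functions operate on whole arrays of strings at once.

```python
text = 'A "Hello, World!" program is generally a computer program that ignores any input'

words = text.split()                  # split on runs of whitespace
shouted = [w.upper() for w in words]  # upper-case each token

print(len(words))   # 13
print(shouted[:3])  # ['A', '"HELLO,', 'WORLD!"']
```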