<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Documents-and-chunks" data-toc-modified-id="Documents-and-chunks-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Documents and chunks</a></span><ul class="toc-item"><li><span><a href="#Getting-matches" data-toc-modified-id="Getting-matches-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Getting matches</a></span></li></ul></li><li><span><a href="#Iterating-over-chunks-of-a-DocumentArray" data-toc-modified-id="Iterating-over-chunks-of-a-DocumentArray-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Iterating over chunks of a DocumentArray</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Traversal-paths" data-toc-modified-id="Traversal-paths-2.0.1"><span class="toc-item-num">2.0.1&nbsp;&nbsp;</span>Traversal paths</a></span></li><li><span><a href="#Chunk-Traversal" data-toc-modified-id="Chunk-Traversal-2.0.2"><span class="toc-item-num">2.0.2&nbsp;&nbsp;</span>Chunk Traversal</a></span></li><li><span><a href="#Root-Traversal" data-toc-modified-id="Root-Traversal-2.0.3"><span class="toc-item-num">2.0.3&nbsp;&nbsp;</span>Root Traversal</a></span></li></ul></li></ul></li><li><span><a href="#Segmenters" data-toc-modified-id="Segmenters-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Segmenters</a></span></li></ul></div>

## Documents and chunks

When a `Document` is created, it has access to a `chunks` field.

In [2]:
from jina import Document

with Document() as root:
    root.text = 'What is love? Oh baby do not hurt me.'

print(root.adjacency)                  # outputs 0

0


We can see that the `.chunks` field returns a `ChunkArray`:

In [3]:
root.chunks

<jina.types.arrays.chunk.ChunkArray length=0 at 140479529883056>

Underneath, Jina builds a `ChunkArray` object at runtime from the `._pb_body.chunks` field:

```python
    @property
    def chunks(self) -> 'ChunkArray':
        """Get all chunks of the current document.

        :return: the array of chunks of this document
        """
        return ChunkArray(self._pb_body.chunks, reference_doc=self)
```

Note that since we just created a `Document` containing only text, there is nothing in its chunks yet:

In [4]:
root._pb_body.chunks

[]

In [5]:

# Initialise two Documents and add as chunks to root.
with Document() as chunk1:
    chunk1.text = 'What is love?'
    root.chunks.append(chunk1)

with Document() as chunk2:
    chunk2.text = 'Oh baby do not hurt me.'
    root.chunks.append(chunk2)


Now the root has two chunks:

In [6]:
root._pb_body.chunks

[id: "e69792f0-b880-11eb-9de2-787b8ab3f5de"
mime_type: "text/plain"
text: "What is love?"
granularity: 1
parent_id: "e53d0714-b880-11eb-9de2-787b8ab3f5de"
content_hash: "d12a28ebfdc62258"
, id: "e697adf8-b880-11eb-9de2-787b8ab3f5de"
mime_type: "text/plain"
text: "Oh baby do not hurt me."
granularity: 1
parent_id: "e53d0714-b880-11eb-9de2-787b8ab3f5de"
content_hash: "c0781807296e94d1"
]

To access the text inside each chunk, index into `root.chunks`:

In [7]:
root.chunks[0].text

'What is love?'

In [8]:
root.chunks[1].text

'Oh baby do not hurt me.'
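
The pattern behind the `chunks` property (a wrapper object built on each access around the underlying storage, with a reference back to the owning document) can be illustrated without Jina. The following is a minimal sketch only: `ToyDoc` and `ToyChunkView` are hypothetical names, and the bookkeeping in `append` (granularity of the parent plus one, `parent_id` set to the parent's id) is an assumption inferred from the fields visible in the output above, not a copy of Jina's code.

```python
from typing import Dict, List


class ToyChunkView:
    """Hypothetical list wrapper; stands in for the idea behind ChunkArray."""

    def __init__(self, raw: List[Dict], reference_doc: 'ToyDoc'):
        self._raw = raw
        self._ref = reference_doc

    def append(self, chunk: Dict) -> None:
        # Assumed bookkeeping, mirroring the fields seen in the protobuf output.
        chunk['granularity'] = self._ref.body['granularity'] + 1
        chunk['parent_id'] = self._ref.body['id']
        self._raw.append(chunk)

    def __len__(self) -> int:
        return len(self._raw)


class ToyDoc:
    """Hypothetical document whose data lives in a plain dict."""

    def __init__(self, doc_id: str, text: str):
        self.body = {'id': doc_id, 'text': text, 'granularity': 0, 'chunks': []}

    @property
    def chunks(self) -> ToyChunkView:
        # A fresh view is built on every access, but it wraps the same list.
        return ToyChunkView(self.body['chunks'], reference_doc=self)


root = ToyDoc('root-1', 'What is love? Oh baby do not hurt me.')
root.chunks.append({'id': 'chunk-1', 'text': 'What is love?'})
root.chunks.append({'id': 'chunk-2', 'text': 'Oh baby do not hurt me.'})

print(root.chunks is root.chunks)   # two accesses give two distinct view objects
print(len(root.chunks))             # both views read the same underlying list
print(root.body['chunks'])
```

Because the view only wraps the stored list, appending through one view is visible through every later access, which is why `root.chunks.append(...)` in the cells above changes `root._pb_body.chunks`.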

### Getting matches

In [10]:
root._pb_body.matches

[]

In [13]:

# Add a match Document.
with Document() as match:
    # a match Document semantically related to our root
    match.text = 'What is love? Oh please do not hurt me.'
    root.matches.append(match)

# 1 after a single run; every re-run of this cell appends another match
print(len(root.matches))
print(root.matches[0].granularity)     # outputs 0
print(root.matches[0].adjacency)       # outputs 1

3
0
1


In [20]:
root

The root `Document` sits at granularity 0, while each of its chunks sits at granularity 1:

In [17]:
root.granularity

0

In [21]:
root.chunks[0].granularity

1
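
The two counters seen so far can be pictured as two axes: chunks move one level down in `granularity`, while matches move one step out in `adjacency`. The sketch below models this with a hypothetical `ToyNode` class that is not part of Jina. The rule that a match keeps its parent's granularity and increments adjacency agrees with the printed values above; the rule that a chunk keeps its parent's adjacency is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Document; not part of Jina."""
    text: str
    granularity: int = 0
    adjacency: int = 0
    chunks: List['ToyNode'] = field(default_factory=list)
    matches: List['ToyNode'] = field(default_factory=list)

    def add_chunk(self, text: str) -> 'ToyNode':
        # Assumed rule: a chunk goes one level deeper in granularity.
        node = ToyNode(text, self.granularity + 1, self.adjacency)
        self.chunks.append(node)
        return node

    def add_match(self, text: str) -> 'ToyNode':
        # Assumed rule: a match keeps the granularity, adjacency grows by one.
        node = ToyNode(text, self.granularity, self.adjacency + 1)
        self.matches.append(node)
        return node


root = ToyNode('What is love? Oh baby do not hurt me.')
chunk = root.add_chunk('What is love?')
match = root.add_match('What is love? Oh please do not hurt me.')

for name, node in [('root', root), ('chunk', chunk), ('match', match)]:
    print(name, node.granularity, node.adjacency)
```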

## Iterating over chunks of a DocumentArray

In [23]:
from jina import Document, DocumentArray

with Document() as doc1:
    doc1.text = 'What is love? Oh baby do not hurt me.'
with Document() as chunk1:
    chunk1.text = 'What is love?'
    doc1.chunks.append(chunk1)
with Document() as chunk2:
    chunk2.text = 'Oh baby do not hurt me.'
    doc1.chunks.append(chunk2)

    
with Document() as doc2:
    doc2.text = 'Ronaldo? Oh Ronaldo does not hurt me.'
with Document() as chunk1:
    chunk1.text = 'Ronaldo is worth some milions'
    doc2.chunks.append(chunk1)
with Document() as chunk2:
    chunk2.text = 'Ronaldo plays at Madrid'
    doc2.chunks.append(chunk2)

In [24]:
x = DocumentArray([doc1, doc2])

#### Traversal paths

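`traverse` walks a `DocumentArray` along the given `traversal_paths` and yields arrays of documents. With `traversal_paths='r'` it yields a single array holding the root documents: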

In [25]:
for doc in x.traverse(traversal_paths='r'):
    print(doc)

DocumentArray has 2 items:
{'id': 'bd9af7b4-b887-11eb-9de2-787b8ab3f5de', 'chunks': [{'id': 'bd9afde0-b887-11eb-9de2-787b8ab3f5de', 'mime_type': 'text/plain', 'text': 'What is love?', 'granularity': 1, 'parent_id': 'bd9af7b4-b887-11eb-9de2-787b8ab3f5de', 'content_hash': 'd12a28ebfdc62258'}, {'id': 'bd9b0830-b887-11eb-9de2-787b8ab3f5de', 'mime_type': 'text/plain', 'text': 'Oh baby do not hurt me.', 'granularity': 1, 'parent_id': 'bd9af7b4-b887-11eb-9de2-787b8ab3f5de', 'content_hash': 'c0781807296e94d1'}], 'mime_type': 'text/plain', 'text': 'What is love? Oh baby do not hurt me.', 'content_hash': '93bf85a364ff576b'},
{'id': 'bd9b0e16-b887-11eb-9de2-787b8ab3f5de', 'chunks': [{'id': 'bd9b11a4-b887-11eb-9de2-787b8ab3f5de', 'mime_type': 'text/plain', 'text': 'Ronaldo is worth some milions', 'granularity': 1, 'parent_id': 'bd9b0e16-b887-11eb-9de2-787b8ab3f5de', 'content_hash': 'ebe255baa97a6b3e'}, {'id': 'bd9b1758-b887-11eb-9de2-787b8ab3f5de', 'mime_type': 'text/plain', 'text': 'Ronaldo plays

#### Chunk Traversal

To iterate over the chunks of the documents in the DocumentArray `x`, we can use `traversal_paths='c'`.

Each iteration yields the array of chunks of one document, so the loop variable `chunk` below is an array, not a single chunk:

- `chunk[0].text` is the text of the first chunk of the document currently being visited.

- `chunk[1].text` is the text of the second chunk of that document.




In [26]:
for chunk in x.traverse(traversal_paths='c'):
    print(chunk[0].text)

What is love?
Ronaldo is worth some milions


In [27]:
for chunk in x.traverse(traversal_paths='c'):
    print(chunk[1].text)

Oh baby do not hurt me.
Ronaldo plays at Madrid


#### Root Traversal

To iterate over the root documents inside a DocumentArray `x`, we can use `traversal_paths='r'`.

The traversal yields a single array that holds all root documents (called `aux` in the loops below), so each loop body runs only once:

- `aux[0].text` is the text of the first document.

- `aux[1].text` is the text of the second document.



In [28]:
for aux in x.traverse(traversal_paths='r'):
    print(aux[0].text)

What is love? Oh baby do not hurt me.


In [29]:
for aux in x.traverse(traversal_paths='r'):
    print(aux[1].text)

Ronaldo? Oh Ronaldo does not hurt me.


We can also iterate over the `DocumentArray` directly to print the text of every root document:

In [30]:
for d in x:
    print(d.text)

What is love? Oh baby do not hurt me.
Ronaldo? Oh Ronaldo does not hurt me.

With `traversal_paths='rc'`, the traversal first yields the root documents and then the chunks of each document:


In [31]:
traversal = x.traverse(traversal_paths='rc')

for d in traversal:
    for i in range(len(d)):
        print(d[i].text)

What is love? Oh baby do not hurt me.
Ronaldo? Oh Ronaldo does not hurt me.
What is love?
Oh baby do not hurt me.
Ronaldo is worth some milions
Ronaldo plays at Madrid
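
The behaviour observed in the cells above (one array for `'r'`, one array per document for `'c'`, and roots followed by chunks for `'rc'`) can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions, not Jina's implementation: `toy_traverse` is a hypothetical helper, documents are modelled as plain dicts, and only the single-character paths `'r'` and `'c'` used in this notebook are handled.

```python
from typing import Dict, Iterator, List

Doc = Dict  # a document is modelled as {'text': str, 'chunks': [Doc, ...]}


def toy_traverse(docs: List[Doc], traversal_paths: str) -> Iterator[List[Doc]]:
    """Yield one list of documents per visited array (hypothetical helper)."""
    for path in traversal_paths:
        if path == 'r':
            # The roots form a single array.
            yield docs
        elif path == 'c':
            # Each root contributes its own array of chunks.
            for doc in docs:
                yield doc['chunks']
        else:
            raise ValueError(f'unsupported path: {path!r}')


docs = [
    {'text': 'What is love? Oh baby do not hurt me.',
     'chunks': [{'text': 'What is love?', 'chunks': []},
                {'text': 'Oh baby do not hurt me.', 'chunks': []}]},
    {'text': 'Ronaldo? Oh Ronaldo does not hurt me.',
     'chunks': [{'text': 'Ronaldo is worth some milions', 'chunks': []},
                {'text': 'Ronaldo plays at Madrid', 'chunks': []}]},
]

# Flatten every yielded array into one list of texts, roots first.
texts = [d['text'] for group in toy_traverse(docs, 'rc') for d in group]
for text in texts:
    print(text)
```

Counting the yielded arrays makes the difference between the paths explicit: `'r'` produces one array of two roots, while `'c'` produces two arrays of two chunks each.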


## Segmenters

In Jina, a Segmenter is a class that partitions (segments) the data of a document into chunks.
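
To make the idea concrete, here is a minimal sketch of the segmentation step itself, written as a plain function rather than as a Jina class. `toy_sentence_segmenter` is a hypothetical name, the choice of splitting on sentence-ending punctuation is an assumption for illustration, and returning one dict per chunk is simply a convenient shape for this example.

```python
import re
from typing import Dict, List


def toy_sentence_segmenter(text: str) -> List[Dict[str, str]]:
    """Split text after '.', '!' or '?' and return one dict per chunk."""
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [{'text': part} for part in parts if part]


chunks = toy_sentence_segmenter('What is love? Oh baby do not hurt me.')
for chunk in chunks:
    print(chunk['text'])
```

Applied to the root text used throughout this notebook, the function yields the same two pieces that were appended by hand as `chunk1` and `chunk2` earlier.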

