# Chapter 2 - Flair Base Types

This Jupyter notebook collects the practical code snippets and exercises from Chapter 2 - Flair Base Types, so you can run the book's examples as you read.

## Sentence and Token objects

### Understanding the Sentence class

In [None]:
from flair.data import Sentence

sentence = Sentence('Some nice text.')
print(sentence)

### Tokenization in Flair

In [None]:
from flair.data import Sentence
from flair.tokenization import SpaceTokenizer

tokenizer = SpaceTokenizer()
s = Sentence('Some nice text.', use_tokenizer=tokenizer)

# print() calls the Sentence's __str__ magic method to get its string representation
print(s)
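
To make the effect of splitting on spaces concrete, the following library-independent sketch mimics that behaviour in plain Python. It assumes that a space tokenizer splits on the space character alone; `split_on_spaces` is a hypothetical helper written for illustration and is not part of Flair.

```python
# Library-independent sketch; split_on_spaces is a hypothetical helper,
# not part of Flair.
def split_on_spaces(text):
    """Return (token_text, start_position) pairs, splitting only on ' '."""
    tokens = []
    start = 0
    for piece in text.split(' '):
        if piece:  # skip empty pieces produced by repeated spaces
            tokens.append((piece, start))
        start += len(piece) + 1  # +1 accounts for the space delimiter
    return tokens

space_tokens = split_on_spaces('Some nice text.')
print(space_tokens)  # [('Some', 0), ('nice', 5), ('text.', 10)]
```

Note that the final period stays attached to `text.`, because nothing but the space character acts as a boundary in this sketch.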

### Sentence and Token object helper methods

In [None]:
from flair.data import Sentence

sentence = Sentence('A short sentence')
# get_token() uses 1-based indexing, so this labels the first token ('A')
sentence.get_token(1).add_label('manual-pos', 'DT')

print(sentence)

In [None]:
for token in sentence:
    print(token)

## Using custom tokenizers

### Tokenization in Flair

In [None]:
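from flair.data import Token
from flair.tokenization import TokenizerWrapper

# Despite its name, this function does not split on spaces: it emits one
# token per character and records each character's offset in the text.
def space_splitter(sentence):
    tokens = []
    for index, char in enumerate(sentence):
        tokens.append(Token(text=char, start_position=index))
    return tokens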

space_tokenizer = TokenizerWrapper(space_splitter)

In [None]:
from flair.data import Sentence

text = "Good day."
sentence = Sentence(text, use_tokenizer=space_tokenizer)

for token in sentence:
    print(token)
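
The per-character splitting logic used by the custom tokenizer can be checked without Flair. The sketch below reproduces only that logic on the same input text; `char_pieces` is a hypothetical name, and plain tuples stand in for `Token` objects.

```python
# Plain-Python stand-in for the per-character splitting logic;
# char_pieces is a hypothetical name and no Flair objects are created.
def char_pieces(text):
    """Return one (character, start_position) pair per character."""
    return [(char, index) for index, char in enumerate(text)]

pieces = char_pieces("Good day.")
print(len(pieces))  # 9
print(pieces[4])    # (' ', 4)
```

The space between the two words becomes a piece of its own here. The sketch does not show how Flair itself treats such a whitespace-only token.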

### Understanding the Corpus object

In [None]:
from flair import datasets

# Load the Universal Dependencies English corpus (downloaded on first use)
corpus = datasets.UD_ENGLISH()

print(corpus)

In [None]:
train_dataset = corpus.train

In [None]:
sentence = train_dataset[100]

print(sentence)

In [None]:
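# Keep about 1% of the data. Note: downsample() may modify the corpus in
# place and return it, in which case both names refer to the same object.
downsampled_corpus = corpus.downsample(0.01)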

print(downsampled_corpus)
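
As a rough illustration of what downsampling to a proportion means, the sketch below applies the idea to a plain Python list. `downsample_list` and the toy data are hypothetical and written only for this example; Flair's own `downsample()` may select items differently.

```python
import random

# Hypothetical helper illustrating proportional downsampling on a plain list;
# Flair's own downsample() may select items differently.
def downsample_list(items, proportion, seed=0):
    """Return a random subset holding roughly `proportion` of the items."""
    keep = max(1, round(len(items) * proportion))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(items, keep)

toy_train = list(range(1000))
toy_small = downsample_list(toy_train, 0.01)
print(len(toy_small))  # 10
```

A small subset like this is handy for quick experiments, since every later step runs on a fraction of the data.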