### Sub-word Level Selection
A short demo of how to set up token boundaries so that annotations can be selected at the sub-word level. See [**thresh.tools/?t=demo_tokenization**](https://thresh.tools/?t=demo_tokenization) for more information and an example.

In [1]:
!pip install -q transformers

In [2]:
import transformers
tokenizer = transformers.RobertaTokenizer.from_pretrained('roberta-base')

In [3]:
# Let's declare some data to be tokenized
data = [{
    "source": "Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and, by opposing, end them?",
    "target": "Is it nobler to put up with all the nasty things that luck throws your way, or to fight against all those troubles by simply putting an end to them once and for all?"
}, {
    "source": "Alas, poor Yorick! I knew him, Horatio: a fellow of infinite jest, of most excellent fancy: he hath borne me on his back a thousand times; and now, how abhorred in my imagination it is!",
    "target": "Oh, poor Yorick! I used to know him, Horatio—a very funny guy, and with an excellent imagination. He carried me on his back a thousand times, and now—how terrible—this is him."
}]

In [4]:
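# Tokenize each sentence and join its tokens with single spaces
tokenized = []
for sent in data:
    sent_pair_tokenized = {}
    for sent_type, text in sent.items():
        # In RoBERTa's byte-level BPE, a leading 'Ġ' marks a token that follows a space
        sent_tokenized = ' '.join(tokenizer.tokenize(text))
        # Uncomment to recover the original sentence (exact for ASCII text only;
        # non-ASCII characters stay in their byte-level form, e.g. 'âĢĶ')
        # sent_tokenized = sent_tokenized.replace(' ', '').replace('Ġ', ' ')
        sent_pair_tokenized[sent_type] = sent_tokenized
    tokenized.append(sent_pair_tokenized)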
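
The commented-out `replace` chain in the cell above restores plain ASCII text, but non-ASCII characters stay in their byte-level form (the `âĢĶ` sequences in the printed output stand for an em dash). The following is a minimal sketch of a full round trip, assuming the GPT-2 style byte-to-character table that RoBERTa's byte-level BPE uses. The table is re-implemented here for illustration, and `detokenize` is a hypothetical helper that is not part of this demo or of thresh.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte (space, control bytes, ...) is shifted to code point 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def detokenize(sent_tokenized):
    """Recover the original text from a space-joined token string (hypothetical helper)."""
    # Literal spaces only separate tokens; real spaces are encoded as 'Ġ'.
    symbols = sent_tokenized.replace(" ", "")
    return bytes(CHAR_TO_BYTE[ch] for ch in symbols).decode("utf-8")


print(detokenize("Hor atio âĢĶ a Ġvery Ġfunny Ġguy"))
```

With the tokenizer from this notebook loaded, `tokenizer.convert_tokens_to_string(tokenizer.tokenize(text))` should give an equivalent result without re-implementing the table.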

In [5]:
# The data is now ready to use out of the box with the `tokenization: tokenized` flag
tokenized

[{'source': "Whether Ġ' tis Ġnob ler Ġin Ġthe Ġmind Ġto Ġsuffer Ġthe Ġsl ings Ġand Ġarrows Ġof Ġoutrageous Ġfortune , Ġor Ġto Ġtake Ġarms Ġagainst Ġa Ġsea Ġof Ġtroubles , Ġand , Ġby Ġopposing , Ġend Ġthem ?",
  'target': 'Is Ġit Ġnob ler Ġto Ġput Ġup Ġwith Ġall Ġthe Ġnasty Ġthings Ġthat Ġluck Ġthrows Ġyour Ġway , Ġor Ġto Ġfight Ġagainst Ġall Ġthose Ġtroubles Ġby Ġsimply Ġputting Ġan Ġend Ġto Ġthem Ġonce Ġand Ġfor Ġall ?'},
 {'source': 'Al as , Ġpoor ĠYor ick ! ĠI Ġknew Ġhim , ĠHor atio : Ġa Ġfellow Ġof Ġinfinite Ġj est , Ġof Ġmost Ġexcellent Ġfancy : Ġhe Ġhath Ġborne Ġme Ġon Ġhis Ġback Ġa Ġthousand Ġtimes ; Ġand Ġnow , Ġhow Ġabhor red Ġin Ġmy Ġimagination Ġit Ġis !',
  'target': 'Oh , Ġpoor ĠYor ick ! ĠI Ġused Ġto Ġknow Ġhim , ĠHor atio âĢĶ a Ġvery Ġfunny Ġguy , Ġand Ġwith Ġan Ġexcellent Ġimagination . ĠHe Ġcarried Ġme Ġon Ġhis Ġback Ġa Ġthousand Ġtimes , Ġand Ġnow âĢĶ how Ġterrible âĢĶ this Ġis Ġhim .'}]
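
To see which tokens in the output above are sub-word pieces, the space-joined strings can be regrouped by whitespace-delimited chunk. This is only an illustrative sketch: `group_subwords` is a hypothetical helper, and it assumes that a token starting with `Ġ` begins a new chunk, as in the output shown. It says nothing about how thresh itself consumes the data.

```python
def group_subwords(sent_tokenized):
    """Group space-joined tokens into whitespace-delimited chunks (hypothetical helper)."""
    chunks = []
    for tok in sent_tokenized.split(" "):
        # A leading 'Ġ' means the token followed a space in the original text,
        # so it starts a new chunk; the very first token always starts one too.
        if tok.startswith("Ġ") or not chunks:
            chunks.append([tok])
        else:
            # No 'Ġ': the token continues the previous chunk (sub-word or punctuation).
            chunks[-1].append(tok)
    return chunks


for chunk in group_subwords("Whether Ġ' tis Ġnob ler Ġin Ġthe Ġmind"):
    print(chunk)
```

Chunks with more than one token, such as `['Ġnob', 'ler']`, are the places where sub-word selection differs from word-level selection. Punctuation that directly follows a word lands in that word's chunk.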