# Reversible Tokenizer

This example shows how to use the `ReversibleTokenizer` to tokenize data in a pandas dataframe.

The `ReversibleTokenizer` replaces the input values with tokens so the data can be used in a privacy-preserving manner.

The `ReversibleTokenizer` can be used in conjunction with the `TokenReverser` to recover the original data.

## Tokenizing Data

The `ReversibleTokenizer` and `TokenReverser` classes can be found in the `cape_dataframes.pandas.transformations` package.

In [63]:
from cape_dataframes.pandas.transformations import ReversibleTokenizer
from cape_dataframes.pandas.transformations import TokenReverser

The `ReversibleTokenizer` and `TokenReverser` classes both take a `key` as input.

For the `TokenReverser` to be able to reverse the tokens produced by the `ReversibleTokenizer`, you must
use the same key.

In [64]:
# Fixed 32-byte demo key; the tokenizer and the reverser must share it
key = b"5" * 32
key

b'55555555555555555555555555555555'
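
The fixed key above keeps the example reproducible. Outside a demo you would probably want an unpredictable key. The following is a minimal sketch using Python's standard `secrets` module; it assumes that any 32-byte value is accepted wherever the demo key is used, which this notebook does not confirm. The names `random_key`, `key_hex`, and `restored_key` are illustrative only.

```python
import secrets

# Assumption: a random 32-byte key can stand in for the fixed demo key.
random_key = secrets.token_bytes(32)

# The key is raw bytes; hex-encode it if it has to be stored as text.
key_hex = random_key.hex()

# Decode the stored text back into the original bytes before reuse.
restored_key = bytes.fromhex(key_hex)
```

Whatever key you use has to be kept somewhere safe, because the `TokenReverser` can only recover the data with the same key.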

In this example, we will simply hide the names within our dataset.

In [71]:
import pandas as pd
plaintext_data = pd.DataFrame({'name': ["Alice", "Bob", "Carol"], "# friends": [100, 200, 300]})
plaintext_data

|   | name | # friends |
|---|------|-----------|
| 0 | Alice | 100 |
| 1 | Bob | 200 |
| 2 | Carol | 300 |


You instantiate a `ReversibleTokenizer` by passing it your key.

In [72]:
tokenizer = ReversibleTokenizer(key=key)
tokenizer

<cape_privacy.pandas.transformations.tokenizer.ReversibleTokenizer at 0x11a8da630>

Next, we pass the `name` column of our dataframe to the `tokenizer` and store the result in a copy.

In [73]:
# Copy the dataframe so the plaintext original is left unchanged
tokenized = plaintext_data.copy()
tokenized["name"] = tokenizer(plaintext_data["name"])
tokenized

|   | name | # friends |
|---|------|-----------|
| 0 | c8c7e80144304276183e5bcd589db782bc5ff95309 | 100 |
| 1 | e0f40aea0d5c21b35967c4231b98b5b3e5338e | 200 |
| 2 | 7bfcdf25f73a1fe7a7fcb0970976f3393ed5df5ceb | 300 |


## Recovering Tokens

If we ever need to reveal the tokenized data, we can use the `TokenReverser` class.

In [74]:
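# The reverser must be created with the same key as the tokenizer
reverser = TokenReverser(key=key)
# Copy the dataframe so the tokenized version is left unchanged
recovered = tokenized.copy()
recovered["name"] = reverser(tokenized["name"])
recovered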

|   | name | # friends |
|---|------|-----------|
| 0 | Alice | 100 |
| 1 | Bob | 200 |
| 2 | Carol | 300 |
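
Rather than comparing the two outputs by eye, the round trip can be checked programmatically. The sketch below uses only pandas; `column_round_trips` is a hypothetical helper, not part of the library, and the stand-in frames simply repeat the values shown above so the block runs on its own. In the notebook you would pass `plaintext_data` and `recovered` instead.

```python
import pandas as pd


def column_round_trips(original: pd.DataFrame, recovered: pd.DataFrame, column: str) -> bool:
    """Return True if `column` holds the same values in both dataframes."""
    # Compare as strings so a dtype difference alone does not count as a mismatch.
    return original[column].astype(str).equals(recovered[column].astype(str))


# Stand-in frames with the values from this example.
original = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "# friends": [100, 200, 300]})
same = original.copy()
different = original.copy()
different["name"] = ["Alice", "Bob", "Eve"]

matches = column_round_trips(original, same, "name")
mismatches = column_round_trips(original, different, "name")
print(matches, mismatches)
```

A check like this is most useful right after reversing, since a mismatch would suggest that the key or the column passed to the reverser differs from what the tokenizer saw.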
