# A simple hands-on introduction to mercury-dataschema

In [None]:
from mercury.dataschema import DataSchema
from mercury.dataschema.anonymize import Anonymize


## Getting a dataset from seaborn examples

We use seaborn only to load the tips example dataset.

We install it with pip first.

In [None]:
!pip install seaborn

In [None]:
import seaborn as sns


We cast the text columns (`sex`, `smoker`, `day` and `time`) to plain strings.

In [None]:
tips = sns.load_dataset('tips')
tips['sex'] = tips['sex'].astype(str)
tips['smoker'] = tips['smoker'].astype(str)
tips['day'] = tips['day'].astype(str)
tips['time'] = tips['time'].astype(str)

tips


## Automated type detection

In [None]:
schema = DataSchema().generate(tips)

The `.generate` method creates, for each column, an object of class `Feature` that abstracts the column's details
so that it can be used in the same way regardless of its type.

Many mercury packages work this way.

As the warning printed by the previous cell shows, an integer variable is treated as categorical because it has only two values. This behavior can be controlled; see the [documentation](https://bbva.github.io/mercury-dataschema/).
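
To illustrate why a low-cardinality integer column can end up as categorical, here is a minimal sketch of a cardinality-based heuristic. It is not the rule mercury-dataschema applies internally; the function `guess_feature_kind`, the labels it returns and the threshold `MAX_CATEGORICAL_VALUES` are all hypothetical names chosen for this example.

```python
from typing import Iterable

# Hypothetical threshold, chosen for illustration only.
# It is NOT a mercury-dataschema parameter.
MAX_CATEGORICAL_VALUES = 2


def guess_feature_kind(values: Iterable, max_categorical: int = MAX_CATEGORICAL_VALUES) -> str:
    """Return a rough kind label for a column of values."""
    vals = [v for v in values if v is not None]
    distinct = set(vals)

    # Low cardinality wins, even when the values are integers.
    if len(distinct) <= max_categorical:
        return "categorical"
    # Integers with many distinct values are counted as discrete.
    if all(isinstance(v, int) for v in vals):
        return "discrete"
    # Any remaining all-numeric column is counted as continuous.
    if all(isinstance(v, (int, float)) for v in vals):
        return "continuous"
    # Everything else (for example strings) falls back to categorical.
    return "categorical"


print(guess_feature_kind([0, 1, 1, 0]))           # integer column with two values
print(guess_feature_kind([1, 2, 3, 4, 5, 6]))     # integer column with six values
print(guess_feature_kind([16.99, 10.34, 21.01]))  # float column
```

Raising `max_categorical` in this sketch makes more integer columns count as categorical, which is the kind of trade-off a configurable threshold exposes.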

In [None]:
schema.feats

## Anonymize example

The package also includes an `Anonymize` class that supports multiple key management functions, controllable precision and secure cryptography.

In [None]:
anon = Anonymize()

In [None]:
anon.set_key('Mickey Mouse')

In [None]:
anon.anonymize_list_any_type(list(tips['total_bill']))[0:10]
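
The general idea behind key-based anonymization can be sketched with the standard library alone. The example below is not the algorithm `Anonymize` uses; it assumes HMAC-SHA256 as a stand-in, and `pseudonymize` is a hypothetical helper. It shows the two properties that matter: the same value and key always give the same token, and a different key gives a different token.

```python
import base64
import hashlib
import hmac


def pseudonymize(value, key: str) -> str:
    """Keyed, deterministic digest of a value, encoded as URL-safe base64.

    Illustrative stand-in only: this is not how Anonymize is implemented.
    """
    mac = hmac.new(key.encode("utf-8"), repr(value).encode("utf-8"), hashlib.sha256)
    # Drop the trailing '=' padding to keep the token compact.
    return base64.urlsafe_b64encode(mac.digest()).decode("ascii").rstrip("=")


bills = [16.99, 10.34, 16.99]
tokens = [pseudonymize(b, "Mickey Mouse") for b in bills]

print(tokens[0] == tokens[2])                         # same value, same key
print(tokens[0] == tokens[1])                         # different values
print(tokens[0] == pseudonymize(16.99, "other key"))  # same value, different key
```

Because the mapping is deterministic for a given key, anonymized columns can still be joined or grouped, while the original values cannot be read off without the key.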

## Same example with a shorter digest length

We run the same example with a 12-bit digest (2 base-64 digits).

In [None]:
anon = Anonymize(digest_bits=12)

In [None]:
anon.set_key('Mickey Mouse')

In [None]:
anon.anonymize_list_any_type(list(tips['total_bill']))[0:10]
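
The relation between digest bits and base-64 digits is simple arithmetic: each base-64 digit carries 6 bits, so 12 bits fit in 2 digits. The sketch below works this out and truncates a keyed hash to a given number of bits. It is an illustration under assumptions, not the `Anonymize` implementation: HMAC-SHA256, the alphabet ordering and the helpers `digest_digits` and `truncated_digest` are all hypothetical choices.

```python
import hashlib
import hmac
import math

# URL-safe base-64 alphabet; the ordering is an assumption of this sketch.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def digest_digits(bits: int) -> int:
    """Number of base-64 digits needed to hold `bits` bits (6 bits per digit)."""
    return math.ceil(bits / 6)


def truncated_digest(value, key: str, bits: int) -> str:
    """Keyed digest of `value`, truncated to `bits` bits (1 <= bits <= 256)."""
    mac = hmac.new(key.encode("utf-8"), repr(value).encode("utf-8"), hashlib.sha256).digest()
    # Keep only the most significant `bits` bits of the 256-bit hash.
    number = int.from_bytes(mac, "big") >> (256 - bits)
    digits = []
    for _ in range(digest_digits(bits)):
        digits.append(B64_ALPHABET[number & 0x3F])  # lowest 6 bits
        number >>= 6
    return "".join(reversed(digits))


print(digest_digits(12))                               # digits for a 12-bit digest
print(2 ** 12)                                         # number of possible 12-bit digests
print(len(truncated_digest(16.99, "Mickey Mouse", 12)))
```

A shorter digest gives more compact output but fewer possible values (4096 for 12 bits), so distinct inputs are more likely to collide as the number of distinct values grows.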