# Data Tokenization

Tokenization is a technique where sensitive data is replaced with unique tokens, preserving the dataset's structure (and, in some schemes, the format of the original values) while concealing the actual values. Here's an example of how tokenization can be implemented in Python.


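In this example, the tokenize_data() function replaces each personal information entry (e.g., names, addresses, social security numbers) with a unique token generated using Python's uuid module. Each call to uuid.uuid4() generates a new random UUID (Universally Unique Identifier), so every token is unique and carries no information about the individual it replaces. Because the function stores no mapping between tokens and original values, the replacement cannot be reversed, and the same value appearing twice would receive two different tokens.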

By tokenizing the data, sensitive information is replaced with random tokens, preserving the dataset's structure while protecting the privacy of individuals. This technique is commonly used in data anonymization to enhance privacy and security when sharing or analyzing sensitive data.

In [2]:
import pandas as pd
import uuid


In [3]:

# Sample dataset with personal information
data = {
    'Name': ['John Smith', 'Jane Doe', 'Alice Johnson'],
    'Address': ['123 Main St', '456 Elm St', '789 Oak St'],
    'Social Security Number': ['123-45-6789', '987-65-4321', '456-78-9123']
}

# Load dataset into a pandas DataFrame
df = pd.DataFrame(data)

# Define a function to tokenize personal information
def tokenize_data(df):
    # Work on a copy so the caller's original DataFrame is left unchanged
    tokenized = df.copy()
    # Generate a fresh, unique token for every entry in each sensitive column
    for column in ['Name', 'Address', 'Social Security Number']:
        tokenized[column] = [str(uuid.uuid4()) for _ in range(len(tokenized))]

    return tokenized

# Tokenize personal information
tokenized_df = tokenize_data(df)

# Display tokenized dataset
print("Tokenized Dataset:")
print(tokenized_df)


Tokenized Dataset:
                                   Name                               Address  \
0  7a9ecaf9-87e4-4071-bd74-ef8eb555083a  53c984f9-4ea3-47c8-8c8f-c9201019a671   
1  c9dcb003-c453-4c18-b4f6-38130e25d8ea  792d42e0-347b-4f50-a873-6534dfddf2d6   
2  4a5da139-4401-4b76-a01a-0101fbcd77c1  db52b72f-d5f6-4777-a86e-60bb96b25eca   

                 Social Security Number  
0  be5686ba-4794-4fc1-9188-32864e701015  
1  c0ab5763-ba32-492f-993a-1e52e3151230  
2  941ca5cb-8001-4147-879c-9268a64c570a  
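
The tokenize_data() function above draws a fresh UUID for every cell and keeps no record of what each token replaced. When repeated values need to stay linkable, or an authorized party needs to recover the originals, tokenization is often paired with a lookup table (sometimes called a token vault). The following is a minimal sketch of that idea, not a production design: the `TokenVault` class, its method names, and the sample names are all hypothetical, and the vault lives in memory only, whereas a real one would need secure, access-controlled storage.

```python
import uuid


class TokenVault:
    """Hypothetical in-memory vault mapping original values to tokens and back."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value):
        # Reuse the existing token so repeated values map to the same token
        if value not in self._value_to_token:
            token = str(uuid.uuid4())
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        # Look up the original value; raises KeyError for unknown tokens
        return self._token_to_value[token]


# Illustrative data: 'John Smith' appears twice
names = ['John Smith', 'Jane Doe', 'John Smith']

vault = TokenVault()
tokens = [vault.tokenize(name) for name in names]
restored = [vault.detokenize(token) for token in tokens]

print(tokens[0] == tokens[2])  # both 'John Smith' entries share one token
print(restored == names)       # the vault recovers the original values
```

With pandas, the same vault could be applied to a column through `df['Name'].map(vault.tokenize)`. The trade-off is that the vault itself now holds the sensitive values, so it must be protected at least as carefully as the original dataset.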
