
# NB Anonymizer: A Pioneering Data Privacy Tool

Welcome to the **NB Anonymizer** project, a tool designed by [NextBrain](https://nextbrain.ai) to champion data privacy and security. In an era where data protection is paramount, NB Anonymizer helps shield your data against privacy breaches. It replaces your dataset with synthetic records that mirror its structure, so personal identifiers stay out of the file you share, and no data is stored by the tool.

## The Genesis of NB Anonymizer
Our journey began during the development of our flagship machine learning solution, NextBrain. We identified a critical gap in data privacy: a hurdle that often impedes progress in the proof-of-concept phase. Recognizing this, we set out to create a solution that addresses our own needs and also the broader challenges faced by the community.

## Our Commitment to Open-Source and Data Security
In our pursuit of excellence, we have released NB Anonymizer to the open-source community. This initiative reflects our deep commitment to fostering a secure and transparent digital ecosystem. By sharing our solution, we aim to empower businesses, researchers, and developers alike, enabling them to safeguard their data with confidence and meet their compliance requirements.

Join us in this endeavor to redefine the standards of data security and contribute to a safer, more privacy-conscious world.

---

## How to use this notebook?

Effortlessly navigate through this notebook by executing each code cell sequentially. Follow the accompanying instructions for a smooth experience.

**No Code Reading Required**: You don't need to delve into the code details. Simply upload your data and download it once processed. Any additional steps are for visualization or optional configurations, ensuring a user-friendly journey.

---

### Required Dependency: DataSynthesizer
NB Anonymizer relies on DataSynthesizer, a library that builds a statistical description of a dataset and generates synthetic data from it. This dependency provides the privacy protection while preserving much of the data's utility, in line with the tool's objectives.

In [None]:
# @title
!pip install DataSynthesizer==0.1.13 pandas

## 1. Upload the data you want to anonymize

In [16]:
# @title
import base64
import ipywidgets as widgets
import io
import pandas as pd
import pathlib
import tempfile

from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator
from IPython.display import display, HTML

In [None]:
# @title
uploader = widgets.FileUpload(
    accept='.csv,.txt,.xls,.xlsx',
    multiple=False
)
uploader

## 2. Visualize your original data

In [None]:
# @title
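if uploader.value:
    # ipywidgets 7.x layout: a dict keyed by file name
    uploaded_file = next(iter(uploader.value.values()))
    file_name = uploaded_file['metadata']['name']
    file_extension = file_name.split('.')[-1].lower()

    if file_extension in ['xls', 'xlsx']:
        df = pd.read_excel(io.BytesIO(uploaded_file['content']))
    else:
        # .csv and .txt uploads are both parsed as comma-separated text
        df = pd.read_csv(io.BytesIO(uploaded_file['content']))

    # Show the data only when it was actually loaded, avoiding a NameError
    display(df)
else:
    print('Re-execute this cell once you upload your data')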
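
The cell above assumes the ipywidgets 7.x layout of `uploader.value`, a dict keyed by file name. In ipywidgets 8.x the value is instead a tuple of dicts with `name` and `content` keys. The following is a minimal sketch of a helper that accepts either layout; `extract_upload` is a hypothetical name introduced here for illustration, not part of ipywidgets or of this notebook.

```python
def extract_upload(value):
    """Return (file_name, content_bytes) for the first uploaded file.

    Accepts both layouts of FileUpload.value:
      - ipywidgets 7.x: {name: {'metadata': {'name': ...}, 'content': bytes}}
      - ipywidgets 8.x: ({'name': ..., 'content': memoryview, ...},)
    """
    if not value:
        raise ValueError('No file has been uploaded yet.')

    if isinstance(value, dict):
        # ipywidgets 7.x: take the first entry of the dict
        item = next(iter(value.values()))
        name = item['metadata']['name']
    else:
        # ipywidgets 8.x: take the first element of the tuple
        item = value[0]
        name = item['name']

    # bytes() turns a memoryview into bytes; plain bytes keep the same value
    return name, bytes(item['content'])


# Example with hand-built values that mimic the two layouts
v7_value = {'a.csv': {'metadata': {'name': 'a.csv'}, 'content': b'x,y\n1,2\n'}}
v8_value = ({'name': 'b.xlsx', 'content': memoryview(b'abc')},)
print(extract_upload(v7_value)[0])  # a.csv
print(extract_upload(v8_value)[0])  # b.xlsx
```

The returned bytes could then be wrapped in `io.BytesIO` and passed to `pd.read_csv` or `pd.read_excel` exactly as in the cell above.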

## 3. Select your anonymization algorithm (optional)

All three algorithms are described in more detail in the [DataSynthesizer repository](https://github.com/DataResponsibly/DataSynthesizer).

In [None]:
# @title
options = [
    'Random mode (fast speed)',
    'Independent attribute mode (medium speed)',
    'Correlated attribute mode (low speed)'
]
dropdown = widgets.Dropdown(
    options=options,
    value=options[1],
    description='Select:',
    disabled=False,
)

print('Anonymization algorithm')
# Display the dropdown
display(dropdown)
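
The generation step looks up the chosen algorithm by the position of its label in `options`, so reordering the list would silently change the mapping. One possible alternative, sketched below, pairs each label with an explicit mode name. The names `MODES` and `mode_for_label`, and the mode names themselves, are assumptions made for this example.

```python
# (label shown to the user, internal mode name); the mode names are illustrative
MODES = [
    ('Random mode (fast speed)', 'random'),
    ('Independent attribute mode (medium speed)', 'independent'),
    ('Correlated attribute mode (low speed)', 'correlated'),
]


def mode_for_label(label):
    """Translate a dropdown label into its internal mode name."""
    return dict(MODES)[label]


# With ipywidgets, the same list could be passed directly as
#   widgets.Dropdown(options=MODES, value='independent')
# in which case dropdown.value is already the mode name.
print(mode_for_label('Correlated attribute mode (low speed)'))  # correlated
```

Keying later lookups on the mode name rather than on a list index keeps the describer and generator mappings correct even if the labels are reworded or reordered.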

## 4. Generate anonymized data! 🎉

In [None]:
# @title
with tempfile.TemporaryDirectory() as temp_dir:
    temp_dir = pathlib.Path(temp_dir)

    csv_file = temp_dir / 'data.csv'
    description_file = temp_dir / 'describe.json'
    synthetic_data_file = temp_dir / 'synthetic.csv'

    # Write without the index so it does not reappear as an extra
    # 'Unnamed: 0' column in the synthetic data
    df.to_csv(csv_file, index=False)

    # Position of the selected label in `options`: 0, 1 or 2
    algorithm_level = options.index(dropdown.value)

    describer = DataDescriber(category_threshold=20)

    level_to_describer_algorithm = {
        0: describer.describe_dataset_in_random_mode,
        1: describer.describe_dataset_in_independent_attribute_mode,
        2: describer.describe_dataset_in_correlated_attribute_mode,
    }

    # Build a description of the uploaded data with the selected algorithm
    level_to_describer_algorithm[algorithm_level](csv_file)
    # Save the description as JSON for the generator
    describer.save_dataset_description_to_file(description_file)

    generator = DataGenerator()

    level_to_generator_algorithm = {
        0: generator.generate_dataset_in_random_mode,
        1: generator.generate_dataset_in_independent_mode,
        2: generator.generate_dataset_in_correlated_attribute_mode,
    }

    # Generate as many synthetic rows as the original data has
    level_to_generator_algorithm[algorithm_level](len(df), description_file=description_file)
    # Save the synthetic rows to CSV
    generator.save_synthetic_data(synthetic_data_file)

    # Load the result before the temporary directory is deleted
    synthetic_df = pd.read_csv(synthetic_data_file)

synthetic_df

## 5. Download your anonymized CSV

Because the file contains synthetic records rather than your original rows, you can then upload it to the [NextBrain application](https://app.nextbrain.ai) in place of the original data.
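
Before sharing the file, it can be worth checking that no synthetic row happens to reproduce an original row verbatim. The sketch below shows one rough way to do that with pandas. `count_leaked_rows` is a hypothetical helper, not part of DataSynthesizer. It assumes both frames use compatible dtypes for their shared columns, and it is a sanity check rather than a privacy guarantee.

```python
import pandas as pd


def count_leaked_rows(original, synthetic):
    """Count synthetic rows that exactly match some original row."""
    # Compare only the columns present in both frames
    shared = [col for col in original.columns if col in synthetic.columns]
    # Deduplicate originals so each synthetic row is counted at most once
    originals = original[shared].drop_duplicates()
    matches = synthetic[shared].merge(originals, on=shared, how='inner')
    return len(matches)


# Tiny made-up frames for illustration only
original = pd.DataFrame({'age': [30, 41, 25], 'city': ['Rome', 'Oslo', 'Lima']})
synthetic = pd.DataFrame({'age': [30, 52, 41], 'city': ['Rome', 'Oslo', 'Lima']})
print(count_leaked_rows(original, synthetic))  # 1
```

In this notebook the call would be `count_leaked_rows(df, synthetic_df)`. A non-zero count does not necessarily mean a person is identifiable, but it is a prompt to review the matching rows before uploading.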

In [None]:
# @title
def create_download_link(df, filename="data.csv", title="Download CSV file"):
    # Serialize the DataFrame to CSV and base64-encode it for a data URI
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()

    # Build an HTML link that downloads the embedded CSV when clicked
    href = f'<a href="data:text/csv;base64,{b64}" download="{filename}" target="_blank">{title}</a>'
    return href

try:
    synthetic_df
except NameError:
    # synthetic_df does not exist until the generation cell has run
    print('Execute previous cells before trying to download synthetic data.')
else:
    display(HTML(create_download_link(synthetic_df)))

# Thank You for Using NB Anonymizer!
As we wrap up your experience with the **NB Anonymizer**, we want to extend our heartfelt gratitude for choosing our tool to safeguard your data privacy. We are committed to empowering you with tools that ensure the utmost security and efficiency in handling sensitive information.

## Explore the Vast Potential of NextBrain
We warmly invite you to explore [NextBrain](https://nextbrain.ai), our comprehensive machine learning platform designed to revolutionize the way you interact with data. At NextBrain, we're not just about data processing; we're about unlocking a world of possibilities:

* **Advanced Analytics**: Dive into deep insights and transform data into actionable intelligence.
* **User-Friendly Interface**: Navigate with ease, whether you're a beginner or an expert.
* **Cutting-Edge Machine Learning Models**: Leverage the latest in AI to propel your projects forward.
* **Custom Solutions**: Tailor the platform to meet your unique needs and challenges.
* **Community and Support**: Join a growing community of enthusiasts and experts, ready to support and collaborate.

## Your Journey Towards Data Mastery Begins Here
Whether it's for business growth, research, or personal projects, NextBrain is equipped to be your partner in success. We believe in the power of data to change the world, and with NextBrain, you're at the forefront of this transformation.

Thank you once again for your trust in **NB Anonymizer**. We can't wait to see what you'll achieve with NextBrain!

Start exploring NextBrain today!