Data Masking Tool

A Python application for masking sensitive data in CSV and Excel files using customizable fake data.

Description

The Data Masking Tool is designed to help users anonymize sensitive data within CSV and Excel files. It replaces selected columns with fake data generated using the Faker library. The tool provides a graphical user interface (GUI) built with Tkinter, allowing users to easily select columns to mask, configure fake data types, and customize settings.

Features

Supports CSV and Excel (.xlsx, .xls) files.
Customizable masking options for each column.
Multiple fake data types, including names, emails, phone numbers, dates, and more.
Configurable settings for fake data generation (e.g., prefixes, suffixes, ranges).
Option to keep mappings consistent across the dataset, so the same original value is always replaced by the same fake value.
Ability to introduce blank values at a specified percentage.
Simple and intuitive GUI.
Generates a masked data file without altering the original file.
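
The consistent-mapping option above can be pictured as a lookup table that remembers which fake value each original value received. The sketch below is illustrative only and is not the tool's actual code. It assumes a hypothetical helper named mask_with_mapping, and it swaps Faker for a small hard-coded name pool (FAKE_NAMES) so the example has no dependencies.

```python
import random

# Stand-in for Faker: a tiny pool of fake names (the tool itself uses Faker).
FAKE_NAMES = ["Alice Smith", "Bruno Costa", "Chen Wei", "Dana Okafor", "Elif Kaya"]


def mask_with_mapping(values, rng, keep_mapping=True):
    """Replace every value in a column with a fake name (hypothetical helper).

    With keep_mapping=True, each distinct original is looked up in a dict,
    so repeated originals always receive the same fake value. With such a
    small pool, two different originals can still draw the same fake name.
    """
    mapping = {}
    masked = []
    for value in values:
        if keep_mapping:
            if value not in mapping:
                mapping[value] = rng.choice(FAKE_NAMES)
            masked.append(mapping[value])
        else:
            # Independent draw per row: repeated originals may diverge.
            masked.append(rng.choice(FAKE_NAMES))
    return masked


rng = random.Random(42)  # seeded so the run is reproducible
original = ["Ann", "Bob", "Ann", "Cid", "Bob"]
masked = mask_with_mapping(original, rng)
print(masked)  # positions 0/2 match, and positions 1/4 match
```

Because rows that shared a value before masking still share one afterwards, joins between masked tables keep lining up.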

Installation

Prerequisites

Python 3.6 or higher.
Required Python libraries:
- pandas
- numpy
- faker
- xlsxwriter
- xlrd
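- tkinter (bundled with the standard Python installers for Windows and macOS; on some Linux distributions it must be installed separately, e.g. the python3-tk package)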

Steps

Clone the Repository

```
git clone https://github.com/Marcelo-Has/data-masking-tool.git
cd data-masking-tool
```

Create a Virtual Environment (Recommended)
```
python -m venv venv
```
Activate the Virtual Environment
- On Windows:
```
venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```

Install Dependencies

```
pip install -r app/docs/requirements.txt
```

Run the Application
```
python main.py
```

Usage

Launch the Application

Run the main script:
```
python main.py
```
Select CSV Delimiter (If Applicable)

If you're working with CSV files, choose the appropriate delimiter from the dropdown menu.
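
The delimiter matters because it decides how each line is split into columns. Here is a minimal sketch using Python's standard csv module. It is not the tool's own loader, and the semicolon-delimited sample data is hypothetical. It shows that picking the wrong delimiter collapses every row into a single column.

```python
import csv
import io

# Hypothetical semicolon-delimited export (common with some regional settings).
raw = "Name;Email\nAnn;ann@example.com\nBob;bob@example.com\n"

# Wrong delimiter: each row comes back as one unsplit column.
wrong = list(csv.reader(io.StringIO(raw), delimiter=","))

# Right delimiter: the header splits into the two real columns.
right = list(csv.reader(io.StringIO(raw), delimiter=";"))

print(wrong[0])  # ['Name;Email']
print(right[0])  # ['Name', 'Email']
```

If the column list offered for masking contains only one oddly named column, the delimiter choice is the first thing to check.
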
Upload a File

Click on the "Upload and Mask Data" button and select the CSV or Excel file you wish to mask.
Select Columns to Mask

Check the boxes next to the columns you want to mask.
For each selected column:
- Choose the Field Type for fake data generation.
- Optionally, set the Blank Percentage to introduce null values.
- Click on the ⚙️ button to configure additional settings.
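
One way to implement a Blank Percentage setting is to blank a fixed, rounded share of rows at random positions. Below is a sketch under that assumption, using a hypothetical helper named apply_blanks. The tool may instead decide row by row, so treat this as an illustration rather than its exact behavior.

```python
import random


def apply_blanks(values, blank_percentage, rng):
    """Return a copy of values with round(n * pct / 100) entries set to None.

    The number of blanks is fixed by the percentage; only their positions
    are random. The input list is left unchanged.
    """
    masked = list(values)
    n_blank = round(len(masked) * blank_percentage / 100)
    for i in rng.sample(range(len(masked)), n_blank):  # distinct indices
        masked[i] = None
    return masked


rng = random.Random(0)
col = [f"user{i}@example.com" for i in range(20)]
result = apply_blanks(col, 25, rng)
print(result.count(None))  # 5
```
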

Configure Fake Data (Optional)

Customize settings such as prefixes, suffixes, ranges, and custom lists in the configuration pane.
Generate Fake Data

Click the "Generate Fake Data" button to start the masking process.
Save the Masked Data

After processing, you'll be prompted to choose where to save the masked file. The default filename is the original name with _masked.xlsx appended.
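
The default output name can be sketched with pathlib. This sketch assumes the original extension is dropped before _masked.xlsx is added, and the helper name default_masked_name is hypothetical. Note that the suffix is always .xlsx, even when the input was a CSV file.

```python
from pathlib import Path


def default_masked_name(original_path):
    # Keep the file's stem (name without its last extension) and add the suffix.
    return f"{Path(original_path).stem}_masked.xlsx"


print(default_masked_name("data.csv"))  # data_masked.xlsx
```
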
View Logs

Monitor the progress and view detailed logs in the log display area within the application.

Examples

Masking a CSV File

Launch the application.
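Select the comma (,) delimiter, or whichever delimiter your file uses.
Upload your data.csv file. Excel files (data.xlsx or data.xls) follow the same steps, except that the delimiter setting does not apply.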
Select columns like Name, Email, and Phone to mask.
Configure each field type as desired.
Generate the fake data and save the output.

Note: Your computer may block saving to some folders (for example, the Downloads folder). In that case, try saving to another folder, such as the Desktop.

Customizing Fake Data

Use the Custom List field type to mask a column with specific values.
Set up a Number field type to generate random integers within a range.
Configure the Date field type to generate random dates between two specified dates.
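
The three custom field types above reduce to simple random draws. The sketch below uses the standard library in place of Faker, and the function names are hypothetical. Both ends of the number and date ranges are treated as inclusive, which is an assumption about the tool's behavior.

```python
import random
from datetime import date, timedelta


def fake_from_list(options, rng):
    # Custom List: pick one of the user-supplied values.
    return rng.choice(options)


def fake_number(low, high, rng):
    # Number: random integer in [low, high], both ends inclusive.
    return rng.randint(low, high)


def fake_date(start, end, rng):
    # Date: add a random number of days (0 to the full span) to the start date.
    span = (end - start).days
    return start + timedelta(days=rng.randint(0, span))


rng = random.Random(7)
print(fake_from_list(["Gold", "Silver", "Bronze"], rng))
print(fake_number(18, 65, rng))
print(fake_date(date(2020, 1, 1), date(2020, 12, 31), rng))
```
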

Documentation

Supported Field Types

Name
Full Name
Address
Phone
Email
UUID
Company
Department
City
Country
Zip Code
Product Name
State or Province
Row Number
Custom List
Number
Date

Configuration Options

Prefix/Suffix: Add custom text before or after the generated fake data.
Blank Percentage: Specify the percentage of blank (null) values to introduce.
Custom Lists: Provide a list of custom values for masking.
Number Ranges: Set minimum and maximum values for numeric fields.
Date Ranges: Define start and end dates for date fields.
UUID Types: Choose between standard UUIDs or custom alphanumeric codes.
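
The prefix/suffix and UUID options can be pictured as follows. This is a sketch, not the tool's code. The helper names are hypothetical, and so is the choice of uppercase letters plus digits for "custom alphanumeric codes", along with the length setting. A standard UUID here means a random version-4 UUID from Python's uuid module.

```python
import random
import string
import uuid


def standard_uuid():
    # Standard UUID: random version-4 UUID as a 36-character string.
    return str(uuid.uuid4())


def alphanumeric_code(length, rng):
    # Custom code: uppercase letters and digits (alphabet and length are assumptions).
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choices(alphabet, k=length))


def decorate(value, prefix="", suffix=""):
    # Prefix/Suffix: wrap the generated value in user-supplied text.
    return f"{prefix}{value}{suffix}"


rng = random.Random(1)
print(decorate(alphanumeric_code(8, rng), prefix="CUST-"))
print(standard_uuid())
```
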

Jupyter Notebook Example

If you prefer to use the Data Masking Tool within a Jupyter Notebook or want to integrate it into your data processing workflows without the GUI, we've provided a comprehensive notebook example that demonstrates how to use the tool programmatically.

Accessing the Notebook

Notebook File: app/docs/DataMaskingToolExample.ipynb

Overview

The Jupyter Notebook example covers:

Importing Necessary Modules: Instructions on setting up your environment with the required libraries.
Defining Utility Functions: Essential functions needed for data masking operations.
Configuring Masking Parameters: How to specify which columns to mask and configure their settings.
Running the Masking Function: Applying the masking to your dataset.
Reviewing Results: Viewing the masked data and logs.
Saving Masked Data: Instructions on saving the masked DataFrame to a file.
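
To give a feel for config-driven masking without the GUI, here is a small dependency-free sketch. It is not the notebook's code. The CONFIG layout (keys such as field_type, options, min, max) and the functions generate and mask_rows are all hypothetical, and plain dicts stand in for a pandas DataFrame. Columns missing from the configuration are copied through unchanged.

```python
import random

# Hypothetical per-column settings, loosely mirroring the GUI's options.
CONFIG = {
    "Tier": {"field_type": "Custom List", "options": ["Gold", "Silver"]},
    "Age": {"field_type": "Number", "min": 18, "max": 65},
}


def generate(settings, rng):
    # Dispatch on the configured field type.
    kind = settings["field_type"]
    if kind == "Custom List":
        return rng.choice(settings["options"])
    if kind == "Number":
        return rng.randint(settings["min"], settings["max"])
    raise ValueError(f"unsupported field type in this sketch: {kind}")


def mask_rows(rows, config, rng):
    masked = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are never modified
        for column, settings in config.items():
            new_row[column] = generate(settings, rng)
        masked.append(new_row)
    return masked


rows = [{"Id": 1, "Tier": "Platinum", "Age": 40}, {"Id": 2, "Tier": "Bronze", "Age": 31}]
print(mask_rows(rows, CONFIG, random.Random(3)))
```
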

Usage Steps

Clone the Repository (If Not Already Done)

```
git clone https://github.com/Marcelo-Has/data-masking-tool.git
cd data-masking-tool
```

Navigate to the Docs Directory
```
cd app/docs
```

Open the Notebook

You can open the notebook using Jupyter Notebook or JupyterLab:

```
jupyter notebook DataMaskingToolExample.ipynb
```

or

```
jupyter lab DataMaskingToolExample.ipynb
```

Install Dependencies (If Needed)

Make sure you have all the required Python libraries installed:
```
pip install pandas numpy faker openpyxl xlsxwriter
```
Run the Notebook

Execute each cell sequentially to understand how the Data Masking Tool works in a notebook environment.
The notebook includes detailed explanations and test cases for various masking configurations.

Customization

Adjust Configurations: Modify the configurations in the notebook to suit your dataset and masking requirements.
Integrate into Your Workflow: Use the code snippets as a starting point to integrate data masking into your data processing pipelines.

Benefits of Using the Notebook

Programmatic Control: Run data masking operations without the GUI, allowing for automation and integration with other code.
Flexibility: Customize the masking process extensively through code.
Documentation: The notebook serves as both a tutorial and a reference guide.

Executable

If you prefer to use the application without setting up the development environment, you can use the standalone executable file.

The executable file can be found in the dist folder.
Path: dist/DataMaskingTool.exe
Usage: Navigate to the dist directory and run the executable. On Windows, double-click DataMaskingTool.exe or run it from the Command Prompt.

Note: The executable includes all necessary dependencies and can be run on any Windows machine without installing Python or additional libraries.

Contributing

Contributions are welcome! To contribute:

Fork the Repository

Click the Fork button on the top right to create a copy of this repository on your GitHub account.

Clone Your Fork

```
git clone https://github.com/<your-username>/data-masking-tool.git
```

Create a Feature Branch

```
git checkout -b feature/your-feature-name
```

Commit Your Changes
```
git commit -am 'Add new feature'
```

Push to the Branch

```
git push origin feature/your-feature-name
```

Open a Pull Request

Submit a pull request to the main repository for review.

Coding Guidelines

Follow PEP 8 style guidelines.
Write clear, concise commit messages.
Include docstrings and comments where necessary.
Update or add tests for new features.

Reporting Issues

Use the GitHub issue tracker to report bugs or request features.
Provide detailed information and steps to reproduce issues.

Author

Created by Marcelo Has

Email: marcelo_has@outlook.com
GitHub: Marcelo-Has
LinkedIn: https://www.linkedin.com/in/marcelohas/

License

This project is licensed under the MIT License.

Feel free to reach out with questions, suggestions, or contributions!
