The Data Anonymizer is a command-line tool written in Go that anonymizes sensitive data in CSV files. It supports various anonymization techniques for specified columns and generates a new CSV file with anonymized data.
- Input CSV Support: Accepts a CSV file with column headers as input.
- Anonymization Techniques:
  - Masking: Replaces sensitive data with partial data or asterisks.
  - Pseudonymization: Replaces data with random placeholders.
  - Hashing: Applies irreversible hashing (e.g., SHA-256).
  - Generalization: Broadens specific values into categories.
  - Phone Masking: Masks phone numbers, keeping the first three digits and masking the rest.
- Customizable: Specify columns and techniques via command-line arguments.
- Logging: Logs anonymization activities for transparency.
- Security: Avoids storing sensitive data in memory longer than necessary.
- Go installed (version 1.20+).
- A Linux environment with a terminal (works on macOS and Windows as well).
- Clone the repository:

      git clone https://github.com/computerscienceiscool/anonymizer.git
      cd anonymizer

- Build the executable:

      go build -o anonymizer
| Argument | Description |
|---|---|
| `--input` | Path to the input CSV file. |
| `--columns` | Comma-separated list of columns to anonymize. |
| `--techniques` | Comma-separated list of anonymization techniques. |
| `--help` | Displays usage information. |
| Technique | Description |
|---|---|
| `mask` | Replace data with partial information or asterisks. |
| `pseudonymize` | Replace data with random placeholders (e.g., "Person 1"). |
| `hash` | Apply irreversible hashing (e.g., SHA-256). |
| `generalize` | Replace numeric data with broader categories (e.g., 29 → "20-29"). |
| `phone_mask` | Keep the first three digits of a phone number, masking the rest (e.g., 555-****). |
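
To make the techniques concrete, here is a minimal Go sketch of each one. All function and type names (`maskValue`, `hashValue`, `generalizeValue`, `phoneMask`, `pseudonymizer`) are hypothetical, and details such as keeping the first two characters when masking or using ten-wide buckets are assumptions based on the examples in this README, not the tool's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// maskValue keeps the first two characters and replaces the rest with asterisks.
func maskValue(s string) string {
	r := []rune(s)
	if len(r) <= 2 {
		return strings.Repeat("*", len(r))
	}
	return string(r[:2]) + strings.Repeat("*", len(r)-2)
}

// hashValue returns the hex-encoded SHA-256 digest of s (unsalted).
func hashValue(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// generalizeValue maps a non-negative integer to a ten-wide bucket.
// Values that do not parse are returned unchanged (an assumption of this sketch).
func generalizeValue(s string) string {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil || n < 0 {
		return s
	}
	low := n / 10 * 10
	return fmt.Sprintf("%d-%d", low, low+9)
}

// phoneMask keeps the first three digits, masks every later digit,
// and leaves separators such as '-' in place.
func phoneMask(s string) string {
	var b strings.Builder
	digits := 0
	for _, r := range s {
		if r >= '0' && r <= '9' {
			digits++
			if digits > 3 {
				b.WriteRune('*')
				continue
			}
		}
		b.WriteRune(r)
	}
	return b.String()
}

// pseudonymizer hands out "Person N" placeholders, reusing the same
// placeholder for repeated values. Note that it keeps originals in a map
// for the lifetime of the run.
type pseudonymizer struct{ seen map[string]string }

func newPseudonymizer() *pseudonymizer {
	return &pseudonymizer{seen: make(map[string]string)}
}

func (p *pseudonymizer) replace(s string) string {
	if v, ok := p.seen[s]; ok {
		return v
	}
	v := fmt.Sprintf("Person %d", len(p.seen)+1)
	p.seen[s] = v
	return v
}

func main() {
	fmt.Println(maskValue("John Doe"))  // Jo******
	fmt.Println(generalizeValue("29"))  // 20-29
	fmt.Println(phoneMask("555-1234"))  // 555-****
	fmt.Println(hashValue("john.doe@example.com"))
	p := newPseudonymizer()
	fmt.Println(p.replace("John Doe")) // Person 1
}
```

One design point worth noting: an unsalted hash of a low-entropy value such as an email address can be matched against a list of guessed inputs, so a keyed or salted hash may be preferable where that matters.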
Example command:

    ./anonymizer --input test_data.csv --columns "Name,Email,Phone" --techniques "mask,hash,phone_mask"

Input (`test_data.csv`):

    Name,Email,Age,Phone
    John Doe,john.doe@example.com,29,555-1234
    Jane Smith,jane.smith@example.com,34,555-5678

Output:

    Name,Email,Age,Phone
    Jo******,836f82db99121b3481011f16b49dfa5fbc714a0d1b1b9f784a1ebbbf5b39577f,29,555-****
    Ja********,f2d1f1c853fd1f4be1eb5060eaae93066c877d069473795e31db5e70c4880859,34,555-****
New anonymization techniques can be added by modifying the anonymize function in main.go. Ensure each technique is properly tested.
- Support for additional file formats (e.g., Excel, JSON).
- Interactive mode for column selection and technique assignment.
- Batch processing for multiple files.
This project is licensed under the MIT License. See the LICENSE file for details.