Skip to content

data-minder/camouflage

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Camouflage πŸ›‘οΈ

Python License Coverage Tests

Anonymize. Protect. Restore.
Flexible and reversible anonymization for modern Python workflows.


Ready to get started?

Install Camouflage with pip:

pip install camouflage

✨ What is Camouflage?

Camouflage lets you easily anonymize sensitive data, store reversible mappings, and restore the original dataset when needed β€” all while being fast, lightweight, and fully customizable.

  • πŸ”₯ Anonymize large datasets quickly.
  • πŸ› οΈ Add your own anonymizers easily (your data, your rules).
  • πŸ”„ Reversible by design β€” restore original values without headaches.
  • πŸ§ͺ 100% test coverage for maximum trust.
  • 🏎️ Tested on datasets with over 100,000 rows across 6 columns β€” handles big data smoothly.

πŸ“ˆ How it Works

Camouflage uses a one-to-one mapping to anonymize data. It generates a unique, consistent, and reversible mapping for each value. See Bijection on Wikipedia.

Camouflage guarantees that every anonymized value is unique, consistent, and traceable back β€” only when you need it. There are predefined anonymizers for common data types (facets), and you can easily add your own.

Available Facets

facet description example
age An int representing the age of a person. 25
amount A float representing a monetary amount. 100.50
country A str representing a country name. Germany
datetime A datetime.datetime object. 2023-02-01 00:00:00
ipv4 A str representing an IPv4 address. 213.209.12.210
user_agent A str representing a user agent. Mozilla/5.0 (Windows)
... Coming soon... ...

πŸš€ Quick Start

1️⃣ One-Time Anonymization

from camouflage import anonymize

original_value = "192.168.1.1"

anonymized_value = anonymize("ipv4", original_value)

2️⃣ Reversible Anonymization

from camouflage import anonymize, deanonymize, Transform

original_value = "192.168.1.1"

transform = Transform()

# Anonymize
anonymized_value = anonymize("ipv4", original_value, transform)

# Do something with the anonymized value
# ...

# De-anonymize
deanonymized_value = deanonymize("ipv4", anonymized_value, transform)

3️⃣ Anonymizing a Pandas DataFrame

import pandas as pd
from camouflage import PandasAdapter

df = pd.DataFrame({
    "ip": ["192.168.1.1", "10.0.0.1"],
    "joined_at": [pd.Timestamp("2023-01-01"), pd.Timestamp("2023-02-01")],
    "revenue": [1234.56, 7890.12],
})
# | ip          | joined_at           |   revenue |
# |:------------|:--------------------|----------:|
# | 192.168.1.1 | 2023-01-01 00:00:00 |   1234.56 |
# | 10.0.0.1    | 2023-02-01 00:00:00 |   7890.12 |

mapper = {
    "ip": "ipv4",
    "joined_at": "datetime",
    "revenue": "amount",
}

pd_adapter = PandasAdapter(mapper)

df_safe = pd_adapter.anonymize(df)
# | ip             | joined_at           |   revenue |
# |:---------------|:--------------------|----------:|
# | 137.224.91.30  | 2024-12-05 00:00:00 |   1279.97 |
# | 213.209.12.210 | 2023-06-27 00:00:00 |   5506.58 |

# Do something with the anonymized DataFrame
# ...

# When you want to restore:
original_df = pd_adapter.deanonymize(df_safe)
# | ip          | joined_at           |   revenue |
# |:------------|:--------------------|----------:|
# | 192.168.1.1 | 2023-01-01 00:00:00 |   1234.56 |
# | 10.0.0.1    | 2023-02-01 00:00:00 |   7890.12 |

🧩 Extending with Custom Anonymizers

Want to anonymize new types of data? Super easy:

1️⃣ Create your Custom Anonymizers

import random


def anonymize_color(_):  # It is crucial for the anonymizer to accept a single argument.
    return random.choice(['red', 'green', 'blue'])


def anonymize_red_channel(original_hex):
    hex_color = original_hex.lstrip('#')

    green = hex_color[2:4]
    blue = hex_color[4:6]

    random_red = random.randint(0, 255)

    return "#{:02X}{}{}".format(random_red, green, blue)

2️⃣ Register the Anonymizers

from camouflage import register_anonymizer

register_anonymizer('color', anonymize_color)
register_anonymizer('red_channel', anonymize_red_channel)

3️⃣ Use the Custom Anonymizers

from camouflage import anonymize

original_value = "cyan"
anonymized_value = anonymize("color", original_value)

original_hex = "#00FF00"
anonymized_hex = anonymize("red_channel", original_hex)

4️⃣ Or Use the Custom Anonymizers for Pandas

import pandas as pd
from camouflage import PandasAdapter

df = pd.DataFrame({
    "color": ["cyan", "magenta", "yellow"],
    "hex": ["#FF0000", "#00FF00", "#0000FF"],
})
# | color   | hex     |
# |:--------|:--------|
# | cyan    | #FF0000 |
# | magenta | #00FF00 |
# | yellow  | #0000FF |

mapper = {
    "color": "color",
    "hex": "red_channel",
}

pd_adapter = PandasAdapter(mapper)
df_safe = pd_adapter.anonymize(df)
# | color   | hex     |
# |:--------|:--------|
# | green   | #B90000 |
# | blue    | #96FF00 |
# | red     | #FD00FF |

βœ… That's it β€” now you can anonymize columns as "color" or "red_channel" either one-time or in adapters!


βœ… Quality You Can Trust

  • βœ… 100% code coverage (Pytest + Coverage)
  • βœ… PEP8 compliant, linted
  • βœ… Fast anonymization for datasets of 100,000+ rows
  • βœ… Extensible facet system
  • βœ… Tested and battle-ready

πŸ§ͺ Testing

Run tests on your setup with:

pip install pytest
pytest

πŸ“œ License

MIT License β€” do whatever you want, but be cool. ✌️


πŸ‘¨β€πŸ’» Made with ❀️ by Developers, for Developers.

Camouflage is built to empower privacy-first applications without slowing you down.


πŸ”— Links

Source Code: https://github.com/data-minder/camouflage
PyPI: https://pypi.org/project/camouflage/

About

Camouflage library, reversible data anonymization

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages