You’ve been there too: setting up a data loss prevention (DLP) solution can be a damn long project if you need to support multiple languages and don’t have adequate data sources.
This repository consolidates Data Loss/Leak Prevention insights and sample files (e.g., datasets) that I have collected and used over the years. Your quality assurance library does not have to be unique; everyone strives for consistency.
Fork this repository, and improve your library. Even better, send me an update 😆.
A DLP solution is a set of enterprise processes, tools, and techniques that monitor sensitive information and prevent data exfiltration.
I wasn't happy with the provided bundle of mock files to test my DLP policies and demonstrate compliance. They were either too simple or not localized for my use case.
Friends don’t let friends test the effectiveness of a DLP solution with production data. You need realistic test data [1] in several formats, such as CSV, JSON, SQL, TXT, and Excel, to make sure your DLP policies work correctly, especially after a significant change.
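As a rough illustration of what "realistic test data in several formats" can look like, here is a minimal Python sketch that builds a few fake records and serializes them to CSV and JSON. It is not part of this repository: the function names, the sample names, and the record layout are all hypothetical. It assumes your DLP policy validates card numbers with the Luhn checksum, so the generated numbers pass that check without belonging to real accounts.

```python
import csv
import io
import json
import random

# Hypothetical sample names; swap in values localized for your own use case.
SAMPLE_NAMES = ["Zoé Tremblay", "José García", "Anna Müller", "John Smith"]


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit to append to a string of digits."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        digit = int(ch)
        if i % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def luhn_is_valid(number: str) -> bool:
    """Check a full number (check digit included) against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def fake_card_number(rng: random.Random) -> str:
    """Build a 16-digit, Luhn-valid number starting with 4 (not a real account)."""
    partial = "4" + "".join(str(rng.randint(0, 9)) for _ in range(14))
    return partial + luhn_check_digit(partial)


def make_records(count: int, seed: int = 0) -> list[dict]:
    """Generate `count` fake records; the seed keeps test runs reproducible."""
    rng = random.Random(seed)
    return [
        {
            "name": rng.choice(SAMPLE_NAMES),
            "email": f"user{i}@example.com",
            "card_number": fake_card_number(rng),
        }
        for i in range(count)
    ]


def to_csv(records: list[dict]) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: list[dict]) -> str:
    """Serialize the records as JSON, keeping accented characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    records = make_records(3, seed=42)
    print(to_csv(records))
    print(to_json(records))
```

Because the generator is seeded, the same files can be regenerated after every policy change, which makes before-and-after comparisons of DLP detections easier.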
dataLossPrevention by Benoît H. Dicaire is released under the Unlicense. For more information, please refer to unlicense.org.
Name | Cybersecurity | Finance | Legal | Personal | Technology |
---|---|---|---|---|---|
DLP Test | X | X | X | X | X |
Fake Person Generator | X | X | X | X | X |
Fake Generator | X | X | X | X | X |
GenerateData.com [2] | X | X | X | X | X |
Get Fake Data | X | X | X | X | X |
Get Bored Human | X | X | X | X | X |
Mockaroo | X | X | X | X | X |
Mock Turtle | X | X | X | X | X |
Venkom | X | X | X | X | X |
You can also search GitHub for library code and C tools related to data-generator, fake-data, mock-data, mock-data-generator, and test data.
Footnotes
1. Refer to the sensitive information type entity definitions provided by Microsoft for more information about the required structure.
2. Source code is available on GitHub/benkeen/generatedata