Access to model definitions and training/validation data? #108

nlykkei · 2021-05-29T09:06:53Z

Would it be possible to get access to model definitions and training/validation data for the models used in SAP/credential-digger?

I'm interested in seeing how these models were trained, and possibly in contributing to their future development.

Currently it seems that only trained models are available for download.

SlimTrabelsi · 2021-06-01T07:30:48Z

Hi @nlykkei,

Thank you for your interest in the project.
I'll start with a clarification regarding the training/validation data. We have currently trained two types of models: one based on real data, which we keep internal (for privacy reasons), and a second, open-source one trained on synthetically generated data. If you are interested, we can give you more details on how this data is generated or on how to train a model on your own data (some details are already available in our publication here).
If you are interested in contributing to the project, or if you want to deploy it in your professional environment, let's have a call together with the team and discuss this in detail. You can reach me directly at my e-mail address, which you will find in the publication ;).
Best regards
Slim

nlykkei · 2021-06-08T11:25:21Z

Hi @SlimTrabelsi,

Thanks for your reply.

> If you are interested, we can give you more details on how this data is generated or on how to train a model on your own data (some details are already available in our publication here).

I'd be very grateful if you could provide more detail than is already given in the publication.

I've been working on a similar problem myself, but it has been very difficult to progress from a strict set of regular expressions (a blacklist) to using ML for the cases that are hard to express as regular expressions without introducing too many false positives (e.g. social security numbers: \d{8}[-: ]?\d{4}).
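To make the false-positive problem concrete, here is a minimal sketch that runs the pattern quoted above over a few lines. The sample strings and the helper name `find_candidates` are invented for illustration and are not taken from credential-digger.

```python
import re

# Pattern from the comment: 8 digits, an optional separator, 4 digits.
SSN_LIKE = re.compile(r"\d{8}[-: ]?\d{4}")

# Hypothetical sample lines, made up for this illustration.
samples = [
    "ssn: 19850312-1234",           # the kind of value we want to find
    "build_id=202105290906",        # a 12-digit timestamp-like identifier
    "order 12345678 9012 shipped",  # two unrelated numbers split by a space
    "version 1.2.3",                # nothing that looks like the pattern
]

def find_candidates(lines):
    """Return (line, matched_text) pairs for every regex hit."""
    hits = []
    for line in lines:
        for match in SSN_LIKE.finditer(line):
            hits.append((line, match.group(0)))
    return hits

if __name__ == "__main__":
    for line, text in find_candidates(samples):
        print(f"{text!r} found in {line!r}")
```

Three of the four lines match, although only the first looks like a real identifier. The regex alone cannot tell them apart, which is where a classifier that looks at the surrounding text might help.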

In my experience, it was only possible to identify sensitive data when there was a sufficient amount of context around it (e.g. think of a URL, https://user:pass@example.com/foo/bar).
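As a rough illustration of using context, the sketch below only reports a password when it sits in the userinfo part of a URL, rather than trying to match the password string itself. It relies on the standard library's `urllib.parse.urlsplit`; the URL-finding regex and the function name `urls_with_credentials` are simplifying assumptions of mine, not how credential-digger works.

```python
import re
from urllib.parse import urlsplit

# Simplified assumption: a URL is a scheme, "://", then non-space characters.
URL_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://[^\s'\"<>]+")

def urls_with_credentials(text):
    """Return (hostname, username, password) for URLs that embed a password."""
    found = []
    for match in URL_RE.finditer(text):
        try:
            parts = urlsplit(match.group(0))
        except ValueError:
            # urlsplit rejects some malformed URLs; skip them.
            continue
        if parts.password:
            found.append((parts.hostname, parts.username, parts.password))
    return found

if __name__ == "__main__":
    line = "git clone https://user:pass@example.com/foo/bar"
    print(urls_with_credentials(line))
```

Here the URL structure is the context: the string "pass" on its own would be impossible to flag, but its position between ":" and "@" in the authority part makes it a strong candidate.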

My experience with ML is limited to introductory university courses and DeepLearning.AI certifications. Would you say that my skill level is inadequate for developing this kind of system?

Best regards
Nicolas
