[Feature]: Enable ad-hoc recognizers for presidio #2099

Closed
krrishdholakia opened this issue Feb 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@krrishdholakia
Contributor

The Feature

curl -d '{
  "text": "John Smith AHV number is 756.3026.0705.92. Zip code: 10023",
  "language": "en",
  "ad_hoc_recognizers": [
    {
      "name": "Zip code Recognizer",
      "supported_language": "en",
      "patterns": [
        {
          "name": "zip code (weak)",
          "regex": "(\\b\\d{5}(?:\\-\\d{4})?\\b)",
          "score": 0.01
        }
      ],
      "context": ["zip", "code"],
      "supported_entity": "ZIP"
    },
    {
      "name": "Swiss AHV Number Recognizer",
      "supported_language": "en",
      "patterns": [
        {
          "name": "AHV number (strong)",
          "regex": "(756\\.\\d{4}\\.\\d{4}\\.\\d{2})|(756\\d{10})",
          "score": 0.95
        }
      ],
      "context": ["AHV", "social security", "Swiss"],
      "supported_entity": "AHV_NUMBER"
    }
  ]
}' -H "Content-Type: application/json" -X POST http://localhost:3000/analyze

This detects the Swiss AHV number (alongside the person name and the ZIP code):

[
  {
    "analysis_explanation": null,
    "end": 41,
    "entity_type": "AHV_NUMBER",
    "recognition_metadata": {
      "recognizer_identifier": "Swiss AHV Number Recognizer_140453887187744",
      "recognizer_name": "Swiss AHV Number Recognizer"
    },
    "score": 0.95,
    "start": 25
  },
  {
    "analysis_explanation": null,
    "end": 10,
    "entity_type": "PERSON",
    "recognition_metadata": {
      "recognizer_identifier": "SpacyRecognizer_140453912086800",
      "recognizer_name": "SpacyRecognizer"
    },
    "score": 0.85,
    "start": 0
  },
  {
    "analysis_explanation": null,
    "end": 58,
    "entity_type": "ZIP",
    "recognition_metadata": {
      "recognizer_identifier": "Zip code Recognizer_140453887187264",
      "recognizer_name": "Zip code Recognizer"
    },
    "score": 0.4,
    "start": 53
  }
]
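For anyone who wants to build the same request body from code instead of hand-writing the JSON, here is a minimal sketch. The helper names `build_analyze_payload` and `local_matches` are made up for illustration and are not part of Presidio or LiteLLM. `json.dumps` takes care of escaping the regex backslashes, which a hand-written JSON body has to double. The local check only runs the patterns with Python's `re` module as a sanity test; it does not call Presidio and ignores scores and context words.

```python
import json
import re

# Recognizer definitions taken from the request in this issue. The regexes are
# raw strings so the backslashes reach the regex engine unchanged.
AD_HOC_RECOGNIZERS = [
    {
        "name": "Zip code Recognizer",
        "supported_language": "en",
        "patterns": [
            {
                "name": "zip code (weak)",
                "regex": r"(\b\d{5}(?:\-\d{4})?\b)",
                "score": 0.01,
            }
        ],
        "context": ["zip", "code"],
        "supported_entity": "ZIP",
    },
    {
        "name": "Swiss AHV Number Recognizer",
        "supported_language": "en",
        "patterns": [
            {
                "name": "AHV number (strong)",
                "regex": r"(756\.\d{4}\.\d{4}\.\d{2})|(756\d{10})",
                "score": 0.95,
            }
        ],
        "context": ["AHV", "social security", "Swiss"],
        "supported_entity": "AHV_NUMBER",
    },
]


def build_analyze_payload(text, language="en"):
    """Hypothetical helper: serialize the /analyze request body as JSON."""
    return json.dumps(
        {
            "text": text,
            "language": language,
            "ad_hoc_recognizers": AD_HOC_RECOGNIZERS,
        }
    )


def local_matches(text):
    """Hypothetical helper: run each pattern locally with `re` (no Presidio)."""
    found = []
    for recognizer in AD_HOC_RECOGNIZERS:
        for pattern in recognizer["patterns"]:
            for match in re.finditer(pattern["regex"], text):
                found.append(
                    (recognizer["supported_entity"], match.start(), match.end())
                )
    return found


if __name__ == "__main__":
    sample = "John Smith AHV number is 756.3026.0705.92. Zip code: 10023"
    print(build_analyze_payload(sample))
    print(local_matches(sample))
```

The resulting string can be posted to the `/analyze` endpoint with any HTTP client, using the same `Content-Type: application/json` header as the curl call.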

Motivation, pitch

user request:

  • It would be great if you could extend the config to support additional ad-hoc recognizers. It would give everything huge flexibility without changing the Presidio images.

Twitter / LinkedIn details

No response

@krrishdholakia
Contributor Author

Result without ad-hoc recognizers: <PERSON> AHV number is 756.3026.0705.92. Zip code: <US_DRIVER_LICENSE>

Result with ad-hoc recognizers: <PERSON> AHV number is <AHV_NUMBER>. Zip code: <US_DRIVER_LICENSE>
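As a rough illustration of how analyzer spans turn into a masked string like the ones above, here is a small sketch. It is not the proxy's actual implementation: `mask` is a hypothetical helper, and it assumes the spans do not overlap. Fed with the analyzer results quoted in the feature description, it labels the zip code `<ZIP>`, whereas the masked output in this comment shows `<US_DRIVER_LICENSE>` for that span.

```python
def mask(text, results):
    """Hypothetical helper: replace each detected span with <ENTITY_TYPE>.

    Assumes non-overlapping spans. Works from the end of the string backwards
    so that earlier start/end offsets stay valid after each replacement.
    """
    masked = text
    for result in sorted(results, key=lambda r: r["start"], reverse=True):
        placeholder = f"<{result['entity_type']}>"
        masked = masked[: result["start"]] + placeholder + masked[result["end"] :]
    return masked


if __name__ == "__main__":
    text = "John Smith AHV number is 756.3026.0705.92. Zip code: 10023"
    # Offsets and entity types copied from the analyzer response in this issue.
    results = [
        {"entity_type": "AHV_NUMBER", "start": 25, "end": 41},
        {"entity_type": "PERSON", "start": 0, "end": 10},
        {"entity_type": "ZIP", "start": 53, "end": 58},
    ]
    print(mask(text, results))
```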

@krrishdholakia
Contributor Author

Merged into main; will be out in the next release, v1.26.5+.
