open-redact is an open source API that helps anonymize PDFs. It can be used to redact names and other identifiable information from resumes before review, to create a more equitable hiring process.
Today open-redact supports the following redactions:
* People's names
* Email addresses
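As a rough illustration of these two redactions (not the project's actual implementation), the sketch below replaces email addresses with a simplified regular expression and replaces name spans given as character offsets. The function names `redact_emails` and `redact_spans`, the placeholders, and the hard-coded span are all assumptions made for the example; in practice the name spans would come from a named entity recognition model.

```python
import re

# Simplified email pattern for illustration only; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def redact_spans(text, spans, placeholder="[NAME]"):
    """Replace each (start, end) character span with a placeholder.

    Spans are processed from right to left so that earlier offsets stay valid
    while the text changes length.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


sample = "Jane Doe can be reached at jane.doe@example.com."
# (0, 8) stands in for a person-name span that an NER model might report.
# Spans are applied first because their offsets refer to the original text.
redacted = redact_emails(redact_spans(sample, [(0, 8)]))
print(redacted)
```

Applying the offset-based name spans before the pattern-based email pass keeps the offsets aligned with the original text.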
This package uses a spaCy named entity recognition model. By default it is set to spaCy's en_core_web_lg model, but you can choose a smaller model to make development easier or a larger model for better accuracy. You can also select models for other languages. Check out the spaCy model documentation for English options.
Two places need to be edited to use a different model. Inside the Dockerfile, edit the following line to install the model of your choice.
RUN python -m spacy download en_core_web_lg
Inside app/main/, edit the following line to load the model of your choice.
nlp = spacy.load("en_core_web_lg")
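If editing the source for every model change becomes tedious, one possible alternative is to read the model name from an environment variable and fall back to the default. This is only a sketch: the variable name `OPENREDACT_SPACY_MODEL` and the helper `resolve_model_name` are hypothetical and are not part of open-redact.

```python
import os

# The default matches the model named in this README.
DEFAULT_MODEL = "en_core_web_lg"


def resolve_model_name(env=None):
    """Return the model name from OPENREDACT_SPACY_MODEL (hypothetical), or the default."""
    env = os.environ if env is None else env
    name = env.get("OPENREDACT_SPACY_MODEL", "").strip()
    return name or DEFAULT_MODEL


model_name = resolve_model_name()
# In the application, the resolved name would then be passed on:
#   nlp = spacy.load(model_name)
print(model_name)
```

The model still has to be installed in the image or environment, so the download command must name the same model.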
Clone the repository and build an image using the included Dockerfile:
docker build --tag openredact:python .
If you are not using the image, be sure to manually install your named entity recognition model with the following command:
python -m spacy download en_core_web_lg
Once up and running, the system auto-generates Swagger documentation, which can be viewed in a browser at the address and port of your deployment.
From the repository root, run the following command to execute all unit tests:
python -m pytest .
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.