Skip to content

bumble-tech/private-detector

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

Private Detector

This is the repo for Bumble's Private Detector™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project to allow for the wider community to use and finetune a Private Detector model of their own. You can download the pretrained SavedModel and checkpoint here

Model

The SavedModel can be found in saved_model/ within private_detector.zip above

The model is based on Efficientnet-v2 and trained on our internal dataset of lewd images - more information can be found at the whitepaper here or here

Inference

Inference is pretty simple and an example has been given in inference.py

python3 inference.py \
    --model saved_model/ \
    --image_paths \
        Yes_samples/1.jpg \
        Yes_samples/2.jpg \
        Yes_samples/3.jpg \
        Yes_samples/4.jpg \
        Yes_samples/5.jpg \
        No_samples/1.jpg \
        No_samples/2.jpg \
        No_samples/3.jpg \
        No_samples/4.jpg \
        No_samples/5.jpg \
Sample Output
Probability: 93.71% - Yes_samples/1.jpg
Probability: 93.43% - Yes_samples/2.jpg
Probability: 94.06% - Yes_samples/3.jpg
Probability: 94.08% - Yes_samples/4.jpg
Probability: 91.01% - Yes_samples/5.jpg
Probability: 9.76% - No_samples/1.jpg
Probability: 7.14% - No_samples/2.jpg
Probability: 8.83% - No_samples/3.jpg
Probability: 4.87% - No_samples/4.jpg
Probability: 5.29% - No_samples/5.jpg

Additional Training

You can finetune the model yourself on your own data, to do so is fairly simple - though you will need the checkpoint files as can be found in saved_checkpoint/ in private_detector.zip

Set up a JSON file with links to your image path lists for each class:

{
    "Yes": {
        "path": "/home/sofarrell/private_detector/Yes.txt",
        "label": 0
    },
    "No": {
         "path": "/home/sofarrell/private_detector/No.txt",
         "label": 1
    }
}

With each .txt file listing off the image paths to your images

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg
/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg
/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg

You can create the training environment with conda:

conda env create -f environment.yaml
conda activate private_detector

And then retrain like so:

python3 ./train.py \
    --train_json /home/sofarrell/private_detector/train_classes.json \
    --eval_json /home/sofarrell/private_detector/eval_classes.json \
    --checkpoint_dir saved_checkpoint/ \
    --train_id retrained_private_detector

The training script has several parameters that can be tweaked:

Command Description Type Default
train_id ID for this particular training run str
train_json JSON file(s) which describes classes and contains lists of filenames of data files List[str]
eval_json Validation json file which describes classes and contains lists of filenames of data files str
num_epochs Number of epochs to train for int
batch_size Number of images to process in a batch int 64
checkpoint_dir Directory to store checkpoints in str
model_dir Directory to store graph in str .
data_format Data format: [channels_first, channels_last] str channels_last
initial_learning_rate Initial learning rate float 1e-4
min_learning_rate Minimal learning rate float 1e-6
min_eval_metric Minimal evaluation metric to start saving models float 0.01
float_dtype Float Dtype to use in image tensors: [16, 32] int 16
steps_per_train_epoch Number of steps per train epoch int 800
steps_per_eval_epoch Number of steps per evaluation epoch int 1
reset_on_lr_update Whether to reset to the best model after learning rate update bool False
rotation_augmentation Rotation augmentation angle, value <= 0 disables it float 0
use_augmentation Add speckle, v0, random or color distortion augmentation str
scale_crop_augmentation Resize image to the model's size times this scale and then randomly crop needed size float 1.4
reg_loss_weight L2 regularization weight float 0
skip_saving_epochs Do not save good checkpoint and update best metric for this number of the first epochs int 0
sequential Use sequential run over randomly shuffled filenames vs equal sampling from each class bool False
eval_threshold Threshold above which to consider a prediction positive for evaluation float 0.5
epochs_lr_update Maximum number of epochs without improvement used to reset/decrease learning rate int 20