I built this project to remove unwanted objects from photos — not just the object itself, but also the shadow and reflection it leaves behind. You give it an image and a mask (telling it what to remove), and it gives you back a clean photo as if the object was never there.
Under the hood it uses a diffusion model (FLUX) with a LoRA adapter trained specifically for removal. I took the idea from a research paper called OmniEraser and set up the code so I can experiment with it, train on my own data, and plug it into a simple web UI.
Here are some real examples — input on the left, output on the right:
| Input (with object) | Output (object removed) |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
Notice how shadows and reflections are also cleanly removed, not just the object itself.
- Remove objects from real-world photos (people, signs, cars, random clutter — anything you mask).
- Handles tricky stuff like shadows and reflections that most inpainting tools leave behind.
- Works on anime/illustration style images too, not just photographs.
- Comes with a Gradio web interface so you can upload, draw a mask, and see the result instantly.
- You can also train your own LoRA if you have paired data (with-object vs without-object images).
I kept things simple — there are only a few folders and each one has one job:
.
├── omnieraser/ ← The actual model code. You import from here.
│ ├── pipeline_flux_control_removal.py (the diffusion pipeline)
│ └── utils.py (training helpers)
│
├── scripts/ ← Things you run from the terminal.
│ ├── test_control_lora_flux.py (quick test — does it work?)
│ ├── gradio_control_lora_flux.py (web UI)
│ ├── train_control_lora_flux.py (training script)
│ └── train_control_lora_flux.sh (shell command to start training)
│
├── configs/ ← Settings for training (GPU count, precision, etc.)
│ └── accelerate.yaml
│
├── example/ ← A few sample images and masks to play with.
│ ├── image/
│ └── mask/
│
├── ControlNet_version/ ← A different variant that uses ControlNet. Self-contained.
├── requirements.txt ← Python dependencies
└── pyproject.toml ← So you can do `pip install -e .`
The rule is straightforward:
- `omnieraser/` is the library: you import stuff from it.
- `scripts/` is where you run things: test, train, launch the UI.
- Everything else is config, data, or the website.
If you want the full walkthrough of every file, check docs/PROJECT_STRUCTURE.md.
For the deep technical stuff:
- docs/ARCHITECTURE.md — model components, tensor shapes, LoRA config, channel expansion
- docs/INFERENCE.md — line-by-line inference walkthrough with shapes at every stage
- docs/TRAINING.md — dataset format, loss function, all hyperparameters, memory requirements
git clone https://github.com/Gaurav14cs17/ImageObjectRemoval.git
cd ImageObjectRemoval
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

That second pip command (`pip install -e .`) registers the `omnieraser` folder as a proper Python package. After that, `from omnieraser import ...` works from anywhere.

If you don't want to do the editable install for some reason, you can also just run commands with `PYTHONPATH=.` in front, like `PYTHONPATH=. python scripts/test_control_lora_flux.py`.
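A quick way to confirm that either route worked is to ask Python whether it can locate the package. This is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of this repo.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate a top-level package on the import path."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Expected to be True after `pip install -e .`, or when run with
    # PYTHONPATH=. from the repo root.
    print("omnieraser importable:", is_importable("omnieraser"))
```

If it prints `False`, the editable install probably went into a different virtual environment than the one currently active.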
The fastest way to see it work:
python scripts/test_control_lora_flux.py

This grabs the FLUX model and the LoRA weights from Hugging Face (first run will download a few GB), takes one of the sample images from `example/`, removes the masked object, and saves the result as `flux_inpaint.png`.
python scripts/gradio_control_lora_flux.py

This opens a local page at http://localhost:7999. Upload any photo, draw over the thing you want gone, hit the button, done.
If you have your own dataset of paired images (same scene with and without the object):
- Open `scripts/train_control_lora_flux.sh` and change the dataset path and output path.
- Check `configs/accelerate.yaml` matches your GPU setup.
- Run it:

bash scripts/train_control_lora_flux.sh

Here is the short version:
- FLUX is a text-to-image diffusion model (like Stable Diffusion, but newer). It normally takes text and generates an image from noise.
- We modify its input layer so it can also accept the original photo and a mask alongside the noise. That means the model now has 4x the input channels it normally does (64 becomes 256).
- A LoRA adapter (a small trainable layer, rank 32) is plugged into the model and trained on paired data: thousands of images where we know what the scene looks like with and without the object.
- At inference time, you feed in the image + mask, the model runs 28 denoising steps, and you get back a clean version. The text prompt is just something like "There is nothing here."
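To make the channel widening concrete, here is a toy sketch in plain Python of one common way to widen a linear input layer: keep the original weights in the first columns and fill the new columns with zeros, so the widened layer initially gives the same output as before. Whether this repo initialises the extra columns exactly this way is an assumption. `expand_input_weights` and `linear` are hypothetical helpers, and the tiny sizes stand in for the real 64 to 256 change.

```python
from typing import List

Matrix = List[List[float]]


def expand_input_weights(weight: Matrix, factor: int) -> Matrix:
    """Widen an (out_features x in_features) matrix to in_features * factor.

    Original columns are kept; new columns are zero (an assumed init scheme).
    """
    in_features = len(weight[0])
    extra = in_features * (factor - 1)
    return [row + [0.0] * extra for row in weight]


def linear(weight: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product, no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy layer: 2 outputs, 3 inputs (stands in for the 64-wide input).
old_weight = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
new_weight = expand_input_weights(old_weight, factor=4)

noise = [1.0, 1.0, 1.0]
condition = [9.0] * 9  # placeholder for the extra image + mask channels

# With zero-initialised columns the extra inputs do not change the output yet;
# training is what teaches the model to use them.
assert len(new_weight[0]) == 12
assert linear(new_weight, noise + condition) == linear(old_weight, noise)
```

The appeal of this kind of initialisation is that the modified model starts out behaving like the unmodified one, so fine-tuning begins from a sensible point.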
┌─────────────┐ ┌──────────┐
│ Your photo │ │ Mask │
└──────┬───────┘ └────┬─────┘
│ │
└────────┬────────┘
│
v
┌────────────────┐
│ VAE Encoder │
└───────┬─────────┘
│
v
┌────────────────────────┐ ┌──────────────────────┐
│ Image latents │ │ Text prompt │
│ + Mask latents │ │ "There is nothing │
│ + Random noise │ │ here." │
└───────────┬────────────┘ └──────────┬─────────────┘
│ │
│ v
│ ┌──────────────────────┐
│ │ Text Encoders │
│ │ (CLIP + T5) │
│ └──────────┬───────────┘
│ │
│ v
│ ┌──────────────────────┐
│ │ Text embeddings │
│ └──────────┬───────────┘
│ │
└──────────────┬───────────────┘
│
v
┌──────────────────────────────┐
│ FLUX Transformer │
│ + LoRA adapter (rank 32) │
│ │
│ Runs 28 denoising steps │
└──────────────┬────────────────┘
│
v
┌──────────────────┐
│ Cleaned latents │
└────────┬─────────┘
│
v
┌────────────────┐
│ VAE Decoder │
└────────┬───────┘
│
v
┌────────────────────┐
│ Clean output photo │
│ (object + shadow │
│ removed) │
└─────────────────────┘
Step by step, this is what happens when you run inference:
- Your photo and mask go through the VAE encoder and get turned into small latent tensors.
- The text prompt gets encoded separately by CLIP and T5 text encoders.
- These latents (image + mask + random noise) and text embeddings are fed into the FLUX transformer, which has the LoRA adapter plugged in.
- The transformer runs 28 denoising steps — each step it predicts and removes a bit of noise, gradually revealing the clean version underneath.
- The final cleaned latents go through the VAE decoder and come out as a normal image.
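The denoising loop in the steps above can be pictured with a toy Euler sampler. This is a simplified sketch, not the repo's scheduler. It assumes the flow-matching convention usually associated with FLUX, where the model output is treated as a velocity (noise minus clean), and a plain linear time schedule from 1 to 0. The names `euler_denoise` and `oracle` are hypothetical, and the "oracle" stands in for the transformer.

```python
from typing import Callable, List

Vector = List[float]


def euler_denoise(
    noise: Vector,
    predict: Callable[[Vector, float], Vector],
    num_steps: int = 28,
) -> Vector:
    """Walk from pure noise (t=1) to a clean sample (t=0) in equal Euler steps."""
    x = list(noise)
    # Linear schedule 1.0 -> 0.0; real schedulers usually warp these values.
    times = [1.0 - i / num_steps for i in range(num_steps + 1)]
    for t, t_next in zip(times[:-1], times[1:]):
        velocity = predict(x, t)  # the transformer call in a real pipeline
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, velocity)]
    return x


# Stand-in for the model: it "knows" the clean target, so its velocity is exact.
clean = [0.2, -0.7, 1.5]
start = [1.3, 0.4, -2.0]


def oracle(x: Vector, t: float) -> Vector:
    return [n - c for n, c in zip(start, clean)]


result = euler_denoise(start, oracle)
assert all(abs(r - c) < 1e-9 for r, c in zip(result, clean))
```

In the real pipeline the prediction comes from the transformer conditioned on the image, mask, and text embeddings, so each step is only approximately right and the steps together converge on the clean latents.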
┌─────────────────────────────────┐
│ Training dataset │
│ (paired with-object & │
│ without-object images) │
└────────────────┬────────────────┘
│
v
┌───────────────────────┐
│ PairedRandomCrop │
│ (augmentation) │
└───────────┬───────────┘
│
v
┌────────────────┐
│ VAE Encoder │
└───┬────────┬───┘
│ │
v v
┌──────────────┐ ┌─────────────────────┐
│ Ground truth │ │ Condition latents │
│ latents │ │ (image with object │
│ (clean img) │ │ + mask) │
└──────┬───────┘ └──────────┬───────────┘
│ │
v │
┌──────────────┐ │ ┌───────────────┐
│ Add noise │ │ │ Text prompt │
│ (random │ │ └───────┬───────┘
│ timestep) │ │ │
└──────┬───────┘ │ v
│ │ ┌───────────────┐
v │ │ Text Encoders │
┌──────────────┐ │ │ (frozen) │
│ Noisy latents│ │ └───────┬───────┘
└──────┬───────┘ │ │
│ │ │
└─────────┬───────────┴─────────────────┘
│
v
┌───────────────────────────────────┐
│ FLUX Transformer (frozen) │
│ + LoRA adapter (trainable) │
└──────────────────┬────────────────┘
│
v
┌───────────────────┐
│ Predicted noise │
└─────────┬─────────┘
│
v
┌─────────────────────────────────────┐
│ Loss = predicted noise vs actual │
└──────────────────┬──────────────────┘
│
│ backprop
│ (only updates LoRA weights)
v
┌────────┐
│ Done │
└─────────┘
During training, the base FLUX model is frozen — only the small LoRA layers get updated. That's why it trains fast and the weight file is small (just the LoRA adapter, not the whole model).
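The size argument is easy to check with arithmetic. A rank-r LoRA on a linear layer with `d_in` inputs and `d_out` outputs adds two small matrices with `r * (d_in + d_out)` parameters in total, versus `d_in * d_out` for the full weight. The sketch below uses the rank of 32 mentioned earlier; the 3072 x 3072 layer size is an illustrative assumption rather than a figure taken from this repo, and both helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen base weight matrix (bias ignored)."""
    return d_in * d_out


d_in = d_out = 3072  # illustrative layer size (assumption)
rank = 32            # LoRA rank used in this project

added = lora_param_count(d_in, d_out, rank)
frozen = full_param_count(d_in, d_out)

print(f"LoRA adds {added:,} trainable parameters")
print(f"Base layer has {frozen:,} frozen parameters")
print(f"Trainable share: {added / frozen:.2%}")
```

Under these assumed sizes the adapter is roughly one fiftieth of the layer it attaches to, which is why only saving the LoRA weights keeps the checkpoint small.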
┌──────────────────────────────────────────────────────────────────────┐
│ Hugging Face Hub │
│ (FLUX.1-dev + LoRA weights) │
└────────────┬──────────────────┬──────────────────┬───────────────────┘
│ │ │
v v v
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│ test_control_ │ │ gradio_control_ │ │ train_control_ │
│ lora_flux.py │ │ lora_flux.py │ │ lora_flux.py │
│ (quick test) │ │ (web UI) │ │ (training loop) │
└───┬─────────┬───┘ └───┬─────────┬───┘ └──┬──────────┬────┬───┘
│ │ │ │ │ │ │
│ │ │ │ │ │ │
┌───┴─────────┴───────────┴─────────┘ │ │ │
│ imports from │ │ │
v v │ │
┌──────────────────────────────────┐ ┌────────────┐ │ │
│ omnieraser/ │ │ omnieraser/ │ │ │
│ pipeline_flux_control_ │ │ utils.py │ │ │
│ removal.py │ └─────────────┘ │ │
│ (diffusion pipeline) │ │ │
└──────────────────────────────────┘ │ │
│ │
┌──────────────────────┐ ┌───────────────────────┐ │ │
│ example/ │ │ configs/ │ │ │
│ image/ + mask/ │ │ accelerate.yaml │───────┘ │
│ (sample data) │────┘ (GPU & precision) │ │
└──────────────────────┘ └────────────────────────┘ │
│
┌────────────────────────┐ │
│ train_control_ │ │
│ lora_flux.sh │───────────┘
│ (launch command) │
└─────────────────────────┘
This diagram shows how the folders talk to each other. The core pipeline lives in omnieraser/, the scripts import from it, weights come from Hugging Face, and sample data comes from example/.
Everything downloads automatically the first time you run the code. Here is what gets pulled:
- FLUX backbone — black-forest-labs/FLUX.1-dev (the base diffusion model)
- LoRA weights — theSure/Omnieraser (the fine-tuned adapter for object removal)
- ControlNet weights (if you use that variant) — theSure/Omnieraser_Controlnet_version
You don't need to download anything manually.
I'd suggest going through the files in this order:
- `scripts/test_control_lora_flux.py`: Start here. It's about 70 lines and shows the full flow: load model, inject LoRA, process one image, save result. You'll understand the whole pipeline just from this.
- `omnieraser/pipeline_flux_control_removal.py`: This is the big one. It has the actual diffusion pipeline: how the image gets encoded into latents, how the denoising loop runs, how the final image gets decoded. Read it after you understand what the test script does.
- `omnieraser/utils.py`: Small file. Just has `PairedRandomCrop`, which is used during training to randomly crop the image and mask together so they stay aligned.
- `scripts/train_control_lora_flux.py`: The training loop. How data is loaded, how the LoRA is configured, how the loss is computed, how checkpoints are saved. This one is long (~1400 lines) but most of it is boilerplate from HuggingFace's training examples.
- `scripts/gradio_control_lora_flux.py`: Shows how to wrap the pipeline in a web UI. Good to read if you want to build your own interface later.
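The paired-crop idea behind `PairedRandomCrop` can be sketched in plain Python: pick one random crop box and apply it to both the image and the mask. This is an illustration on nested lists, not the repo's implementation; the function name `paired_random_crop` and its signature are hypothetical.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]


def paired_random_crop(
    image: Grid, mask: Grid, crop_h: int, crop_w: int, rng: random.Random
) -> Tuple[Grid, Grid]:
    """Crop image and mask with the SAME random box so pixels stay aligned."""
    height, width = len(image), len(image[0])
    if (len(mask), len(mask[0])) != (height, width):
        raise ValueError("image and mask must have the same size")
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the input")
    # Sample the box once, then reuse it for both inputs.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)

    def crop(grid: Grid) -> Grid:
        return [row[left:left + crop_w] for row in grid[top:top + crop_h]]

    return crop(image), crop(mask)


# Toy 4x4 "image" whose mask marks pixels with value >= 10.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[1 if v >= 10 else 0 for v in row] for row in image]

img_crop, mask_crop = paired_random_crop(image, mask, 2, 2, random.Random(0))

# Alignment check: the cropped mask still matches the cropped image.
assert mask_crop == [[1 if v >= 10 else 0 for v in row] for row in img_crop]
```

Cropping the two inputs independently would shift the mask relative to the object, which is exactly the misalignment the shared box avoids.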
There's a separate folder called ControlNet_version/. It does the same job (object removal) but uses ControlNet instead of plain LoRA for better background consistency. It's completely self-contained — has its own pipeline, its own model files, its own training script, even its own requirements.txt.
If you want to try it, just go into that folder and follow its own instructions. You don't need to understand it to use the main project.
How this method compares against other removal approaches on a real test set:
The method and the pre-trained weights come from this paper:
OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data

Runpu Wei, Zijin Yin, Shuo Zhang, Lanxiang Zhou, Xueyi Wang, Chao Ban, Tianwei Cao, Hao Sun, Zhongjiang He, Kongming Liang, Zhanyu Ma

arXiv:2501.07397 (2025)
@article{wei2025omnieraserremoveobjectseffects,
title = {OmniEraser: Remove Objects and Their Effects in Images
with Paired Video-Frame Data},
author = {Runpu Wei and Zijin Yin and Shuo Zhang and Lanxiang Zhou
and Xueyi Wang and Chao Ban and Tianwei Cao and Hao Sun
and Zhongjiang He and Kongming Liang and Zhanyu Ma},
journal = {arXiv preprint arXiv:2501.07397},
year = {2025},
url = {https://arxiv.org/abs/2501.07397},
}

This is a research/learning project. If you plan to use it commercially, check the licenses for FLUX.1-dev and the OmniEraser weights first, since they have their own terms.









