GIFZA

Semantic search for your memes, stickers, GIFs, and reaction images.

Why?

Have you ever struggled to find that perfect sticker or GIF for a group chat? Or that selfie you took with your besties on a trip a few years ago? Well, I have. That is why I built GIFZA.

GIFZA (GIF - ZAH) is a local-first semantic search engine for visual assets. Upload an image, sticker, or GIF, annotate it naturally, and retrieve it later using meaning instead of exact filenames or manual tags.

Searches like:

"sad cat staring into space"
"strict asian obama"
"low quality shocked reaction image"
"anime girl crying aggressively"

...will retrieve semantically similar assets, even if those exact words were never used in the filename or tags.

Features

Local-first on-device inference
Semantic image retrieval using vector embeddings
Approximate nearest neighbour (ANN) search
Natural language querying
Shared embedding space between images and text
Fast retrieval with ObjectBox vector search
Offline-capable
Native mobile inference with ExecuTorch

Here's how it works

GIFZA uses Apple's MobileCLIP-S1 model, split into separate image and text encoders. Both encoders project their inputs into a shared embedding space, enabling natural language queries to retrieve visually and semantically related assets.

Search Flow:

User enters a query (e.g., "confused cat at 3am")
Text encoder converts query into an embedding vector
ObjectBox runs ANN similarity search against stored image embeddings
Nearest matching assets are returned instantly
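To make the search flow concrete, here is a minimal sketch of the ranking step in plain Python. It is not the app's code: the real app relies on ObjectBox's ANN index, whereas this sketch does an exact brute-force scan, and the 3-dimensional vectors, file names, and the `search` helper are hypothetical stand-ins for real MobileCLIP embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, stored, k=3):
    """Return the k (asset, score) pairs most similar to the query.

    This is an exact scan over every stored vector; an ANN index trades
    a little accuracy for speed on large collections.
    """
    scored = [(asset, cosine(query_vec, vec)) for asset, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy image embeddings (hypothetical values, 3 dimensions for readability).
stored = {
    "cat.gif": [0.9, 0.1, 0.0],
    "dog.png": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Pretend this came out of the text encoder for a cat-related query.
query = [0.8, 0.2, 0.0]
results = search(query, stored, k=2)
print([asset for asset, _ in results])
```

Because both encoders target the same space, the query vector can be compared directly against image vectors without any translation step.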

Architecture

                ┌──────────────────┐
                │  Image / GIF     │
                └────────┬─────────┘
                         │
                         ▼
                ┌──────────────────┐
                │ MobileCLIP Image │
                │     Encoder      │
                └────────┬─────────┘
                         │
                  Image Embedding
                         │
                         ▼
                ┌──────────────────┐
                │    ObjectBox     │
                │  Vector Storage  │
                └────────┬─────────┘
                         ▲
                  ANN Search
                         │
                Query Embedding
                         │
                ┌────────┴─────────┐
                │ MobileCLIP Text  │
                │     Encoder      │
                └──────────────────┘

Technical Details

Embeddings & Cross-Modal Retrieval

Both image and text modalities exist in the same latent vector space. This enables zero-shot cross-modal retrieval without requiring paired training data at inference time.

Vector Storage & Retrieval

Embeddings are stored in ObjectBox as key-value pairs alongside asset metadata. On search, the query text is embedded, and ObjectBox performs native ANN similarity search to return the closest image vectors.

A typical pair of stored entries looks something like this:

Embedding of an image of a cat => image location on the user's file system (we do not duplicate their assets)
Embedding of 'My cat' (the user's annotation) => same image location

This means that even if the query does not match the image embedding, it can still match the annotation's embedding.

Here's a pretty good visualization I made with Manim
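The two-entries-per-asset idea can be sketched as follows. This is an illustration under assumptions, not the ObjectBox schema used by the app: the `Entry` class, the `best_per_asset` helper, the paths, and the 2-dimensional unit vectors are all hypothetical. The point is that an image embedding and an annotation embedding both map to the same file path, and results are merged per path.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    vector: list  # embedding of either the image or its annotation text
    path: str     # location of the original asset (never copied)
    source: str   # "image" or "annotation"


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def best_per_asset(query_vec, entries):
    """Score every entry, keep the best score per path, and rank the paths."""
    best = {}
    for entry in entries:
        score = dot(query_vec, entry.vector)
        if score > best.get(entry.path, float("-inf")):
            best[entry.path] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


entries = [
    Entry([1.0, 0.0], "/photos/cat.jpg", "image"),
    Entry([0.0, 1.0], "/photos/cat.jpg", "annotation"),  # 'My cat'
    Entry([0.6, 0.8], "/photos/dog.jpg", "image"),
]

# A query that is far from the cat's image vector but close to its annotation.
ranked = best_per_asset([0.0, 1.0], entries)
print(ranked)
```

Merging by path avoids returning the same asset twice when both its image and annotation entries score well.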

On-Device Inference

All inference runs locally. Both MobileCLIP encoders were converted from PyTorch to ExecuTorch (.pte) format for mobile execution. This enables:

Fully offline retrieval
Low-latency inference
Private, local asset indexing
Zero external API dependencies

Converting to ExecuTorch also introduced a problem: tokenization. Before generating text embeddings, we first need to tokenize and pad the query (or annotation), but we could not trace the tokenizer and pack it into the converted .pte model. The solution was to download the model's tokenizer.json and use the Dart sentencepiece package for BPE tokenization.
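The padding half of that step is easy to show in isolation. The sketch below is in Python rather than Dart and is not the app's implementation. It assumes a CLIP-style text encoder with a fixed context length of 77, start and end token ids of 49406 and 49407, and zero padding, which is the OpenCLIP convention as far as I know; check these values against the downloaded tokenizer.json before relying on them. The `pad_token_ids` helper is hypothetical, and the BPE step that produces the input ids is left out.

```python
# Assumed constants (CLIP/OpenCLIP convention); verify against tokenizer.json.
CONTEXT_LENGTH = 77
SOT_ID = 49406  # start-of-text token
EOT_ID = 49407  # end-of-text token
PAD_ID = 0


def pad_token_ids(bpe_ids, context_length=CONTEXT_LENGTH):
    """Wrap BPE ids in start/end tokens, then truncate or pad to a fixed length.

    The exported text encoder expects a fixed-size input, so every query
    must come out at exactly `context_length` ids.
    """
    max_body = context_length - 2  # leave room for the start and end tokens
    ids = [SOT_ID] + list(bpe_ids[:max_body]) + [EOT_ID]
    return ids + [PAD_ID] * (context_length - len(ids))


# Hypothetical BPE output for a short query.
padded = pad_token_ids([320, 2368, 1929])
print(len(padded), padded[:6])
```

Truncating before appending the end token keeps that token present even for very long annotations.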

Stack

Flutter
ObjectBox
ExecuTorch
Apple MobileCLIP-S1 (OpenCLIP)

Want to try it out?

First, run the install script at the project root

chmod +x install.sh
./install.sh

and then run the conversion script

python3 download_and_convert.py

Run the app!

flutter run --release

PS: Release mode gives slightly better inference performance.

Challenges

Some of the more interesting engineering hurdles included:

Splitting MobileCLIP into separate, deployable encoders
Achieving reasonable inference latency for the text encoder on mobile (still a challenge!)
Converting PyTorch models to ExecuTorch without losing accuracy
Handling unsupported tokenizer tracing and implementing manual BPE
Maintaining embedding consistency across image and text modalities
Integrating ANN search smoothly with ObjectBox

Feedback and Contributions

Anything you want me to know? Improvements I should make? You're welcome to file a PR or open an issue!
