GUing is a GUI search engine based on a large vision-language model called GUIClip, which we trained specifically for the app GUI domain. To this end, we first collected app introduction images from Google Play, which usually display the most representative screenshots, selected and often captioned (i.e., labeled) by app vendors. We then developed an automated pipeline to classify, crop, and extract the captions from these images. The result is a large dataset, which we share with this paper, comprising 303k app screenshots, 135k of which have captions. We used this dataset to train a novel vision-language model which is, to the best of our knowledge, the first of its kind for GUI retrieval.
If you find our work useful, please cite our paper:

```bibtex
@misc{wei2024guing,
    title={GUing: A Mobile GUI Search Engine using a Vision-Language Model},
    author={Jialiang Wei and Anne-Lise Courbis and Thomas Lambolais and Binbin Xu and Pierre Louis Bernard and Gérard Dray and Walid Maalej},
    year={2024},
    eprint={2405.00145},
    archivePrefix={arXiv},
    primaryClass={cs.SE}
}
```
- Install [Poetry](https://python-poetry.org)
- Install the dependencies:
  ```bash
  poetry install
  ```
- Set the environment variables:
  ```bash
  export GUI_SEARCH_ROOT_PATH=<your project path>
  export MONGODB_URI=<your MongoDB URI>
  ```
You can find our GUIClip model, pretrained on the SCapRepo, Screen2Words, and Clarity datasets, at: https://huggingface.co/Jl-wei/guiclip-vit-base-patch32
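If the checkpoint follows the standard Hugging Face CLIP format (an assumption based on GUIClip being a CLIP ViT-B/32 model), it can be loaded with the `transformers` library. A minimal sketch, with the screenshot path and query text as placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("Jl-wei/guiclip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("Jl-wei/guiclip-vit-base-patch32")

image = Image.open("screenshot.png")  # placeholder path to an app screenshot
inputs = processor(text=["login screen with email field"],
                   images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image)  # higher score = better text-image match
```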
The code for GUIClip training and testing is in the `retrieval` folder. To train GUIClip, use the `train_clip.py` script. To evaluate the model, use the `test.py` script.
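For illustration, here is a minimal retrieval loop (a sketch, not the repo's `test.py`): embed a set of screenshots once, then rank them against a text query by cosine similarity. The gallery paths and query text are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("Jl-wei/guiclip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("Jl-wei/guiclip-vit-base-patch32")

paths = ["gallery/a.png", "gallery/b.png", "gallery/c.png"]  # placeholder screenshots
with torch.no_grad():
    image_inputs = processor(images=[Image.open(p) for p in paths], return_tensors="pt")
    img_emb = model.get_image_features(**image_inputs)
    text_inputs = processor(text=["shopping cart checkout"], return_tensors="pt", padding=True)
    txt_emb = model.get_text_features(**text_inputs)

# Normalize, then rank screenshots by cosine similarity to the query
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (txt_emb @ img_emb.T).squeeze(0)
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```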
The code for GUI classification with GUIClip is in the `more_applications/classification` folder.
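As an illustration of this use case, GUIClip can classify a screen zero-shot by scoring it against a set of label prompts. The labels below are made up for the example, not the ones used in the paper:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("Jl-wei/guiclip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("Jl-wei/guiclip-vit-base-patch32")

labels = ["login screen", "settings screen", "map screen"]  # example labels
inputs = processor(text=labels, images=Image.open("screenshot.png"),
                   return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
print(dict(zip(labels, probs.tolist())))  # label -> probability
```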
The code for sketch-based GUI retrieval is in the `more_applications/sketch` folder.
The labels for the SCapRepo (Google Play Screenshot Caption) dataset are available at `dataset/google_play/captioning`. Due to their substantial size, we cannot provide the images directly. However, you can download them from Google Play using the IDs provided in the `data.jsonl` file and then crop them according to the specified bounding boxes.
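The reconstruction could look like the following sketch. The field names (`app_id`, `bbox`) and the bounding-box format are assumptions; check `data.jsonl` for the actual schema before use:

```python
import json
from pathlib import Path
from PIL import Image

Path("crops").mkdir(exist_ok=True)
with open("dataset/google_play/captioning/data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # Assumes the introduction image for record["app_id"] was already
        # downloaded from Google Play to downloads/<app_id>.png
        image = Image.open(f"downloads/{record['app_id']}.png")
        left, top, right, bottom = record["bbox"]  # assumed (left, top, right, bottom)
        image.crop((left, top, right, bottom)).save(f"crops/{record['app_id']}.png")
```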
The datasets used for training the image classifier (`dataset/google_play/classification`) and the screen cropper (`dataset/google_play/detection`) will be released upon the acceptance of our paper.
You can access the other datasets by downloading them from their respective websites.
The code for classification is in the `classification` folder. To train the classifier, use the `train_image_classification.py` script. To run the classifier on the whole dataset, use the `run_image_classification.py` script.
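The script names suggest the classifier is a standard `transformers` image-classification model; assuming that, single-image inference might look like this sketch (the checkpoint path and labels are placeholders):

```python
from transformers import pipeline

# Placeholder checkpoint path; the trained weights are not released yet
classifier = pipeline("image-classification", model="path/to/classifier-checkpoint")
predictions = classifier("app_intro_image.png")
print(predictions)  # e.g. [{"label": "screenshot", "score": 0.97}, ...]
```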
The code for cropping screens is in the `detection` folder. To train the detector, use the `train_object_detection.py` script. To run the detector on the whole dataset, use the `run_object_detection.py` script.
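Assuming the detector is likewise a `transformers` object-detection model (an assumption based on the script names), cropping the detected screens out of an introduction image could look like this sketch:

```python
from PIL import Image
from transformers import pipeline

detector = pipeline("object-detection", model="path/to/detector-checkpoint")  # placeholder
image = Image.open("app_intro_image.png")
for i, detection in enumerate(detector(image)):
    box = detection["box"]  # dict with xmin, ymin, xmax, ymax
    crop = image.crop((box["xmin"], box["ymin"], box["xmax"], box["ymax"]))
    crop.save(f"screen_{i}.png")
```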
The code for caption extraction is in the `ocr` folder. To run caption extraction on the whole dataset, use the `run_ocr.py` script.
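As a generic illustration of this step, with `pytesseract` as a stand-in (the OCR engine actually used by `run_ocr.py` may differ):

```python
from PIL import Image
import pytesseract  # stand-in OCR engine; requires a local Tesseract install

caption = pytesseract.image_to_string(Image.open("app_intro_image.png"))
print(caption.strip())
```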
The code for GUing, our GUI search engine, is in the `search_engine` folder. You need to add the data to MongoDB and create a Faiss index before using it.
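A minimal sketch of the indexing step, assuming the GUIClip image embeddings for all screenshots are available as a NumPy array (the file names are placeholders):

```python
import faiss
import numpy as np

embeddings = np.load("gui_embeddings.npy").astype("float32")  # shape (N, 512)
faiss.normalize_L2(embeddings)  # so inner product == cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "gui.index")

# At query time: embed the text query with GUIClip, L2-normalize it, call
# scores, ids = index.search(query_vector, k), and look up the screenshot
# metadata for each id in MongoDB.
```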