The project is deployed here.
Here are some example text queries and images that you can upload and search with.
This project implements two features:
- Search images from text.
- Search images from an image (similar image search).
Embeddings for the images in the image repository are generated with OpenAI's pre-trained CLIP model and stored in an Annoy index. More details here.
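CLIP embeds text and images into the same vector space. Both features therefore reduce to a nearest-neighbour lookup: embed the query (a text prompt or an uploaded image), then return the repository images whose embeddings are closest. The sketch below shows that lookup as a brute-force cosine-similarity search in pure Python. It is a stand-in for the Annoy index, which answers the same question approximately and much faster, and it assumes a cosine/angular metric. The names `normalize`, `nearest_images` and the toy 3-dimensional vectors are hypothetical and replace real CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def nearest_images(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k image ids most similar to the query embedding.

    Brute-force stand-in for an Annoy lookup: score every stored embedding
    by cosine similarity and keep the top k (highest similarity first).
    """
    q = normalize(query)
    scored = [
        (image_id, sum(a * b for a, b in zip(q, normalize(emb))))
        for image_id, emb in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real CLIP embeddings have hundreds of dimensions.
image_index = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}

# A text query ("a photo of a dog") or an uploaded image would first be
# embedded by CLIP; here we pretend its embedding is this vector.
query_embedding = [1.0, 0.0, 0.0]
print(nearest_images(query_embedding, image_index, k=2))  # dog.jpg ranks first, cat.jpg second
```

Text search and similar-image search differ only in which CLIP encoder produces the query vector. The lookup itself is identical.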
The project uses pyenv to manage the Python version and pytest for testing. Detailed instructions and troubleshooting tips are here.
The major components and data flow of the project are as follows:
- The search API is designed so that it can be extracted into a REST API, making search available to other applications.
- Because the model learns from both natural language and images, it can be fine-tuned on live data and images.
- Embedding generation can be sped up by running it on a GPU.
- The same API can scale to serve more than 100k images by using a vector search engine such as Milvus.
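One way to keep the search API easy to extract is to write the endpoint logic as a plain function. It takes a JSON request body, receives the search backend as a callable, and returns a JSON-serialisable dict. A Flask view today, or a FastAPI route later, then only has to pass the request body in and send the dict back. This is a minimal sketch under assumptions. The request shape (`{"query": ..., "page": ...}`) and the names `handle_search`, `fake_search` and `PAGE_SIZE` are hypothetical, not the project's code. The page size reuses the 16-images-per-page grid from the front-end plan.

```python
import json
from typing import Callable, Dict, List

PAGE_SIZE = 16  # hypothetical constant matching the 16-images-per-page grid

# A search backend takes a query string and returns image ids, best match first.
SearchBackend = Callable[[str], List[str]]


def handle_search(body: str, search: SearchBackend) -> Dict[str, object]:
    """Parse a JSON request body, run the search, and return one page of results.

    Framework-agnostic: a Flask view or FastAPI route would only need to pass
    the raw request body in and serialise the returned dict as the response.
    """
    request = json.loads(body)
    query = str(request.get("query", "")).strip()
    if not query:
        return {"error": "query must be a non-empty string", "results": []}
    page = max(int(request.get("page", 1)), 1)  # pages are 1-based

    matches = search(query)
    start = (page - 1) * PAGE_SIZE
    total_pages = (len(matches) + PAGE_SIZE - 1) // PAGE_SIZE  # ceiling division
    return {
        "query": query,
        "page": page,
        "total_pages": total_pages,
        "results": matches[start:start + PAGE_SIZE],
    }


def fake_search(query: str) -> List[str]:
    """Hypothetical stub backend that returns 40 ranked image ids for any query."""
    return [f"img_{i:03d}.jpg" for i in range(40)]


response = handle_search('{"query": "a red car", "page": 3}', fake_search)
print(json.dumps(response))  # page 3 of 3 holds img_032.jpg .. img_039.jpg
```

Injecting the backend also makes the handler easy to cover with pytest. A test can pass a stub like `fake_search` instead of loading the CLIP model and the Annoy index.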
- Make sure the plan is a doable MVP (in an .ipynb notebook)
- Annoy tree store and retrieve - done
- Image feature extraction - done
- Is image search any good? - Yes, as long as a good number of images are indexed
- Back end - (using Flask)
  - API to search - Building a React app would be nice but time-consuming, so Flask views are used instead
  - API to upload an image for search - done
  - API to add images to the repo (maybe) - Users can upload images to be indexed, but indexing is batch based, so the user has to trigger it. In the future, an Airflow job could run every hour to keep the index up to date.
- Front end - (using Bootstrap)
  - Image grid limited to 16 images per page
  - Search text box
  - Search image box
  - Make the user experience delightful
- Deployment
  - Local server setup instructions
  - Dockerfile so that the application is easy to get running
  - Heroku deployment - deployed to DigitalOcean instead of Heroku
- Automate index creation when new images are added to the repo.
- Use FastAPI to expose the search API so that other projects can use it.
- A React app would be nice.
- Host all image assets on GCS or DigitalOcean storage; this will also reduce the image size.
- It should be possible to see the work without any installation (deployed version).
- It should be possible to run the project in 10 minutes (with Docker, or in fewer than 5 simple steps).
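Annoy indexes cannot be modified once they are built. That is why indexing here is batch based: picking up newly uploaded images means rebuilding the index. A scheduled job, such as the hourly Airflow task mentioned in the back-end plan, could first check whether a rebuild is needed at all. This is a minimal sketch under stated assumptions. It assumes images are stored as `.jpg`/`.jpeg`/`.png` files in a single repository folder. It also assumes the ids already in the index are known, for example from a metadata file saved next to it. `find_unindexed_images` is a hypothetical helper, not part of the project.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def find_unindexed_images(repo_dir: Path, indexed_ids: Iterable[str]) -> List[str]:
    """Return sorted file names in repo_dir that are images but not yet indexed.

    An empty result means the existing index is up to date, so the
    expensive embedding + rebuild step can be skipped.
    """
    known = set(indexed_ids)
    return sorted(
        path.name
        for path in repo_dir.iterdir()
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and path.name not in known
    )


# Demo with a throwaway directory standing in for the image repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    for name in ["beach.jpg", "city.PNG", "notes.txt", "forest.jpeg"]:
        (repo / name).write_bytes(b"")  # empty placeholder files
    new_images = find_unindexed_images(repo, indexed_ids=["beach.jpg"])
    print(new_images)  # ['city.PNG', 'forest.jpeg']
    if new_images:
        print(f"{len(new_images)} new image(s): rebuild the index")
```

Checking first keeps an hourly schedule cheap. When nothing new has been uploaded, the job exits without loading CLIP or rebuilding the Annoy trees.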