Skip to content
🔥 Use pre-trained models in PyTorch to extract vector embeddings for any image
Branch: master
Clone or download
Latest commit 4a48301 Nov 29, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
example added clustering example and changed readme Nov 3, 2017
.gitignore init Sep 21, 2017
README.md added clustering example and changed readme Nov 3, 2017
img_to_vec.py patch alexnet Nov 29, 2018

README.md

Image 2 Vec with PyTorch

Applications of image embeddings:

  • Ranking for recommender systems
  • Clustering images to different categories
  • Classification tasks

Available models

  • Resnet-18 (CPU, GPU)
    • Returns vector length 512
  • Alexnet (CPU, GPU)
    • Returns vector length 4096

Installation

Tested on Python 3.6

Dependencies

Pytorch: http://pytorch.org/

Pillow: pip install Pillow

For running the example, you will additionally need:
  • Sklearn pip install scikit-learn

Running the example

git clone https://github.com/christiansafka/img2vec.git

cd img2vec/example

python test_img_to_vec.py

Expected output

Which filename would you like similarities for?
cat.jpg
0.72832 cat2.jpg
0.641478 catdog.jpg
0.575845 face.jpg
0.516689 face2.jpg

Which filename would you like similarities for?
face2.jpg
0.668525 face.jpg
0.516689 cat.jpg
0.50084 cat2.jpg
0.484863 catdog.jpg

Try adding your own photos!

Using img2vec as a library

from img_to_vec import Img2Vec
from PIL import Image

# Initialize Img2Vec with GPU
img2vec = Img2Vec(cuda=True)

# Read in an image
img = Image.open('test.jpg')
# Get a vector from img2vec
vec = img2vec.get_vec(img)

Img2Vec Params

cuda = (True, False)   # Run on GPU?     default: False
model = ('resnet-18', 'alexnet')   # Which model to use?     default: 'resnet-18'

Advanced users


Additional Parameters

layer = 'layer_name' or int   # For advanced users, which layer of the model to extract the output from.   default: 'avgpool'
layer_output_size = int   # Size of the output of your selected layer

Resnet-18

Defaults: (layer = 'avgpool', layer_output_size = 512)
Layer parameter must be an string representing the name of a layer below

conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
bn1 = nn.BatchNorm2d(64)
relu = nn.ReLU(inplace=True)
maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
layer1 = self._make_layer(block, 64, layers[0])
layer2 = self._make_layer(block, 128, layers[1], stride=2)
layer3 = self._make_layer(block, 256, layers[2], stride=2)
layer4 = self._make_layer(block, 512, layers[3], stride=2)
avgpool = nn.AvgPool2d(7)
fc = nn.Linear(512 * block.expansion, num_classes)

Alexnet

Defaults: (layer = 2, layer_output_size = 4096)
Layer parameter must be an integer representing one of the layers below

alexnet.classifier = nn.Sequential(
            7. nn.Dropout(),                  < - output_size = 9216
            6. nn.Linear(256 * 6 * 6, 4096),  < - output_size = 4096
            5. nn.ReLU(inplace=True),         < - output_size = 4096
            4. nn.Dropout(),		      < - output_size = 4096
            3. nn.Linear(4096, 4096),	      < - output_size = 4096
            2. nn.ReLU(inplace=True),         < - output_size = 4096
            1. nn.Linear(4096, num_classes),  < - output_size = 4096
        )

To-do

  • Benchmark speed and accuracy
  • Add ability to fine-tune on input data
  • Export documentation to a normal place
  • Package for Pip
You can’t perform that action at this time.