Text encoder #6

renderless · 2022-05-27T10:22:30Z

Thank you for your awesome work. Do you plan to release the pretrained text encoder?

yinglinzheng · 2022-06-21T07:50:59Z

Hi, thanks for your interest.
The pretrained backbones we released already contain the weights of the text encoder. You can load the FaRL weights into exactly the same network structure as CLIP ViT-B/16 and then use the model exactly like CLIP. Here is an example adapted from the CLIP usage example.

import torch
import clip
from PIL import Image

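device = "cuda" if torch.cuda.is_available() else "cpu"
# Build the CLIP ViT-B/16 architecture; the FaRL weights are loaded into it below.
model, preprocess = clip.load("ViT-B/16", device="cpu")
model = model.to(device)
# Download the checkpoint from https://github.com/FacePerceiver/FaRL#pre-trained-backbones
farl_state = torch.load("FaRL-Base-Patch16-LAIONFace20M-ep16.pth", map_location="cpu")
# strict=False tolerates key mismatches between the checkpoint and the CLIP model.
model.load_state_dict(farl_state["state_dict"], strict=False)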

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

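with torch.no_grad():
    # The image and text encoders can each be called on their own.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # The full forward pass returns image-to-text and text-to-image logits.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()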

print("Label probs:", probs)
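
Because the example loads the checkpoint with strict=False, any mismatch between the checkpoint keys and the model keys is skipped without raising an error. A minimal sketch of a sanity check is shown here, assuming you only want to compare key names. The helper name compare_state_dict_keys and the toy key names are hypothetical and are not part of FaRL or CLIP.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output.

    missing:    keys the model expects but the checkpoint does not provide.
    unexpected: keys the checkpoint provides but the model does not define.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key names for illustration only; not the real FaRL or CLIP parameter names.
toy_model_keys = ["visual.proj", "token_embedding.weight", "logit_scale"]
toy_checkpoint_keys = ["visual.proj", "token_embedding.weight", "extra_head.weight"]

missing, unexpected = compare_state_dict_keys(toy_model_keys, toy_checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With the real objects you would pass model.state_dict().keys() and farl_state["state_dict"].keys(). PyTorch's load_state_dict also returns an object with missing_keys and unexpected_keys fields, so printing its return value gives the same information. If text encoder keys show up as missing, the text encoder is still running on the original CLIP weights.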
