I tried the following piece of code from the repo, at https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/src/multilingual_clip.py
The only change I made was adding print statements in between.
```python
import pickle
import torch
import transformers

AVAILABLE_MODELS = {
    'M-BERT-Distil-40': {
        'model_name': 'M-CLIP/M-BERT-Distil-40',
        'tokenizer_name': 'M-CLIP/M-BERT-Distil-40',
        'head_name': 'M-BERT Distil 40 Linear Weights.pkl'
    },
    'M-BERT-Base-69': {
        'model_name': 'M-CLIP/M-BERT-Base-69',
        'tokenizer_name': 'M-CLIP/M-BERT-Base-69',
        'head_name': 'M-BERT-Base-69 Linear Weights.pkl'
    },
    'Swe-CLIP-500k': {
        'model_name': 'M-CLIP/Swedish-500k',
        'tokenizer_name': 'M-CLIP/Swedish-500k',
        'head_name': 'Swedish-500k Linear Weights.pkl'
    },
    'Swe-CLIP-2M': {
        'model_name': 'M-CLIP/Swedish-2M',
        'tokenizer_name': 'M-CLIP/Swedish-2M',
        'head_name': 'Swedish-2M Linear Weights.pkl'
    },
    'M-BERT-Base-ViT-B': {
        'model_name': 'M-CLIP/M-BERT-Base-ViT-B',
        'tokenizer_name': 'M-CLIP/M-BERT-Base-ViT-B',
        'head_name': 'M-BERT-Base-69-ViT Linear Weights.pkl'
    },
}

device = 'cpu'  # defined elsewhere in my script


class MultilingualClip2(torch.nn.Module):
    def __init__(self, model_name, tokenizer_name, head_name, weights_dir='data/weights/'):
        super().__init__()
        self.model_name = model_name
        self.tokenizer_name = tokenizer_name
        self.head_path = weights_dir + head_name

        self.tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)
        self.transformer = transformers.AutoModel.from_pretrained(model_name)
        self.clip_head = torch.nn.Linear(in_features=768, out_features=640)
        self._load_head()

    def forward(self, txt):
        txt_tok = self.tokenizer(txt, padding=True, return_tensors='pt').to(device)
        embs = self.transformer(**txt_tok)[0]
        print('embs_text')
        print(embs.size())
        att = txt_tok['attention_mask']
        print('att_text')
        print(att.size())
        # attention-masked mean pooling over the token dimension
        embs = (embs * att.unsqueeze(2)).sum(dim=1) / att.sum(dim=1)[:, None]
        print('embs_text')
        print(embs.size())
        p = self.clip_head(embs)
        print('clip head obj')
        print(self.clip_head)
        print('cliphed_text')
        print(p.size())
        return p

    def _load_head(self):
        with open(self.head_path, 'rb') as f:
            lin_weights = pickle.loads(f.read())
        self.clip_head.weight = torch.nn.Parameter(torch.tensor(lin_weights[0]).float().t())
        self.clip_head.bias = torch.nn.Parameter(torch.tensor(lin_weights[1]).float())
        print('ok')
        print(self.clip_head.weight.size())
        print(self.clip_head.bias.size())


def load_model2(name):
    config = AVAILABLE_MODELS[name]
    return MultilingualClip2(**config)


mod = load_model2('M-BERT-Base-ViT-B')
z = mod(Query[0])  # Query is a list of input strings, defined elsewhere in my script
```
Output for this code:

```
ok
torch.Size([512, 768])
torch.Size([512])
embs_text
torch.Size([1, 6, 768])
att_text
torch.Size([1, 6])
embs_text
torch.Size([1, 768])
clip head obj
Linear(in_features=768, out_features=640, bias=True)
cliphed_text
torch.Size([1, 512])
```
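Side note on the `Linear(in_features=768, out_features=640, bias=True)` line above: `nn.Linear` caches `out_features` at construction time, and `_load_head` only swaps the weight/bias `Parameter`s afterwards, so the repr keeps saying 640 even though the layer now maps to 512 dimensions. A minimal standalone demonstration (not from the repo):

```python
# nn.Linear stores out_features in __init__; assigning new Parameters
# afterwards does not update it, so the repr goes stale.
import torch

head = torch.nn.Linear(in_features=768, out_features=640)
head.weight = torch.nn.Parameter(torch.zeros(512, 768))
head.bias = torch.nn.Parameter(torch.zeros(512))

print(head)                              # Linear(in_features=768, out_features=640, bias=True)
print(head(torch.zeros(1, 768)).size())  # torch.Size([1, 512])
```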
This output suggests that the file 'M-BERT-Base-69-ViT Linear Weights.pkl' does not hold a 640 × 768 weight matrix but a 512 × 768 one.
Is there an issue with the config then?
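To double-check, the head weights can be inspected straight from the pickle, without building the model at all (a quick sketch; I'm assuming the pickled objects are NumPy arrays):

```python
# Sanity check of the raw head weights (sketch, assuming NumPy arrays).
# _load_head() transposes lin_weights[0], so a (768, 512) array here becomes
# the (512, 768) weight matrix printed above.
import pickle

with open('data/weights/M-BERT-Base-69-ViT Linear Weights.pkl', 'rb') as f:
    lin_weights = pickle.loads(f.read())

print(lin_weights[0].shape)  # expected (768, 512), i.e. a 512-dim output head
print(lin_weights[1].shape)  # expected (512,)
```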
@FreddeFrallan please have a look
Hello, I looked into this issue too.
I think it is related to the CLIP embedding size, which is 512 for the ViT backbone and 640 for the ResNet one. Since M-BERT-Base-69-ViT uses CLIP's ViT, the 512 seems right.
However, I think out_features should be included in the configuration to prevent this kind of misunderstanding.
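One way to make the mismatch impossible would be to build the `Linear` from the checkpoint's actual shape (a sketch, not the repo's API; it just reuses the pickle layout from `_load_head` above):

```python
# Sketch: derive out_features from the checkpoint instead of hardcoding
# 640 (or 512) in the class.
import pickle
import torch

def load_clip_head(head_path):
    with open(head_path, 'rb') as f:
        lin_weights = pickle.loads(f.read())
    weight = torch.tensor(lin_weights[0]).float().t()  # (out_features, in_features)
    bias = torch.tensor(lin_weights[1]).float()
    out_features, in_features = weight.shape
    head = torch.nn.Linear(in_features=in_features, out_features=out_features)
    head.weight = torch.nn.Parameter(weight)
    head.bias = torch.nn.Parameter(bias)
    return head
```

Alternatively, adding an explicit `out_features` entry to each config in `AVAILABLE_MODELS` (512 for the ViT-based models, 640 for the ResNet-based ones) would keep the configuration self-documenting.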