Hi,
First of all, thank you for your outstanding work on Evo 2. It has been incredibly valuable for my research.
I'm using the Evo 2 7B model to obtain sequence-level representations, but I'm not sure the output is in the expected form, and I would appreciate clarification on the expected behavior of the embedding vectors. It would be great if you could review my procedure and code below.
Here is the process I'm currently following to extract embeddings:
- Prepare a sequence and tokenize it into byte-level token IDs (see the sketch after this list).
- Perform a forward pass and extract the per-token embeddings from layer "blocks.28.mlp.l3".
- Take the mean over the token dimension to get a single vector representing the sequence.
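For concreteness, here is what I expect step 1 to produce (a minimal sketch; the exact IDs are my assumption that the byte-level tokenizer maps each character to its byte value):

```python
from evo2 import Evo2

model = Evo2("evo2_7b")
token_ids = model.tokenizer.tokenize("ACGT")
# My assumption: byte-level IDs, i.e. the ASCII values of the characters.
print(token_ids)  # e.g. [65, 67, 71, 84]
```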
Here is a snippet of my code for embedding inference (adapted from the provided tutorial):
```python
import numpy as np
import torch

from evo2 import Evo2

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Evo2("evo2_7b")

layer_name = "blocks.28.mlp.l3"
list_of_embs = []

model.model.eval()
with torch.no_grad():
    for seq in sequences:  # the list of sequences is defined in the original code
        inputs = torch.tensor(model.tokenizer.tokenize(seq), dtype=torch.int).unsqueeze(0).to(device)
        _, token_embs = model(inputs, return_embeddings=True, layer_names=[layer_name])
        # Mean-pool over the token dimension to get one vector per sequence.
        seq_repr = torch.mean(token_embs[layer_name], dim=1).squeeze().detach().cpu().tolist()
        list_of_embs.append(seq_repr)

emb_array = np.array(list_of_embs)
```
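As a sanity check, inside the loop one could also inspect the raw per-token embeddings before pooling (a minimal sketch, under my assumption about the returned tensor layout):

```python
# Sketch: inspect one sequence's token embeddings before mean-pooling.
# I assume token_embs[layer_name] has shape (batch=1, seq_len, hidden_dim).
emb = token_embs[layer_name]
print(emb.shape)  # expected: torch.Size([1, seq_len, hidden_dim])
print(emb.dtype)  # e.g. torch.bfloat16, which would be consistent with the coarse values below
```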
The resulting embeddings look like this:
```
array([[  90.  , -228.  ,   26.375, ..., -241.  , -113.  , -100.5  ],
       [ 181.  , -288.  ,  212.   , ..., -292.  ,  -69.5 ,   63.75 ],
       [ 119.  , -208.  ,   90.   , ..., -201.  ,  -46.25,   14.25 ],
       ...,
       [ 126.5 , -175.  ,   95.   , ..., -221.  ,  -91.5 ,   57.75 ],
       [ 162.  , -308.  ,  195.   , ..., -262.  , -106.  ,  116.5  ],
       [ 180.  , -318.  ,  202.   , ..., -276.  ,  -58.75,   75.   ]])
```
I've observed that:
- The magnitude of values within a single vector varies quite a bit.
- The same dimension across different vectors also shows large variation.
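To quantify this, here is a quick check over emb_array (a minimal sketch):

```python
import numpy as np

# Spread of values within each sequence vector (across dimensions).
within_vector_std = emb_array.std(axis=1)
# Spread of each dimension across different sequence vectors.
across_vector_std = emb_array.std(axis=0)
print(within_vector_std.mean(), across_vector_std.mean())
```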
This variance seems to cause unstable behavior in downstream tasks, so here are my questions:
- Is this expected behavior for the extracted embeddings?
- If so, is there a recommended normalization strategy before using them downstream (e.g., z-score normalization)?
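For reference, by z-score normalization I mean something like the following (a sketch; whether to fit the statistics on training data only is part of my question):

```python
from sklearn.preprocessing import StandardScaler

# Z-score normalization: zero mean and unit variance per embedding dimension.
scaler = StandardScaler()
emb_normalized = scaler.fit_transform(emb_array)
```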
Thanks again for your work and support!