- sentence-transformers
    - all-MiniLM-L6-v2: 384
    - all-mpnet-base-v2: 768
    - `all-mpnet-base-v2` gives the higher quality of the two, while `all-MiniLM-L6-v2` is about 5 times faster and still offers good quality.
- openai
    - text-embedding-3-small: 1536
    - text-embedding-3-large: 3072
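- llama3 (no dedicated embedding model; the embedding is a pooled hidden state)
    - Meta-Llama-3-8B-Instruct: 4096 (hidden size)
    - Reference (discussed for Llama 2): https://stackoverflow.com/questions/76926025/sentence-embeddings-from-llama-2-huggingface-opensource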
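To make the dimension list concrete, the sketch below estimates raw storage per model. It assumes each vector is stored uncompressed as float32 (4 bytes per dimension) with no index overhead, so size grows linearly with dimension. The dictionary only restates the numbers above; `index_size_mb` is a hypothetical helper, not part of any library.

```python
# Output dimensions as listed above. The Llama 3 entry assumes the
# 8B model's hidden size is used directly as the embedding size.
EMBEDDING_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Meta-Llama-3-8B-Instruct": 4096,
}

BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 vectors


def index_size_mb(dim: int, num_vectors: int) -> float:
    """Raw float32 storage for num_vectors embeddings of size dim, in MiB."""
    return dim * BYTES_PER_FLOAT32 * num_vectors / (1024 ** 2)


for name, dim in EMBEDDING_DIMS.items():
    size = index_size_mb(dim, 1_000_000)
    print(f"{name:<26} dim={dim:<5} 1M vectors: about {size:,.0f} MiB")
```

Under these assumptions, one million 4096-dim Llama 3 vectors need 15,625 MiB, roughly 10.7 times the space of `all-MiniLM-L6-v2` vectors.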

### mean pooling of llama3

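```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Llama 3 has no pad token; reuse EOS so padding=True does not raise an error
tokenizer.pad_token = tokenizer.eos_token

texts = ["first sentence", "second sentence"]  # example inputs

# Generate embeddings, one text at a time
embeddings = []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
        outputs = model(**inputs)

        # outputs.last_hidden_state.shape: [batch_size, seq_len, 4096]
        # Use the mean of the last layer's hidden states as the sentence embedding.
        # A single text contains no padding, so a plain mean over seq_len is safe.
        embedding = outputs.last_hidden_state.mean(dim=1)
        # embedding.shape: [batch_size, 4096]

        embeddings.append(embedding[0].numpy())
```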
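The loop above encodes one text per call, so no padding is added and a plain mean over `seq_len` is correct. If the texts were batched with `padding=True`, the same `.mean(dim=1)` would also average the padded positions. A minimal sketch of mask-aware mean pooling follows, written in NumPy with toy arrays standing in for `outputs.last_hidden_state` and `inputs["attention_mask"]`; the function name `masked_mean_pool` is hypothetical. The same operations exist in PyTorch (`unsqueeze(-1)`, `sum(dim=1)`, `clamp(min=...)`).

```python
import numpy as np


def masked_mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors while ignoring padded positions.

    last_hidden_state: [batch_size, seq_len, hidden_size]
    attention_mask:    [batch_size, seq_len], 1 for real tokens, 0 for padding
    returns:           [batch_size, hidden_size]
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # [B, L, 1]
    summed = (last_hidden_state * mask).sum(axis=1)  # [B, H], padding zeroed out
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # [B, 1], guards against /0
    return summed / counts


# Toy batch with hidden_size=2; the second sequence ends with one padding position.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]],  # last position is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

print(masked_mean_pool(hidden, mask))  # padding ignored
print(hidden.mean(axis=1))  # naive mean, padding included
```

For the second sequence the masked mean is `[3, 3]`, while the naive mean is pulled to about `35.3` by the padded position.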

### prompt-based last token

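Wrap each sentence in the prompt `This sentence: {text} means in one word:` and take the final-layer hidden state of the last token as the embedding. The idea is that the prompt makes the model condense the sentence's meaning into the position that predicts the next word.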

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
# The last-token index below assumes right padding, so set it explicitly
tokenizer.padding_side = "right"

texts = ["first sentence", "second sentence"]  # example inputs

# Wrap every text in the prompt template
prompt_template = "This sentence: {text} means in one word:"
prompted_texts = [prompt_template.format(text=text) for text in texts]

with torch.no_grad():
    # Batch process all texts
    inputs = tokenizer(prompted_texts, padding=True, return_tensors="pt", truncation=True)
    outputs = model(**inputs, output_hidden_states=True, return_dict=True)

    # Hidden states of the final layer: [batch_size, seq_len, 4096]
    last_hidden_state = outputs.hidden_states[-1]

    # With right padding, the last real token sits at (number of real tokens - 1)
    last_token_indices = inputs.attention_mask.bool().sum(1) - 1

    # Extract the hidden state of the last real token of each sequence
    batch_embeddings = last_hidden_state[torch.arange(last_hidden_state.shape[0]), last_token_indices]
    embeddings = batch_embeddings.numpy()  # [batch_size, 4096]
```
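The batched snippet relies on right padding: `attention_mask.sum(1) - 1` is the position of the last real token only when padding is appended at the end. If a tokenizer pads on the left (often used for batched generation), that formula lands on an earlier token. A minimal padding-side-agnostic sketch, assuming NumPy toy arrays in place of the model outputs (the name `last_token_pool` is hypothetical), picks the highest position whose mask value is 1:

```python
import numpy as np


def last_token_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the hidden state of the last real (non-padding) token per sequence.

    hidden:         [batch_size, seq_len, hidden_size]
    attention_mask: [batch_size, seq_len], 1 for real tokens, 0 for padding
    Works for right padding ([1, 1, 0]) and left padding ([0, 1, 1]).
    """
    seq_len = attention_mask.shape[1]
    positions = np.arange(seq_len)[None, :]  # [1, L]
    # Replace padded positions with -1, then take the largest remaining index
    last_idx = np.where(attention_mask == 1, positions, -1).max(axis=1)  # [B]
    return hidden[np.arange(hidden.shape[0]), last_idx]  # [B, H]


# Toy hidden states whose values equal their flat position, so indices are easy to read.
hidden = np.arange(6, dtype=float).reshape(2, 3, 1)
right_mask = np.array([[1, 1, 1], [1, 1, 0]])
left_mask = np.array([[1, 1, 1], [0, 1, 1]])

print(last_token_pool(hidden, right_mask).ravel())
print(last_token_pool(hidden, left_mask).ravel())
print(left_mask.sum(axis=1) - 1)  # naive indices, wrong for the left-padded row
```

With left padding, the naive formula gives index 1 for the second row even though its last real token is at index 2; the mask-based version returns the correct position for both padding sides.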