<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/embeddings/nomic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


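# Nomic Embedding

Nomic has released v1.5 🪆🪆🪆, which supports variable-sized embeddings through Matryoshka learning, with an 8192-token context and embedding dimensions between 64 and 768.

In this notebook, we will explore using Nomic v1.5 embeddings at different dimensions.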
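
As a rough illustration of what "variable-sized" means here: Matryoshka-style embeddings are trained so that a prefix of the full vector is itself a usable embedding. The sketch below shortens a vector by keeping its first `dim` components and rescaling to unit length. The helper name `truncate_embedding` and the toy vector are hypothetical, and this is the generic recipe rather than a confirmed description of what the Nomic API does when you pass `dimensionality`.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-normalize the result.

    Hypothetical helper for illustration; not part of llama-index or nomic.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    prefix = vector[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(prefix)
    return [x / norm for x in prefix]


# Toy 8-dimensional vector standing in for a full-size embedding.
full = [0.5, -0.5, 0.25, 0.25, 0.1, -0.1, 0.05, 0.05]
short = truncate_embedding(full, 4)
print(len(short))
```

In the cells that follow, the API handles the sizing for you through the `dimensionality` argument, so a helper like this is only needed if you want to experiment with shortening vectors locally.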


### Installation


In [None]:
%pip install -U llama-index llama-index-embeddings-nomic


### Set Up API Keys


In [None]:
nomic_api_key = "<NOMIC API KEY>"
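
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead is shown here. The variable name `NOMIC_API_KEY` is an assumption for this example, and the placeholder fallback keeps the cell runnable when it is unset.

```python
import os

# Assumed environment variable name; adjust to whatever your setup uses.
nomic_api_key = os.environ.get("NOMIC_API_KEY", "<NOMIC API KEY>")

if nomic_api_key == "<NOMIC API KEY>":
    print("NOMIC_API_KEY is not set; replace the placeholder before calling the API.")
```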

In [None]:
import nest_asyncio

nest_asyncio.apply()

from llama_index.embeddings.nomic import NomicEmbedding

#### With dimension at 128


In [None]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")

In [None]:
print(len(embedding))

128


In [None]:
embedding[:5]

[0.05569458, 0.057922363, -0.30126953, -0.09832764, 0.05947876]

#### With dimension at 256


In [None]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=256,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")

In [None]:
print(len(embedding))

256


In [None]:
embedding[:5]

[0.044708252, 0.04650879, -0.24182129, -0.07897949, 0.04776001]

#### With dimension at 768


In [None]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=768,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")

In [None]:
print(len(embedding))

768


In [None]:
embedding[:5]

[0.027282715, 0.028381348, -0.14758301, -0.048187256, 0.029144287]
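
One thing worth noticing in the outputs above: the first five values at 128, 256, and 768 dimensions look like rescaled copies of each other. The check below uses only the numbers printed in this notebook and computes the element-wise ratios. A near-constant ratio is consistent with the shorter vectors being a re-normalized prefix of the longer one, though that reading is an inference from these five values, not something the notebook states.

```python
# First five components copied from the printed outputs above.
head_128 = [0.05569458, 0.057922363, -0.30126953, -0.09832764, 0.05947876]
head_256 = [0.044708252, 0.04650879, -0.24182129, -0.07897949, 0.04776001]
head_768 = [0.027282715, 0.028381348, -0.14758301, -0.048187256, 0.029144287]


def ratios(a, b):
    """Element-wise a[i] / b[i]."""
    return [x / y for x, y in zip(a, b)]


r_128 = ratios(head_128, head_768)
r_256 = ratios(head_256, head_768)

# If the vectors differ only by a scale factor, each list is nearly constant.
print([round(r, 3) for r in r_128])
print([round(r, 3) for r in r_256])
```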

#### You can still use v1 Nomic Embeddings

It has a fixed embedding dimension of 768.


In [None]:
embed_model = NomicEmbedding(
    api_key=nomic_api_key, model_name="nomic-embed-text-v1"
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")

In [None]:
print(len(embedding))

768


In [None]:
embedding[:5]

[0.0059013367, 0.03744507, 0.0035305023, -0.047180176, 0.0154418945]

### Let's build an end-to-end RAG pipeline with Nomic v1.5 embeddings

We will use OpenAI for the generation step.


#### Set the embedding model and LLM


In [None]:
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

import os

os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI API KEY>"

embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

llm = OpenAI(model="gpt-3.5-turbo")

# Register both models on the global Settings object so the index and
# query engine below pick them up by default.
Settings.llm = llm
Settings.embed_model = embed_model

#### Download Data


In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-16 18:37:03--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: 'data/paul_graham/paul_graham_essay.txt'


2024-02-16 18:37:03 (3.87 MB/s) - 'data/paul_graham/paul_graham_essay.txt' saved [75042/75042]



#### Load Data


In [None]:
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

#### Index Creation


In [None]:
index = VectorStoreIndex.from_documents(documents)

#### Query Engine


In [None]:
query_engine = index.as_query_engine()

In [None]:
response = query_engine.query("what did author do growing up?")
print(response)

The author, growing up, worked on writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming more extensively, writing simple games and a word processor.
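
Conceptually, the retrieval half of this pipeline embeds the question, compares it against the stored chunk embeddings, and passes the closest chunks to the LLM. The toy sketch below illustrates that ranking step with cosine similarity over hand-written 3-dimensional vectors. The vectors, chunk texts, and the `top_k` helper are all made up for illustration, and this is not how `VectorStoreIndex` is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunk texts with invented toy embeddings.
chunks = [
    ("wrote short stories", [0.9, 0.1, 0.0]),
    ("programmed an IBM 1401", [0.7, 0.6, 0.1]),
    ("started a company", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.3, 0.0]  # invented embedding for the question

results = top_k(query, chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With real embeddings the same idea applies at 128 dimensions instead of 3, and a lower `dimensionality` reduces the per-comparison cost at some expense in retrieval quality.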
