## Together AI + RAG

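[Together AI](https://python.langchain.com/docs/integrations/llms/together) serves a broad range of open-source language models through an inference API.

See the list of available models [here](https://docs.together.ai/docs/inference-models). We use `mistralai/Mixtral-8x7B-Instruct-v0.1` to run RAG over the Mixtral paper.

Download the paper: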
https://arxiv.org/pdf/2401.04088.pdf

In [None]:
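# Install the required libraries (langchain-community and langchain-text-splitters
# are imported directly in the cells below)
! pip install --quiet pypdf chromadb tiktoken openai langchain-together langchain-community langchain-text-splitters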

In [None]:
# Load
from langchain_community.document_loaders import PyPDFLoader

# Create a PyPDFLoader pointing at the downloaded PDF
loader = PyPDFLoader("~/Desktop/mixtral.pdf")

# Load the PDF into a list of documents
data = loader.load()

# Split
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a splitter with a chunk size of 2000 characters and no overlap between chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)

# Split the loaded documents into chunks
all_splits = text_splitter.split_documents(data)

# Add to the vector DB
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Alternative: use Together embeddings instead of OpenAI embeddings
# from langchain_together.embeddings import TogetherEmbeddings
# embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")

# Build a Chroma collection named "rag-chroma" from the chunks,
# embedding each chunk with OpenAIEmbeddings
vectorstore = Chroma.from_documents(
    documents=all_splits,
    collection_name="rag-chroma",
    embedding=OpenAIEmbeddings(),
)

# Expose the vector store as a retriever
retriever = vectorstore.as_retriever()
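
To make the retrieval step more concrete, here is a toy sketch in plain Python of what a vector-store retriever does conceptually: embed the query, score every stored chunk against it, and return the best matches. The three-dimensional vectors, the chunk texts, and the helper names `cosine_similarity` and `top_k` are all made up for illustration. Cosine similarity is an assumption as well; Chroma's actual distance metric and index structure may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs; real embeddings have far more dimensions
toy_index = [
    ("Mixtral routes each token to 2 of 8 experts.", [0.9, 0.1, 0.0]),
    ("The model was evaluated on multilingual benchmarks.", [0.1, 0.9, 0.1]),
    ("Each layer uses a sparse mixture-of-experts block.", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]  # hypothetical embedding of the user's question

print(top_k(toy_query, toy_index, k=2))
```

The two architecture-related chunks rank first because their vectors point in nearly the same direction as the query vector.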

In [3]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# RAG prompt
# Template that instructs the model to answer using only the retrieved context
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# LLM
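# Import the Together class, which calls models hosted on Together AI
from langchain_together import Together

llm = Together(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # model to call
    temperature=0.0,  # sampling temperature; 0.0 keeps output as deterministic as possible
    max_tokens=2000,  # maximum number of tokens to generate
    top_k=1,  # consider only the single most likely token at each step
)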

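# RAG chain
# Compose retrieval, prompting, generation, and output parsing
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})  # retrieve context; pass the question through unchanged
    | prompt  # fill the RAG prompt template
    | llm  # generate the answer with the language model
    | StrOutputParser()  # return the model output as a plain string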
)
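
The data flow through the chain can be sketched in plain Python with stand-ins for the networked components. `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical names invented for this illustration, and the canned chunks are placeholders. Joining the chunk texts with blank lines is also a simplification of how the real prompt template renders the retrieved documents.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question):
    """Stand-in for the vector store retriever: returns canned chunk texts."""
    return [
        "Mixtral has 8 experts per layer.",
        "A router selects 2 experts per token.",
    ]


def fake_llm(prompt_text):
    """Stand-in for the hosted model: reports the prompt size instead of generating."""
    return f"(model output for a prompt of {len(prompt_text)} characters)"


def run_chain(question):
    # Step 1: build both prompt inputs from the single input string,
    # mirroring the {"context": ..., "question": ...} mapping in the chain
    inputs = {"context": fake_retriever(question), "question": question}
    # Step 2: fill the prompt template
    prompt_text = TEMPLATE.format(
        context="\n\n".join(inputs["context"]),
        question=inputs["question"],
    )
    # Step 3: call the model; Step 4: return its output as a plain string
    return prompt_text, str(fake_llm(prompt_text))


prompt_text, answer = run_chain("What are the architectural details of Mixtral?")
print(prompt_text)
print(answer)
```

Printing the filled prompt is a quick way to check exactly what context the model sees before any tokens are spent on a real call.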

In [4]:
# Invoke the chain with a question about Mixtral's architecture
chain.invoke("What are the Architectural details of Mixtral?")

'\nAnswer: The architectural details of Mixtral are as follows:\n- Dimension (dim): 4096\n- Number of layers (n\\_layers): 32\n- Dimension of each head (head\\_dim): 128\n- Hidden dimension (hidden\\_dim): 14336\n- Number of heads (n\\_heads): 32\n- Number of kv heads (n\\_kv\\_heads): 8\n- Context length (context\\_len): 32768\n- Vocabulary size (vocab\\_size): 32000\n- Number of experts (num\\_experts): 8\n- Number of top k experts (top\\_k\\_experts): 2\n\nMixtral is based on a transformer architecture and uses the same modifications as described in [18], with the notable exceptions that Mixtral supports a fully dense context length of 32k tokens, and the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the “experts”) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model 

Trace:

https://smith.langchain.com/public/935fd642-06a6-4b42-98e3-6074f93115cd/r