This version runs Milvus through Docker Compose, so Docker must be installed to run this notebook. Start Milvus with `docker compose up -d`; the command is shown, commented out, in the cell below.

In [2]:
%pip install pymilvus milvus langchain sentence-transformers tiktoken octoai-sdk python-dotenv
# Start Milvus from a terminal in the folder that holds docker-compose.yml:
# docker compose up -d
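
Once the containers are up, it can help to confirm that something is listening on the Milvus port before running the rest of the notebook. The sketch below uses only the standard library. `port_open` is a hypothetical helper, not part of pymilvus or LangChain, and the host and port are assumed to match the `connection_args` used further down (`localhost`, `19530`).

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Assumed address: Milvus standalone exposed on localhost:19530.
if not port_open("localhost", 19530):
    print("Milvus does not appear to be reachable; try `docker compose up -d`.")
```

An open port only shows that a process accepted the connection. Milvus may still need a few seconds after startup before it can serve requests.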



In [3]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms.octoai_endpoint import OctoAIEndpoint

In [7]:
from dotenv import load_dotenv
import os

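# load_dotenv() copies variables from a local .env file into os.environ.
load_dotenv()
# Fail early with a clear message instead of a TypeError if the token is missing.
if not os.getenv("OCTOAI_API_TOKEN"):
    raise RuntimeError("OCTOAI_API_TOKEN is not set; add it to your .env file.")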

In [54]:
old_template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.\n Instruction:\n{question}\n Response: """
prompt = PromptTemplate.from_template(old_template)
prompt

PromptTemplate(input_variables=['question'], template='Below is an instruction that describes a task. Write a response that appropriately completes the request.\n Instruction:\n{question}\n Response: ')
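
To see the text the model actually receives, the template can be filled in by hand. This is a minimal sketch that uses Python's built-in `str.format` as a stand-in. `PromptTemplate`, with its default f-string-style format, is assumed to produce the same string for a simple template like this one.

```python
old_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    " Instruction:\n{question}\n Response: "
)

# Plain str.format stands in for PromptTemplate.format here (an assumption).
filled = old_template.format(question="Who was Leonardo da Vinci?")
print(filled)
```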

In [55]:
llm = OctoAIEndpoint(
    endpoint_url="https://text.octoai.run/v1/chat/completions",
    model_kwargs={
        "model": "mixtral-8x7b-instruct-fp16",
        "max_tokens": 128,
        "presence_penalty": 0,
        "temperature": 0.01,
        "top_p": 0.9,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant. Keep your responses limited to one short paragraph if possible.",
            },
        ],
    },
)

In [56]:
question = "Who was Leonardo da Vinci?"

llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.invoke(question)["text"])

 Leonardo da Vinci (1452-1519) was an Italian polymath who is often regarded as one of the greatest painters in history. He is also celebrated for his technological ingenuity, scientific curiosity, and philosophical wisdom. Da Vinci is widely known for his masterpieces such as 'The Last Supper' and 'Mona Lisa.' As an artist, scientist, mathematician, engineer, inventor, anatomist, geologist, cartographer, botanist, musician, and writer, da Vinci embodied the Renaissance ideal. His thirst for


In [12]:
from langchain_community.embeddings import OctoAIEmbeddings
from langchain_community.vectorstores import Milvus

In [13]:
embeddings = OctoAIEmbeddings(endpoint_url="https://text.octoai.run/v1/embeddings")

In [14]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
import os

In [16]:
files = os.listdir("./data")
files

['Boston.txt',
 'Cambridge, Massachusetts.txt',
 'Chicago.txt',
 'Houston.txt',
 'San Francisco.txt',
 'Seattle.txt',
 'Toronto.txt',
 'Washington, D.C..txt']

In [32]:
file_texts = []

In [39]:
# Build the splitter once; chunk sizes are measured in tiktoken tokens.
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=64
)
for file in files:
    with open(f"./data/{file}", encoding="utf8") as f:
        file_text = f.read()
    texts = text_splitter.split_text(file_text)
    # splitext strips only the extension, so "Washington, D.C..txt" keeps its dots.
    doc_title = os.path.splitext(file)[0]
    for i, chunked_text in enumerate(texts):
        file_texts.append(
            Document(
                page_content=chunked_text,
                metadata={"doc_title": doc_title, "chunk_num": i},
            )
        )

Created a chunk of size 713, which is longer than the specified 512
Created a chunk of size 543, which is longer than the specified 512
Created a chunk of size 838, which is longer than the specified 512
Created a chunk of size 666, which is longer than the specified 512
Created a chunk of size 690, which is longer than the specified 512
Created a chunk of size 758, which is longer than the specified 512
Created a chunk of size 1142, which is longer than the specified 512
Created a chunk of size 1014, which is longer than the specified 512
Created a chunk of size 531, which is longer than the specified 512
Created a chunk of size 1038, which is longer than the specified 512
Created a chunk of size 585, which is longer than the specified 512
Created a chunk of size 716, which is longer than the specified 512
Created a chunk of size 631, which is longer than the specified 512
Created a chunk of size 972, which is longer than the specified 512
Created a chunk of size 696, which is longer 

Created a chunk of size 597, which is longer than the specified 512
Created a chunk of size 524, which is longer than the specified 512
Created a chunk of size 535, which is longer than the specified 512
Created a chunk of size 843, which is longer than the specified 512
Created a chunk of size 746, which is longer than the specified 512
Created a chunk of size 887, which is longer than the specified 512
Created a chunk of size 662, which is longer than the specified 512
Created a chunk of size 515, which is longer than the specified 512
Created a chunk of size 615, which is longer than the specified 512
Created a chunk of size 739, which is longer than the specified 512
Created a chunk of size 837, which is longer than the specified 512
Created a chunk of size 640, which is longer than the specified 512
Created a chunk of size 944, which is longer than the specified 512
Created a chunk of size 656, which is longer than the specified 512
Created a chunk of size 1213, which is longer th
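
These warnings are expected. `CharacterTextSplitter` splits on a single separator (a blank line by default) and then packs the pieces into chunks, so a paragraph that is longer than `chunk_size` on its own is kept whole. The sketch below is a simplified, hypothetical re-implementation of that behaviour, not LangChain's code: it counts whitespace-separated words instead of tiktoken tokens and ignores `chunk_overlap`.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size words.

    A single piece longer than chunk_size is never broken up, so it
    becomes an oversized chunk of its own.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in text.split(separator):
        piece_len = len(piece.split())
        # Flush the current chunk if adding this piece would exceed the limit.
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks


short_a = "one two three"
short_b = "four five"
long_paragraph = " ".join(["word"] * 12)  # 12 words, above the limit of 8
text = "\n\n".join([short_a, short_b, long_paragraph])

for chunk in split_on_separator(text, chunk_size=8):
    print(len(chunk.split()), "words")
```

If strict size limits matter, a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`, is one option to consider.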

In [40]:
vector_store = Milvus.from_documents(
    file_texts,
    embedding=embeddings,
    connection_args={"host": "localhost", "port": 19530},
    collection_name="cities"
)

In [41]:
file_texts[0]

Document(page_content="Boston (US: ), officially the City of Boston, is the capital and most populous city of the U.S. state of Massachusetts, and the cultural and financial center of New England in the Northeastern United States, with an area of 48.4 sq mi (125 km2) and a population of 675,647 in 2020. Greater Boston metropolitan statistical area is the eleventh-largest in the country.Boston is one of the United States's oldest municipalities. It was founded on the Shawmut Peninsula in 1630 by Puritan settlers from Boston, Lincolnshire. During the American Revolution, Boston was the location of several key events, including the Boston Massacre, the Boston Tea Party, the hanging of Paul Revere's lantern signal in Old North Church, the Battle of Bunker Hill, and the siege of Boston. Following American independence from Great Britain, the city continued to play an important role as a port, manufacturing hub, and center for education and culture. The city has expanded beyond the original 

In [42]:
retriever = vector_store.as_retriever()
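
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea with cosine similarity over made-up three-dimensional vectors. The vectors, the `top_k` helper and the choice of cosine similarity are all assumptions for illustration; the metric, index and number of results that Milvus actually uses depend on how the collection and retriever are configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
indexed = [
    ("Boston has an area of 48.4 sq mi.", [0.9, 0.1, 0.0]),
    ("Seattle is in Washington state.", [0.1, 0.9, 0.0]),
    ("Toronto is the capital of Ontario.", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "How big is Boston?"

for text, score in top_k(query_vec, indexed):
    print(f"{score:.3f}  {text}")
```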

In [57]:
# Baseline without retrieval: the model answers from its own knowledge
prompt = PromptTemplate.from_template(old_template)

question = "How big is the city of Boston?"

llm_chain = LLMChain(prompt=prompt, llm=llm)

llm_chain.invoke(question)["text"]

' The city of Boston, Massachusetts, covers approximately 89.6 square miles, according to the U.S. Census Bureau. This includes both land and water areas within the city limits. Boston is one of the oldest cities in the United States and is known for its rich history, cultural attractions, and institutions of higher education.'

In [58]:
# RAG approach
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)
prompt

PromptTemplate(input_variables=['context', 'question'], template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n')

In [59]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [60]:
chain.invoke("How big is the city of Boston?")

' The city of Boston has an area of 48.4 square miles (125 km2).'
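
The pipe syntax hides three plain steps: fetch context for the question, fill the prompt, and call the model. The sketch below spells them out with stand-ins so it runs without Milvus or an API token. `Doc`, `fake_retriever`, `fake_llm` and `format_docs` are hypothetical names introduced here. The `format_docs` step is an optional refinement: the chain above passes the retriever's list of `Document` objects to the prompt directly, whereas joining the chunk texts gives the model clean text.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for LangChain's Document; only page_content is modelled."""
    page_content: str


template = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question: str) -> list[Doc]:
    # Hypothetical stub: a real retriever would query the vector store.
    return [Doc("Boston has an area of 48.4 sq mi (125 km2).")]


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stub: a real LLM call would go over the network.
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def format_docs(docs: list[Doc]) -> str:
    # Join chunk texts so the prompt holds plain text, not object reprs.
    return "\n\n".join(doc.page_content for doc in docs)


def run_chain(question: str) -> str:
    context = format_docs(fake_retriever(question))  # the "context" branch
    prompt_text = template.format(context=context, question=question)  # prompt step
    return fake_llm(prompt_text)  # llm step; the stub already returns a str


print(run_chain("How big is the city of Boston?"))
```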

In [20]:
# Let's make this a bit more fun and showcase the multilingual capabilities of Mixtral, which stand out among open source models.

# Our vector DB is populated with entries from English text, and even the embedding model we're using here, GTE-Large,
# works best on English text. However, Mixtral has good multilingual capabilities in French, German, Spanish and Italian.
# So we ask the assistant to answer only in French, in both the system prompt and the user prompt. Retrieval is still
# performed over English text, but the Mixtral LLM generates its response in a different language (here, French).
french_llm = OctoAIEndpoint(
    endpoint_url="https://text.octoai.run/v1/chat/completions",
    model_kwargs={
        "model": "mixtral-8x7b-instruct-fp16",
        "max_tokens": 128,
        "presence_penalty": 0,
        "temperature": 0.1,
        "top_p": 0.9,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant who responds in French and not in English.",
            },
        ],
    },
)

french_template = """Answer the question in French based only on the following context:
{context}

Question: {question}
"""
french_prompt = PromptTemplate.from_template(french_template)

In [21]:
french_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | french_prompt
    | french_llm
    | StrOutputParser()
)

In [22]:
fr_1 = french_chain.invoke("How big is the city of Seattle?")

In [23]:
from pprint import pprint
pprint(fr_1)

(' La ville de Seattle est assez grande avec une population de 749 256 '
 "habitants en 2022. C'est la ville la plus peuplée de l'État de Washington et "
 "de la région du Nord-Ouest Pacifique de l'Amérique du Nord. L'aire "
 "métropolitaine de Seattle compte 4,02 millions d'habitants, ce qui en fait "
 'la 15e plus importante aux États-Unis. La croissance de la population de '
 'Seattle a été rapide, avec une augmentation de 21,1%')
