# Example of how to use embeddings

Embeddings are a way to represent information (such as words, paragraphs, or images) in a vector space. This type of numerical representation is useful because it lets us use mathematical operations in _semantic space_. For example, we can measure how similar two words or paragraphs are by calculating the cosine similarity between their embedding vectors.
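
As a rough illustration of the cosine similarity idea, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for the example (real embedding vectors typically have hundreds or thousands of dimensions), and the helper name `cosine_similarity` is our own, not part of any library used in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration only
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score close to 1
print("cat vs kitten:", cosine_similarity(cat, kitten))
# Vectors pointing in different directions score close to 0
print("cat vs car:   ", cosine_similarity(cat, car))
```

The score depends only on the direction of the vectors, not their length, which is one reason it is a common choice for comparing embeddings.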

This notebook will illustrate how to use embeddings from a local model - using Ollama - and from an external model - using OpenAI and HuggingFace.

## Using a local model

Using a local model has the advantage of being able to run the model offline. The disadvantage is that the model may not be as good as a model from a large company like OpenAI.

To install Ollama, follow the instructions at their [download homepage](https://ollama.com/download).

Once you have Ollama installed, you need to download a model. You can see the available models on the [Ollama models page](https://ollama.com/models). For example, to install _llama3_, type the following in your terminal:

```bash
ollama pull llama3
```

You can then use the model directly through Ollama's [RESTful API](https://github.com/ollama/ollama/blob/main/docs/api.md).
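
For reference, a direct call to the REST API might look roughly like the sketch below. It assumes that Ollama is running locally on its default port (11434) and that the `/api/embeddings` endpoint accepts `model` and `prompt` fields and returns an `embedding` list, as described in the linked API docs; check those docs for the endpoint your Ollama version expects. The helper names `build_embedding_request` and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="llama3"):
    """Build (but do not send) the HTTP request for one embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="llama3"):
    """Send the request and return the embedding vector (needs a running server)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as response:
        return json.loads(response.read())["embedding"]


# Example (only works while the Ollama server is running):
# vector = embed("This is a test document.")
```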

In this notebook, we will show how to use [LangChain](https://python.langchain.com/v0.2/docs/introduction/) to work with the model's embeddings.

In [None]:
# read secrets ------------------------------------------------
import json
import os

# define your own secrets.json file with the following structure
# {
#     "openai": "your-openai-api-key",
#     "groq": "your-groq-api-key"
# }

with open("./secrets.json") as f:
    secrets = json.load(f)

os.environ["OPENAI_API_KEY"] = secrets["openai"]
os.environ["GROQ_API_KEY"] = secrets["groq"]

# this is to disable some warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [None]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings


embeddings = OpenAIEmbeddings()
# embeddings = OllamaEmbeddings(model="llama3")
# embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."
query_result = embeddings.embed_query(text)

# Let's show what an embedding vector looks like (first 5 elements)
query_result[:5]


## Example: using an embedding function to find the document chunk most likely to answer a question

In this example, we will first load a number of documents from a directory. We will then split them into chunks. We do this because:

- The context window of a model may be limited, so we may need to split the document into smaller parts.
- Documents are not semantically monolithic, so we may need to split them into parts that are more coherent.
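
To make the idea of overlapping chunks concrete, here is a simplified sketch that cuts a string into fixed-size pieces by character count. It is only an illustration of `chunk_size` and `chunk_overlap`; LangChain's `CharacterTextSplitter` also takes separators into account, so its output will generally differ. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into chunks of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Each chunk starts with the last character of the previous one
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.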

We then use these chunks to populate a vector database. We will use the vector database to find the chunks most similar to a phrase or question. The quality of the result (how similar a chunk actually is to the phrase) depends on the quality of the embedding function (Transformer).
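
Conceptually, a similarity search ranks the stored vectors by how close they are to the query vector. The sketch below shows this with a brute-force search over a tiny in-memory list. The texts and three-dimensional vectors are invented for illustration, and the function `most_similar` is hypothetical; a real vector database such as FAISS uses optimized index structures rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query_vector, store, k=1):
    """Return the texts of the k stored items closest to query_vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _vector in ranked[:k]]


# (text, embedding) pairs; the vectors are made up for this example
store = [
    ("Market size of personal fitness", [0.9, 0.1, 0.0]),
    ("How to bake bread", [0.0, 0.2, 0.9]),
    ("Gym membership trends", [0.7, 0.3, 0.1]),
]

# Pretend this is the embedding of a question about the fitness market
query = [0.8, 0.2, 0.0]
print(most_similar(query, store, k=2))
```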

In [None]:
# Loading documents from a directory
import json

from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import CharacterTextSplitter

loader = DirectoryLoader('./documents', glob="**/*.html", show_progress=True)
docs = loader.load()

# Splitting documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = text_splitter.split_documents(docs)

In [None]:
# Let's load the chunks into a vector database

from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(documents=documents, embedding=embeddings)

retriever = db.as_retriever()

In [None]:
# Let's query the store for the document chunks most likely to contain the answer to a question

question = "what is the personal fitness market size?"
results = db.similarity_search(question, k=3)

# Make the results easier to read:
# transform them into a list of dictionaries
results = [{"text": doc.page_content, "metadata": doc.metadata} for doc in results]

print(json.dumps(results, indent=2, sort_keys=True))

# hopefully the top document contains the information needed to answer the question...

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import Ollama
from langchain_groq import ChatGroq

# model = Ollama(model="llama3", temperature=0.25)
model = ChatGroq(temperature=0.25, model_name="mixtral-8x7b-32768")

retriever = db.as_retriever()

# Prompt template
prompt = PromptTemplate.from_template(
    "Answer the question: {question} using the information from this context: {context}"
)

# put together a retrieval chain
retrieval_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

retrieval_chain.invoke(question)