### Prompt templates 
- Prompt templates help translate user input and parameters into instructions for a language model.
- A prompt template outputs a PromptValue, which can be passed to an LLM or a ChatModel.

In [None]:
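from langchain_core.prompts import ChatPromptTemplate

# Placeholders in braces are filled in when the template is invoked.
system_msg = "Translate the following text from English to {language}"
prompt = ChatPromptTemplate.from_messages(
    [("system", system_msg), ("user", "{text}")]
)
# invoke() returns a PromptValue; the name `prompt` is rebound to it here.
prompt = prompt.invoke({"language": "Italian", "text": "hi"})
print(prompt)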

In [None]:
# If we want to access messages directly
prompt.to_messages()

In [None]:
from langchain_ollama import ChatOllama
model = ChatOllama(model="llama2")
response = model.invoke(prompt)
response.content

### Documents 
#### A Document represents a unit of text and its associated metadata. It has three attributes:

- page_content: a string representing the content;
- metadata: a dict containing arbitrary metadata;
- id: (optional) a string identifier for the document.

The metadata attribute can capture information about the source of the document, its relationship to other documents, and other information.
### Note that an individual Document object often represents a chunk of a larger document.
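
To make the three attributes concrete, here is a minimal sketch that uses a standard-library dataclass as a stand-in. `SimpleDocument`, the file name `paper.pdf`, and the page texts are hypothetical and only mirror the shape described above; this is not LangChain's own `Document` class.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in with the three attributes listed above."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


# Assumed example data: two pages of one larger source file.
source = "paper.pdf"
pages = ["First page text.", "Second page text."]

# Each chunk keeps metadata that points back to where it came from.
chunks = [
    SimpleDocument(
        page_content=text,
        metadata={"source": source, "page": page_number},
        id=f"{source}-{page_number}",
    )
    for page_number, text in enumerate(pages)
]

for chunk in chunks:
    print(chunk.id, chunk.metadata, chunk.page_content)
```

Keeping the source and page in `metadata` is what lets a retrieved chunk be traced back to the larger document it was cut from.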



## Document Loaders
Document loaders load data from a source into Document objects. LangChain has hundreds of integrations with data sources such as Slack, Notion, and Google Drive.
- When working with large datasets, you can use the .lazy_load method to iterate over documents one at a time instead of loading them all into memory:



In [None]:
from langchain_community.document_loaders import PyPDFLoader

file = "../attention is all you need.pdf"
loader = PyPDFLoader(file_path=file)

# Lazy alternative for large files: iterate one Document at a time.
# for document in loader.lazy_load():
#     print(document)

docs = loader.load()  # PyPDFLoader returns one Document per page
print(docs[10].page_content[:200])
print("Length:", len(docs))

### Splitting
- For both information retrieval and downstream question-answering purposes, a page may be too coarse a representation.
- Our goal in the end will be to retrieve Document objects that answer an input query, and further splitting our PDF will help ensure that the meanings of relevant portions of the document are not "washed out" by surrounding text.
- Large documents can dilute important information. Splitting helps improve retrieval accuracy.
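
As a rough illustration of how a chunk size and a chunk overlap interact, the sketch below uses a plain fixed-size sliding window. It is a simplification: `split_with_overlap` is a hypothetical helper, and a real splitter such as `RecursiveCharacterTextSplitter` also tries to break on separators like paragraphs and lines, which this version ignores.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs from a fixed-size sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. The recorded start offset plays the same role as the start index a splitter can store in chunk metadata.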

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
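text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # maximum characters per chunk
    chunk_overlap=200,     # characters shared by consecutive chunks
    add_start_index=True,  # record each chunk's offset in its metadata
)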
split_docs = text_splitter.split_documents(docs)
len(split_docs)

## Embeddings
Vector search is a common way to store and search over unstructured data (such as unstructured text). The idea is to store numeric vectors that are associated with the text. Given a query, we can embed it as a vector of the same dimension and use vector similarity metrics (such as cosine similarity) to identify related text.
https://python.langchain.com/docs/integrations/text_embedding/
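
Cosine similarity, the metric mentioned above, can be sketched in a few lines of plain Python. The three vectors are made-up toy values rather than real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values for illustration only).
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query, different length
doc_far = [0.0, 3.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

Because the metric depends only on direction, `doc_close` scores close to 1 even though it is twice as long as the query, while the orthogonal `doc_far` scores 0.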

In [None]:
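from langchain_ollama import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="llama2")
# embed_documents expects a list of strings and returns one vector per string.
vectors = embeddings.embed_documents([split_docs[0].page_content])
print("Dimension:", len(vectors[0]))
vectors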

## Vector stores
LangChain VectorStore objects contain methods for adding text and Document objects to the store, and querying them using various similarity metrics. They are often initialized with embedding models, which determine how text data is translated to numeric vectors.
https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html
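
The idea behind a vector store can be sketched with a brute-force, in-memory version. `ToyVectorStore` is a hypothetical class, the vectors are hand-written instead of coming from an embedding model, and the score is assumed to be a Euclidean (L2) distance, so a lower score means a closer match. Real stores rely on optimized indexes rather than a full scan.

```python
import math


class ToyVectorStore:
    """Hypothetical brute-force store: keeps (text, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, list(vector)))

    def similarity_search_with_score(
        self, query_vector: list[float], k: int = 4
    ) -> list[tuple[str, float]]:
        # Score every stored vector by L2 distance, then keep the k closest.
        scored = [(text, math.dist(query_vector, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


store = ToyVectorStore()
store.add("attention mechanisms", [0.9, 0.1])
store.add("recurrent networks", [0.2, 0.8])
store.add("training schedule", [0.5, 0.5])

results = store.similarity_search_with_score([1.0, 0.0], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The method name mirrors the call used in this notebook only to make the comparison easier; the signature here takes a ready-made vector, whereas a real store embeds the query text first.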

In [None]:
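from langchain_community.vectorstores import FAISS
# Index the split chunks rather than whole pages so retrieval returns focused passages.
vector_store = FAISS.from_documents(
    documents=split_docs,
    embedding=embeddings,
)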


In [None]:
# With the vectors in the store, we can run operations such as similarity search.
res = vector_store.similarity_search(
    "What is attention?"
)
print(res[0].page_content)

In [None]:
# With the default FAISS index the score is an L2 distance, so lower means more similar.
res = vector_store.similarity_search_with_score("What is the goal of reducing sequential computation?")
doc, score = res[0]
print("Score:", score)
print("Document:", doc)

## Retrievers