# This is a sample Jupyter Notebook

First, the imports and setup needed to run the RAG demonstration.

The first cell silences all warnings. This is not good practice in general, but it is fine for the demo.

In [1]:
import warnings
warnings.filterwarnings('ignore')

The generator should produce an answer based on the user query and the relevant documents.
We introduce an abstract class for generators and implement one subclass, the PromptGenerator.
The PromptGenerator only creates a prompt, which can then be executed with any LLM you like.

In [2]:
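from abc import ABC, abstractmethod
from typing import List
from langchain_core.prompts import PromptTemplate
from langchain_core.documents.base import Document

class Generator(ABC):
    """Base class: turns a query and its retrieved documents into text."""

    @abstractmethod
    def invoke(self, query: str, documents: List[Document]) -> str:
        ...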

class PromptGenerator(Generator):
    def __init__(self):
        template = ("Use only the provided information following after \"Context:\" to answer the question following after \"Question:\" at the end.\n" +
                    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n" +
                    "Use three sentences maximum and keep the answer as concise as possible.\n\n" +
                    "Context: {context}\n\n" +
                    "Question: {question}")
        self.prompt_template = PromptTemplate.from_template(template)
        
    def invoke(self, query: str, documents: List[Document]) -> str:
        context = "\n".join([f"{i+1}. {doc.page_content}" for i, doc in enumerate(documents)])
        prompt = self.prompt_template.format(question=query, context=context)
        
        return prompt
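
To see what the generator produces without installing LangChain, the following sketch reproduces the same prompt assembly with plain string formatting. The function `build_prompt` and the two sample passages are assumptions made for illustration; the class above works with `PromptTemplate` and `Document` objects instead.

```python
from typing import List

# Same wording as the template in PromptGenerator above.
TEMPLATE = (
    'Use only the provided information following after "Context:" to answer '
    'the question following after "Question:" at the end.\n'
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "Use three sentences maximum and keep the answer as concise as possible.\n\n"
    "Context: {context}\n\n"
    "Question: {question}"
)


def build_prompt(query: str, passages: List[str]) -> str:
    """Hypothetical stand-in for PromptGenerator.invoke using plain strings."""
    # Number the passages starting at 1, one per line, as the notebook does.
    context = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(passages))
    return TEMPLATE.format(question=query, context=context)


prompt = build_prompt(
    "Who is Son Goku?",
    ["Son Goku is a saiyan", "Son Goku is raised on earth"],
)
print(prompt)
```

With an empty list of passages the context part stays blank, so the LLM is left with only the instruction to say that it does not know.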

We are not implementing the Retriever because there is already an implementation available in LangChain.
Instead, we will use that implementation, and we will wrap the creation of the retriever behind a Builder.
The Builder will also implement some logic to enhance the existing retriever with some new knowledge.

In [3]:
from typing import List
from typing import Optional
from langchain_core.documents.base import Document
import chromadb
from langchain_chroma import Chroma
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_core.vectorstores import VectorStoreRetriever

class Builder:
    @staticmethod
    def create_retriever(docs: Optional[List[Document]]) -> VectorStoreRetriever:
        collection_name = "my_doc_store"
        embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

        if docs:
            # Build the collection directly from the given documents.
            db = Chroma.from_documents(
                documents=docs,
                collection_name=collection_name,
                embedding=embedding_function,
            )
        else:
            # No documents yet: start with an empty in-memory collection.
            db = Chroma(
                client=chromadb.Client(),
                collection_name=collection_name,
                embedding_function=embedding_function,
            )

        # Return at most 5 hits and drop those scoring below 0.5.
        return db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"k": 5, "score_threshold": 0.5})

    @staticmethod
    def add_knowledge(retriever: VectorStoreRetriever, knowledge_collection: List[str]) -> None:
        # The retriever forwards new documents to its underlying vector store.
        retriever.add_documents([Document(page_content=knowledge) for knowledge in knowledge_collection])
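
The search settings passed to `as_retriever` keep at most `k` hits and drop every hit whose score falls below `score_threshold`. The sketch below illustrates only that filtering idea. The hand-written 2-D vectors, the cosine scoring, and the helper `top_k_above_threshold` are assumptions for illustration; the real retriever delegates scoring to Chroma and the embedding model, and its score scale may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_above_threshold(
    query_vec: List[float],
    store: Dict[str, List[float]],
    k: int = 5,
    score_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score everything, filter by threshold, keep the best k."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store.items()]
    kept = [(text, score) for text, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy "embeddings": made-up vectors, not output of a real model.
store = {
    "close match": [1.0, 0.1],
    "weak match": [0.6, 0.8],
    "unrelated": [0.0, 1.0],
}
hits = top_k_above_threshold([1.0, 0.0], store)
print([text for text, _ in hits])
```

If no entry reaches the threshold, the result is an empty list, and the generator then receives no documents at all.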

To chain the retriever and the generator together into our RAG pipeline, we use an Orchestrator.
Honestly, I don't know whether this could be implemented with one of LangChain's own pipeline mechanisms, but for demonstration purposes this is enough.

In [4]:
from langchain_core.vectorstores import VectorStoreRetriever

class Orchestrator:
    def __init__(self, retriever: VectorStoreRetriever, generator: Generator):
        self.retriever = retriever
        self.generator = generator
        
    def answer_question(self, question: str) -> str:
        relevant_documents = self.retriever.invoke(question)
        return self.generator.invoke(question, relevant_documents)

In [5]:
from datasets import load_dataset
from langchain_core.documents.base import Document

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
# Document keeps per-document attributes under the `metadata` keyword.
docs = [Document(page_content=doc["content"], metadata=doc["meta"]) for doc in dataset]

orchestrator = Orchestrator(Builder.create_retriever(docs), PromptGenerator())

print(orchestrator.answer_question("What happened to the Tomb of Mausolus?"))

Use only the provided information following after "Context:" to answer the question following after "Question:" at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

Context: 1. Mausolus decided to build a new capital, one as safe from capture as it was magnificent to be seen. He chose the city of Halicarnassus. Artemisia and Mausolus spent huge amounts of tax money to embellish the city. They commissioned statues, temples and buildings of gleaming marble. In 353 BC, Mausolus died, leaving Artemisia to rule alone. As the Persian satrap, and as the Hecatomnid dynast, Mausolus had planned for himself an elaborate tomb. When he died the project was continued by his siblings. The tomb became so famous that Mausolus's name is now the eponym for all stately tombs, in the word mausoleum.[citation needed]
Artemisia lived for only two years after the death of her husband. T

In [5]:
# Chroma reports distances here, so a lower score means a closer match.
db.similarity_search_with_score("Who is Son Goku?")

[(Document(page_content='Son Goku is raised on earth'), 0.2402067929506302),
 (Document(page_content='Son Goku is a saiyan'), 0.2656806707382202),
 (Document(page_content='Son Goku is also named Goku or Kakarot'),
  0.279945433139801),
 (Document(page_content='The earliest pharaonic name of seal impressions is that of Khufu, the latest of Pepi II. Worker graffiti was written on some of the stones of the tombs as well; for instance, "Mddw" (Horus name of Khufu) on the mastaba of Chufunacht, probably a grandson of Khufu.[15]\nSome inscriptions in the chapels of the mastabas (like the pyramid, their burial chambers were usually bare of inscriptions) mention Khufu or his pyramid. For instance, an inscription of Mersyankh III states that "Her mother [is the] daughter of the King of Upper and Lower Egypt Khufu."Most often these references are part of a title, for example, Snnw-ka, "Chief of the Settlement and Overseer of the Pyramid City of Akhet-Khufu" or Merib, "Priest of Khufu".[16] Sever

In [2]:
retriever = Builder.create_retriever(None)  # returns only the retriever
db = retriever.vectorstore  # the Chroma store behind the retriever



In [6]:
print("There are", db._collection.get(), "in the collection")

There are {'ids': ['5e3f4ac4-245e-4301-8c08-20a4761c4ea3', 'c6d1d3fe-e020-4889-9413-4c4cc6d7c27e', 'f47f5c65-50e0-4065-bbf6-191c7f7e24a7'], 'embeddings': None, 'metadatas': [None, None, None], 'documents': ['Son Goku is also named Goku or Kakarot', 'Son Goku is raised on earth', 'Son Goku is a saiyan'], 'uris': None, 'data': None} in the collection
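
The cell above prints the whole dictionary returned by `get()`. To report a number instead, it is enough to take the length of the `ids` list. The dictionary below is an abridged copy of the output above, so the sketch runs without Chroma; chromadb collections also provide a `count()` method if you prefer to ask the store directly.

```python
# Abridged copy of the dictionary printed above (only the keys used here).
collection_dump = {
    "ids": [
        "5e3f4ac4-245e-4301-8c08-20a4761c4ea3",
        "c6d1d3fe-e020-4889-9413-4c4cc6d7c27e",
        "f47f5c65-50e0-4065-bbf6-191c7f7e24a7",
    ],
    "documents": [
        "Son Goku is also named Goku or Kakarot",
        "Son Goku is raised on earth",
        "Son Goku is a saiyan",
    ],
}

# Each stored document has exactly one id, so the id count is the document count.
document_count = len(collection_dump["ids"])
print("There are", document_count, "documents in the collection")
```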


If create_retriever returns the Chroma store itself instead of db.as_retriever(...), the same question fails, because a vector store has no invoke method:

AttributeError: 'Chroma' object has no attribute 'invoke'

In [35]:
print(orchestrator.answer_question("What happened to the Tomb of Mausolus?"))

Use only the provided information "Context:" to answer the question "Question:" at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

Context: 1. Mausolus decided to build a new capital, one as safe from capture as it was magnificent to be seen. He chose the city of Halicarnassus. Artemisia and Mausolus spent huge amounts of tax money to embellish the city. They commissioned statues, temples and buildings of gleaming marble. In 353 BC, Mausolus died, leaving Artemisia to rule alone. As the Persian satrap, and as the Hecatomnid dynast, Mausolus had planned for himself an elaborate tomb. When he died the project was continued by his siblings. The tomb became so famous that Mausolus's name is now the eponym for all stately tombs, in the word mausoleum.[citation needed]
Artemisia lived for only two years after the death of her husband. The urns with their ashes were pl

In [43]:
print(orchestrator.answer_question("Who is Son Goku?"))

Use only the provided information following after "Context:" to answer the question following after "Question:" at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

Context: 

Question: Who is Son Goku?




In [7]:
Builder.add_knowledge(retriever, ["Son Goku is a saiyan", "Son Goku is also named Goku or Kakarot", "Son Goku is raised on earth"])

In [5]:
retriever.invoke("Who is Son Goku")



[Document(page_content='Son Goku is raised on earth'),
 Document(page_content='Son Goku is a saiyan'),
 Document(page_content='Son Goku is also named Goku or Kakarot')]

In [9]:
print(orchestrator.answer_question("Who is Vegeta?"))

Use only the provided information following after "Context:" to answer the question following after "Question:" at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

Context: 

Question: Who is Vegeta?
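
When nothing passes the score threshold, the prompt is still built, only with an empty context, as the output above shows. One possible refinement is to skip the LLM call in that case and answer directly. The sketch below is an assumption about how that could look and is not part of the notebook; `answer_or_fallback`, `FALLBACK`, and the toy `retrieve` and `generate` functions are hypothetical.

```python
from typing import Callable, Dict, List

FALLBACK = "I don't know."


def answer_or_fallback(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Return a fixed fallback when retrieval yields no documents."""
    documents = retrieve(question)
    if not documents:
        # Nothing relevant was found, so there is no context to build a prompt from.
        return FALLBACK
    return generate(question, documents)


# Toy knowledge base keyed by the exact question, for illustration only.
knowledge: Dict[str, List[str]] = {"Who is Son Goku?": ["Son Goku is a saiyan"]}


def retrieve(question: str) -> List[str]:
    return knowledge.get(question, [])


def generate(question: str, documents: List[str]) -> str:
    return f"Context: 1. {documents[0]}\n\nQuestion: {question}"


print(answer_or_fallback("Who is Vegeta?", retrieve, generate))
print(answer_or_fallback("Who is Son Goku?", retrieve, generate))
```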




In [11]:
# German test sentence: "The sun is very bright"
Builder.add_knowledge(retriever, ["Die Sonne ist sehr hell"])

In [16]:
# German query: "Is the sun bright?"
print(orchestrator.answer_question("Ist die Sonne hell?"))

Use only the provided information following after "Context:" to answer the question following after "Question:" at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

Context: 1. Die Sonne ist sehr hell

Question: Ist die Sonne hell?




In [15]:
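# German: "Here is some more unnecessary filler text. The sun is extremely bright! This filler text only serves to manipulate the similarity search."
Builder.add_knowledge(retriever, ["Hier ist noch etwas unnötiger Fülltext. Die Sonne ist extrem hell! Dieser Fülltext dient nur dazu die Ähnlichkeitssuche zu manipulieren."])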