# Walkthrough 3 - Chatting with your own data

During the 30 days, a number of people have indicated that LLMs do not understand their context very well. LLMs are trained on very large amounts of text, but that training data is broad and usually not very deep, which makes the models potentially less useful in specialist domains. We can overcome this to some extent using Prompt Engineering, where we supply the relevant information as context in the prompt.

However, finding and including the relevant context each time you create a prompt quickly becomes tedious. We also have to consider **Data Privacy** issues around the information we include in our prompts: who might have access to that data, how that data might be used, and any relevant regulations that apply to our work.

There are two other challenges that we often encounter:
1. **Hallucination** - This is where a model generates output that is fictitious. This is understandable if we view an LLM as a **Probability Machine** rather than a **Truth Machine**. It gives us less confidence in the output since, depending on the model, fictitious output can be very convincing.
2. **Information Cut-Off** - Models *learn* a compressed world-view through their training, but the information they learn from has a cut-off date. The older the model, the further back in time that cut-off is. This means a model can generate output based on outdated information, which again reduces its usefulness in some contexts.


 



We can reduce the impact of these challenges using an approach called Retrieval Augmented Generation (or RAG for short). 

>A fairly non-technical description of RAG can be found at https://research.ibm.com/blog/retrieval-augmented-generation-RAG and it is worth reading through the post and/or watching the short video.

With RAG, we store our context-specific documentation in a database (called a Vector Database). More precisely, each document is broken down into overlapping chunks of text, and each chunk undergoes a process called Embedding, which converts it into a numerical representation (a vector). This representation has a special property: document chunks that have similar context and semantic meaning have similar representations.
The mathematics behind this is complex, but if you want to read more on Embeddings then read this post: https://towardsdatascience.com/how-i-explained-word-embeddings-to-my-non-technical-colleagues-52ced76cf3bb
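
To make the idea of overlapping chunks concrete, here is a minimal sketch in plain Python. The function name `chunk_text`, the word-based splitting, and the `chunk_size` and `overlap` values are illustrative assumptions rather than the settings used later in this walkthrough; real pipelines often split on characters or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 8, overlap: int = 3) -> list[str]:
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a chunk boundary still appears whole in at least one chunk.
    (Hypothetical helper for illustration only.)
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


document = (
    "Retrieval Augmented Generation stores document chunks in a vector "
    "database and adds the most relevant chunks to the prompt as context"
)
chunks = chunk_text(document, chunk_size=8, overlap=3)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each chunk repeats the last few words of the previous one. Larger overlaps reduce the chance of cutting a thought in half, at the cost of storing more text.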

When we create a prompt, we search the Vector Database for the document chunks that are most closely related to the prompt, add these chunks to the prompt as context, and send the combined prompt to the LLM.
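
The retrieval step can be sketched in a few lines of plain Python. This is a toy illustration, not the implementation used in this walkthrough: the `embed` function below simply counts words as a stand-in for a real embedding model, the "Vector Database" is an in-memory list, and the example chunks, `retrieve`, and `build_prompt` are all hypothetical names invented for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.
    A real system would call a trained embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, store: list, top_k: int = 2) -> list:
    """Return the text of the top_k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, context_chunks: list) -> str:
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up example chunks; the in-memory "vector database" is a list of
# (chunk text, vector) pairs.
chunks = [
    "Annual leave requests must be submitted two weeks in advance.",
    "The office kitchen is cleaned every Friday afternoon.",
    "Unused annual leave can be carried over until the end of March.",
]
store = [(c, embed(c)) for c in chunks]

question = "How far in advance do I submit an annual leave request?"
top_chunks = retrieve(question, store, top_k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this combined prompt is what would be sent to the LLM
```

Here the two chunks about annual leave are retrieved and the unrelated chunk about the kitchen is left out. Word counts only match exact words, whereas a real embedding model would also match chunks that use different words with a similar meaning.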

The LLM then uses the context when generating output in response to the prompt. In this way we can:
* Overcome the LLM's lack of knowledge about your context
* Reduce Hallucination
* Introduce new data that overcomes the information cut-off problem


The ideas behind RAG are fairly new and still evolving, but platforms such as Azure AI and Amazon SageMaker already support this type of approach.

The challenge of Data Privacy can still exist depending upon how RAG is implemented - to keep your data private, you would need to create a private Vector Database and LLM instance (either using a cloud provider or hosting these yourself).

In this walkthrough we will create a very simple RAG application that allows you to chat with your own documents.
* The Vector Database will run in memory for this walkthrough, so no information is persisted.
* The LLM is a small open-source model so that it can be run in a Colab notebook.

If you adopt this approach within your own organisation, you would likely want to use a larger model for better results and a persistent Vector Database for your documents. Setting up a RAG-based solution can require a fair amount of code, but the code is generally re-usable across applications.

# Let's Get Started
For this walkthrough we will use 