# Introduction to LLMs using LangChain

Since the release of [GPT-3.5](https://en.wikipedia.org/wiki/GPT-3#GPT-3.5) by [OpenAI](https://en.wikipedia.org/wiki/OpenAI) in 2022, 
generative AI and [large language models (LLMs)](https://en.wikipedia.org/wiki/Large_language_model) have gained widespread public attention. 
While different approaches exist for building custom applications based on LLMs, this notebook uses [LangChain](https://www.langchain.com). 
According to its documentation, LangChain is

> a framework for developing applications powered by large language models (LLMs).

The content of this notebook is based on the [LangChain quickstart](https://python.langchain.com/docs/get_started/quickstart/).

## LangChain overview 

The goal of LangChain is to simplify the development of applications based on LLMs. This is achieved by providing tools for different aspects of 
the application lifecycle. For example, LangChain provides tools for the development, deployment and monitoring of LLM-based 
applications. Typical use cases for LangChain are: 
- [Question answering using Retrieval augmented generation (RAG)](https://python.langchain.com/docs/use_cases/question_answering/)
- [Chatbots](https://python.langchain.com/docs/use_cases/chatbots/)

One nice feature of LangChain is the possibility to build applications based on different LLMs. Using LangChain, it is 
possible to build applications based on OpenAI's GPT models as well as on open-source LLMs. In this notebook, 
[Ollama](https://ollama.com/) will be used as the basis for the applications.

## Installing the necessary tools and libraries
### LangChain
The first step is to install the necessary tools and libraries. We start by installing LangChain using the following command. 

In [None]:
!pip install langchain langchain-community langchain-text-splitters beautifulsoup4

### LangSmith
One helpful tool offered by LangChain for developing LLM-based applications is [LangSmith](https://smith.langchain.com/). While not 
necessary for this tutorial, it is recommended to create a free LangChain account and set up LangSmith
by setting the following environment variables. 

Note that the environment variables need to be set before Jupyter Lab is started. 

```zsh
# macOS & Linux
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="<your-api-key>"

# Windows or using the GUI
setx LANGCHAIN_TRACING_V2 "true"
setx LANGCHAIN_API_KEY "<your-api-key>"
```
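
Because the variables are read from the environment of the Jupyter process, it can be useful to verify from inside the notebook that they are actually visible. The following is a minimal sketch using only the Python standard library. The helper name `langsmith_env_status` is hypothetical and not part of LangChain.

```python
import os

# Names of the variables from the shell snippet above.
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY")


def langsmith_env_status(environ=None):
    """Map each variable name to True if it is set and non-empty.

    Hypothetical helper for this notebook, not a LangChain function.
    """
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in REQUIRED_VARS}


# Report the status without printing the API key itself.
for name, is_set in langsmith_env_status().items():
    print(f"{name}: {'set' if is_set else 'missing'}")
```

If a variable is reported as missing, restart Jupyter Lab from a shell in which the variable has been exported.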

### Ollama
As mentioned above, the LLMs in this tutorial are run locally using Ollama. Ollama can be downloaded from the [Ollama download page](https://ollama.com/download).

Once Ollama is installed, the server can be started using:

```zsh
ollama serve
```

LangChain as well as the Ollama CLI interact with this server. Short documentation on how to use the Ollama CLI can be found in the [Ollama GitHub repository](https://github.com/ollama/ollama).
The first thing we need to do is download a model. To download the Llama 3 model, execute the following command. Be aware that each model is several GB in size. 

```zsh
ollama pull llama3
```
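
To check from Python that the server is reachable and that the model has been downloaded, the Ollama REST API can be queried directly. The sketch below assumes the server listens on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists the locally available models. The helper names `model_names`, `has_model` and `fetch_tags` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: default address of a locally started `ollama serve`.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract the model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload, model):
    """Return True if `model` is available, ignoring the tag (e.g. ':latest')."""
    return any(name.split(":")[0] == model for name in model_names(tags_payload))


def fetch_tags(base_url=OLLAMA_URL, timeout=2.0):
    """Return the parsed /api/tags response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return None


tags = fetch_tags()
if tags is None:
    print("Ollama server not reachable; is `ollama serve` running?")
elif has_model(tags, "llama3"):
    print("llama3 is available")
else:
    print("llama3 not found; run `ollama pull llama3`")
```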

Once a model has been downloaded we can start building the first little application using it. 


## LangChain hello world

A first hello world LLM application can now be built using just three lines of Python code. 

In [None]:
# Import the LangChain Ollama library
from langchain_community.llms import Ollama

# Instantiate the Llama3 model
llm = Ollama(model="llama3")

# Ask the model a question
question = "What is the meaning of life?"
print(llm.invoke(question))

## Fun with flags (LLMs)
Using LangChain, it is also possible to build more complex prompts. Furthermore, different elements can be linked together in a chain. 

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are Sheldon from the Big Bang Theory. The user is Howard Wolowitz. Answer the question of the user in typical Sheldon style."),
    ("user", "What do you think about {topic}?")
])

# Create an output parser
output_parser = StrOutputParser()

# Build a chain 
chain = prompt | llm | output_parser

# Use the chain and provide the template variables
print(chain.invoke({"topic": "Prof. Christian Drumm of the FH Aachen"}))

After executing the last query, check the [LangSmith](https://smith.langchain.com/) website. In the default project, you can use LangSmith to analyze the executed queries. 
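
The `|` operator used to build the chain above relies on Python's operator overloading: each element implements `__or__`, so that `a | b` yields a new element that feeds the output of `a` into `b`. The following is a simplified illustration of that idea, not LangChain's actual implementation. The `Step` class and the fake model are hypothetical stand-ins, so the sketch runs without Ollama.

```python
class Step:
    """Hypothetical stand-in for a chain element with an `invoke` method."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `self | other` returns a new step that runs self first, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Fill the template variables, similar in spirit to a prompt template.
prompt = Step(lambda variables: "What do you think about {topic}?".format(**variables))

# Fake model: wraps the prompt in a message-like dict instead of calling an LLM.
fake_llm = Step(lambda text: {"content": f"[model reply to: {text}]"})

# Extract the plain string from the message, similar in spirit to an output parser.
parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | parser
print(chain.invoke({"topic": "flags"}))
```

Because every composed chain is itself a `Step`, it can be extended with further elements in the same way.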

## Answering questions about documents
The next step is to use retrieval-augmented generation (RAG) to answer queries about documents. We will use the website [Semestertermine](https://www.fh-aachen.de/fachbereiche/wirtschaft/rund-ums-studium/semestertermine) (semester dates) of the FB7 as a basis. 
First, we load the document using LangChain's `WebBaseLoader`.

In [29]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.fh-aachen.de/fachbereiche/wirtschaft/rund-ums-studium/semestertermine")

docs = loader.load()

To be able to answer questions about documents, a special representation of these documents is needed. This representation is called an *embedding*. 
The [LangChain documentation](https://python.langchain.com/docs/modules/data_connection/text_embedding/)
contains the following explanation of embeddings. 

> Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
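
To make "most similar in the vector space" concrete, the sketch below ranks a few texts by cosine similarity to a query vector. Cosine similarity is one common similarity measure, used here as an assumption; a vector store may use a different one. The three-dimensional vectors are hand-made toy values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors; real embeddings are produced by a model and
# typically have hundreds or thousands of dimensions.
toy_embeddings = {
    "exam dates": [0.9, 0.1, 0.0],
    "lecture period": [0.7, 0.3, 0.1],
    "cafeteria menu": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "When are the exams?"

# Sort the texts from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda text: cosine_similarity(query, toy_embeddings[text]),
    reverse=True,
)
print(ranked)
```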

These embeddings are stored in a vector database. LangChain supports different databases for storing embeddings. For this tutorial, FAISS will be used. FAISS can be installed using the following command.

In [None]:
!pip install faiss-cpu

Once FAISS is installed we can create the embeddings for the document and store them in the database.

In [30]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create the Ollama embeddings
embeddings = OllamaEmbeddings(model="llama3")

# Create a splitter to split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

# Store the embeddings in the vector store
vector = FAISS.from_documents(documents, embeddings)
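
The splitting step can be illustrated with a much simpler strategy than the one used above. The sketch below cuts a text into fixed-size character chunks that overlap, so that a sentence falling on a chunk boundary still appears in full in at least one chunk. It is not the algorithm of `RecursiveCharacterTextSplitter`, which additionally prefers to break at separators such as paragraph and line breaks. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The splitter used in the notebook also accepts `chunk_size` and `chunk_overlap` arguments, so the chunking can be tuned to the document and the model.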

Now that we have a suitable representation of the document we can create a chain to answer questions about the document.

In [32]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

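# The prompt is written in German because the source page is in German.
# English: "Answer the following question using only the provided context: ... Question: {input}"
prompt = ChatPromptTemplate.from_template(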
    """Beantworte die folgende Frage nur mit dem zur Verfügung gestellten Kontext:

    <context>
    {context}
    </context>

    Frage: {input}""")


document_chain = create_stuff_documents_chain(llm, prompt)

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

# English: "When does the exam phase take place in the summer semester?"
response = retrieval_chain.invoke({"input": "Wann findet im Sommersemester die Prüfungsphase statt?"})
print(response["answer"])



Die Prüfungsphase im Sommersemester 2024 findet vom 15.7.-2.8.2024 statt.

(English: The exam phase in the summer semester 2024 takes place from 15.7. to 2.8.2024.)
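
Conceptually, the retrieval chain first asks the retriever for the chunks that best match the question, and the "stuff" documents chain then inserts those chunks into the `{context}` placeholder of the prompt before the model is called. The sketch below mimics these two steps without an LLM or a vector store. The word-overlap retriever, the helper names and the example chunks are assumptions made for illustration only; a real retriever compares embeddings.

```python
PROMPT_TEMPLATE = """Answer the following question using only the provided context:

<context>
{context}
</context>

Question: {input}"""


def retrieve(question, chunks, k=2):
    """Toy retriever: rank chunks by the number of words shared with the question."""
    question_words = set(question.lower().split())

    def overlap(chunk):
        return len(question_words & set(chunk.lower().split()))

    return sorted(chunks, key=overlap, reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Stuff the k best chunks into the {context} slot of the template."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return PROMPT_TEMPLATE.format(context=context, input=question)


# Made-up chunks for illustration only; not the content of the real web page.
chunks = [
    "The exam phase of the summer semester is in July.",
    "The library is closed on public holidays.",
    "Lectures of the summer semester start in spring.",
]
question = "When is the exam phase of the summer semester?"
print(build_prompt(question, chunks))
```

Only the retrieved chunks reach the model, which is why the quality of the embeddings and of the chunking directly affects the answer.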


## References
- [LangChain quickstart](https://python.langchain.com/docs/get_started/quickstart/)
- [LangChain API Reference](https://api.python.langchain.com/)
- [Retrieval augmented generation (RAG) from scratch](https://python.langchain.com/docs/get_started/introduction/)