# Retrieval-Augmented Generation (RAG) with Fess

## Setup

### Run Fess

```
$ docker compose up -d
```

### Create Web Crawling Config

1. Log in to Fess as admin
2. In the left menu, click Crawler > Web
3. Create a web crawling config with the following settings:

- URL: `https://fess.codelibs.org/`
- Included URLs For Crawling: `https://fess.codelibs.org/.*`
- Excluded URLs For Crawling: `https://fess.codelibs.org/ja/.*`
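
The include and exclude settings above are regular expressions. As a rough illustration of which URLs they select, here is a minimal sketch. It assumes a pattern must match the whole URL and that an exclude match overrides an include match; `is_crawl_target` is a hypothetical helper, not part of Fess.

```python
import re

# Patterns copied from the crawling config above.
INCLUDED = [r"https://fess.codelibs.org/.*"]
EXCLUDED = [r"https://fess.codelibs.org/ja/.*"]


def is_crawl_target(url: str) -> bool:
    """Return True if url matches an include pattern and no exclude pattern.

    Assumption: full-match semantics, with excludes taking precedence.
    """
    included = any(re.fullmatch(p, url) for p in INCLUDED)
    excluded = any(re.fullmatch(p, url) for p in EXCLUDED)
    return included and not excluded


for candidate in (
    "https://fess.codelibs.org/index.html",
    "https://fess.codelibs.org/ja/index.html",
    "https://example.com/",
):
    print(candidate, is_crawl_target(candidate))
```

With these settings the Japanese pages under `/ja/` are skipped, so the index holds only one language version of the documentation.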

### Create Access Token

1. In the left menu, click System > Access Token
2. Create an access token with the following settings:

- Name: `ChatGPT`
- Permission: `{role}guest`

3. Check the value of the generated token


In [None]:
# Set the generated token value as access_token
access_token = "..."
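
If you prefer not to paste the token into the notebook, one option is to read it from an environment variable. This is only a sketch: the variable name `FESS_ACCESS_TOKEN` and the helper `load_access_token` are assumptions made for illustration.

```python
import os


def load_access_token(env_var: str = "FESS_ACCESS_TOKEN") -> str:
    """Read the Fess access token from an environment variable.

    FESS_ACCESS_TOKEN is a hypothetical name; use whatever fits your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to the token created in Fess")
    return token
```

With the variable exported before starting Jupyter, `access_token = load_access_token()` replaces the hard-coded assignment.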

### Start Crawler

1. In the left menu, click System > Scheduler
2. Start the Default Crawler job

## Run Ollama

```
$ mkdir ollama
$ docker run -v ./ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

If you have a GPU, run the following instead:

```
$ docker run --gpus=all -v ./ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

For more details, see [Ollama](https://github.com/ollama/ollama).
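
The model used later in this notebook (`llama2`) has to be available inside the container, for example via `docker exec ollama ollama pull llama2`. The sketch below checks this through Ollama's `/api/tags` endpoint, which lists local models. It assumes the port mapping from the commands above; `has_model` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is published on the port used by the docker run command.
OLLAMA_URL = "http://127.0.0.1:11434"


def has_model(tags: dict, model: str) -> bool:
    """Return True if the /api/tags payload lists the given model."""
    names = {m.get("name", "") for m in tags.get("models", [])}
    # Ollama reports untagged pulls as "<model>:latest".
    return model in names or f"{model}:latest" in names


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally available models from a running Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Once the container is running, `has_model(fetch_tags(), "llama2")` tells you whether the model still needs to be pulled.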

## Install LangChain module


In [None]:
!pip install langchain langchain-community lark

In [None]:
import re

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain.prompts import PromptTemplate
from langchain.retrievers import ChatGPTPluginRetriever
from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate

## Create Ollama instance

In [None]:
llm = Ollama(model="llama2")

In [None]:
# check if ollama works
llm.invoke("Tell me a joke")

## Create Prompt Template for Query Constructor

In [None]:
# The wording of this template depends on the LLM in use
QUERY_CONSTRUCTOR_TEMPLATE = PromptTemplate.from_template("Please extract the primary keyword(s) from this sentence, focusing on nouns, proper nouns, or terms central to the sentence's meaning, without explanations: \"{query}\"")

In [None]:
# check if the template works
QUERY_CONSTRUCTOR_TEMPLATE.format(query="How to install Fess")

## Create FessTextRetriever

In [None]:
class FessTextRetriever(ChatGPTPluginRetriever):

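    def _extract_query(self, query: str) -> str:
        """Ask the LLM for keywords and join them into a Fess search query."""
        # The parsing below depends on the LLM: it expects the reply to be
        # a numbered list such as "1. Fess".
        text = self._lc_kwargs["llm"].invoke(QUERY_CONSTRUCTOR_TEMPLATE.format(query=query))
        words = []
        for line in text.split("\n"):
            match = re.search(r"^\d+\.\s*(.*)", line.strip())
            if match:
                words.append(match.group(1))
        # Fall back to the original question if no numbered items were found.
        return " ".join(words) if words else query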

    def _create_request(self, query: str) -> tuple[str, dict, dict]:
        url = f"{self.url}/query"
        json = {
            "queries": [
                {
                    "query": self._extract_query(query),
                    "filter": self.filter,
                    "top_k": self.top_k,
                }
            ]
        }
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.bearer_token}",
        }
        return url, json, headers
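
To see what the keyword extraction step does without calling an LLM, the parsing can be tried on its own. This is a minimal sketch: `extract_keywords` is a hypothetical standalone function, and the sample reply is made up to resemble a numbered-list answer rather than taken from a real model.

```python
import re


def extract_keywords(llm_output: str) -> str:
    """Collect the items of a numbered list and join them with spaces."""
    words = []
    for line in llm_output.splitlines():
        match = re.search(r"^\d+\.\s*(.*)", line.strip())
        if match:
            words.append(match.group(1))
    return " ".join(words)


# Hypothetical LLM reply, used only to exercise the parser.
sample = "Sure! Here are the keywords:\n1. Fess\n2. install\n"
print(extract_keywords(sample))  # prints "Fess install"
```

In this sketch a reply that is not a numbered list, such as `Fess, install`, yields an empty string, which is why the regular expression has to be adapted to the output style of the model.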

In [None]:
retriever = FessTextRetriever(llm=llm, url="http://127.0.0.1:8080/chatgpt", bearer_token=access_token)

In [None]:
# check if Fess retriever works
retriever.get_relevant_documents("What is Fess")

## Create Retrieval Chain

In [None]:
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")
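
Conceptually, the "stuff" strategy places every retrieved passage into `{context}` and the question into `{input}`. The plain-Python sketch below mimics that; it assumes passages are joined with a blank line, and both `stuff_prompt` and the passages are placeholders for illustration, not LangChain code.

```python
PROMPT = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(question: str, passages: list[str], separator: str = "\n\n") -> str:
    """Join all passages into one context block and fill the template."""
    context = separator.join(passages)
    return PROMPT.format(context=context, input=question)


# Placeholder passages standing in for documents returned by the retriever.
text = stuff_prompt("How to install Fess", ["passage one", "passage two"])
print(text)
```

Because every passage goes into a single prompt, the number of retrieved documents has to stay within the context window of the model.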

In [None]:
chain = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, prompt))

In [None]:
response = chain.invoke({"input": "How to install Fess"})
response

In [None]:
response["answer"]
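
The dictionary returned by the retrieval chain holds the question under `input`, the retrieved documents under `context`, and the generated text under `answer`. The sketch below prints a compact summary with the beginning of each source passage. `summarize_response` is a hypothetical helper, and the sample data is fabricated only to show the shape, with `SimpleNamespace` standing in for LangChain documents.

```python
from types import SimpleNamespace


def summarize_response(response: dict, max_chars: int = 80) -> str:
    """Format the question, the answer, and a snippet of each source passage."""
    lines = [f"Q: {response['input']}", f"A: {response['answer']}"]
    for i, doc in enumerate(response.get("context", []), start=1):
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"[{i}] {snippet}")
    return "\n".join(lines)


# Stand-in for a real chain result; only the shape matters here.
fake_response = {
    "input": "example question",
    "context": [SimpleNamespace(page_content="first passage\nwith a line break")],
    "answer": "example answer",
}
print(summarize_response(fake_response))
```

Showing the sources next to the answer makes it easier to judge whether the model actually relied on the crawled pages.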