<a href="https://colab.research.google.com/github/astrapi69/DroidBallet/blob/master/NLP_D3_4_L6_QA_Search_Engines_with_Transformers_and_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><a target="_blank" href="https://academy.constructor.org/"><img src="https://jobtracker.ai/static/media/constructor_academy_colour.b86fa87f.png" width="200" style="background:none; border:none; box-shadow:none;" /></a> </center>

_____

<center>Constructor Academy, 2024</center>

# Improved Question-Answering Search Engines with Transformers and Large Language Models (LLMs)


![](https://i.imgur.com/7SXKckD.png)

Transfer learning means taking models that have already been trained and fine-tuning or adapting them to our own downstream tasks.

# QA Search Engine using Transformers

## Retrieval and Re-ranking

In Semantic Search, we showed how to use SentenceTransformer to compute embeddings for queries, sentences, and paragraphs, and how to use these embeddings for semantic search.

For complex search tasks such as question-answering retrieval, search quality can be improved significantly with Retrieve & Re-Rank.


## Retrieve & Re-Rank Pipeline

A pipeline for information retrieval / question answering retrieval that works well is the following. All components are provided and explained in this notebook:

![](https://i.imgur.com/yIXJRSo.png)


Given a search query, we first use a retrieval system that returns a large list of, e.g., 100 hits that are potentially relevant for the query.
For the retrieval, we can use either lexical search, e.g. with ElasticSearch, or dense retrieval with a bi-encoder. Simple lexical search can be based on TF-IDF, BM25, etc.


However, the retrieval system might retrieve documents that are not that relevant for the search query.
Hence, in a second stage, we use a re-ranker based on a cross-encoder that scores the relevancy of all candidates for the given search query.

The output will be a ranked list of hits we can present to the user.


## Retrieval: Bi-Encoder

For the retrieval of the candidate set, we can either use lexical search (e.g. ElasticSearch) or a bi-encoder (semantic search), which is what this notebook uses.

Lexical search looks for literal matches of the query words in your document collection. It will not recognize synonyms, acronyms or spelling variations.
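A toy illustration of that limitation, assuming a deliberately naive scorer: `lexical_score` is a hypothetical helper that counts shared lowercase tokens. It is not BM25 and not what ElasticSearch does internally.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_score(query, document):
    """Count how many distinct query tokens appear literally in the document."""
    return len(tokenize(query) & tokenize(document))

# Made-up mini corpus for illustration
docs = [
    "Bern is the capital city of Switzerland.",
    "An automobile is a wheeled motor vehicle.",
]

# Literal overlap works when the query reuses the document's words.
print(lexical_score("capital of Switzerland", docs[0]))

# "car" and "automobile" are synonyms, but they share no tokens.
print(lexical_score("car", docs[1]))
```

The second query gets no credit for the synonym, which is the gap that dense retrieval is meant to close.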

In contrast, semantic search (or dense retrieval) encodes the search query into vector space and retrieves the document embeddings that are close in vector space.

For a given sentence or document, a Bi-Encoder produces a single embedding vector.
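"Close in vector space" is usually measured with cosine similarity. The following is a minimal sketch with made-up 3-dimensional vectors (real bi-encoder embeddings have hundreds of dimensions); `nearest_neighbours` is a hypothetical helper, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query_vec, corpus_vecs, k):
    """Return (index, similarity) pairs for the k most similar corpus vectors."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(corpus_vecs)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Hypothetical document embeddings and query embedding
corpus_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query_vec = [1.0, 0.05, 0.0]

nearest = nearest_neighbours(query_vec, corpus_vecs, k=2)
print(nearest)
```

The two vectors pointing in nearly the same direction as the query are returned first; the orthogonal one is left out.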


## Re-Ranker: Cross-Encoder

The retriever has to be efficient for large document collections with millions of entries. However, it might return irrelevant candidates.

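A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a candidate document are passed together to the transformer network, which outputs a single score indicating how relevant the document is for the given query. Depending on the model's output activation, this score is either squashed into the range 0 to 1 or left as an unbounded value, as with the model used later in this notebook.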

![](https://i.imgur.com/PFgkrcI.png)

The advantage of Cross-Encoders is higher ranking accuracy, because they apply attention jointly across the query and the document.
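Scores in the 0 to 1 range come from applying a sigmoid to the network's raw output. The cross-encoder scores printed later in this notebook (for example 8.157) are unbounded, which suggests they are raw values. As a minimal sketch, a sigmoid maps such values into (0, 1) without changing their order; the last sample value is made up.

```python
import math

def sigmoid(raw_score):
    """Map an unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# First two values appear in this notebook's outputs; -2.0 is a made-up low score
raw_scores = [8.157, 3.050, -2.0]
probabilities = [sigmoid(s) for s in raw_scores]

for raw, prob in zip(raw_scores, probabilities):
    print(f"{raw:7.3f} -> {prob:.4f}")
```

Because the sigmoid is monotonic, re-ranking by the raw score or by the squashed score gives the same order.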

Scoring thousands or millions of (query, document)-pairs would be rather slow. Hence, we use the retriever to create a set of e.g. 100 possible candidates which are then re-ranked by the Cross-Encoder.

First, you use an efficient Bi-Encoder to retrieve e.g. the top-100 most similar sentences for a query. Then, you use a Cross-Encoder to re-rank these 100 hits by computing the score for every (query, hit) combination.
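The two stages can be sketched independently of any model. In the example below both scorers are toy stand-ins (word overlap for the retriever, shared word pairs for the re-ranker), and all names are hypothetical. The point is only the control flow: the expensive scorer sees just the shortlisted candidates.

```python
import re

def words(text):
    """Lowercase word list without punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cheap_score(query, doc):
    """Stand-in retriever score: number of shared distinct words."""
    return len(set(words(query)) & set(words(doc)))

expensive_calls = []

def expensive_score(query, doc):
    """Stand-in re-ranker score: number of shared adjacent word pairs."""
    expensive_calls.append(doc)
    q, d = words(query), words(doc)
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))

def retrieve_and_rerank(query, corpus, retrieve_k, final_n):
    # Stage 1: score every document cheaply and keep the best retrieve_k.
    candidates = sorted(corpus, key=lambda doc: cheap_score(query, doc),
                        reverse=True)[:retrieve_k]
    # Stage 2: only the shortlisted candidates get the expensive score.
    reranked = sorted(candidates, key=lambda doc: expensive_score(query, doc),
                      reverse=True)
    return reranked[:final_n]

corpus = [
    "Mumbai is the financial capital of India.",
    "New Delhi is the capital of India.",
    "Fish live in water.",
    "India is a country in Asia.",
]
result = retrieve_and_rerank("What is the capital of India?", corpus,
                             retrieve_k=2, final_n=1)
print(result, len(expensive_calls))
```

Both city passages tie in the cheap stage, and the order-aware second stage separates them while scoring only two of the four documents.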





## Retrieve & Re-Rank Search Engine over Simple Wikipedia

This example demonstrates the Retrieve & Re-Rank setup and lets you search over Simple English Wikipedia.

You can input a query or a question. The script then uses semantic search to find relevant passages in Simple English Wikipedia.

### Install Dependencies

In [None]:
!pip install -U sentence-transformers


### Load Transformer Models, Wikipedia Data and Generate Embeddings

For semantic search, we use `SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')` and retrieve 32 potentially relevant passages that answer the input query.

Next, we use a more powerful cross-encoder, `CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')`, that scores the query against all retrieved passages for their relevancy. The cross-encoder further boosts the quality of the results.

MS MARCO is a large-scale information retrieval corpus that was created from real user search queries on the Bing search engine.

The provided models can be used for semantic search, i.e., given keywords / a search phrase / a question, the model will find passages that are relevant for the search query.

In [None]:
import json
from sentence_transformers import SentenceTransformer, CrossEncoder, util
import gzip
import os
import torch


# As dataset, we use Simple English Wikipedia. Compared to the full English wikipedia, it has only
# about 170k articles. We split these articles into paragraphs and encode them with the bi-encoder

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

if not os.path.exists(wikipedia_filepath):
    util.http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)

passages = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())

        #Add all paragraphs
        #passages.extend(data['paragraphs'])

        #Only add the first paragraph
        passages.append(data['paragraphs'][0])

print("Passages:", len(passages))

Passages: 169597


In [None]:
# Keep only passages that mention one of a few keywords so that everything runs faster.
# Note: a passage is added once per matching keyword, so the subset can contain repeats.

passages = [passage for passage in passages for x in ['fish', 'india', 'germany', 'north pole', 'nlp',
                                                      'natural language processing', 'linguistics',
                                                      'switzerland', 'machine learning', 'artificial intelligence',
                                                      'zürich', 'zurich'] if x in passage.lower() ]

In [None]:
len(passages)

12216
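The comprehension above appends a passage once for every keyword it contains, so the count includes repeated passages. A possible variant, not used in the rest of this notebook, keeps each matching passage once with `any`. The toy data below is made up to show the difference.

```python
keywords = ['fish', 'india', 'switzerland']

toy_passages = [
    'Flying fish live near India.',         # matches two keywords
    'Bern is the capital of Switzerland.',  # matches one keyword
    'Berlin is the capital of Germany.',    # matches none of these keywords
]

# Original pattern: one entry per (passage, keyword) match
with_repeats = [p for p in toy_passages for kw in keywords if kw in p.lower()]

# Variant: one entry per passage that matches at least one keyword
unique = [p for p in toy_passages if any(kw in p.lower() for kw in keywords)]

print(len(with_repeats), len(unique))
```

Switching to the variant would change the passage count and every downstream output, so the numbers shown in this notebook correspond to the original comprehension.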

In [None]:
passages[0]

'The Federal Ministry of Defence (BMVg) is a Ministry of the Federal Republic of Germany. At present the headquarters of the ministry are still in Bonn with 3230 working in the Hardthöhe. The second office employs about 500 people in the Bendlerblock in Berlin.'

In [None]:
if not torch.cuda.is_available():
    print("Warning: No GPU found. Please add GPU to your notebook")


# We use the bi-encoder to encode all passages so that we can use them for semantic search
bi_encoder = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
bi_encoder.max_seq_length = 256     # Truncate long passages to 256 tokens
top_k = 32                          # Number of passages we want to retrieve with the bi-encoder

# The bi-encoder retrieves top_k (here 32) passages. We then use a cross-encoder to re-rank them and improve the quality
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# We encode all passages into our vector space. This takes about 5 minutes (depends on your GPU speed)
corpus_embeddings = bi_encoder.encode(passages, convert_to_tensor=True, show_progress_bar=True)


In [None]:
passages[0]

'The Federal Ministry of Defence (BMVg) is a Ministry of the Federal Republic of Germany. At present the headquarters of the ministry are still in Bonn with 3230 working in the Hardthöhe. The second office employs about 500 people in the Bendlerblock in Berlin.'

In [None]:
corpus_embeddings[0], corpus_embeddings[0].shape

(tensor([ 4.8048e-02,  5.5979e-02, -1.1863e-01, -6.4569e-02,  4.3361e-02,
          1.0472e-01,  1.8832e-02,  1.2606e-01, -4.5151e-02,  2.1501e-02,
         -9.0600e-02, -6.1246e-02,  2.8600e-02, -1.1343e-02, -2.3748e-02,
          4.1271e-02,  2.1042e-02, -4.5739e-02,  1.6690e-02, -3.7229e-02,
         -3.0288e-02,  1.4004e-02,  5.6109e-02, -2.7961e-02,  2.8541e-02,
         -1.9813e-02, -1.9947e-02,  5.3365e-02,  9.2951e-02, -5.6196e-02,
         -1.2721e-02, -1.2962e-02,  1.4869e-02,  1.1327e-02,  8.7028e-02,
          8.4059e-02,  4.6508e-02, -3.3218e-02, -5.3665e-02,  4.8893e-02,
          8.9110e-02,  8.6201e-03, -4.6008e-04,  1.8595e-02,  9.7259e-03,
          7.5954e-02,  7.3336e-02, -3.3430e-02, -2.1883e-02, -7.0762e-02,
          8.2328e-02, -3.5604e-02, -3.1205e-03,  8.8906e-02,  1.9227e-02,
         -2.7538e-03, -1.4097e-02, -4.7318e-02,  4.8360e-02, -2.6323e-02,
         -2.9146e-02,  4.6869e-02, -4.7347e-02,  3.0356e-02,  1.1024e-02,
          8.9714e-03,  2.0443e-02,  6.

This function searches the encoded Wikipedia passages for those that best answer the query.

In [None]:
def search(query):
    print("Input question:", query)

    ##### Bi-Encoder: Semantic Search #####
    # Encode the query using the bi-encoder and find potentially relevant passages
    question_embedding = bi_encoder.encode(query, convert_to_tensor=True)
    # Keep the query on the same device as the corpus embeddings (works with or without a GPU)
    question_embedding = question_embedding.to(corpus_embeddings.device)
    hits = util.semantic_search(question_embedding, corpus_embeddings, top_k=top_k)
    hits = hits[0]  # Get the hits for the first query

    ##### Cross-Encoder: Re-Ranking #####
    # Now, score all retrieved passages with the cross_encoder
    cross_inp = [[query, passages[hit['corpus_id']]] for hit in hits]
    cross_scores = cross_encoder.predict(cross_inp)

    # Attach the cross-encoder score to each hit; the sorting happens below
    for idx in range(len(cross_scores)):
        hits[idx]['cross-score'] = cross_scores[idx]

    # Output of top-1 hits from bi-encoder
    print("\n-------------------------\n")
    print("Top Bi-Encoder Retrieval hit")
    hit = sorted(hits, key=lambda x: x['score'], reverse=True)[0]
    print("\t{:.3f}\t{}".format(hit['score'], passages[hit['corpus_id']].replace("\n", " ")))

    # Output of top-1 hits from re-ranker
    print("\n-------------------------\n")
    print("Top Cross-Encoder Re-ranker hit")
    hit = sorted(hits, key=lambda x: x['cross-score'], reverse=True)[0]
    print("\t{:.3f}\t{}".format(hit['cross-score'], passages[hit['corpus_id']].replace("\n", " ")))
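`search` prints only the single best hit of each stage. If several re-ranked hits are wanted, a small helper along these lines could be used; `top_reranked` is hypothetical, and the toy hits only mimic the dictionaries that `search` builds (`corpus_id`, `score`, `cross-score`).

```python
def top_reranked(hits, n=3):
    """Return the n hits with the highest cross-encoder score."""
    return sorted(hits, key=lambda hit: hit['cross-score'], reverse=True)[:n]

# Made-up hits in the same shape as the ones built inside search()
toy_hits = [
    {'corpus_id': 10, 'score': 0.60, 'cross-score': 1.2},
    {'corpus_id': 11, 'score': 0.58, 'cross-score': 7.1},
    {'corpus_id': 12, 'score': 0.40, 'cross-score': -3.4},
]

best = top_reranked(toy_hits, n=2)
print([hit['corpus_id'] for hit in best])
```

Note how the hit ranked second by the bi-encoder score moves to the top once the cross-encoder score is used.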

In [None]:
search(query = "What is the capital of Switzerland?")

Input question: What is the capital of Switzerland?

-------------------------

Top Bi-Encoder Retrieval hit
	0.679	Bern is the capital city of Switzerland. It is also the capital of the canton of Bern.

-------------------------

Top Cross-Encoder Re-ranker hit
	8.157	Bern is the capital city of Switzerland. It is also the capital of the canton of Bern.


In [None]:
search(query = "What is the largest city of Switzerland?")

Input question: What is the largest city of Switzerland?

-------------------------

Top Bi-Encoder Retrieval hit
	0.752	Geneva (, , , , ) is the second biggest city in Switzerland. Only Zürich is bigger. Geneva is the biggest city in Romandy (the French-speaking part of Switzerland).

-------------------------

Top Cross-Encoder Re-ranker hit
	8.092	Geneva (, , , , ) is the second biggest city in Switzerland. Only Zürich is bigger. Geneva is the biggest city in Romandy (the French-speaking part of Switzerland).


In [None]:
search(query = "What is the capital of Germany?")

Input question: What is the capital of Germany?

-------------------------

Top Bi-Encoder Retrieval hit
	0.647	Berlin is the capital city of Germany. It is also the biggest city in Germany. About 3,700,000 people live there.

-------------------------

Top Cross-Encoder Re-ranker hit
	7.922	Berlin is the capital city of Germany. It is also the biggest city in Germany. About 3,700,000 people live there.


In [None]:
search(query = "What is the capital of India?")

Input question: What is the capital of India?

-------------------------

Top Bi-Encoder Retrieval hit
	0.598	Mumbai (previously known as Bombay until 1996) is a natural harbor on the west coast of India, and is the capital city of Maharashtra state. It is India's largest city, and one of the world's most populous cities. It is the financial capital of India. The city is the second most-populous in the world. It has approximately 13 million people. Along with the neighboring cities of Navi Mumbai and Thane, it forms the world's 4th largest urban agglomeration. They have around 19.1 million people.

-------------------------

Top Cross-Encoder Re-ranker hit
	7.078	New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7 km. New Delhi has a population of about

In [None]:
search(query = "Coldest place on earth?")

Input question: Coldest place on earth?

-------------------------

Top Bi-Encoder Retrieval hit
	0.598	East Antarctica, also called Greater Antarctica, is the largest part (two-thirds) of the Antarctic continent. It is on the Indian Ocean side of the Transantarctic Mountains. It is the coldest, windiest, and driest part of Earth. East Antarctica holds the record as the coldest place on earth.

-------------------------

Top Cross-Encoder Re-ranker hit
	7.080	East Antarctica, also called Greater Antarctica, is the largest part (two-thirds) of the Antarctic continent. It is on the Indian Ocean side of the Transantarctic Mountains. It is the coldest, windiest, and driest part of Earth. East Antarctica holds the record as the coldest place on earth.


In [None]:
search(query = "What is natural language processing?")

Input question: What is natural language processing?

-------------------------

Top Bi-Encoder Retrieval hit
	0.773	Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.

-------------------------

Top Cross-Encoder Re-ranker hit
	10.696	Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.


In [None]:
search(query = "Can fish fly?")

Input question: Can fish fly?

-------------------------

Top Bi-Encoder Retrieval hit
	0.748	Flying fish are marine oceanic fishes of the family Exocoetidae. They are about 50 species, and they live worldwide in warm waters. They are noted for their ability to glide. They are all small, with a maximum length of about 45 cm (18 inches), and have winglike, rigid fins and an unevenly forked tail.

-------------------------

Top Cross-Encoder Re-ranker hit
	6.971	Flying fish are marine oceanic fishes of the family Exocoetidae. They are about 50 species, and they live worldwide in warm waters. They are noted for their ability to glide. They are all small, with a maximum length of about 45 cm (18 inches), and have winglike, rigid fins and an unevenly forked tail.


In [None]:
search(query = "How do you train a machine learning model?")

Input question: How do you train a machine learning model?

-------------------------

Top Bi-Encoder Retrieval hit
	0.593	In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.

-------------------------

Top Cross-Encoder Re-ranker hit
	3.050	In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually,

# QA Search Engine using Large Language Models - ChatGPT

Here we use an OpenAI model to generate contextual embeddings for each Wikipedia passage.

Then we use ChatGPT (GPT-3.5) to answer questions much as a human would: we look up the passages most similar to the input query and let the model answer from them.

The embedding model, `text-embedding-ada-002`, replaces five separate models for text search, text similarity, and code search, and outperforms OpenAI's previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.

GPT-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in the GPT-3.5 family is `gpt-3.5-turbo`, which has been optimized for chat using the Chat Completions API but also works well for traditional completion tasks.

### Load Dependencies

In [None]:
!pip install sentence_transformers
!pip install "openai<1.0.0"
!pip install "langchain==0.0.333"
!pip install tiktoken
!pip install faiss-gpu


### Load Wikipedia Data

### Load OpenAI LLMs

## Load OpenAI API Credentials

Here we load the API key from a file so we don't expose the credentials on the internet by mistake.

In [None]:
# Assumed purpose: Colab workaround that forces UTF-8 as the preferred encoding
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
import yaml

with open('chatgpt_api_credentials.yml', 'r') as file:
    api_creds = yaml.safe_load(file)

In [None]:
api_creds.keys()

dict_keys(['openai_key'])

In [None]:
import os

os.environ['OPENAI_API_KEY'] = api_creds['openai_key']

In [None]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.0)

### Generate LLM Embeddings and store them in FAISS index

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
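Conceptually, the simplest form of similarity search compares the query against every stored vector and returns the closest ones. The sketch below does that in plain Python with squared Euclidean distance. It does not call Faiss, and the vectors and the helper name `brute_force_search` are made up.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query, vectors, k):
    """Return the indices of the k stored vectors closest to the query."""
    distances = sorted((squared_l2(query, vec), idx)
                       for idx, vec in enumerate(vectors))
    return [idx for _, idx in distances[:k]]

# Hypothetical 2-dimensional "embeddings"
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]

closest = brute_force_search([1.0, 1.0], stored, k=2)
print(closest)
```

A dedicated library becomes worthwhile when the collection is too large for this kind of exhaustive loop to be fast enough.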

In [None]:
passages[:3]

['The Federal Ministry of Defence (BMVg) is a Ministry of the Federal Republic of Germany. At present the headquarters of the ministry are still in Bonn with 3230 working in the Hardthöhe. The second office employs about 500 people in the Bendlerblock in Berlin.',
 'Oberrüti is a municipality of the district of Muri in the canton of Aargau in Switzerland.',
 'Obersiggenthal is a municipality of the district of Baden, in the canton of Aargau in Switzerland.']

In [None]:
# The vectorstore we'll be using
from langchain.vectorstores import FAISS

# The embedding engine that will convert our text to vectors
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
from langchain.docstore.document import Document

docs = [Document(page_content=doc) for doc in passages]

In [None]:
docs[:3]

[Document(page_content='The Federal Ministry of Defence (BMVg) is a Ministry of the Federal Republic of Germany. At present the headquarters of the ministry are still in Bonn with 3230 working in the Hardthöhe. The second office employs about 500 people in the Bendlerblock in Berlin.'),
 Document(page_content='Oberrüti is a municipality of the district of Muri in the canton of Aargau in Switzerland.'),
 Document(page_content='Obersiggenthal is a municipality of the district of Baden, in the canton of Aargau in Switzerland.')]

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunked_docs = splitter.split_documents(docs)

In [None]:
chunked_docs[:3]

[Document(page_content='The Federal Ministry of Defence (BMVg) is a Ministry of the Federal Republic of Germany. At present the headquarters of the ministry are still in Bonn with 3230 working in the Hardthöhe. The second office employs about 500 people in the Bendlerblock in Berlin.'),
 Document(page_content='Oberrüti is a municipality of the district of Muri in the canton of Aargau in Switzerland.'),
 Document(page_content='Obersiggenthal is a municipality of the district of Baden, in the canton of Aargau in Switzerland.')]
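The roles of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is only a sketch: `chunk_text` is a hypothetical helper that cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraphs and spaces.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)

# Text shorter than chunk_size comes back as a single chunk
print(chunk_text("short", chunk_size=10, chunk_overlap=4))
```

The three passages shown above are each shorter than 500 characters, which is consistent with them coming out of the splitter unchanged.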

In [None]:
# Get your embeddings engine ready
gpt_embedding_model = OpenAIEmbeddings()

# Embed your documents and combine with the raw text in a db.
# This takes roughly 1 min
vector_db = FAISS.from_documents(chunked_docs, gpt_embedding_model)

In [None]:
# you can save your embeddings in a local vector database for future use
vector_db.save_local('wiki_faiss_index')

# load the pre-generated embeddings for the Wikipedia corpus
new_db = FAISS.load_local("wiki_faiss_index", gpt_embedding_model)  # other vector stores: Weaviate, Pinecone, ChromaDB

In [None]:
new_db

<langchain.vectorstores.faiss.FAISS at 0x7e398ee40a90>

### Build a QA Retriever Engine

In [None]:
!pip install langchainhub

Collecting langchainhub
  Downloading langchainhub-0.1.14-py3-none-any.whl (3.4 kB)
Collecting types-requests<3.0.0.0,>=2.31.0.2 (from langchainhub)
  Downloading types_requests-2.31.0.20240106-py3-none-any.whl (14 kB)
Installing collected packages: types-requests, langchainhub
Successfully installed langchainhub-0.1.14 types-requests-2.31.0.20240106


In [None]:
from langchain import hub

In [None]:
prompt = hub.pull("rlm/rag-prompt")
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [None]:
from langchain.chains import RetrievalQA

In [None]:
qa_engine = RetrievalQA.from_chain_type(llm=llm,
                                        chain_type="stuff",
                                        retriever=new_db.as_retriever(),
                                        chain_type_kwargs={"prompt": prompt})
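The `stuff` chain type places all retrieved passages into a single prompt. A rough sketch of that step, using the template text printed above, follows; `build_prompt` and the sample passages are hypothetical, and LangChain's own formatting details may differ.

```python
# Template text copied from the prompt printed above
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)

def build_prompt(question, retrieved_passages):
    """Join the retrieved passages and insert them into the template."""
    context = "\n\n".join(retrieved_passages)
    return RAG_TEMPLATE.format(question=question, context=context)

prompt_text = build_prompt(
    "What is the capital of Switzerland?",
    [
        "Bern is the capital city of Switzerland.",
        "Geneva is the second biggest city in Switzerland.",
    ],
)
print(prompt_text)
```

Since everything is sent in one request, the number and length of retrieved passages are limited by the model's context window.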

In [None]:
query = "What is the capital of Switzerland?"
qa_engine({"query": query})

{'query': 'What is the capital of Switzerland?',
 'result': 'The capital of Switzerland is Bern.'}

In [None]:
query = "What is the largest city of Switzerland?"
qa_engine.run(query)

'Zürich is the largest city of Switzerland.'

In [None]:
query = "What is the capital of India?"
qa_engine.run(query)

'The capital of India is New Delhi.'

In [None]:
query = "Tell me some facts about the largest city of Switzerland?"
qa_engine.run(query)

'Bern is the capital city of Switzerland and the canton of Bern. It has a population of about 947,000 people.'

In [None]:
query = "Can fish fly? What about flying fish?"
qa_engine.run(query)

"Fish cannot fly in the traditional sense as they do not have the ability to lift off from the ground and sustain flight. However, flying fish have the ability to glide above the water's surface for short distances using their winglike fins."

In [None]:
query = "Who was the killer in Murder on the Orient Express?"
qa_engine.run(query)

"I don't know."