## Dense retrieval example

 Let’s take a look at a dense retrieval example by using Cohere to search the
 Wikipedia page for the film Interstellar. In this example, we will do the
 following:
 1. Get the text we want to make searchable and apply some light
 processing to chunk it into sentences.
 2. Embed the sentences.
 3. Build the search index.
 4. Search and see the results.

In [1]:
#import
import numpy as np 
import cohere
import pandas as pd 
from tqdm import tqdm 

In [2]:
import os
api_key= os.environ['COHERE_API_KEY']  #assumed variable name; avoids hard-coding the secret

In [3]:
#create and retrieve a Cohere API key from os.cohere.ai
co= cohere.Client(api_key)

### Getting the text archive and chunking it

Let’s use the first section of the Wikipedia article on the film Interstellar.
We’ll get the text, then break it into sentences:

In [4]:
text = """
 Interstellar is a 2014 epic science fiction film co-written, 
directed, and produced by Christopher Nolan. 
It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, 
Bill Irwin, Ellen Burstyn, Matt Damon, and Michael Caine. 
Set in a dystopian future where humanity is struggling to 
survive, the film follows a group of astronauts who travel 
through a wormhole near Saturn in search of a new home for 
mankind.
 Brothers Christopher and Jonathan Nolan wrote the screenplay, 
which had its origins in a script Jonathan developed in 2007. 
Caltech theoretical physicist and 2017 Nobel laureate in 
Physics[4] Kip Thorne was an executive producer, acted as a 
scientific consultant, and wrote a tie-in book, The Science of 
Interstellar. 
Cinematographer Hoyte van Hoytema shot it on 35 mm movie film in 
the Panavision anamorphic format and IMAX 70 mm. 
Principal photography began in late 2013 and took place in 
Alberta, Iceland, and Los Angeles. 
Interstellar uses extensive practical and miniature effects and 
the company Double Negative created additional digital effects.
 Interstellar premiered on October 26, 2014, in Los Angeles. 
In the United States, it was first released on film stock, 
expanding to venues using digital projectors. 
The film had a worldwide gross over $677 million (and $773 
million with subsequent re-releases), making it the tenth-highest 
grossing film of 2014. 
It received acclaim for its performances, direction, screenplay, 
musical score, visual effects, ambition, themes, and emotional 
weight. 
It has also received praise from many astronomers for its 
scientific accuracy and portrayal of theoretical astrophysics. 
Since its premiere, Interstellar gained a cult following,[5] and 
now is regarded by many sci-fi experts as one of the best 
science-fiction films of all time.
 Interstellar was nominated for five awards at the 87th Academy 
Awards, winning Best Visual Effects, and received numerous other 
accolades"""

In [5]:
#split into list of sentences
texts= text.split('.')

#clean up: strip leading and trailing newlines from each sentence
texts= [t.strip('\n') for t in texts]
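Splitting on `.` and stripping only newlines leaves stray spaces and line wraps inside each chunk. A slightly more defensive variant can be sketched as follows. `split_sentences` is a hypothetical helper, not part of the original notebook, and it still assumes that every period ends a sentence, so it would break on abbreviations or decimals:

```python
import re

def split_sentences(text):
    """Split text on periods, trim whitespace, and drop empty chunks."""
    # Collapse newlines and repeated spaces so line wraps do not end up inside a chunk.
    flat = re.sub(r"\s+", " ", text)
    chunks = [chunk.strip() for chunk in flat.split(".")]
    return [chunk for chunk in chunks if chunk]

sample = """
 Interstellar is a 2014 epic science fiction film co-written, 
directed, and produced by Christopher Nolan. 
It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, 
Bill Irwin, Ellen Burstyn, Matt Damon, and Michael Caine. 
"""
sentences = split_sentences(sample)
print(len(sentences))
```

Cleaner chunks would also change the embeddings slightly, so distances computed from them would not match the outputs recorded in this notebook exactly.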

### Embedding the text chunks


 Let’s now embed the texts. We’ll send them to the Cohere API, and get back
 a vector for each text:

In [6]:
#get the embeddings
response= co.embed(
    texts= texts, 
    input_type= 'search_document',

).embeddings

embeds= np.array(response)
print(embeds.shape)

(15, 4096)
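Fifteen sentences fit comfortably in one request, but a longer archive would need to be sent in batches. The sketch below shows one way to do that. `embed_in_batches` and `fake_embed` are hypothetical names, and the default batch size of 96 is an assumption about the API's per-request limit that should be checked against the current Cohere documentation:

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=96):
    """Embed texts batch by batch and stack the results into one (n, dim) array."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # embed_fn takes a list of strings and returns one vector per string.
        vectors.extend(embed_fn(batch))
    return np.array(vectors, dtype=np.float32)

# Stand-in embedder for illustration only: maps each text to a 3-d vector.
def fake_embed(batch):
    return [[len(t), t.count(" "), 1.0] for t in batch]

demo = embed_in_batches(["a b", "cde", "f g h", "ij"], fake_embed, batch_size=3)
print(demo.shape)
```

With the client created earlier, `embed_fn` could be something like `lambda batch: co.embed(texts=batch, input_type='search_document').embeddings`, which reuses the call shown above.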


### Building the search index

 Before we can search, we need to build a search index. An index stores the
 embeddings and is optimized to quickly retrieve the nearest neighbors even
 if we have a very large number of points:

In [7]:
import faiss #used for fast similarity search and clustering of dense vectors
dim=embeds.shape[1]
index= faiss.IndexFlatL2(dim)
print(index.is_trained)
index.add(np.float32(embeds))


True
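`IndexFlatL2` is the simplest Faiss index: it compares the query against every stored vector and reports squared L2 distances. To make that concrete, here is a minimal NumPy sketch of the same computation. `l2_search` is a hypothetical helper for illustration, not a Faiss function:

```python
import numpy as np

def l2_search(vectors, query, k=5):
    """Return (squared L2 distances, row ids) of the k vectors closest to query."""
    diffs = vectors - query              # broadcast the query against every row
    dists = np.sum(diffs ** 2, axis=1)   # squared Euclidean distance per row
    ids = np.argsort(dists)[:k]          # smallest distance first
    return dists[ids], ids

points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]], dtype=np.float32)
dists, ids = l2_search(points, np.array([2.0, 2.0], dtype=np.float32), k=2)
print(ids, dists)
```

Because the distances are squared and the embeddings are not normalized, the values returned by the index can be large. Only their relative order matters for ranking.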


### Search the index

We can now search the dataset with any query we want. We simply embed
the query and pass its embedding to the index, which retrieves the
most similar sentences from the Wikipedia article.

In [8]:
#function for searching

def search(query, number_of_results=5):

    #get the query embedding
    query_embed= co.embed(texts= [query], input_type='search_query').embeddings[0]

    #retrieve the nearest neighbours
    distances, similar_items_ids= index.search(np.float32([query_embed]), number_of_results)

    #get the results
    texts_np = np.array(texts)
    results=pd.DataFrame(data= {'texts': texts_np[similar_items_ids[0]],
                            'distances': distances[0]})
    
    #print the query and return the results
    print(f"Query:'{query}' \n Nearest Neighbours:")
    return results

In [9]:
#testing
query ="how precise was the science"
results= search(query)
results 

Query:'how precise was the science' 
 Nearest Neighbours:


| | texts | distances |
|---|---|---|
| 0 | \nIt has also received praise from many astro... | 10267.427734 |
| 1 | \nSince its premiere, Interstellar gained a c... | 12490.473633 |
| 2 | \nCaltech theoretical physicist and 2017 Nobe... | 12507.566406 |
| 3 | \nInterstellar uses extensive practical and m... | 12546.205078 |
| 4 | \nCinematographer Hoyte van Hoytema shot it o... | 13720.408203 |


## Reranking example

A reranker takes a search query and a number of search results, and
returns these documents reordered so that the ones most relevant to the
query rank highest. Cohere’s Rerank endpoint is a simple way to start
using a reranker. We simply pass it the query and the texts and get the
results back. We don’t need to train or tune it:

In [10]:
query ="how precise was the science"
results= co.rerank(query=query, documents=texts, top_n=3, return_documents=True)
results.results

for idx, result in enumerate(results.results):
    print(idx, result.relevance_score, result.document.text)

0 0.15239799  
It has also received praise from many astronomers for its 
scientific accuracy and portrayal of theoretical astrophysics
1 0.050354082  
The film had a worldwide gross over $677 million (and $773 
million with subsequent re-releases), making it the tenth-highest 
grossing film of 2014
2 0.0350424  Interstellar is a 2014 epic science fiction film co-written, 
directed, and produced by Christopher Nolan
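In a larger system, a reranker usually runs as a second stage over a short list of candidates from the first-stage search. The sketch below shows only that control flow. `rerank` and `keyword_overlap` are hypothetical helpers, and the keyword-overlap score is a toy stand-in for a trained reranking model such as the one behind the Rerank endpoint:

```python
def keyword_overlap(query, doc):
    """Toy relevance score: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

def rerank(query, candidates, score_fn, top_n=3):
    """Reorder first-stage candidates by score_fn, highest score first."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

candidates = [
    "It stars Matthew McConaughey",
    "Astronomers praised the science in the film",
    "The science was praised for how precise it was",
]
ranked = rerank("how precise was the science", candidates, keyword_overlap, top_n=2)
for score, doc in ranked:
    print(score, doc)
```

Swapping `keyword_overlap` for a function that returns a model's relevance score keeps the rest of the pipeline unchanged.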


# Grounded Generation with an LLM API

Let’s now turn our search system into a RAG system. We do that by adding
an LLM to the end of the search pipeline. We present the question and the
top retrieved documents to the LLM, and ask it to answer the question
given the context provided by the search results.

This generation step is called grounded generation because the retrieved
relevant information we provide to the LLM establishes a context that
grounds the LLM in the domain we’re interested in.

Let’s look at how to add a grounded generation step after the search results
to create our first RAG system. For this example, we’ll use Cohere’s
managed LLM, which builds on the search systems we’ve seen earlier in
the chapter. We’ll use embedding search to retrieve the top documents, then
we’ll pass those to the co.chat endpoint along with the question to
get a grounded answer:

In [11]:
query= "income generated"

#1 - Retrieval
#we will use the embedding search
results= search(query)

#2 - Grounded generation

docs_dict= [{'text': text} for text in results['texts']]
response= co.chat(
    message = query, 
    documents= docs_dict
)

print(response.text)

Query:'income generated' 
 Nearest Neighbours:
The film Interstellar had a worldwide gross of over $677 million, and $773 million with subsequent re-releases, making it the tenth-highest grossing film of 2014.
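The `co.chat` endpoint accepts the documents directly and handles the grounding for us. With a model that only takes a text prompt, the same idea can be approximated by placing the retrieved documents in the prompt ourselves. The following is a minimal sketch. `build_grounded_prompt` and the prompt wording are assumptions for illustration, not what Cohere does internally:

```python
def build_grounded_prompt(question, documents):
    """Format retrieved documents and a question into one grounded prompt."""
    # Number each document so the answer can refer back to its source.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "The film had a worldwide gross over $677 million",
    "Interstellar premiered on October 26, 2014, in Los Angeles",
]
prompt = build_grounded_prompt("income generated", docs)
print(prompt)
```

The resulting string can be sent to any text-generation model. The quality of the answer then depends on how relevant the retrieved documents are.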


## RAG with Local Models

In [19]:
from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
	filename="Phi-3-mini-4k-instruct-fp16.gguf",
)



In [None]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embedding_model= HuggingFaceEmbeddings(
    model_name='thenlper/gte-small'
)


In [None]:
from langchain.vectorstores import FAISS

db= FAISS.from_texts(texts, embedding_model)


In [None]:
from langchain import PromptTemplate
#create a prompt template
template = """<|user|>
 Relevant information:
 {context}
 Provide a concise answer to the following question using the 
relevant information provided above:
 {question}<|end|>
 <|assistant|>"""

prompt = PromptTemplate(
    template= template,
    input_variables = ['context', 'question']
)


from langchain.chains import RetrievalQA

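#RAG pipeline
#note: RetrievalQA expects a LangChain LLM wrapper (for example LlamaCpp), not a raw llama_cpp.Llama object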

rag= RetrievalQA.from_chain_type(
    llm = llm, 
    chain_type= 'stuff',
    retriever= db.as_retriever(),
    chain_type_kwargs ={
        'prompt': prompt,
        'verbose': True
    }
)


In [14]:
import os
print(os.path.exists("Phi-3-mini-4k-instruct-fp16.gguf"))  # should return True


False
