Code referenced from - https://towardsdatascience.com/a-beginners-guide-to-building-a-retrieval-augmented-generation-rag-application-from-scratch-e52921953a5d

To do: perform a similarity measure



1.   Collect documents.
2.   Define the similarity criterion. Here we use Jaccard similarity (the size of the intersection divided by the size of the union of the two word sets).
3.   Find the most similar document and return it as the response.
4.   To avoid bad matches on negative examples, use an LLM (we run ollama locally).



# Step 1 - Collection of documents

In [1]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

# Step 2 - Define similarity measuring criteria


In [2]:
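def jaccard_similarity(query, document):
  # Lowercase both texts and split them into sets of words.
  query = set(query.lower().split())
  document = set(document.lower().split())
  intersection = query.intersection(document)
  union = query.union(document)
  # Jaccard similarity: shared words divided by all distinct words.
  return len(intersection) / len(union) if union else 0.0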
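
As a quick check of the definition, the sketch below recomputes the score for one query/document pair step by step. It lowercases the text and splits it into words in the same spirit as `jaccard_similarity`, but is written to run on its own; the helper name `token_set` is an assumption introduced only for this illustration.

```python
def token_set(text):
    # Lowercase the text and split it into a set of words.
    return set(text.lower().split())

query = token_set("I like to hike")
document = token_set("Go for a hike and admire the natural scenery.")

shared = query & document      # words present in both texts
combined = query | document    # words present in either text
score = len(shared) / len(combined)

print(sorted(shared))          # ['hike']
print(len(combined))           # 12
print(round(score, 4))         # 0.0833
```

Note that punctuation is not stripped, so "scenery." (with the period) counts as its own word.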


In [3]:
def return_response(query, corpus):
  # Score every document against the query and return the best match.
  similarities = []
  for doc in corpus:
    similarity = jaccard_similarity(query, doc)
    similarities.append(similarity)

  return corpus[similarities.index(max(similarities))]

In [4]:
user_prompt = "What is a leisure activity that you like?"

In [5]:
user_input = "I like to hike"

In [6]:
return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'

# 🙋‍♀️ Problem with this approach --> it does not respond well to negative prompts

Why? Because Jaccard similarity has no notion of semantics; it only counts the words that the query and a document have in common.

So for a negative prompt we get the same result, because that document is still the closest match.

In [7]:
user_input = "I do not like to hike"

In [8]:
return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'
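
To see why the negated prompt still lands on the hiking document, it can help to print the raw scores. The sketch below is a self-contained illustration using a three-document subset of the corpus and a compact helper named `jaccard` (an assumption for this example, not the notebook's function).

```python
corpus = [
    "Go for a hike and admire the natural scenery.",
    "Visit a local museum and discover something new.",
    "Take a yoga class and stretch your body and mind.",
]

def jaccard(query, document):
    # Word-overlap score: |intersection| / |union| of the two word sets.
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q | d)

for prompt in ("I like to hike", "I do not like to hike"):
    scores = [round(jaccard(prompt, doc), 3) for doc in corpus]
    print(prompt, "->", scores)
# I like to hike -> [0.083, 0.0, 0.0]
# I do not like to hike -> [0.071, 0.0, 0.0]
```

The words "do" and "not" only enlarge the union, which lowers the score slightly. The hiking document is still the only one that shares any word with the query, so it remains the top match.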

# ⭐️ Solution: Introduce an LLM
Running ollama locally

To dos:
1. Set up ollama (ollama.com)
2. Get user input
3. Fetch the most similar document using the Jaccard similarity criterion
4. Pass the prompt to the language model (LLM)
5. Return the result
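
The steps above can be sketched as a single function that takes the retrieval step and the model call as arguments. This is only an outline under stated assumptions: `build_prompt`, `recommend`, `first_document`, and `echo_model` are hypothetical names, and the two stand-in functions replace the real similarity search and the real LLM so that the example runs without a model server.

```python
def build_prompt(user_input, relevant_document):
    # Fill the retrieved document and the user's text into one prompt string.
    return (
        "You are a bot that makes recommendations for leisure activities. "
        f"This is the recommended activity: {relevant_document}\n"
        f"The user input is: {user_input}\n"
    )

def recommend(user_input, corpus, retrieve, generate):
    # Retrieve one document, build the prompt, and return the model's text.
    relevant_document = retrieve(user_input, corpus)
    return generate(build_prompt(user_input, relevant_document))

# Stand-ins (hypothetical) so the sketch runs without a model server.
def first_document(query, corpus):
    return corpus[0]

def echo_model(prompt):
    return "MODEL SAW: " + prompt

result = recommend(
    "I like to hike",
    ["Go for a hike and admire the natural scenery."],
    first_document,
    echo_model,
)
print(result)
```

Passing `retrieve` and `generate` in as arguments keeps the flow easy to try out: either one can be swapped for a real implementation without touching the other.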

In [9]:
import json
import requests

# Steps 2 and 3

In [10]:
user_input = "I like to hike"
relevant_document = return_response(user_input, corpus_of_documents) # response returned from similarity measure
full_response = []

# Step 4

In [11]:
prompt = """
You are a bot that makes recommendations for leisure activities. You answer in very short sentences. This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input.
"""

### Make an API call to ollama (llama2). Make sure ollama is running locally on your device (start it with ollama serve).

In [12]:
url = "http://localhost:11434/api/generate"
data = {
    "model": "llama2",
    "prompt": prompt.format(user_input = user_input, relevant_document = relevant_document)
}

In [13]:
# Post the request to the URL and read the streamed response

headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=json.dumps(data), stream = True)

try:
    for line in response.iter_lines():
        if line:
            # Each streamed line is a JSON object holding one fragment of the reply.
            decoded_line = json.loads(line.decode('utf-8'))
            full_response.append(decoded_line['response'])
finally:
    response.close()
# Fragments carry their own leading spaces, so join them without a separator.
print(''.join(full_response))

 Great! Based on your interest in hiking, I recommend checking out the nearby state parks or nature reserves for some amazing trails. Don't forget to bring plenty of water and snacks, and enjoy the scenery along the way!
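
The streamed reply arrives as one JSON object per line, each carrying a fragment of text in its `response` field. Below is a minimal sketch of assembling those fragments, written so it can run without a server. The helper name `collect_stream_text` and the sample lines are assumptions made for illustration; the lines are hand-written in the same shape, not captured from ollama.

```python
import json

def collect_stream_text(lines):
    # Each non-empty line is one JSON object; "response" holds a text fragment.
    fragments = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line.decode("utf-8"))
        fragments.append(chunk.get("response", ""))
    # Fragments already carry their own spaces, so join without a separator.
    return "".join(fragments)

# Hand-written sample lines (hypothetical), split mid-word like real tokens.
sample = [
    b'{"response": " Great", "done": false}',
    b'{"response": "!", "done": false}',
    b'',
    b'{"response": " Try", "done": false}',
    b'{"response": " h", "done": false}',
    b'{"response": "ik", "done": false}',
    b'{"response": "ing", "done": false}',
    b'{"response": ".", "done": false}',
    b'{"response": "", "done": true}',
]
print(repr(collect_stream_text(sample)))  # ' Great! Try hiking.'
```

Joining with a space instead would insert gaps inside any word that the model emits as several fragments.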


### Let's try a negative prompt

In [14]:
user_input = "I do not like to hike"
relevant_document = return_response(user_input, corpus_of_documents) # response returned from similarity measure
full_response = []

In [15]:
prompt = """
You are a bot that makes recommendations for leisure activities. You answer in very short sentences. This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input.
"""

In [16]:
url = "http://localhost:11434/api/generate"
data = {
    "model": "llama2",
    "prompt": prompt.format(user_input = user_input, relevant_document = relevant_document)
}

In [17]:
# Post the request to the URL and read the streamed response

headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=json.dumps(data), stream = True)

try:
    for line in response.iter_lines():
        if line:
            # Each streamed line is a JSON object holding one fragment of the reply.
            decoded_line = json.loads(line.decode('utf-8'))
            full_response.append(decoded_line['response'])
finally:
    response.close()
# Fragments carry their own leading spaces, so join them without a separator.
print(''.join(full_response))

Sorry to hear that you don't enjoy hiking! Here's an alternative recommendation for you:

How about visiting a local museum or art gallery? It's a great way to appreciate beautiful artwork and learn something new in a relaxed setting.
