# Retrieval-Augmented Generation (RAG)
## Golden Retriever FAQ Retrieval System

This repository contains a Python-based FAQ retrieval system that answers common questions about Golden Retrievers. It uses fuzzy string matching and embedding-based semantic search to handle the different ways a question may be phrased, and returns the most relevant answer from a predefined FAQ dictionary. The final section wraps retrieval in a simple chatbot that falls back to a language model when no FAQ entry matches.

## Table of Contents
* **Section 1:** [Install required packages](#section-1)
* **Section 2:** [Predefined FAQs (the knowledge base)](#section-2)
* **Section 3:** [Approach 1: RapidFuzz](#section-3)
* **Section 4:** [Approach 2: Semantic Search using cosine_similarity from sklearn](#section-4)
* **Section 5:** [Approach 3: Chatbot](#section-5)

## Section 1. Install required packages <a name="section-1"></a>

In [9]:
!pip install rapidfuzz



In [1]:
!pip install transformers sentence-transformers faiss-cpu datasets numpy pandas torch scikit-learn



## Section 2. Predefined FAQs <a name="section-2"></a>

In [2]:
faq = {
    # Lifespan and Growth
    "What is the average lifespan of a Golden Retriever?": "The average lifespan of a Golden Retriever is around 10 to 12 years.",
    "How long do Golden Retrievers live?": "Golden Retrievers typically live for 10 to 12 years, depending on their health and care.",
    "When do Golden Retrievers stop growing?": "Golden Retrievers usually stop growing around 18 to 24 months of age.",
    "How big do Golden Retrievers get?": "Male Golden Retrievers usually weigh between 65 to 75 pounds, while females weigh between 55 to 65 pounds.",

    # Exercise and Activity
    "How much exercise do Golden Retrievers need?": "Golden Retrievers need at least one hour of exercise per day. This can include walking, running, or playing fetch.",
    "What activities do Golden Retrievers enjoy?": "Golden Retrievers enjoy swimming, fetching, hiking, running, and playing games like tug-of-war.",
    "Are Golden Retrievers good for hiking?": "Yes, Golden Retrievers are excellent hiking companions due to their endurance and love for outdoor activities.",

    # Diet and Feeding
    "What should I feed my Golden Retriever?": "Golden Retrievers should be fed a balanced diet with high-quality dry dog food. The amount depends on the dog's age, size, and activity level.",
    "How much food does a Golden Retriever need?": "An adult Golden Retriever typically needs between 2 to 3 cups of high-quality dog food daily, divided into two meals.",
    "Can Golden Retrievers eat human food?": "Some human foods, like chicken, carrots, and apples, are safe for Golden Retrievers, but avoid toxic foods like chocolate, grapes, and onions.",

    # Grooming and Care
    "How often should I groom my Golden Retriever?": "Golden Retrievers should be brushed 3 to 5 times a week to prevent matting and reduce shedding. Regular grooming is essential to maintain their coat health.",
    "Do Golden Retrievers shed a lot?": "Yes, Golden Retrievers shed heavily, especially during spring and fall. Regular brushing can help control the shedding.",
    "How often should I bathe my Golden Retriever?": "Golden Retrievers should be bathed every 6 to 8 weeks, or more often if they get dirty or start to smell.",
    "Do Golden Retrievers need haircuts?": "Golden Retrievers don't need regular haircuts, but trimming around their ears, paws, and tail can keep them looking neat.",

    # Training and Behavior
    "Are Golden Retrievers easy to train?": "Yes, Golden Retrievers are intelligent and eager to please, making them relatively easy to train with positive reinforcement.",
    "How can I train my Golden Retriever?": "Use positive reinforcement, such as treats and praise, to train your Golden Retriever. They respond well to consistency and patience.",
    "Do Golden Retrievers bark a lot?": "Golden Retrievers are not known to be excessive barkers, but they will bark to alert their owners of strangers or unfamiliar noises.",
    "Are Golden Retrievers good with kids?": "Yes, Golden Retrievers are known for their friendly and gentle nature, making them great companions for children.",
    "Do Golden Retrievers get along with other pets?": "Golden Retrievers generally get along well with other pets due to their friendly and social temperament.",

    # Health and Common Issues
    "What are common health issues in Golden Retrievers?": "Common health issues in Golden Retrievers include hip dysplasia, elbow dysplasia, cataracts, and heart problems.",
    "How can I prevent hip dysplasia in my Golden Retriever?": "To reduce the risk of hip dysplasia, ensure your Golden Retriever maintains a healthy weight, exercises regularly, and avoids excessive jumping during puppyhood.",
    "Are Golden Retrievers prone to cancer?": "Unfortunately, Golden Retrievers have a higher risk of cancer than many other breeds. Regular vet check-ups and a healthy lifestyle can help catch issues early.",
    "How often should I take my Golden Retriever to the vet?": "Golden Retrievers should visit the vet for an annual check-up. Puppies and seniors may need more frequent visits to monitor their health.",

    # Puppy Care
    "How do I care for a Golden Retriever puppy?": "Golden Retriever puppies need a balanced diet, plenty of exercise, and socialization. Puppy training is crucial to ensure good behavior as they grow.",
    "When should I start training my Golden Retriever puppy?": "Training should start as early as 8 weeks old, focusing on basic commands and housebreaking. Early socialization is important too.",
    "How much sleep does a Golden Retriever puppy need?": "Golden Retriever puppies need around 18 to 20 hours of sleep a day to support their rapid growth and development.",

    # General Traits and Characteristics
    "What is the temperament of a Golden Retriever?": "Golden Retrievers are known for their friendly, gentle, and loyal temperament. They are great family pets and are good with children and other animals.",
    "Are Golden Retrievers good family pets?": "Yes, Golden Retrievers are excellent family pets due to their friendly, gentle, and loyal nature.",
    "Why are Golden Retrievers so friendly?": "Golden Retrievers were bred to be companion dogs and hunting partners, which required a friendly and cooperative temperament.",
    "How intelligent are Golden Retrievers?": "Golden Retrievers are highly intelligent and rank among the top 10 most intelligent dog breeds. They are quick learners and eager to please.",
    "Do Golden Retrievers have a strong sense of smell?": "Yes, Golden Retrievers have an excellent sense of smell, which is why they are often used as search-and-rescue and detection dogs.",

    # Special Uses and Roles
    "Can Golden Retrievers be service dogs?": "Yes, Golden Retrievers are commonly used as service dogs due to their intelligence, calm demeanor, and eagerness to help.",
    "Are Golden Retrievers good therapy dogs?": "Golden Retrievers make excellent therapy dogs because of their calm, gentle, and friendly nature, which makes them great for emotional support.",
    "Do Golden Retrievers make good hunting dogs?": "Yes, Golden Retrievers were originally bred as hunting dogs, specifically for retrieving waterfowl. They have a strong retrieving instinct and love water.",

    # Miscellaneous
    "Are Golden Retrievers good apartment dogs?": "Golden Retrievers can adapt to apartment living if they get enough exercise, but they generally do better in homes with yards.",
    "Do Golden Retrievers like water?": "Golden Retrievers love water and are excellent swimmers. They were originally bred to retrieve waterfowl, so swimming comes naturally to them.",
    "Are Golden Retrievers hypoallergenic?": "No, Golden Retrievers are not hypoallergenic. They shed a lot, which can trigger allergies in sensitive individuals."
}


## Section 3. Approach 1: RapidFuzz <a name="section-3"></a>

In [3]:
# Import required libraries
from rapidfuzz import fuzz, process

In [4]:
# Fuzzy matching function to retrieve the closest FAQ
def retrieve_with_fuzzy_faq(query, faq, threshold=70):
    result = process.extractOne(query, faq.keys(), scorer=fuzz.token_sort_ratio)
    if result:
        matched_question, score, *_ = result  # Unpack only the first two elements
        if score >= threshold:
            return faq[matched_question]
    return "I'm sorry, I don't have an answer for that."
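
To get a feel for what a token-sort score measures, the sketch below re-creates the idea in pure Python. It assumes that RapidFuzz's `fuzz.ratio` is a normalized Indel similarity (based on the longest common subsequence) and that `token_sort_ratio` sorts whitespace-separated tokens before comparing. The helper names are hypothetical, and this is an illustration of the idea rather than the library's implementation, so its scores may differ from RapidFuzz's.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: share of characters kept when only inserts/deletes are allowed."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Sort the whitespace-separated tokens of each string, then compare the results."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return indel_ratio(sorted_a, sorted_b)


# Compare one query against a few FAQ questions and print the scores
query = "Do Golden Retrievers need to play?"
candidates = [
    "Do Golden Retrievers shed a lot?",
    "How much exercise do Golden Retrievers need?",
    "What activities do Golden Retrievers enjoy?",
]
for candidate in candidates:
    print(round(token_sort_ratio_sketch(query, candidate), 1), candidate)
```

Because the score depends only on shared characters, two questions that share a long phrase such as "Do Golden Retrievers" can score highly even when they ask about different things.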

In [5]:
# Predefined Queries
query_1 = "How long do Golden Retrievers live?"
query_2 = "What should I feed my Golden Retriever?"
query_3 = "Do Golden Retrievers need to play?"
query_4 = "Do Golden Retrievers need to exercise?"
query_5 = "What is a Golden Retrievers lifespan?"

# Retrieve using Fuzzy Matching
print("Fuzzy Matching Results:")
print(retrieve_with_fuzzy_faq(query_1, faq))
print(retrieve_with_fuzzy_faq(query_2, faq))
print(retrieve_with_fuzzy_faq(query_3, faq))
print(retrieve_with_fuzzy_faq(query_4, faq))
print(retrieve_with_fuzzy_faq(query_5, faq))

Fuzzy Matching Results:
Golden Retrievers typically live for 10 to 12 years, depending on their health and care.
Golden Retrievers should be fed a balanced diet with high-quality dry dog food. The amount depends on the dog's age, size, and activity level.
Yes, Golden Retrievers shed heavily, especially during spring and fall. Regular brushing can help control the shedding.
Golden Retrievers don't need regular haircuts, but trimming around their ears, paws, and tail can keep them looking neat.
The average lifespan of a Golden Retriever is around 10 to 12 years.


## Section 4. Approach 2: Semantic Search using cosine_similarity from sklearn <a name="section-4"></a>

In [6]:
# Import required libraries
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [14]:
# Define a class and relevant functions
class FAQRetriever:
    def __init__(self, faq_dict):
        self.faq_dict = faq_dict
        self.questions = list(faq_dict.keys())
        self.answers = list(faq_dict.values())
        self.model_name = "sentence-transformers/all-MiniLM-L6-v2"  # Pre-trained model for embeddings
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModel.from_pretrained(self.model_name)
        self.question_embeddings = self.embed_questions(self.questions)

    # Method to embed questions using pre-trained model
    def embed_questions(self, questions):
        inputs = self.tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            model_output = self.model(**inputs)
        embeddings = model_output.last_hidden_state.mean(dim=1)
        return embeddings

    # Method to compute cosine similarity using sklearn
    def compute_similarity(self, query_embedding, question_embeddings):
        # Convert PyTorch tensors to numpy arrays for use with cosine_similarity from sklearn
        query_embedding = query_embedding.numpy()
        question_embeddings = question_embeddings.numpy()
        return cosine_similarity(query_embedding, question_embeddings)

    # Method to retrieve the most similar FAQ based on embedding
    def retrieve(self, query, top_k=1):
        inputs = self.tokenizer(query, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            query_embedding = self.model(**inputs).last_hidden_state.mean(dim=1)
        similarities = self.compute_similarity(query_embedding, self.question_embeddings)
        top_k_indices = similarities.argsort()[0][-top_k:][::-1]
        # return [(self.questions[i], self.answers[i]) for i in top_k_indices]
        return self.answers[top_k_indices[0]]
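
`embed_questions` averages over every token position, which includes the padding added when questions of different lengths are batched together. A common alternative is to weight the average by the attention mask so that padded positions do not contribute. The sketch below shows that arithmetic on a toy NumPy array. The function name `masked_mean_pool` is hypothetical, and whether this change would alter the results shown in this README has not been checked.

```python
import numpy as np


def masked_mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: shape (batch, seq_len, hidden)
    attention_mask:   shape (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clip the token count so an all-padding row cannot divide by zero
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch: the first sentence has two real tokens and one padding token
tokens = np.array([
    [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]],  # no padding
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

plain = tokens.mean(axis=1)             # padding leaks into the first row
masked = masked_mean_pool(tokens, mask)  # padding is ignored

print("plain mean:\n", plain)
print("masked mean:\n", masked)
```

The same arithmetic carries over to PyTorch tensors, using the `attention_mask` entry that the tokenizer returns alongside the input IDs.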

In [15]:
# Instantiate the FAQRetriever
faq_retriever = FAQRetriever(faq)

# Example Queries
query_1 = "How long do Golden Retrievers live?"
query_2 = "What should I feed my Golden Retriever?"
query_3 = "Do Golden Retrievers need to play?"
query_4 = "Do Golden Retrievers need to exercise?"
query_5 = "What is a Golden Retrievers lifespan?"

# Retrieve using Embedding-based Semantic Search with Cosine Similarity
print("\nEmbedding-based Search Results:")
print(faq_retriever.retrieve(query_1))
print(faq_retriever.retrieve(query_2))
print(faq_retriever.retrieve(query_3))
print(faq_retriever.retrieve(query_4))
print(faq_retriever.retrieve(query_5))




Embedding-based Search Results:
Golden Retrievers typically live for 10 to 12 years, depending on their health and care.
Golden Retrievers should be fed a balanced diet with high-quality dry dog food. The amount depends on the dog's age, size, and activity level.
Golden Retrievers generally get along well with other pets due to their friendly and social temperament.
Golden Retrievers need at least one hour of exercise per day. This can include walking, running, or playing fetch.
The average lifespan of a Golden Retriever is around 10 to 12 years.
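
Unlike the fuzzy-matching function, `retrieve` has no score threshold, so it returns the top-ranked answer even when the query is unrelated to every FAQ entry. The sketch below shows one way to rank candidates by cosine similarity and drop those below a cut-off. It uses plain Python lists so it runs without any model; the name `top_k_matches` and the threshold of 0.5 are placeholders chosen for illustration, not tuned values.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_matches(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
    min_score: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return up to k (index, score) pairs, best first, keeping only scores >= min_score."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, score) for i, score in scored[:k] if score >= min_score]


# Toy 2-D "embeddings" standing in for real sentence vectors
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
close_query = top_k_matches([1.0, 0.1], docs, k=2)
far_query = top_k_matches([-1.0, 0.0], docs, k=2)

print(close_query)  # indices 0 and 2 pass the cut-off
print(far_query)    # nothing passes, so the list is empty
```

An empty result can then be mapped to a "no answer" message or passed to a fallback, rather than returning an unrelated FAQ entry.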


## Section 5. Approach 3: Chatbot <a name="section-5"></a>
### TF-IDF matching with an LLM as a fallback

In [16]:
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

In [28]:
# Load Flan-T5 model
model_name = "google/flan-t5-base"  # You can also try "google/flan-t5-large" or "google/flan-t5-xl" if resources allow
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Fallback pipeline for Flan-T5
llm_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

# Predefined responses for greeting, thanks, and farewells
greetings = ["Hello! How can I help you today?", "Hi! What would you like to know about Golden Retrievers?", "Hey! Ask me anything about Golden Retrievers!"]
farewells = ["Goodbye! Have a great day!", "Bye! Hope I answered your questions!", "See you later!"]
thanks_responses = ["You're welcome!", "Glad I could help!", "No problem!", "Happy to assist!"]

# Generic chit-chat responses
chit_chat_responses = {
    "how are you": "I'm just a chatbot, but I'm doing great! How about you?",
    "what's your name": "I'm the Golden Retriever chatbot! Ask me anything about Golden Retrievers.",
    "what is your favorite color": "I don't have a favorite color, but Golden Retrievers come in beautiful shades of gold!"
}

# Vectorizer to match user queries with FAQ
vectorizer = TfidfVectorizer().fit(faq.keys())

# Function to retrieve an answer from the FAQ or chit-chat dictionary
def get_faq_response(user_input):
    user_input = user_input.lower().strip()

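    # Split the input into whole words so that short keywords such as "hi"
    # do not fire inside longer words like "hiking" or "this"
    words = set("".join(ch if ch.isalnum() else " " for ch in user_input).split())

    # Handle predefined greetings, farewells, and thanks
    if words & {"hi", "hello", "hey"}:
        return random.choice(greetings)
    if words & {"bye", "goodbye"} or "see you" in user_input:
        return random.choice(farewells)
    if "thank" in user_input:
        return random.choice(thanks_responses)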

    # Handle chit-chat
    for phrase, response in chit_chat_responses.items():
        if phrase in user_input:
            return response

    # Vectorize input and match against FAQ
    query_vec = vectorizer.transform([user_input])
    faq_vec = vectorizer.transform(faq.keys())
    similarity = cosine_similarity(query_vec, faq_vec).flatten()

    # Find the most similar question
    best_match_idx = similarity.argmax()
    best_match_score = similarity[best_match_idx]

    if best_match_score >= 0.6:  # Threshold for similarity
        matched_question = list(faq.keys())[best_match_idx]
        # print(best_match_score)
        # print(matched_question)
        return faq[matched_question]

    return None  # If no match is found

# Fallback to Flan-T5 for unrelated questions
def llm_fallback(query):
    prompt = f"This is a question-answering task. Answer the following question: {query}"
    response = llm_pipeline(prompt, max_length=100, num_return_sequences=1)
    return response[0]['generated_text']

# Previous attempt without the instruction prompt; its output was unstable and fairly random
# def llm_fallback(query):
#     response = llm_pipeline(query, max_length=100)
#     return response[0]['generated_text']


# Chatbot loop
def chatbot():
    print("Welcome to the Golden Retriever chatbot! Ask me anything about Golden Retrievers.")
    print("Type 'exit' to end the conversation.")

    while True:
        user_input = input("\nYou: ").strip()

        if user_input.lower() == "exit":
            print(random.choice(farewells))
            break

        # Get chatbot response
        response = get_faq_response(user_input)

        if response:
            print(f"\nBot: {response}")
        else:
            # Forward unrelated questions to Flan-T5
            llm_response = llm_fallback(user_input)
            print(f"\nBot: {llm_response} ")
            print("\nBot: Also, you should ask me about golden retrievers!")

# Run the chatbot
chatbot()


Welcome to the Golden Retriever chatbot! Ask me anything about Golden Retrievers.
Type 'exit' to end the conversation.

You: Hey hey

Bot: Hey! Ask me anything about Golden Retrievers!

You: How are you?

Bot: I'm just a chatbot, but I'm doing great! How about you?

You: I want to know how long my golden retriever will live for

Bot: Golden Retrievers typically live for 10 to 12 years, depending on their health and care.

You: ok, do golden retrievers shed alot?

Bot: Yes, Golden Retrievers shed heavily, especially during spring and fall. Regular brushing can help control the shedding.

You: Can I have my dog in the house?

Bot: Yes 

Bot: Also, you should ask me about golden retrievers!

You: can I have my golden retrienver as an indoor pet?

Bot: yes 

Bot: Also, you should ask me about golden retrievers!

You: exit
See you later!
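
In the session above, the fallback model sees only the user's question, so its answers ("Yes", "yes") do not draw on the FAQ. A retrieval-augmented variant would place the closest FAQ entries into the prompt before calling the model. The sketch below shows only the prompt-building step. The function name `build_rag_prompt` and the prompt wording are assumptions made for illustration, and the prompt has not been tested against Flan-T5.

```python
from typing import List, Tuple


def build_rag_prompt(query: str, retrieved: List[Tuple[str, str]], max_entries: int = 3) -> str:
    """Combine retrieved (question, answer) pairs and the user query into one prompt."""
    # Keep only the first few entries so the prompt stays short for a small model
    entries = retrieved[:max_entries]
    if entries:
        context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in entries)
    else:
        context = "(no relevant FAQ entries found)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Example with one FAQ entry taken from the dictionary in Section 2
example_prompt = build_rag_prompt(
    "Can I have my golden retriever as an indoor pet?",
    [(
        "Are Golden Retrievers good apartment dogs?",
        "Golden Retrievers can adapt to apartment living if they get enough exercise, "
        "but they generally do better in homes with yards.",
    )],
)
print(example_prompt)
```

The resulting string could be passed to `llm_pipeline` in place of the prompt built inside `llm_fallback`, with the retrieved pairs coming from either retriever in Sections 3 and 4.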
