## Increasingly Complex RAG Implementations

--- 

### The Simplest RAG Setup

A basic RAG system handles a query in three ordered steps:
1. The user sends a query.
2. The system compares the query against the corpus using a similarity measure and fetches the best-matching document(s).
3. The system post-processes the user input and the fetched document(s), usually with an LLM.

### Define the Corpus of "Documents"

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

### Preprocessing & Similarity Measure

* <ins>Preprocessing</ins>: To compare strings, we first convert each one into a set of words. An overly simple way to do this is to lowercase the string and split it on spaces.
* <ins>Similarity Measure</ins>: For this simple example we'll use Jaccard similarity, one of the simplest similarity measures: the size of the intersection of the two word sets divided by the size of their union.

In [None]:
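def jaccard_similarity(query, document):
    # Lowercase both strings and split on spaces to get sets of words
    query_words = set(query.lower().split(" "))
    document_words = set(document.lower().split(" "))
    # Jaccard similarity: size of the intersection over size of the union
    intersection = query_words.intersection(document_words)
    union = query_words.union(document_words)
    return len(intersection) / len(union)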
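
To make the measure concrete, here is a small worked example using one document from the corpus. It is only a sketch: the `jaccard`, `split_on_spaces`, and `tokenize` helpers and the query string are hypothetical names introduced for illustration, and the regex tokenizer is one assumed alternative, not part of the original setup. With space splitting, `"air."` and `"air"` count as different words, so the query shares only `"fresh"` with the document: 1 shared word out of 12 distinct words, or 1/12. Stripping punctuation first gives 2 shared words out of 11, or 2/11.

```python
import re

def jaccard(a_words, b_words):
    """Jaccard similarity of two sets of words."""
    return len(a_words & b_words) / len(a_words | b_words)

def split_on_spaces(text):
    # The preprocessing described above: lowercase, then split on spaces
    return set(text.lower().split(" "))

def tokenize(text):
    # Hypothetical alternative: keep only runs of letters, digits, and apostrophes
    return set(re.findall(r"[a-z0-9']+", text.lower()))

document = "Take a leisurely walk in the park and enjoy the fresh air."
query = "fresh air"

naive_score = jaccard(split_on_spaces(query), split_on_spaces(document))
clean_score = jaccard(tokenize(query), tokenize(document))

print(f"split on spaces: {naive_score:.3f}")       # 1/12
print(f"punctuation stripped: {clean_score:.3f}")  # 2/11
```

The gap between the two scores shows how much the "overly simple" preprocessing can affect even a tiny example.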

### Query Comparison Match

In [None]:
def return_response(query, corpus):
    """Return the document in the corpus most similar to the query.

    Args:
        query (str): The user's query
        corpus (list[str]): The documents to check

    Returns:
        str: The document with the highest Jaccard similarity to the query
    """
    similarities = []
    for doc in corpus:
        # Score each document against the query passed in as an argument
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    return corpus[similarities.index(max(similarities))]
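
A short usage sketch of the retrieval step follows. To keep it runnable on its own, it restates the similarity function in compact form and uses a hypothetical helper, `best_match`, that also returns the winning score. The three-document `sample_corpus` is a subset of the corpus above, and the `user_input` string is an invented example.

```python
def jaccard_similarity(query, document):
    query_words = set(query.lower().split(" "))
    document_words = set(document.lower().split(" "))
    return len(query_words & document_words) / len(query_words | document_words)

def best_match(query, corpus):
    # Score every document, then return the first one with the highest score
    scores = [jaccard_similarity(query, doc) for doc in corpus]
    top_score = max(scores)
    return corpus[scores.index(top_score)], top_score

sample_corpus = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Go for a hike and admire the natural scenery.",
    "Visit a local museum and discover something new.",
]

user_input = "I like to hike"
document, score = best_match(user_input, sample_corpus)
print(document)         # the hiking document
print(round(score, 3))  # 1 shared word out of 12 distinct words
```

Note that when a query shares no words with any document, every score is 0 and the first document in the corpus is returned, so a real system would likely want a minimum-score threshold.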