<a href="https://colab.research.google.com/github/arnavj007/startup-decision-support-system/blob/main/redisvl_gemma_quant_business.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

# RAG from scratch with the Redis Vector Library


We first walk through initial setup and data preparation, then build an **end-to-end RAG system from scratch** that uses the following technique:
- Dense content representation



## Environment Setup

### Pull GitHub Materials
For **Google Colab**, we first need to pull the necessary dataset and materials directly from GitHub.

In [None]:
# This clones the supporting git repository into a directory named 'temp_repo'.
!git clone https://github.com/arnavj007/redisvl-business.git temp_repo

# This moves requirements.txt into the working directory so it survives the cleanup below.
!mv temp_repo/requirements.txt .

# This deletes the 'temp_repo' directory, cleaning up the unwanted files.
!rm -rf temp_repo

### Set Notebook Display Properties
Configure the notebook's output cells to wrap text so that generated text displays properly.

In [None]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

### Install Python Dependencies

In [None]:
!pip install -r requirements.txt



In [None]:
import torch             # allows Tensor computation with strong GPU acceleration
from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
import os

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

### Hugging Face Login to Use InferenceClient for Dense Content Representation

In [None]:
from huggingface_hub import InferenceClient, notebook_login
notebook_login()


In [None]:
llm_client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

In [None]:
# Load the fine-tuned Gemma model with the 4-bit quantization config defined above
model_id = "arnavj007/gemma-business-instruct-finetune"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)






In [None]:
import warnings

warnings.filterwarnings("ignore")

### Install Redis Stack

Redis will be used to store, index, and query vector
embeddings created from document chunks. **We need to make sure we have a Redis
instance available.**

#### Localized Redis Stack
Use the shell script below to download, extract, and install [Redis Stack](https://redis.io/docs/getting-started/install-stack/) directly
from the Redis package archive.

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


gpg: cannot open '/dev/tty': No such device or address
curl: (23) Failed writing body


### Define the Redis Connection URL

By default, this notebook connects to the local Redis Stack instance. **If you have your own Redis Enterprise or Redis Cloud instance**, replace the `REDIS_PASSWORD`, `REDIS_HOST`, and `REDIS_PORT` values with your own.

In [None]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
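
The URL above hard-codes the `redis://` scheme and interpolates the password as-is. The sketch below shows one way to switch to `rediss://` for SSL endpoints and to percent-encode the password. The helper name `build_redis_url` and its `use_ssl` flag are hypothetical, introduced only for illustration; they are not part of RedisVL or redis-py.

```python
from urllib.parse import quote


def build_redis_url(host: str, port: str, password: str = "", use_ssl: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper for illustration)."""
    scheme = "rediss" if use_ssl else "redis"
    # Percent-encode the password so characters such as '@' or ':' cannot break URL parsing.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", "6379"))
print(build_redis_url("example-host", "6380", password="p@ss:word", use_ssl=True))
```

Unlike the f-string above, this sketch omits the `:@` credentials segment entirely when no password is set.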

## Simplified Vector Search with RedisVL

In [None]:
from langchain_community.document_loaders import HuggingFaceDatasetLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

dataset_name = "suku9/business_news_sentiment"
page_content_column = "news"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)

data = loader.load()

# data = data[:3000] # use 3000 cases

### Text embedding generation with RedisVL
RedisVL has built-in extensions and utilities to aid the GenAI development process.

In [None]:
from redisvl.utils.vectorize import HFTextVectorizer

hf = HFTextVectorizer("sentence-transformers/all-MiniLM-L6-v2")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Embed each chunk content
embeddings = hf.embed_many([rec.page_content for rec in data])

# Check to make sure we've created enough embeddings, 1 per document chunk
len(embeddings) == len(data)

11:03:15 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: cuda
11:03:15 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2



True

### Define a schema and create an index

Below we define a schema with a tag field (`doc_id`), a text field (`content`), and a vector field (`text_embedding`), then connect to Redis and create the index.

In [None]:
from redis import Redis
from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "redisvl"

# Define the schema once as a plain dict; it is reused later for the async index.
schema = {
  "index": {
    "name": index_name,
    "prefix": "record"
  },
  "fields": [
    {
        "name": "doc_id",
        "type": "tag",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "content",
        "type": "text"
    },
    {
        "name": "text_embedding",
        "type": "vector",
        "attrs": {
            "dims": hf.dims,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
  ]
}

# Build the IndexSchema object from the same dict.
redis_schema = IndexSchema.from_dict(schema)

In [None]:
# connect to redis
client = Redis.from_url(REDIS_URL)

# create an index from schema and the client
index = SearchIndex(redis_schema, client)
index.create(overwrite=True, drop=True)

In [None]:
# use the RedisVL CLI tool to list all indices
!rvl index listall

11:07:25 [RedisVL] INFO   Indices:
11:07:25 [RedisVL] INFO   1. llmcache
11:07:25 [RedisVL] INFO   2. redisvl


In [None]:
# get info about the index
!rvl index info -i redisvl



Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ redisvl      │ HASH           │ ['record'] │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name           │ Attribute      │ Type   │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │ Field Option   │   Option Value │ Field Option    │   Option Value │
├────────────────┼────────────────┼────────┼────────────────┼────────────

### Process and load dataset
Below we use the RedisVL index to load the list of documents into Redis.

In [None]:
# index.load() expects an iterable of dictionaries
from redisvl.redis.utils import array_to_buffer

dics = [
    {
        'doc_id': f'{i}',
        # Append the dataset's sentiment label so it is retrieved together with the text
        'content': rec.page_content + f'\n actual sentiment: {rec.metadata.get("actual_sentiment")}',
        # HASH storage requires the embedding as raw bytes rather than a list of floats
        'text_embedding': array_to_buffer(embeddings[i], dtype='float32')
    } for i, rec in enumerate(data)
]

# RedisVL handles batching automatically
keys = index.load(dics, id_field="doc_id")
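
To make the bytes conversion concrete, here is a minimal standard-library sketch of packing a float vector into a float32 byte buffer and reading it back. The two function names are hypothetical, and `struct` stands in for the NumPy-based `array_to_buffer`; this is an illustration of the idea, not RedisVL's implementation.

```python
import struct


def floats_to_float32_bytes(vector: list[float]) -> bytes:
    """Pack floats as little-endian float32 values (illustrative stand-in)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def float32_bytes_to_floats(buffer: bytes) -> list[float]:
    """Inverse of the packing step: every float32 value occupies 4 bytes."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vector = [0.25, -1.5, 3.0]  # values that are exactly representable in float32
buffer = floats_to_float32_bytes(vector)
print(len(buffer))                      # 4 bytes per dimension
print(float32_bytes_to_floats(buffer))  # round-trips to the original values
```

At 4 bytes per dimension, a 384-dimensional embedding (the output size of `all-MiniLM-L6-v2`) takes 1536 bytes per record.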

### Query the database
Now we can use the RedisVL index to perform similarity searches against Redis.

In [None]:
from redisvl.query import VectorQuery

query = "my business is drowning"

query_embedding = hf.embed(query)

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=3,
    return_fields=["doc_id", "content"],
    return_score=True
)

# show the raw redis query
str(vector_query)


'*=>[KNN 3 @text_embedding $vector AS vector_distance] RETURN 3 doc_id content vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 3'

In [None]:
# paginate through results
for result in index.paginate(vector_query, page_size=1):
    print(result[0]["doc_id"], result[0]["vector_distance"], flush=True)

100 0.683950483799
765 0.692428588867
131 0.693985462189
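
The `vector_distance` values above come from the `cosine` metric declared in the schema, where a smaller value means a closer match. As a rough illustration, cosine distance can be computed as one minus cosine similarity; the function below is a plain-Python sketch of that formula, not the code Redis runs.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

On this scale, the distances of roughly 0.68 to 0.69 returned for "my business is drowning" indicate only a moderate match between the query and the top documents.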


### Sort by alternative fields

In [None]:
# Sort by doc_id field after vector search limits to topK
vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="text_embedding",
    num_results=4,
    return_fields=["doc_id"],
    return_score=True
)

# Decompose vector_query into the core query and the params
# (named base_query so the `query` string defined earlier is not overwritten)
base_query = vector_query.query
params = vector_query.params

# Pass the query and params directly to index.search()
result = index.search(
    base_query.sort_by("doc_id", asc=True),
    params
)

print(result.docs)

[doc.__dict__ for doc in result.docs]

[Document {'id': 'record:100', 'payload': None, 'vector_distance': '0.683950483799', 'doc_id': '100'}, Document {'id': 'record:131', 'payload': None, 'vector_distance': '0.693985462189', 'doc_id': '131'}, Document {'id': 'record:720', 'payload': None, 'vector_distance': '0.695949018002', 'doc_id': '720'}, Document {'id': 'record:765', 'payload': None, 'vector_distance': '0.692428588867', 'doc_id': '765'}]


[{'id': 'record:100',
  'payload': None,
  'vector_distance': '0.683950483799',
  'doc_id': '100'},
 {'id': 'record:131',
  'payload': None,
  'vector_distance': '0.693985462189',
  'doc_id': '131'},
 {'id': 'record:720',
  'payload': None,
  'vector_distance': '0.695949018002',
  'doc_id': '720'},
 {'id': 'record:765',
  'payload': None,
  'vector_distance': '0.692428588867',
  'doc_id': '765'}]
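
Because `doc_id` is declared as a `tag` field, its values are strings, so it is reasonable to assume the sort is lexicographic rather than numeric. The output above cannot show the difference because all four IDs have three digits. The sketch below uses the same IDs plus two hypothetical ones (`"1000"` and `"99"`) to show where the two orderings diverge.

```python
# The first four IDs come from the result above; "1000" and "99" are hypothetical additions.
doc_ids = ["100", "131", "720", "765", "1000", "99"]

# String (lexicographic) ordering compares character by character.
print(sorted(doc_ids))

# Numeric ordering requires converting each ID to an integer first.
print(sorted(doc_ids, key=int))
```

If numeric ordering matters, one option is to store the ID in a numeric field instead of a tag field.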

### Add filters to vector queries

### Range queries in RedisVL
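
A range query returns every record whose distance to the query vector falls within a threshold, instead of a fixed number of nearest neighbours. RedisVL exposes this through `RangeQuery` and its `distance_threshold` parameter. The plain-Python sketch below only illustrates the difference in semantics, using the distances printed earlier in this notebook; the helper names `knn` and `within_range` are hypothetical, and the inclusive comparison is an assumption.

```python
# (doc_id, vector_distance) pairs copied from the query output earlier in this notebook
scored = [
    ("100", 0.683950483799),
    ("765", 0.692428588867),
    ("131", 0.693985462189),
    ("720", 0.695949018002),
]


def knn(results, k):
    """Top-k semantics: always return the k closest records."""
    return sorted(results, key=lambda r: r[1])[:k]


def within_range(results, threshold):
    """Range semantics: return only records at or below the distance threshold."""
    return [r for r in sorted(results, key=lambda r: r[1]) if r[1] <= threshold]


print([doc_id for doc_id, _ in knn(scored, 3)])
print([doc_id for doc_id, _ in within_range(scored, 0.69)])
```

With a threshold of 0.69, only one of the four records qualifies, whereas a top-3 query always returns three regardless of how distant they are.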

## Building a RAG Pipeline from Scratch
We're going to build a complete RAG pipeline from scratch, incorporating the following components:

- Vector similarity search (VSS) to retrieve context data
- Semantic caching to improve performance
- Pre-processing of the retrieved context to create dense content propositions


### Setup RedisVL AsyncSearchIndex

In [None]:
from redis.asyncio import Redis
from redisvl.index import AsyncSearchIndex

client = Redis.from_url(REDIS_URL)

index = AsyncSearchIndex.from_dict(schema)
await index.set_client(client)

<redisvl.index.index.AsyncSearchIndex at 0x7f8fde5620b0>

### Retrieval Augmented Generation with Semantic Caching and Dense Propositions

In [None]:
from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",
    vectorizer=hf,
    redis_url=REDIS_URL,
    ttl=120,
    distance_threshold=0.2
)

11:07:50 redisvl.index.index INFO   Index already exists, not overwriting.
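
The idea behind `distance_threshold` is that a new prompt reuses a stored answer when its embedding is close enough to a previously seen prompt. The self-contained sketch below illustrates that behaviour with an in-memory cache and a decorator. `ToySemanticCache`, `toy_embed`, `cached`, and the letter-count embedding are all hypothetical stand-ins, not RedisVL APIs.

```python
import asyncio
import math
from functools import wraps


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: letter frequencies (stand-in for a real vectorizer)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class ToySemanticCache:
    """In-memory cache keyed by embedding similarity (illustration only)."""

    def __init__(self, distance_threshold: float):
        self.distance_threshold = distance_threshold
        self.entries = []  # list of (embedding, response) pairs

    def check(self, vector):
        for cached_vector, response in self.entries:
            if cosine_distance(vector, cached_vector) <= self.distance_threshold:
                return response
        return None

    def store(self, vector, response):
        self.entries.append((vector, response))


def cached(cache):
    """Decorator: return a cached response on a semantic hit, else call the function."""
    def decorator(func):
        @wraps(func)
        async def wrapper(query: str):
            vector = toy_embed(query)
            hit = cache.check(vector)
            if hit is not None:
                return hit
            response = await func(query)
            cache.store(vector, response)
            return response
        return wrapper
    return decorator


calls = []  # records every real (uncached) invocation
cache = ToySemanticCache(distance_threshold=0.2)


@cached(cache)
async def slow_answer(query: str) -> str:
    calls.append(query)
    return f"answer to: {query}"


async def demo():
    first = await slow_answer("my business is drowning")
    second = await slow_answer("My business is drowning!")  # same letters, so a cache hit
    return first, second


# In a notebook, use `first, second = await demo()` instead of asyncio.run().
first, second = asyncio.run(demo())
print(first == second, len(calls))
```

The second call never reaches `slow_answer`, which is the saving a semantic cache aims for when generation is expensive.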


In [None]:
from functools import wraps

async def answer_question(index: AsyncSearchIndex, query: str, kpi_list: str, **kwargs):
    """Answer the user's question"""
    device = "cuda:0"


    query_vector = hf.embed(query)
    # Fetch context from Redis using vector search
    context = await retrieve_context(index, query_vector)
    prompt = promptify(query, context, kpi_list)

    encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

    model_inputs = encodeds.to(device)

    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    return decoded[decoded.rfind('~'):]

async def retrieve_context(index: AsyncSearchIndex, query_vector) -> str:
    """Fetch the relevant context from Redis using vector search"""
    results = await index.query(
        VectorQuery(
            vector=query_vector,
            vector_field_name="text_embedding",
            return_fields=["content"],
            num_results=1
        )
    )

    content = "\n".join([result["content"] for result in results])

    DENSE_PROPS_PROMPT = """
    You are a helpful extractor tool. You will be presented with a business news
    article along with its sentiment label.

    Decompose and summarize the raw content into clear and simple propositions,
    ensuring they are interpretable out of context. Consider the following rules:
    1. Split compound sentences into simpler dense phrases that retain existing
    meaning.
    2. Simplify technical jargon or wording if possible while retaining existing
    meaning.
    3. For any named entity that is accompanied by additional descriptive information,
    separate this information into its own distinct proposition.
    4. Decontextualize the proposition by adding necessary modifiers to nouns or
    entire sentences and replacing pronouns (e.g., "it", "he", "she", "they", "this", "that")
    with the full name of the entities they refer to.
    5. Provide these propositions as a list of points

    Content to convert to simple propositions:
    {content}

    Provide the propositions HERE as a list of points:

    """

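    # No trailing comma after the call: a trailing comma would wrap the generated
    # text in a one-element tuple instead of returning a string.
    props = llm_client.text_generation(
        prompt=DENSE_PROPS_PROMPT.format(content=content),
        max_new_tokens=500,
    )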

    return props


def promptify(query: str, context: str, kpi_list: str) -> str:
    return f'''
    <start_of_turn>user
    You are a helpful business advisor assistant that has access
    to business news along with the sentiment of the news, aiding you in providing
    advice to entrepreneurs with different situations. Provide exactly 10 steps the
    user (entrepreneur) must take to grow their business, with a key focus on mentioned
    kpis in query in different bullet points and a brief explanation about each point
    for relevant kpis. Response should include 10 different points, and where relevant
    PLEASE MENTION HOW A MENTIONED KPI IS IMPACTED BY SAID ACTION. Also try to keep
    the actions tailored to specific kpis rather than general statements about improving
    all kpis. A crucial point is, that if the user provides specific kpis, the actions
    advised must be tailored to enhance those specific kpis, along with mentioning
    how the action would improve said kpi and by how much, over how long. It is
    imperative that you tackle the user concerns regarding these kpis. Further actions
    can be tailored to the user industry, stage of business etc. Conclude the recommendations
    properly, do not stop generation midway. You MUST generate 10 steps, no more,
    no less. Keep each point at most 2-3 lines.

    Use the provided context below derived from relevant business news articles
    and the general sentiment to answer the user's question. If you can't answer the user's
    question, based on the context; do not guess.

    query:

    {query}

    kpis to optimize:

    {kpi_list}

    Helpful context propositions:

    {context}

    <end_of_turn>\n<start_of_turn>model~
    '''
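
`answer_question` keeps everything from the last `~` in the decoded text, relying on the `~` that ends the prompt template. If the marker were ever missing, `rfind` would return -1 and the slice would keep only the final character. Below is a small sketch of a more defensive variant; the helper name `extract_after_marker` is hypothetical, and unlike the original slice it drops the marker itself.

```python
def extract_after_marker(decoded: str, marker: str = "~") -> str:
    """Return the text after the last marker, or the whole text if the marker is absent."""
    head, sep, tail = decoded.rpartition(marker)
    # rpartition returns ('', '', decoded) when the marker is not found,
    # so an empty separator means the full text should be kept.
    return tail.strip() if sep else decoded.strip()


print(extract_after_marker("prompt text model~\n 1. First step"))
print(extract_after_marker("no marker here"))
```

Either variant still truncates the answer if the model itself emits a `~`, so a less common marker string may be a safer choice.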

### Let's test it out...

In [None]:
query = '''I'm the co-founder of a 3-year-old e-commerce startup specializing in sustainable home goods. We're currently experiencing challenges with our key performance indicators:

    Our customer acquisition cost (CAC) has increased by 30% in the last quarter
    Our average order value (AOV) has remained stagnant for the past 6 months
    Our website conversion rate is hovering around 1.5%, below the industry average
    Our customer lifetime value (CLV) to CAC ratio has dropped to 2:1
    Our inventory turnover rate has decreased, leading to higher storage costs

Our target market is environmentally conscious millennials and Gen Z consumers in urban areas. We've seen some success with social media marketing, but our email campaigns have underperformed. What are the top 10 actions we can take to improve these metrics and strengthen our startup's position in the competitive e-commerce landscape?'''

In [None]:
# KPIs selected through the Qlik Sense dashboard
kpi_list = 'customer acquisition cost, average order value, customer activation rate, customer lifetime value, inventory turnover rate'

In [None]:
import asyncio

advice = await asyncio.gather(*[
    answer_question(index, query, kpi_list)
])


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [None]:
import pandas as pd

output = advice[0].encode('utf-8').decode('unicode_escape')
print(output)

~
     1. **Analyze Marketing Efforts**
   - **Action:** Conduct a comprehensive audit of your current marketing strategies, focusing on email campaigns. Identify underperforming segments and areas, as well as any ineffective tactics.
   - **KPI Enhancement:** Assess the impact of social media marketing on conversions and sales within a specified timeframe. Utilize tools such as Google Analytics to track the volume and effectiveness of your emails. 

   **KPI Impact:** Improving marketing techniques to enhance conversions and sales can directly increase the CAC, AOV, and CLV.

2. **Enhance Customer Engagement**
   - **Action:** Implement regular follow-up emails based on customer purchase patterns, offering additional resources or engaging content. Use automation tools for efficiency, to maintain engagement.
   - **KPI Enhancement:** Establish metrics on email engagement (open rate, click-through rate) and respond promptly to inquiries.

   **KPI Impact:** A strong customer engagement 

In [None]:
import re
from IPython.display import display, HTML

# General pattern to extract action steps (compute regex pattern separately)
pattern = r'\d+\. \*\*.*?\n\s*- \*\*Action:\*\*.*?\n\s*- \*\*KPI Enhancement:\*\*.*?(?:\n\s*\*\*KPI Impact:\*\*.*?\n)+'

# Extract action steps using regex
action_steps = re.findall(pattern, output, re.DOTALL)
action_steps = [step.strip() for step in action_steps]

# Generate HTML content dynamically from extracted action steps
html_content = '<style>' \
               '.action-step {font-family: Arial, sans-serif; background-color: #f9f9f9; border-left: 5px solid #4CAF50;' \
               'padding: 10px; margin: 10px 0; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);}' \
               '.action-step h2 {color: #4CAF50; font-size: 18px; margin-bottom: 5px;}' \
               '.action-step p {font-size: 14px; color: #333;}' \
               '.kpi-impact {font-style: italic; color: #555;} </style>'

# Iterate through each action step and convert it into HTML format
for i, step in enumerate(action_steps):
    title = step.split('**')[1]
    action = re.search(r'\*\*Action:\*\*(.*?)\n', step).group(1).strip()
    kpi_enhancement = re.search(r'\*\*KPI Enhancement:\*\*(.*?)\n', step).group(1).strip()
    kpi_impact = re.search(r'\*\*KPI Impact:\*\*(.*?)$', step).group(1).strip()

    step_html = f"""
    <div class="action-step">
        <h2>{i + 1}. {title}</h2>
        <p><strong>Action:</strong> {action}</p>
        <p><strong>KPI Enhancement:</strong> {kpi_enhancement}</p>
        <p class="kpi-impact"><strong>KPI Impact:</strong> {kpi_impact}</p>
    </div>
    """
    html_content += step_html

# Display formatted HTML content in Jupyter Notebook
display(HTML(html_content))


## Cleanup

Clean up the database.

In [None]:
# await index.client.flushall()