<a href="https://colab.research.google.com/github/denisabrantesredis/denisd-GenAI-Workshop/blob/main/Labs/02-LLM/02_Redis_Gemini_LLM.ipynb" target="_blank">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<div style="display:flex;width:100%;">
<img src="https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120" alt="Redis" width="90"/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="https://www.gstatic.com/devrel-devsite/prod/v0e0f589edd85502a40d78d7d0825db8ea5ef3b99ab4070381ee86977c9168730/cloud/images/cloud-logo.svg" alt="Google Cloud" width="140"/>
</div>

# LLM Memory with Redis & Google Cloud

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/redis_gcp.png?raw=true" alt="Redis and Google Cloud" align="center"/>

[Try a similar app with an always-on demo](https://antonum-redis-vss-streamlit-streamlit-app-p4z5th.streamlit.app/)

In this notebook, we will give an LLM conversational memory. Redis stores the chat message history for each user session, while Google Gemini is the LLM that generates the answers to the user's questions.

## Installing the Pre-Reqs

In [None]:
# !pip install -q sentence-transformers==3.0.1 >> /.tmp
!pip install -q redis==5.0.8 >> /.tmp
!pip install -q langchain==0.2.16 >> /.tmp
!pip install -q langchain-core==0.3.6 >> /.tmp
!pip install -q langchain-redis==0.0.4 >> /.tmp
!pip install -q langchain-google-genai==2.0.0 >> /.tmp

In [None]:
# # patch an issue with RedisVL
# !wget https://github.com/denisabrantesredis/denisd-GenAI-Workshop/raw/refs/heads/main/_assets/files/semantic.py
# !rm /usr/local/lib/python3.10/dist-packages/redisvl/extensions/llmcache/semantic.py
# !cp semantic.py /usr/local/lib/python3.10/dist-packages/redisvl/extensions/llmcache/

## Part 1 - Load and Configure the Model

In this lab, we will use LangChain to connect a Gemini chat model to a message history stored in Redis.

The cell below imports what we need: `RedisChatMessageHistory` to persist messages in Redis, `ChatPromptTemplate` and `MessagesPlaceholder` to build a prompt with a slot for past messages, `RunnableWithMessageHistory` to wire the history into the chain, and `ChatGoogleGenerativeAI` as the Gemini client.

In [None]:
import os
from google.colab import userdata

from langchain_redis import RedisChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

from IPython.display import Markdown

### Step 1: Setting Up Connection String

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_secrets.png?raw=true" alt="Callout - Use Google Colab secrets instead"/>

In [None]:
def get_secret(name, default=None):
    """Return a Colab secret, or the default if it is missing or not accessible."""
    try:
        return userdata.get(name) or default
    except Exception:  # userdata.get raises when the secret does not exist
        return default

# Prefer an existing environment variable, then a Colab secret, then a placeholder
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = get_secret("GOOGLE_API_KEY", "<insert API key here>")

REDIS_HOST = get_secret("REDIS_HOST", "127.0.0.1")
REDIS_PORT = int(get_secret("REDIS_PORT", 12000))
REDIS_PASSWORD = get_secret("REDIS_PASSWORD", "password")

REDIS_URL = f"redis://default:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
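
If the Redis password contains characters such as `@`, `:` or `/`, the f-string above can produce a URL that does not parse as intended. The following is a minimal sketch of one way to guard against that, assuming percent-encoding the credentials with the standard library suits your deployment. `build_redis_url` is a hypothetical helper and is not part of the lab.

```python
from urllib.parse import quote

def build_redis_url(host, port, password, username="default"):
    """Build a redis:// URL with percent-encoded credentials (hypothetical helper)."""
    user = quote(username, safe="")
    pw = quote(str(password), safe="")
    return f"redis://{user}:{pw}@{host}:{port}"

# Example with a password that would otherwise break URL parsing
print(build_redis_url("127.0.0.1", 12000, "p@ss:word/1"))
```

In the notebook, this could be called as `build_redis_url(REDIS_HOST, REDIS_PORT, REDIS_PASSWORD)` in place of the f-string.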

### Step 2: Load the Model

In this step, we set a session ID, configure the Gemini model, and build a chain whose prompt includes the conversation history. `RedisChatMessageHistory` stores the messages for each session in Redis with a one-hour TTL, and `RunnableWithMessageHistory` loads and saves them automatically on every call, so we never manage the history by hand.

In [None]:
user_session = "default"

In [None]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.5,
    top_p=0.95,
    top_k=64,
    max_output_tokens=8192
    )

In [None]:
# Create a conversational chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])
chain = prompt | llm

In [None]:
# Function to get or create a RedisChatMessageHistory instance
def get_redis_history(session_id: str):
    return RedisChatMessageHistory(session_id, redis_url=REDIS_URL, ttl=3600)

In [None]:
# Create a runnable with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_redis_history,
    input_messages_key="input",
    history_messages_key="history"
)

In [None]:
def generate_response(input_text, user_session):
    response = chain_with_history.invoke({"input": input_text}, config={"configurable": {"session_id": user_session}})
    return response.content

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_save.png?raw=true" alt="Callout - Saving to Redis"/>

In [None]:
chat_input = "What is the capital of Canada?"

Send the question to the model. The question and the answer are both saved to Redis under the session ID:

In [None]:
response = generate_response(chat_input, user_session)

In [None]:
Markdown(response)

&nbsp;

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_insight.png?raw=true" alt="Callout - Check Redis Insight"/>

Open Redis Insight and confirm that the chat messages were saved. The human and AI messages from the exchange are stored under the session ID, and because the history was created with `ttl=3600`, they expire after one hour.

You can also go to the **Workbench** and get a list of indexes using the command:

```
FT._list
```

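Finally, you can get more details about the index created by Langchain with the command below. Substitute the index name returned by `FT._list`; with the default settings of `langchain-redis`, it is expected to be `idx:chat_history`:
```
FT.info "idx:chat_history"
```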
&nbsp;

&nbsp;

### Asking a Follow-Up Question

Because the conversation history is stored in Redis, follow-up questions do not need to repeat any context. The next question only makes sense in light of the previous exchange; the chain loads the earlier messages for the session and passes them to the model along with the new input.

&nbsp;


In [None]:
chat_input = "How come it's not Toronto?"

### Generating a response that uses the conversation history

In [None]:
response = generate_response(chat_input, user_session)

In [None]:
Markdown(response)

In [None]:
chat_input = "What other cities would be good candidates?"

In [None]:
response = generate_response(chat_input, user_session)

In [None]:
Markdown(response)

&nbsp;

## Part 2: Inspecting the Stored History

In this lab, we used the Gemini 1.5 Pro model from Google to generate responses to the user, with Redis holding the conversation between calls. In this part, we read the stored messages back and verify the connection to Redis directly.

### Step 1: Read the Message History

In [None]:
get_redis_history(user_session).messages
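
The raw list of message objects can be hard to read. The following is a minimal sketch for rendering it as a transcript, assuming each message exposes `type` and `content` attributes as LangChain message objects do. `format_transcript` is a hypothetical helper, and the sample data is made up so that the sketch runs without Redis.

```python
from types import SimpleNamespace

def format_transcript(messages):
    """Render messages as 'role: text' lines (hypothetical helper)."""
    return "\n".join(f"{m.type}: {m.content}" for m in messages)

# Stand-in objects; in the notebook you would pass
# get_redis_history(user_session).messages instead.
sample = [
    SimpleNamespace(type="human", content="What is the capital of Canada?"),
    SimpleNamespace(type="ai", content="Ottawa."),
]
print(format_transcript(sample))
```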

In [None]:
import redis

In [None]:
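# Reuse the connection settings defined in Part 1; "default" matches the user in REDIS_URL
r = redis.Redis(host=REDIS_HOST, port=int(REDIS_PORT), username="default", password=REDIS_PASSWORD, decode_responses=True)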

if r.ping():
    print("Connection successful!")
else:
    print("Connection issue!")
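
With a direct connection available, one way to see the effect of `ttl=3600` is to look up the remaining time-to-live of the stored keys. The sketch below uses only `scan_iter` and `ttl`, which are standard redis-py client methods. The `chat:*` pattern is an assumption about the default key prefix used by `langchain-redis`, and `FakeRedis` is a hypothetical stand-in with made-up keys so that the example runs without a server.

```python
def remaining_ttls(client, pattern="chat:*"):
    """Map each key matching the pattern to its remaining TTL in seconds."""
    return {key: client.ttl(key) for key in client.scan_iter(match=pattern)}

class FakeRedis:
    """Tiny stand-in exposing the two client methods used above."""
    def __init__(self, ttls):
        self._ttls = ttls

    def scan_iter(self, match=None):
        # Only supports simple 'prefix*' patterns, which is enough for the demo
        prefix = (match or "*").rstrip("*")
        return (k for k in self._ttls if k.startswith(prefix))

    def ttl(self, key):
        return self._ttls[key]

demo = FakeRedis({"chat:default:1": 3540, "other:key": -1})
print(remaining_ttls(demo))
```

In the notebook, calling `remaining_ttls(r)` against the real connection should show values counting down from 3600 if the key prefix assumption holds.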

&nbsp;


&nbsp;



# Congrats, this is the end of the lab!!