<div align="center">
    <div><img src="../assets/redis_logo.svg" style="width: 130px"> </div>
    <div style="display: inline-block; text-align: center; margin-bottom: 10px;">
        <span style="font-size: 36px;"><b>Ask From Your Structured Data </b></span>
        <br />
    </div>
    <br />
</div>



Most data in finance is structured. Traditionally, extracting the right information from tables and other structured representations meant writing a query in a language such as SQL or the Redis query language. Chatbots and Q&A systems need to pull relevant data from multiple sources, and some of those sources are structured data stores. When a user asks about their financial information in natural language, the task comes down to translating that natural language (for example, English) into a query language such as SQL or the Redis query language.

People used to train Seq2Seq models, among other methods, for tasks such as Text2SQL. With the advent of LLMs, and now code LLMs, we can ask the model to do this translation for us. Once we have the right translation and have retrieved the right data, we can either ask another LLM to compose a response for the user, or build a larger context and prompt (possibly including information retrieved through regular RAG) and answer the user's question from that.

In the following, we first present a case where we fetch the data from Redis, represent it as a JSON object, and use LangChain's JSON agent to answer the user's question. We then showcase Redis Copilot and propose an alternative path for translating natural language into the Redis query language.
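The translation step described above can be pictured as plain prompt construction. The sketch below is only illustrative: the function name `build_translation_prompt`, the prompt wording, and the field list are assumptions made for this example, not part of the notebook or of any Redis tooling. The resulting string would be sent to an LLM of your choice.

```python
def build_translation_prompt(question: str, fields: dict[str, str]) -> str:
    """Build a prompt asking an LLM to translate a question into a query.

    `fields` maps field names to their types; both are hypothetical here.
    """
    schema_lines = "\n".join(f"- {name} ({ftype})" for name, ftype in fields.items())
    return (
        "Translate the user's question into a Redis search query.\n"
        "Use only these indexed fields:\n"
        f"{schema_lines}\n"
        f"Question: {question}\n"
        "Query:"
    )

# Hypothetical field list, loosely based on the holdings data used later
example_fields = {"ticker": "tag", "sector": "tag", "market_value": "numeric"}
prompt = build_translation_prompt(
    "Which companies have a market value above 150000?", example_fields
)
print(prompt)
```

Listing the indexed fields in the prompt is one way to keep the model from inventing field names; whether that is enough depends on the model and the schema.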

## Environment Setup

In [1]:
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
import sys
import os
import warnings
import dotenv
# load env vars from .env file
dotenv.load_dotenv()

warnings.filterwarnings('ignore')
dir_path = os.getcwd()
parent_directory = os.path.dirname(dir_path)
sys.path.insert(0, f'{parent_directory}/helpers')
os.environ["ROOT_DIR"] = parent_directory
REDIS_URL = os.getenv("REDIS_URL")

print("========== ENVIRONMENT VARIABLES ==========")
print(f"Current Directory={dir_path}")
print(f"Parent Directory={parent_directory}")
print(f"System path={sys.path}")
print("---------------------------------")
print(f'LLM Engine: {os.getenv("LOCAL_LLM_ENGINE")}')
print(f'LOCAL_VLLM_MODEL: {os.getenv("LOCAL_VLLM_MODEL")}')
print(f'LOCAL_OLLAMA_MODEL: {os.getenv("LOCAL_OLLAMA_MODEL")}')
print(f'VLLM_URL: {os.getenv("VLLM_URL")}')
print("---------------------------------")
print(f"NLTK_DATA={os.getenv('NLTK_DATA')}")


Current Directory=/Users/rouzbeh.farahmand/PycharmProjects/boa-financial-rag-workshop/2_RAG_patterns_with_redis
Parent Directory=/Users/rouzbeh.farahmand/PycharmProjects/boa-financial-rag-workshop
System path=['/Users/rouzbeh.farahmand/PycharmProjects/boa-financial-rag-workshop/helpers', '/Users/rouzbeh.farahmand/PycharmProjects/boa-financial-rag-workshop', '/Library/Frameworks/Python.framework/Versions/3.12/lib/python312.zip', '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12', '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload', '', '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages']
---------------------------------
LLM Engine: ollama
LOCAL_VLLM_MODEL: meta-llama/Meta-Llama-3-8B-Instruct
LOCAL_OLLAMA_MODEL: llama3:8b
VLLM_URL: http://localhost:8000/v1
---------------------------------
NLTK_DATA=
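The cell above reads its settings from a `.env` file. A variable that is unset comes back from `os.getenv` as `None`, and one that is set but empty prints as blank, as `NLTK_DATA` does here. A small fail-fast check can surface this before later cells depend on the values. This is a sketch: the helper name `missing_env_vars` and the choice of which variables count as required are assumptions.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Assumption: these are the variables this notebook cannot run without
required = ["REDIS_URL", "LOCAL_LLM_ENGINE"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```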


### Install Python Dependencies

In [3]:
%%capture
%pip install -r $ROOT_DIR/requirements.txt

In [4]:
from utils import *
from ingestion import *
from custom_ners import *

 ✅ Loaded doc info for  111 tickers...


### SentenceTransformerEmbeddings Models Cache folder
This demo uses `SentenceTransformerEmbeddings`, and here we specify its cache folder. If you have already downloaded the models to the local file system, point this folder at them; otherwise the library downloads any model that is not available locally into this folder.

In particular, the following model will be downloaded if it is not present in the cache folder:

models/models--sentence-transformers--all-MiniLM-L6-v2
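To see ahead of time whether a download will be triggered, you can check for that folder yourself. A minimal sketch, assuming the cache layout matches the folder name shown above; the helper `model_is_cached` is hypothetical and not part of the sentence-transformers library.

```python
import os

def model_is_cached(cache_folder: str, model_name: str) -> bool:
    """Check for a cached copy of a model under `cache_folder`.

    Assumes the folder naming shown above: models--<org>--<model>.
    """
    folder = "models--" + model_name.replace("/", "--")
    return os.path.isdir(os.path.join(cache_folder, folder))

cached = model_is_cached("models", "sentence-transformers/all-MiniLM-L6-v2")
print("model cached locally" if cached else "model will be downloaded on first use")
```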

In [5]:
# Set the folder that holds the locally downloaded sentence-transformer models
os.environ["TRANSFORMERS_CACHE"] = f"{parent_directory}/models"

In [6]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",
                                           cache_folder=os.getenv("TRANSFORMERS_CACHE", f"{parent_directory}/models"))

### Build your Redis index 
Skip this section if you have already built your index in a previous notebook.


In [7]:
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redis import Redis
index_name = 'russell-3000'
prefix = 'russell'
schema = IndexSchema.from_yaml('russell_index.yaml')
client = Redis.from_url(REDIS_URL)
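# create an index from schema and the client
index = SearchIndex(schema, client)
# overwrite=True with drop=True replaces any existing index and deletes its data,
# so no separate index.delete() call is needed
index.create(overwrite=True, drop=True)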
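The contents of `russell_index.yaml` are not shown in this notebook. As a rough idea of what such a schema describes, here is a hypothetical equivalent written as a Python dictionary. The field selection, the field types, and the JSON storage type are assumptions inferred from the documents printed later in this notebook; the real file may differ. redisvl can build a schema from a dictionary of this shape via `IndexSchema.from_dict`.

```python
# Hypothetical schema, NOT the actual contents of russell_index.yaml
russell_schema = {
    "index": {
        "name": "russell-3000",   # index name used in this notebook
        "prefix": "russell",      # key prefix used in this notebook
        "storage_type": "json",   # assumption: documents are stored as JSON
    },
    "fields": [
        {"name": "ticker", "type": "tag"},
        {"name": "company_name", "type": "text"},
        {"name": "sector", "type": "tag"},
        {"name": "market_value", "type": "numeric"},
    ],
}

# Numeric fields are the ones a range filter on a number can target
numeric_fields = [f["name"] for f in russell_schema["fields"] if f["type"] == "numeric"]
print(numeric_fields)
```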

In [8]:
# Skip if you have already populated your index.
sec_data = get_sec_data()

 ✅ Loaded doc info for  111 tickers...


In [9]:
def redis_upload_json(data_dict, index, tickers=None):
    """Load one JSON document per ticker into the Redis index."""
    if tickers is None:
        tickers = list(data_dict.keys())
    objs_to_load = []
    for ticker in tickers:
        # tickers without a metadata file are skipped
        if len(data_dict[ticker]["metadata_file"]) > 0:
            shared_metadata = load_json_metadata(data_dict[ticker]["metadata_file"][0])
            obj_to_load = shared_metadata.copy()
            # the ticker symbol doubles as the document id
            obj_to_load['doc_id'] = f"{ticker}"
            objs_to_load.append(obj_to_load)
    keys = index.load(objs_to_load, id_field="doc_id")
    print(
        f"✅✅✅Loaded a total of {len(keys)} documents for {len(tickers)} tickers in <{index.name}> index")

In [10]:
redis_upload_json(sec_data, index)

✅✅✅Loaded a total of 108 documents for 111 tickers in <russell-3000> index
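The message reports 108 documents for 111 tickers because `redis_upload_json` skips any ticker whose `metadata_file` list is empty. A short helper can list those tickers. This is a sketch: `tickers_without_metadata` is a hypothetical name, and the sample dictionary is made-up data that only mimics the `metadata_file` key used above.

```python
def tickers_without_metadata(data_dict):
    """Return tickers whose "metadata_file" list is empty."""
    return [t for t, info in data_dict.items() if len(info["metadata_file"]) == 0]

# Made-up sample with the same shape as the dictionary passed to redis_upload_json
sample = {
    "AAPL": {"metadata_file": ["AAPL_metadata.json"]},
    "XYZ": {"metadata_file": []},
}
print(tickers_without_metadata(sample))  # → ['XYZ']
```

On the real data, `tickers_without_metadata(sec_data)` would be expected to list the three tickers that were not loaded.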


In [11]:
import json
from redisvl.query import FilterQuery
from redisvl.query.filter import Num, Tag

# keep only holdings whose market value exceeds 150,000
numeric_filter = Num("market_value") > 150000
# sort the matches by market value, largest first
results = index.search(FilterQuery(filter_expression=numeric_filter).query.sort_by("market_value", asc=False))
# each result carries the stored JSON document as a string
companies = []
for doc in results.docs:
    companies.append(json.loads(doc.__dict__['json']))

data = {
    "companies": companies
}

print(data)

{'companies': [{'ticker': 'AAPL', 'company_name': 'APPLE INC', 'sector': 'Information Technology', 'asset_class': 'Equity', 'market_value': 559365151.11, 'weight': 5.16, 'notional_value': 559365151.11, 'shares': 4305127.0, 'location': 'United States', 'price': 129.93, 'exchange': 'NASDAQ', 'currency': 'USD', 'fx_rate': 1.0, 'market_currency': 'USD', 'accrual_date': '-', 'doc_id': 'AAPL'}, {'ticker': 'MSFT', 'company_name': 'MICROSOFT CORP', 'sector': 'Information Technology', 'asset_class': 'Equity', 'market_value': 513917712.42, 'weight': 4.74, 'notional_value': 513917712.42, 'shares': 2142931.0, 'location': 'United States', 'price': 239.82, 'exchange': 'NASDAQ', 'currency': 'USD', 'fx_rate': 1.0, 'market_currency': 'USD', 'accrual_date': '-', 'doc_id': 'MSFT'}, {'ticker': 'AMZN', 'company_name': 'AMAZON COM INC', 'sector': 'Consumer Discretionary', 'asset_class': 'Equity', 'market_value': 213823596.0, 'weight': 1.97, 'notional_value': 213823596.0, 'shares': 2545519.0, 'location': 'Un

In [12]:
from langchain_community.agent_toolkits import JsonToolkit, create_json_agent
from langchain_community.tools.json.tool import JsonSpec
from langchain_openai import OpenAI

In [13]:
json_llm = get_chat_llm( 
        local_llm_engine=os.getenv("LOCAL_LLM_ENGINE"),
        vllm_url=os.getenv("VLLM_URL"),
        vllm_model=os.getenv("LOCAL_VLLM_MODEL"),
        ollama_model=os.getenv("LOCAL_OLLAMA_MODEL"),
        temperature=0)
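# The JSON agent below uses this OpenAI model, which expects OPENAI_API_KEY to be set in the environment
openai_json = OpenAI(temperature=0)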

Created ChatOllama using llama3:8b served locally


In [14]:
json_spec = JsonSpec(dict_=data, max_value_length=4000)
json_toolkit = JsonToolkit(spec=json_spec)

json_agent_executor = create_json_agent(
    llm=openai_json, toolkit=json_toolkit, verbose=True
)
json_agent_executor.to_json()

{'lc': 1,
 'type': 'not_implemented',
 'id': ['langchain', 'agents', 'agent', 'AgentExecutor'],
 'repr': 'AgentExecutor(verbose=True, agent=ZeroShotAgent(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=[\'agent_scratchpad\', \'input\'], template=\'You are an agent designed to interact with JSON.\\nYour goal is to return a final answer by interacting with the JSON.\\nYou have access to the following tools which help you learn more about the JSON you are interacting with.\\nOnly use the below tools. Only use the information returned by the below tools to construct your final answer.\\nDo not make up any information that is not contained in the JSON.\\nYour input to the tools should be in the form of `data["key"][0]` where `data` is the JSON blob you are interacting with, and the syntax used is Python. \\nYou should only use keys that you know for a fact exist. You must validate that a key exists by seeing it previously when calling `json_spec_list_keys`. \\nIf you have not seen 
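The agent's prompt refers to `json_spec_list_keys`; together with a value-lookup tool, it lets the model walk the JSON one path at a time. The sketch below is a simplified emulation of that idea in plain Python, not LangChain's implementation: the function names, the list-based path argument, and the truncation format are all assumptions.

```python
def walk(data, path):
    """Follow a list of keys/indices into nested dicts and lists."""
    value = data
    for key in path:
        value = value[key]
    return value

def list_keys(data, path):
    """Emulate a key-listing tool: only dicts have keys."""
    value = walk(data, path)
    if not isinstance(value, dict):
        raise ValueError("Value at path is not a dict, get the value directly.")
    return list(value.keys())

def get_value(data, path, max_value_length=4000):
    """Emulate a value-lookup tool, truncating long values."""
    text = str(walk(data, path))
    return text if len(text) <= max_value_length else text[:max_value_length] + "..."

# Made-up miniature of the `data` object built earlier
sample = {"companies": [{"ticker": "AAPL", "market_value": 559365151.11}]}
print(list_keys(sample, []))                          # ['companies']
print(get_value(sample, ["companies", 0, "ticker"]))  # AAPL
```

Calling `list_keys(sample, ["companies"])` raises `ValueError` because the value at that path is a list, so a caller has to fetch the value directly instead.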

In [17]:
json_agent_executor.run(
    "What is the name of the company with the highest market value?"
)


> Entering new AgentExecutor chain...
Action: json_spec_list_keys
Action Input: data
Observation: ['companies']
Thought: I should look at the keys that exist in data["companies"] to see what I have access to
Action: json_spec_list_keys
Action Input: data["companies"]
Observation: ValueError('Value at path `data["companies"]` is not a dict, get the value directly.')
Thought: I should use json_spec_get_value to see what the value is at data["companies"]
Action: json_spec_get_value
Action Input: data["companies"]
Observation: [{'ticker': 'AAPL', 'company_name': 'APPLE INC', 'sector': 'Information Technology', 'asset_class': 'Equity', 'market_value': 559365151.11, 'weight': 5.16, 'notional_value': 559365151.11, 'shares': 4305127.0, 'location': 'United States', 'price': 129.93, 'exchange': 'NASDAQ', 'currency': 'USD', 'fx_rate': 1.0, 'market_currency': 'USD', 'accrual_date': '-', 'doc

'APPLE INC'
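Because the retrieved data is already a Python object, an answer like this one can be cross-checked without an LLM. A minimal sketch, using a subset copied from the printed output above (three companies, two fields each); the variable name `data_subset` is introduced here for illustration, and the same expression applies to the notebook's full `data` object.

```python
# Subset of the printed `data` object: only three companies and two fields are copied here
data_subset = {
    "companies": [
        {"company_name": "APPLE INC", "market_value": 559365151.11},
        {"company_name": "MICROSOFT CORP", "market_value": 513917712.42},
        {"company_name": "AMAZON COM INC", "market_value": 213823596.0},
    ]
}

# Pick the entry with the largest market value
top = max(data_subset["companies"], key=lambda c: c["market_value"])
print(top["company_name"])  # → APPLE INC
```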