### KX Advanced RAG with cuDF, NeMo and NIM

We have developed a KX Advanced RAG pipeline using NVIDIA RAPIDS, NeMo and NIM.  
This RAG demonstration integrates KX software with the latest NVIDIA GPUs and software stack to answer curated industry queries quickly and accurately, with references to the source documents.

Process:
1. The data consists of over 2.5 million documents, which can be downloaded in JSON format from https://www.kaggle.com/datasets/Cornell-University/arxiv
2. Data processing is done with RAPIDS cuDF.
3. Embeddings are generated with NeMo Retriever and ingested into the KDB.AI vector database.
4. NIM (NVIDIA Inference Microservices) LLMs are used for the RAG Q&A.

Prerequisites:  
1. KDB.AI - https://code.kx.com/kdbai/latest
2. NVIDIA GPU with Drivers and toolkit - https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
3. NVIDIA RAPIDS - https://docs.rapids.ai/install/
4. NVIDIA NIM LLM Setup - https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html
5. NVIDIA NeMo Retriever Setup - https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/getting-started.html

Models:
1. Embedding - nv-embedqa-e5-v5 (fine-tuned by NVIDIA)
2. LLM - llama-3.1-70b-instruct (served and optimized by NVIDIA NIM)

A detailed architecture and setup document is provided separately for reference.

### Loading modules

If kdb+ with qflat is used in place of KDB.AI, make sure the files qflat.so and qflat.q are present in the working directory.  
ollama and sentence_transformers can be used instead if you prefer to run this RAG with open-source embedding and LLM models.

In [None]:
import json
import os
import shutil
import time
import pandas as pd
import numpy as np
import pyarrow.parquet as pq

import cudf
import kdbai_client as kdbai

from openai import OpenAI
from openai import AsyncOpenAI

import asyncio
import nest_asyncio
nest_asyncio.apply()

from typing import List
from pprint import pprint
from tqdm import tqdm
from datetime import timedelta

### Defining Arguments

The input arXiv dataset must be present in the data folder in the current directory.  
The embedding and LLM models, the preferred batch size, and other settings are defined here.  

In [2]:
DATASET = 'data/arxiv-metadata-oai-snapshot.json'
LIMIT = None
EMBEDDING = 'nvidia/nv-embedqa-e5-v5'
DEVICE = 'cuda'
BATCH_SIZE = 8191
LLM = 'meta/llama-3.1-70b-instruct'
TEMPERATURE = 0

### Resetting the knowledge database if needed

When set to True, the RESET parameter recreates the data and embeddings from the input JSON and stores them in the database.  
If the data and embeddings have already been created and stored, set it to False to skip this time-consuming step.

In [3]:
RESET = True

### Load the ArXiv research paper dataset from JSON into a Pandas dataframe

Here we use RAPIDS cuDF to read the input data into a dataframe, which makes processing about three times faster.  
The dataframe is then converted to Pandas, and the string columns are encoded as bytes to make it more memory efficient.

In [17]:
%%time

if RESET:
    COLS = ['id', 'update_date', 'submitter', 'authors', 'title', 'categories', 'abstract']
    df = cudf.read_json(DATASET, orient='records', lines=True).sort_values('update_date', ascending=False)[COLS]
    df['update_date'] = cudf.to_datetime(df['update_date'])
    texts = []
    for record in df.to_dict('records'):
        texts.append("Title: %s\nDate: %s\nAuthors: %s\nAbstract: %s\n" % (record['title'], record['update_date'], record['authors'], record['abstract']))
    df['text'] = texts
    df = df.iloc[:LIMIT].reset_index(drop=True)
    df = df.to_pandas()
    for c in df.columns:
        if type(df[c].iloc[0]) is str:
            df[c] = df[c].str.encode('utf-8')

CPU times: user 37 s, sys: 10.7 s, total: 47.7 s
Wall time: 47.5 s
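
The `text` column built in the cell above is what later gets embedded. As a minimal sketch of that formatting step for a single record, the snippet below uses a hypothetical helper `build_text` and a made-up sample record (neither is part of the notebook); it only illustrates the layout of the string.

```python
from datetime import datetime

def build_text(record: dict) -> str:
    """Format one paper record with the same layout as the notebook's loop."""
    return "Title: %s\nDate: %s\nAuthors: %s\nAbstract: %s\n" % (
        record["title"],
        record["update_date"],
        record["authors"],
        record["abstract"],
    )

# Hypothetical record, for illustration only
sample = {
    "title": "An Example Paper",
    "update_date": datetime(2024, 7, 26),
    "authors": "A. Author and B. Author",
    "abstract": "A short abstract.",
}

text = build_text(sample)
print(text)
```

Keeping the title, date and authors inside the embedded text is what lets the LLM quote them as references later on.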


In [18]:
# Create the 'year' column from 'update_date'
df['year'] = df['update_date'].dt.year

# Reorder the columns to place 'year' as the 3rd column
cols = list(df.columns)
cols.insert(2, cols.pop(cols.index('year')))
df = df[cols]

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2530760 entries, 0 to 2530759
Data columns (total 9 columns):
 #   Column       Dtype         
---  ------       -----         
 0   id           object        
 1   update_date  datetime64[ns]
 2   year         int32         
 3   submitter    object        
 4   authors      object        
 5   title        object        
 6   categories   object        
 7   abstract     object        
 8   text         object        
dtypes: datetime64[ns](1), int32(1), object(7)
memory usage: 164.1+ MB
None


In [19]:
df.head()

Unnamed: 0,id,update_date,year,submitter,authors,title,categories,abstract,text
0,b'1209.2995',2024-07-26,2024,b'Leonid Positselski',b'Leonid Positselski',b'Contraherent cosheaves on schemes',b'math.CT math.AG',b' Contraherent cosheaves are globalizations ...,b'Title: Contraherent cosheaves on schemes\nDa...
1,b'1301.6261',2024-07-26,2024,b'Ruslan Maksimau',b'Ruslan Maksimau',"b'Canonical basis, KLR-algebras and parity she...",b'math.RT',b' We give a construction of a basis of the p...,"b'Title: Canonical basis, KLR-algebras and par..."
2,b'1307.6013',2024-07-26,2024,b'Ruslan Maksimau',b'Ruslan Maksimau',b'Quiver Schur algebras and Koszul duality',b'math.RT',b' We prove that the category of graded finit...,b'Title: Quiver Schur algebras and Koszul dual...
3,b'1511.03484',2024-07-26,2024,b'Oleg Andreev',b'Oleg Andreev',b'Some Aspects of Three-Quark Potentials',b'hep-ph hep-lat hep-th nucl-th',b' We analytically evaluate the expectation v...,b'Title: Some Aspects of Three-Quark Potential...
4,b'1607.01080',2024-07-26,2024,b'Robert Szczelina',"b""Robert Szczelina and Piotr Zgliczy\\'nski""",b'Algorithm for rigorous integration of Delay ...,b'math.DS',"b"" We present an algorithm for the rigorous i...","b""Title: Algorithm for rigorous integration of..."


### Setup for NeMo Retriever Embedding

Here we create a function that calls the locally hosted NeMo Retriever embedding model through its OpenAI-compatible API.  
The model used is NV-EmbedQA-E5-v5, one of NVIDIA's most advanced pre-tuned embedding models, with an average recall of 62.07%.

In [20]:
nemoClient = OpenAI(
  api_key="no-key-required",
  base_url="http://0.0.0.0:8001/v1"
)

def get_embedding(text, model="nvidia/nv-embedqa-e5-v5"):
   return nemoClient.embeddings.create(input = [text], model=model, extra_body={"input_type": "query", "truncate": "END"}).data[0].embedding

### Embed the text (title, authors, date, abstract) for each research paper

Below we generate the dense embeddings using the above model.

In [21]:
%%time

df['text']= df['text'].str.decode('utf-8')

# This step decodes the text column, from which the embeddings are created, before sending it to NeMo Retriever

CPU times: user 1.34 s, sys: 0 ns, total: 1.34 s
Wall time: 1.33 s


#### Creating Function to generate embeddings in batches

Here we create a class that generates embeddings from the text column in batches, with a semaphore limiting the number of concurrent requests

In [22]:
class EmbeddingProcessor:
    def __init__(self, 
                 batch_size: int = 2,
                 concurrency: int = 32,
                 api_key="no-key-required",
                 base_url: str = "http://0.0.0.0:8001/v1"):
        self.batch_size = batch_size
        self.semaphore = asyncio.Semaphore(concurrency)
        self.client = AsyncOpenAI(base_url=base_url,api_key=api_key)
        self.start_time = time.time()
        self.processed_count = 0
        self.latencies = []

    async def process_batch(self, batch: List[str]):
        
        async with self.semaphore:  # Control concurrent requests
            start_time = time.time()
            try:
                response = await self.client.embeddings.create(
                    input=batch,
                    model="nvidia/nv-embedqa-e5-v5",
                    encoding_format="float",
                    extra_body={"input_type": "query", "truncate": "END"}
                )
                end_time = time.time()
                self.latencies.append((end_time - start_time) * 1000)
                self.processed_count += len(batch)
                
                if self.processed_count % 100000 == 0:
                    elapsed = time.time() - self.start_time
                    throughput = self.processed_count / elapsed
                    print(f"Processed {self.processed_count} texts. Throughput: {throughput:.2f} texts/sec")
                
                return [emb.embedding for emb in response.data]
            except Exception as e:
                print(f"Error processing batch: {e}")
                return None

    async def process_dataframe(self, df: pd.DataFrame):
        batches = [df[i:i + self.batch_size]['text'].tolist() 
                  for i in range(0, len(df), self.batch_size)]

        tasks = [self.process_batch(batch) for batch in batches]
        results = await asyncio.gather(*tasks)
        
        elapsed_time = time.time() - self.start_time
        throughput = self.processed_count / elapsed_time
        
        print("\nPerformance Summary:")
        print(f"Total texts processed: {self.processed_count}")
        print(f"Total time: {elapsed_time:.2f} seconds")
        print(f"Average throughput: {throughput:.2f} texts/second")
        print(f"Average latency: {np.mean(self.latencies):.2f} ms")
        
        return results

async def main():
    processor = EmbeddingProcessor(
        batch_size=32,
        concurrency=64,
        api_key="no-key-required",
        base_url="http://0.0.0.0:8001/v1"
    )
    
    embeddings = await processor.process_dataframe(df)
    return embeddings
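
The batching and concurrency pattern used by `EmbeddingProcessor` can be tried without a running embedding service. The sketch below is an illustration under stated assumptions: `fake_embed` is a hypothetical stand-in for the network call and returns dummy one-element vectors, and `embed_all` is not a function from the notebook. It shows how the texts are split into batches and how a semaphore caps the number of requests in flight while `asyncio.gather` preserves batch order.

```python
import asyncio
from typing import List

async def fake_embed(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for the embedding request: one dummy vector per text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [[float(len(text))] for text in batch]

async def embed_all(texts: List[str], batch_size: int, concurrency: int):
    semaphore = asyncio.Semaphore(concurrency)  # at most `concurrency` requests in flight

    async def worker(batch: List[str]):
        async with semaphore:
            return await fake_embed(batch)

    # Slice the input into consecutive batches of at most `batch_size` texts
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # gather returns results in the same order as the batches were submitted
    return await asyncio.gather(*(worker(b) for b in batches))

demo_results = asyncio.run(
    embed_all(["a", "bb", "ccc", "dddd", "eeeee"], batch_size=2, concurrency=2)
)
print(demo_results)
```

In a plain script `asyncio.run` works as written; inside a notebook, which already has a running event loop, it relies on `nest_asyncio.apply()` as in the imports cell.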

In [23]:
%%time

results = asyncio.run(main())

Processed 100000 texts. Throughput: 363.81 texts/sec
Processed 200000 texts. Throughput: 367.02 texts/sec
Processed 300000 texts. Throughput: 367.54 texts/sec
Processed 400000 texts. Throughput: 367.12 texts/sec
Processed 500000 texts. Throughput: 366.48 texts/sec
Processed 600000 texts. Throughput: 365.63 texts/sec
Processed 700000 texts. Throughput: 364.61 texts/sec
Processed 800000 texts. Throughput: 364.85 texts/sec
Processed 900000 texts. Throughput: 363.74 texts/sec
Processed 1000000 texts. Throughput: 363.78 texts/sec
Processed 1100000 texts. Throughput: 362.59 texts/sec
Processed 1200000 texts. Throughput: 362.22 texts/sec
Processed 1300000 texts. Throughput: 360.57 texts/sec
Processed 1400000 texts. Throughput: 360.29 texts/sec
Processed 1500000 texts. Throughput: 358.67 texts/sec
Processed 1600000 texts. Throughput: 358.41 texts/sec
Processed 1700000 texts. Throughput: 358.15 texts/sec
Processed 1800000 texts. Throughput: 357.89 texts/sec
Processed 1900000 texts. Throughput: 

#### Storing embeddings in the dataframe

Here we save the embeddings created above into the dataframe as an embeddings column

In [25]:
%%time

df['embeddings'] = [item for sublist in results for item in sublist]
df['embeddings'] = [np.array(x) for x in df['embeddings']]

CPU times: user 1min 9s, sys: 6.91 s, total: 1min 16s
Wall time: 1min 16s
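
Because `process_batch` returns `None` when a request fails, flattening `results` directly would raise a `TypeError` on the first failed batch. A minimal sketch of a guarded flatten is shown below; `flatten_batches` is a hypothetical helper, not part of the notebook, and the demo data is made up.

```python
from typing import List, Optional, Tuple

Batch = List[List[float]]

def flatten_batches(results: List[Optional[Batch]]) -> Tuple[Batch, List[int]]:
    """Flatten per-batch embeddings and report the indices of failed batches."""
    flat: Batch = []
    failed: List[int] = []
    for idx, batch in enumerate(results):
        if batch is None:      # the request for this batch raised an error
            failed.append(idx)
            continue
        flat.extend(batch)
    return flat, failed

# Made-up results: the second batch failed
demo = [[[0.1, 0.2]], None, [[0.3, 0.4], [0.5, 0.6]]]
vectors, failed_batches = flatten_batches(demo)
print(len(vectors), failed_batches)
```

If `failed_batches` is not empty, the flattened list is shorter than the dataframe, so the failed batches would need to be retried before assigning the column.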


#### Encoding back the text column in the dataframe

In [26]:
%%time

df['text'] = df['text'].str.encode('utf-8')
df.head()

CPU times: user 1.34 s, sys: 674 ms, total: 2.02 s
Wall time: 2 s


Unnamed: 0,id,update_date,year,submitter,authors,title,categories,abstract,text,embeddings
0,b'1209.2995',2024-07-26,2024,b'Leonid Positselski',b'Leonid Positselski',b'Contraherent cosheaves on schemes',b'math.CT math.AG',b' Contraherent cosheaves are globalizations ...,b'Title: Contraherent cosheaves on schemes\nDa...,"[-0.0279541015625, 0.03924560546875, 0.0154342..."
1,b'1301.6261',2024-07-26,2024,b'Ruslan Maksimau',b'Ruslan Maksimau',"b'Canonical basis, KLR-algebras and parity she...",b'math.RT',b' We give a construction of a basis of the p...,"b'Title: Canonical basis, KLR-algebras and par...","[0.009613037109375, -0.037689208984375, -0.026..."
2,b'1307.6013',2024-07-26,2024,b'Ruslan Maksimau',b'Ruslan Maksimau',b'Quiver Schur algebras and Koszul duality',b'math.RT',b' We prove that the category of graded finit...,b'Title: Quiver Schur algebras and Koszul dual...,"[-0.0078887939453125, 0.005535125732421875, 0...."
3,b'1511.03484',2024-07-26,2024,b'Oleg Andreev',b'Oleg Andreev',b'Some Aspects of Three-Quark Potentials',b'hep-ph hep-lat hep-th nucl-th',b' We analytically evaluate the expectation v...,b'Title: Some Aspects of Three-Quark Potential...,"[0.0433349609375, -0.030487060546875, 0.041198..."
4,b'1607.01080',2024-07-26,2024,b'Robert Szczelina',"b""Robert Szczelina and Piotr Zgliczy\\'nski""",b'Algorithm for rigorous integration of Delay ...,b'math.DS',"b"" We present an algorithm for the rigorous i...","b""Title: Algorithm for rigorous integration of...","[0.0171356201171875, -0.00780487060546875, -0...."


### Saving / Loading Parquet File Holding DF Embeddings

The following cells save the dataframe with embeddings to a Parquet file and load it back.  
Because the dataset is large and embedding generation with advanced models takes time, this avoids recreating the same dataset from scratch.

#### Saving the DF into parquet

In [29]:
%%time

file_path = 'data/KXA_NeMo_Embeddings.parquet'
df.to_parquet(file_path)

CPU times: user 1min 34s, sys: 30.2 s, total: 2min 4s
Wall time: 2min 2s


#### Loading the Parquet File

In [7]:
%%time

import pyarrow.parquet as pq
parquet_file_name = "data/KXA_NeMo_Embeddings.parquet"
df = pq.read_table(parquet_file_name).to_pandas()

CPU times: user 1min 4s, sys: 15.1 s, total: 1min 19s
Wall time: 41 s


### Setup for KDB.AI

Define and open the KDB.AI session.  
Create the KX_RAG_NVIDIA database that will hold the tables.

In [None]:
session = kdbai.Session(endpoint='http://localhost:8083')
db = session.database('KX_RAG_NVIDIA')
print(db.tables)

[KDBAI table "kxRagDocs_qflat_full", KDBAI table "kxRagDocs_qhnsw_full", KDBAI table "kxRagDocs_qhnsw"]


In [None]:
# Ensure no database called KX_RAG_NVIDIA exists
if RESET:
    try:
        session.database("KX_RAG_NVIDIA").drop()
    except kdbai.KDBAIException:
        pass

In [None]:
# Create the database
if RESET:
    db = session.create_database("KX_RAG_NVIDIA")
    print(db.tables)

[]


### Defining Schema and Creating KDB.AI Table

Here we define the table schema and the qFlat index.  
Next we check for a pre-existing table and drop it if RESET is set, so that it can be re-created.  
Finally we create the table with the defined schema and index.

In [38]:
# Defining the schema
kxRagDocs_schema = [
            {"name": "id", "type": "bytes"},
            {"name": "update_date", "type": "datetime64[ns]"},
            {"name": "year", "type": "int16"},
            {"name": "submitter", "type": "bytes"},
            {"name": "authors", "type": "bytes"},
            {"name": "title", "type": "bytes"},
            {"name": "categories", "type": "bytes"},
            {"name": "abstract", "type": "bytes"},
            {"name": "text", "type": "str"},
            {"name": "embeddings", "type": "float32s"},
         ]

# Define the qflat index
qflat_ind = [
    {
        'name': 'qflat_index',
        'type': 'qFlat',
        'column': 'embeddings',
        'params': {'dims': 1024, 'metric': 'L2'},
    }
]
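
Conceptually, a flat index with the `L2` metric compares the query against every stored vector and keeps the closest ones. The sketch below illustrates that idea in plain Python; it is not KDB.AI's implementation, `flat_search` is a hypothetical name, and whether the `__nn_distance` column reports plain or squared distance is not stated in this notebook.

```python
import math
from typing import List, Sequence, Tuple

def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query: Sequence[float],
                vectors: List[Sequence[float]],
                n: int) -> List[Tuple[int, float]]:
    """Exhaustive search: score every vector, return the n closest as (row, distance)."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]

# Toy 2-dimensional vectors; the notebook's index uses 1024 dimensions
stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
hits = flat_search([0.0, 0.0], stored, n=2)
print(hits)
```

An exhaustive scan returns exact neighbours, at the cost of search time that grows with the number of stored vectors.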

In [46]:
# First ensure the table does not already exist
if RESET:
    try:
        db.table("kxRagDocs_qflat_full").drop()
    except kdbai.KDBAIException:
        pass

In [47]:
# Create the table and print the schema
if RESET:
    table_qflat = db.create_table(table="kxRagDocs_qflat_full", schema=kxRagDocs_schema, indexes=qflat_ind)
    print(table_qflat.query())

Empty DataFrame
Columns: [id, update_date, year, submitter, authors, title, categories, abstract, text, embeddings]
Index: []


### Loading the data into KDB.AI table

In [48]:
%%time

if RESET:
    batch_size = 1_000_000
    for i in tqdm(range(0, len(df), batch_size)):
        batch = df[i:i + batch_size]
        table_qflat.insert(batch)

100%|██████████| 3/3 [02:37<00:00, 52.45s/it]

CPU times: user 2min 2s, sys: 36.2 s, total: 2min 38s
Wall time: 2min 37s
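
The progress bar above shows three iterations because 2,530,760 rows are inserted in chunks of 1,000,000. A small sketch of the slice boundaries produced by that loop (the helper name `batch_ranges` is hypothetical):

```python
from typing import List, Tuple

def batch_ranges(n_rows: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (start, stop) row positions for each insert batch."""
    return [(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]

ranges = batch_ranges(2_530_760, 1_000_000)
for start, stop in ranges:
    print(f"rows {start}..{stop - 1} ({stop - start} rows)")
```

The last batch is smaller than the others, which `df[i:i + batch_size]` handles naturally because slicing past the end of a dataframe is allowed.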





### Calling the KDB.AI table into the session

In [50]:
table = db.table("kxRagDocs_qflat_full")

### Creating Embeddings and querying the table

First we create an embedding from the query text, then search the table for the 10 nearest neighbours, restricted by keyword and year filters

In [51]:
emb = get_embedding("I am running a Hedge Fund, tell me how to generate alpha with Streaming Analytics and AI applied on structured data (market tick data) and unstructured data (news feeds, twitter, 10-K reports...) ?")

In [72]:
%%time

table.search(vectors={'qflat_index': [emb]}, n=10, filter=[["like","text","*trading*"],["like","text","*finance*"],[">","year",2023]])[0]

CPU times: user 15.4 s, sys: 2.59 s, total: 18 s
Wall time: 17.9 s


Unnamed: 0,__nn_distance,id,update_date,year,submitter,authors,title,categories,abstract,text,embeddings
0,1.005645,b'2401.05337',2024-01-12,2024,b'Pierre Renucci',b'Pierre Renucci',b'Optimal Linear Signal: An Unsupervised Machi...,b'q-fin.ST cs.LG',"b"" This study presents an unsupervised machin...",Title: Optimal Linear Signal: An Unsupervised ...,"[0.040924072, 0.020080566, -0.030319214, -0.02..."
1,1.014038,b'2310.05551',2024-04-25,2024,b'Yushi Cao',"b'Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin C...",b'PST: Improving Quantitative Trading via Prog...,b'cs.CE cs.AI cs.PL',b' Deep reinforcement learning (DRL) has revo...,Title: PST: Improving Quantitative Trading via...,"[0.016921997, 0.009155273, -0.040527344, 0.012..."
2,1.019814,b'2405.02302',2024-05-07,2024,b'Ravi Kashyap',b'Ravi Kashyap',b'The Democratization of Wealth Management: He...,b'cs.CR q-fin.CP q-fin.PM q-fin.RM q-fin.TR',b' We develop several innovations designed to...,Title: The Democratization of Wealth Managemen...,"[0.008140564, -0.0058517456, 0.013183594, -0.0..."
3,1.023954,b'2402.18485',2024-07-01,2024,b'Wentao Zhang',"b'Wentao Zhang, Lingxuan Zhao, Haochong Xia, S...",b'A Multimodal Foundation Agent for Financial ...,b'q-fin.TR cs.AI',"b"" Financial trading is a crucial component o...",Title: A Multimodal Foundation Agent for Finan...,"[0.029037476, -0.0009021759, -0.00024068356, 0..."
4,1.032185,b'2403.12285',2024-03-20,2024,b'Giorgos Iacovides',"b'Thanos Konstantinidis, Giorgos Iacovides, Mi...",b'FinLlama: Financial Sentiment Classification...,b'cs.CL cs.LG q-fin.ST q-fin.TR',"b"" There are multiple sources of financial ne...",Title: FinLlama: Financial Sentiment Classific...,"[0.0047569275, 0.0149383545, -0.016433716, -0...."
5,1.039879,b'2402.12659',2024-06-21,2024,b'Qianqian Xie',"b'Qianqian Xie, Weiguang Han, Zhengyu Chen, Ru...",b'FinBen: A Holistic Financial Benchmark for L...,b'cs.CL cs.AI cs.CE',"b"" LLMs have transformed NLP and shown promis...",Title: FinBen: A Holistic Financial Benchmark ...,"[0.0137786865, 0.022064209, -0.03677368, -0.01..."
6,1.047519,b'2406.02604',2024-06-06,2024,b'Bivas Dinda Ph. D.',b'Bivas Dinda',b'Gated recurrent neural network with TPE Baye...,b'cs.LG cs.AI cs.NE q-fin.CP',"b"" The recent advancement of deep learning ar...",Title: Gated recurrent neural network with TPE...,"[-0.011642456, 0.02758789, -0.0015649796, 0.02..."
7,1.049817,b'2405.18936',2024-06-05,2024,"b""Zolt\\'an Eisler""",b'Zoltan Eisler and Johannes Muhle-Karbe',b'Optimizing Broker Performance Evaluation thr...,b'q-fin.TR q-fin.MF',b' Minimizing execution costs for large order...,Title: Optimizing Broker Performance Evaluatio...,"[0.016784668, -0.01927185, -0.020446777, -0.02..."
8,1.050184,b'2407.09557',2024-07-16,2024,b'Amir Mirzaeinia',"b'Alireza Mohammadshafie, Akram Mirzaeinia, Ha...",b'Deep Reinforcement Learning Strategies in Fi...,b'q-fin.TR cs.AI cs.LG',b' Recent deep reinforcement learning (DRL) m...,Title: Deep Reinforcement Learning Strategies ...,"[-0.02670288, -0.009063721, -0.0011358261, -0...."
9,1.068121,b'2401.14199',2024-02-07,2024,b'Junwei Su',"b'Junwei Su, Shan Wu, Jinhui Li'",b'MTRGL:Effective Temporal Correlation Discern...,b'cs.LG econ.GN q-fin.EC q-fin.TR',"b' In this study, we explore the synergy of d...",Title: MTRGL:Effective Temporal Correlation Di...,"[0.02407837, -0.0057868958, 0.016357422, 0.017..."
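
The `filter` argument in the search above is a list of `[operator, column, value]` triples. The sketch below emulates how such a list could be evaluated on plain Python rows. It rests on assumptions that should be checked against the KDB.AI documentation: that all filters must hold at once (AND semantics), and that `like` matches case-sensitive wildcard patterns where `*` stands for any run of characters. `row_matches` and the demo rows are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

def row_matches(row: Dict[str, Any], filters: List[list]) -> bool:
    """Return True only if the row satisfies every filter (assumed AND semantics)."""
    for op, column, value in filters:
        if op == "like":
            ok = fnmatchcase(str(row[column]), value)  # assumed wildcard matching
        elif op == ">":
            ok = row[column] > value
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True

demo_filters = [["like", "text", "*trading*"],
                ["like", "text", "*finance*"],
                [">", "year", 2023]]
demo_rows = [
    {"text": "algorithmic trading in quantitative finance", "year": 2024},
    {"text": "algorithmic trading in quantitative finance", "year": 2021},
    {"text": "sheaves on schemes", "year": 2024},
]
kept = [row for row in demo_rows if row_matches(row, demo_filters)]
print(kept)
```

Under these assumptions only the first row survives, which is consistent with every result in the table above being dated 2024.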


In [115]:
print(table.query(aggs={'CountDocs': ('count', 'year')}))

   CountDocs
0    2530760


### Setup for NIM LLM

Here we create a client for the locally hosted NIM LLM, which is called through its OpenAI-compatible API.  
The model used is Llama 3.1 70B Instruct, which is already optimized by NVIDIA.

In [57]:
nimClient = OpenAI(
    api_key="no-key-required",
    base_url = "http://0.0.0.0:8000/v1"
)

### Defining function to record the time in generating responses

In [59]:
def h_time(message,start,end):
    time_diff = end - start
    td = timedelta(seconds=time_diff)
    hours = td.seconds // 3600
    minutes = (td.seconds % 3600) // 60
    seconds = time_diff % 60
    print(f'{message}: {hours}h {minutes}m {seconds:.4f}s \n')
    return
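
As an alternative way to use the same timing idea, the sketch below wraps it in a context manager so that the start and end timestamps do not have to be passed around by hand. `format_elapsed` and `timed` are hypothetical names, not part of the notebook, and the output format mirrors `h_time`.

```python
import time
from contextlib import contextmanager

def format_elapsed(seconds: float) -> str:
    """Render a duration in seconds as 'Hh Mm S.SSSSs'."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = seconds % 60
    return f"{hours}h {minutes}m {secs:.4f}s"

@contextmanager
def timed(message: str):
    """Print how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{message}: {format_elapsed(time.perf_counter() - start)}")

with timed("Short sleep"):
    time.sleep(0.01)
```

Unlike `timedelta.seconds`, which drops whole days, `format_elapsed` keeps counting hours past 24, although that makes no difference at the durations measured here.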

### Defining functions for calling LLM and retrieving results

In [60]:
def llm(prompt, model=LLM, temperature=TEMPERATURE):
    prompt = [dict(role='user', content=prompt)]
    tstart = time.perf_counter()
    response = nimClient.chat.completions.create(model=model, messages=prompt, temperature=temperature, max_tokens=2048, stream=False)
    h_time('LLM Response',tstart,time.perf_counter())
    return response

def retrieve(q, k=10, filters=[]):
    tstart = time.perf_counter()
    emb = get_embedding(q)
    h_time('Embedding the User Query',tstart,time.perf_counter())
    
    tstart = time.perf_counter()
    ctx = table.search(vectors={'qflat_index': [emb]}, n=k, filter=filters)[0]  # use the caller's k rather than a hard-coded 10
    h_time('KDB.AI Similarity Search',tstart,time.perf_counter())
    return ctx

### Defining Prompt and RAG Function

In [63]:
PROMPT_TEMPLATE = '''
JSON context: %s

Question: %s

You are a quantitative trader. Answer the question using the provided JSON context only.
Justify your answer with clear examples and precise references (date, authors, title) from the JSON context only.

Answer:'''

HISTORY = ''

def prompt(q, k=10, model=LLM, temperature=TEMPERATURE, filter=[], augment=False, history=''):
    ragS = time.perf_counter()
    if augment:
        tips = llm('Answer the question: %s' % q, model=model, temperature=temperature).choices[0].message.content
        q = q + '\n\n' + tips
        
    ctx = retrieve(q, k=k, filters=filter)[['title', 'update_date', 'authors', 'text']].to_json(orient='records', date_format='iso')
    
    p = PROMPT_TEMPLATE % (ctx, q)
    
    answer = llm(history + '\n\n\n' + p, model=model, temperature=temperature)

    h_time('Total RAG response time',ragS,time.perf_counter())
    print('---------------------------------------------- \n')
    print('Assistant: ' + answer.choices[0].message.content)

    return answer, ctx
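
To make the prompt assembly concrete, the sketch below fills a shortened copy of the template with a JSON context and a question, without calling the LLM. `DEMO_TEMPLATE`, the record and the question are made up for illustration; the real context comes from `retrieve(...).to_json(orient='records', date_format='iso')`.

```python
import json

# Shortened copy of the prompt template, for illustration only
DEMO_TEMPLATE = '''
JSON context: %s

Question: %s

Answer:'''

# Hypothetical retrieved record with the same four fields the notebook keeps
demo_records = [
    {
        "title": "An Example Paper",
        "update_date": "2024-07-26T00:00:00.000",
        "authors": "A. Author",
        "text": "Title: An Example Paper\nDate: 2024-07-26 00:00:00\n",
    },
]

demo_ctx = json.dumps(demo_records)
demo_prompt = DEMO_TEMPLATE % (demo_ctx, "How can alpha be generated?")
print(demo_prompt)
```

Because the context is passed as an argument to `%` rather than being part of the template, percent signs inside abstracts do not interfere with the formatting.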

### Try to answer a strategic business problem with RAG applied on the ArXiv dataset (beware of LLM hallucinations and fake references...)

Using the parameters above, we ask the LLM the query below and restrict retrieval to documents updated after 2023 whose text matches the trading and finance keywords.
The LLM returns a curated response together with the references it drew on.

In [70]:
%%time

Q = 'I am running a Hedge Fund, tell me how to generate alpha with Streaming Analytics and AI applied on structured data (market tick data) and unstructured data (news feeds, twitter, 10-K reports...) ?'
K = 20

answer, ctx = prompt(Q, k=K, filter=[["like","text","*trading*"],["like","text","*finance*"],[">","year",2023]])

Embedding the User Query: 0h 0m 0.0118s 

KDB.AI Similarity Search: 0h 0m 18.0739s 

LLM Response: 0h 0m 12.1843s 

Total RAG response time: 0h 0m 30.2710s 

---------------------------------------------- 

Assistant: To generate alpha with Streaming Analytics and AI applied on structured data (market tick data) and unstructured data (news feeds, twitter, 10-K reports...), I would recommend the following approaches:

1. **Utilize a multimodal foundation agent**: Implement a multimodal foundation agent like FinAgent (Wentao Zhang et al., "A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist", 2024-07-01) that can process a diverse range of data, including numerical, textual, and visual data. This agent can accurately analyze the financial market and make informed decisions.

2. **Apply sentiment analysis on unstructured data**: Use a finance-specific Large Language Model (LLM) framework like FinLlama (Thanos Konstantinidis et al., "FinLlama: F

In [71]:
print(ctx)

[{"title":"Optimal Linear Signal: An Unsupervised Machine Learning Framework to\n  Optimize PnL with Linear Signals","update_date":"2024-01-12T00:00:00.000","authors":"Pierre Renucci","text":"Title: Optimal Linear Signal: An Unsupervised Machine Learning Framework to\n  Optimize PnL with Linear Signals\nDate: 2024-01-12 00:00:00\nAuthors: Pierre Renucci\nAbstract:   This study presents an unsupervised machine learning approach for optimizing\nProfit and Loss (PnL) in quantitative finance. Our algorithm, akin to an\nunsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL\ngenerated from signals constructed linearly from exogenous variables. The\nmethodology employs a linear relationship between exogenous variables and the\ntrading signal, with the objective of maximizing the Sharpe Ratio through\nparameter optimization. Empirical application on an ETF representing U.S.\nTreasury bonds demonstrates the model's effectiveness, supported by\nregularization techniques to

### Let's break the complex question into multiple sub questions

Next we break the query above into a set of simpler questions that can be asked separately to get more detailed responses

In [73]:
%%time

Q = 'I am running a Hedge Fund, tell me how to generate alpha with Streaming Analytics and AI applied on structured data (market tick data) and unstructured data (news feeds, twitter, 10-K reports...) ?'

answer = llm('''
Break the following complex question in multiple simpler questions:
%s
''' % Q, model=LLM, temperature=TEMPERATURE).choices[0].message.content

print(answer)

LLM Response: 0h 0m 6.6926s 

Here are the simpler questions that can help break down the complex question:

**Section 1: Understanding the Context**

1. What is the investment strategy of your Hedge Fund (e.g., long-short equity, global macro, event-driven)?
2. What are your current data sources for making investment decisions?
3. What are your goals for using Streaming Analytics and AI (e.g., improve predictive models, increase trading efficiency, enhance risk management)?

**Section 2: Structured Data (Market Tick Data)**

1. What type of market tick data do you have access to (e.g., stocks, options, futures, forex)?
2. How do you currently process and analyze market tick data (e.g., time-series analysis, statistical models)?
3. What specific insights or alpha-generating opportunities do you hope to uncover from market tick data using Streaming Analytics and AI?

**Section 3: Unstructured Data (News Feeds, Twitter, 10-K Reports)**

1. What types of unstructured data do you want to i

### Convert the list of sub questions to JSON so we can process it

The list of sub questions above is converted into a JSON list so that it can be processed.  
Note that the prompt wording needed to get the desired output differs from one LLM to another.

In [74]:
%%time

questions = llm('''
Return the questions below only as a JSON list of strings with no other context:
%s
''' % answer, model=LLM, temperature=TEMPERATURE).choices[0].message.content

LLM Response: 0h 0m 5.3478s 

CPU times: user 19.5 ms, sys: 16.5 ms, total: 36 ms
Wall time: 5.35 s


In [75]:
questions = json.loads(questions)

print(questions)

['What is the investment strategy of your Hedge Fund (e.g., long-short equity, global macro, event-driven)?', 'What are your current data sources for making investment decisions?', 'What are your goals for using Streaming Analytics and AI (e.g., improve predictive models, increase trading efficiency, enhance risk management)?', 'What type of market tick data do you have access to (e.g., stocks, options, futures, forex)?', 'How do you currently process and analyze market tick data (e.g., time-series analysis, statistical models)?', 'What specific insights or alpha-generating opportunities do you hope to uncover from market tick data using Streaming Analytics and AI?', 'What types of unstructured data do you want to incorporate into your analysis (e.g., news articles, social media posts, company filings)?', 'How do you plan to preprocess and normalize the unstructured data for analysis (e.g., text processing, sentiment analysis)?', 'What specific insights or alpha-generating opportunitie
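
`json.loads` succeeds here because the model returned a bare JSON list. Some models add surrounding text, in which case a more tolerant parser can help. The sketch below is one possible approach, not the notebook's method: the hypothetical `parse_question_list` extracts the outermost bracketed span and validates that it is a list of strings.

```python
import json
import re
from typing import List

def parse_question_list(raw: str) -> List[str]:
    """Extract a JSON list of strings from model output that may contain extra text."""
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)  # first '[' to last ']'
    if match is None:
        raise ValueError("no JSON list found in model output")
    items = json.loads(match.group(0))
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("expected a JSON list of strings")
    return items

# Made-up model output with text around the list
wrapped = 'Here are the questions:\n["What is alpha?", "Which data sources?"]\nHope this helps.'
parsed = parse_question_list(wrapped)
print(parsed)
```

The greedy pattern assumes the output contains a single list; square brackets elsewhere in the surrounding text would make `json.loads` fail, which is reported as an exception rather than silently ignored.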

In [76]:
questions[0]

'What is the investment strategy of your Hedge Fund (e.g., long-short equity, global macro, event-driven)?'

### Take the first sub question and augment it to improve the accuracy of the context retrieval

In [77]:
%%time

q = questions[0]

answer = llm('''
Answer the question:
%s
''' % q, model=LLM, temperature=TEMPERATURE).choices[0].message.content

print(q + '\n\n' + answer)

LLM Response: 0h 0m 7.6494s 

What is the investment strategy of your Hedge Fund (e.g., long-short equity, global macro, event-driven)?

Our hedge fund, which we'll call "AlphaQuest," employs a multi-strategy approach that combines elements of long-short equity, global macro, and event-driven investing. Our investment objective is to generate absolute returns with a focus on capital preservation, while also providing a hedge against market downturns.

**Long-Short Equity:**
We maintain a long-short equity portfolio that seeks to capitalize on mispricings in the market. Our long book focuses on high-conviction, growth-oriented stocks with strong fundamentals, while our short book targets companies with deteriorating fundamentals, excessive valuations, or other negative catalysts. We use a combination of quantitative models and fundamental research to identify attractive long and short opportunities.

**Global Macro:**
Our global macro strategy involves making directional bets on various

### Try Augmented RAG on the first sub question

Here we try Augmented RAG on the first sub question by setting augment to True

In [78]:
%%time

K = 10

answer, ctx = prompt(q, k=K, filter=[["like","text","*trading*"]], augment=True)

LLM Response: 0h 0m 8.2693s 

Embedding the User Query: 0h 0m 0.0138s 

KDB.AI Similarity Search: 0h 0m 13.0958s 

LLM Response: 0h 0m 8.2159s 

Total RAG response time: 0h 0m 29.5958s 

---------------------------------------------- 

Assistant: Based on the provided JSON context, I can infer that the investment strategy of our hedge fund is a combination of long-short equity, global macro, and event-driven investing, with a focus on generating absolute returns and minimizing risk.

For example, the paper "Combining Independent Smart Beta Strategies for Portfolio Optimization" (2018-08-13, Phil Maguire, Karl Moffett, Rebecca Maguire) discusses the idea of combining multiple smart beta strategies to generate beta-neutral returns. This approach is similar to our hedge fund's long-short equity strategy, which aims to identify mispricings in the market and generate absolute returns.

Another example is the paper "Intelligent Systematic Investment Agent: an ensemble of deep learning and ev

In [79]:
pprint(ctx)

('[{"title":"A Deep Deterministic Policy Gradient-based Strategy for Stocks '
 'Portfolio\\n  '
 'Management","update_date":"2021-03-23T00:00:00.000","authors":"Huanming '
 'Zhang, Zhengyong Jiang, Jionglong Su","text":"Title: A Deep Deterministic '
 'Policy Gradient-based Strategy for Stocks Portfolio\\n  Management\\nDate: '
 '2021-03-23 00:00:00\\nAuthors: Huanming Zhang, Zhengyong Jiang, Jionglong '
 'Su\\nAbstract:   With the improvement of computer performance and the '
 'development of\\nGPU-accelerated technology, trading with machine learning '
 'algorithms has\\nattracted the attention of many researchers and '
 'practitioners. In this\\nresearch, we propose a novel portfolio management '
 'strategy based on the\\nframework of Deep Deterministic Policy Gradient, a '
 'policy-based reinforcement\\nlearning framework, and compare its performance '
 'to that of other trading\\nstrategies. In our framework, two Long Short-Term '
 'Memory neural networks and\\ntwo fully connected 

### Answer all sub questions with Augmented RAG and keep track of the history of the conversation

Now we answer all the sub questions with Augmented RAG, appending each question and answer to the conversation history so that the results can be summarized at the end

In [82]:
%%time

K = 10

for q in questions:
    print('User: ' + q + '\n')
    answer, ctx = prompt(q, k=K, filter=[["like","text","*trading*"]], augment=True)
    HISTORY += 'User: %s\n\nAssistant: %s\n\n' % (q, answer.choices[0].message.content)
    print('\n\n')
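
The loop above appends each turn to the module-level `HISTORY` string, which is later used for the executive summary; the calls to `prompt` in the loop do not pass `history`, so each sub question is answered independently. Since `HISTORY` is never cleared, re-running the cell would append the same turns again. A minimal sketch of the accumulation with an explicit reset (`append_turn` and the demo turns are hypothetical):

```python
def append_turn(history: str, question: str, answer: str) -> str:
    """Return the history extended by one user/assistant exchange."""
    return history + 'User: %s\n\nAssistant: %s\n\n' % (question, answer)

# Start from an empty history so repeated runs do not duplicate turns
history = ''
for demo_q, demo_a in [("Q1?", "A1."), ("Q2?", "A2.")]:
    history = append_turn(history, demo_q, demo_a)

print(history)
```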

User: What is the investment strategy of your Hedge Fund (e.g., long-short equity, global macro, event-driven)?

LLM Response: 0h 0m 8.2130s 

Embedding the User Query: 0h 0m 0.0120s 

KDB.AI Similarity Search: 0h 0m 13.0806s 

LLM Response: 0h 0m 8.1677s 

Total RAG response time: 0h 0m 29.4742s 

---------------------------------------------- 

Assistant: Based on the provided JSON context, I can infer that the investment strategy of our hedge fund is a combination of long-short equity, global macro, and event-driven investing, with a focus on generating absolute returns and minimizing risk.

For example, the paper "Combining Independent Smart Beta Strategies for Portfolio Optimization" (2018-08-13, Phil Maguire, Karl Moffett, Rebecca Maguire) discusses the idea of combining multiple smart beta strategies to generate beta-neutral returns. This approach is similar to our hedge fund's long-short equity strategy, which aims to identify mispricings in the market and generate absolute ret

### Executive summary

Next, based on the answers above, we ask the LLM to create an Executive Summary of the results.

In [81]:
%%time

answer = llm(HISTORY + 'Summarize the conversation in an Executive Summary.', model=LLM, temperature=TEMPERATURE)

print(answer.choices[0].message.content)

LLM Response: 0h 0m 7.6237s 

**Executive Summary**

This conversation revolves around the integration of Streaming Analytics and AI in a hedge fund to generate alpha. The discussion covers various aspects, including the investment strategy, data sources, market tick data analysis, unstructured data analysis, AI techniques, and resource allocation.

**Key Takeaways**

1. **Investment Strategy**: The hedge fund aims to combine long-short equity, global macro, and event-driven investing strategies to generate absolute returns and minimize risk.
2. **Data Sources**: The fund will utilize market tick data, news articles, social media posts, and company filings to inform investment decisions.
3. **Market Tick Data Analysis**: The fund will apply time-series analysis, statistical models, and machine learning techniques to analyze market tick data and identify trading opportunities.
4. **Unstructured Data Analysis**: The fund will use natural language processing (NLP) and sentiment analysis t

### Thank You