# V2.2 RAG with Strategy Template

Instead of chunking documents, treat each Python code template file as one document to be stored in the vector database.

Hypothesis: By storing embeddings of code templates as a whole rather than splitting them, we avoid missing context or dependencies. 

We set the model's ```temperature = 0``` to make the output more deterministic.

## Takeaways
1. Indexing each code template as a single document lowers retrieval performance.
    - This was tested across all available search types. The retriever does not always return the documents relevant to the query (e.g. it returns Aroon rather than RSI).
    - One possible explanation is the similarity between the code templates. Most templates share similar, if not overlapping, structures (e.g. "class Strategy(StrategyBase)....def __init__(self)"), so the embeddings may carry too much noise. As a result, the retriever fails to match the query to the relevant document, in this case through the keyword "RSI".
2. The ParentDocumentRetriever could be an option when we want the retriever to return the full code template.
    - It chunks the documents, but retrieves at the document (parent) level rather than the chunk level.
    - Other advanced retrievers from LangChain are also worth investigating, such as MultiVector (adding summaries and hypothetical questions on top of chunking for each document) and Self-Query (fetching documents based on metadata).
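
The noise explanation in takeaway 1 can be illustrated without an embedding model. The sketch below is a toy stand-in: it uses bag-of-words cosine similarity instead of OpenAI embeddings, and `BOILERPLATE`, `RSI_LOGIC` and `AROON_LOGIC` are shortened, hypothetical snippets rather than the real template files. It only shows that shared scaffolding pushes two different strategies toward each other; it does not reproduce the retriever's actual scores.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count identifier-like tokens; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[A-Za-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical scaffolding shared by every template (assumption for illustration).
BOILERPLATE = """
class Strategy(StrategyBase):
    def __init__(self):
        self.subscribed_books = {}
        self.period = 45 * 60
        self.options = {}

    def on_order_state_change(self, order):
        pass

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()
        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        close_price_history.reverse()
"""

# Hypothetical indicator-specific lines (assumption for illustration).
RSI_LOGIC = "rsi = talib.RSI(close_price_history, self.rsi_period)"
AROON_LOGIC = (
    "aroon_down, aroon_up = "
    "talib.AROON(high_price_history, low_price_history, self.aroon_period)"
)

# Similarity when the shared scaffolding is part of each document.
whole_file_sim = cosine(
    bag_of_words(BOILERPLATE + RSI_LOGIC),
    bag_of_words(BOILERPLATE + AROON_LOGIC),
)
# Similarity when only the indicator-specific lines are compared.
logic_only_sim = cosine(bag_of_words(RSI_LOGIC), bag_of_words(AROON_LOGIC))

print(f"whole files: {whole_file_sim:.2f}")
print(f"logic only:  {logic_only_sim:.2f}")
```

In this toy setting the two whole files look far more alike than their indicator-specific lines do, which is consistent with the idea that the scaffolding drowns out the part of the template that distinguishes RSI from Aroon.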

In [1]:
%pip install --user -qU langchain
%pip install --user -qU langchain_community
%pip install --user -qU langchain_chroma
%pip install --user -qU langchain-openai
%pip install --user -qU langchainhub

Note: you may need to restart the kernel to use updated packages.

In [2]:
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass()

In [3]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PythonLoader
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

In [4]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

In [5]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

In [6]:
folder_path = "strategy-template"

documents = []
for filename in os.listdir(folder_path):
    file_path = os.path.join(folder_path, filename)
    if os.path.isfile(file_path):
        loader = PythonLoader(file_path)
        loaded_docs = loader.load()
        # Add source metadata to each document
        for doc in loaded_docs:
            doc.metadata["source"] = filename
        documents.extend(loaded_docs)

In [7]:
print(documents[0])

page_content='class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 45 * 60
        self.options = {}

        self.divide_quote = 0
        self.proportion = 0.2
        self.aroon_period = 14

    def on_order_state_change(self,  order):
        pass

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()
        
        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        high_price_history = [candle['high'] for candle in candles[exchange][pair]]
        low_price_history = [candle['low'] for candle in candles[exchange][pair]]
        open_price_history = [candle['open'] for candle in candles[exchange][pair]]

        # convert to chronological order for talib
        close_price_history.reverse()
        high_price_history.reverse()
        low_price_history.reverse()
        open_price_history.reverse()

        # convert to np.

## Search_type = "similarity"

In [8]:
vectorstore = Chroma.from_documents(documents=documents, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_type= "similarity",
                                     search_kwargs={'k': 4})

In [9]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation.Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [10]:
response_1 = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response_1["answer"])

To create a strategy where you buy Bitcoin if its RSI drops below 30 and sell if it goes above 70, you can modify the existing `Strategy` class provided in the documentation. Below is the modified `trade` method of the `Strategy` class that implements the desired strategy:

```python
import numpy as np
import talib

class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 30 * 60
        self.options = {}

        self.rsi_lower_band = 30
        self.rsi_upper_band = 70

    def on_order_state_change(self, order):
        pass

    # called every self.period
    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()

        close_price_history = [candle['close'] for candle in candles[exchange][pair]]

        # convert to chronological order for talib
        close_price_history.reverse()

        # convert np.array
        close_price_history = np.array(close_price_histo
```

In [11]:
for document in response_1["context"]:
    print(f"Source: {document.metadata['source']}")

Source: RSI-en.py
Source: DMI-en.py
Source: MFI-en.py
Source: Double_Bottom-en.py


## Search_type = "mmr" (Maximal Marginal Relevance)

In [12]:
retriever = vectorstore.as_retriever(search_type= "mmr",
                                     search_kwargs={'k': 1})
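
To make the "mmr" search type more concrete, here is a minimal pure-Python sketch of Maximal Marginal Relevance selection. It is not LangChain's implementation: the helper `mmr_select`, the 2-D toy vectors and the default weight are assumptions for illustration. The parameter name `lambda_mult` is borrowed from the key LangChain accepts in `search_kwargs`; 1.0 means pure relevance, lower values favour diversity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by MMR."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Reward similarity to the query ...
        relevance = cosine(query, candidates[i])
        # ... and penalise similarity to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors (hypothetical): candidates 0 and 1 are near-duplicates,
# candidate 2 is less relevant but points in a different direction.
query = [1.0, 0.0]
candidates = [[0.98, 0.20], [0.97, 0.24], [0.60, -0.80]]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

In this sketch, with `k=1` only the first pick is made, and that pick depends on relevance alone because nothing has been selected yet, so the diversity penalty has no effect.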

In [13]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation.Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [14]:
response_1 = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response_1["answer"])

To create a strategy where you buy Bitcoin if its RSI drops below 30 and sell if it goes above 70, you can modify the existing `trade` method in the provided `Strategy` class. You will need to calculate the RSI (Relative Strength Index) for Bitcoin's price history and then implement the buying and selling logic based on the RSI values.

Here is the modified `trade` method for the given `Strategy` class:

```python
import numpy as np
import talib

class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 45 * 60
        self.options = {}

        self.divide_quote = 0
        self.proportion = 0.2
        self.rsi_period = 14

    def on_order_state_change(self, order):
        pass

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()

        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        close_price_history.reverse()

        cl
```

In [15]:
for document in response_1["context"]:
    print(f"Source: {document.metadata['source']}")

Source: Aroon-en.py


## Search_type = "similarity_score_threshold"

In [16]:
retriever = vectorstore.as_retriever(search_type= "similarity_score_threshold",
                                     search_kwargs={"score_threshold": 0.5, "k":1})
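
The effect of `score_threshold` combined with `k` can be sketched as a filter followed by a top-k cut. This is an illustration under assumptions, not LangChain's code: `filter_by_threshold` is a hypothetical helper, and the file names and scores are made up rather than measured.

```python
from typing import List, Tuple

ScoredDoc = Tuple[str, float]  # (source name, relevance score in [0, 1])


def filter_by_threshold(
    scored_docs: List[ScoredDoc], score_threshold: float, k: int
) -> List[ScoredDoc]:
    """Drop documents below the threshold, then keep the k best."""
    passing = [item for item in scored_docs if item[1] >= score_threshold]
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing[:k]


# Made-up scores for illustration only.
candidates = [
    ("template_a.py", 0.74),
    ("template_b.py", 0.71),
    ("template_c.py", 0.48),
]

print(filter_by_threshold(candidates, score_threshold=0.5, k=1))
# [('template_a.py', 0.74)]
print(filter_by_threshold(candidates, score_threshold=0.9, k=1))
# []
```

The second call shows the main trade-off: a threshold that is too strict can leave the chain with no context at all, whereas plain "similarity" search always returns `k` documents.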

In [17]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation.Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [18]:
response_1 = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response_1["answer"])

To create a strategy where you buy Bitcoin if its RSI drops below 30 and sell if it goes above 70, you can modify the existing `trade` method in the provided `Strategy` class. Here's how you can adjust the existing code:

1. Update the `rsi_lower_band` to 30 and `rsi_upper_band` to 70 in the `__init__` method of the `Strategy` class.
2. Modify the `trade` method to reflect the new buy and sell conditions based on RSI values.

Here is the modified code snippet for the `trade` method:

```python
def trade(self, candles):
    exchange, pair, base, quote = CA.get_exchange_pair()
    
    close_price_history = [candle['close'] for candle in candles[exchange][pair]]
    
    # convert to chronological order for talib
    close_price_history.reverse()
    
    # convert np.array
    close_price_history = np.array(close_price_history)

    close_price = close_price_history[-1]

    rsi = talib.RSI(close_price_history, self.long_period)
    
    if len(close_price_history) < self.long_period + 
```

In [19]:
for document in response_1["context"]:
    print(f"Source: {document.metadata['source']}")

Source: RSI-en.py


## Parent Document Retriever
https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/parent_document_retriever/

Motivation for the Parent Document Retriever
- We may want small documents so that their embeddings most accurately reflect their meaning. Embeddings can lose meaning in long documents.
- We also want documents long enough that the context is retained.
- During retrieval, the ```ParentDocumentRetriever``` first fetches the small chunks, then looks up the parent IDs of those chunks and returns the larger parent documents.
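
The small-to-big lookup described above can be sketched in plain Python. Everything here is a simplified assumption: chunks are single lines, matching is word overlap instead of vector search, and `build_child_index`, `retrieve_parents` and the two toy documents are hypothetical names and data, not part of LangChain.

```python
from typing import Dict, List, Tuple


def split_into_chunks(text: str) -> List[str]:
    """Toy child splitter: one chunk per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_child_index(parents: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (parent_id, chunk) pairs so each chunk remembers its parent."""
    return [
        (parent_id, chunk)
        for parent_id, text in parents.items()
        for chunk in split_into_chunks(text)
    ]


def overlap_score(query: str, chunk: str) -> int:
    """Crude stand-in for embedding similarity: shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve_parents(
    query: str,
    parents: Dict[str, str],
    child_index: List[Tuple[str, str]],
    k: int,
) -> List[str]:
    """Rank child chunks, then return the full parent of each top chunk."""
    ranked = sorted(
        child_index, key=lambda pair: overlap_score(query, pair[1]), reverse=True
    )
    parent_ids: List[str] = []
    for parent_id, _chunk in ranked[:k]:
        if parent_id not in parent_ids:  # several chunks may share a parent
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


# Toy "templates" (hypothetical): the first line is shared boilerplate.
parents = {
    "rsi": (
        "shared setup code for every strategy\n"
        "buy when rsi drops below 30\n"
        "sell when rsi rises above 70"
    ),
    "aroon": (
        "shared setup code for every strategy\n"
        "buy when aroon up crosses above aroon down\n"
        "sell when aroon down crosses above aroon up"
    ),
}

child_index = build_child_index(parents)
result = retrieve_parents(
    "strategy that buys when rsi drops below 30", parents, child_index, k=1
)
print(result[0])
```

The query matches only the small "buy when rsi drops below 30" chunk, yet the whole RSI document comes back, including the sell rule that the matched chunk did not contain.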

In [20]:
# Resetting the Chroma vectorstore collection
vectorstore.reset_collection()

In [21]:
from langchain.retrievers import ParentDocumentRetriever

In [22]:
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_community.document_loaders import PythonLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

In [23]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=2000, 
    chunk_overlap=500,
    length_function=len,
    add_start_index=True
)

In [24]:
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)

In [25]:
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=text_splitter,
    search_type="similarity",
    search_kwargs={'k': 1}
)

In [26]:
retriever.add_documents(documents, ids=None)

In [27]:
list(store.yield_keys())

['6c9b4ed2-3544-4508-ab5e-be894a77c7b0',
 'ef9374f4-3a85-49e4-923e-8ee2b53005c6',
 '222b9e3e-ecd7-42b6-ba3a-01fd6bef4057',
 '748c4131-b4b7-4a26-b529-857e331718db',
 '136895ad-959e-4099-96e0-c53ac96cec28',
 'a70f0f70-b228-4f59-b84a-fd577aaf44f8',
 '411110ab-13f0-40a3-95cb-933c724470c9',
 '21ccd7ac-3faa-44c6-8c62-80d1f4078bfa',
 'df20fac6-4398-4a66-9b70-1f9389902992',
 '32ff63ab-da5a-41cc-8c76-8c2e52f17a48',
 '00be5981-0d0f-4ddf-b718-4a5704ccaf73',
 'f28c5c12-6c90-480b-a6ed-e75a9090d0d5',
 '0b0cd0cd-ec88-4833-a977-d46cd1b92580',
 '114a587f-09a7-4753-93c1-9bd2aa16aea2',
 'b3763dda-a024-4776-970d-4975af8b56da',
 '03957d11-b2a9-453e-868b-dfffa27a0f5c',
 '23019f7b-2878-41e7-9b2e-75410f6d3a5c',
 '4388409e-9f9b-4a7c-9f7c-20375c7ab9bf',
 'e31570b6-beba-4165-ba35-20690c2fce0c',
 '044d0ba6-57b3-4b80-b8cd-bae033a58022',
 '0663e17c-79f8-49ca-a960-06760f195972',
 '57d5fd62-5bcf-48c6-8737-f92a4e9579a4']

In [28]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation.Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [29]:
response_1 = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response_1["answer"])

To create a strategy where you buy Bitcoin if its RSI drops below 30 and sell if it goes above 70, you can modify the existing `trade` method in the provided `Strategy` class. Here's how you can adjust the existing code:

1. Update the `rsi_lower_band` to 30 and `rsi_upper_band` to 70 in the `__init__` method of the `Strategy` class.
2. Modify the logic in the `trade` method to buy when RSI drops below 30 and sell when RSI goes above 70.

Here's the modified code snippet for the `trade` method:

```python
def trade(self, candles):
    exchange, pair, base, quote = CA.get_exchange_pair()
    
    close_price_history = [candle['close'] for candle in candles[exchange][pair]]
    
    # convert to chronological order for talib
    close_price_history.reverse()
    
    # convert np.array
    close_price_history = np.array(close_price_history)

    close_price = close_price_history[-1]

    rsi = talib.RSI(close_price_history, self.long_period)
    
    if len(close_price_history) < self.lo
```

In [30]:
for document in response_1["context"]:
    print(f"Source: {document.metadata['source']}")

Source: RSI-en.py
