# V2.0 RAG with Strategy Template
Replacing the Crypto Arsenal (CA) markdown files with Python strategy templates for data ingestion

## Takeaways
1. Switching from markdown files to .py files
    - For data ingestion, we use ```PythonLoader``` to load the Python source code correctly
    - We specify Python as the language so that the text splitter uses a list of separators tailored to Python code

2. Chroma ```vectorstore.reset_collection()```
    - When rerunning ```vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())```, remember to reset the vectorstore collection first; otherwise the new chunks are added to the existing collection and it fills up with duplicate documents

3. Text splitter arguments
    - Different ```chunk_size``` and ```chunk_overlap``` values can change the generated output drastically. While the model temperature (default 0.7) is also a key source of output variation, these two parameters determine how much context each chunk carries and how chunks are 'related' to their neighbors through shared text
    - If ```chunk_size``` is too big, each chunk covers most of a file; since the retriever always returns a fixed number of chunks (4 by default), it ends up returning whole files that are not actually relevant
    - If ```chunk_overlap``` is too small (or 0), the retriever may fail to retrieve chunks that are actually relevant to the query, because logic that spans a chunk boundary is split across two chunks
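To make the effect of ```chunk_overlap``` concrete, here is a minimal sketch using a deliberately simplified splitter. ```sliding_window_chunks``` is a hypothetical helper, not LangChain's ```RecursiveCharacterTextSplitter``` (which also splits on separators and strips whitespace): it cuts fixed-size character windows and advances by ```chunk_size - chunk_overlap```, so each chunk repeats the last ```chunk_overlap``` characters of the previous one and text near a boundary appears in both chunks.

```python
from typing import List, Tuple


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Split text into fixed-size windows; returns (start_index, chunk) pairs."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


text = "abcdefghij"
print(sliding_window_chunks(text, chunk_size=4, chunk_overlap=0))
# [(0, 'abcd'), (4, 'efgh'), (8, 'ij')]
print(sliding_window_chunks(text, chunk_size=4, chunk_overlap=2))
# [(0, 'abcd'), (2, 'cdef'), (4, 'efgh'), (6, 'ghij')]
```

With no overlap, "d" and "e" never share a chunk; with an overlap of 2, every adjacent pair of characters appears together in at least one chunk.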

### Issues / Possible Improvements
1. LLM model temperature
    - The default model temperature is 0.7, which lets the model generate more random outputs. It is therefore worth investigating which temperature values (and other sampling settings such as top_p) work well for code generation tasks.

2. Retrieval results change given the same query
    - When rerunning the same code cells, the retrieval results sometimes change. Given that we are using a vector store retriever with the default similarity search, what factors affect the retrieval results?
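For the temperature question, a conceptual sketch may help. Temperature divides the model's next-token logits before the softmax, so lower values concentrate probability on the top token and higher values spread it out. The logits below are made up for illustration, and this is not how the OpenAI API is called; in ```langchain_openai``` the value is simply passed as ```ChatOpenAI(temperature=...)```.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As the temperature rises, the top token's probability falls and sampling picks the alternatives more often, which is why repeated runs at 0.7 can produce noticeably different code.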

In [1]:
%pip install --user -qU langchain
%pip install --user -qU langchain_community
%pip install --user -qU langchain_chroma
%pip install --user -qU langchain-openai
%pip install --user -qU langchainhub

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass()

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

To load the Python files, we use the ```PythonLoader``` from ```langchain_community.document_loaders```.

This differs from the previous version, where we loaded the markdown files using ```UnstructuredMarkdownLoader```.
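As far as we understand LangChain's implementation, ```PythonLoader``` is a text loader that first detects the file's encoding with the standard library's ```tokenize.detect_encoding``` (honoring a PEP 263 coding declaration or a BOM, and defaulting to UTF-8). That matters here because some templates contain non-ASCII text (one file is named ```跳空-en.py```). The stdlib sketch below approximates that behavior; ```read_python_source``` is a hypothetical helper, not part of LangChain.

```python
import tempfile
import tokenize
from pathlib import Path
from typing import Tuple


def read_python_source(path: Path) -> Tuple[str, str]:
    """Detect a .py file's encoding (coding cookie or BOM, default utf-8), then decode it."""
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    return encoding, path.read_text(encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example-en.py"  # hypothetical template file
    sample.write_text("# 跳空 (gap) strategy\nclass Strategy:\n    pass\n", encoding="utf-8")
    encoding, source = read_python_source(sample)

print(encoding)               # utf-8
print(source.splitlines()[1])  # class Strategy:
```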

In [3]:
from langchain import hub
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PythonLoader
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [4]:
folder_path = "strategy-template"

documents = []
for filename in os.listdir(folder_path):
    file_path = os.path.join(folder_path, filename)
    if os.path.isfile(file_path):
        loader = PythonLoader(file_path)
        loaded_docs = loader.load()
        # Add source metadata to each document
        for doc in loaded_docs:
            doc.metadata["source"] = filename
        documents.extend(loaded_docs)

In [5]:
print(documents[0])

page_content='class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 45 * 60
        self.options = {}

        self.divide_quote = 0
        self.proportion = 0.2
        self.aroon_period = 14

    def on_order_state_change(self,  order):
        pass

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()
        
        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        high_price_history = [candle['high'] for candle in candles[exchange][pair]]
        low_price_history = [candle['low'] for candle in candles[exchange][pair]]
        open_price_history = [candle['open'] for candle in candles[exchange][pair]]

        # convert to chronological order for talib
        close_price_history.reverse()
        high_price_history.reverse()
        low_price_history.reverse()
        open_price_history.reverse()

        # convert to np.

In addition, we create the ```RecursiveCharacterTextSplitter``` with ```from_language(language=Language.PYTHON, ...)```, which gives it a list of separators tailored to Python code.

Initial tests showed that different argument values for the text splitter lead to noticeably different generated outputs.
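The sketch below illustrates what the Python language setting changes: the splitter tries coarse, Python-aware separators first and only falls back to finer ones when a piece is still too large. It is a simplified, hypothetical re-implementation (no overlap, no whitespace stripping), and the separator list is our assumption of what ```RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)``` returns; check it against your installed version.

```python
from typing import List, Sequence

# Assumption: mirrors RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)
PYTHON_SEPARATORS = ("\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", "")


def recursive_split(text: str, chunk_size: int, separators: Sequence[str] = PYTHON_SEPARATORS) -> List[str]:
    """Simplified recursive splitter: no overlap and no whitespace stripping."""
    if len(text) <= chunk_size:
        return [text]
    # Use the first (coarsest) separator that occurs in the text; "" always matches.
    idx = next(i for i, sep in enumerate(separators) if sep == "" or sep in text)
    sep = separators[idx]
    if sep == "":
        # Last resort: hard cut into fixed-size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    parts = text.split(sep)
    # Re-attach the separator to each following part so no characters are lost.
    pieces = [parts[0]] + [sep + p for p in parts[1:]]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Still too large: flush the buffer, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, separators[idx + 1:]))
        elif len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small neighboring pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


source = (
    "class Strategy(StrategyBase):\n"
    "    def __init__(self):\n"
    "        self.rsi_upper_band = 80\n"
    "        self.rsi_lower_band = 20\n"
    "\n"
    "    def trade(self, candles):\n"
    "        signal = 0\n"
    "        return []\n"
)
for chunk in recursive_split(source, chunk_size=80):
    print(repr(chunk))
```

With these separators, methods indented by four spaces match neither ```\ndef ``` nor ```\n\tdef ```, so for templates like the ones loaded here the split falls back to blank lines and then single lines. That may be one reason chunk boundaries land in the middle of a method.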

## Text Splitter 1

- Chunk_size = 500
- Chunk_overlap = 0

In [6]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=500, 
    chunk_overlap=0,
    length_function=len,
    add_start_index=True
)
text_chunks = text_splitter.split_documents(documents)

In [7]:
text_chunks[0].metadata

{'source': 'Aroon-en.py', 'start_index': 0}

In [8]:
print(text_chunks[0].page_content)

class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 45 * 60
        self.options = {}

        self.divide_quote = 0
        self.proportion = 0.2
        self.aroon_period = 14

    def on_order_state_change(self,  order):
        pass


In [9]:
# View the chunks whose source is 'RSI-en.py'
rsi_chunks = [chunk for chunk in text_chunks if chunk.metadata['source'] == 'RSI-en.py']
for chunk in rsi_chunks:
    print(chunk.metadata)

{'source': 'RSI-en.py', 'start_index': 0}
{'source': 'RSI-en.py', 'start_index': 458}
{'source': 'RSI-en.py', 'start_index': 846}
{'source': 'RSI-en.py', 'start_index': 1302}
{'source': 'RSI-en.py', 'start_index': 1786}
{'source': 'RSI-en.py', 'start_index': 2162}
{'source': 'RSI-en.py', 'start_index': 2626}
{'source': 'RSI-en.py', 'start_index': 2917}
{'source': 'RSI-en.py', 'start_index': 3406}
{'source': 'RSI-en.py', 'start_index': 3675}
{'source': 'RSI-en.py', 'start_index': 4176}
{'source': 'RSI-en.py', 'start_index': 4201}
{'source': 'RSI-en.py', 'start_index': 4671}
{'source': 'RSI-en.py', 'start_index': 4748}
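Because ```chunk_overlap=0``` here, the gap between consecutive ```start_index``` values approximates the length of each chunk. A gap can slightly exceed ```chunk_size``` (501 below), which we believe is whitespace stripped at chunk boundaries. Very small gaps flag tiny fragments that carry little context for retrieval. A quick stdlib check using the numbers printed above:

```python
# start_index values printed above for RSI-en.py (chunk_size=500, chunk_overlap=0)
rsi_start_indices = [0, 458, 846, 1302, 1786, 2162, 2626, 2917, 3406, 3675, 4176, 4201, 4671, 4748]

# Distance from each chunk's start to the next chunk's start
gaps = [nxt - cur for cur, nxt in zip(rsi_start_indices, rsi_start_indices[1:])]
print(gaps)
# [458, 388, 456, 484, 376, 464, 291, 489, 269, 501, 25, 470, 77]

# Chunks that are probably tiny fragments (the threshold of 100 is arbitrary)
small_chunk_starts = [start for start, gap in zip(rsi_start_indices, gaps) if gap < 100]
print(small_chunk_starts)
# [4176, 4671]
```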


In [10]:
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())

In [11]:
retriever = vectorstore.as_retriever()

In [12]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

For prompting, we reuse Prompt 1 from ```v1.0_rag_with_custom_prompt.ipynb```.

Keeping the prompt fixed lets us compare the LLM outputs and attribute the differences to the ingested data.
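Conceptually, the ```{context}``` placeholder in Prompt 1 is filled by joining the text of the retrieved chunks. As we understand ```create_stuff_documents_chain```, its defaults render each document as its ```page_content``` and join them with a blank line; the sketch below mimics that with plain string formatting. ```SYSTEM_TEMPLATE``` is a shortened stand-in for Prompt 1, and ```stuff_documents``` is a hypothetical helper.

```python
from typing import List

# Shortened stand-in for Prompt 1 (the full system_prompt is defined in the next cell)
SYSTEM_TEMPLATE = (
    "You are a coding assistant with expertise in Crypto Arsenal's documentation. "
    "Here is a set of Crypto Arsenal's documentation retrieved to answer the question: "
    "\n -------- \n {context} \n -------- \n"
)


def stuff_documents(page_contents: List[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunk texts into a single context string."""
    return separator.join(page_contents)


# Hypothetical, shortened chunk texts standing in for retrieved Documents
retrieved = [
    "# holding long position\nelif available_base_amount > self.divide_quote/high_price:",
    "# initialize signal to be 0\nsignal = 0",
]
system_message = SYSTEM_TEMPLATE.format(context=stuff_documents(retrieved))
print(system_message)
```

Because every retrieved chunk is pasted into the prompt, a larger ```chunk_size``` also means a longer prompt for the same number of retrieved chunks.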

In [13]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation. Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [14]:
response = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response["answer"])

To create a strategy where you buy Bitcoin if its Relative Strength Index (RSI) drops below 30 and sell if it goes above 70, you can use the following Python code snippet based on the provided Crypto Arsenal's documentation:

```python
# Define the RSI thresholds
rsi_lower_threshold = 30
rsi_upper_threshold = 70

# Assume you have access to the following variables: curr_rsi_short, curr_rsi_long

# Initialize the signal to 0
signal = 0

# Buy signal if RSI drops below 30
if curr_rsi_short < rsi_lower_threshold:
    signal = 1

# Sell signal if RSI goes above 70
if curr_rsi_short > rsi_upper_threshold:
    signal = -1

# Print the signal
print("Signal:", signal)
```

In this code snippet:
- `rsi_lower_threshold` is set to 30, indicating the threshold below which you want to buy Bitcoin.
- `rsi_upper_threshold` is set to 70, indicating the threshold above which you want to sell Bitcoin.
- The strategy checks the current RSI (`curr_rsi_short`) against these thresholds and generates a signa

In [15]:
for document in response["context"]:
    print(f"Source: {document.metadata['source']}, Start_index: {document.metadata['start_index']}")
    #print('#############################################################')

Source: RSI-en.py
Source: RSI-en.py
Source: Pressure_Line-en.py
Source: RSI-en.py


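To inspect the similarity scores behind the retriever results, we implement a function adapted from the LangChain documentation (https://python.langchain.com/v0.2/docs/how_to/add_scores_retriever/). Note that Chroma's ```similarity_search_with_score``` returns a distance rather than a similarity, so a lower score means a closer match.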
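To read these distance scores, it helps to relate them to cosine similarity. The sketch below assumes two things we have not verified in this notebook: that the collection uses Chroma's default ```l2``` space (squared Euclidean distance), and that OpenAI embeddings are normalized to unit length. Under both assumptions, squared L2 distance equals ```2 - 2 * cosine```. The helpers are hypothetical and use toy 3-dimensional vectors.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_from_squared_l2(distance: float) -> float:
    """For unit-length vectors, squared_l2 = 2 - 2 * cosine, so cosine = 1 - distance / 2."""
    return 1 - distance / 2


# Toy vectors; real OpenAI embeddings have far more dimensions
query = normalize([0.9, 0.1, 0.3])
chunk = normalize([0.8, 0.3, 0.2])
d = squared_l2(query, chunk)
print(d, 2 - 2 * cosine_similarity(query, chunk))  # the two values agree up to rounding

# Cosine similarity implied by the top score in the retriever output below (if both assumptions hold)
print(round(cosine_from_squared_l2(0.45297005772590637), 3))  # 0.774
```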

In [16]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def retriever(query: str) -> List[Document]:
    docs, scores = zip(*vectorstore.similarity_search_with_score(query))
    for doc, score in zip(docs, scores):
        doc.metadata["score"] = score

    return docs

In [17]:
result = retriever.invoke("I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70.")
result

(Document(metadata={'source': 'RSI-en.py', 'start_index': 2626, 'score': 0.45297005772590637}, page_content='# holding long position \n        elif available_base_amount > self.divide_quote/high_price:\n            if curr_rsi_short < curr_rsi_long and prev_rsi_short > prev_rsi_long:\n                signal = 2\n            if curr_rsi_short >  self.rsi_upper_band:\n                signal = 2'),
 Document(metadata={'source': 'RSI-en.py', 'start_index': 2162, 'score': 0.46861037611961365}, page_content='# initialize signal to be 0\n        signal = 0\n        if available_base_amount< self.divide_quote/high_price and available_base_amount > -self.divide_quote/high_price:\n            # open long position\n            if curr_rsi_short > curr_rsi_long and prev_rsi_short < prev_rsi_long:\n                signal = 1\n            # open short position\n            if curr_rsi_short < curr_rsi_long and prev_rsi_short > prev_rsi_long:\n                signal = -1'),
 Document(metadata={'sourc

## Text Splitter 2
- Chunk_size = 1000
- Chunk_overlap = 0

In [18]:
# Resetting the Chroma vectorstore collection
vectorstore.reset_collection()

In [19]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=1000, 
    chunk_overlap=0,
    length_function=len,
    add_start_index=True
)
text_chunks = text_splitter.split_documents(documents)

In [20]:
# View the chunks whose source is 'RSI-en.py'
rsi_chunks = [chunk for chunk in text_chunks if chunk.metadata['source'] == 'RSI-en.py']
for chunk in rsi_chunks:
    print(chunk.metadata)

{'source': 'RSI-en.py', 'start_index': 0}
{'source': 'RSI-en.py', 'start_index': 1009}
{'source': 'RSI-en.py', 'start_index': 1786}
{'source': 'RSI-en.py', 'start_index': 2626}
{'source': 'RSI-en.py', 'start_index': 2917}
{'source': 'RSI-en.py', 'start_index': 3675}
{'source': 'RSI-en.py', 'start_index': 4201}


In [21]:
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

In [22]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

In [23]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation. Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [24]:
response = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response["answer"])

To create a strategy where you buy Bitcoin when its Relative Strength Index (RSI) drops below 30 and sell when it goes above 70, you can use the following Python code snippet based on the provided Crypto Arsenal documentation:

```python
# Define RSI thresholds
rsi_lower_band = 30
rsi_upper_band = 70

# Simulated RSI values for demonstration
curr_rsi_short = 25
curr_rsi_long = 70
prev_rsi_short = 20
prev_rsi_long = 65

# Initialize signal
signal = 0

# Buy signal when RSI drops below 30
if curr_rsi_short < rsi_lower_band and prev_rsi_short > prev_rsi_long:
    signal = 1
# Sell signal when RSI goes above 70
elif curr_rsi_short > rsi_upper_band:
    signal = -1

# Output the signal
print("Signal:", signal)
```

In this code snippet:
- The RSI thresholds are set to 30 (lower band) and 70 (upper band).
- Simulated RSI values are provided for demonstration purposes.
- The code checks if the current RSI is below 30 and the previous short RSI was higher than the previous long RSI to trigger 

In [26]:
for document in response["context"]:
    print(f"Source: {document.metadata['source']}, Start_index: {document.metadata['start_index']}")
    #print(document.page_content)
    #print('#############################################################')

Source: RSI-en.py, Start_index: 2626
Source: Double_Bottom-en.py, Start_index: 4269
Source: RSI-en.py, Start_index: 2917
Source: RSI-en.py, Start_index: 1786


In [27]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def retriever(query: str) -> List[Document]:
    docs, scores = zip(*vectorstore.similarity_search_with_score(query))
    for doc, score in zip(docs, scores):
        doc.metadata["score"] = score

    return docs

In [28]:
result = retriever.invoke("I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70.")
result

(Document(metadata={'source': 'RSI-en.py', 'start_index': 2626, 'score': 0.45302292704582214}, page_content='# holding long position \n        elif available_base_amount > self.divide_quote/high_price:\n            if curr_rsi_short < curr_rsi_long and prev_rsi_short > prev_rsi_long:\n                signal = 2\n            if curr_rsi_short >  self.rsi_upper_band:\n                signal = 2'),
 Document(metadata={'source': 'Double_Bottom-en.py', 'start_index': 4269, 'score': 0.47116613388061523}, page_content='# signal = 1 then buy, signal = -1 then sell\n        signal = 0        \n        if self.double_bottom:\n            curr_close = close_price_history[-1]\n            # price goes above neckline\n            if curr_close > self.neckline:\n                # open position\n                if available_base_amount >= -0.0001 and available_base_amount <= 0.0001: \n                    signal = 1\n\n                # already holding position\n                elif available_base_amo

## Text Splitter 3
- Chunk_size = 3000 
- Chunk_overlap = 0

In [29]:
# Resetting the Chroma vectorstore collection
vectorstore.reset_collection()

In [30]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=3000, 
    chunk_overlap=0,
    length_function=len,
    add_start_index=True
)
text_chunks = text_splitter.split_documents(documents)

In [31]:
# View the chunks whose source is 'RSI-en.py'
rsi_chunks = [chunk for chunk in text_chunks if chunk.metadata['source'] == 'RSI-en.py']
for chunk in rsi_chunks:
    print(chunk.metadata)

{'source': 'RSI-en.py', 'start_index': 0}
{'source': 'RSI-en.py', 'start_index': 2917}


In [32]:
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

In [33]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation. Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [34]:
response = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response["answer"])

To create a strategy where you buy Bitcoin if its RSI drops below 30 and sell if it goes above 70, you can modify the existing `trade` method in the provided documentation. Here's a modified version of the `trade` method for the given `Strategy` class:

```python
import numpy as np
import talib

class Strategy(StrategyBase):
    def __init__(self):
        # Initialize your strategy properties here
        pass

    def on_order_state_change(self, order):
        pass

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()

        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        close_price_history.reverse()
        close_price_history = np.array(close_price_history)

        rsi = talib.RSI(close_price_history, timeperiod=14)  # Calculate RSI with a time period of 14

        if len(rsi) < 2:
            return []

        curr_rsi = rsi[-1]
        prev_rsi = rsi[-2]

        # Get available balance
        bas

In [35]:
for document in response["context"]:
    print(f"Source: {document.metadata['source']}, Start_index: {document.metadata['start_index']}")
    #print(document.page_content)
    #print('#############################################################')

Source: RSI-en.py, Start_index: 0
Source: Bollinger_Bands-en.py, Start_index: 0
Source: 跳空-en.py, Start_index: 0
Source: RSI-en.py, Start_index: 2917


In [36]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def retriever(query: str) -> List[Document]:
    docs, scores = zip(*vectorstore.similarity_search_with_score(query))
    for doc, score in zip(docs, scores):
        doc.metadata["score"] = score

    return docs

In [37]:
result = retriever.invoke("I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70.")
result

(Document(metadata={'source': 'RSI-en.py', 'start_index': 0, 'score': 0.4575255811214447}, page_content="class Strategy(StrategyBase):\n    def __init__(self):\n        # strategy property\n        self.subscribed_books = {}\n        self.period = 30 * 60\n        self.options = {}\n\n        self.last_type = 'sell'\n        self.short_period = 5\n        self.long_period = 10\n        self.divide_quote = 0\n        self.proportion = 0.2\n\n        self.rsi_upper_band = 80\n        self.rsi_lower_band = 20\n\n\n    def on_order_state_change(self,  order):\n        pass\n\n    # called every self.period\n    def trade(self, candles):\n        exchange, pair, base, quote = CA.get_exchange_pair()\n        \n        close_price_history = [candle['close'] for candle in candles[exchange][pair]]\n        high_price_history = [candle['high'] for candle in candles[exchange][pair]]\n        low_price_history = [candle['low'] for candle in candles[exchange][pair]]\n\n        # convert to chronolo

In this third attempt, with ```chunk_size = 3000```, the code generated by the LLM is much closer to the code template than in the previous iterations. Notably, the second RSI chunk (```start_index``` 2917) is ranked below the Bollinger_Bands and Gap (```跳空-en.py```) chunks, i.e. it is judged less similar to the query than chunks from unrelated strategies.
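When several chunks from the same file are retrieved, it can help to compare files by their best (lowest) distance score rather than by raw rank. The hypothetical helper below does that for metadata dicts carrying the ```score``` key that the custom ```retriever``` function adds; the sample file names and values are invented for illustration.

```python
from typing import Dict, Iterable


def best_distance_per_source(metadatas: Iterable[Dict]) -> Dict[str, float]:
    """Smallest (best) distance score per source file, ordered from most to least similar."""
    best: Dict[str, float] = {}
    for meta in metadatas:
        source, score = meta["source"], meta["score"]
        if source not in best or score < best[source]:
            best[source] = score
    return dict(sorted(best.items(), key=lambda item: item[1]))


# In the notebook this would be: best_distance_per_source(doc.metadata for doc in result)
# The entries below are hypothetical.
example = [
    {"source": "strategy_a.py", "start_index": 0, "score": 0.50},
    {"source": "strategy_b.py", "start_index": 0, "score": 0.40},
    {"source": "strategy_a.py", "start_index": 900, "score": 0.30},
]
print(best_distance_per_source(example))  # {'strategy_a.py': 0.3, 'strategy_b.py': 0.4}
```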

## Text Splitter 4
- Chunk_size = 3000
- Chunk_overlap = 500

In this iteration, we examine the effect of ```chunk_overlap``` on the generated output.

In [38]:
# Resetting the Chroma vectorstore collection
vectorstore.reset_collection()

In [39]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=3000, 
    chunk_overlap=500,
    length_function=len,
    add_start_index=True
)
text_chunks = text_splitter.split_documents(documents)

In [40]:
# View the chunks whose source is 'RSI-en.py'
rsi_chunks = [chunk for chunk in text_chunks if chunk.metadata['source'] == 'RSI-en.py']
for chunk in rsi_chunks:
    print(chunk.metadata)

{'source': 'RSI-en.py', 'start_index': 0}
{'source': 'RSI-en.py', 'start_index': 2626}


In [41]:
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

In [42]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation. Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [43]:
response = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response["answer"])

To create a strategy where you buy Bitcoin if its Relative Strength Index (RSI) drops below 30 and sell if it goes above 70, you can modify the provided Python code to include the RSI conditions for buying and selling. Here's a modified version of the `trade` method in the `Strategy` class that implements this strategy:

```python
import numpy as np
import talib

class Strategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.subscribed_books = {}
        self.period = 30 * 60
        self.options = {}

        self.last_type = 'sell'
        self.divide_quote = 0
        self.proportion = 0.2
        self.rsi_lower_bound = 30
        self.rsi_upper_bound = 70

    def on_order_state_change(self, order):
        pass

    # called every self.period
    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()

        close_price_history = [candle['close'] for candle in candles[exchange][pair]]

        # convert to chronologi

In [44]:
for document in response["context"]:
    print(f"Source: {document.metadata['source']}, Start_index: {document.metadata['start_index']}")
    #print(document.page_content)
    #print('#############################################################')

Source: RSI-en.py, Start_index: 0
Source: RSI-en.py, Start_index: 2626
Source: Bollinger_Bands-en.py, Start_index: 0
Source: DMI-en.py, Start_index: 0


In [45]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def retriever(query: str) -> List[Document]:
    docs, scores = zip(*vectorstore.similarity_search_with_score(query))
    for doc, score in zip(docs, scores):
        doc.metadata["score"] = score

    return docs

In [46]:
result = retriever.invoke("I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70.")
result

(Document(metadata={'source': 'RSI-en.py', 'start_index': 0, 'score': 0.4575255811214447}, page_content="class Strategy(StrategyBase):\n    def __init__(self):\n        # strategy property\n        self.subscribed_books = {}\n        self.period = 30 * 60\n        self.options = {}\n\n        self.last_type = 'sell'\n        self.short_period = 5\n        self.long_period = 10\n        self.divide_quote = 0\n        self.proportion = 0.2\n\n        self.rsi_upper_band = 80\n        self.rsi_lower_band = 20\n\n\n    def on_order_state_change(self,  order):\n        pass\n\n    # called every self.period\n    def trade(self, candles):\n        exchange, pair, base, quote = CA.get_exchange_pair()\n        \n        close_price_history = [candle['close'] for candle in candles[exchange][pair]]\n        high_price_history = [candle['high'] for candle in candles[exchange][pair]]\n        low_price_history = [candle['low'] for candle in candles[exchange][pair]]\n\n        # convert to chronolo

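From this iteration, we can see that passing a non-zero value to ```chunk_overlap``` improves the retrieval results: both RSI chunks are now ranked as the two most relevant contexts for the query. With the overlap, the second RSI chunk starts at ```start_index``` 2626 instead of 2917, so it also includes the ```# holding long position``` block that compares ```curr_rsi_short``` with ```self.rsi_upper_band```.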
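To confirm that neighboring chunks really share text after setting ```chunk_overlap=500```, the ```start_index``` metadata can be combined with each chunk's length. ```boundary_overlaps``` is a hypothetical helper; it assumes the chunks come from one file and are listed in order, and the shared amount may be smaller than ```chunk_overlap``` because the splitter works with whole pieces of text.

```python
from typing import List, Tuple


def boundary_overlaps(chunks: List[Tuple[int, str]]) -> List[int]:
    """For consecutive (start_index, text) chunks of one file, count the characters each pair shares."""
    return [
        max(0, (start + len(text)) - next_start)
        for (start, text), (next_start, _) in zip(chunks, chunks[1:])
    ]


doc = "0123456789abcdef"
print(boundary_overlaps([(0, doc[0:8]), (5, doc[5:13]), (10, doc[10:16])]))  # [3, 3]
print(boundary_overlaps([(0, doc[0:8]), (8, doc[8:16])]))  # [0]

# Hypothetical notebook usage:
# boundary_overlaps([(c.metadata["start_index"], c.page_content) for c in rsi_chunks])
```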

## Text Splitter 5
- Chunk_size = 2000
- Chunk_overlap = 500

In [47]:
# Resetting the Chroma vectorstore collection
vectorstore.reset_collection()

In [48]:
text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, 
    chunk_size=2000, 
    chunk_overlap=500,
    length_function=len,
    add_start_index=True
)
text_chunks = text_splitter.split_documents(documents)

In [49]:
# View the chunks whose source is 'RSI-en.py'
rsi_chunks = [chunk for chunk in text_chunks if chunk.metadata['source'] == 'RSI-en.py']
for chunk in rsi_chunks:
    print(chunk.metadata)

{'source': 'RSI-en.py', 'start_index': 0}
{'source': 'RSI-en.py', 'start_index': 1302}
{'source': 'RSI-en.py', 'start_index': 2626}
{'source': 'RSI-en.py', 'start_index': 4201}


In [50]:
vectorstore = Chroma.from_documents(documents=text_chunks, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

In [51]:
system_prompt = (
    """
    You are a coding assistant with expertise in Crypto Arsenal's documentation. Here is a set of Crypto Arsenal's documentation retrieved to answer the question: \n -------- \n {context} \n -------- \n Ensure any code you provide can be executed with all the required imports and variables defined. If you do not know the answer or require further clarification, just say that you do not know.
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [52]:
response = rag_chain.invoke({"input": "I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70."})
print(response["answer"])

To create a strategy where you buy Bitcoin if its Relative Strength Index (RSI) drops below 30 and sell if it goes above 70, you can modify the existing strategy code provided by incorporating the RSI thresholds of 30 and 70. Below is the modified code snippet for this specific strategy:

```python
class RsiStrategy(StrategyBase):
    def __init__(self):
        # strategy property
        self.short_period = 14
        self.long_period = 28
        self.rsi_lower_band = 30
        self.rsi_upper_band = 70

    def trade(self, candles):
        exchange, pair, base, quote = CA.get_exchange_pair()

        close_price_history = [candle['close'] for candle in candles[exchange][pair]]
        close_price_history = np.array(close_price_history)

        rsi_short = talib.RSI(close_price_history, self.short_period)
        rsi_long = talib.RSI(close_price_history, self.long_period)

        curr_rsi_short = rsi_short[-1]
        curr_rsi_long = rsi_long[-1]

        if curr_rsi_short < self

In [53]:
for document in response["context"]:
    print(f"Source: {document.metadata['source']}, Start_index: {document.metadata['start_index']}")
    #print(document.page_content)
    #print('#############################################################')

Source: RSI-en.py, Start_index: 2626
Source: RSI-en.py, Start_index: 1302
Source: RSI-en.py, Start_index: 0
Source: Bollinger_Bands-en.py, Start_index: 2045


In [54]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def retriever(query: str) -> List[Document]:
    docs, scores = zip(*vectorstore.similarity_search_with_score(query))
    for doc, score in zip(docs, scores):
        doc.metadata["score"] = score

    return docs

In [55]:
result = retriever.invoke("I want to create a strategy where I buy Bitcoin if its RSI drops below 30 and sell if it goes above 70.")
result

(Document(metadata={'source': 'RSI-en.py', 'start_index': 2626, 'score': 0.4426209330558777}, page_content="# holding long position \n        elif available_base_amount > self.divide_quote/high_price:\n            if curr_rsi_short < curr_rsi_long and prev_rsi_short > prev_rsi_long:\n                signal = 2\n            if curr_rsi_short >  self.rsi_upper_band:\n                signal = 2\n\n        # holding short position\n        elif available_base_amount < -self.divide_quote/high_price:\n            if curr_rsi_short > curr_rsi_long and prev_rsi_short < prev_rsi_long:\n                signal = -2\n            if curr_rsi_short < self.rsi_lower_band:\n                signal = -2\n            \n        # Sell short\n        if signal == -1:\n            self['is_shorting'] = 'true'\n            CA.log('Sell short ' + str(base))\n            return [\n                {\n                    'exchange': exchange,\n                    'amount': -self.divide_quote/high_price*1.1,\n   

## (Tool) Check for duplicate documents in the vectorstore

In [None]:
# Assuming `vectorstore` is your Chroma vectorstore instance
all_documents = vectorstore.get(include=["metadatas"])

print(all_documents['metadatas'])

In [None]:
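from collections import Counter

meta = all_documents['metadatas']
# Metadata dicts are unhashable, so convert each one to a frozenset of its items
check = [frozenset(d.items()) for d in meta]

# Count how often each metadata entry occurs and keep those seen more than once
counter = Counter(check)
duplicates = [dict(fset) for fset, count in counter.items() if count > 1]

print("Duplicate dictionaries:", duplicates)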
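The cells above only report duplicate metadata. To clean them up, one option is to collect the ids of every repeat after the first occurrence and delete them. ```duplicate_ids``` is a hypothetical helper, and treating equal metadata (```source``` plus ```start_index```) as a duplicate is an assumption that fits how this notebook builds its chunks; Chroma's ```get``` should return ```ids``` alongside the requested fields.

```python
from typing import Dict, List


def duplicate_ids(ids: List[str], metadatas: List[Dict]) -> List[str]:
    """Ids whose metadata repeats an earlier entry; the first occurrence is kept."""
    seen = set()
    dupes: List[str] = []
    for doc_id, meta in zip(ids, metadatas):
        key = frozenset(meta.items())  # dicts are unhashable, frozensets of their items are not
        if key in seen:
            dupes.append(doc_id)
        else:
            seen.add(key)
    return dupes


print(duplicate_ids(
    ["id-1", "id-2", "id-3"],
    [
        {"source": "RSI-en.py", "start_index": 0},
        {"source": "RSI-en.py", "start_index": 0},
        {"source": "RSI-en.py", "start_index": 458},
    ],
))  # ['id-2']

# Hypothetical notebook usage:
# data = vectorstore.get(include=["metadatas"])
# vectorstore.delete(ids=duplicate_ids(data["ids"], data["metadatas"]))
```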