# Building and Executing a Local RAG Pipeline from Scratch

## What is RAG?

RAG stands for Retrieval Augmented Generation.

In the context of finance, the goal of RAG is to gather relevant financial data and pass it to a Large Language Model (LLM) to generate informed outputs based on that data.

* **Retrieval** - Find relevant financial information given a query, e.g., "What are the key factors influencing stock prices?" -> retrieves passages related to stock market dynamics from financial reports or textbooks.
* **Augmented** - We take the relevant financial data and augment our input (prompt) to the LLM with that specific information.
* **Generation** - Combine the first two steps and pass them to an LLM to produce insightful, finance-specific outputs.
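To make the *Augmented* step concrete, here is a minimal sketch of how retrieved passages might be folded into a prompt. The function name `build_augmented_prompt`, the prompt wording, and the sample passages are illustrative assumptions, not part of the pipeline built later.

```python
def build_augmented_prompt(query: str, passages: list[str]) -> str:
    """Combine retrieved passages and a user query into one prompt string.

    Hypothetical helper for illustration; the prompt wording is an assumption.
    """
    # Number each passage so an answer can refer back to its source
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Made-up passages standing in for retrieved textbook chunks
example_passages = [
    "Inflation erodes the purchasing power of money over time.",
    "Lenders demand higher nominal rates when they expect inflation.",
]
example_prompt = build_augmented_prompt(
    "How does inflation affect interest rates?", example_passages
)
print(example_prompt)
```

The resulting string is what would be passed to the LLM in the *Generation* step.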

For more background on RAG, you can refer to the paper from Facebook AI: [https://arxiv.org/abs/2005.11401](https://arxiv.org/abs/2005.11401).

## Why RAG?

The primary objective of using RAG in finance is to enhance the accuracy and relevance of the outputs generated by Large Language Models (LLMs).

1. **Preventing Misinterpretations** - LLMs are highly skilled at generating text that appears accurate, but this doesn't always mean the content is factually correct. In finance, where precision is crucial, RAG helps ensure that the information generated is based on reliable and relevant financial data, reducing the risk of errors.

2. **Working with Customized Financial Data** - Most LLMs are trained on large-scale, general data from the internet, giving them a broad understanding of language. However, this often leads to generic responses. RAG allows LLMs to generate more tailored outputs by leveraging specific financial documents, such as your company's financial reports or market analysis, ensuring that the responses are relevant and precise to the financial context.

## What We'll Build

We're going to create a Finance Chat system that allows you to "chat with finance textbooks" using the following two books:

1. [Core Course Financial Accounting](https://www.drnishikantjha.com/booksCollection/CoreCourseFinancialAccounting%20.pdf)
2. [Principles of Finance](https://assets.openstax.org/oscms-prodcms/media/documents/PrinciplesofFinance-WEB.pdf)

Here's the plan:

1. Open both PDF textbooks.
2. Format the text from these PDFs to prepare it for an embedding model.
3. Break the text into chunks and convert each chunk into a numerical representation (an embedding) that can be stored for later use.
4. Build a retrieval system that employs vector search to find relevant text chunks based on a user query.
5. Create a prompt that integrates the retrieved text.
6. Generate an answer to the query using the retrieved passages from the textbooks with an LLM.

And the best part? It will all be done locally!

## Models We'll Use

1. **Embedding Model**: [all-mpnet-base-v2](https://www.sbert.net/docs/installation.html)
2. **LLM**: [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)

### Import PDF Document

In [1]:
import os
import requests

# List of URLs for the PDF files
url_list = [
    "https://www.drnishikantjha.com/booksCollection/CoreCourseFinancialAccounting%20.pdf",
    "https://assets.openstax.org/oscms-prodcms/media/documents/PrinciplesofFinance-WEB.pdf"
    # "https://baou.edu.in/assets/pdf/PGDF_102_slm.pdf"
]

# Iterate through each URL to download the file
for url in url_list:
    # Extract the original file name from the URL
    filename = url.split('/')[-1]

    # Replace any special URL characters in filename if necessary
    filename = requests.utils.unquote(filename)

    # Remove spaces from the filename
    filename = filename.replace(" ", "")

    # Check if file already exists
    if not os.path.exists(filename):
        print(f"[INFO] File '{filename}' doesn't exist, downloading...")
        
        # Send a GET request to download the PDF
        response = requests.get(url)

        # Check if the request was successful
        if response.status_code == 200:
            # Open the file and save it
            with open(filename, "wb") as file:
                file.write(response.content)
            print(f"[INFO] The file has been downloaded and saved as {filename}")
        else:
            print(f"[INFO] Failed to download the file. Status code: {response.status_code}")
    else:
        print(f"File '{filename}' already exists.")


File 'CoreCourseFinancialAccounting.pdf' already exists.
File 'PrinciplesofFinance-WEB.pdf' already exists.


Let's open those PDFs and dive in!

In [2]:
import fitz #see: https://github.com/pymupdf/PyMuPDF     # requires: !pip install PyMuPDF
from tqdm.auto import tqdm                               # pip install tqdm

def text_formatter(text: str) -> str:
    """Performs minor formatting on text."""
    cleaned_text = text.replace("\n", " ").strip()
    return cleaned_text

def open_and_read_pdf(pdf_path: str, book_name: str) -> list[dict]:
    doc = fitz.open(pdf_path)
    pages_and_texts = []
    for page_number, page in tqdm(enumerate(doc), total=len(doc)):
        if startPageNumber[book_name][0] < page_number < startPageNumber[book_name][1]:
            text = page.get_text()
            text = text_formatter(text=text)
            pages_and_texts.append({
                "book_name": book_name,
                "page_number": page_number - startPageNumber[book_name][0],
                "page_char_count": len(text),
                "page_word_count": len(text.split(" ")),
                "page_sentence_count_raw": len(text.split(". ")),
                "page_token_count": len(text) / 4,  # 1 token = ~4 characters
                "text": text
            })
    doc.close()
    return pages_and_texts

# List of PDF file paths and their names
pdf_files = [
    ("CoreCourseFinancialAccounting.pdf", "CoreCourseFinancialAccounting"),
    ("PrinciplesofFinance-WEB.pdf", "PrinciplesofFinance")
]

all_pages = []

# First and last page index of the relevant content in each book (both bounds are exclusive)
startPageNumber = {"CoreCourseFinancialAccounting":[15,956],
                   "PrinciplesofFinance":[13,633]}
for pdf_path, book_name in pdf_files:
    all_pages.extend(open_and_read_pdf(pdf_path, book_name))

# Display the first few entries from the extracted data
all_pages[:2]



[{'book_name': 'CoreCourseFinancialAccounting',
  'page_number': 1,
  'page_char_count': 1817,
  'page_word_count': 316,
  'page_sentence_count_raw': 19,
  'page_token_count': 454.25,
  'text': 'Chapter 1 WHAT IS CORPORATE FINANCE? To whet your appetite . . . The primary role of the financial manager is to ensure that his company has a sufficient  supply of capital. The financial manager is at the crossroads of the real economy, with its industries  and services, and the world of finance, with its various financial markets and structures. There are two ways of looking at the financial manager’s role: t a buyer of capital who seeks to minimise its cost, i.e. the traditional view; t a seller of financial securities who tries to maximise their value. This is the view we  will develop throughout this book. It corresponds, to a greater or lesser extent, to  the situation that exists in a capital market economy, as opposed to a credit-based  economy. At the risk of oversimplifying, we will u

In [3]:
all_pages[-1]

{'book_name': 'PrinciplesofFinance',
 'page_number': 619,
 'page_char_count': 2504,
 'page_word_count': 428,
 'page_sentence_count_raw': 27,
 'page_token_count': 626.0,
 'text': 'natural hedge when a company offsets the risk that something will decrease in value by having a company activity that would increase in value at the same time option an agreement that gives the owner the right, but not the obligation, to purchase or sell an asset at a specified price on some future date option writer seller of a call or put option premium the price a buyer of an option pays for the option contract put option an option that gives the owner the right, but not the obligation, to sell the underlying asset at a specified price on some future date speculating attempting to profit by betting on the uncertain future, knowing that a risk of loss is involved spot rate the current market exchange rate strike price (exercise price) the price an option holder pays for the underlying asset when exercising t

In [4]:
import random

random.sample(all_pages, k=3)

[{'book_name': 'PrinciplesofFinance',
  'page_number': 208,
  'page_char_count': 843,
  'page_word_count': 147,
  'page_sentence_count_raw': 6,
  'page_token_count': 210.75,
  'text': 'Figure 7.9 New Dialog Box for PV Function Arguments Figure 7.10 shows the completed data input for the function arguments. Note that once again, cell addresses are used in this example. This allows the spreadsheet to still be useful if you decide to change any of the variables. As in the FV function example, you may also type values directly in the Function Arguments dialog box, but if you do this and you have to change any of your input later, you will have to reenter the new information. Remember that using cell addresses is always a preferable method of entering the function argument data. Figure 7.10 Completed Dialog Box for PV Function Arguments Again, similar to our FV function example, the Function Arguments dialog box shows values off to the 208 7 • Time Value of Money I: Single Payment Value Acc

In [5]:
import pandas as pd

df = pd.DataFrame(all_pages)
df.head()

| | book_name | page_number | page_char_count | page_word_count | page_sentence_count_raw | page_token_count | text |
|---|---|---|---|---|---|---|---|
| 0 | CoreCourseFinancialAccounting | 1 | 1817 | 316 | 19 | 454.25 | Chapter 1 WHAT IS CORPORATE FINANCE? To whet y... |
| 1 | CoreCourseFinancialAccounting | 2 | 2163 | 377 | 13 | 540.75 | CORPORATE FINANCE 2 Transactions that take pla... |
| 2 | CoreCourseFinancialAccounting | 3 | 3294 | 567 | 21 | 823.5 | Chapter 1 WHAT IS CORPORATE FINANCE? 3 Dependi... |
| 3 | CoreCourseFinancialAccounting | 4 | 2568 | 461 | 40 | 642.0 | CORPORATE FINANCE 4 We will develop this theme... |
| 4 | CoreCourseFinancialAccounting | 5 | 3197 | 556 | 34 | 799.25 | Chapter 1 WHAT IS CORPORATE FINANCE? 5 . . . a... |


In [6]:
df.describe().round(2)

| | page_number | page_char_count | page_word_count | page_sentence_count_raw | page_token_count |
|---|---|---|---|---|---|
| count | 1559.0 | 1559.0 | 1559.0 | 1559.0 | 1559.0 |
| mean | 406.77 | 2485.29 | 424.06 | 19.14 | 621.32 |
| std | 251.56 | 872.91 | 146.75 | 10.28 | 218.23 |
| min | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 25% | 195.5 | 1996.5 | 341.0 | 13.0 | 499.12 |
| 50% | 390.0 | 2555.0 | 442.0 | 18.0 | 638.75 |
| 75% | 585.0 | 3096.5 | 527.5 | 24.0 | 774.12 |
| max | 940.0 | 4643.0 | 814.0 | 81.0 | 1160.75 |


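Why do we care about token count?

- Embedding models and LLMs can only process a limited number of tokens per input. Text beyond an embedding model's limit is typically truncated, so its meaning never reaches the embedding, and longer LLM prompts take more memory and time to process. Estimating token counts tells us how finely the pages need to be split.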
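As a rough illustration, the sketch below applies the same 4-characters-per-token heuristic used for `page_token_count` and flags text that would likely exceed a model's limit. The 384-token limit is an assumption based on the documented default maximum sequence length of `all-mpnet-base-v2`; the helper names are hypothetical, and a real tokenizer will give different counts.

```python
CHARS_PER_TOKEN = 4   # rough heuristic, as in the page statistics above
MAX_TOKENS = 384      # assumed limit; check your embedding model's documentation


def estimate_tokens(text: str) -> float:
    """Estimate the token count from the character count."""
    return len(text) / CHARS_PER_TOKEN


def likely_truncated(text: str, max_tokens: int = MAX_TOKENS) -> bool:
    """Return True when the estimate exceeds the assumed token limit."""
    return estimate_tokens(text) > max_tokens


short_text = "a" * 400    # placeholder text of 400 characters
long_text = "a" * 2555    # same length as the median page in df.describe()

print(estimate_tokens(short_text), likely_truncated(short_text))
print(estimate_tokens(long_text), likely_truncated(long_text))
```

Under these assumptions a median-length page is already over the limit, which is one reason to split pages into smaller chunks before embedding.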

In [7]:
from spacy.lang.en import English

nlp = English()

# Add a sentencizer pipeline, see https://spacy.io/api/sentencizer 
nlp.add_pipe("sentencizer")

# Create document instance as an example
doc = nlp("This is a sentence. This another sentence. I like elephants.")
assert len(list(doc.sents)) == 3

# Print out our sentences split
list(doc.sents)

[This is a sentence., This another sentence., I like elephants.]

In [8]:
all_pages[600]

{'book_name': 'CoreCourseFinancialAccounting',
 'page_number': 601,
 'page_char_count': 2414,
 'page_word_count': 431,
 'page_sentence_count_raw': 12,
 'page_token_count': 603.5,
 'text': 'Chapter 32 CAPITAL STRUCTURE AND THE THEORY OF PERFECT CAPITAL MARKETS 601 SECTION 4 The summary of this chapter can be downloaded from www.vernimmen.com. Is there such a thing as an optimal capital structure, i.e. a way of splitting the ﬁnancing of  operating assets between debt and equity which would enhance the value of the operating  assets and minimise the company’s cost of capital? This is the central question that this  chapter attempts to answer. The real-world camp says yes, but without being able to prove it, or to set an ideal level of  net debt and equity. Modigliani and Miller said no in 1958, and showed how, if it were so, there would be arbi- trages that re-established the balance. For an investor with a perfectly diversiﬁed portfolio, and in a tax-free universe, there is no  optimal c

In [9]:
for item in tqdm(all_pages):
    item["sentences"] = list(nlp(item["text"]).sents)

    # Make sure all sentences are strings (the default type is a spaCy datatype)
    item["sentences"] = [str(sentence) for sentence in item["sentences"]]

    # Count the sentences
    item["page_sentence_count_spacy"] = len(item["sentences"])


In [10]:
random.sample(all_pages, k=1)

[{'book_name': 'CoreCourseFinancialAccounting',
  'page_number': 101,
  'page_char_count': 2271,
  'page_word_count': 396,
  'page_sentence_count_raw': 15,
  'page_token_count': 567.75,
  'text': 'Chapter 7 HOW TO COPE WITH THE MOST COMPLEX POINTS IN FINANCIAL ACCOUNTS 101 SECTION 1 US rules are very similar to the IASB’s. (c) How should ﬁnancial analysts treat them? Some analysts, especially those working for lending banks, regard brands as having nil  value from a financial standpoint. Such a view leads to deducting these items perempto- rily from shareholders’ equity. We beg to differ with this approach. These items usually add considerably to a company’s valuation, even though they  may be intangible. For instance, what value would a top fashion house or a consumer  goods company have without its brands? 4/ CONCLUSION To sum up, our approach to intangible fixed items is as follows: the higher the book value  of intangibles, the lower their market value is likely to be; and the lowe

In [11]:
import re  # Regular expression library for splitting text into sentences
min_word_count = 20 # Define the minimum word count for a sentence to be kept

def remove_short_sentences_and_spaces(data):
    # Iterate through each dictionary in the list
    for entry in data:
        # Extract the sentences list from the dictionary
        sentences = entry['sentences']

        # Keep only sentences with at least min_word_count words
        filtered_sentences = [sentence.replace("  "," ").strip() for sentence in sentences if len(sentence.split()) >= min_word_count]

        # Update the sentences in the dictionary
        entry['filtered_sentences'] = filtered_sentences

        # Optionally, update the sentence count if needed
        entry['filtered_page_sentences_count_raw'] = len(filtered_sentences)


        
remove_short_sentences_and_spaces(all_pages)

# Remove pages left with fewer than two sentences after filtering
all_pages = [item for item in all_pages if item['filtered_page_sentences_count_raw'] > 1]
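The effect of the two filtering steps is easier to see on a small example. The sketch below uses made-up page data, a hypothetical helper `keep_long_sentences`, and a lower threshold of 5 words (instead of `min_word_count = 20`) purely to keep the example readable.

```python
def keep_long_sentences(sentences: list[str], min_words: int) -> list[str]:
    """Return the sentences that contain at least `min_words` words."""
    return [
        s.replace("  ", " ").strip()
        for s in sentences
        if len(s.split()) >= min_words
    ]


# Made-up pages: headings and figure labels are short, body sentences are longer
toy_pages = [
    {"sentences": [
        "Figure 7.9",
        "Cash flows are discounted at the cost of capital.",
        "Riskier projects require a higher expected rate of return.",
    ]},
    {"sentences": ["SECTION 2", "See below."]},
]

for page in toy_pages:
    page["filtered_sentences"] = keep_long_sentences(page["sentences"], min_words=5)

# Mirror the page-level filter: keep pages with more than one remaining sentence
kept_pages = [p for p in toy_pages if len(p["filtered_sentences"]) > 1]
print(len(kept_pages), kept_pages[0]["filtered_sentences"])
```

Short fragments such as headings and figure labels are dropped first, and a page survives only if at least two sentences remain.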

In [14]:
# Sanity check: no remaining page should have zero filtered sentences
temp = [item for item in all_pages if item['filtered_page_sentences_count_raw']<1]
len(temp)

0

In [15]:
df = pd.DataFrame(all_pages)
df.describe().round(2)

| | page_number | page_char_count | page_word_count | page_sentence_count_raw | page_token_count | page_sentence_count_spacy | filtered_page_sentences_count_raw |
|---|---|---|---|---|---|---|---|
| count | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 |
| mean | 407.43 | 2582.72 | 440.42 | 20.01 | 645.68 | 19.76 | 9.23 |
| std | 250.23 | 746.57 | 124.85 | 9.78 | 186.64 | 8.05 | 3.66 |
| min | 1.0 | 309.0 | 49.0 | 1.0 | 77.25 | 2.0 | 2.0 |
| 25% | 198.0 | 2082.5 | 354.5 | 14.0 | 520.62 | 14.0 | 7.0 |
| 50% | 393.0 | 2620.0 | 450.0 | 19.0 | 655.0 | 19.0 | 9.0 |
| 75% | 584.0 | 3121.0 | 529.5 | 24.5 | 780.25 | 25.0 | 12.0 |
| max | 937.0 | 4643.0 | 814.0 | 81.0 | 1160.75 | 53.0 | 23.0 |


### Chunking Text

We'll split each page's sentences into smaller chunks of up to 10 sentences each. This makes the text easier to filter, helps each chunk fit within the embedding model's token limit, and gives the LLM more focused context.

In [17]:
# Define split size to turn groups of sentences into chunks
num_sentence_chunk_size = 10

# Create a function that splits a list of texts into consecutive slices of chunk size
# e.g. [20] -> [10, 10] or [25] -> [10, 10, 5]
def split_list(input_list: list[str],
               slice_size: int=num_sentence_chunk_size) -> list[list[str]]:
    return [input_list[i:i+slice_size] for i in range(0, len(input_list), slice_size)]
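A quick check of the slicing behaviour on placeholder data is shown below. The function is repeated so the snippet runs on its own, and the 25 dummy "sentences" are an assumption for illustration.

```python
def split_list(input_list: list[str], slice_size: int = 10) -> list[list[str]]:
    """Split a list into consecutive slices of at most `slice_size` items."""
    return [input_list[i:i + slice_size] for i in range(0, len(input_list), slice_size)]


# 25 placeholder sentences should give chunks of 10, 10 and 5
toy_sentences = [f"sentence {n}" for n in range(25)]
toy_chunks = split_list(toy_sentences, slice_size=10)
chunk_sizes = [len(chunk) for chunk in toy_chunks]
print(chunk_sizes)
```

The last chunk simply holds whatever is left over, so chunks at the end of a page can be much shorter than the rest.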

In [18]:
# Loop through pages and texts and split sentences into chunks
for item in tqdm(all_pages):
    item["sentence_chunks"] = split_list(input_list=item["filtered_sentences"],
                                         slice_size=num_sentence_chunk_size)
    item["num_chunks"] = len(item["sentence_chunks"])


In [20]:
random.sample(all_pages, k=1)

[{'book_name': 'PrinciplesofFinance',
  'page_number': 515,
  'page_char_count': 4227,
  'page_word_count': 719,
  'page_sentence_count_raw': 33,
  'page_token_count': 1056.75,
  'text': 'The range of the equity cost of capital estimates for each of the firms is significant. Consider, for example, Goodyear Tire and Rubber. According to MarketWatch, the beta for the company is 1.24, resulting in an estimated cost of equity capital between 9.20% and 12.92%. The beta provided by Yahoo! Finance is much higher, at 2.26. Using this higher beta results in an estimated equity cost of capital for Goodyear Tire and Rubber between 14.30% and 21.08%. This leaves the financial managers of Goodyear Tire and Rubber with an estimate of the equity cost of capital between 9.20% and 21.08%, using a range of reasonable assumptions. What is a financial manager to do when one estimate is more than twice as large as another estimate? A financial manager who believes the equity cost of capital is close to 9% 

In [21]:
df = pd.DataFrame(all_pages)
df.describe().round(2)

| | page_number | page_char_count | page_word_count | page_sentence_count_raw | page_token_count | page_sentence_count_spacy | filtered_page_sentences_count_raw | num_chunks |
|---|---|---|---|---|---|---|---|---|
| count | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 | 1479.0 |
| mean | 407.43 | 2582.72 | 440.42 | 20.01 | 645.68 | 19.76 | 9.23 | 1.37 |
| std | 250.23 | 746.57 | 124.85 | 9.78 | 186.64 | 8.05 | 3.66 | 0.48 |
| min | 1.0 | 309.0 | 49.0 | 1.0 | 77.25 | 2.0 | 2.0 | 1.0 |
| 25% | 198.0 | 2082.5 | 354.5 | 14.0 | 520.62 | 14.0 | 7.0 | 1.0 |
| 50% | 393.0 | 2620.0 | 450.0 | 19.0 | 655.0 | 19.0 | 9.0 | 1.0 |
| 75% | 584.0 | 3121.0 | 529.5 | 24.5 | 780.25 | 25.0 | 12.0 | 2.0 |
| max | 937.0 | 4643.0 | 814.0 | 81.0 | 1160.75 | 53.0 | 23.0 | 3.0 |


### Separating Each Chunk into Individual Items

Next, we flatten the page-level data so that each chunk of sentences becomes its own item, with its own metadata (book name, page number, and size statistics). Embedding at the chunk level means any passage retrieved later can be traced back to the exact text and page it came from.

In [22]:
import re

# Split each chunk into its own item
pages_and_chunks = []
for item in tqdm(all_pages): 
    for sentence_chunk in item["sentence_chunks"]: 
        chunk_dict = {}
        chunk_dict["book_name"] = item["book_name"]
        chunk_dict["page_number"] = item["page_number"]
        # Join the sentences together into a paragraph-like structure, aka join the list of sentences into one paragraph
        joined_sentence_chunk = "".join(sentence_chunk).replace("  ", " ").strip()
        joined_sentence_chunk = re.sub(r'\.([A-Z])', r'. \1', joined_sentence_chunk) # ".A" => ". A" (works for any capital letter)

        chunk_dict["sentence_chunk"] = joined_sentence_chunk

        # Get some stats on our chunks
        chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
        chunk_dict["chunk_word_count"] = len([word for word in joined_sentence_chunk.split(" ")])
        chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4 # 1 token = ~4 chars

        pages_and_chunks.append(chunk_dict) 

len(pages_and_chunks)


2029
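The regular expression applied after joining the sentences can be checked in isolation. In the sketch below the helper name `add_space_after_period` and the sample strings are assumptions for illustration; the pattern itself is the one used in the cell above.

```python
import re


def add_space_after_period(text: str) -> str:
    """Insert a space when a period is directly followed by a capital letter."""
    return re.sub(r"\.([A-Z])", r". \1", text)


fixed = add_space_after_period("Rates rose.Savers benefited.")
unchanged = add_space_after_period("equal to?5/A shareholder requires")
abbreviation = add_space_after_period("U.S. rules are similar")

print(fixed)         # Rates rose. Savers benefited.
print(unchanged)     # no period followed by a capital, so nothing changes
print(abbreviation)  # U. S. rules are similar
```

The pattern only repairs joins that end in a period, so sentences ending in `?` stay fused to the next one, and abbreviations such as "U.S." gain an unwanted space. Both are worth keeping in mind when inspecting chunks.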

In [23]:
random.sample(pages_and_chunks, k=1)

[{'book_name': 'CoreCourseFinancialAccounting',
  'page_number': 292,
  'sentence_chunk': 'INVESTMENT DECISION RULES 292 SECTION 2 As a result, the loan in the first case costs more than a loan at 10% with interest due annually. If the interest rate is 10%, with interest payable every six months, then the interest rate is 5% for six months. We then have to calculate an effective annual rate (and not for six months), which is our point of reference and our constant concern. Two rates referring to two different maturities are said to be equivalent if the future value of the same amount at the same date is the same with the two rates. In our example, the lender receives €5 on 1 July which, compounded over six months, becomes 5 + (10% × 5) / 2 = €5.25 on the following 1 January, the date on which he receives the second €5 interest payment. This is the real cost of the loan, since the return for the lender is equal to the cost for the borrower. (1 ) (1 / ) + + t r n a n = Formula for conver

In [24]:
df = pd.DataFrame(pages_and_chunks)
df.describe().round(2)

| | page_number | chunk_char_count | chunk_word_count | chunk_token_count |
|---|---|---|---|---|
| count | 2029.0 | 2029.0 | 2029.0 | 2029.0 |
| mean | 415.99 | 1312.59 | 220.86 | 328.15 |
| std | 252.78 | 640.86 | 108.01 | 160.22 |
| min | 1.0 | 100.0 | 20.0 | 25.0 |
| 25% | 205.0 | 761.0 | 126.0 | 190.25 |
| 50% | 404.0 | 1465.0 | 246.0 | 366.25 |
| 75% | 596.0 | 1787.0 | 301.0 | 446.75 |
| max | 937.0 | 4043.0 | 582.0 | 1010.75 |


In [25]:
df.head()

| | book_name | page_number | sentence_chunk | chunk_char_count | chunk_word_count | chunk_token_count |
|---|---|---|---|---|---|---|
| 0 | CoreCourseFinancialAccounting | 1 | The financial manager is at the crossroads of ... | 1354 | 222 | 338.5 |
| 1 | CoreCourseFinancialAccounting | 2 | CORPORATE FINANCE 2 Transactions that take pla... | 1978 | 334 | 494.5 |
| 2 | CoreCourseFinancialAccounting | 3 | 3 Depending on your point of view, i.e. tradit... | 1891 | 311 | 472.75 |
| 3 | CoreCourseFinancialAccounting | 3 | For instance, choosing between a capital incre... | 1034 | 175 | 258.5 |
| 4 | CoreCourseFinancialAccounting | 4 | CORPORATE FINANCE 4 We will develop this theme... | 1575 | 263 | 393.75 |


In [26]:
random.sample(pages_and_chunks, k=1)

[{'book_name': 'CoreCourseFinancialAccounting',
  'page_number': 759,
  'sentence_chunk': '2/ CONTROLLING SHAREHOLDER CHANGES (a) Right of approval The right of approval, written into a company’s articles of association, enables a company to avoid “undesirable” shareholders. The right of approval governs the relationship between partners or shareholders of the company; be careful not to confuse it with the type of approval required to purchase certain companies (see below). Technically, the right of approval clause requires all partners to obtain the approval of the company prior to selling any of their shares.',
  'chunk_char_count': 527,
  'chunk_word_count': 81,
  'chunk_token_count': 131.75}]

### Creating Embeddings for Our Text Chunks

Embeddings are a versatile and powerful tool that bridges the gap between human-readable text and machine-readable numbers.

Our goal:
- Convert our text chunks into numerical representations known as embeddings.

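These embeddings are valuable because they are *learned* representations, capturing the meaning and context of the text in a format that machines can effectively process. A plain lookup table that assigns each word an arbitrary integer, as in the snippet below, carries no such information about meaning: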

```
"be": 0,
"the": 1,
...
```

In [27]:
#Download the embedding model and load it
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer(model_name_or_path="all-mpnet-base-v2",
                                      device="cuda")



In [28]:
embedding = embedding_model.encode("How does inflation affect interest rates and savings?")
embedding

array([-2.47319043e-02, -2.87073050e-02, -2.35914029e-02, -5.78843663e-03,
       -7.71680288e-03,  7.62785450e-02, -3.83099057e-02, -1.20870508e-02,
       -9.90529805e-02,  2.15044431e-02,  3.26565979e-03,  2.26372592e-02,
        3.05338111e-02,  1.19919721e-02, -2.81815398e-02,  1.36070373e-02,
        3.47765498e-02, -5.25242798e-02,  3.04409303e-02,  8.29569064e-03,
        8.17593187e-03, -2.59059705e-02,  2.93227602e-02, -2.61969101e-02,
       -5.56816161e-02,  1.36346754e-03, -9.62208211e-03,  1.02311065e-02,
       -3.14573385e-02,  3.95188741e-02, -5.52890589e-03, -2.22937646e-03,
        2.15764865e-02,  3.03358585e-03,  1.21948688e-06, -4.10688017e-03,
       -3.71919796e-02,  1.93042308e-02, -3.00269052e-02, -3.53902802e-02,
        1.04994103e-02, -8.87881126e-03,  4.54500243e-02,  2.82767434e-02,
       -3.67120616e-02,  8.73267651e-03,  5.28054088e-02,  2.01229035e-04,
       -3.12242168e-03,  2.64733471e-02,  3.08203716e-02, -9.08908620e-03,
       -5.47862239e-02, -

In [29]:
embedding_model.to("cuda")

for item in tqdm(pages_and_chunks):
    item["embedding"] = embedding_model.encode(item["sentence_chunk"])


### Saving Embeddings to a File



In [30]:
# Inspect an example chunk
pages_and_chunks[419]

{'book_name': 'CoreCourseFinancialAccounting',
 'page_number': 342,
 'sentence_chunk': 'THE RISK OF SECURITIES AND THE REQUIRED RATE OF RETURN 342 SECTION 2 3/What is the rate of return required by the shareholder equal to?5/A shareholder requires a rate of return that is twice as high on a share with a β coef- ficient that is twice as high as that of another share.11/The standard deviation of the earnings on State Bank of India shares is 40%, while for Siemens it is only 28%.12/Explain why an investor would be prepared to require a return lower than the risk-free rate for a share with a negative β.13/How do you explain the fact that rates of return required by investors may be identical for two groups of totally different activities (oil and IT services, for example) as long as they have the same β?14/An experiment was recently carried out where a child, an astrologer and a financial ana- lyst were each given €10 000 to invest for eight years.15/Mid-2013 we could see that large food p

In [31]:
# Save embeddings to file
text_chunks_and_embeddings_df = pd.DataFrame(pages_and_chunks)
embeddings_df_save_path = "finance_text_chunks_and_embeddings_df.csv"
text_chunks_and_embeddings_df.to_csv(embeddings_df_save_path, index=False)

In [32]:
# Import saved file and view 
text_chunks_and_embedding_df_load = pd.read_csv(embeddings_df_save_path)
text_chunks_and_embedding_df_load.head()

| | book_name | page_number | sentence_chunk | chunk_char_count | chunk_word_count | chunk_token_count | embedding |
|---|---|---|---|---|---|---|---|
| 0 | CoreCourseFinancialAccounting | 1 | The financial manager is at the crossroads of ... | 1354 | 222 | 338.5 | [ 3.32317054e-02 -4.39433642e-02 -1.24765048e-... |
| 1 | CoreCourseFinancialAccounting | 2 | CORPORATE FINANCE 2 Transactions that take pla... | 1978 | 334 | 494.5 | [ 3.50867622e-02 -4.44612168e-02 6.88060233e-... |
| 2 | CoreCourseFinancialAccounting | 3 | 3 Depending on your point of view, i.e. tradit... | 1891 | 311 | 472.75 | [ 1.93496868e-02 -7.10342154e-02 -1.78794115e-... |
| 3 | CoreCourseFinancialAccounting | 3 | For instance, choosing between a capital incre... | 1034 | 175 | 258.5 | [ 8.77247564e-03 1.61859822e-02 6.33759610e-... |
| 4 | CoreCourseFinancialAccounting | 4 | CORPORATE FINANCE 4 We will develop this theme... | 1575 | 263 | 393.75 | [-1.25726759e-02 -1.79538578e-02 -5.54556260e-... |
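Because CSV is a text format, each embedding in the reloaded DataFrame is a string rather than an array, and it has to be parsed before it can be used for search. The sketch below reproduces the round trip with the standard library only. The three-value vector and the `parse_embedding` helper are illustrative assumptions, and the string format imitates the space-separated form shown in the `embedding` column.

```python
import csv
import io


def parse_embedding(cell: str) -> list[float]:
    """Parse a string like '[ 0.1 -0.2 0.3]' back into a list of floats."""
    return [float(value) for value in cell.strip("[]").split()]


# Hypothetical three-dimensional embedding, formatted like the column above
original = [0.0332317054, -0.0439433642, -0.0124765048]
cell_text = "[ " + " ".join(repr(v) for v in original) + "]"

# Write one row to an in-memory CSV file, then read it back
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["sentence_chunk", "embedding"])
writer.writerow(["The financial manager is at the crossroads of ...", cell_text])

buffer.seek(0)
rows = list(csv.DictReader(buffer))
loaded_cell = rows[0]["embedding"]

print(type(loaded_cell).__name__)   # the value comes back as text
restored = parse_embedding(loaded_cell)
print(restored)
```

For larger collections, a binary format or a vector database avoids this parsing step, but for a couple of thousand chunks a CSV file is easy to inspect and works fine.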


## 2. RAG - Search and Generate

RAG's goal is to retrieve relevant passages based on a query and use them to enhance the input to an LLM, enabling it to generate an informed response grounded in those specific passages.

### Similarity Search

Embeddings can represent various data types—images, sounds, text, etc. Comparing these embeddings is called similarity or vector search.

In our case, we'll query finance textbook passages based on meaning. For example, searching "financial risk management" should return relevant passages, even if they don’t contain that exact phrase.

This differs from keyword search, which only returns passages that contain the exact words in the query.

In [1]:
import random

import torch
import numpy as np
import pandas as pd

device = "cuda" if torch.cuda.is_available() else "cpu"

# Import texts and embedding df
text_chunks_and_embedding_df = pd.read_csv("finance_text_chunks_and_embeddings_df.csv")

# Convert embedding column back to np.array (it was converted to a string when saved to CSV)
text_chunks_and_embedding_df["embedding"] = text_chunks_and_embedding_df["embedding"].apply(lambda x: np.fromstring(x.strip("[]"), sep=" "))

# Convert our embeddings into a torch.tensor
embeddings = torch.tensor(np.stack(text_chunks_and_embedding_df["embedding"].tolist(), axis=0), dtype=torch.float32).to(device)
print(embeddings)

# Convert texts and embedding df to list of dicts
pages_and_chunks = text_chunks_and_embedding_df.to_dict(orient="records")

text_chunks_and_embedding_df

tensor([[ 0.0332, -0.0439, -0.0125,  ...,  0.0136, -0.0233,  0.0066],
        [ 0.0351, -0.0445,  0.0069,  ...,  0.0128, -0.0371, -0.0135],
        [ 0.0193, -0.0710, -0.0002,  ...,  0.0212,  0.0054, -0.0080],
        ...,
        [ 0.0437, -0.0673, -0.0028,  ..., -0.0198,  0.0059,  0.0180],
        [-0.0136, -0.0851, -0.0099,  ...,  0.0335, -0.0070, -0.0303],
        [-0.0102, -0.0314, -0.0119,  ...,  0.0408,  0.0110, -0.0369]],
       device='cuda:0')


| | book_name | page_number | sentence_chunk | chunk_char_count | chunk_word_count | chunk_token_count | embedding |
|---|---|---|---|---|---|---|---|
| 0 | CoreCourseFinancialAccounting | 1 | The financial manager is at the crossroads of ... | 1354 | 222 | 338.50 | [0.0332317054, -0.0439433642, -0.0124765048, 0... |
| 1 | CoreCourseFinancialAccounting | 2 | CORPORATE FINANCE 2 Transactions that take pla... | 1978 | 334 | 494.50 | [0.0350867622, -0.0444612168, 0.00688060233, -... |
| 2 | CoreCourseFinancialAccounting | 3 | 3 Depending on your point of view, i.e. tradit... | 1891 | 311 | 472.75 | [0.0193496868, -0.0710342154, -0.000178794115,... |
| 3 | CoreCourseFinancialAccounting | 3 | For instance, choosing between a capital incre... | 1034 | 175 | 258.50 | [0.00877247564, 0.0161859822, 0.0063375961, -0... |
| 4 | CoreCourseFinancialAccounting | 4 | CORPORATE FINANCE 4 We will develop this theme... | 1575 | 263 | 393.75 | [-0.0125726759, -0.0179538578, -0.0055455626, ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2024 | PrinciplesofFinance | 616 | changes; • the duration of a bond will be high... | 1282 | 238 | 320.50 | [0.0633114204, -0.0404183678, 7.10027598e-05, ... |
| 2025 | PrinciplesofFinance | 617 | Alpha Beta Alpha pays Beta floating rate -LIBO... | 1489 | 265 | 372.25 | [0.0244190078, -0.0396583267, -0.0151689639, 0... |
| 2026 | PrinciplesofFinance | 617 | The 6.25% Beta pays as a result of this arrang... | 150 | 30 | 37.50 | [0.0437360406, -0.0673275068, -0.00284562842, ... |
| 2027 | PrinciplesofFinance | 618 | The riskier a firm’s cash flows are, the highe... | 2185 | 379 | 546.25 | [-0.0136207528, -0.0850690305, -0.0098789474, ... |


In [2]:
embeddings.shape

torch.Size([2029, 768])

In [3]:
# Create model
from sentence_transformers import util, SentenceTransformer

embedding_model = SentenceTransformer(model_name_or_path="all-mpnet-base-v2",
                                      device=device)



**Semantic Search Pipeline for Finance Texts**

1. **Define a Query**: Choose a finance-related query, like "equity valuation techniques".
2. **Generate Embedding**: Transform the query into a numerical embedding.
3. **Calculate Similarity**: Compare the query embedding with the text-chunk embeddings using the dot product, which equals cosine similarity when the embeddings are normalized to unit length.
4. **Rank Results**: Sort results by similarity in descending order to identify the most relevant passages.


**Important: When using the dot product for comparison, make sure the vectors are of the same dimension (e.g., 768) and that both tensors or vectors use the same data type (e.g., torch.float32).**
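The ranking steps above can be sketched without any libraries. The two-dimensional vectors below are invented for illustration (the embeddings in this notebook have 768 dimensions), and the helper names are hypothetical. The sketch also shows why the dot product and cosine similarity agree once vectors are normalized to unit length.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query: list[float], passages: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k highest dot-product scores."""
    scored = sorted(
        ((dot(query, p), i) for i, p in enumerate(passages)), reverse=True
    )
    return [(i, score) for score, i in scored[:k]]


# Invented 2-D "embeddings": passage 0 points almost the same way as the query
raw_query = [1.0, 0.2]
raw_passages = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

toy_query = normalize(raw_query)
toy_passages = [normalize(p) for p in raw_passages]

ranking = top_k(toy_query, toy_passages, k=2)
print(ranking)
print(cosine_similarity(raw_query, raw_passages[0]))
```

The top dot-product score on the normalized vectors matches the cosine similarity computed on the raw vectors, which is why a plain dot product is enough when the embedding model already returns unit-length outputs.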

In [4]:
embeddings.shape

torch.Size([2029, 768])

In [5]:
# 1. Define the query
query = "How does inflation affect interest rates and savings?"
print(f"Query: {query}")

# 2. Embed the query
# Note: it's important to embed your query with the same model you used to embed your passages
query_embedding = embedding_model.encode(query, convert_to_tensor=True).to(device)

# 3. Get similarity scores with the dot product (use cosine similarity if outputs of model aren't normalized)
dot_scores = util.dot_score(a=query_embedding, b=embeddings)[0]

# 4. Get the top-k results (we'll keep top 5)
top_results_dot_product = torch.topk(dot_scores, k=5)
top_results_dot_product 

Query: How does inflation affect interest rates and savings?


torch.return_types.topk(
values=tensor([0.6803, 0.6695, 0.5989, 0.5985, 0.5792], device='cuda:0'),
indices=tensor([1514, 1513, 1353, 1354, 1516], device='cuda:0'))

In [6]:
import textwrap

def print_wrapped(text, wrap_length=80):
    wrapped_text = textwrap.fill(text, wrap_length)
    print(wrapped_text)

In [7]:
query = "How does inflation affect interest rates and savings?"
print(f"Query: '{query}'\n")
print("Results:")
# Loop through zipped together scores and indices from torch.topk
for score, idx in zip(top_results_dot_product[0], top_results_dot_product[1]):
    print(f"Score: {score:.4f}")
    print("Text:")
    print_wrapped(pages_and_chunks[idx]["sentence_chunk"])
    print(f"Page number: {pages_and_chunks[idx]['page_number']}")
    print("\n")

Query: 'How does inflation affect interest rates and savings?'

Results:
Score: 0.6803
Text:
Rational investors who set money aside for the future will demand higher
interest rates to compensate them for such periods of inflation. However,
investors who save for future consumption but leave their money uninvested or
underinvested in low-interest-bearing accounts will essentially lose value from
their financial assets because each of their future dollars will be worth less,
carrying less purchasing power when they end up needing it for use. This
relationship of saving and planning for the future is one of the most important
reasons to understand the concept of the time value of money. Nominal versus
Real Interest Rates One of the main problems of allowing inflation to determine
interest rates is that current interest rates are actually nominal interest
rates. In order to determine more practical real interest rates, the original
nominal rate must be adjusted using an inflation rate, suc

**Note: To improve the ordering of results, you can pass the top semantic matches (for example, the top 25) through a reranking model trained specifically to order search results. For an example, see this open-source reranking model: [MXBAI ReRank Large v1](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v1).**
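A minimal sketch of the two-stage idea in the note, assuming the first-stage similarity scores are already computed. The function `retrieve_then_rerank` and the scorer `toy_overlap_score` are hypothetical: the word-overlap scorer is only a stand-in so the example runs without downloading anything, whereas a real reranking model would score each query-passage pair itself.

```python
from typing import Callable

def retrieve_then_rerank(query: str,
                         passages: list[str],
                         first_stage_scores: list[float],
                         rerank_score: Callable[[str, str], float],
                         n_candidates: int = 25,
                         n_final: int = 5) -> list[int]:
    """Keep the n_candidates best first-stage matches, then reorder them with rerank_score."""
    # Stage 1: indices of the highest embedding-similarity scores
    candidates = sorted(range(len(passages)),
                        key=lambda i: first_stage_scores[i],
                        reverse=True)[:n_candidates]
    # Stage 2: rescore only those candidates with the (slower) reranker
    reranked = sorted(candidates,
                      key=lambda i: rerank_score(query, passages[i]),
                      reverse=True)
    return reranked[:n_final]

def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)

toy_passages = [
    "inflation erodes the purchasing power of savings",
    "bonds pay periodic coupons",
    "higher inflation pushes nominal interest rates up",
]
toy_scores = [0.60, 0.58, 0.55]  # made-up first-stage similarity scores
order = retrieve_then_rerank("inflation and interest rates", toy_passages, toy_scores,
                             rerank_score=toy_overlap_score, n_candidates=3, n_final=2)
print(order)
```

The design point is that the expensive scorer only ever sees the short candidate list, not the whole corpus.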



### Streamlining Our Semantic Search Pipeline

Let's consolidate the steps of our semantic search process into one or two functions to facilitate repeatable workflows.

In [8]:
def retrieve_relevant_resources(query: str,
                                embeddings: torch.Tensor,
                                model: SentenceTransformer=embedding_model,
                                n_resources_to_return: int=5,):
    """
    Embeds a query with model and returns top k scores and indices from embeddings.
    """

    # Embed the query
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Get dot product scores on embeddings
    dot_scores = util.dot_score(query_embedding, embeddings)[0]

    scores, indices = torch.topk(input=dot_scores,
                                 k=n_resources_to_return)

    return scores, indices

def print_top_results_and_scores(query: str,
                                 embeddings: torch.Tensor,
                                 pages_and_chunks: list[dict]=pages_and_chunks,
                                 n_resources_to_return: int=5):
    """
    Finds relevant passages given a query and prints them out along with their scores.
    """
    scores, indices = retrieve_relevant_resources(query=query,
                                                  embeddings=embeddings,
                                                  n_resources_to_return=n_resources_to_return)

    # Loop through zipped together scores and indices from torch.topk
    for score, idx in zip(scores, indices):
        print(f"Score: {score:.4f}")
        print("Text:")
        print_wrapped(pages_and_chunks[idx]["sentence_chunk"])
        print(f"Page number: {pages_and_chunks[idx]['page_number']}")
        print("\n")

In [9]:
query="What are hedge funds and how do they differ from mutual funds?"
# retrieve_relevant_resources(query=query, embeddings=embeddings) 
print_top_results_and_scores(query=query, embeddings=embeddings)

Score: 0.5811
Text:
Hedge funds offer additional diversification to “conventional” portfolios, as
their results are in theory not linked to the performances of equity and bond
markets. Short- seller funds, for example, bet that a stock will fall by
borrowing shares at interest and sell- ing them, then buying them back after
their price falls and returning them to the borrower. In recent years, hedge
funds’ risk-adjusted performance has been above that of tradi- tional
management, this even in bearish markets, with a relatively low correlation with
other investment opportunities. The funds of funds pick up the best hedge fund
managers and package their products to be offered to a wide number of
investors.which threaten cash ﬂows from ﬁnancial securities and which come from
the “real economy”, and there are ﬁnancial risks (liquidity, currency, interest
rate and other risks) which do not directly affect cash ﬂow and come under the
ﬁnancial sphere. In a market economy, a security’s risk is

### Local LLM Generation

We will use a local, open-source LLM.

Which LLM you can run depends on how much GPU memory (VRAM) your hardware has.

### Checking our local GPU memory availability

In [10]:
# Get the total memory of the first GPU (reported in GiB)
import torch
gpu_memory_bytes = torch.cuda.get_device_properties(0).total_memory
gpu_memory_gb = round(gpu_memory_bytes / (2**30))
print(f"Available GPU memory: {gpu_memory_gb} GB")

Available GPU memory: 23 GB
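To relate that memory figure to model choice, here is a rough back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameter count times bytes per parameter) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The function name and the 7-billion parameter count are illustrative assumptions, not values read from a checkpoint.

```python
def estimate_weight_memory_gb(num_params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only (no activations or KV cache)."""
    total_bytes = num_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / (2**30)

# Illustrative parameter count, not read from any specific checkpoint
example_params_billions = 7.0
for label, bits in [("float32", 32), ("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = estimate_weight_memory_gb(example_params_billions, bits)
    print(f"{label:>8}: ~{gb:.1f} GB for weights")
```

Halving the bits per parameter halves the estimate, which is the main reason quantization lets larger models fit on smaller GPUs.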


### Notes

- **Model Usage**: To utilize `gemma-2b-it` or other Gemma models, you must agree to the terms and conditions on Hugging Face: [Gemma-2b-it on Hugging Face](https://huggingface.co/google/gemma-2b-it).
- **Local Setup**: Downloading and running models locally from Hugging Face requires signing into the Hugging Face CLI: [Hugging Face CLI Guide](https://huggingface.co/docs/huggingface_hub/en/guides/cli).

### Setting Up a Local LLM

Use Hugging Face `transformers` to load an LLM locally. For instance, consider using Gemma-7b-it: [Gemma-7b-it](https://huggingface.co/google/gemma-7b-it).

Here's what you need to run a model locally:
1. **Quantization Config** (optional): Sets the reduced precision used when loading the model weights (e.g., 8-bit or 4-bit).
2. **Model ID**: Specifies the model and tokenizer for loading.
3. **Tokenizer**: Converts text into numerical data for the LLM.
4. **LLM Model**: Generates text based on your input.

In [24]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.utils import is_flash_attn_2_available

use_quantization_config = False # set to True to load the weights in 4-bit using the config below

# 1. Create a quantization config
# Note: requires !pip install bitsandbytes accelerate
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                         bnb_4bit_compute_dtype=torch.float16)

# Bonus: flash attention 2 = faster attention mechanism
# Flash Attention 2 requires a GPU with a compute capability score of 8.0+ (Ampere, Ada Lovelace, Hopper and above): https://developer.nvidia.com/cuda-gpus 
if (is_flash_attn_2_available()) and (torch.cuda.get_device_capability(0)[0] >= 8):
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = "sdpa" # scaled dot product attention
print(f"Using attention implementation: {attn_implementation}") 

# 2. Pick a model we'd like to use
model_id = "google/gemma-7b-it"

# 3. Instantiate tokenizer (tokenizer turns text into tokens)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_id)

# 4. Instantiate the model 
llm_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                 torch_dtype=torch.float16,
                                                 quantization_config=quantization_config if use_quantization_config else None,
                                                 low_cpu_mem_usage=False, # use as much memory as we can
                                                 attn_implementation=attn_implementation)

if not use_quantization_config:
    llm_model.to("cuda")

Using attention implementation: sdpa





In [12]:
llm_model

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072, padding_idx=0)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
   

In [14]:
def get_model_num_params(model: torch.nn.Module) -> float:
    total_params = sum(param.numel() for param in model.parameters())
    return total_params / 1e9  # Convert to billions

# Example usage:
num_params_in_billions = get_model_num_params(llm_model)
print(f"Model parameters: {num_params_in_billions:.2f} billion")


Model parameters: 4.66 billion


### Generating text with our LLM

Let's generate text with our local LLM!

* Note: Some models have been trained/tuned to generate text with a specific template in mind.

Because `gemma-7b-it` has been trained in an instruction-tuned manner, we should follow the instruction template for the best results.
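As a rough illustration of what an instruction template does, the sketch below rebuilds the turn layout that the tokenizer prints in the next cell. `format_gemma_style_prompt` is a hypothetical helper written for this example; in practice the template ships with the tokenizer and `tokenizer.apply_chat_template` should be used instead.

```python
def format_gemma_style_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Hypothetical re-implementation of the turn layout seen in the formatted prompt."""
    prompt = "<bos>"
    for message in messages:
        # The formatted prompt names the two sides of the dialogue "user" and "model"
        role = "model" if message["role"] == "assistant" else message["role"]
        prompt += f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so that generation continues from here
        prompt += "<start_of_turn>model\n"
    return prompt

toy_dialogue = [{"role": "user",
                 "content": "What are hedge funds and how do they differ from mutual funds?"}]
print(format_gemma_style_prompt(toy_dialogue))
```

Note that the formatted string already starts with `<bos>`. Tokenizing it again with default settings can prepend a second one, which would explain the `<bos><bos>` at the start of the decoded model output further down.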

In [15]:
input_text = "What are hedge funds and how do they differ from mutual funds?"
print(f"Input text:\n{input_text}")

# Create prompt template for instruction-tuned model
dialogue_template = [
    {"role": "user",
     "content": input_text}
]

# Apply the chat template
prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(f"\nPrompt (formatted):\n{prompt}")

Input text:
What are hedge funds and how do they differ from mutual funds?

Prompt (formatted):
<bos><start_of_turn>user
What are hedge funds and how do they differ from mutual funds?<end_of_turn>
<start_of_turn>model



In [16]:
tokenizer

GemmaTokenizerFast(name_or_path='google/gemma-7b-it', vocab_size=256000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<bos>', 'eos_token': '<eos>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<start_of_turn>', '<end_of_turn>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<eos>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("<bos>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	4: AddedToken("<mask>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	5: AddedToken("<2mass>", rstrip=False, lstrip=False, single_w

In [17]:
# Tokenize the input text (turn it into numbers) and send it to the GPU
input_ids = tokenizer(prompt,
                      return_tensors="pt").to("cuda")

# Generate outputs from local LLM
outputs = llm_model.generate(**input_ids,
                             max_new_tokens=256)
# print(f"Model output (tokens):\n{outputs[0]}\n")

# Decode the output tokens to text
outputs_decoded = tokenizer.decode(outputs[0])
print(f"Model output (decoded):\n{outputs_decoded}\n")

Model output (decoded):
<bos><bos><start_of_turn>user
What are hedge funds and how do they differ from mutual funds?<end_of_turn>
<start_of_turn>model
Sure, here's the difference between hedge funds and mutual funds:

**Hedge Funds:**

* **Private Funds:** Hedge funds are privately managed investment funds that typically cater to wealthy investors.
* **High-Frequency Trading:** Hedge funds often engage in high-frequency trading, which involves using sophisticated algorithms to buy and sell financial instruments rapidly.
* **Unconstrained Strategies:** Hedge funds have more flexibility to use various strategies, including leverage, short selling, and derivatives.
* **High-Risk, High-Reward:** Hedge funds generally carry a higher risk profile than mutual funds, but also offer the potential for higher returns.
* **High Minimum Investments:** Hedge funds typically require a high minimum investment, often tens of millions of dollars.

**Mutual Funds:**

* **Publicly Traded:** Mutual funds a

In [18]:
query_list = [
    "What are the primary functions of the stock market in a modern economy?",
    "How do bonds differ from stocks in terms of investment risk and returns?",
    "Explain the role of the Federal Reserve in managing the U.S. economy.",
    "What are derivatives, and how are they used in financial risk management?",
    "Discuss the concept of portfolio diversification and its significance in investment strategy.",
    "What factors influence the exchange rates between two currencies?",
    "How does inflation affect interest rates and savings?",
    "What is the significance of the price-to-earnings ratio in evaluating a company's stock?",
    "Describe the process of a company going public through an initial public offering (IPO).",
    "What are hedge funds and how do they differ from mutual funds?"
]

In [19]:
import random

query = random.choice(query_list)
print(f"Query: {query}") 

# Get just the scores and indices of top related results
scores, indices = retrieve_relevant_resources(query=query,
                                              embeddings=embeddings)
scores, indices

Query: What are the primary functions of the stock market in a modern economy?


(tensor([0.5602, 0.5317, 0.5293, 0.5281, 0.5220], device='cuda:0'),
 tensor([ 302,   10, 1274, 1271, 1285], device='cuda:0'))

### Enhancing Prompts with Contextual Items

Adding retrieved context items to a prompt is a form of prompt engineering, an active area of AI research with several well-established techniques.

We'll apply some established prompting techniques:
1. Provide clear instructions.
2. Include example input/output pairs to illustrate expectations.
3. Allow space for exploration, such as a scratchpad or a step-by-step thought process.

Next, we'll develop a function to systematically format prompts with these contextual elements.
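Technique 2 (example input/output pairs) can be sketched as follows. This is a hedged illustration, not the formatter used in this notebook: `build_few_shot_prompt`, the example pair, and the context chunk are all made up for the purpose of showing where the examples sit relative to the context and the query.

```python
def build_few_shot_prompt(query: str,
                          context_chunks: list[str],
                          examples: list[tuple[str, str]]) -> str:
    """Assemble instructions, worked examples, context, and the query into one prompt string."""
    context = "- " + "\n- ".join(context_chunks)
    example_block = "\n\n".join(
        f"Example query: {example_query}\nExample answer: {example_answer}"
        for example_query, example_answer in examples
    )
    return (
        "Based on the following context items, please answer the query.\n"
        "Use the examples below as a guide to the expected answer style.\n\n"
        f"{example_block}\n\n"
        f"Context items:\n{context}\n\n"
        f"User query: {query}\n"
        "Answer:"
    )

# Made-up example pair and context chunk, for illustration only
toy_examples = [("What is a bond?",
                 "A bond is a debt security: the issuer borrows from investors and repays with interest.")]
toy_context = ["Listing enables the company to access new sources of funding."]
toy_prompt = build_few_shot_prompt("Why do companies list on a stock exchange?",
                                   toy_context, toy_examples)
print(toy_prompt)
```

Examples lengthen the prompt, so with a limited context window there is a trade-off between the number of examples and the number of retrieved passages.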

In [20]:
def prompt_formatter(query: str, context_items: list[dict]) -> str:
    context = "- " + "\n- ".join([item["sentence_chunk"] for item in context_items])

    base_prompt = """Based on the following context items, please answer the query.

context items:
{context}
Relevant passages: <extract relevant passages from the context here>
User query: {query}

Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking process, only the final answer.
Ensure your answers are comprehensive and well-explained.
Answer:""" 
    base_prompt = base_prompt.format(context=context, query=query)

    # Create prompt template for instruction-tuned model 
    dialogue_template = [
        {"role": "user",
         "content": base_prompt}
    ]

    # Apply the chat template
    prompt = tokenizer.apply_chat_template(conversation=dialogue_template,
                                           tokenize=False,
                                           add_generation_prompt=True)
    
    return prompt

query = random.choice(query_list) 
print(f"Query: {query}")

# Get relevant resources
scores, indices = retrieve_relevant_resources(query=query, embeddings=embeddings)

# Create a list of context items
context_items = [pages_and_chunks[i] for i in indices]

# Format our prompt
prompt = prompt_formatter(query=query, context_items=context_items)
print(prompt)


Query: Describe the process of a company going public through an initial public offering (IPO).
<bos><start_of_turn>user
Based on the following context items, please answer the query.

context items:
- Listing enables the company to access new sources of funding, to raise its corporate proﬁle and to incentivise managers and employees. During the preparation phase, the whole of the company’s legal, operational and ﬁnancial structure has to be reviewed, its corporate governance needs to be adapted, ﬁnancial statements may have to be drawn up in line with the relevant accounting principles and a strategy has to be deﬁned in the form of an equity story for the market. The choice of the market segment on which the company will be listed will be determined by the size of the company and by any constraints weighing on it. The number of shares offered on the market will depend on the sizing of the IPO, which will also determine whether the shares will be shares sold by existing shareholders an

In [21]:
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate an output of tokens
outputs = llm_model.generate(**input_ids,
                             temperature=0.7, # lower values make sampling more deterministic, higher values make it more varied
                             do_sample=True, # whether or not to use sampling, https://huyenchip.com/2024/01/16/sampling.html
                             max_new_tokens=256)

# Turn the output tokens into text
output_text = tokenizer.decode(outputs[0])
print(f"Query: {query}")
print(f"RAG answer:\n{output_text.replace(prompt, '')}")

Query: Describe the process of a company going public through an initial public offering (IPO).
RAG answer:
<bos>Sure, here is the answer to the query:

An initial public offering (IPO) is the process of a company taking its securities to the stock market for the first time. The process typically involves several steps, including the preparation phase, the underwriting phase, the pricing phase, and the listing phase.

During the preparation phase, the company must review its legal, operational, and financial structure, adapt its corporate governance, and draw up financial statements in line with relevant accounting principles. The company must also define its strategy in the form of an equity story for the market.

In the underwriting phase, the company chooses an investment bank to act as its intermediary between the company and the market. The investment bank will provide financial advice, recommend the price and number of shares to issue, and establish a syndicate of underwriters t
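The effect of the `temperature` argument used in the generation call above can be illustrated with a small standalone sketch. It assumes the common formulation in which the model's raw scores (logits) are divided by the temperature before the softmax; the logits here are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not a `transformers` function.

```python
import math
import random

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")

# Sample one token index from the temperature-adjusted distribution
rng = random.Random(0)
sampled = rng.choices(range(len(toy_logits)),
                      weights=softmax_with_temperature(toy_logits, 0.7),
                      k=1)[0]
print(f"sampled token index: {sampled}")
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution, which is why low values give more repeatable answers.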

### Streamline the LLM Answering Process

Now we'll write a single function for our RAG system. You input a query, and it outputs a generated answer, with the option to also return the source passages (the context) used to generate the response.

Let's create a function to accomplish this!

In [22]:
def ask(query: str,
        temperature: float=0.7,
        max_new_tokens:int=256,
        format_answer_text=True,
        return_answer_only=True):
    """
    Takes a query, finds relevant resources/context and generates an answer to the query based on the relevant resources.
    """

    # RETRIEVAL
    # Get just the scores and indices of top related results
    scores, indices = retrieve_relevant_resources(query=query,
                                                  embeddings=embeddings)

    # Create a list of context items
    context_items = [pages_and_chunks[i] for i in indices] 

    # Add score to context item
    for i, item in enumerate(context_items): 
        item["score"] = scores[i].cpu()

    # AUGMENTATION
    # Create the prompt and format it with context items
    prompt = prompt_formatter(query=query,
                              context_items=context_items)

    # GENERATION
    # Tokenize the prompt
    input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")

    # Generate an output of tokens
    outputs = llm_model.generate(**input_ids,
                                 temperature=temperature,
                                 do_sample=True,
                                 max_new_tokens=max_new_tokens)

    # Decode the tokens into text
    output_text = tokenizer.decode(outputs[0])

    # Format the answer
    if format_answer_text:
        # Replace prompt and special tokens
        output_text = output_text.replace(prompt, "").replace("<bos>", "").replace("<eos>", "")

    # Only return the answer without context items
    if return_answer_only:
        return output_text

    return output_text, context_items

In [23]:
query = """Describe the process of a company going public through an initial public offering (IPO)."""
print(f"Query: {query}")
output_text, context_items = ask(query=query,
    temperature=0.8,
    return_answer_only=False)
print("\n\nModel output: \n\n",output_text)


print("\ncontext: \n")
for item in context_items:
    print(" - ", item["sentence_chunk"])

Query: Describe the process of a company going public through an initial public offering (IPO).


Model output: 

 **Answer:**

The process of a company going public through an initial public offering (IPO) typically involves several steps.

**1. Preparation Phase:**

- The company's legal, operational, and financial structure is reviewed.
- Corporate governance needs to be adapted.
- Financial statements are drawn up in line with relevant accounting principles.
- A strategy is defined in the form of an equity story for the market.
- The choice of the market segment and the number of shares to be offered are determined.

**2. Underwriter Selection and Pricing:**

- An investment banker is engaged to provide financial advice, recommend the price and number of shares to issue, and establish a syndicate of underwriters.
- The company decides on the price and number of shares to be offered.

**3. Marketing and Prospectus:**

- A prospectus is prepared and distributed to investors.
- The co
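`ask` removes the prompt from the decoded text with `str.replace`, which only works when the decoded output reproduces the prompt exactly. An alternative that is sometimes used is to drop the prompt tokens before decoding, since the generated sequence starts with the prompt ids. The sketch below shows the idea with plain lists; the vocabulary and ids are invented for illustration and are not real Gemma token ids.

```python
def strip_prompt_tokens(output_ids: list[int], prompt_length: int) -> list[int]:
    """Return only the token ids produced after the prompt."""
    return output_ids[prompt_length:]

# Toy vocabulary and ids, invented for illustration (not real Gemma token ids)
toy_vocab = {0: "<bos>", 1: "What", 2: " is", 3: " an", 4: " IPO", 5: "?",
             6: " A", 7: " first", 8: " sale", 9: "<eos>"}
toy_prompt_ids = [0, 1, 2, 3, 4, 5]
toy_output_ids = toy_prompt_ids + [6, 7, 8, 9]  # generated sequence = prompt ids + new ids

new_ids = strip_prompt_tokens(toy_output_ids, prompt_length=len(toy_prompt_ids))
special_tokens = {"<bos>", "<eos>"}
answer = "".join(toy_vocab[i] for i in new_ids if toy_vocab[i] not in special_tokens)
print(answer.strip())
```

With the objects used in this notebook, the equivalent slice would be `outputs[0][input_ids["input_ids"].shape[1]:]`, decoded with `tokenizer.decode(..., skip_special_tokens=True)`.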

## Summary

- **RAG Overview**: Retrieval-Augmented Generation (RAG) is a robust method for generating text from reference documents.
- **Ease of Setup**: Setting up a RAG pipeline is straightforward. We've successfully implemented one locally using just a few functions.
- **Hardware Considerations**: Utilize GPUs to enhance the speed of embedding creation and LLM text generation. Be aware of your local hardware's limitations.
- **Resources**: A growing number of open-source embedding models and LLMs are available. Continuously experiment to discover the most effective options.
- **Value of Semantic Search**: Semantic search supplies the retrieval step, so the relevance of the retrieved passages directly shapes the quality of the generated answers.