Journal for loading an instruction-tuned Mistral model and using it to answer questions via RAG (retrieval-augmented generation). To experiment interactively, use RAGplay.ipynb.

In [2]:
import os
# Let ops without an MPS kernel fall back to the CPU (set before importing torch)
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
import torch.nn as nn
import torch.nn.functional as F

import pandas as pd

from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import util, SentenceTransformer

# -----

device = "mps:0"


In [3]:
# Load the model.
# bitsandbytes quantization (4-bit/8-bit) is not yet available on MPS.
# On CUDA, FlashAttention 2 should be used as well (see attn_implementation below).

"""
models:
ministral/Ministral-3b-instruct

"""

modelpath = "ministral/Ministral-3b-instruct"
tokenizerpath = "ministral/Ministral-3b-instruct" 
#quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    modelpath,    
    device_map="auto",
    # quantization_config=quantization_config,
    # attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)

# Load the slow tokenizer; the fast tokenizer sometimes ignores added tokens.
# Requires sentencepiece.
tokenizer = AutoTokenizer.from_pretrained(tokenizerpath, use_fast=False)
tokenizer.add_special_tokens(dict(eos_token="</s>"))

"""
# Add tokens <|im_start|> and <|im_end|>, latter is special eos token 
tokenizer.pad_token = "</s>"
tokenizer.add_tokens(["<|im_start|>"])
tokenizer.add_special_tokens(dict(eos_token="<|im_end|>"))
model.resize_token_embeddings(len(tokenizer))
model.config.eos_token_id = tokenizer.eos_token_id

"""
model.to(device)



MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-13): 14 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
     

In [None]:
prompt = " Hello world! I'm Mistral, "

model_inputs = tokenizer([prompt]*4, return_tensors="pt").to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True)
out = tokenizer.batch_decode(generated_ids)
for output in out:
    print("----------")
    print(output)


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


----------
<s><s>  Hello world! I'm Mistral, 37 years old and you know me better as a seasoned journalist. This is my story and I'm afraid I'm trying to do it wrong.
GroupLayout
Compare and contrast three different fashion trends in history and discuss every piece that helped to endure.<|im_end|>
<|im_start|>user
What fashion trends did Meadowstone Tiberius Grimmerer-an American chef, renowned chef Elroy Farley, engineer Lydia Denethron Shener and an insatiable herder?<|im_end|>
<|im_start|>assistant
Medieval Times: In the formidable bustling streets of Europe, there was a few notable trends so that was the trend after these time periods. These wore, with pearlized boots and a thick fur tarnish on, an assortment of leather boots a little butler of his order, and a leather belt made of silk as it's an impeccable fashion trend. One of the most popular fashion trends in the time period was the woven leather of the Humphrey Crumby Huffler- a tattered leather and leather leather worn by the

In [3]:
# Load in RAG sentence embedding + corresponding df
emb_model_name = "mixedbread-ai/mxbai-embed-large-v1" 
embedding_model = SentenceTransformer(model_name_or_path=emb_model_name, 
                                      device=device)
embedding_model.to(device)

data_df_save_path = "localRAG.csv"
embeddings_df_save_path = "localRAG_embs.csv"

df = pd.read_csv(data_df_save_path)
embeddings_df = pd.read_csv(embeddings_df_save_path)

embeddings = torch.from_numpy(embeddings_df.values).to(device=device, dtype=torch.float32)



In [5]:
# Query retrieval
query = "How does least squares regression work?"
query = f'Represent this sentence for searching relevant passages: {query}'
query_embedding = embedding_model.encode(query, convert_to_tensor=True)

dot_scores = util.dot_score(a=query_embedding, b=embeddings)[0]
top_results_dot_product = torch.topk(dot_scores, k=5)

for score, idx in zip(top_results_dot_product[0], top_results_dot_product[1].to("cpu")):
    index = int(idx)
    print(f"Score: {score:.4f}")
    print(f"Document: {df.iloc[index]['document']}")
    print(f"Page number: {df.iloc[index]['page_number']}")
    # Print relevant sentence chunk (since the scores are in descending order, the most relevant chunk will be first)
    print("Text:")
    print(df.iloc[index]["sentence_chunk"])
    print("\n")


Score: 230.2086
Document: ESLII.pdf
Page number: 99
Text:
Further analysis reveals that the variance aspect tends to dominate, and so partial least squares behaves much like ridge regression and principal components regression.We discuss this further in the next section.If the input matrix X is orthogonal, then partial least squares ﬁnds the least squares estimates after m = 1 steps.Subsequent steps have no eﬀect


Score: 225.7260
Document: Bishop-Pattern-Recognition-and-Machine-Learning.pdf
Page number: 156
Text:
3 Linear Models for Regression The focus so far in this book has been on unsupervised learning, including topics such as density estimation and data clustering.We turn now to a discussion of supervised learning, starting with regression.The goal of regression is to predict the value of one or more continuous target variables t given the value of a D-dimensional vector x of input variables.We have already encountered an example of a regression problem when we considered polyno
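
The dot-product scores above are in the hundreds, which means the query and chunk embeddings are not both unit-normalized (cosine similarity is bounded by 1). With unnormalized vectors the dot product also rewards large vector norms, so ranking by cosine similarity can give a different order. Below is a minimal, dependency-free sketch of that alternative. The name `cosine_top_k` and the toy vectors are hypothetical; on the tensors used in this journal, `util.cos_sim` from sentence_transformers could play the same role.

```python
import math
from typing import List, Sequence, Tuple


def cosine_top_k(query: Sequence[float],
                 corpus: Sequence[Sequence[float]],
                 k: int = 5) -> List[Tuple[float, int]]:
    """Return up to k (score, index) pairs ranked by cosine similarity."""

    def norm(v: Sequence[float]) -> float:
        return math.sqrt(sum(x * x for x in v))

    query_norm = norm(query)
    scored = []
    for i, vec in enumerate(corpus):
        denom = query_norm * norm(vec)
        dot = sum(q * x for q, x in zip(query, vec))
        # Zero vectors have no direction; give them a score of 0.
        scored.append((dot / denom if denom else 0.0, i))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy example (hypothetical vectors): the raw dot product would rank
# index 0 first (10 vs. 1), while cosine similarity ranks index 1 first.
toy_corpus = [[10.0, 10.0], [1.0, 0.0], [0.0, 5.0]]
print(cosine_top_k([1.0, 0.0], toy_corpus, k=2))
```

Whether cosine or dot-product ranking retrieves better chunks for this corpus is not established here; the sketch only shows how the two can disagree.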

In [None]:
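def retrieve_topk_texts(query: str,
                        embeddings: torch.Tensor = embeddings,
                        embedding_model: SentenceTransformer = embedding_model,
                        k_resources_to_return: int = 5,
                        ) -> list:
    """Return the text of the k chunks whose embeddings score highest against the query."""
    # Note: unlike the retrieval cell above, the query is encoded here
    # without the "Represent this sentence ..." retrieval prefix.
    query_embedding = embedding_model.encode(query, convert_to_tensor=True)
    dot_scores = util.dot_score(a=query_embedding, b=embeddings)[0]
    top_results_dot_product = torch.topk(dot_scores, k=k_resources_to_return)
    texts = []
    # topk returns (values, indices); the indices are row positions in df
    for idx in top_results_dot_product[1].to("cpu"):
        index = int(idx)
        texts.append(df.iloc[index]["sentence_chunk"])
    return texts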

# Text formatter for the instruction-tuned model
# Want (ChatML): <|im_start|>role\n text<|im_end|>

# Simple helper that wraps the user text in the input format
def instruction_prompt(text):
    sys_prompt = "<|im_start|>system\n You are an AI teacher. User will ask you to explain a concept. Your goal is to explain the concept as fully as you can.<|im_end|>"
    return f"{sys_prompt}\n<|im_start|>user\n {text}<|im_end|>\n<|im_start|>assistant\n"


input_text = "What are the principles of value investing?"

prompt = instruction_prompt(input_text)

# Base model sample outputs with no RAG

model_inputs = tokenizer([prompt]*4, return_tensors="pt").to(device)

generated_ids = model.generate(**model_inputs, max_new_tokens=400, do_sample=True)
text = tokenizer.batch_decode(generated_ids)
for output in text:
    print(output)



Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<s> <|im_start|>system
 You are an AI teacher. User will ask you to explain a concept. Your goal is to explain the concept as fully as you can.<|im_end|>
<|im_start|>user
 What are the principles of value investing?<|im_end|>
<|im_start|>assistant
Furthermore, you cannot simply take any body of text and rearrange it without it being more difficult. However, with no background background in data science or computer science, you can assume that when you enter your own query, it may be not based on something a person actually wants to learn related to its content.

This type of question asks if there is any relationship between financial goals and financial literacy.

In terms of the basic principles of value investing, values refer to value of assets as a result of the company or company's operations. This could refer to its company structure, the company's investment strategy, or the company's management performance. As you look into the financial sector, a person's wealth can be found 
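
The prompt helpers in this journal assemble the ChatML string by hand for one system message and one user message. The sketch below generalizes that to a list of messages while reproducing the same layout (role on its own line, a leading space before the content, `<|im_end|>` closing each turn). The function name `build_chatml_prompt` and its parameters are hypothetical. If the checkpoint ships a chat template, the tokenizer's `apply_chat_template` method may be a better fit, but that has not been checked for this model.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Format {'role', 'content'} messages in the ChatML layout used in this journal."""
    turns = [f"{IM_START}{m['role']}\n {m['content']}{IM_END}" for m in messages]
    prompt = "\n".join(turns)
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        prompt += f"\n{IM_START}assistant\n"
    return prompt


example = build_chatml_prompt([
    {"role": "system", "content": "You are an AI teacher."},
    {"role": "user", "content": "What are the principles of value investing?"},
])
print(example)
```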

In [11]:
# Augmentation via RAG:
# prepend the top-k retrieved chunks to the user prompt
def prompt_augmentation(query, k=5):
    topk = retrieve_topk_texts(query, k_resources_to_return=k)
    sys_prompt = "<|im_start|>system\n You are an AI teacher. The user will ask you to explain a concept. Your goal is to explain the concept as fully as you can.<|im_end|>"
    text = ""
    for chunk in topk:
        text += chunk + "\n"
    text = text + f"\n Please use the above context items to answer the following prompt. Please be as clear and concise as possible, and maintain a coherent chain of thought.\n {query} "
    return f"{sys_prompt}\n<|im_start|>user\n {text}<|im_end|>\n<|im_start|>assistant\n"

input_text = "What are the principles of value investing?"
prompt = prompt_augmentation(input_text, k=8)


model_inputs = tokenizer([prompt]*4, return_tensors="pt").to(device)

generated_ids = model.generate(**model_inputs, max_new_tokens=500, do_sample=True)
text = tokenizer.batch_decode(generated_ids)
for output in text:
    print(output)


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<s> <|im_start|>system
 You are an AI teacher. User will ask you to explain a concept. Your goal is to explain the concept as fully as you can.<|im_end|>
<|im_start|>user
 Final Thoughts In a rising market, everyone makes money and a value philosophy is unnecessary.But because there is no certain way to predict what the market will do, one must follow a value philosophy at all times.By controlling risk and limiting loss through extensive fundamental analysis, strict discipline, and endless patience, value investors can expect good results with limited downside.You may not get rich quick, but you will keep what you have, and if the future of value investing resembles its past, you are likely to get rich slowly.As investment strategies go, this is the most that any reasonable investor can hope for.Seth A. Klarman [xxxix]
This would delight the authors, who hoped to set forth principles that would “stand the test of the ever enigmatic future.” (p. xliv) In 1992, Tweedy, Browne Company LLC
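
In the augmented prompt above, the retrieved chunks are joined with single newlines, so nothing marks where one context item ends and the next begins, and nothing bounds the total context length. One possible variation is to number the items and stop at a size budget. This is a sketch under assumptions: `format_context` and the `max_chars` budget are hypothetical, and a character budget is only a rough stand-in for a token budget.

```python
from typing import List


def format_context(chunks: List[str], max_chars: int = 4000) -> str:
    """Number each chunk and stop before the character budget is exceeded.

    The first chunk is always kept, even if it alone exceeds the budget.
    """
    items = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        item = f"[{i}] {chunk.strip()}"
        if items and used + len(item) > max_chars:
            break
        items.append(item)
        used += len(item) + 1  # +1 for the joining newline
    return "\n".join(items)


print(format_context(["First retrieved chunk. ", "Second retrieved chunk."]))
```

The result could replace the `text` built in the loop of `prompt_augmentation`; whether numbered items improve the answers of this model has not been tested here.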

In [None]:
# Turn text generation into a function

def ask(query, 
        max_new_tokens = 512, 
        num_answers = 1,
        top_k_rag = 4,
        ):
    topk = retrieve_topk_texts(query, k_resources_to_return=top_k_rag)
    sys_prompt = "<|im_start|>system\n You are a university professor. The user will ask you to explain a concept. Your task is to explain the concept as fully as you can, and maintain a clear and concise chain of thought. Afterwards, summarize what you have written and write a report. <|im_end|>"
    rag_text = ""
    for chunk in topk:
        rag_text += chunk + "\n"
    prompt = f"{sys_prompt}\n<|im_start|>user\n {rag_text} \n Please use the above context items to answer the following prompt. Please be as clear and concise as possible, and maintain a coherent chain of thought.\n {query}<|im_end|>\n<|im_start|>assistant\n"
    model_inputs = tokenizer([prompt]*num_answers, return_tensors="pt").to(device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.6)
    out = tokenizer.batch_decode(generated_ids)
    print(f"RAG context:\n {rag_text}")
    for output in out:
        print("----------")
        print(output.replace("<s> " + prompt,""))

def ask_noRAG(query,
              max_new_tokens = 512, 
              num_answers = 1,
              ):
    sys_prompt = "<|im_start|>system\n You are a university professor. The user will ask you to explain a concept. Your task is to explain the concept as fully as you can, and maintain a clear and concise chain of thought. Afterwards, summarize what you have written and write a report. <|im_end|>"
    prompt = f"{sys_prompt}\n<|im_start|>user\n {query}<|im_end|>\n<|im_start|>assistant\n"

    model_inputs = tokenizer([prompt]*num_answers, return_tensors="pt").to(device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.8)
    out = tokenizer.batch_decode(generated_ids)
    for output in out:
        print("----------")
        print(output.replace("<s> " + prompt,""))

input_text = "What are the principles of value investing?"

ask(input_text, max_new_tokens = 1024, num_answers = 4, top_k_rag = 8)


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


RAG context:
 Final Thoughts In a rising market, everyone makes money and a value philosophy is unnecessary.But because there is no certain way to predict what the market will do, one must follow a value philosophy at all times.By controlling risk and limiting loss through extensive fundamental analysis, strict discipline, and endless patience, value investors can expect good results with limited downside.You may not get rich quick, but you will keep what you have, and if the future of value investing resembles its past, you are likely to get rich slowly.As investment strategies go, this is the most that any reasonable investor can hope for.Seth A. Klarman [xxxix]
This would delight the authors, who hoped to set forth principles that would “stand the test of the ever enigmatic future.” (p. xliv) In 1992, Tweedy, Browne Company LLC, a well-known value investment firm, published a compilation of 44 research studies entitled, “What Has Worked in Investing.”The study found that what has wo
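
`ask` and `ask_noRAG` remove the prompt with `output.replace("<s> " + prompt, "")`, which only works when the decoded text reproduces the prompt character for character. An alternative is to drop the prompt tokens before decoding. The sketch below uses plain lists of token ids so that it runs without a model; `strip_prompt_tokens` and the ids are hypothetical. With the tensors in this journal, the analogous slice would be `generated_ids[:, model_inputs["input_ids"].shape[1]:]`, which assumes every prompt in the batch has the same length (true here, since the same prompt is repeated).

```python
from typing import List, Optional


def strip_prompt_tokens(generated: List[List[int]],
                        prompt_len: int,
                        eos_id: Optional[int] = None) -> List[List[int]]:
    """Keep only the newly generated ids, optionally cut at the first EOS id."""
    continuations = []
    for seq in generated:
        new_tokens = seq[prompt_len:]
        if eos_id is not None and eos_id in new_tokens:
            # Discard the EOS token and anything generated after it.
            new_tokens = new_tokens[: new_tokens.index(eos_id)]
        continuations.append(new_tokens)
    return continuations


# Toy batch: a 3-token prompt followed by generated ids, with 2 as a made-up EOS id.
batch = [[1, 5, 6, 7, 8, 2, 2], [1, 5, 6, 9, 2, 2, 2]]
print(strip_prompt_tokens(batch, prompt_len=3, eos_id=2))
```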