# **Supervised Fine-Tuning for Finance with Machine Learning**

In [None]:
!pip install chromadb sentence-transformers langchain transformers

In [None]:
!pip install -U langchain-community

In [4]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain.schema import Document

**1. Loading the Financial Theory Dataset. In this part, you can derive your own market theory or compile one from several sources. A sound theory is essential for any market prediction.**

In [6]:
with open("/content/MLTheoryForFinance.txt", "r", encoding="utf-8") as f:
    book_text = f.read()

**2. Splitting Text into Chunks**

In [7]:
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(book_text)
documents = [Document(page_content=chunk) for chunk in chunks]
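To make the effect of `chunk_size=800` and `chunk_overlap=100` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences). The helper name `sliding_chunks` and the toy input are assumptions made for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy input: 2000 characters of repeating digits (hypothetical data)
sample = "".join(str(i % 10) for i in range(2000))
toy_chunks = sliding_chunks(sample)
print(len(toy_chunks), [len(c) for c in toy_chunks])
```

The overlap means the last 100 characters of one chunk reappear at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.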

**3. Generating Embeddings**

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

**4. Creating the Vector Database. I am using ChromaDB here because it runs locally, persists embeddings to disk, and works well for RAG applications.**

In [10]:
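# Embed every chunk and store the vectors on disk under ./finance_chroma
vectordb = Chroma.from_documents(documents, embedding_model, persist_directory="./finance_chroma")
# Explicit flush to disk; newer Chroma versions persist automatically, so this call may be deprecated or unnecessary
vectordb.persist()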
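As a rough picture of what the vector database does at query time, the sketch below ranks stored chunks by similarity to a query vector and returns the top `k`. It uses cosine similarity over a plain Python list for illustration only; Chroma's actual index and default distance metric may differ. The names `cosine_similarity`, `top_k`, and the three-dimensional toy vectors are assumptions, not part of this notebook's pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [text for _, text in scored[:k]]


# Hypothetical (text, embedding) pairs; real embeddings have hundreds of dimensions
toy_store = [
    ("ML uncovers variables", [0.9, 0.1, 0.0]),
    ("Theory provides structure", [0.1, 0.9, 0.0]),
    ("Unrelated chunk", [0.0, 0.0, 1.0]),
]
nearest = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(nearest)
```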



**5. Loading the LLM for intelligence**

In [None]:
from huggingface_hub import login

# Paste your HuggingFace token here
login("add-your-key")


**Here, I am using an open model so the notebook can be built quickly, without waiting for an access request to be approved. You can use any model you prefer.**

In [None]:
llm_name = "tiiuae/falcon-rw-1b"
tokenizer = AutoTokenizer.from_pretrained(llm_name)
model = AutoModelForCausalLM.from_pretrained(llm_name)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512, temperature=0.7, do_sample=True)
llm = HuggingFacePipeline(pipeline=pipe)

In [12]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    chain_type="stuff"
)
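The `"stuff"` chain type places all retrieved chunks into a single prompt and sends it to the LLM in one call. The sketch below imitates that step in plain Python. The instruction sentence is copied from this notebook's sample output; the `Question:` / `Helpful Answer:` tail, the blank-line separator between chunks, and the names `STUFF_TEMPLATE` and `build_stuff_prompt` are assumptions about the template, not a copy of LangChain's implementation.

```python
# Assumed prompt layout; only the first sentence is taken from the notebook output
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks and 'stuff' them into one prompt string."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks (k = 2 here to keep the example short)
example_prompt = build_stuff_prompt(
    ["chunk A", "chunk B"], "How does ML work in Finance?"
)
print(example_prompt)
```

Because every chunk goes into one prompt, `k` times the chunk size has to fit in the model's context window, which is one reason to keep `k` small with a 1B-parameter model.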

In [None]:
while True:
    query = input("\nAsk a finance question (or type 'exit'): ")
    if query.lower() == "exit":
        break
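    # invoke() replaces the deprecated run(); the answer is under the "result" key
    answer = qa.invoke({"query": query})["result"]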
    print(f"\n📘 Answer: {answer}")
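The sample output in this notebook shows the printed answer starting with the prompt text itself, which can happen when a text-generation pipeline returns the prompt together with the completion. One possible workaround is to keep only the text after the prompt's final marker. This is a sketch, assuming the prompt ends with `Helpful Answer:`; both that marker and the helper name `extract_answer` are assumptions.

```python
def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the first marker, or the whole text if it is absent."""
    _, found, tail = generated.partition(marker)
    return tail.strip() if found else generated.strip()


# Hypothetical raw generation that echoes the prompt before the answer
raw = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Some retrieved context.\n\n"
    "Question: How does ML work in Finance?\n"
    "Helpful Answer: ML finds patterns."
)
print(extract_answer(raw))
```

In the loop, this could be applied as `print(extract_answer(answer))` before displaying the result.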


Ask a finance question (or type 'exit'): How does ML work in Finance?


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



📘 Answer: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Dimensionality reduction for theory structuring

Pattern retrieval from unstructured big data

"The most insightful use of ML in finance is for discovering theories."

The book advocates for a symbiotic relationship between theory and ML, where:

Theory provides discipline and structure.

ML enables discovery and flexibility.

How ML Aids in Building Theory
ML Separates Variable Discovery from Specification

ML can uncover relevant variables without specifying the functional form.

Econometrics typically couples variable and specification search, which may fail in high-dimensional or nonlinear scenarios.

Stages in Theory Discovery via ML:

Stage 1: Use ML to uncover variables or patterns.

Stage 2: Formulate a theory using these variables.

Stage 3: Test the theory for both factual and counterfactual implications