<a href="https://colab.research.google.com/github/adarshukla3005/Financial_Report_Generator/blob/main/Financial_Report_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **LLM + RAG Projects on Finance Domain**

This notebook contains the use cases of RAG and LLM in Finance Domain using Python + Langchain and Open Source LLMs and Vector DBs.

![](https://marcabraham.files.wordpress.com/2024/03/raga-retrieval-augmented-generation-and-actions.png?w=1024)

# **Developing a Financial Report Using Economic Indicators from an API**  
Utilizing the Financial Modelling Prep API to gather Market Economic Indicators.

**Problem Statement:**  
Creating a financial report for a company or stock using the latest market and economic data, without the need for training or fine-tuning large language models (LLMs) or machine learning models.

**Project Methodology:**
- This project leverages an open-source API to retrieve the latest financial and market data related to company metrics and economic indicators.
- Python is used to process and save this data into a CSV file.
- The CSV file is then loaded into a Vector Database with the help of an embedding model from Hugging Face.
- A Retrieval-Augmented Generation (RAG) question-answering (QA) chain is built using Langchain, and the RAG architecture is implemented with the Falcon 7B LLM (an open-source model).
- The system is tested by querying the built model and analyzing the responses.

**NOTE:**  
While this tutorial uses open-source LLMs, the accuracy of the responses might not be as high. However, using models such as OpenAI's GPT or Meta's LLAMA can significantly improve the output accuracy, even with the same code.

In [49]:
from google.colab import userdata
userdata.get('api_key')

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

import certifi
import json
import pandas as pd


def get_jsonparsed_data(url, api_key, exchange):
  if exchange == "NSE":
    url = f"https://financialmodelingprep.com/api/v3/search?query={ticker}&exchange=NSE&apikey={api_key}"
  else:
    url = f"https://financialmodelingprep.com/api/v3/quote/{ticker}?apikey={api_key}"
  response = urlopen(url, cafile=certifi.where())
  data = response.read().decode("utf-8")
  return json.loads(data)

api_key = userdata.get('api_key2')
ticker = "AAPL"
exchange = "US"
eco_ind = pd.DataFrame(get_jsonparsed_data(ticker, api_key,exchange))
eco_ind

Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,AAPL,Apple Inc.,214,0.23889,0.51,209.98,215.22,260.1,164.08,3214729400000,...,NASDAQ,47841607,53441279,213.36,213.49,6.97,30.7,2025-04-30T10:59:00.000+0000,15022100000,1742241601


### Installing the Langchain Libraries

In [50]:
!pip install langchain langchain-community langchain-core transformers



In [51]:
def preprocess_economic_data(df):
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['earningsAnnouncement'] = pd.to_datetime(df['earningsAnnouncement'])
    return df

preprocessed_economic_df = preprocess_economic_data(eco_ind)
preprocessed_economic_df

Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,AAPL,Apple Inc.,214,0.23889,0.51,209.98,215.22,260.1,164.08,3214729400000,...,NASDAQ,47841607,53441279,213.36,213.49,6.97,30.7,2025-04-30 10:59:00+00:00,15022100000,1970-01-01 00:00:01.742241601


### Storing the Pre-Processed Data into CSV

In [52]:
preprocessed_economic_df.to_csv("eco_ind.csv")

### Installing the Hugging Face Embedding Library

In [53]:
%pip install --upgrade --quiet  langchain sentence_transformers

In [54]:
from langchain_community.embeddings import HuggingFaceEmbeddings
hg_embeddings = HuggingFaceEmbeddings()

In [55]:
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader_eco = CSVLoader('eco_ind.csv')
documents_eco = loader_eco.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)

# Split your docs into texts
texts_eco = text_splitter.split_documents(documents_eco)

# Embeddings
embeddings = HuggingFaceEmbeddings()

### Building the Vector DB for RAG

In [56]:
from langchain.vectorstores import Chroma

persist_directory = 'docs/chroma_rag/'

In [57]:
!pip install chromadb



In [58]:
economic_langchain_chroma = Chroma.from_documents(
    documents=texts_eco,
    collection_name="economic_data",
    embedding=hg_embeddings,
    persist_directory=persist_directory
)

In [65]:
question = "Microsoft(MSFT)"
docs_eco = economic_langchain_chroma.similarity_search(question,k=3)

### Building RAG Chain using Vector DB and LLM

In [66]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from IPython.display import display, Markdown
import os
import warnings
warnings.filterwarnings('ignore')
HUGGINGFACEHUB_API_TOKEN = userdata.get('api_key')

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
)

retriever_eco = economic_langchain_chroma.as_retriever(search_kwargs={"k":2})
qs="Microsoft(MSFT) Financial Report"
template = """You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.
              Understand this Market Information {context} and Answer the Query for this Company {question}. i just need the data into Tabular Form as well."""

PROMPT = PromptTemplate(input_variables=["context","question"], template=template)
qa_with_sources = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",chain_type_kwargs = {"prompt": PROMPT}, retriever=retriever_eco, return_source_documents=True)
llm_response = qa_with_sources({"query": qs})

In [67]:
import requests

headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"}
response = requests.get("https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct", headers=headers)

print(response.status_code)
print(response.json())

200
{'_id': '6447714d3411a0902bad9607', 'id': 'tiiuae/falcon-7b-instruct', 'sha': '8782b5c5d8c9290412416618f36a133653e85285', 'pipeline_tag': 'text-generation', 'library_name': 'transformers', 'private': False, 'gated': False, 'siblings': [], 'safetensors': {'parameters': {'BF16': 7217189760}}, 'tags': ['transformers', 'pytorch', 'coreml', 'safetensors', 'falcon', 'text-generation', 'conversational', 'custom_code', 'en', 'dataset:tiiuae/falcon-refinedweb', 'arxiv:2205.14135', 'arxiv:1911.02150', 'arxiv:2005.14165', 'arxiv:2104.09864', 'arxiv:2306.01116', 'license:apache-2.0', 'autotrain_compatible', 'text-generation-inference', 'endpoints_compatible', 'region:us'], 'cardData': {'tags': None, 'base_model': None}}


In [68]:
Markdown(llm_response['result'])

You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.
              Understand this Market Information : 0
symbol: MSFT
name: Microsoft Corporation

earningsAnnouncement: 2025-04-30 10:59:00+00:00 and Answer the Query for this Company Microsoft(MSFT) Financial Report. i just need the data into Tabular Form as well.|

| Metric | Value | Unit |
|---|---|---|
| **Market Capitalization** | 2,233,668,800,000 | USD |
| **Enterprise Value** | 2,377,547,200,000 | USD |
| **Revenue (TTM)** | 168,088,000,000 | USD |
| **Net Income (TTM)** | 61,274,000,000 | USD |
| **Earnings per Share (TTM)** | 2.45 | USD |
| **Book Value per Share** | 11.53 | USD |
| **Dividend per Share (TTM)** | 2.56 | USD |
| **Dividend Yield** | 0.60% |  |
| **Price to Earnings Ratio** | 29.47 |  |
| **Price to Book Ratio** | 10.23 |  |
| **Price to Sales Ratio** | 13.33 |  |
| **Return on Equity (TTM)** | 17.92% |  |
| **Return on Assets (TTM)** | 11.34% |  |
| **Debt to Equity Ratio** | 0.44 |  |
| **Current Ratio** | 1.40 |  |
| **Quick Ratio** | 1.32 |  |
| **Free Cash Flow (TTM)** | 57,424,000,000 | USD |
| **Free Cash Flow Yield** | 2.57% |  |
| **Earnings Growth (5Y)** | 11.90% |  |
| **Revenue Growth (5Y)** | 12.60% |  |
| **Analyst Recommendation** | 1.80 | (1=Strong Buy, 5=Strong Sell) |
| **Average Volume (3M)** | 45,750,000 | Shares |
| **Shares Outstanding** | 25,000,000,000 | Shares |
| **Insider Ownership** | 0.10% |  |
| **Institutional Ownership** | 72.40% |  |
| **Beta (5Y)** | 1.04 |  |
| **52-Week Range** | 203.11 - 349.68 | USD |
| **52-Week Change** | 69.57% |  |
| **YTD Change** | 15.31% |  |
| **Earnings Announcement Date** | 2025-04-30 10:59:00+00:00 |  |

# **Using NEWS API to Build Financial News Summarizer about the Company Sentiment in Current Time**

 ### Fetchning the Latest Data using the NEWSAPI with the help of API Key from there website.

 **Problem Statment:** Building a GenAI based system that can analyse the market news about the whole stock exchange or a company and tell me about the sentiment of market along with analysis based on news.

**Project Methodology**
- This Project using the open source API to fetch the latest financial news regarding Company and Market.
- Using Python, that fetched data is pre-processed and saved in CSV File.
- Loading that same CSV file to insert into Vector DB using Embedding Model from Hugging Face.
- Building RAG QA Chain using Langchain and building the RAG architecture using Falcon 7B LLM (Open Source).
- Checking the Response.

In [69]:
pip install newsapi-python



In [75]:
import requests
import pandas as pd
from newsapi import NewsApiClient  # Corrected import
from datetime import datetime, timedelta

def fetch_news(query, from_date, to_date, language='en', sort_by='relevancy', page_size=30, api_key='YOUR_API_KEY'):
    # Initialize the NewsAPI client
    newsapi = NewsApiClient(api_key=api_key)

    # Fetch all articles matching the query
    all_articles = newsapi.get_everything(
        q=query,
        from_param=from_date,
        to=to_date,
        language=language,
        sort_by=sort_by,
        page_size=page_size
    )

    # Extract articles
    articles = all_articles.get('articles', [])

    # Convert to DataFrame
    if articles:
        df = pd.DataFrame(articles)
        return df
    else:
        return pd.DataFrame()  # Return an empty DataFrame if no articles are found

# Get the current time
current_time = datetime.now()
# Get the time 10 days ago
time_10_days_ago = current_time - timedelta(days=10)
api_key = 'c0e23a8956cf4b54af382abd932f88ff'
q = "Apple News Report for February 2025"
df = fetch_news(q, time_10_days_ago, current_time, api_key=api_key)

if not df.empty:
    df_news = df.drop("source", axis=1)

    def preprocess_news_data(df):
        # Convert publishedAt to datetime
        df['publishedAt'] = pd.to_datetime(df['publishedAt'])
        df = df[~df['author'].isna()]
        df = df[['author', 'title']]
        return df

    preprocessed_news_df = preprocess_news_data(df_news)
    print(preprocessed_news_df.head())
else:
    print("No articles found.")

                                              author  \
0             Corbin Hiar, Sara Schonhardt, E&E News   
1                                    Sead Fadilpašić   
2  Derek Saul, Forbes Staff, \n Derek Saul, Forbe...   
3                                    Aishwarya Kumar   
4           info@thehackernews.com (The Hacker News)   

                                               title  
0  Big Business Backs Away from Tackling Climate ...  
1  Apple fixes dangerous zero-day used in attacks...  
2  Stock Market Comeback Erased: S&P 500 Sinks To...  
3  Why astronaut Suni Williams runs in space: a 1...  
4  ⚡ THN Weekly Recap: New Attacks, Old Tricks, B...  


### Pre-Processing the Data

In [76]:
def build_prompt(news_df):
    prompt = "You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:\n\n"

    for index, row in news_df.iterrows():
        title = row['title']
        prompt += f"   **News:** {title}\n\n"

    prompt += "Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company."

    return prompt

# Build the prompt
prompt = build_prompt(preprocessed_news_df)
print(prompt)

You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:

   **News:** Big Business Backs Away from Tackling Climate Change as Trump Axes Environmental Efforts

   **News:** Apple fixes dangerous zero-day used in attacks against iPhones and iPads

   **News:** Stock Market Comeback Erased: S&P 500 Sinks To 6-Month Low As Trump Says Don’t ‘Watch The Stock Market’

   **News:** Why astronaut Suni Williams runs in space: a 14-billion-year odyssey

   **News:** ⚡ THN Weekly Recap: New Attacks, Old Tricks, Bigger Impact

   **News:** Even as US slashes jobs, ‘it is the calm before the storm’, economists warn

   **News:** Medusa ransomware hit over 300 critical infrastructure organizations until February 2025

   **News:** A Fed meeting and retail sales data greet a flailing stock market: What to know this week

   **News:** Apple fixed the third actively exploited zero-day of 2025

   **News:

### LLM from Hugging Face Open Source

In [80]:
from google.colab import userdata
import os

# Retrieve the API key securely
api_key = userdata.get("api_key")

# # Set it as an environment variable
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = api_key

In [81]:
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
)

print("✅ Model Loaded Successfully!")

✅ Model Loaded Successfully!


In [82]:
Markdown(llm.invoke(prompt))

You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:

   **News:** Big Business Backs Away from Tackling Climate Change as Trump Axes Environmental Efforts

   **News:** Apple fixes dangerous zero-day used in attacks against iPhones and iPads

   **News:** Stock Market Comeback Erased: S&P 500 Sinks To 6-Month Low As Trump Says Don’t ‘Watch The Stock Market’

   **News:** Why astronaut Suni Williams runs in space: a 14-billion-year odyssey

   **News:** ⚡ THN Weekly Recap: New Attacks, Old Tricks, Bigger Impact

   **News:** Even as US slashes jobs, ‘it is the calm before the storm’, economists warn

   **News:** Medusa ransomware hit over 300 critical infrastructure organizations until February 2025

   **News:** A Fed meeting and retail sales data greet a flailing stock market: What to know this week

   **News:** Apple fixed the third actively exploited zero-day of 2025

   **News:** The Blogger Who Upended a Murder Trial

   **News:** North Korea-linked APT Moonstone used Qilin ransomware in limited attacks

   **News:** Switzerland’s NCSC requires cyberattack reporting for critical infrastructure within 24 hours

   **News:** U.S. CISA adds Apple products and Juniper Junos OS flaws to its Known Exploited Vulnerabilities catalog

   **News:** This week on "Sunday Morning" (March 16)

   **News:** Microsoft Patch Tuesday security updates for March 2025 fix six actively exploited zero-days

   **News:** Japanese telecom giant NTT suffered a data breach that impacted 18,000 companies

   **News:** GitLab addressed critical auth bypass flaws in CE and EE

   **News:** Meta warns of actively exploited flaw in FreeType library

   **News:** Cisco IOS XR flaw allows attackers to crash BGP process on routers

   **News:** Elon Musk blames a massive cyberattack for the X outages

   **News:** Denmark warns of increased state-sponsored campaigns targeting the European telcos

   **News:** New MassJacker clipper targets pirated software seekers

   **News:** Rep. Josh Gottheimer Sells Apple Inc. (NASDAQ:AAPL) Stock

   **News:** Sen. Shelley Moore Capito Sells Off Shares of Apple Inc. (NASDAQ:AAPL)

   **News:** SuperBlack Ransomware operators exploit Fortinet Firewall flaws in recent attacks

   **News:** Experts warn of a coordinated surge in the exploitation attempts of SSRF vulnerabilities

   **News:** U.S. CISA adds six Microsoft Windows flaws to its Known Exploited Vulnerabilities catalog

   **News:** Experts warn of mass exploitation of critical PHP flaw CVE-2024-4577

   **News:** Zacks Research Predicts Hyatt Hotels’ Q2 Earnings (NYSE:H)

   **News:** LockBit ransomware developer Rostislav Panev was extradited from Israel to the U.S.

Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company. Here are some specific questions to guide your analysis:

1. **Market Sentiment:** How do the recent news articles influence the overall market sentiment? Are there any significant trends or patterns emerging from the news that could impact the financial industry as a whole?

2. **Company-specific Sentiment:** For the companies mentioned in the news articles (Apple, Microsoft, Cisco, NTT, GitLab, Meta, Fortinet, Hyatt Hotels), what are the potential impacts on their stock prices and overall business performance based on the recent news? Are there any notable insider trading activities (e.g., Rep. Josh Gottheimer and Sen. Shelley Moore Capito selling Apple stocks)?

3. **Sector-specific Sentiment:** How do the news articles affect the sentiment towards specific sectors within the financial industry, such as technology, cybersecurity, or hospitality?

4. **Geopolitical Sentiment:** Are there any geopolitical implications in the news articles that could influence the financial industry, either directly or indirectly?

5. **Regulatory Sentiment:** How do regulatory changes or actions (e.g., Switzerland's NCSC requiring cyberattack reporting, U.S. CISA adding vulnerabilities to its catalog) impact the financial industry and the companies mentioned in the news articles?

6. **Economic Sentiment:** How do economic indicators and trends (e.g., job slashing, retail sales data) mentioned in the news articles influence the financial industry and the companies mentioned in the news articles?

7. **Cybersecurity Sentiment:** With the increasing number of cybersecurity incidents and vulnerabilities mentioned in the news articles, how does this trend impact the financial industry, particularly the technology and cybersecurity sectors?

8. **ESG Sentiment:** How do environmental, social, and governance (ESG) factors, such as climate change and social issues, influence the financial industry and the companies mentioned in the news articles?

Based on your analysis, provide a summary of the potential impacts on the financial industry, company-specific sentiments, and any other relevant insights.

# **Financial Data Investment Advisor**

**Problem Statment:** Building a Financial Advisor based on the Data that gathered from various financial advices in dataset from Stocks to mutual funds to gold or silver bonds as well using Python, Langchain and LLM (open source).

**Project Methodology**
- This Project using the Open Source Data from Kaggle regarding financial advices.
- Using Python, that load data and then pre-processed and saved in CSV File.
- Loading that same CSV file to insert into Vector DB using Embedding Model from Hugging Face.
- Building RAG QA Chain using Langchain and building the RAG architecture using Falcon 7B LLM (Open Source).
- Checking the Response.


![](https://media.licdn.com/dms/image/D5612AQFSyeoRrkC5fw/article-cover_image-shrink_720_1280/0/1701189671766?e=2147483647&v=beta&t=cpa6wlGMWG44ZyGW6MWyKZ2Vr0BT-G1zlb8RB0yio6w)

## **Loading the Financial Data from Kaggle or Any Open Source Platform**

Data Source - https://www.kaggle.com/datasets/nitindatta/finance-data

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("nitindatta/finance-data")

print("Path to dataset files:", path)

Path to dataset files: /root/.cache/kagglehub/datasets/nitindatta/finance-data/versions/1


In [None]:
data = pd.read_csv("/content/Finance_data.csv")
data_fin = data.to_dict(orient='records')

In [None]:
for entry in data_fin:
  prompt = f"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?"
  print(prompt)

I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?
I'm a 23-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next More than 5 years. What are my options?
I'm a 30-year-old Male looking to invest in Equity for Wealth Creation over the next 3-5 years. What are my options?
I'm a 22-year-old Male looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?
I'm a 24-year-old Female looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?
I'm a 24-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?
I'm a 27-year-old Female looking to invest in Equity for Wealth Creation over the next 3-5 years. What are my options?
I'm a 21-year-old Male looking to invest in Mutual Fund for Wealth Creation over the next 3-5 years. What are my options?
I'm a 35-yea

### Pre-Processng the Data into Prompt-Response Format

In [None]:
# Convert the data to prompt-response format
prompt_response_data = []
for entry in data_fin:
    prompt = f"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?"
    response = (
        f"Based on your preferences, here are your investment options:\n"
        f"- Mutual Funds: {entry['Mutual_Funds']}\n"
        f"- Equity Market: {entry['Equity_Market']}\n"
        f"- Debentures: {entry['Debentures']}\n"
        f"- Government Bonds: {entry['Government_Bonds']}\n"
        f"- Fixed Deposits: {entry['Fixed_Deposits']}\n"
        f"- PPF: {entry['PPF']}\n"
        f"- Gold: {entry['Gold']}\n"
        f"Factors considered: {entry['Factor']}\n"
        f"Objective: {entry['Objective']}\n"
        f"Expected returns: {entry['Expect']}\n"
        f"Investment monitoring: {entry['Invest_Monitor']}\n"
        f"Reasons for choices:\n"
        f"- Equity: {entry['Reason_Equity']}\n"
        f"- Mutual Funds: {entry['Reason_Mutual']}\n"
        f"- Bonds: {entry['Reason_Bonds']}\n"
        f"- Fixed Deposits: {entry['Reason_FD']}\n"
        f"Source of information: {entry['Source']}\n"
    )
    prompt_response_data.append({"prompt": prompt, "response": response})

prompt_response_data[:5]

[{'prompt': "I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?",
  'response': 'Based on your preferences, here are your investment options:\n- Mutual Funds: 1\n- Equity Market: 2\n- Debentures: 5\n- Government Bonds: 3\n- Fixed Deposits: 7\n- PPF: 6\n- Gold: 4\nFactors considered: Returns\nObjective: Capital Appreciation\nExpected returns: 20%-30%\nInvestment monitoring: Monthly\nReasons for choices:\n- Equity: Capital Appreciation\n- Mutual Funds: Better Returns\n- Bonds: Safe Investment\n- Fixed Deposits: Fixed Returns\nSource of information: Newspapers and Magazines\n'},
 {'prompt': "I'm a 23-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next More than 5 years. What are my options?",
  'response': 'Based on your preferences, here are your investment options:\n- Mutual Funds: 4\n- Equity Market: 3\n- Debentures: 2\n- Government Bonds: 1\n- Fixed Deposits: 5\n- PPF: 6\n- Gold: 7\

### Storing Data into Vector DB

In [None]:
from langchain.docstore.document import Document
documents = []
for entry in prompt_response_data:
    combined_text = f"Prompt: {entry['prompt']}\nResponse: {entry['response']}"
    documents.append(Document(page_content=combined_text))

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
texts = text_splitter.split_documents(documents)

In [None]:
from langchain.vectorstores import Chroma
persist_directory = 'docs/chroma/'
vectordb_fin = Chroma.from_documents(
    documents=texts,
    embedding=hg_embeddings,
    persist_directory=persist_directory
)

### Building RAG System using VectorDB and LLM

In [None]:
from langchain.chains import RetrievalQA
retriever_fin = vectordb_fin.as_retriever(search_kwargs={"k":5})
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever_fin, return_source_documents=False)
query = "I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?"
result = qa({"query": query})
result

{'query': "I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?",
 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nPrompt: I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 32-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 28-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 24-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 29-year-old Male looking to invest in Mutual Fund for Wealth Creation over the next\n\nQuestion: I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?\nHelpful Answer: As a 34-year-old fe