## Finance Bot Using LangChain, OpenAI, Streamlit



In [33]:
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer,util
import numpy as np
import faiss

In [34]:
loader = UnstructuredURLLoader(
    urls=[
        'https://www.moneycontrol.com/news/business/markets/oil-settles-lower-ahead-of-opec-decision-11817711.html',
        'https://www.moneycontrol.com/news/india/latest-daily-news-live-india-world-cyclone-michaung-chennai-floods-israel-hamas-war-breaking-news-and-updates-liveblog-11860771.html'
    ]
    # urls=[
    #     'https://www.moneycontrol.com/news/business/markets/oil-settles-lower-ahead-of-opec-decision-11817711.html',
    #     'https://www.moneycontrol.com/news/business/markets/gold-hits-6-month-high-on-fed-pause-expectation-softer-dollar-11817261.html'
    # ]
)
data=loader.load()

In [35]:
len(data)

2

In [36]:
text_splitter = RecursiveCharacterTextSplitter(separators=['\n'],chunk_size=600,chunk_overlap=100)
docs=text_splitter.split_documents(data)

In [37]:
docs[0].page_content

'English\n\nHindi\n\nGujarati\n\nSpecials\n\nMoneycontrol Trending Stock\n\nInfosys\xa0INE009A01021, INFY, 500209\n\nState Bank of India\xa0INE062A01020, SBIN, 500112\n\nYes Bank\xa0INE528G01027, YESBANK, 532648\n\nBank Nifty\n\nNifty 500\n\nQuotes\n\nMutual Funds\n\nCommodities\n\nFutures & Options\n\nCurrency\n\nNews\n\nCryptocurrency\n\nForum\n\nNotices\n\nVideos\n\nGlossary\n\nAll'

In [38]:
len(docs)

69

In [39]:
docs[0]

Document(page_content='English\n\nHindi\n\nGujarati\n\nSpecials\n\nMoneycontrol Trending Stock\n\nInfosys\xa0INE009A01021, INFY, 500209\n\nState Bank of India\xa0INE062A01020, SBIN, 500112\n\nYes Bank\xa0INE528G01027, YESBANK, 532648\n\nBank Nifty\n\nNifty 500\n\nQuotes\n\nMutual Funds\n\nCommodities\n\nFutures & Options\n\nCurrency\n\nNews\n\nCryptocurrency\n\nForum\n\nNotices\n\nVideos\n\nGlossary\n\nAll', metadata={'source': 'https://www.moneycontrol.com/news/business/markets/oil-settles-lower-ahead-of-opec-decision-11817711.html'})

In [40]:
model = SentenceTransformer('all-MiniLM-L6-v2')

In [41]:
# Separate the chunk texts from their metadata (source URLs)
docc=[doc.page_content for doc in docs]
metadata=[doc.metadata for doc in docs]

In [42]:
docc

['English\n\nHindi\n\nGujarati\n\nSpecials\n\nMoneycontrol Trending Stock\n\nInfosys\xa0INE009A01021, INFY, 500209\n\nState Bank of India\xa0INE062A01020, SBIN, 500112\n\nYes Bank\xa0INE528G01027, YESBANK, 532648\n\nBank Nifty\n\nNifty 500\n\nQuotes\n\nMutual Funds\n\nCommodities\n\nFutures & Options\n\nCurrency\n\nNews\n\nCryptocurrency\n\nForum\n\nNotices\n\nVideos\n\nGlossary\n\nAll',
 'Futures & Options\n\nCurrency\n\nNews\n\nCryptocurrency\n\nForum\n\nNotices\n\nVideos\n\nGlossary\n\nAll\n\nHello, LoginHello, LoginLog-inor Sign-UpMy AccountMy Profile My PortfolioMy WatchlistMy Credit Score₹100 CashbackMy FeedMy MessagesMy AlertsMy Profile My PROMy PortfolioMy WatchlistMy Credit Score₹100 CashbackMy FeedMy MessagesMy AlertsLogoutChat with UsDownload AppFollow us on:\n\nPremium\n\nMy Feed',
 '\nMarketsHOMEINDIAN INDICESSTOCK ACTIONAll StatsTop GainersTop LosersOnly BuyersOnly Sellers52 Week High52 Week LowPrice ShockersVolume ShockersMost Active StocksGLOBAL MARKETSUS MARKETSBIG SHA

## Preprocessing

In [43]:
# Join with a space so words at chunk boundaries do not fuse together
text_gensim=" ".join(docc)

In [44]:
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import preprocess_string
text = text_gensim
# Lowercase, strip punctuation, numbers and stop words, and stem the combined text
processed_text = preprocess_string(text)
dictionary = corpora.Dictionary([processed_text])
corpus = [dictionary.doc2bow(processed_text)]
lda_model = LdaModel(corpus, num_topics=5, id2word=dictionary)
topics = lda_model.print_topics()
topics



[(0,
  '0.014*"trade" + 0.014*"new" + 0.011*"market" + 0.008*"invest" + 0.007*"stock" + 0.005*"credit" + 0.005*"oil" + 0.005*"pad" + 0.005*"live" + 0.005*"free"'),
 (1,
  '0.012*"trade" + 0.012*"invest" + 0.012*"new" + 0.011*"market" + 0.006*"decemb" + 0.006*"credit" + 0.005*"stock" + 0.005*"fund" + 0.005*"india" + 0.005*"oil"'),
 (2,
  '0.015*"trade" + 0.013*"new" + 0.010*"invest" + 0.009*"market" + 0.005*"stock" + 0.005*"pdg" + 0.005*"fund" + 0.005*"high" + 0.004*"call" + 0.004*"sepinvsli"'),
 (3,
  '0.013*"trade" + 0.012*"new" + 0.010*"invest" + 0.009*"market" + 0.006*"price" + 0.005*"stock" + 0.005*"credit" + 0.005*"investnw" + 0.005*"bank" + 0.004*"ist"'),
 (4,
  '0.011*"market" + 0.010*"invest" + 0.008*"trade" + 0.008*"new" + 0.006*"stock" + 0.005*"oil" + 0.005*"call" + 0.005*"sepinvsli" + 0.005*"week" + 0.004*"credit"')]

In [45]:
topic_only=[]
for _, topic_terms in topics:
    for term in topic_terms.split("+"):
        # Each term looks like '0.014*"trade"'; keep the word and drop the quotes
        topic_only.append(term.split("*")[1].strip().strip('"'))

In [46]:
topic_only

['trade',
 'new',
 'market',
 'invest',
 'stock',
 'credit',
 'oil',
 'pad',
 'live',
 'free',
 'trade',
 'invest',
 'new',
 'market',
 'decemb',
 'credit',
 'stock',
 'fund',
 'india',
 'oil',
 'trade',
 'new',
 'invest',
 'market',
 'stock',
 'pdg',
 'fund',
 'high',
 'call',
 'sepinvsli',
 'trade',
 'new',
 'invest',
 'market',
 'price',
 'stock',
 'credit',
 'investnw',
 'bank',
 'ist',
 'market',
 'invest',
 'trade',
 'new',
 'stock',
 'oil',
 'call',
 'sepinvsli',
 'week',
 'credit']

In [47]:
text_spacy=" ".join(topic_only)
text_spacy

'trade new market invest stock credit oil pad live free trade invest new market decemb credit stock fund india oil trade new invest market stock pdg fund high call sepinvsli trade new invest market price stock credit investnw bank ist market invest trade new stock oil call sepinvsli week credit'

In [48]:
import spacy
from collections import Counter
nlp = spacy.load("en_core_web_sm")
complete_text=text_spacy
complete_doc = nlp(complete_text)

words = [
    token.text
    for token in complete_doc
    if not token.is_stop and not token.is_punct
]

output_topic=(Counter(words).most_common(10))

In [49]:
top_topic=output_topic[0][0]
top_topic

'trade'

## Check whether the data is finance-related

In [50]:
finance_keywords = [
    'trade', 'market', 'invest', 'stock', 'credit', 'fund', 'bank', 'asset',
    'portfolio', 'equity', 'bond', 'insurance', 'currency', 'dividend', 'loan',
    'financial', 'economy', 'interest', 'capital', 'wealth','brokerage', 'exchange',
    'commodities', 'derivative', 'securities', 'liabilities','transaction', 'valuation',
    'revenue', 'profit', 'loss', 'liquidity', 'mortgage','savings', 'retirement', 'audit',
    'inflation', 'deflation', 'risk', 'hedge'
]

word_frequencies = output_topic

finance_related_count = sum(freq for word, freq in word_frequencies if word in finance_keywords)

threshold = 10

is_finance_related = finance_related_count >= threshold

if is_finance_related:
    print("The text is likely finance-related.")
else:
    print("The text might not be finance-related.")


The text is likely finance-related.


## If the data is finance-related, forward it to the LLM; otherwise answer "NOT RELATED TO FINANCE" (guardrailing the output)
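
The guardrail described in this heading can be sketched as a small gate in front of the LLM call. This is a minimal illustration, not the notebook's own implementation: the names `looks_finance_related`, `guarded_answer`, `REFUSAL`, and `fake_llm` are hypothetical, the keyword set is a shortened copy of `finance_keywords`, and the sample word counts are made up to show both branches.

```python
# Hypothetical helpers; the keyword set is a shortened copy of finance_keywords.
FINANCE_KEYWORDS = {"trade", "market", "invest", "stock", "credit", "fund", "bank"}
REFUSAL = "NOT RELATED TO FINANCE"


def looks_finance_related(word_frequencies, keywords=FINANCE_KEYWORDS, threshold=10):
    """Sum the counts of finance keywords and compare the total with the threshold."""
    score = sum(freq for word, freq in word_frequencies if word in keywords)
    return score >= threshold


def guarded_answer(word_frequencies, question, ask_llm):
    """Call ask_llm(question) only when the scraped text passes the finance check."""
    if not looks_finance_related(word_frequencies):
        return REFUSAL
    return ask_llm(question)


# Stand-in for the real chain call, used here only to show the control flow.
def fake_llm(question):
    return f"(LLM answer to: {question})"


# Made-up (word, count) pairs in the same shape as Counter.most_common() output.
finance_counts = [("trade", 5), ("market", 5), ("invest", 5)]
weather_counts = [("cyclone", 8), ("rain", 6), ("trade", 1)]

print(guarded_answer(finance_counts, "What moved oil prices?", fake_llm))
print(guarded_answer(weather_counts, "What moved oil prices?", fake_llm))
```

In this notebook, `word_frequencies` would be `output_topic`, and `ask_llm` would be a small wrapper around the chain's `run` call, so the LLM is only reached when the check passes.
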

In [51]:
docc[0]

'English\n\nHindi\n\nGujarati\n\nSpecials\n\nMoneycontrol Trending Stock\n\nInfosys\xa0INE009A01021, INFY, 500209\n\nState Bank of India\xa0INE062A01020, SBIN, 500112\n\nYes Bank\xa0INE528G01027, YESBANK, 532648\n\nBank Nifty\n\nNifty 500\n\nQuotes\n\nMutual Funds\n\nCommodities\n\nFutures & Options\n\nCurrency\n\nNews\n\nCryptocurrency\n\nForum\n\nNotices\n\nVideos\n\nGlossary\n\nAll'

In [52]:
len(docc)

69

In [53]:
embed=model.encode(docc)


In [54]:
embed.shape

(69, 384)

In [55]:
embed

array([[-0.05865062, -0.05602535, -0.0471688 , ..., -0.0528    ,
         0.01503264,  0.04057089],
       [ 0.00906426, -0.05552488, -0.04326928, ..., -0.07795526,
        -0.01954196,  0.004394  ],
       [-0.00766468, -0.12255754, -0.04587163, ..., -0.10400839,
        -0.03862083,  0.0322618 ],
       ...,
       [ 0.02363555, -0.03707378, -0.02414387, ..., -0.11114004,
        -0.04511818,  0.01270786],
       [-0.04267276, -0.02061785, -0.03832038, ..., -0.14835574,
        -0.02671496,  0.04385377],
       [-0.08037231, -0.05018852, -0.04718344, ..., -0.10585004,
        -0.0567027 ,  0.02616674]], dtype=float32)

In [56]:
dim=embed.shape[1]
dim

384

In [57]:
index=faiss.IndexFlatL2(dim)
index

<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x00000235B5DE2EE0> >

In [58]:
index.add(embed)

In [59]:
index

<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x00000235B5DE2EE0> >

In [60]:
# Save the index to local disk
index_filename = 'index_flat_l2.index'
faiss.write_index(index, index_filename)

In [61]:
# Load the index back from local disk
index_filename = 'index_flat_l2.index'
loaded_index = faiss.read_index(index_filename)

In [62]:
loaded_index

<faiss.swigfaiss.IndexFlat; proxy of <Swig Object of type 'faiss::IndexFlat *' at 0x00000235057EBF60> >

In [63]:
search_query="tell about hostages of gaza and cyclone in India"
search_index=model.encode(search_query)
search_index=search_index.reshape(1,-1)

In [64]:
search_index.shape

(1, 384)

In [65]:
dist,I=loaded_index.search(search_index,k=5)

In [66]:
I

array([[16, 44, 60, 56, 43]], dtype=int64)

In [67]:
print("Results from FAISS semantic search:")
print()
for counter,ind in enumerate(I[0]):
    print(counter+1,".")
    print(docc[ind])
    print()
    print(metadata[ind])

Results from FAISS semantic search:

1 .
Story continues below Advertisement

Remove Ad

The first group of hostages freed from captivity in Gaza returned to Israel on November 24, on the first day of a planned four-day truce during which further exchanges of hostages for Palestinian detainees are due to take place.

"The fact that they followed through was significant for reducing the risk premium," said John Kilduff, partner at Again Capital LLC in New York.

{'source': 'https://www.moneycontrol.com/news/business/markets/oil-settles-lower-ahead-of-opec-decision-11817711.html'}
2 .
Cyclone Michaung LIVE Updates: Individuals accompanied by a dog are seen sitting on an improvised raft while being relocated from a submerged area in Chennai due to heavy rains following the landfall of Cyclone Michaung. The streets of the city were inundated with chest-high water on December 5, resulting in eight casualties amid severe flooding caused by the cyclone hitting the southeast coast. (Image: A

In [68]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

In [69]:
prompt_template = """Answer in detail, stay close to the topic, and use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know; don't try to make up an answer.
{context}
Top Topic: {top_topic}
Question: {question}
Answer:"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context","top_topic","question"])


In [70]:
result_context=""
source=[]

# Concatenate the retrieved chunks and collect each source URL once
for ind in I[0]:
    result_context+=docc[ind]
    if metadata[ind]['source'] not in source:
        source.append(metadata[ind]['source'])
result_context,source

('Story continues below Advertisement\n\nRemove Ad\n\nThe first group of hostages freed from captivity in Gaza returned to Israel on November 24, on the first day of a planned four-day truce during which further exchanges of hostages for Palestinian detainees are due to take place.\n\n"The fact that they followed through was significant for reducing the risk premium," said John Kilduff, partner at Again Capital LLC in New York.Cyclone Michaung LIVE Updates: Individuals accompanied by a dog are seen sitting on an improvised raft while being relocated from a submerged area in Chennai due to heavy rains following the landfall of Cyclone Michaung. The streets of the city were inundated with chest-high water on December 5, resulting in eight casualties amid severe flooding caused by the cyclone hitting the southeast coast. (Image: AFP)                                            \n               Moneycontrol.com\n\nDecember 06, 2023 / 11:00 PM ISTIndia announces 1 million US dollars immediat

In [71]:
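import dotenv
import os
from langchain.llms import OpenAI
dotenv.load_dotenv()

# Read the key from the environment (.env file) and pass it explicitly
OPENAI_API_KEY=os.getenv("OPENAI_API_KEY")

llm=OpenAI(temperature=0.7,openai_api_key=OPENAI_API_KEY)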

In [72]:
query_llm=LLMChain(llm=llm,prompt=prompt)


In [73]:
result_context

'Story continues below Advertisement\n\nRemove Ad\n\nThe first group of hostages freed from captivity in Gaza returned to Israel on November 24, on the first day of a planned four-day truce during which further exchanges of hostages for Palestinian detainees are due to take place.\n\n"The fact that they followed through was significant for reducing the risk premium," said John Kilduff, partner at Again Capital LLC in New York.Cyclone Michaung LIVE Updates: Individuals accompanied by a dog are seen sitting on an improvised raft while being relocated from a submerged area in Chennai due to heavy rains following the landfall of Cyclone Michaung. The streets of the city were inundated with chest-high water on December 5, resulting in eight casualties amid severe flooding caused by the cyclone hitting the southeast coast. (Image: AFP)                                            \n               Moneycontrol.com\n\nDecember 06, 2023 / 11:00 PM ISTIndia announces 1 million US dollars immediate

In [51]:
# Trimmed context (first and last 500 characters); not used in the cells below
res=result_context[:500]+result_context[-500:]

In [74]:
print("Result by OpenAI LLM : ")
print()
response=query_llm.run({"context":result_context,"top_topic":top_topic,"question":search_query})
print(response)
print()
print(source)

Result by OpenAI LLM : 

 On November 24, the first group of hostages freed from captivity in Gaza returned to Israel. This was part of a planned four-day truce during which further exchanges of hostages for Palestinian detainees were due to take place. This exchange of hostages for detainees was significant for reducing the risk premium.

Meanwhile, in India, the Cyclone Michaung made landfall on the southeast coast on December 5, resulting in eight casualties and severe flooding. In response to this, the Indian government has announced 1 million US dollars in immediate relief assistance to the affected areas. Defence Minister Rajnath Singh will conduct an aerial survey of the flood-hit areas on December 7th, accompanied by Thangam Thennarasu, Finance Minister of Tamil Nadu and Chief Secretary of Tamil Nadu.

['https://www.moneycontrol.com/news/business/markets/oil-settles-lower-ahead-of-opec-decision-11817711.html', 'https://www.moneycontrol.com/news/india/latest-daily-news-live-indi