## Domain-specific chatbot that adds the latest context to give up-to-date information for your specific needs and prompts



In [None]:
!pip install langchain
!pip install faiss-cpu
!pip install openai
!pip install unstructured
!pip install tiktoken
!pip install sentence_transformers

In [3]:
import os
from langchain.text_splitter import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pickle
import faiss
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains.question_answering import load_qa_chain
from langchain import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory

In [None]:
os.environ["OPENAI_API_KEY"] = "your api key" #create and paste your API key from https://platform.openai.com/account/api-keys

In [None]:
# Replace these URLs with pages for your target topic, and make sure the sites allow bots (check their robots.txt and terms of use)
urls = ['specify your urls']
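Whether a site welcomes bots is usually declared in its robots.txt file. As a sketch, Python's standard-library `urllib.robotparser` can filter the URL list before loading. The robots.txt rules and example.com URLs below are made up, and the helper `allowed_urls` is hypothetical and assumes all candidate URLs belong to the same site. For a real site you would call `set_url(".../robots.txt")` and `read()` instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; for a real site use rp.set_url(".../robots.txt") then rp.read()
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
""".splitlines()


def allowed_urls(candidate_urls, robots_lines, user_agent="*"):
    """Hypothetical helper: keep only URLs the robots.txt rules let this user agent fetch.
    Assumes every URL is on the site the robots.txt lines came from."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return [u for u in candidate_urls if rp.can_fetch(user_agent, u)]


candidates = [
    "https://example.com/news/markets",
    "https://example.com/private/report",
]
print(allowed_urls(candidates, ROBOTS_TXT))  # ['https://example.com/news/markets']
```

The filtered list could then be assigned to `urls` before creating the loader.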

In [4]:
#Load files from remote URLs using Unstructured.
loaders = UnstructuredURLLoader(urls=urls)
data = loaders.load()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
data  # inspect the loaded documents (page text and metadata)

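Split the documents into chunks. The cell below uses `CharacterTextSplitter`, which splits on a single separator; the page linked below describes `RecursiveCharacterTextSplitter` (also imported above), which tries a list of separators in turn and accepts the same `chunk_size` and `chunk_overlap` arguments.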

https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter

In [15]:
# Text splitter
text_splitter = CharacterTextSplitter(separator='\n',
                                      chunk_size=1200,    # max characters per chunk (measured with len() by default, not tokens)
                                      chunk_overlap=300)  # characters shared by consecutive chunks so context carries across boundaries

docs = text_splitter.split_documents(data)
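To make `chunk_size` and `chunk_overlap` concrete: with overlap, each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk repeats at the head of the next. The helper below (`sliding_window_chunks`, hypothetical) is a simplified fixed-window splitter, not LangChain's algorithm: `CharacterTextSplitter` first splits on the separator and then merges pieces up to `chunk_size`, so its boundaries fall on newlines and its overlap is approximate.

```python
def sliding_window_chunks(text, chunk_size=1200, chunk_overlap=300):
    """Simplified illustration of size/overlap: fixed character windows,
    each starting (chunk_size - chunk_overlap) characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap preserves more context across boundaries at the cost of more chunks (and more embedding calls).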

In [16]:
docs[0]

Document(page_content='Home\nMail\nNews\nFinance\nSports\nEntertainment\nSearch\nMobile\nMore...\nYahoo Finance\nSkip to Navigation\nSkip to Main Content\nSkip to Related Content\nSign in\nMailSign in to view your mail\nFinance Home\nWatchlists\nMy Portfolio\nMarkets\nNews\nVideos\nYahoo Finance Plus\nScreeners\nPersonal Finance\nCrypto\nIndustries\nContact Us\nLatest News\nYahoo Finance Originals\nStock Market News\nEarnings\nPolitics\nEconomic News\nMorning Brief\nPersonal Finance News\nCrypto News\nBidenomics Report Card\nWe are experiencing some temporary issues. The market data on this page is currently delayed. Please bear with us as we address this and restore your personalized lists.\nU.S. markets open in 9 hours 20 minutes\nS&P Futures4,497.50-5.00(-0.11%)\nDow Futures34,666.00-14.00(-0.04%)\nNasdaq Futures15,501.75-32.75(-0.21%)\nRussell 2000 Futures1,878.60-4.50(-0.24%)\nCrude Oil86.75+0.06(+0.07%)\nGold1,951.60-1.00(-0.05%)\nLatest Financial and Business News', metadata={'s

Instead of OpenAI embeddings, we can use open-source embeddings that run locally.
Either of the methods below works; the FAISS step further down expects the chosen model to be stored in `embeddings`.
https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html


In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
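`normalize_embeddings` controls whether sentence-transformers scales each vector to unit length. For unit vectors the dot product equals cosine similarity, which matters when vectors are compared by inner product. A stdlib-only illustration with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (what normalize_embeddings=True does per embedding)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]  # toy vectors, not real embeddings
print(dot(a, b))                                        # 24.0 (grows with vector length)
print(round(cosine(a, b), 4))                           # 0.96
print(round(dot(l2_normalize(a), l2_normalize(b)), 4))  # 0.96 (equals cosine once normalized)
```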

In [None]:
from langchain.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

Faiss is a library for efficient similarity search and clustering of dense vectors.
More info: https://github.com/facebookresearch/faiss
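Conceptually, an exact ("flat") FAISS index scores every stored vector against the query and returns the k best; the retriever's `search_kwargs={"k": 2}` later asks for the top 2. The pure-Python sketch below performs the same exhaustive search with squared L2 distance on toy 2-D vectors. It illustrates exact nearest-neighbour search only; FAISS itself is heavily optimized and also offers approximate indexes.

```python
import heapq


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_l2(query, vectors, k=2):
    """Exhaustive search: return (index, squared distance) pairs for the k stored
    vectors closest to the query, nearest first."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


# Toy 2-D "embeddings" standing in for chunk vectors
store = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
print([i for i, _ in top_k_l2([1.0, 1.0], store, k=2)])  # [1, 2]
```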

In [18]:
# Build a FAISS index over the chunk embeddings; FAISS supports similarity search by L2 distance or inner product.
vectorStore_obj = FAISS.from_documents(docs, embeddings)

# Save the vector store to disk so the embeddings (which cost API calls when using OpenAI) are not recomputed on every run;
# rebuild it at regular intervals to pick up fresh content.
with open("faiss_store_.pkl", "wb") as f:
    pickle.dump(vectorStore_obj, f)
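One simple way to refresh the saved store at intervals, sketched under assumptions: treat the pickle as a cache and rebuild it when the file is older than a chosen maximum age. `load_or_rebuild` and `build_store` are hypothetical names; `build_store` would be a function you supply that re-loads the URLs, re-splits them and calls `FAISS.from_documents`. The 6-hour default is arbitrary.

```python
import os
import pickle
import tempfile
import time


def load_or_rebuild(path, build_store, max_age_seconds=6 * 60 * 60):
    """Return the pickled object at `path`, or rebuild and re-pickle it with
    `build_store()` if the file is missing or older than `max_age_seconds`."""
    is_fresh = (
        os.path.exists(path)
        and time.time() - os.path.getmtime(path) < max_age_seconds
    )
    if is_fresh:
        with open(path, "rb") as f:
            return pickle.load(f)
    store = build_store()  # hypothetical builder: re-scrape, re-split, re-embed
    with open(path, "wb") as f:
        pickle.dump(store, f)
    return store


# Usage with a stand-in builder; in the notebook, build_store would return
# FAISS.from_documents(docs, embeddings) and the path would be "faiss_store_.pkl".
build_calls = []


def fake_build():
    build_calls.append(1)
    return {"version": len(build_calls)}


cache_path = os.path.join(tempfile.mkdtemp(), "store.pkl")
first = load_or_rebuild(cache_path, fake_build)   # file missing, so it builds
second = load_or_rebuild(cache_path, fake_build)  # file is fresh, so it loads from disk
print(first, second, len(build_calls))            # {'version': 1} {'version': 1} 1
```

Depending on your LangChain version, the FAISS wrapper's `save_local` and `load_local` methods may be a sturdier alternative to pickling the whole object.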

In [21]:
extract = vectorStore_obj.as_retriever(search_kwargs={"k": 2})  # retrieve the 2 most similar chunks to test the vector store

In [24]:
search = extract.get_relevant_documents("Important things to remember in financial markets?")
search[0]

Document(page_content='Home\nMail\nNews\nFinance\nSports\nEntertainment\nSearch\nMobile\nMore...\nYahoo Finance\nSkip to Navigation\nSkip to Main Content\nSkip to Related Content\nSign in\nMailSign in to view your mail\nFinance Home\nWatchlists\nMy Portfolio\nMarkets\nNews\nVideos\nYahoo Finance Plus\nScreeners\nPersonal Finance\nCrypto\nIndustries\nContact Us\nLatest News\nYahoo Finance Originals\nStock Market News\nEarnings\nPolitics\nEconomic News\nMorning Brief\nPersonal Finance News\nCrypto News\nBidenomics Report Card\nWe are experiencing some temporary issues. The market data on this page is currently delayed. Please bear with us as we address this and restore your personalized lists.\nU.S. markets open in 9 hours 20 minutes\nS&P Futures4,497.50-5.00(-0.11%)\nDow Futures34,666.00-14.00(-0.04%)\nNasdaq Futures15,501.75-32.75(-0.21%)\nRussell 2000 Futures1,878.60-4.50(-0.24%)\nCrude Oil86.75+0.06(+0.07%)\nGold1,951.60-1.00(-0.05%)\nLatest Financial and Business News', metadata={'s

In [None]:
# Load the pickled vector store back from disk (lets later sessions skip re-embedding)
with open("faiss_store_.pkl", "rb") as f:
    VectorStore = pickle.load(f)

Define your OpenAI model

In [None]:
llm = OpenAI(temperature=0.7)  # default completion model; temperature=0.7 allows some variation in answers

In [None]:
llm

OpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.7, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=1, best_of=1, model_kwargs={}, openai_api_key='<redacted>', openai_api_base='', openai_organization='', openai_proxy='', batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False, allowed_special=set(), disallowed_special='all', tiktoken_model_name=None)

Modify the prompt to suit your needs

In [None]:
prompt_template = """Use the latest finance data provided below to answer the question. Only if that data does not cover the question, add insights from your prior knowledge.
If you don't know the answer, say so; don't try to make up an answer.

{context}

Question: {question}
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
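With `chain_type="stuff"`, the retrieved chunks are concatenated into the `{context}` slot of the prompt and the user's question goes into `{question}`, producing one prompt string for the LLM. A stdlib sketch of that substitution, assuming chunks are joined with blank lines (LangChain's actual separator may differ); `stuff_prompt` is a hypothetical helper and the template and chunk texts are shortened toy examples:

```python
def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Join retrieved chunk texts and substitute them, plus the question, into the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


TEMPLATE = """Answer using the context below.

{context}

Question: {question}
"""

retrieved = ["Dow Futures 34,666.00 (-0.04%)", "Gold 1,951.60 (-0.05%)"]  # toy chunks
print(stuff_prompt(TEMPLATE, retrieved, "How are futures trading?"))
```

Because everything is stuffed into one prompt, `chunk_size` times the number of retrieved chunks must fit within the model's context window alongside the answer.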

In [None]:
memory = ConversationBufferWindowMemory(k=3)  # keeps only the 3 most recent question/answer exchanges
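The windowing behaviour of `ConversationBufferWindowMemory(k=3)` can be pictured as a bounded queue: once three exchanges are stored, saving a new one evicts the oldest. The `WindowMemory` class below is a hypothetical stand-in built on `collections.deque`, not LangChain's implementation. One caveat, hedged: `RetrievalQA` appears to pass only the retrieved documents and the question into its prompt, and the prompt above has no `{history}` slot, so in this setup the stored exchanges are probably recorded but not shown to the model.

```python
from collections import deque


class WindowMemory:
    """Hypothetical minimal stand-in for a k-exchange conversation window."""

    def __init__(self, k=3):
        # A bounded deque silently drops the oldest exchange once k are stored
        self.turns = deque(maxlen=k)

    def save(self, question, answer):
        self.turns.append((question, answer))

    def history(self):
        # Render the retained exchanges as a "Human:/AI:" transcript
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


window = WindowMemory(k=3)
for i in range(1, 5):
    window.save(f"question {i}", f"answer {i}")
print(window.history())  # exchanges 2, 3 and 4 remain; exchange 1 was evicted
```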

Define the chain, passing the model, the vector store (as a retriever), the memory buffer and the prompt as arguments

In [None]:
chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=VectorStore.as_retriever(),
                                 memory=memory, chain_type_kwargs=chain_type_kwargs)

Test the chain

In [None]:
query = 'what are trending stocks today?'
results = qa.run(query)

In [None]:
results.strip('\n')

'Answer: The top trending stocks today are CONCORD BIOTECH share price, IDBI Bank share price, Yes Bank share price, Infosys share price, Patanjali share price, Adani Power share price, Tata Steel share price, HUL share price, Indian Oil share price, Spicejet share price, TCS share price, Asian Paints share price, HDFC Bank share price, Tata Power share price, Reliance share price, Suzlon share price, Adani Enterprises share price, ITC share price, ICICI Bank share price, Vedanta share price, Suzlon share price Live, Jio Financial Services share price Live, Sunpharma share price Live, Jsw Steel share price Live, NHPC share price Live, ADANIENT share price, ADANIPORTS share price, APOLLOHOSP share price, ASIANPAINT share price, AXISBANK share price, BAJAJ-AUTO share price, BAJFINANCE share price, BAJAJFINSV share price, BPCL share price, BHARTIARTL share price, BRITANNIA share price, CIPLA share price, COALINDIA share price, and DIVISLAB share'

In [None]:
query = 'any news about india today?'
results = qa.run(query)
results.strip().removeprefix('Answer:')  # drop the leading "Answer:" label

" India Today recently reported on the Indian government's plans to introduce a new direct tax code in the next fiscal year, which will include measures to boost tax compliance and simplify tax laws. The report also highlighted the government's focus on improving India's economic growth and tackling the mounting fiscal deficit. Additionally, India is taking steps to promote digital payments, such as launching the UPI version 2.0, which will allow users to make payments using their mobile phones. Finally, India Today reported on the upcoming G20 summit in Delhi, which is expected to bring together world leaders to discuss global issues such as climate change, trade, and economic growth."

In [None]:
query = ' overall market sentiment today'
results = qa.run(query)
results.strip().removeprefix('Answer:')  # drop the leading "Answer:" label

' Overall, the market sentiment today is mixed. The S&P 500, Dow Jones Industrial Average, and Nasdaq Composite are all down slightly, suggesting a lack of investor confidence. However, the S&P 500, Dow Jones, and Nasdaq have all gained significantly since the start of 2021, suggesting that investors remain optimistic about the future. Additionally, the number of reported coronavirus cases in the United States has been declining over the past few weeks, which could also be contributing to a more positive market sentiment overall.'
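A note on cleaning the model output: `str.strip('\nAnswer:')` does not remove the prefix `"Answer:"`. It removes any run of the characters `\n`, `A`, `n`, `s`, `w`, `e`, `r` and `:` from both ends, so an answer that happens to end in one of those letters gets clipped. `str.removeprefix` (Python 3.9+) removes the exact prefix instead. A minimal demonstration with a made-up answer string:

```python
raw = "\nAnswer: Markets closed lower"  # made-up LLM output

print(repr(raw.strip("\nAnswer:")))               # ' Markets closed lo' (trailing "wer" eaten)
print(repr(raw.strip().removeprefix("Answer:")))  # ' Markets closed lower'
```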